[Beowulf] Performance characterising a HPC application
ashley at quadrics.com
Wed Apr 4 07:06:05 PDT 2007
Richard Walsh wrote:
> Ashley Pittman wrote:
>> Patrick Geoffray wrote:
>>> I would bet that UPC could more efficiently leverage a strided or vector
>>> communication primitive instead of message aggregation. I don't know if
>>> GasNet provides one, I know ARMCI does.
>> GasNet does however get extra credit for having an asynchronous
>> collective, namely barrier. Unfortunately when you read the spec it's
>> actually a special case asynchronous reduce which is almost impossible
>> to optimise anything like as well as barrier which is a shame.
> Berkeley's recent paper on some of the optimization techniques that they
> have applied within their UPC compiler emphasizes reducing and hoisting
> shared pointer references, maximally split-phased reads and writes
> (implying that GASnet can do what Ashley suggests), and
> aggregating-coalescing communication. I have only read half of it, but
> there is no mention of pipelining or pseudo-vector operations (too bad) ...
> this would seem to be harder to do as it would require whole loop analysis.
I was thinking of gasnet_barrier_notify() and
gasnet_barrier_wait()/gasnet_barrier_try() specifically which allow you
to pipeline useful code with barriers. It's a surprisingly useful trick
and one that I hope will become commonplace in future.
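
To make the overlap concrete, here is a minimal single-process sketch of the idea, not GASNet code. The names sp_barrier_t, sp_barrier_notify(), sp_barrier_try() and simulate_overlap() are hypothetical stand-ins invented for this illustration, and the staggered arrival order (rank t arrives at time step t) is an assumption. It only counts how many units of independent work the early ranks could get through between their notify and the point where a poll finally reports the barrier complete.

```c
#include <assert.h>

/* Hypothetical stand-in for a split-phase barrier. This is a toy model
 * that lives in one process; it is NOT the GASNet API. */
typedef struct {
    int nranks;   /* number of participants */
    int arrived;  /* how many have notified so far */
} sp_barrier_t;

/* Phase 1: announce arrival and return immediately, without blocking. */
static void sp_barrier_notify(sp_barrier_t *b)
{
    assert(b->arrived < b->nranks);  /* each rank notifies at most once */
    b->arrived++;
}

/* Phase 2 (polling form): nonzero once every rank has notified. */
static int sp_barrier_try(const sp_barrier_t *b)
{
    return b->arrived == b->nranks;
}

/* Assumed scenario: rank t reaches the barrier at time step t. Once a rank
 * has notified, it performs one unit of independent work per time step
 * until a poll succeeds. Returns the total work units overlapped with the
 * barrier, i.e. time a blocking barrier would have spent idle. */
int simulate_overlap(int nranks)
{
    sp_barrier_t b = { nranks, 0 };
    int total = 0;

    for (int t = 0; t < nranks; t++) {
        sp_barrier_notify(&b);            /* rank t arrives and notifies */
        for (int r = 0; r <= t; r++) {    /* every rank that has notified */
            if (!sp_barrier_try(&b))
                total++;                  /* barrier not done: useful work */
        }
    }
    return total;
}
```

Under these assumptions, simulate_overlap(4) returns 6: the first rank to arrive overlaps three units of work, the second two, the third one, and the last none. The work done between notify and a successful try must not depend on data other ranks publish before the barrier, otherwise the overlap is not safe.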