[Beowulf] interconnect and compiler ?

Fri Jan 30 13:33:27 PST 2009

On Fri, Jan 30, 2009 at 04:16:29PM -0500, Patrick Geoffray wrote:

>> Mark asked for an example, not a research paper. And we were
>
> And Michael gave you a good practical example. You can always find a  
> code that does something stupid like a 27-point (!) 3d stencil code  
> sending tiny messages.

Whether or not you can send a wider area of ghost zones depends on the
algorithm and how it's structured. Most codes that I've looked at
which do this only use wider boundaries when they're iterating with
little computation. InfiniPath gets a speedup on lots of codes that
you wouldn't predict given the raw latency and bandwidth; how else
would you explain it?
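
To make the ghost-zone trade-off concrete, here is a rough sketch, not taken from any real code. It assumes a radius-1 stencil on a cubic n*n*n block, so that ghost zones w cells deep let you exchange once every w iterations, and a simple latency-plus-bandwidth cost per message. The function name and the parameters alpha (per-message latency) and beta (time per byte) are made up for illustration, and the extra redundant computation that wide ghost zones require is deliberately left out.

```python
def halo_exchange_cost_per_step(n, w, alpha, beta, bytes_per_cell=8):
    """Average per-iteration cost of exchanging ONE face of an n*n*n block.

    Assumptions (hypothetical model, not from the original post):
      - ghost zones are w cells deep and are exchanged once every w iterations
      - a message costs alpha + beta * bytes (latency plus bandwidth term)
      - redundant computation in the ghost region is ignored
    """
    if n < 1 or w < 1:
        raise ValueError("n and w must be positive")
    message_bytes = w * n * n * bytes_per_cell  # a face slab that is w cells thick
    per_exchange = alpha + beta * message_bytes
    return per_exchange / w  # amortized over the w iterations it covers


if __name__ == "__main__":
    # Illustrative numbers only: a small block, where latency dominates.
    for width in (1, 2, 4):
        print(width, halo_exchange_cost_per_step(n=10, w=width, alpha=2.0, beta=0.001))
```

Under these assumptions the bandwidth term per iteration does not change with w; only the latency term is amortized. So widening the boundary only pays off when messages are small and there is little computation per iteration, and a lower per-message cost on the interconnect reduces the need for it.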

>> network.  In the Berkeley "logp" model, for example, processor
>> overhead and the "gap" between messages are fundamental parameters.
>> The InfiniPath chip has a tiny "o" and a negative "g".  As a result,
>
> Ok, my turn to bite :-) What is a negative "g" ?

It means that the interconnect is ready to send a 2nd message before
the 1st one is on the wire. Think pipelining. Or you could ask
Christian Bell ;-) Adaptors like Myricom's have some pipelining, but
not as much as others; the effect is largest at small message sizes.
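
As a rough illustration of why "o" and "g" matter for small messages, here is a minimal sketch of the standard LogP estimate for a burst of back-to-back small messages from one sender. Standard LogP has no negative gap, so this sketch simply treats a zero or negative g as "the gap is never the bottleneck" by taking max(o, g) as the injection interval. The function names and the numbers in the example are hypothetical, not measurements of any adapter.

```python
def logp_burst_time(n, L, o, g):
    """Time until the last of n back-to-back small messages is received.

    LogP parameters (all in the same time unit):
      L: network latency, o: per-message processor overhead, g: gap.
    The sender can inject a new message every max(o, g); the last message
    then costs send overhead + latency + receive overhead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    interval = max(o, g)  # a g below o (including g <= 0) never limits the rate
    return (n - 1) * interval + o + L + o


def logp_message_rate(o, g):
    """Sustained small-message rate (messages per time unit) of one sender."""
    return 1.0 / max(o, g)


if __name__ == "__main__":
    # Hypothetical values in microseconds, for illustration only.
    print(logp_burst_time(n=3, L=1.0, o=0.2, g=0.5))   # gap-limited sender
    print(logp_burst_time(n=3, L=1.0, o=0.2, g=-0.1))  # overhead-limited sender
```

In this model, once g drops below o the message rate is set entirely by the processor overhead, which is why the effect shows up most at small message sizes.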

> In practice, stencil codes can start updating as soon as they receive a
> border block, so not only do you overlap communication and computation, but
> not all cores end up communicating at the same time.

In practice, people don't program that way. There are many codes which
have synchronized communications sections. I've yet to see an actual
stencil code which started computing when it only had one boundary
region.

> When the number of cores is an order of magnitude larger than today
> (100s), then it will be a different discussion.

Of course. But you can already see the effect today, for some codes
and data sizes. The effect with real codes is of course less than with
a microbenchmark like message rate.

-- greg