[Beowulf] Help with inconsistent network performance

Mark Hahn hahn at mcmaster.ca
Tue Dec 18 21:55:51 PST 2007


> I guess I figured that the data is relatively small compared to the
> bandwidth,

I agree, in principle.  and relatively small compared to the amount of ram
in the switch as well.

> whereas the latency for ethernet is relatively high.  I also

not _that_ high, though.  with a little tuning (coalesce parameters),
I think 30-40 us half-rtt is pretty common, even over a normal 
tcp stack.  yes, that's 2+ 1.5k packets, but it's not _that_ much
compared to 1M images.
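
to put rough numbers on that comparison, here is a small sketch.  the link
speed is an assumption (1 Gbit/s is not stated above), "1M" is read as 1 MiB,
and framing overhead is ignored, so treat the output as ballpark only.

```python
# Assumptions (not from the thread): 1 Gbit/s link, 1500-byte frames,
# "1M" read as 1 MiB; preamble, headers and inter-frame gap are ignored.
LINK_BITS_PER_SEC = 1e9

def wire_time_us(nbytes, bits_per_sec=LINK_BITS_PER_SEC):
    """Time to serialize nbytes onto the wire, in microseconds."""
    return nbytes * 8 / bits_per_sec * 1e6

packet_us = wire_time_us(1500)        # one full-size frame
image_us = wire_time_us(1024 * 1024)  # one 1 MiB image

for latency_us in (30, 40):
    print(f"{latency_us} us latency ~ {latency_us / packet_us:.1f} packets, "
          f"{100 * latency_us / image_us:.2f}% of one image transfer")
```

under those assumptions a full-size frame is about 12 us on the wire, so
30-40 us is a few packets' worth, and well under 1% of the time needed to
push one image through the link.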

>>> To make sure there was not an issue with the MPI broadcast, I did one test
>>> run with 5 nodes only sending back 4 bytes of data each.  The result was a
>>> RTT of less than 0.3 ms.
>>
>> isn't that kind of high?  a single ping-pong latency should be ~50 us -
>> maybe I'm underestimating the latency of the broadcast itself.
>
>
> This is quite a bit more than a single ping-pong. The viewer sends to the
> master node (rank 0), and then the master node broadcasts to all other
> nodes, and then all nodes send back to the viewer node.  I don't know if
> this still seems high?

the first message should take <50 us.  the broadcast to 5 nodes should
take another 2-3 steps of ~50 us each.  so at about 200 us, all the slaves
will start the DOS attack on the viewer node's nic...
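
the arithmetic behind that ~200 us figure, as a minimal sketch.  it assumes
the bcast runs as a binomial tree, so the number of steps is ceil(log2(ranks));
the function name and the tree shape are illustrative, not taken from any
particular MPI implementation.

```python
import math

HOP_US = 50  # per-message latency figure used in the estimate above

def time_until_replies_start_us(n_slaves, hop_us=HOP_US):
    """One viewer -> master send, then a tree bcast from master to n_slaves."""
    ranks = n_slaves + 1                        # master plus slaves
    bcast_steps = math.ceil(math.log2(ranks))   # binomial tree depth (assumed)
    return hop_us + bcast_steps * hop_us

print(time_until_replies_start_us(5))  # prints 200
```

with 5 slaves plus the master there are 6 ranks, which needs 3 bcast steps, so
the last slave has the message at roughly 50 + 3*50 = 200 us.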

> But the bcast is always just sending 4 bytes (a single integer), and as

no, afaik no mpi implementations actually utilize the eth-level bcast,
but rather implement bcast as a tree of unicast sends.
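
a sketch of what "a tree of unicast sends" can look like, using a binomial
tree.  this is only one common shape; real mpi libraries choose among several
bcast algorithms, and the function below is a hypothetical illustration that
just prints the send schedule rather than calling mpi.

```python
def binomial_bcast_schedule(n_ranks, root=0):
    """Return a list of rounds; each round is a list of (src, dst) sends."""
    rounds = []
    have = 1  # how many ranks (numbered relative to root) hold the data
    while have < n_ranks:
        sends = []
        # every rank that already has the data forwards it to one new rank
        for rel_src in range(have):
            rel_dst = rel_src + have
            if rel_dst < n_ranks:
                sends.append(((rel_src + root) % n_ranks,
                              (rel_dst + root) % n_ranks))
        rounds.append(sends)
        have *= 2
    return rounds

for i, sends in enumerate(binomial_bcast_schedule(6)):
    print(f"round {i}: {sends}")
```

for 6 ranks this gives 3 rounds, with the set of ranks holding the data
doubling each round, so even a 4-byte bcast costs several point-to-point
latencies rather than one.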


