[Beowulf] New HPCC results, and an MX question
patrick at myri.com
Tue Jul 19 20:11:33 PDT 2005
Greg Lindahl wrote:
> I am referring to a comparison of the HPCC "random ring latency" to
> the HPCC "average ping-pong" on the same hardware, with the same
The random ring latency will increase with the size of the cluster,
whereas the average ping-pong will not, because the pairs of nodes are
ordered, and ordered nodes are likely to be on the same crossbar. If you
randomize the machine list, then there is no difference between the
random ring latency and the average ping-pong.
On a tiny cluster, all nodes are on the same crossbar, so it does not
matter whether the pairs are ordered or not.
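A toy model makes the point above concrete. The switch size (16 ports) and
cluster size (256 nodes) are my assumptions for illustration, not actual
Myrinet topology numbers: consecutive hosts share a crossbar, so an ordered
ring keeps almost every pair local, while a shuffled ring mostly pairs hosts
across crossbars.

```python
import random

CROSSBAR_PORTS = 16   # assumed switch size
NODES = 256           # assumed cluster size

def same_crossbar(a, b):
    # Nodes are attached to crossbars in order, 16 per switch.
    return a // CROSSBAR_PORTS == b // CROSSBAR_PORTS

def local_fraction(order):
    """Fraction of ring-neighbour pairs that stay on one crossbar."""
    n = len(order)
    pairs = [(order[i], order[(i + 1) % n]) for i in range(n)]
    return sum(same_crossbar(a, b) for a, b in pairs) / n

ordered = list(range(NODES))
shuffled = ordered[:]
random.seed(0)
random.shuffle(shuffled)

print(local_fraction(ordered))   # nearly all pairs are local (240/256)
print(local_fraction(shuffled))  # most pairs cross crossbars
```

With the ordered list, only the 16 pairs that straddle a switch boundary
are non-local; after shuffling, a neighbour is local only by chance, which
is why the random ring sees the (higher) cross-switch latency.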
>>By the way, could you point me to the raw performance data on the
>>pathscale web pages ?
> As I said, it is in the process of being published, and I attached
> the relevant info to my posting.
I know, tongue-in-cheek. Will you publish the raw numbers on the web
site eventually?
> I was referring to the 2.6 usec claim at:
This is Pallas too. I will ask for a reference to be added: Pallas
between two 2 GHz Opteron nodes, on the same crossbar, with E cards.
>>Anyway, the cluster I ran Pallas on had a 0-byte MPI latency of 2.9 us.
>>Why ? Because it's a production cluster, deployed over a year ago, with
>>1.4 GHz Opteron CPUs (compare that with your 2.6 GHz).
> Thank you for the number. Does your latency change significantly with
> faster cpus? Ours does (from 1.50 usec at 2.0 Ghz to 1.32 usec at 2.6
> Ghz), but my impression was that your number ought to be relatively
> insensitive to the host cpu speed.
No, we do PIO for small messages too, but not for medium/large ones
(CPU cycles start to get expensive when you push data through a slow bus
like PCI-X; on PCI-Express or HT, the picture is different). So the CPU
clock will affect the latency up to 127 bytes (the threshold may
change). Write combining affects the latency too.
It also depends on the architecture (Opterons are better than EM64T, for
example, but I suspect the PCI-E/PCI-X bridge is the culprit) and on the
cost of pthread mutexes (MX is thread-safe).
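The small-message path described above can be sketched as a simple size
cutoff. The function name and the exact behaviour at the boundary are my
assumptions; only the 127-byte figure comes from the discussion, and as
noted, that threshold may change:

```python
PIO_MAX_BYTES = 127  # per the discussion above; the cutoff may change

def send_mode(nbytes):
    """Pick the send path for a message of nbytes (illustrative sketch)."""
    if nbytes <= PIO_MAX_BYTES:
        # Programmed I/O: the host CPU writes the data to the NIC, so
        # latency tracks the CPU clock (and write-combining behaviour).
        return "PIO"
    # DMA: the NIC pulls the data itself; the CPU clock matters much
    # less, and the bus (PCI-X vs PCI-Express/HT) dominates the cost.
    return "DMA"

print(send_mode(0))     # PIO -> 0-byte latency scales with CPU speed
print(send_mode(4096))  # DMA -> bus-dominated
```

This is why the 0-byte latency quoted earlier moves with CPU frequency:
a 0-byte message is well under the PIO cutoff.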
More information about the Beowulf mailing list