[Beowulf] New HPCC results, and an MX question

Vincent Diepeveen diep at xs4all.nl
Wed Jul 20 06:06:08 PDT 2005

At 12:38 AM 7/20/2005 -0400, Patrick Geoffray wrote:
>Greg Lindahl wrote:
>> On Tue, Jul 19, 2005 at 11:11:33PM -0400, Patrick Geoffray wrote:
>>>If you randomize the machine list, then there is no difference
>>>between the random ring latency and the average pingpong.
>> Patrick,
>> There likely will be a difference, because average pingpong doesn't
>> run on all the cpus. On a 4-cpu node, that can make a big difference.
>I believe the difference will not be that big. I will get my hands on a 
>quad in the next couple of weeks, I will look into it.

The difference will be huge, of course, because network processors have a
switch latency.

If it must switch at the wrong moment, that costs something like 50 us on
certain network chips.

Additionally, there will be software layers that have to lock in some way.

Locking plus unlocking alone already costs about half a microsecond extra.
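
A rough way to check that kind of figure on your own machine is to time an
uncontended acquire/release pair in a loop. This is only a sketch: it uses
Python's threading.Lock as a stand-in, the function name lock_unlock_cost_us
is made up for illustration, and the result includes interpreter and loop
overhead, so it says nothing definitive about the locks inside an MPI library
or a network driver.

```python
import threading
import time


def lock_unlock_cost_us(iterations=1_000_000):
    """Average cost, in microseconds, of one uncontended acquire + release.

    Hypothetical helper for illustration; the measured value includes
    Python loop overhead and depends entirely on the machine.
    """
    lock = threading.Lock()
    start = time.perf_counter()
    for _ in range(iterations):
        lock.acquire()
        lock.release()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"lock + unlock: {lock_unlock_cost_us():.3f} us per pair")
```

With several processes on a node hitting the same lock, the contended cost
would be expected to be higher than this uncontended number.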

Running the tests on all processors at the same time makes a lot of sense.

Any claim in advance that it will be the same speed is just baloney.
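
To make the argument concrete, here is a toy model of how those fixed costs
could add up per message. Everything in it is an assumption for illustration:
the additive model itself, the function name expected_latency_us, and the
probability p_bad_switch of hitting the switch at the wrong moment. The
inputs are the rough numbers from this thread (the 1.552 usec average
ping-pong as a base, about 0.5 us for lock plus unlock, about 50 us for a
badly timed switch); it is not a measurement of any real interconnect.

```python
def expected_latency_us(base_us, lock_us, switch_penalty_us, p_bad_switch):
    """Expected per-message latency under a simple additive model.

    base_us           -- latency with nothing else going on
    lock_us           -- extra cost of lock + unlock in the software layers
    switch_penalty_us -- cost of a switch that happens at the wrong moment
    p_bad_switch      -- assumed probability of paying that penalty
    """
    if not 0.0 <= p_bad_switch <= 1.0:
        raise ValueError("p_bad_switch must be between 0 and 1")
    return base_us + lock_us + p_bad_switch * switch_penalty_us


if __name__ == "__main__":
    BASE_US = 1.552            # average ping-pong figure quoted in the thread
    LOCK_US = 0.5              # "half a microsecond" for lock + unlock
    SWITCH_PENALTY_US = 50.0   # "50 us or something" on certain chips

    # The probabilities below are hypothetical, chosen only to show the trend.
    for p in (0.0, 0.05, 0.1, 0.2):
        latency = expected_latency_us(BASE_US, LOCK_US, SWITCH_PENALTY_US, p)
        print(f"p_bad_switch={p:.2f}: {latency:.3f} us")
```

Even a small chance of paying the switch penalty dominates the base latency
in this model, which is why a test that keeps all CPUs busy can look very
different from an average ping-pong.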

>> To give you an example, look at the Quadrics reported numbers for
>> random ring latency of 11.4568 usec and average ping-pong of 1.552
>> usec. This is on a 2-cpu node (I think). I'd bet that most of this
>> difference has nothing to do with machine size. But I'd be happy to be
>> proven wrong.
>I would think 1.5 is shared memory in this case (all pairs are ordered 
>and they end up being on the same nodes). This is one of the things I 
>don't like with HPCC, so much variation in results depending on size of 
>clusters, process mapping, order/topology.
>> Hopefully someone will publish a Myrinet MX-based set of HPCC results
>> soon. (hint, hint!)
>I don't have time to do that. At least, as long as HPCC, like HPL, takes 
>a gazillion parameters. Give me HPCC with no parameters and I will take 
>5 minutes to start it. I was promised it would be this way eventually.
>I don't believe much in any analytic benchmarks. HPL can yield 90% of 
>peak if rewritten for modern MPI implementations, Pallas is nice to find 
>out when something is very wrong, but not much more, and the NAS are 
>marginally more interesting.
>I prefer benchmarking real codes, and we will publish that, but 10G is 
>taking most of my time these days (got to get something for you to 
>compare against).
>>>I know, tongue-in-cheek. Will you publish the raw numbers on the web 
>>>site eventually ?
>> Yes. That's what I meant in the first place.
>I bet the next time you won't :-\
>Patrick Geoffray
>Myricom, Inc.