[Beowulf] evaluating FLOPS capacity of our cluster

Gus Correa gus at ldeo.columbia.edu
Mon May 11 14:23:50 PDT 2009


Rahul Nabar wrote:
> On Mon, May 11, 2009 at 12:23 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>> If you don't feel like running the HPL benchmark (It is fun,
>> but time consuming) to get your actual Gigaflops
>> (Rmax in Top500 jargon),
>> you can look up the Top500 list the Rmax/Rpeak ratio for clusters
>> with hardware similar to yours.
>> You can then apply this factor to your Rpeak calculated as above,
>> to get a reasonable guess for your Rmax.
>> This may be good enough for the purpose you mentioned.
> 
> Rmax/Rpeak= 0.83 seems a good guess based on one very similar system
> on the Top500.
> 
> Thus I come up with a number of around 1.34 TeraFLOPS for my cluster
> of 24 servers.  Does the value seem reasonable ballpark? Nothing too
> accurate but I do not want to be an order of magnitude off. [maybe  a
> decimal mistake in math! ]
> 
> Hardware:
>  Dell PowerEdge SC1345's. All 64-bit machines with a dual-channel
> bonded Gigabit Ethernet interconnect. Quad-Core AMD Opteron(tm)
> Processor 2354.
> 
> 
> PS.  The Athelon was my typo, earlier sorry!
> 


Hi Rahul, list

You may have read my other posting with the
actual HPL Rmax/Rpeak = 83.4%
that I measured here on quad-core AMD Opteron 2376 (Shanghai) processors.
This matches the number you found on the Top500 list.

Our clusters are very similar: 24 nodes, 192 cores,
3rd-generation AMD Opterons (Barcelona and Shanghai), right?
So ~83% is what you should expect if you use Infiniband (which I used),
on a cluster of this size and these processors, with a single IB switch.
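
As a quick sanity check on the arithmetic (you mentioned worrying
about a decimal slip), here is a back-of-the-envelope estimate.
The socket count, clock, and flops/cycle below are my assumptions
about your hardware (dual-socket 2.2 GHz Opteron 2354 nodes,
4 double-precision flops per cycle per core on Barcelona),
not numbers taken from your post:

# Back-of-the-envelope Rpeak and estimated Rmax for the cluster.
# Assumed: 24 nodes, 2 sockets/node, 4 cores/socket, 2.2 GHz,
# 4 DP flops/cycle/core.
nodes = 24
sockets_per_node = 2
cores_per_socket = 4
ghz = 2.2
flops_per_cycle = 4

rpeak = nodes * sockets_per_node * cores_per_socket * ghz * flops_per_cycle
rmax = 0.83 * rpeak   # apply the Top500-style Rmax/Rpeak ratio

print("Rpeak = %.0f GFLOPS, estimated Rmax = %.0f GFLOPS" % (rpeak, rmax))
# -> Rpeak ~ 1690 GFLOPS, estimated Rmax ~ 1400 GFLOPS

That comes out around 1.4 TFLOPS, so your 1.34 TFLOPS figure is in
the right ballpark, and certainly not off by an order of magnitude.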

For Gigabit Ethernet I would guess the number is lower.
The nominal bandwidth of InfiniBand 4X DDR
is 20 Gb/s = 20 x 1 GigE (IIRC the usable factor is actually 16, not 20).
I have yet to try HPL over GigE, so I don't have numbers to compare,
but it just takes too long to run HPL with a decent range of parameters,
and I am reluctant to do it and stop production.
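
Just to make that factor concrete (assuming the 20 Gb/s figure refers
to 4X DDR InfiniBand), the usable data rate is lower than the signaling
rate because of the 8b/10b encoding on the wire:

# Raw link-rate comparison only; ignores bonding, latency, and protocol overhead.
signal_rate = 20.0                    # Gb/s, 4X DDR InfiniBand signaling rate
data_rate = signal_rate * 8.0 / 10.0  # 8b/10b encoding -> 16 Gb/s of payload
gige = 1.0                            # Gb/s, one Gigabit Ethernet link
print(data_rate / gige)               # ~16x a single GigE link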

However, if the computation/communication ratio of your real
computational chemistry application is high,
the interconnect may not be so important, and GigE may be perfectly
good, I would guess.
I.e., increase the computation by giving each core enough numbers
to crunch, and decrease the communication by not splitting the task
across too many cores.
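
To illustrate the point (this is just a toy surface-to-volume model,
not your actual chemistry code): if the work per core scales with the
volume of its subdomain and the message traffic with its surface,
the computation/communication ratio grows as each core gets a fatter
chunk of the problem:

# Toy model: work ~ local volume, communication ~ local surface area,
# for a 3D domain split evenly across ncores MPI ranks.
def comp_comm_ratio(total_points, ncores):
    local_volume = float(total_points) / ncores   # points each core owns
    side = local_volume ** (1.0 / 3.0)            # edge of a cubic subdomain
    surface = 6.0 * side ** 2                     # points exchanged with neighbors
    return local_volume / surface                 # grows with per-core problem size

total = 512 ** 3   # a hypothetical global grid
for p in (8, 64, 192):
    print("%3d cores -> comp/comm ratio %.1f" % (p, comp_comm_ratio(total, p)))

The absolute numbers mean nothing, but the trend is the point:
fewer, larger subdomains per job make a slow interconnect matter less.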

If you can run each job on a single node (shared memory), even better,
as long as you have enough RAM on the node
to fit the job without triggering memory swapping.
My single-node HPL test gave Rmax/Rpeak = 84.6%.
(I have yet to try it with processor affinity turned on,
which may do a little better.)
So, one node was just a bit better than the 83.4%
across the whole cluster.
However, the difference may be larger if using GigE instead of IB
on the cluster.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------



