[Beowulf] evaluating FLOPS capacity of our cluster

Gus Correa gus at ldeo.columbia.edu
Mon May 11 15:36:23 PDT 2009


Ashley Pittman wrote:
> On Mon, 2009-05-11 at 15:09 -0400, Gus Correa wrote:
>> Mark Hahn wrote:
> 
>> I haven't checked the Top500 list in detail,
>> but I think you are right about 80% being fairly high.
>> (For big clusters perhaps?).
> 
> Other way around, maintaining a high efficiency rating at large node
> counts is a very difficult problem so larger clusters tend to have
> smaller values.

Hi Ashley, list

I may have phrased it poorly.

I meant exactly what you said, i.e., that it is more difficult
to keep high efficiency in a large installation (particularly w.r.t.
network latency, I would guess) than in a small single-switch cluster.

"Small is better", or easier, perhaps.  :)

> 
>> In the original email I mentioned that Roadrunner (top500 1st),
>> has Rmax/Rpeak ~= 76%.
>>
>> However, without any particular expertise or too much effort,
>> I got 83.4% Rmax here. :)
>> I was happy with that number,
>> until somebody in the OpenMPI list told me that
>> "anything below 85%" needs improvement.  :(
> 
> At 24 nodes that's probably a reasonable statement.
> 
Ashley,

Thank you.
It encourages me to seek better performance.

However, considering other postings that emphasized the importance
of memory size (and problem size N) for HPL performance,
I wonder whether there is still room for significant improvement
given my 16GB/node (out of a possible maximum of 128GB/node,
which we don't plan to buy, of course).

With the current memory I have, I can't make the problem much bigger
than the N=196,000 that I've been using.
(This follows the "use 80% of your memory" HPL rule of thumb.)
Maybe N can grow a bit more, but not a lot,
as I am already close to triggering memory swapping.
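
For reference, here is a minimal sketch of that rule-of-thumb
calculation (assuming "GB" means 10^9 bytes and 8-byte double-precision
matrix elements; the 80% memory fraction is the usual HPL guideline):

    # Rough HPL problem size: the N x N double-precision matrix
    # should fill about 80% of the cluster's total memory.
    # Assumes GB = 1e9 bytes and 8 bytes per matrix element.
    def hpl_problem_size(nodes, gb_per_node, mem_fraction=0.8):
        total_bytes = nodes * gb_per_node * 1e9
        max_elements = mem_fraction * total_bytes / 8   # 8-byte doubles
        return int(max_elements ** 0.5)                 # N for an N x N matrix

    print(hpl_problem_size(24, 16))    # ~196,000 with 16GB/node
    print(hpl_problem_size(24, 128))   # ~554,000 with 128GB/node

Both values line up with the N=196,000 above and the N=554,000
mentioned below.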

So far, varying the HPL parameters, using processor affinity, etc.,
hasn't shown significant improvement.
The NB, P, and Q sweet spots are clear.
I have not tried other compilers, though, only GNU, with optimization
flags appropriate for Opteron Shanghai.

I wonder if I have reached the HPL "saturation point" for this memory size.
I also wonder whether, if the 24 nodes had a full 128GB/node of RAM,
which would give me a maximum problem size of N=554,000
(and a really long walltime to run HPL!),
there would be a significant increase in performance.
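
To put a rough number on that walltime: HPL performs about (2/3)*N^3
floating-point operations, so a quick sketch (the sustained rate below
is a hypothetical figure for illustration, not our measured Rmax):

    # Rough HPL walltime: ~(2/3)*N^3 flops divided by the sustained rate.
    # 'sustained_gflops' is whatever Rmax the run actually achieves.
    def hpl_walltime_hours(n, sustained_gflops):
        flops = (2.0 / 3.0) * n**3
        return flops / (sustained_gflops * 1e9) / 3600.0

    # Hypothetical sustained rate of 1500 GFLOPS, for illustration only:
    print(hpl_walltime_hours(196000, 1500))   # ~0.9 hours
    print(hpl_walltime_hours(554000, 1500))   # ~21 hours

At the same sustained rate, the 128GB/node run would take roughly
22x longer, since the operation count grows as N^3 while N itself
only grows by a factor of ~2.8.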

What do you think?

Has anybody done HPL benchmarks with nodes "full of memory"? :)

Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------




