[Beowulf] evaluating FLOPS capacity of our cluster

Gus Correa gus at ldeo.columbia.edu
Mon May 11 15:36:23 PDT 2009


Ashley Pittman wrote:
> On Mon, 2009-05-11 at 15:09 -0400, Gus Correa wrote:
>> Mark Hahn wrote:
> 
>> I haven't checked the Top500 list in detail,
>> but I think you are right about 80% being fairly high.
>> (For big clusters perhaps?).
> 
> Other way around, maintaining a high efficiency rating at large node
> counts is a very difficult problem so larger clusters tend to have
> smaller values.

Hi Ashley, list

I may have phrased it poorly.

I meant exactly what you said, i.e., that it is more difficult
to keep high efficiency in a large installation (particularly w.r.t.
network latency, I would guess) than in a small single-switch cluster.

"Small is better", or easier, perhaps.  :)

> 
>> In the original email I mentioned that Roadrunner (top500 1st),
>> has Rmax/Rpeak ~= 76%.
>>
>> However, without any particular expertise or too much effort,
>> I got 83.4% Rmax here. :)
>> I was happy with that number,
>> until somebody in the OpenMPI list told me that
>> "anything below 85%" needs improvement.  :(
> 
> At 24 nodes that's probably a reasonable statement.
> 
> Ashley,


Thank you.
It is an encouragement to seek better performance.

However, considering other postings that emphasized the importance
of memory size (and problem size N) for HPL performance,
I wonder if there is still room for significant improvement
in the context of my 16GB/node (out of a possible maximum of 128GB/node,
which we don't plan to buy, of course).

With the current memory I have, I can't make the problem much bigger
than the N=196,000 that I've been using.
(This is keeping the "use 80% of your memory" HPL rule of thumb.)
Maybe N can grow a bit more, but not a lot,
as I am already close to triggering memory swapping.
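
For reference, the arithmetic behind that rule of thumb can be sketched as follows. This is only a rough estimate, assuming 8-byte double-precision matrix entries, decimal gigabytes (1 GB = 1e9 bytes), and that the N x N matrix is the only significant memory consumer; the function name is made up for illustration and is not part of HPL.

```python
import math

def hpl_problem_size(nodes, mem_per_node_gb, fraction=0.8):
    """Estimate the largest HPL problem size N for a given memory budget.

    Assumptions (not taken from HPL itself): 8 bytes per matrix entry
    (double precision), decimal gigabytes, and an N x N matrix that
    fills `fraction` of the total cluster memory.
    """
    total_bytes = nodes * mem_per_node_gb * 1e9
    usable_entries = fraction * total_bytes / 8.0
    return int(math.sqrt(usable_entries))

# 24 nodes with 16GB each, using 80% of memory
n_current = hpl_problem_size(24, 16)
# The same 24 nodes with the full 128GB each
n_full = hpl_problem_size(24, 128)
print(n_current, n_full)
```

Under these assumptions the estimate lands close to N=196,000 for 16GB/node and close to N=554,000 for 128GB/node.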

So far, varying the HPL parameters, using processor affinity, etc.,
hasn't shown significant improvement.
The NB, P, and Q sweet spots are clear.
I have not tried other compilers, though, only GNU, with optimization
flags appropriate for Opteron Shanghai.

I wonder if I reached the HPL "saturation point" for this memory size.
I also wonder whether, if the 24 nodes had the full 128GB/node of RAM,
which would give me a max problem size of N=554,000
(and a really long walltime to run HPL!),
there would be a significant increase in performance.
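
To put a rough number on "really long", here is a back-of-the-envelope sketch. It assumes the usual HPL operation count of 2/3 N^3 + 3/2 N^2 floating-point operations and, optimistically, the same sustained flop rate at both problem sizes; the function name is hypothetical.

```python
def hpl_flop_count(n):
    """Approximate floating-point operations for an HPL run of size n.

    Uses the conventional LU-solve count 2/3*n^3 + 3/2*n^2; the cubic
    term dominates for any realistic n.
    """
    return (2.0 / 3.0) * n**3 + 1.5 * n**2

# Problem sizes from the message: N=196,000 now, N=554,000 with full memory
ratio = hpl_flop_count(554_000) / hpl_flop_count(196_000)
print(f"walltime would grow by a factor of about {ratio:.1f}")
```

With those assumptions, the larger run takes somewhere between 22 and 23 times as long as the current one, since the work grows with the cube of N.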

What do you think?

Has anybody run HPL benchmarks with nodes "full of memory"? :)

Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------




