[Beowulf] running the Linpack-HPL benchmark.

Sun Jan 17 18:03:25 PST 2010

Hi Rahul

I got Rmax/Rpeak of about 84% on our cluster (AMD Opteron Shanghai, IB on a single switch).
I didn't have the cluster available to play with HPL for long, so I did little tuning
before I had to move it to production mode.
Some folks on mailing lists said they get 90%, but the topmost group in the Top500 gets less
(as of mid-2009 it was ~75%, IIRC), probably because of their big networks
with stacked switches and the resulting communication overhead.
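
For the single-node numbers you quote below, here is a rough sketch of the Rmax/Rpeak arithmetic. The function names are just illustrative, and the 4 double-precision flops per cycle per core is an assumption that is consistent with your ~72 Gflops estimate for an 8-core 2.26 GHz Nehalem box.

```python
def rpeak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in Gflops: cores x clock (GHz) x flops per cycle per core.

    flops_per_cycle=4 is an assumption (double precision on Nehalem-class cores).
    """
    return cores * clock_ghz * flops_per_cycle


def efficiency(rmax_gflops, rpeak):
    """Fraction of theoretical peak actually reached by HPL (Rmax/Rpeak)."""
    return rmax_gflops / rpeak


# Numbers taken from the quoted message: 8 cores at 2.26 GHz, 62 Gflops measured.
rpeak = rpeak_gflops(cores=8, clock_ghz=2.26)
print(f"Rpeak      = {rpeak:.2f} Gflops")
print(f"Rmax/Rpeak = {efficiency(62.0, rpeak):.1%}")
```

With those inputs Rpeak comes out at about 72.3 Gflops, so 62 Gflops is roughly 86% of peak, already in the same range as what I saw on the cluster.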

To optimize on a single node, apply the same formula for Nmax, using only that node's RAM.
P and Q (the dimensions of the process grid used for the block matrix decomposition) tend to be optimal when they are close to each other.
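
A minimal sketch of both rules of thumb, in case it helps. The helper names and the 16 GiB node size are hypothetical (plug in your own RAM and process count), and listing only grids with P <= Q is my assumption of the usual convention, so check it against the TUNING file.

```python
import math


def nmax(total_ram_bytes, fraction=0.8, bytes_per_element=8):
    """Approximate largest N: sqrt(fraction * RAM / 8 bytes per double)."""
    return int(math.sqrt(fraction * total_ram_bytes / bytes_per_element))


def grid_candidates(nprocs):
    """All (P, Q) with P * Q == nprocs and P <= Q, most nearly square first."""
    pairs = [
        (p, nprocs // p)
        for p in range(1, math.isqrt(nprocs) + 1)
        if nprocs % p == 0
    ]
    return sorted(pairs, key=lambda pq: pq[1] - pq[0])


node_ram = 16 * 1024**3  # hypothetical node with 16 GiB of RAM
print("Nmax for one node:", nmax(node_ram))
print("P x Q grids for 8 processes:", grid_candidates(8))
```

The grids come out ordered from most to least square, so the first few entries are the ones worth trying first at Nmax.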

With Nehalem you may have to consider the extra complexity of
simultaneous multi-threading (SMT, a.k.a. hyperthreading),
and whether or not it makes a difference on very regular problems like HPL,
with big loops and not much branching/ifs.
(Your real world computational chemistry problems probably are not like that.)
Have you tried HPL with and without SMT/hyperthreading?
It may be worth testing on a single node at least.

I hope this helps.
Gus Correa

On Jan 16, 2010, at 10:02 PM, Rahul Nabar wrote:

> On Thu, Jan 14, 2010 at 7:25 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> 
>> 
>> First, to test, run HPL in a single node or a few nodes,
>> using small values of N, say 1000 to 20000.
>> 
>> The maximum value of N can be approximated by
>> Nmax = sqrt(0.8*Total_RAM_on_ALL_nodes_in_bytes/8).
>> This uses about 80% of the RAM, leaving room for the OS, so it doesn't get into memory paging.
>> 
>> Then run HPL on the whole cluster with the Nmax above.
>> Nmax pushes the envelope, and is where your
>> best performance (Rmax/Rpeak) is likely to be reached.
>> Try several P/Q combinations for Nmax (see the TUNING file).
>> 
> 
> Thanks Gus! That helps a lot. I have Linpack running now on just a
> single server and am trying to tune and hit the Rpeak.
> 
> I'm getting 62 Gflops but I think my peak has to be around 72 (2.26
> GHz 8 cores Nehalem). On a single server test do you manage to hit the
> theoretical peak? What's a good Rmax / Rpeak to shoot for while tuning?
> 
> Once I am confident I'm well tuned on one server I'll try and extend
> it to the whole cluster.
> 
> -- 
> Rahul