[Beowulf] running the Linpak -HPL benchmark.
gus at ldeo.columbia.edu
Sun Jan 17 18:03:25 PST 2010
I got an Rmax/Rpeak of around 84% on our cluster (AMD Opteron Shanghai, IB on a single switch).
I didn't have the cluster available to play with HPL for long, so I didn't do much tuning
before I had to move to production mode.
Some folks on mailing lists said they get 90%, but the topmost group in the Top500 gets less
(as of mid-2009 it was ~75%, IIRC), probably because of their big networks
with stacked switches and the resulting communication overhead.
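
For reference, here is a rough sketch of how the Rmax/Rpeak ratio can be computed, using the numbers from the quoted message below (2.26 GHz, 8 cores, 62 Gflops measured). The function names are mine, and the figure of 4 double-precision flops per cycle per core is an assumption that holds for SSE-based cores such as Nehalem; check it for your own CPU.

```python
def rpeak_gflops(clock_ghz, cores, flops_per_cycle=4):
    """Theoretical peak in Gflops: clock (GHz) x cores x flops/cycle/core.

    flops_per_cycle=4 is an assumption (double precision, SSE-era cores
    such as Nehalem); it is not a universal constant.
    """
    return clock_ghz * cores * flops_per_cycle


def efficiency(rmax_gflops, rpeak):
    """Fraction of the theoretical peak actually achieved by HPL."""
    return rmax_gflops / rpeak


if __name__ == "__main__":
    peak = rpeak_gflops(2.26, 8)   # single 8-core 2.26 GHz server
    eff = efficiency(62.0, peak)   # 62 Gflops measured by HPL
    print(f"Rpeak = {peak:.2f} Gflops, Rmax/Rpeak = {eff:.1%}")
```

With those inputs the ratio comes out at roughly 86%, which is in the same range as the 84% I saw on the whole cluster.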
To optimize on a single node, also apply the formula for Nmax, using only that node's RAM.
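
A minimal sketch of the Nmax formula from the quoted message below, Nmax = sqrt(0.8*RAM_in_bytes/8), follows. The 16 GiB node is a hypothetical example, and the optional rounding down to a multiple of the block size NB is my own convenience, not part of the formula.

```python
import math


def hpl_nmax(total_ram_bytes, fraction=0.8, nb=None):
    """Estimate the largest HPL problem size N that fits in memory.

    The N x N matrix of 8-byte doubles should use about `fraction`
    of the RAM, leaving the rest for the OS so the run does not page.
    If `nb` is given, N is rounded down to a multiple of it
    (an optional convenience, assumed here, not part of the formula).
    """
    n = int(math.sqrt(fraction * total_ram_bytes / 8))
    if nb:
        n -= n % nb
    return n


if __name__ == "__main__":
    ram = 16 * 1024**3  # hypothetical node with 16 GiB of RAM
    print("Nmax:", hpl_nmax(ram))
    print("Nmax rounded to NB=128:", hpl_nmax(ram, nb=128))
```

For the whole cluster, pass the total RAM summed over all nodes instead.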
P and Q (block matrix decomposition) tend to be optimal when they are close to each other.
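
To list the candidate P/Q pairs for a given number of MPI processes, something like the sketch below may help. It enumerates the factor pairs and puts the most nearly square ones first. The function name is mine, and listing only pairs with P <= Q is an assumption; the TUNING file is the authority on what to try.

```python
import math


def process_grids(nprocs):
    """Return all (P, Q) with P * Q == nprocs and P <= Q,
    ordered from the most nearly square grid to the flattest."""
    grids = [
        (p, nprocs // p)
        for p in range(1, math.isqrt(nprocs) + 1)
        if nprocs % p == 0
    ]
    # Smaller Q - P means closer to a square grid.
    return sorted(grids, key=lambda pq: pq[1] - pq[0])


if __name__ == "__main__":
    for nprocs in (8, 64):
        print(nprocs, process_grids(nprocs))
```

For 8 processes on a single node this gives 2x4 first, then 1x8. Each candidate still has to be timed, since the best grid also depends on the interconnect.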
With Nehalem you may also have to consider the extra complexity of
simultaneous multi-threading (hyperthreading),
and whether it makes a difference on very regular problems like HPL,
with big loops and not much branching/ifs.
(Your real-world computational chemistry problems probably are not like that.)
Have you tried HPL with and without SMT/hyperthreading?
It may be worth testing on a single node at least.
I hope this helps.
On Jan 16, 2010, at 10:02 PM, Rahul Nabar wrote:
> On Thu, Jan 14, 2010 at 7:25 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>> First, to test, run HPL in a single node or a few nodes,
>> using small values of N, say 1000 to 20000.
>> The maximum value of N can be approximated by
>> Nmax = sqrt(0.8*Total_RAM_on_ALL_nodes_in_bytes/8).
>> This uses all the RAM, but doesn't get into memory paging.
>> Then run HPL on the whole cluster with the Nmax above.
>> Nmax pushes the envelope, and is where your
>> best performance (Rmax/Rpeak) is likely to be reached.
>> Try several P/Q combinations for Nmax (see the TUNING file).
> Thanks Gus! That helps a lot. I have Linpack running now on just a
> single server and am trying to tune and hit the Rpeak.
> I'm getting 62 Gflops but I think my peak has to be around 72 (2.26
> GHz 8 cores Nehalem). On a single server test do you manage to hit the
> theoretical peak? What's a good Rmax / Rpeak to shoot for while tuning?
> Once I am confident I'm well tuned on one server I'll try and extend
> it to the whole cluster.