[Beowulf] performance tweaks and optimum memory configs for a Nehalem

Sun Aug 9 20:42:25 PDT 2009

On Sun, Aug 9, 2009 at 9:34 PM, Gus Correa<gus at ldeo.columbia.edu> wrote:

> See answers inline.

Thanks!

> So it is to me.
> The good news is that according to all reports I read,
> hyperthreading in Nehalem works well

What I am more concerned about is its implications for benchmarking and
schedulers.

(a) I am seeing strange scaling behaviour with Nehalem cores. E.g., a
specific DFT (Density Functional Theory) code we use maxes out its
performance at 2 or 4 CPUs instead of 8, i.e. runs on 8 cores are
actually slower than runs on 2 or 4 cores (depending on setup).

This just doesn't make sense to me. We must be doing something wrong.
And no, it isn't just bad parallelization of this code, since we have
run it on AMDs, and there performance does increase with core count on
a single server.
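
To make the comparison concrete, here is a minimal sketch of how the
scaling could be tabulated from wall-clock timings. The function name
scaling_table and the input format (a dict of core count to seconds)
are assumptions made for illustration; no measured numbers are implied.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows from wall-clock timings.

    timings: dict mapping core count -> wall-clock seconds for the same job.
    Speedup is measured relative to the run with the fewest cores.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Efficiency of 1.0 means perfect linear scaling from the baseline.
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows
```

A run whose efficiency drops sharply between 4 and 8 cores on one
machine but not on another would point at the platform setup rather
than at the code's parallelization.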

(b) We usually set up Torque / PBS / Maui to also allow partial server
requests, i.e. somebody could ask for just 4 cores on a server. The
other four cores could go to another job or stay empty. The question
is: with hyperthreading, isn't this compartmentalization lost? So
userA, who got 4 cores, could end up leeching on the other 4 cores too?
Or am I wrong?
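
Whether two jobs can end up sharing a physical core depends on which
logical CPUs are hyperthread siblings. A minimal sketch for checking
that, assuming the usual x86 Linux /proc/cpuinfo layout with
"processor", "physical id" and "core id" fields (the function name
sibling_groups is made up for illustration):

```python
from collections import defaultdict


def sibling_groups(cpuinfo_text):
    """Group logical CPU numbers by (physical id, core id).

    cpuinfo_text: contents of /proc/cpuinfo as a string. Logical CPUs that
    land in the same group share one physical core (hyperthread siblings).
    """
    groups = defaultdict(list)
    # /proc/cpuinfo separates logical CPUs with blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        cpu = int(fields["processor"])
        # Assumption: if the topology fields are missing, treat each
        # logical CPU as its own core.
        socket = int(fields.get("physical id", 0))
        core = int(fields.get("core id", cpu))
        groups[(socket, core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in groups.items()}
```

On a real node this would be called as
sibling_groups(open("/proc/cpuinfo").read()). Any group with two
entries is a pair of logical CPUs backed by the same core.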

>
> Which MPI do you use?

OpenMPI

> IIRR, you have Gigabit Ethernet, right? (not Infiniband)

Yes. That's right. No InfiniBand.

> If you use OpenMPI, you can set the processor affinity,
> i.e. bind each MPI process to one "processor" (which was once
> a CPU, then became a core, and now is probably a virtual
> processor associated to the hyperthreaded Nehalem core).
> In my experience (and other people's also) this improves
> performance.

Yup, good point. I have done this with Barcelonas (AMD) and had a 5%
boost. Let me try it with the Nehalems too.

>
> It is possible that this is the result of not setting
> processor affinity.
> The Linux scheduler may not switch processes
> across cores/processors efficiently.

So let me double-check my understanding. On this Nehalem, if I set the
processor affinity, is that akin to disabling hyperthreading too? Or
are these two independent concepts?
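
As far as the mechanism goes, affinity only restricts which logical
CPUs a process may run on; it does not by itself switch hyperthreading
off. A minimal sketch of pinning to one logical CPU per physical core,
assuming Linux and a topology mapping of the hypothetical shape
{(socket, core): [logical CPU ids]} (both function names are made up
for illustration):

```python
import os


def one_cpu_per_core(groups):
    """Pick the lowest-numbered logical CPU from each physical core.

    groups: dict mapping (socket, core) -> list of logical CPU ids.
    """
    return sorted(min(cpus) for cpus in groups.values())


def pin_current_process(cpu):
    """Bind the calling process to a single logical CPU (Linux only)."""
    # pid 0 means "the calling process".
    os.sched_setaffinity(0, {cpu})
```

With 8 processes bound this way on a dual-socket quad-core node, each
process gets a physical core to itself, while the sibling logical CPUs
still exist and remain available to whatever else the scheduler places
there.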

> (Not sure you actually have 24GB or 16GB, though.
> You didn't say how much memory you bought.)

I am running two tests. machineA has 24 GB; machineB has 16 GB. But
other things change too: machineA has the X5550 whereas machineB has
the E5520.

I'll post the results once I have them for the Nehalems! Thanks again,
Gus. All very helpful.
-- 
Rahul