[Beowulf] hpl size problems

Joe Landman landman at scalableinformatics.com
Tue Sep 27 07:50:29 PDT 2005



Greg M. Kurtzer wrote:
> On Sat, Sep 24, 2005 at 12:10:46PM -0400, Mark Hahn wrote:

[...]

>>> hours) running on Centos-3.5 and saw a pretty amazing speedup of the
>>> scientific code (*over* 30% faster runtimes) than with the previous
>>> RedHat/Rocks build. Warewulf also makes the cluster rather trivial to
>> such a speedup is indeed impressive; what changed?
> 
> Actually, we used the same kernel (recompiled from RHEL), and exactly the
> same compilers, MPI and IB (literally the same RPMs). The only thing
> that changed was the cluster management paradigm. The tests were done
> back to back with no hardware changes.

If these were NUMA machines (Opterons specifically), you need to
watch for processor affinity issues.  You can get STREAM-like
programs hopping from CPU to CPU, which means they go over the
HyperTransport path and the memory controller on the remote CPU as
well as the memory controller on the local CPU.  We have seen 30-ish%
performance differences between the two (on memory latency/bandwidth
bound codes running multiple threads on a NUMA box).
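
For what it's worth, on Linux you can nail a process to a CPU
yourself: taskset(1) does it from the shell, or you can call
sched_setaffinity(2) in code.  A minimal sketch, assuming
Linux/glibc and taking the target CPU number on the command line
(the CPU number here is just an example):

  /* pin.c -- minimal sketch: pin this process to one CPU so a
   * memory-bound code can't wander onto a remote NUMA node.
   * Linux/glibc specific. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int cpu = (argc > 1) ? atoi(argv[1]) : 0;
      cpu_set_t mask;

      CPU_ZERO(&mask);
      CPU_SET(cpu, &mask);

      /* pid 0 == the calling process */
      if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_setaffinity");
          return 1;
      }
      printf("pinned to CPU %d\n", cpu);
      /* ... run the latency/bandwidth bound kernel here ... */
      return 0;
  }

Pin once at startup and the hops to the remote memory controller go
away for the life of the process.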

We also see benchmark cases where the memory system was improperly
set up or configured.  Most of these are due to a lack of readily
available information.  If your goal is to compare realistic
performance of real codes the way people will actually run them, you
don't start out with a misconfigured system, say an Opteron box with
all of its memory hanging off one CPU (we've seen that one quite a
bit).
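
A quick way to catch that last case from userland is to walk the
nodes with libnuma and see where the memory actually lives.  A rough
sketch, assuming libnuma is installed (link with -lnuma); numactl
--hardware will show you the same numbers without compiling anything:

  #include <numa.h>
  #include <stdio.h>

  int main(void)
  {
      int node, max;
      long long total, free_b;

      if (numa_available() < 0) {
          fprintf(stderr, "no NUMA support in this kernel\n");
          return 1;
      }
      max = numa_max_node();
      for (node = 0; node <= max; node++) {
          total = numa_node_size64(node, &free_b);
          /* a node reporting ~0 MB total means all the DIMMs
           * hang off another CPU */
          printf("node %d: %lld MB total, %lld MB free\n",
                 node, total >> 20, free_b >> 20);
      }
      return 0;
  }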

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


