[Beowulf] hpl size problems

Greg M. Kurtzer gmkurtzer at lbl.gov
Mon Sep 26 10:29:31 PDT 2005


On Sat, Sep 24, 2005 at 12:10:46PM -0400, Mark Hahn wrote:

[..snip..]
> > (we like to think Warewulf had something to do with that ;).
> 
> also interesting, why?

Well, first off, I am biased ;). Putting that aside, I have given this
some thought, and since I don't use other cluster implementation
methods, I can only hypothesize...

By default, Warewulf builds the virtual node file system (which lives in
a hybrid RAM/NFS file system) to be extremely minimal yet fully
functional and tuned for the job at hand. The nodes are lightweight in
both file system and process load (context switching and cache
management can be expensive, especially on non-NUMA SMP systems with
lots of cache). The more daemons and extra processes that are running,
the higher the process load and the more context switching must occur.
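
One rough way to check this hypothesis on an idle node would be to
sample the kernel's cumulative counters in /proc/stat and compare the
rates between the two builds. The sketch below is only an illustration,
not something we ran for these tests; the function names and the
5-second interval are my own assumptions. It relies on the "ctxt"
(context switches) and "processes" (forks) lines that Linux reports in
/proc/stat.

```python
import time


def parse_counters(stat_text):
    """Pull the cumulative 'ctxt' and 'processes' counters out of /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("ctxt", "processes"):
            counters[fields[0]] = int(fields[1])
    return counters


def per_second(before, after, seconds):
    """Convert two counter snapshots into per-second rates."""
    return {key: (after[key] - before[key]) / seconds
            for key in before if key in after}


def sample(interval=5.0, path="/proc/stat"):
    """Read the counters twice, 'interval' seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_counters(f.read())
    return per_second(before, after, interval)
```

Calling sample() on an otherwise idle compute node under each build
would give a context-switch rate and a fork rate to compare. A much
higher idle rate on one build would be consistent with the extra
daemons explanation, though it would not prove that they account for
the speedup.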

It reminds me of chapter 1 of sysadmin 101: Only install what you *need*.

If someone else also has thoughts as to what would have caused the
speedup, I would be very interested.

> > hours) running on Centos-3.5 and saw a pretty amazing speedup of the
> > scientific code (*over* 30% faster runtimes) then with the previous
> > RedHat/Rocks build. Warewulf also makes the cluster rather trivial to
> 
> such a speedup is indeed impressive; what changed?

Actually, we used the same kernel (recompiled from RHEL), and exactly the
same compilers, MPI and IB (literally the same RPMs). The only thing
that changed was the cluster management paradigm. The tests were done
back to back with no hardware changes.

> > We did find that symbol errors in the fabric are very common if anyone
> > "breathes" on the wire plant and cause drastic changes in performance.
> 
> hmm, interesting.  I guess that sensitivity would also apply to other 
> HSI that uses the same phy layers (infinipath, rapidarray).  does anyone
> have a myri-10G cluster they can comment on?  (I'm happy with the robustness
> of my myri-2g and also with my older quadrics.)

Good point.
-- 
Greg Kurtzer
Berkeley Lab, Linux guy


