[Beowulf] Bright Cluster Manager
chris at csamuel.org
Fri May 4 07:30:26 PDT 2018
On Thursday, 3 May 2018 5:52:41 AM AEST Jeff White wrote:
> Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way more than
> they help IMO.
To my mind, having built clusters with xCAT and then used systems that were
put together in a DIY manner, I always find tooling that I'm missing with the
latter, usually around node discovery (and BMC config), centralised logging and
IPMI/HMC tooling (remote power control, SoL console logging, IPMI sensor
information, event logs, etc).
Yes, you can roll your own there, but a consistent toolset that takes the
drudgery out of doing so, and means you don't need to think "wait, is this an
IPMI v2 node or managed via an HMC?" and then use different methods depending
on the answer, is a big win.
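As a rough illustration of what that abstraction buys you, here is a minimal
Python sketch that picks the power-status command for a node based on how it is
managed. The Node class, the "admin" and "hscroot" user names and the field
names are assumptions made up for the example, and the HMC command line should
be checked against your HMC's documentation. It is not how xCAT implements this.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    mgmt: str                 # "ipmi" or "hmc"
    mgmt_host: str            # BMC address for IPMI, HMC hostname for HMC
    managed_system: str = ""  # only meaningful for HMC-managed nodes


def power_status_command(node):
    """Return the argv list that would query the node's power state."""
    if node.mgmt == "ipmi":
        # -E makes ipmitool read the password from the IPMI_PASSWORD
        # environment variable instead of the command line.
        return ["ipmitool", "-I", "lanplus", "-H", node.mgmt_host,
                "-U", "admin", "-E", "chassis", "power", "status"]
    if node.mgmt == "hmc":
        # Assumption: the HMC is reachable over ssh as user hscroot.
        return ["ssh", f"hscroot@{node.mgmt_host}",
                "lssyscfg", "-r", "lpar", "-m", node.managed_system,
                "--filter", f"lpar_names={node.name}", "-F", "state"]
    raise ValueError(f"unknown management type for {node.name}: {node.mgmt}")
```

The caller only ever asks for the power status of a node and never has to
remember which kind of node it is. The returned list can be handed to
subprocess.run() if you actually want to execute it.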
It's the same reason things like EasyBuild and Spack exist; we've spent
decades building software from scratch and creating little shell scripts to do
the config/build for each new version, but abstracting that and building a
framework to make it easy is a good thing at scale. It also means you can
add things like checksums for tarballs and catch projects that re-release
their 1.7.0 tarball with new patches without changing the version number (yes,
TensorFlow, I'm looking at you).
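To make the checksum point concrete, here is a minimal Python sketch of the
kind of check such a framework performs before building. It is not the
EasyBuild or Spack implementation; the function names are made up for the
example, and the expected checksum is whatever you recorded when you first
vetted the tarball.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_sha256):
    """True if the tarball still matches the checksum recorded earlier."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

If upstream silently re-rolls a tarball under the same version number, the
digest changes and verify_tarball() returns False, so the build can stop
rather than quietly producing something different from last time.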
But unpopular opinions are good, and the great thing about the Beowulf
philosophy is that there is the ability to do things your own way. It's like
building a Linux system with Linux From Scratch: yes, you could install Ubuntu
or some other distro that makes it easy, but you learn a hell of a lot from
doing it the hard way - and anyone with a strong interest in Linux should try
that at least once in their life.
Aside: Be aware if you are using Puppet that some folks on the Slurm list have
found that when it runs it can move HPC jobs out of the Slurm control group.
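If you want to check whether a job's processes are still where Slurm put them,
a small Python sketch along these lines may help. It parses the contents of
/proc/<pid>/cgroup and looks for a "slurm" path component. That the Slurm
cgroup path contains such a component (e.g. /slurm/uid_N/job_N/step_N) is an
assumption about a typical cgroup v1 setup, so adjust the marker for your site.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into (controllers, path) pairs.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hier_id, controllers, path = line.split(":", 2)
        entries.append((controllers, path))
    return entries


def in_slurm_cgroup(text, marker="slurm"):
    """True if any cgroup path has a component equal to marker.

    Matching whole components avoids a false hit on slurmd.service.
    """
    return any(marker in path.split("/")
               for _controllers, path in parse_proc_cgroup(text))


def pid_in_slurm_cgroup(pid):
    """Check a live process by reading its /proc entry (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return in_slurm_cgroup(f.read())
```

Running pid_in_slurm_cgroup() over the PIDs of a job before and after a Puppet
run would show whether they have been moved.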
All the best,
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC