To my mind, having built clusters with xCAT and then used systems that were
put together in a DIY manner, I always find tooling missing from the latter,
usually around node discovery (and BMC config), centralised logging and
IPMI/HMC tooling (remote power control, SoL console logging, IPMI sensor
information, event logs, etc.).

Yes, you can roll your own there, but a consistent toolset that takes the
drudgery out of it, and means you don't need to think "wait, is this an
IPMI v2 node or managed via an HMC?" and then use different methods
depending on the answer, is a big win.
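
To make that concrete, here is a rough sketch of the sort of dispatch such a
toolset does for you.  This is not how xCAT implements it; the Node record,
its field names and power_status_cmd() are all made up for illustration, and
it only builds the command line rather than running it.  It assumes ipmitool
picks the password up from IPMI_PASSWORD (-E) and that the HMC is reachable
over ssh.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical inventory record; the field names are illustrative only."""
    name: str
    mgmt: str          # "ipmi" or "hmc"
    bmc: str = ""      # BMC address, for IPMI-managed nodes
    hmc: str = ""      # HMC hostname, for HMC-managed nodes
    system: str = ""   # managed system name as known to the HMC


def power_status_cmd(node, user="admin"):
    """Return the command that would query power state for this node."""
    if node.mgmt == "ipmi":
        # -E makes ipmitool read the password from IPMI_PASSWORD.
        return ["ipmitool", "-I", "lanplus", "-H", node.bmc,
                "-U", user, "-E", "chassis", "power", "status"]
    if node.mgmt == "hmc":
        # Ask the HMC for the state of the LPAR named after the node.
        return ["ssh", f"{user}@{node.hmc}", "lssyscfg", "-r", "lpar",
                "-m", node.system,
                "--filter", f"lpar_names={node.name}", "-F", "state"]
    raise ValueError(f"unknown management type {node.mgmt!r} for {node.name}")


if __name__ == "__main__":
    nodes = [
        Node(name="node01", mgmt="ipmi", bmc="10.0.0.1"),
        Node(name="lpar01", mgmt="hmc", hmc="hmc1", system="sys1"),
    ]
    for n in nodes:
        print(n.name, " ".join(power_status_cmd(n)))
```

The caller just asks for the power state of a node and never has to care
which kind of hardware is underneath.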

It's the same reason things like EasyBuild and Spack exist; we've spent 
decades building software from scratch and creating little shell scripts to do 
the config/build for each new version, but abstracting that and building a 
framework to make it easy is a good thing at scale.  It also means you can
add things like checksums for tarballs and catch projects that re-release 
their 1.7.0 tarball with new patches without changing the version number (yes, 
TensorFlow, I'm looking at you).
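
For anyone who hasn't seen it, the checksum idea boils down to something
like the sketch below.  It is not EasyBuild's or Spack's actual code, just
plain Python with hashlib; sha256_of() and verify_tarball() are names I've
made up, and the expected digest is assumed to have been recorded when the
tarball was first fetched.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tarball(path, expected_sha256):
    """Raise ValueError if the file does not match the recorded digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"{path}: checksum mismatch, expected {expected_sha256} "
            f"but got {actual}; was the tarball re-released?"
        )
    return True
```

If upstream quietly replaces the tarball for an existing version, the build
stops with a mismatch instead of silently building different source.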

But unpopular opinions are good, and the great thing about the Beowulf
philosophy is that you are free to do things your own way.  It's like
building a Linux system with Linux From Scratch: yes, you could install Ubuntu
or some other distro that makes it easy, but you learn a hell of a lot from
doing it the hard way - and anyone with a strong interest in Linux should try
that at least once in their life.

Aside: if you are using Puppet, be aware that some folks on the Slurm list
have found that when Puppet runs it can move HPC jobs out of the Slurm
control group.

