[Beowulf] more automatic building

Bill Broadley bill at cse.ucdavis.edu
Thu Sep 29 22:04:00 PDT 2016


On 09/28/2016 07:34 AM, Mikhail Kuzminsky wrote:
> I worked always w/very small HPC clusters and built them manually
> (each server).

Manual installs aren't too bad up to 4 nodes or so.

> But what is reasonable to do for clusters  containing some tens or
> hundred of nodes ?

We use cobbler for DHCP, bootp, DNS, and PXE boot.  It's nice to have a
single database for MAC addresses, IP addresses, hostnames, etc.  We have a
profile per OS.  Then we use cobbler to netboot CentOS or Debian family
OSes and to handle the initial part of the installation.
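To illustrate why one database helps, here is a minimal Python sketch. It is not cobbler's code or API. One hypothetical inventory (the `Node` fields, hostnames, MACs, IPs and profile names are all made up) renders both ISC dhcpd `host` stanzas and `/etc/hosts` lines, and a uniqueness check catches a duplicated MAC or IP before it reaches DHCP. Cobbler, when configured to manage DHCP, renders its own templates from its own database. This only mimics the idea.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str
    mac: str
    ip: str
    profile: str  # hypothetical per-OS profile name


def check_unique(nodes: list[Node]) -> None:
    """Reject an inventory where two nodes share a hostname, MAC, or IP."""
    for field in ("hostname", "mac", "ip"):
        values = [getattr(n, field).lower() for n in nodes]
        if len(values) != len(set(values)):
            raise ValueError(f"duplicate {field} in inventory")


def dhcpd_host_entry(node: Node) -> str:
    """Render an ISC dhcpd 'host' stanza that pins the node's MAC to its IP."""
    return (
        f"host {node.hostname} {{\n"
        f"  hardware ethernet {node.mac};\n"
        f"  fixed-address {node.ip};\n"
        "}"
    )


def hosts_line(node: Node) -> str:
    """Render an /etc/hosts line from the same record."""
    return f"{node.ip}\t{node.hostname}"


# Hypothetical inventory; in the setup described above this data lives in cobbler.
INVENTORY = [
    Node("node01", "52:54:00:aa:bb:01", "10.1.0.11", "centos7-x86_64"),
    Node("node02", "52:54:00:aa:bb:02", "10.1.0.12", "debian8-amd64"),
]

if __name__ == "__main__":
    check_unique(INVENTORY)
    for n in INVENTORY:
        print(dhcpd_host_entry(n))
    for n in INVENTORY:
        print(hosts_line(n))
```

Because DHCP and name resolution are generated from the same record, they cannot drift apart. That is the main benefit of keeping MAC, IP and hostname in one place.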

The installation also installs puppet, which handles:
* Which users can log in to which hardware
* Distribution of ssh keys
* Installation of packages, services, monitoring, etc.
* Tweaking init.d/systemd scripts, pam, ulimit, etc.
* Managing autofs and friends.
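The first bullet can be pictured with a rough sketch. It assumes the login policy is enforced with pam_access (`/etc/security/access.conf`, enabled by an `account required pam_access.so` line in the PAM stack). That choice is an assumption, not something stated above. The node classes and user names are hypothetical. In practice puppet would render the file from its own template, and the Python only shows the data model.

```python
from __future__ import annotations

# Hypothetical policy: node class -> users allowed to log in there.
ACCESS = {
    "compute": ["alice", "bob"],
    "gpu": ["carol"],
}


def access_conf(node_class: str, access: dict[str, list[str]]) -> str:
    """Render /etc/security/access.conf rules for one class of node.

    pam_access uses the first rule that matches, so the final catch-all
    deny only applies to users not allowed by an earlier line.
    """
    lines = ["+ : root : ALL"]  # keep administrators from being locked out
    users = access.get(node_class, [])
    if users:
        lines.append(f"+ : {' '.join(users)} : ALL")
    lines.append("- : ALL : ALL")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(access_conf("gpu", ACCESS), end="")
```

A node class missing from the mapping gets only the root rule and the catch-all deny, so the policy fails closed.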

> But it looks that ROCKS don't support modern interconnects, and there
> may be problems w/OSCAR versions for support of systemd-based
> distributives like CentOS 7. For next year - is it reasonable to wait
> new OSCAR version or something else ?

The hard part of supporting HPC is the users and their applications; I
consider installing and configuring the hardware fairly minor.
Personally I'd just pick the OS that best suits your users' and
applications' needs.  PXE boot + cobbler with whatever Linux OS is really
not a big deal.

With the above it's easy to write a small script to shut down a node,
turn on netboot, power on the node (assuming IPMI works), install on
boot, reboot into the OS, NFS mount, start the slurmd daemon, and be back
in production.
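The reinstall script described above could look roughly like this Python sketch. Several parts are assumptions rather than the author's actual script:

* ipmitool over `lanplus`, with the password taken from the `IPMI_PASSWORD` environment variable via `-E`.
* A hypothetical BMC hostname.
* Cobbler's `--netboot-enabled` flag. Check it against your installed version.
* Slurm `scontrol` drain/resume.
* The node is already idle.
* Cobbler turns netboot back off after the install (e.g. `pxe_just_once`).
* Puppet mounts NFS and starts slurmd on first boot.

It defaults to a dry run that only prints the commands.

```python
from __future__ import annotations

import socket
import subprocess
import time


def reprovision_commands(node: str, bmc: str, ipmi_user: str) -> list[list[str]]:
    """Build the command sequence to reinstall one node (nothing is executed here)."""
    # -E makes ipmitool read the password from the IPMI_PASSWORD environment
    # variable, so it never appears on the command line or in `ps` output.
    ipmi = ["ipmitool", "-I", "lanplus", "-H", bmc, "-U", ipmi_user, "-E"]
    return [
        # Stop Slurm from scheduling new jobs on the node (assumes it is already idle).
        ["scontrol", "update", f"nodename={node}", "state=drain", "reason=reinstall"],
        # Re-enable netboot in cobbler so the next PXE boot runs the installer.
        ["cobbler", "system", "edit", f"--name={node}", "--netboot-enabled=true"],
        # Power off, request a one-time PXE boot, power back on.
        ipmi + ["chassis", "power", "off"],
        ipmi + ["chassis", "bootdev", "pxe"],
        ipmi + ["chassis", "power", "on"],
    ]


def wait_for_ssh(host: str, timeout: float = 3600.0, interval: float = 30.0) -> bool:
    """Poll TCP port 22 until it accepts a connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, 22), timeout=5):
                return True
        except OSError:
            time.sleep(interval)
    return False


def reprovision(node: str, bmc: str, ipmi_user: str,
                dry_run: bool = True, grace: float = 300.0) -> None:
    """Reinstall `node` and return it to Slurm; with dry_run, only print the plan."""
    for cmd in reprovision_commands(node, bmc, ipmi_user):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    if dry_run:
        return
    # Give the installer time to run; assumes it does not itself expose sshd.
    time.sleep(grace)
    if not wait_for_ssh(node):
        raise TimeoutError(f"{node} did not come back after reinstall")
    # Assumption: once sshd answers, puppet has mounted NFS and started slurmd.
    # A real script would check slurmd directly before resuming.
    subprocess.run(["scontrol", "update", f"nodename={node}", "state=resume"], check=True)


if __name__ == "__main__":
    # Hypothetical node and BMC names; dry run prints the commands only.
    reprovision("node01", "node01-ipmi", "admin", dry_run=True)
```

Keeping the command list separate from its execution makes the plan easy to review (and test) before it ever touches a BMC.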


