[Beowulf] distributions

Wed Feb 8 11:13:27 PST 2006

On Mon, 6 Feb 2006, Geoff Jacobs wrote:
>
> Last time I used Scyld, it involved kickstart floppies, RARP bootup, and
> a patch to apply against MPICH for recompilation with the PGI compilers.
> They don't make 'em like they used to, thank God.

That was quite a few years ago.  We have kept the core architecture the 
same, and most of the APIs and configuration files are backward 
compatible, but most aspects are considerably more sophisticated.

The floppy boot wasn't Kickstart.  Instead it was the Scyld-developed 
Beoboot system.  Today we recommend PXE boot instead, not because it's 
technically better but because it's ubiquitous.

We developed the "Stage 1" part of Beoboot because network booting on PCs 
was rare and mostly unusable at the time.  Beoboot leveraged the large set 
of Linux network drivers to use Linux itself to do network booting.

We designed the Beoboot system for more than just floppy disks.  We were 
trying to solve the problem of every cluster booting in a different way.  
Beoboot was a small, general-purpose boot system that would work from a 
CD-ROM, a hard disk partition, disk-on-chip, etc., and would converge the 
different methods to a common network boot.

Beoboot has the essential attributes of a cluster boot system:
  - fast, reliable boot
  - minimal hardware configuration to contact a master
  - works with any hardware supported by the run-time system
  - all run-time elements, including a new kernel, are retrieved over the 
    network.  Removing all traces of the boot programs and minimizing 
    policy removes the need to update (and break) a working boot system.

Today Beoboot Stage 1 is largely obsoleted by PXE, which is present on even 
the least expensive systems.  While PXE has its problems (that's a whole 
chapter!), a workable mediocre standard is still much better than many 
incompatible implementations.

> I've always thought of Scyld as more like one single appliance than a
> cluster. If you're updating the master node, you're sort of updating
> everything. Kernel being used on the slaves turns out to be exploitable,
> update the master, reboot the nodes. Problem solved.

The goal of clustering is to create a unified system out of independent 
pieces -- creating the illusion of a single system.  The more you have to 
administer machines as stand-alone boxes, the less of a cluster system you 
have.

---- 
Donald Becker				becker at scyld.com
Scyld Software	 			Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220		www.scyld.com
Annapolis MD 21403			410-990-9993