becker at scyld.com
Wed Feb 8 11:13:27 PST 2006
On Mon, 6 Feb 2006, Geoff Jacobs wrote:
> Last time I used Scyld, it involved kickstart floppies, RARP bootup, and
> a patch to apply against MPICH for recompilation with the PGI compilers.
> They don't make 'em like they used to, thank God.
That was quite a few years ago. We have kept the core architecture the
same, and most of the APIs and configuration files are backwards
compatible, but most aspects are considerably more sophisticated.
The floppy boot wasn't Kickstart. Instead it was the Scyld-developed
Beoboot system. Today we recommend PXE boot instead, not because it's
technically better but because it's ubiquitous.
We developed the "Stage 1" part of Beoboot because network booting on PCs
was rare and mostly unusable at the time. Beoboot leveraged the large set
of Linux network drivers to use Linux itself to do network booting.
We designed the Beoboot system for more than just floppy disks. We were
trying to solve the problem of every cluster booting in a different way.
Beoboot was a small general purpose boot system that would work on CD-ROM,
a hard disk partition, disk-on-chip, etc. and converge the different
methods to a common network boot.
Beoboot has the essential attributes of a cluster boot system:
- fast, reliable boot
- minimal hardware configuration to contact a master
- works with any hardware supported by the run-time system
- all run-time elements, including a new kernel, are retrieved over the
network. Removing all traces of the boot programs and minimizing
policy removes the need to update (and break) a working boot system
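
As a rough illustration of the list above, here is a toy Python model of
boot media converging on one network boot. It is only a sketch: every name
in it (BOOT_MEDIA, RuntimeImage, stage1_boot, the master dictionary) is
hypothetical and none of it comes from the actual Beoboot code.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this models the idea, not Beoboot itself.
BOOT_MEDIA = ("floppy", "cdrom", "disk-partition", "disk-on-chip", "pxe")


@dataclass(frozen=True)
class RuntimeImage:
    kernel: str   # kernel the node ends up running
    source: str   # where the run-time elements came from


def stage1_boot(medium: str, master: dict) -> RuntimeImage:
    """Minimal local step, then take every run-time element from the master."""
    if medium not in BOOT_MEDIA:
        raise ValueError(f"unknown boot medium: {medium}")
    # The medium only gets the node onto the network. It carries no policy,
    # so what the node runs is decided entirely by the master.
    return RuntimeImage(kernel=master["kernel"], source="master")


master = {"kernel": "vmlinuz-A"}

# Every boot medium converges on the same run-time image.
images = {m: stage1_boot(m, master) for m in BOOT_MEDIA}
converged = len(set(images.values())) == 1

# Changing the kernel on the master changes what a node gets at its next
# boot, with no change to the boot medium.
master["kernel"] = "vmlinuz-B"
updated = stage1_boot("floppy", master)
```

Because the boot medium holds nothing but the means to reach the master,
the model never needs a per-medium update path.
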
Today Beoboot Stage 1 is largely obsoleted by PXE, which is on even the
least expensive system. While PXE has its problems (that's a whole
chapter!), a workable mediocre standard is still much better than many
> I've always thought of Scyld as more like one single appliance than a
> cluster. If you're updating the master node, you're sort of updating
> everything. Kernel being used on the slaves turns out to be exploitable,
> update the master, reboot the nodes. Problem solved.
The goal of clustering is to create a unified system out of independent
pieces -- creating the illusion of a single system. The more you have to
administer machines as stand-alone boxes, the less of a cluster system you
Donald Becker                   becker at scyld.com
Scyld Software                  Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220   www.scyld.com
Annapolis MD 21403              410-990-9993