becker at scyld.com
Mon Feb 6 16:07:50 PST 2006
On Fri, 3 Feb 2006, Geoff Jacobs wrote:
> Robert G. Brown wrote:
> > Note well that for most folks, once you get a cluster node installed and
> > working, there is little incentive to EVER upgrade the OS except for
> > administrative convenience. We still have (I'm ashamed to say) nodes
> > running RH 7.3. Why not? They are protected behind a firewall, they
> > are stable (obviously:-), they work.
> Hmm... crispy on the outside, tender on the inside.
> You have to have an OS installed on your nodes for which security
> updates are available
I'll disagree with that point.
The Scyld Beowulf system introduced a unique design that has "slave
nodes" with nothing installed and no network file system required.
This avoids most security problems, including the need for updates,
because the vulnerable services and programs just aren't there.
Only master nodes have an OS distribution. The other nodes are
server slaves or compute slaves. At boot time they are given a kernel by a
master, queried by the master for their installed hardware, and then given
device drivers to load. When it's time to start a server or compute
process, they are told which versions of the libraries and executables
the process needs, and they cache those or reuse copies from a hidden area.
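As a rough illustration of the boot sequence just described (not Scyld's actual code), here is a minimal Python sketch. Every name in it is hypothetical: `SlaveNode`, `boot_slave`, the `DRIVER_TABLE` contents and the kernel label are assumptions made up for the example. It only shows the order of the exchange: kernel handed over, hardware queried, matching drivers sent back.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical table kept on the master: hardware ID -> driver name.
# The IDs and driver names are placeholders, not real devices.
DRIVER_TABLE = {
    "nic-a": "drv_nic_a",
    "disk-b": "drv_disk_b",
}


@dataclass
class SlaveNode:
    """A node with nothing installed; all state arrives from the master."""
    hardware: List[str]
    kernel: Optional[str] = None
    drivers: List[str] = field(default_factory=list)

    def receive_kernel(self, kernel: str) -> None:
        self.kernel = kernel

    def report_hardware(self) -> List[str]:
        return list(self.hardware)

    def load_drivers(self, drivers: List[str]) -> None:
        self.drivers.extend(drivers)


def boot_slave(node: SlaveNode, kernel: str) -> List[str]:
    """Master-side view of one boot: kernel, hardware query, drivers."""
    node.receive_kernel(kernel)              # 1. master gives the node a kernel
    found = node.report_hardware()           # 2. master queries installed hardware
    drivers = [DRIVER_TABLE[h] for h in found if h in DRIVER_TABLE]
    node.load_drivers(drivers)               # 3. master sends only matching drivers
    return drivers
```

Hardware the master has no driver for is simply skipped in this sketch; a real system would need a policy for that case.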
The hidden area is usually a ramdisk file system. The objects there are
only libraries and executables, tracked by version information, so
that they may be reused and shared.
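To make the version-tracked reuse concrete, here is a small Python sketch under stated assumptions. `ObjectCache`, `prepare` and the `fetch` callback are hypothetical names, and a dict stands in for the ramdisk. The only idea taken from the description above is that objects are keyed by name and version, so an object already present is reused and only a missing version is transferred.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (object name, version string)


class ObjectCache:
    """Stand-in for the node's hidden area (a dict instead of a ramdisk)."""

    def __init__(self) -> None:
        self._objects: Dict[Key, bytes] = {}

    def prepare(
        self, manifest: List[Key], fetch: Callable[[str, str], bytes]
    ) -> Tuple[List[Key], List[Key]]:
        """Make sure every (name, version) in the manifest is cached.

        Returns (reused, fetched) so the caller can see what was transferred.
        """
        reused: List[Key] = []
        fetched: List[Key] = []
        for key in manifest:
            if key in self._objects:
                reused.append(key)                  # exact version already cached
            else:
                self._objects[key] = fetch(*key)    # pull it from the master
                fetched.append(key)
        return reused, fetched
```

Starting a second process that needs the same library version but a newer executable would then fetch only the executable. Note that this sketch never evicts old versions; a real cache on a ramdisk would have to.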
One way to think of the result is a dynamically generated minimal
distribution that has only the specific kernel, device drivers,
application libraries and executables needed to run an application or
service. But it's even simpler than that, since the node doesn't even
have initialization and configuration files.
> down. You should know as well as I do that your users are scientists and
> academics, they are not security professionals. They'll pick bad
> passwords, log in from Winblows terminals which have been infected with
> virae, keyloggers, etc. In short, you lower the security of the system
> to that of your least secure user.
> At least if your systems are updated, the chance of an attack escalating
> privilege and making the situation serious is small. You can give Mr.
> Sloppy The Lecture, restore his files from backup, and be on your merry way.
Or you can have compute nodes with only the application running. No
daemons are running to break into, and there are no local
system configuration files to hide persistent changes, e.g. cracks that
start up after a reboot.
You still have to keep masters updated, but they are few (just enough for
redundancy) and in an internet server environment they don't need to be
exposed to the outside world ("firewalled" by the single-purpose slaves
with zero installations).
A final note: The magic of a dynamic "slave node" architecture isn't a
characteristic of the mechanism used to start or control processes. It
does interact with the process subsystem -- it needs to know what to cache
(including the specific version) to start up processes accurately. But the
other details, such as the library call interface, process control,
security mechanism, naming and cluster membership are almost unrelated.
Nor does "ramdisk root" give you the magic. A ramdisk root is part of how
we implement the architecture, especially the part about not requiring local
storage or network file systems to work. (Philosophy: You mount file
systems as needed for application data, not for the underlying system.)
But to be fast, efficient and effective the ramdisk can't just be a
stripped-down full distribution. You need a small 'init' system and
a dynamic, version-based caching mechanism. Otherwise you end up with
lots of wasted memory and version skew, and still have a crippled compute node.
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993