becker at scyld.com
Tue Dec 5 12:19:05 PST 2000
On Wed, 6 Dec 2000, Bruce Janson wrote:
> From: Daniel Ridge <newt at scyld.com>
> > On Wed, 6 Dec 2000, Bruce Janson wrote:
> > > Like you, installing makes me grumpy too, so I try not to do it
> > > more than once.  Ideally all of our compute servers would share
> > > the same (network) file system.  There are ways of doing this
> > > now (typically via NFS) but they tend to be hand-crafted and
> > > unpopular.
> > > In particular, I notice that the recent Scyld distribution
> > > assumes that files (libraries if I remember rightly) will be
> > > installed and available on the local computer.
> > > Why do people want to install locally?  (Scyld people in particular
> > > are encouraged to reply.)
> > While it is true that our (Scyld's) distribution places some files
> > on target nodes, the total volume is pretty tiny (a couple of tens of
> > megabytes for now, less in the future).  These files, essentially
> > all shared libraries, are placed on the nodes just as a cache and
> > are not 'available' from most useful perspectives.  They are 'available'
> > for a remote application to 'dlopen()' or certain other dynamic link
> Yes, but storing any files locally suggests that you don't trust the
> kernel's network file system caching. Is that why? If so, in what
> way does such caching fail?
> Sounds like you don't use a network file system at all,
> which in itself is an interesting decision.
> Care to give some reasons?
A common misperception when people first see the Scyld Beowulf system is
that it is based on an NFS-root scheme.
Using an NFS root has several problems:
  - NFS is very slow.
  - NFS is unreliable.
  - NFS file caching has consistency and semantic problems.
Instead, our model is based on a ramdisk root and cache, along with
using 'bproc' to migrate processes from a master node.
All of the NFS problems are magnified and multiplied when working on a
Beowulf cluster. Unlike a workstation network, where users are idle on
average and working on different jobs, a cluster is all about hot spots.
The NFS server quickly becomes a major serialization point. (The same
observation is true of a NIS/Yellow-Pages server: when starting a cluster
job, every processor will try to scan the password list at the same time.)
While a ramdisk root initially sounds like a waste of memory, the
semantics of a ramdisk fit very well with what we are doing.  The Linux
ramdisk code is unified with the buffer cache, rather than living in a
separate page cache.  The files in the root ramdisk mostly contain hot
pages on a running system: the "/", /etc, /lib and /dev directories,
and the common libraries.  The unified cache means that rather than
costing performance by wasting 40MB of ramdisk memory, we have only a
few MB of dead pages.  In some cases we see a performance improvement
over a local-disk root by effectively wiring down start-up library
pages that would otherwise FIFO-thrash.
> > In addition to shared libraries, we also place a number of entries
> > for '/dev' on the nodes.
> Well, now that you mention /dev, why don't you use devfs to automatically
> populate your nodes' /dev directories?
The devfs addition is controversial, at best. It would make our system
smaller and less complex, and the drawback of retaining non-standard device
ownership and permissions is much reduced on a slave node. But we don't
want to introduce what some perceive as a gratuitous change on top of our
other changes.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993