becker at scyld.com
Tue Dec 5 12:19:05 PST 2000
On Wed, 6 Dec 2000, Bruce Janson wrote:
> From: Daniel Ridge <newt at scyld.com>
> > On Wed, 6 Dec 2000, Bruce Janson wrote:
> > > Like you, installing makes me grumpy too, so I try not to do it
> > > more than once.  Ideally all of our compute servers would share
> > > the same (network) file system.  There are ways of doing this
> > > now (typically via NFS) but they tend to be hand-crafted and
> > > unpopular.
> > > In particular, I notice that the recent Scyld distribution
> > > assumes that files (libraries if I remember rightly) will be
> > > installed and available on the local computer.
> > > Why do people want to install locally?  (Scyld people in particular
> > > are encouraged to reply.)
> > While it is true that our (Scyld's) distribution places some files
> > on target nodes, the total volume is pretty tiny (a couple of tens of
> > megabytes for now, less in the future).  These files, essentially
> > all shared libraries, are placed on the nodes just as a cache and
> > are not 'available' from most useful perspectives.  They are 'available'
> > for a remote application to 'dlopen()' or certain other dynamic link
> Yes, but storing any files locally suggests that you don't trust the
> kernel's network file system caching. Is that why? If so, in what
> way does such caching fail?
> Sounds like you don't use a network file system at all,
> which in itself is an interesting decision.
> Care to give some reasons?
A common misperception when people first see the Scyld Beowulf system is
that it is based on an NFS-root scheme.
Using an NFS root has several problems:
  - NFS is very slow.
  - NFS is unreliable.
  - NFS file caching has consistency and semantic problems.
Instead, our model is based on a ramdisk root and cache, along with
using 'bproc' to migrate processes from a master node.
All of the NFS problems are magnified and multiplied when working on a
Beowulf cluster. Unlike a workstation network, where users are idle on
average and working on different jobs, a cluster is all about hot spots.
The NFS server quickly becomes a major serialization point. (The same
observation is true of a NIS/Yellow-Pages server: when starting a cluster
job, every processor will try to scan the password list at the same time.)
While a ramdisk root initially sounds like a waste of memory, the
semantics of a ramdisk fit very well with what we are doing.  The Linux
ramdisk code is unified with the buffer cache, rather than living in a
separate page cache.  The files in the root ramdisk mostly contain hot
pages on a running system: the "/", /etc, /lib and /dev directories,
and the common libraries.  The unified cache means that rather than
costing performance by wasting 40MB of ramdisk memory, we have only a
few MB of dead pages.  In some cases we see a performance improvement
over a local-disk root by effectively wiring down start-up library
pages that would otherwise FIFO-thrash.
> > In addition to shared libraries, we also place a number of entries
> > for '/dev' on the nodes.
> Well, now that you mention /dev, why don't you use devfs to automatically
> populate your nodes' /dev directories?
The devfs addition is controversial, at best. It would make our system
smaller and less complex, and the drawback of retaining non-standard device
ownership and permissions is much reduced on a slave node. But we don't
want to introduce what some perceive as a gratuitous change on top of our
other changes.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993