Questions and Sanity Check

Donald Becker becker at scyld.com
Fri Mar 2 06:57:13 PST 2001


On Thu, 1 Mar 2001, Dan Yocum wrote:
> Daniel Ridge wrote:
> > On Thu, 1 Mar 2001, Dan Yocum wrote:
> > > Daniel Ridge wrote:
> > > > For people who are spending a lot of time booting their Scyld slave nodes
> > > > -- I would suggest trimming the library list.
> > > >
> > > > This is the list of shared libraries which the nodes cache for improved
> > > > runtime migration performance. These libraries are transferred over to
> > > > the nodes at node boot time.
> > >
> > > Hm.  Wouldn't it be better (i.e., more efficient) to cache these libs on
> > > a small, dedicated partition...
..
> > Also, I think Amdahl's law kicks in and tells us that the potential
> > speedup is small in most cases (with respect to my trimming comment
> > above) and that there might be other areas that are worth more attention
> > in lowering boot times. On my VMware slave nodes, it costs me .5 seconds
>
> Hold it.  How big are the shared libs?  If they're tiny, then yeah,
> ferget it.  No big deal transferring them over...

The cached libraries on the slave nodes are 10-40MB uncompressed.
That's on the order of one second of Fast Ethernet time to transfer the
compressed version, so the transfer is not a significant part of the
boot time.
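
For a back-of-envelope check of that claim, here is the arithmetic as a
small Python sketch.  The 3:1 compression ratio is my assumption, not a
measured figure:

    # Rough transfer-time estimate for the cached library set.
    # The 3:1 compression ratio below is assumed, not measured.
    LINE_RATE_BYTES_PER_SEC = 100e6 / 8      # Fast Ethernet, 100 Mb/s

    for uncompressed_mb in (10, 40):
        compressed_bytes = uncompressed_mb * 1e6 / 3.0  # assumed 3:1
        seconds = compressed_bytes / LINE_RATE_BYTES_PER_SEC
        print("%2d MB set -> ~%.1f s on the wire"
              % (uncompressed_mb, seconds))

That works out to roughly a quarter second for the small case and just
over a second for the large one, consistent with the estimate above.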

A project that's on the "to do" list but not yet scheduled(*) is to
dynamically adjust the shared library list.

The Scyld Beowulf system could be booted with just a few cached elements
on the slaves, with frequently referenced libraries gradually added to
the cache list.
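
One way that adjustment might work is sketched below.  This is purely
hypothetical, not Scyld code; the counter, the promotion threshold, and
all names are invented for illustration:

    # Hypothetical sketch of dynamic cache-list adjustment.
    from collections import Counter

    ref_counts = Counter()   # library path -> cache misses seen so far
    cached = set()           # libraries currently replicated to slaves
    PROMOTE_AFTER = 3        # assumed promotion threshold

    def record_reference(lib_path):
        """Call when a migrating process needs lib_path on a slave."""
        if lib_path in cached:
            return                   # already cached; nothing to do
        ref_counts[lib_path] += 1
        if ref_counts[lib_path] >= PROMOTE_AFTER:
            cached.add(lib_path)     # a real system would push the
                                     # file out to the slave nodes here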

The existing caching technique isn't limited to libraries.  A subtle
aspect of the current ld.so design is that there is very little
difference between a library and an executable.  Full programs, say
a frequently-run 10MB simulation engine, could be cached on the slave
nodes without changing the code.
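
You can see the similarity right in the ELF header: shared libraries
are ET_DYN objects, and a position-independent executable carries the
same type.  A small Python check illustrates it (the libc path is just
an example; adjust for your system):

    # Read the ELF e_type field to classify a binary.
    import struct

    def elf_type(path):
        with open(path, "rb") as f:
            header = f.read(18)
        if header[:4] != b"\x7fELF":
            return "not an ELF file"
        # e_type is a 16-bit field at offset 16; "<H" assumes a
        # little-endian target such as x86.
        (e_type,) = struct.unpack_from("<H", header, 16)
        return {2: "ET_EXEC (fixed-address executable)",
                3: "ET_DYN (shared object)"}.get(e_type,
                                                 "type %d" % e_type)

    print(elf_type("/lib/libc.so.6"))    # example path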

It's a larger step to extend that concept to a persistent disk-based
cache.  We want to avoid that for philosophical reasons: unless done
carefully, it reintroduces the risk of version skew, and it is a
slippery slope back to the old full-node-install model.
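
Doing it "carefully" would mean, at minimum, validating cached files
against the master before trusting them.  A hypothetical sketch -- the
manifest format is invented for illustration:

    # Compare each cached file's hash against the master's record.
    import hashlib

    def file_sha1(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def stale_entries(manifest):
        """manifest: dict of cached path -> expected hash from master."""
        return [p for p, digest in manifest.items()
                if file_sha1(p) != digest]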

(*) Yes, that's a hint to anyone looking for a project.

> > to transfer my libraries but still takes me the better part of a minute to
> > get the damn BIOS out of the way.
> 
> Well, yeah, there is that.  Have you tried running Beowulf2 on machines
> with Linux BIOS?  Now that'd be cool to see - a Beowulf cluster come up
> in 3 seconds.  :)

Ron Minnich uses Scyld Beowulf with his LinuxBIOS work.  He was demoing
the resulting "instant boot" clusters at SC2000 and the Extreme Linux
Developers Forum last week.  Some tuning is needed to reach a 3-second
boot time -- some device drivers have needless delays, and IDE disks
can take a long time to respond after a reset.

Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993




