Questions and Sanity Check

Jag agrajag at linuxpower.org
Fri Mar 2 10:38:43 PST 2001


On Fri, 02 Mar 2001, Donald Becker wrote:

> A project that's on the "to do" list but not yet scheduled(*) is to
> dynamically adjust the shared library list.
> 
> The Scyld Beowulf system could be booted with just a few cached elements
> on the slaves, with frequently referenced libraries slowly added to the
> cached list.
> 
> The existing caching technique isn't limited to libraries.  A subtle
> aspect of the current ld.so design is that there is very little
> difference between a library and an executable.  Full programs, say
> a frequently-run 10MB simulation engine, could be cached on the slave
> nodes without changing the code.
> 
> It's a larger step extending that concept to a persistent disk-based
> cache.  We want to avoid that for philosophical reasons: unless done
> carefully, it reintroduces the risk of version skew, and there is a
> slippery slope returning to the old full-node-install model.

I'm not sure how the caching of actual programs would work, but the
dynamic caching of libraries sounds like a really good idea, especially
for people running diskless nodes.  That way they only need a very
small ramdisk for the library cache, and if the ramdisk runs out of
space, the caching system should hopefully be able to remove the
less-used libraries in favor of the new ones.
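
To make the eviction idea concrete, here is a rough sketch in Python.
It is only a toy model of a size-bounded cache that drops the least-used
entry first; the LibraryCache class and its methods are names I made up
for illustration and have nothing to do with how Scyld or ld.so
actually work.

```python
class LibraryCache:
    """Toy model of a size-bounded library cache on a slave node's ramdisk.

    Hypothetical illustration only; not part of any real Beowulf software.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.sizes = {}  # library name -> size in bytes
        self.hits = {}   # library name -> number of times requested

    def used_bytes(self):
        return sum(self.sizes.values())

    def request(self, name, size):
        """Record a use of `name` and cache it if it is not cached yet.

        Returns the list of libraries evicted to make room (possibly empty).
        """
        if name in self.sizes:
            self.hits[name] += 1
            return []
        if size > self.capacity_bytes:
            # Too large to ever fit; leave the cache untouched.
            return []
        evicted = []
        while self.used_bytes() + size > self.capacity_bytes:
            # Evict the entry with the fewest recorded uses.
            victim = min(self.hits, key=self.hits.get)
            del self.sizes[victim]
            del self.hits[victim]
            evicted.append(victim)
        self.sizes[name] = size
        self.hits[name] = 1
        return evicted
```

With a 100-byte cache holding a twice-used 60-byte library and a
once-used 30-byte one, requesting a new 40-byte library evicts the
once-used one.  A real implementation would probably also want to age
the counts so that a library that was popular last week doesn't stay
pinned forever.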

However, I can see the full-node-install problem you run into if
the slave nodes have a local hard drive for caching: they will have
enough space that they'll probably never have to remove libraries.
A possible solution is to have them wipe the hard drive on every
boot, but that's still much like a system where every time a slave
node boots it dd's a full install image onto the hard drive.

There is another problem that I'm not sure is covered even by
the current method.  What happens when you update a cached library on
the master node?  Do you have to reboot the slave nodes to clear
their caches, run a program that simply recaches all the libraries on
the slave nodes, or run a program that updates the cache for just the
libraries you specify?  (This last one can be dangerous if a sysadmin
does rpm -Uhv libfoo.rpm but doesn't check whether the rpm actually
contained more than one library.)
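
One way around the "which libraries did that rpm touch" problem would be
to compare everything in the slave's cache against the master instead of
trusting a hand-written list.  A minimal sketch in Python, assuming each
side can produce a name -> checksum table; the function names and the
choice of SHA-256 are my own assumptions, not anything the current
system does:

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum of a library's contents (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(data).hexdigest()


def stale_libraries(master_digests, cached_digests):
    """Return the cached library names that no longer match the master.

    Both arguments map library name -> checksum string.  A cached library
    counts as stale if the master's checksum differs or if the master no
    longer has the library at all.
    """
    return sorted(
        name
        for name, cached in cached_digests.items()
        if master_digests.get(name) != cached
    )
```

If libfoo.rpm had quietly replaced both libfoo.so and a second library,
both would show up in the result, and only those two would need to be
recached; nothing has to be rebooted.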


Jag