[Beowulf] Cluster install and admin approach (newbie question)
hahn at mcmaster.ca
Fri Aug 28 08:07:32 PDT 2009
> * if the /var filesystem is shared, race conditions happen (all nodes
> want to write on the same files). I had this problem and moved to a
> local /var filesystem.
indeed, shared /var is simply a bug. non-shared NFS /var (a separate
directory per node on the server) is viable, but generally pointless.
> * if /var is local (which it may be, because the disks do exist), the
> whole point of a central point for easy admin vanishes, because I would
> have to create all the /var structure that packages need to work, on
> each node (would be easier to do: "for $node; ssh $install_cmd; done",
> than guessing which dirs I need to create or files to copy).
but if your nodes are nfs-root, you won't be installing anything on them:
you'll be installing on the nfs-root.
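
a rough sketch of what "installing on the nfs-root" can look like on the
server. the image path /export/nfsroot and the helper name in_nfsroot are
made up for illustration; by default it only prints the command, and it
runs it under chroot only when RUN=1 is set:

```shell
#!/bin/sh
# Hypothetical location of the shared root image on the NFS server.
NFSROOT="${NFSROOT:-/export/nfsroot}"

# Run a command inside the shared root image instead of on each node.
# By default this is a dry run that prints the command; set RUN=1
# (as root) to actually execute it under chroot.
in_nfsroot() {
    if [ "${RUN:-0}" = "1" ]; then
        chroot "$NFSROOT" "$@"
    else
        echo "chroot $NFSROOT $*"
    fi
}
```

so one call such as `in_nfsroot yum -y install somepackage` (or the
equivalent for your distro's package tool) replaces the ssh loop over
nodes: every node sees the change through the shared root.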
> * if /var is tmpfs all forensics are certainly gone after failure
> (Murphy told me this one ;).
syslog is very happy to log over the network.
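
a minimal sketch, assuming a classic syslog.conf inside the root image:
the rule "*.* @host" forwards every message to a remote syslog host. the
function name add_remote_syslog and the host name "loghost" are
placeholders, not anything from a real tool:

```shell
#!/bin/sh
# Append a forward-everything rule to etc/syslog.conf inside a root image.
# $1 = root image directory, $2 = remote log host (both hypothetical).
add_remote_syslog() {
    root=$1
    loghost=$2
    conf="$root/etc/syslog.conf"
    # Selector and action separated by a tab, as older syslogd expects.
    rule=$(printf '*.*\t@%s' "$loghost")
    mkdir -p "$root/etc"
    # Do nothing if the exact rule is already there, so re-runs are safe.
    if [ -f "$conf" ] && grep -qxF -- "$rule" "$conf"; then
        return 0
    fi
    printf '%s\n' "$rule" >> "$conf"
}
```

e.g. `add_remote_syslog /export/nfsroot loghost`. the receiving syslogd
also has to be told to accept network messages (with sysklogd that is
the -r option).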
> Everything I read on the subject underlines the advantages of
> diskless approaches but fails to warn about this problem and/or to
> solve it. On the other side, the distributed approach tools (where
> every node is autonomous) seem to be stalled (as systemimager - which
> is used in the Oscar project) or discontinued, or truly overblown for
> my reference scale (IBM's xCat); so it really seems that I'm missing
there's also OneSIS.
> The question is: what do you do about this?
setting up your own nfs-root cluster is a simple exercise. if you're not
very familiar with *nix booting/daemons/init scripts, it will take a few
tries to get the config right, but the end result is pretty simple and
robust. remote syslog, preferably with console-over-net (ipmi sol,
netconsole) means that there's nothing interesting on the local /var.
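
for the netconsole part, a small sketch that builds the module parameter
in the kernel's documented form
[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. the helper
name netconsole_param and the addresses in the usage line are
assumptions; the source port and target MAC are left empty so the kernel
defaults apply:

```shell
#!/bin/sh
# Build a netconsole= parameter string.
# $1 = node's own IP, $2 = network device, $3 = log receiver IP,
# $4 = receiver UDP port (optional, defaults to 6666).
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_port=${4:-6666}
    # Empty source port and empty target MAC fall back to kernel defaults.
    printf 'netconsole=@%s/%s,%s@%s/\n' "$src_ip" "$dev" "$tgt_port" "$tgt_ip"
}
```

usage would be something like
`modprobe netconsole "$(netconsole_param 10.0.0.1 eth0 10.0.0.2)"`, or
the same string on the kernel command line; the receiver just needs
something listening on that UDP port.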