[Beowulf] Clusters and Distro Lifespans

Wed Jul 19 07:44:17 PDT 2006

> > ii) Would it be better to develop our own installation process for
> > clusters so that upgrades, in terms of distros, can be rolled out
> > easily?  I feel like i'm tied in some way to the supplier of our
> > cluster for upgrades.
>
> Hmmm.... you would be re-inventing this wheel, which has been
> re-invented many times.

Yes, and each person who re-invents the wheel then knows how a wheel
works and isn't stuck using the same wheel regardless of its
limitations.  I'm a strong advocate of building your own install
method: 1) it's fun, 2) it's easy, 3) you learn a lot, and 4) you can
integrate your method into your own management style.

For example, we are able to do a rolling upgrade of our cluster without
downtime and without users even noticing.  The install process is
tightly coupled with our queueing software, so when a node becomes free
it gets re-installed.
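
A minimal sketch of that idea in Python, assuming the queueing system
can report how many jobs are on each node and can hold a node out of
the queue.  The Node fields and the reinstall callback are hypothetical
stand-ins for illustration, not any particular scheduler's API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running_jobs: int       # jobs on the node, as reported by the queue (assumed)
    image: str              # image currently installed on the node
    offline: bool = False   # True while the node is held out of the queue


def rolling_upgrade_pass(nodes, target_image, reinstall):
    """Reinstall every idle, out-of-date node; return the names handled.

    Busy nodes are skipped, so running jobs are never touched.  Calling
    this repeatedly (e.g. from cron) upgrades the cluster as nodes free up.
    """
    upgraded = []
    for node in nodes:
        if node.image == target_image or node.running_jobs > 0:
            continue                    # up to date, or still busy
        node.offline = True             # keep new jobs off during the install
        reinstall(node, target_image)   # hypothetical hook into the installer
        node.image = target_image
        node.offline = False            # hand the node back to the queue
        upgraded.append(node.name)
    return upgraded
```

Only nodes that are already idle are taken, so nothing is drained by
force; the cost is that a node running a long job stays on the old
image until that job ends.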

We also have our install process configured to allow booting different
distros/images, which is useful for booting diagnostic CD images, etc.
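
One way to picture per-node image selection, assuming the nodes
network-boot via PXELINUX (an assumption; the boot mechanism is not
stated here).  The image labels and kernel paths below are made up for
illustration.

```python
# Hypothetical image table: label -> (kernel path, extra kernel arguments).
IMAGES = {
    "production": ("images/production/vmlinuz",
                   "initrd=images/production/initrd.img"),
    "memtest": ("images/memtest/memtest.bin", ""),
}


def pxe_config_name(mac):
    """Per-node config file that PXELINUX looks up for an Ethernet MAC:
    pxelinux.cfg/01-<mac>, lowercase hex, dash-separated."""
    return "pxelinux.cfg/01-" + mac.lower().replace(":", "-")


def pxe_config_text(label):
    """Build a config that boots the chosen image by default."""
    kernel, append = IMAGES[label]
    lines = ["DEFAULT " + label, "LABEL " + label, "  KERNEL " + kernel]
    if append:
        lines.append("  APPEND " + append)
    return "\n".join(lines) + "\n"
```

Writing pxe_config_text("memtest") to the file named by
pxe_config_name() for one node would send just that node into the
diagnostic image on its next boot, leaving the rest of the cluster
alone.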

> > iii) Do people regularly upgrade their clusters in relation to
> > distros?  I guess this is like asking how long is a piece of string
> > because everyone's needs are different.
>
> Cluster upgrades are rare unless you are missing functionality or
> something is broken.  That is of course one opinion, some here do
> upgrades nightly.  From a purely production oriented viewpoint, where
> downtime == lost money for our customers, we usually advise against that.

I think "rare" is a strong word; "infrequent" may be better.  We
regularly apply patches and upgrades to the front end nodes (which are
globally connected) and infrequently (~ every 6 months) upgrade all the
cluster nodes in the rolling fashion mentioned above.

You can even do a kernel upgrade on the file servers/front end nodes
(which requires a reboot) without killing or disrupting jobs.  Having
complete control has a lot of benefits.

-- 
Dr Stuart Midgley
sdm900 at gmail.com