[Beowulf] Compute Node OS on Local Disk vs. Ram Disk

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Thu Oct 2 05:03:30 PDT 2008


On Wed, 1 Oct 2008, Donald Becker wrote:

> That's correct.  Our model is that a "cluster" is a single system -- 
> and a single install.

That's also the idea I started with, almost 10 years ago ;-) 
Not with Beo*/bproc, but with NFS-root, which allowed a single install 
(the node "image") to be used on all nodes - although you'd probably 
call this 2 system installs (the master node itself and the node 
"image"). But over the years I have changed my mind...

> If you are running different distributions on nodes, you discard 
> many of the opportunities of running a cluster.  More importantly, 
> it's much more knowledge- and labor-intensive to maintain the 
> cluster while guaranteeing consistency.

It does require more work, but in some cases it cannot be avoided. 
From my own experience: some 5 years ago a quantum chemistry program 
was distributed as a binary statically compiled on RH9 or RHEL3 
(kernel 2.4 based) with MPICH included. When I wanted to switch to a 
2.6 kernel, this program could no longer run, so some of the nodes had 
to be kept on an older distribution until a newer program version 
could be obtained (that took about a year). It also meant that 
whenever there were discussions about using a higher performance 
interconnect than GigE, this software's users insisted on buying more 
nodes rather than a faster interconnect. This situation caused both 
technical and administrative issues, and the possibility of running 
different distributions solved all of them easily.

Having the possibility to run several distributions side-by-side 
requires spending some effort in organizing the other installed 
software, normally shared through NFS or a parallel FS to the nodes. 
But once you make the jump from 1 to 2, you might as well make it from 
1 to many.
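
To make the "organizing" part concrete, here is a minimal sketch of 
one possible layout, not a description of any particular site's 
setup. It assumes a shared tree with one subdirectory per 
distribution; the mount point /shared/sw, the tag format and the 
function names are all hypothetical.

```python
import posixpath

# Assumed NFS (or parallel FS) mount point visible on every node.
# Hypothetical path, chosen only for illustration.
SHARED_ROOT = "/shared/sw"


def distro_tag(distro_id: str, release: str) -> str:
    """Build a directory tag such as 'rhel3' from a distribution
    id and a release string; only the major release is kept."""
    major = release.split(".")[0]
    return f"{distro_id.lower()}{major}"


def software_prefix(distro_id: str, release: str,
                    package: str, version: str) -> str:
    """Return the per-distribution install prefix for one package,
    i.e. <root>/<distro-tag>/<package>/<version>."""
    tag = distro_tag(distro_id, release)
    return posixpath.join(SHARED_ROOT, tag, package, version)
```

Each node would then put only the tree matching its own distribution 
on its search paths, so adding another distribution means adding one 
more subdirectory rather than reorganizing what is already there.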

This leads me to observe that we have different points of view: you 
are the maker of a cluster-oriented distribution, trying to promote it 
and its underlying ideas (which are fine ideas, no question about that 
:-)), and you are sure that it works because it has been bought and 
used successfully. I, on the other hand, have to find solutions that 
keep the scientists productive (whatever productive means ;-)) and 
keep them as far as possible from the system details, so that they can 
concentrate on their work. So it's not surprising that we come to 
different conclusions - at least they sustain an interesting 
discussion :-)

I would be interested to hear Mark Hahn's opinion on this: from how he 
has presented himself on this list, he seems to be in a position very 
similar to mine, supporting a variety of users with a variety of 
needs. But others should not feel left out; please write your opinions 
as well ;-)

> Most distributions (all the commercially interesting ones) are 
> workstation-oriented

I don't really agree with this statement (looking at RHEL and SLES), 
but anyone who installs a workstation-oriented distribution on a 
cluster node gets what (s)he pays for :-) Very recently I saw 
(identity hidden to protect the guilty ;-)) such a node "image" which 
contained OpenOffice. To be fair, it was used via NFS-root, so it 
wasn't wasting node memory, only master disk space...

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de


