<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <title></title>

</head>

<body>

I have found downsides to both the Beowulf 1 and Beowulf 2 models, so I am doing something

in between...<br>

<br>

The reliability/upgrade solution I have found is to have multiple single

system images, mirrored disks and perhaps multiple boot image and file servers.

 I have found several benefits to a single system image (well, almost single:

each system actually gets its own NFS-mounted root fs, minus /usr, which is

shared.  This ends up being ~70MB per node).  In my environments we need

several clusters (or a cluster of subclusters).  Using netbooting I can easily

maintain multiple subclusters of various types and architectures from a single

boot server.  On one of my clusters of clusters I am booting 2 Alpha

subclusters and 5 Intel subclusters from the same boot server.  Each subcluster

can be served a different kernel and can run a different version of Linux.

The largest of these clusters of clusters boots and manages 104 Intel and

Alpha nodes from a single boot server.<br>

<br>
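As a rough illustration only (these are not the actual management scripts), the layout described above, with one private root per node, one shared /usr per subcluster and a kernel chosen per subcluster, could be sketched in Python as follows.  Every path, node name and subcluster name here is made up for the example.<br>
<pre wrap="">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subcluster:
    name: str
    arch: str
    kernel: str      # kernel image served to this subcluster (hypothetical path)
    shared_usr: str  # /usr export shared by every node in the subcluster


# Hypothetical subclusters; a real boot server would list all of them.
SUBCLUSTERS = {
    "alpha-a": Subcluster("alpha-a", "alpha", "/tftpboot/alpha-a/vmlinux", "/export/alpha-a/usr"),
    "intel-a": Subcluster("intel-a", "i386", "/tftpboot/intel-a/vmlinuz", "/export/intel-a/usr"),
}

# Hypothetical node-to-subcluster assignment.
NODE_TO_SUBCLUSTER = {
    "node001": "alpha-a",
    "node002": "intel-a",
    "node003": "intel-a",
}


def boot_config(node: str) -> dict:
    """Return what the boot server would hand to one node."""
    sub = SUBCLUSTERS[NODE_TO_SUBCLUSTER[node]]
    return {
        "arch": sub.arch,
        "kernel": sub.kernel,
        # Each node gets its own small root file system ...
        "nfs_root": f"/export/{sub.name}/roots/{node}",
        # ... while /usr is shared across the whole subcluster.
        "nfs_usr": sub.shared_usr,
    }


if __name__ == "__main__":
    for node in sorted(NODE_TO_SUBCLUSTER):
        print(node, boot_config(node))
```
</pre>
In this sketch, moving a node to a different kernel or Linux version is a one-line change in the node-to-subcluster table on the boot server; nothing on the node itself is touched.<br>
<br>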

The solution I have found to upgrading the boot server is to have 2 boot

servers.  When it is time to upgrade, I reconfigure the new boot server with

the new OS and get everything set up for the client images.  Then it is just

a case of swapping the boot servers and rebooting all of the nodes in the

cluster (if you have a cluster whose nodes can never all be rebooted at once,

then having 2 or more active boot servers is the solution).  I maintain cluster

consistency and configuration using a set of custom scripts.  I have also found

that in large clusters it is best to separate the boot server from the file

server, so that the load on the boot server does not get high enough to affect

its ability to serve root file systems.  It is also very easy to set up multiple

file servers for various subclusters.<br>

<br>
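The custom consistency scripts mentioned above are not shown here.  Purely as a hypothetical sketch of the kind of check such a script might perform, the Python below builds a path-to-digest manifest for a root tree on the boot server and compares one node's tree with a reference tree.  The function names and example manifests are invented for illustration.<br>
<pre wrap="">
```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file under root to the SHA-256 digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip device nodes, sockets and dangling symlinks.
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest["/" + os.path.relpath(path, root)] = digest
    return manifest


def find_skew(reference, node):
    """Compare a node manifest with the reference manifest."""
    ref_paths = set(reference)
    node_paths = set(node)
    return {
        "missing": sorted(ref_paths - node_paths),
        "extra": sorted(node_paths - ref_paths),
        "changed": sorted(
            p for p in ref_paths.intersection(node_paths)
            if reference[p] != node[p]
        ),
    }


if __name__ == "__main__":
    # Hypothetical manifests; real ones would come from build_manifest().
    reference = {"/etc/fstab": "a1", "/etc/hosts": "b2", "/etc/modules.conf": "c3"}
    node017 = {"/etc/fstab": "a1", "/etc/hosts": "zz", "/etc/resolv.conf": "d4"}
    print(find_skew(reference, node017))
```
</pre>
Because every node's root lives on the boot server, a check like this can run in one place without logging in to any node.  Files that are expected to differ per node (host name, network settings) would need to be excluded from the comparison.<br>
<br>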

Another concern which people sometimes have is that netbooting will create

too much traffic on the network.  I have found that the traffic is very low

once the nodes have booted: considerably less than 1% of the network bandwidth

is used for sending NFS messages from the boot server to the clients.  I have not

seen any measurable loss of system/network performance using this netboot

model (compared to a Beowulf 1 model) over a single switched 100 Mb/s network

connection to each node.<br>

<br>
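To put those figures in perspective, here is a small back-of-the-envelope calculation in Python.  It rests on assumptions that are not stated above: it reads "1%" as 1% of each node's 100 Mb/s link, treats that as a worst-case ceiling, and applies it to the 104-node cluster with ~70MB of root file system per node, using decimal units throughout.<br>
<pre wrap="">
```python
def nfs_budget(link_mbps, nfs_percent, nodes, root_mb_per_node):
    """Worst-case steady-state NFS load and root storage for a netbooted cluster."""
    per_node_mbps = link_mbps * nfs_percent / 100
    return {
        # Ceiling on NFS root traffic per node link, in Mb/s.
        "per_node_mbps": per_node_mbps,
        # The same ceiling in kilobytes per second.
        "per_node_kbytes_per_s": per_node_mbps * 1000 / 8,
        # Sum over all nodes, i.e. what the boot server would have to source.
        "aggregate_mbps": per_node_mbps * nodes,
        # Disk space for the per-node roots on the boot server, in GB.
        "root_storage_gb": root_mb_per_node * nodes / 1000,
    }


if __name__ == "__main__":
    # Assumed inputs: 100 Mb/s links, 1% ceiling, 104 nodes, 70 MB per root.
    budget = nfs_budget(link_mbps=100, nfs_percent=1, nodes=104, root_mb_per_node=70)
    for key, value in budget.items():
        print(key, value)
```
</pre>
Under these assumptions the ceiling is 1 Mb/s (125 kB/s) per node and about 104 Mb/s in aggregate, with roughly 7.3 GB of disk for all of the per-node roots.  Since the observed traffic is considerably below 1%, the real aggregate load would be lower still.<br>
<br>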

This model also easily supports the use of 3rd party drivers/modules on an

individual node or subcluster basis.  This is simply configured on the boot

server via subcluster management scripts.  In one of my subclusters I am

running Myrinet on all of the nodes.  The servers are connected via Gigabit

to a hierarchy of subcluster switches.<br>

<br>

Here is my pro/con list for netbooting with a single system image:<br>

<br>

Pros:<br>

* All configuration and setup is maintained on the boot server<br>

* Client node setup is much faster<br>

* Client nodes can be run diskless or with just swap/scratch space.  Node

disk failures do not require OS reinstall.  In a diskless environment reliability

is greatly improved, especially with large clusters (100s of nodes), where

disk failure becomes more common.<br>

* OS and configuration for all clients can be mirrored on the boot server.<br>

<br>

Cons:<br>

* The boot server is a single point of failure (one solution is multiple

servers)<br>

* Setup is more technically complex (though I have found it much easier to

manage in the long run)<br>

<br>

<br>

My $0.04<br>

<br>

<br>

--JIM<br>

<br>

Josip Loncaric wrote:<br>

<blockquote type="cite" cite="mid3D90BA77.A3A47635@icase.edu">

  <pre wrap="">~snip~

  </pre>

  <blockquote type="cite">

    <pre wrap="">                            The Beo 2 model also provides single system

image with regard to process space, and methods to manage remote processes

from a central point of control.

    </pre>

  </blockquote>

  <pre wrap="">

This is an attractive feature, but in my personal view, not essential.  I'm

reasonably happy with separate process spaces and periodic central collection

of machine status data.  We operate a heterogeneous cluster built in several

phases with different hardware and different networks, so it makes more sense

for me to consider each class of machines separately.  On the other hand, a

central point of control and unified view of the entire cluster open the

possibility of rather sophisticated management.

In my experience, the most time consuming task in operating a Beowulf 1

cluster is the initial node installation or reinstallation after suffering

disk damage.  The Beowulf 2 model makes this much easier.  The second most

annoying problem is loss of network connectivity (usually due to some hardware

glitch or network driver flakiness under heavy load).  The Beowulf 2 model

depends on network connectivity more than the Beowulf 1 model, where one can

operate the problematic node in stand-alone mode and debug the problem.  The

third issue on my list is specialized hardware with its own drivers and

daemons, which are usually designed for Beowulf 1 operation but may be

difficult to convert to the Beowulf 2 model.  Finally, the issue of process

startup time may concern some people: the Beowulf 1 model usually goes through

standard Linux login, which is typically slower than the Beowulf 2 model,

where all processes are started on the head node and then immediately migrated

to the appropriate compute node.

Sincerely,

Josip

P.S.  Our cluster (almost) never goes down, even during system upgrades. 

Robustness comes from redundancy and compartmentalization.  We typically

upgrade one section of the system, monitor new software for bugs it

(re)introduced, find fixes, then complete the cluster upgrade.  Multiple

system images allow this kind of experimentation.  The single system image

approach is much more dependent on this single image working reliably on all

of your hardware immediately.  One of the following two approaches may appeal

to you:

1) Don't keep all of your eggs in one basket 

   -->multiple images, multiple versions

2) Keep all of your eggs in one basket and WATCH that basket 

   -->thoroughly debugged single image, no version skew

P.P.S.  Version skew can begin before Linux gets loaded, in BIOS.  Diagnosing

that one machine in a hundred becomes flaky under heavy load because its BIOS

settings are subtly different is a very time consuming process...  By

comparison, detecting and fixing Linux version skew is easy (start by checking

/var/log/rpmpkgs, automatically generated daily on each machine).

  </pre>

</blockquote>

<br>

<pre class="moz-signature" cols="$mailwrapcol">-- 

 -----------------------------------------------------------------------

 James W. Matthews - UNIX System Administration / Beowulf Cluster Design

 Raytheon Technical Services Company - NASA Langley Research Center

 MS 128 - 18E West Taylor Street - Hampton, VA 23681

 E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:J.W.Matthews@LaRC.NASA.GOV">J.W.Matthews@LaRC.NASA.GOV</a> - Phone: (757) 864-5259

 -----------------------------------------------------------------------

</pre>

<br>

</body>

</html>