[Beowulf] NFS+XFS+SMP on kernel 2.6

Robert G. Brown rgb at phy.duke.edu
Wed Jun 15 09:16:30 PDT 2005


Suvendra Nath Dutta writes:

> We set up a 160 node cluster with a dual processor head node with 2GB 
> RAM. The head node also has two RAID devices attached to two SCSI 
> cards. These have an XFS filesystem on them and are NFS exported to the 
> cluster. The head node runs very low on memory (7-8 MB). And today I 
> ran into a kernel bug that crashed the system. Google suggests that I 
> should upgrade to kernel 2.6.11, but that sounds very unpleasant. I am 

Why?  FC3 and now FC4 are running 2.6.11 as their current kernel.  If
you are using a sound distribution with an update stream and yum to do
your maintenance, moving to 2.6.11 should require you to do exactly --
nothing.  Well, reboot the nodes and server after yum does an automated
update to the new kernel, or do a "yum update kernel" by hand if you
(sensibly enough) prevent yum from updating kernels on your servers
except by hand, so that you have a chance to test the update on a
prototyping server on the side first.
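
For what it's worth, the "by hand" arrangement is roughly this (a
minimal sketch, assuming a stock FC/RHEL-style yum configuration --
check your own yum.conf before copying anything):

  # /etc/yum.conf -- keep automated runs from touching kernels
  [main]
  exclude=kernel*

  # ...then, once the new kernel checks out on a prototype box:
  yum update kernel      # pulls in 2.6.11 (or whatever is current)
  reboot                 # and boot into it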

If you base your cluster on a sound rpm distro, yum will do pretty much
all of this sort of thing for you, or with absolutely minimal personal
effort.  I'm pretty sure that Centos/RHEL 4.x also uses 2.6.11 in its
current update stream, since I understand that it "is" FC4 but frozen
except for passback updates.

Another possibility, of course, is adding memory to just the server.
Expensive, but it's a cost for one machine, not one amortized over 160
nodes.  Depending on your motherboard, you can probably use 1 GB DIMMs,
in which case it isn't even that expensive.
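
(Before buying anything, it might also be worth a quick check of how
much of that "used" memory is really just page cache rather than real
memory pressure -- a generic sanity check, nothing specific to your
setup:

  free -m        # the "-/+ buffers/cache" line is the one that matters
  grep -iE 'memfree|cached|swaptotal|swapfree' /proc/meminfo

Cache the kernel will give back under pressure; hard swapping means
more RAM or a separate file server really is indicated.)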

> thinking of putting the raid boxes on a different box. Will separating 
> the file-server and the head node give me back stability on the head 
> node?

What do you mean by "head node" vs "file server"?  To put the question
another way, exactly what sort of load is being placed on the
server/head node by the two different functions?  This question has to
be answered for YOUR particular usage pattern before you can answer your
own question.

For example, suppose your users connect to the head node infrequently to
put a batch job in the queue, and those jobs typically run for a long
time before finishing, and produce results that the user can then
offload to their local workstation with a 1 second remote copy.  This
load is pretty much negligible, and splitting it off won't gain you much
of anything.  OTOH, suppose your users connect to the head node and run
jobs that support a real-time visualization socket back to their desktop
workstation -- jobs that pump all sorts of data in and out of the nodes
on the same network channel as the file store, and then add insult to
injury by REpumping it, somewhat digested on the head node, out to the
remote workstation.  This is now a (possibly) significant CPU and
network load, and splitting it off might well help.  Similar usage
patterns can emerge with the file store itself -- some patterns produce
very little server load, so splitting it off is a waste of time; others
might saturate the server's capacity, and splitting it off WOULD help.

Let's think very briefly about the two "trouble" cases.  Suppose that
your disk IS the culprit -- users are running jobs on all 160 nodes that
are constantly reading from and writing to disks.  In that case "just"
splitting off the RAID might not be enough -- you might have to
REARCHITECT the RAID to get greater parallelism, or put it on a better
network, e.g. IB instead of ethernet.  Just splitting it off might
stabilize your head node per se (so it doesn't crash) but leave you with
a disk server that crashes and jobs that hang or are delayed by disk
access bottlenecks.  Ditto analyzing your "head node" utilization at the
application level -- the right solution might be task reorganization
instead of just buying new hardware.  At the very least you'll have to
do some serious work to be able to KNOW that what you end up doing will
actually solve the problem.
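
As a starting point for that sort of measurement (just a sketch --
iostat and sar come from the sysstat package, nfsstat from nfs-utils,
and the option sets vary a bit by distro), watch the server while a
typical job mix runs:

  vmstat 5         # run queue, swap activity: CPU- or memory-bound?
  iostat -x 5      # per-device utilization and wait times on the RAIDs
  nfsstat -s       # NFS server RPC/op counts: read vs. write mix
  sar -n DEV 5     # network throughput per interface

If iostat shows the RAID devices pegged while the CPUs idle, it's a
disk or RAID-architecture problem; if the network interface is
saturated, it's an interconnect problem; if the CPUs are pegged, it's
the head-node workload itself.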

HTH,

  rgb

> 
> Suvendra.
> 