[Beowulf] Project Planning: Storage, Network, and Redundancy Considerations

John Hearns john.hearns at streamline-computing.com
Mon Mar 19 09:42:44 PDT 2007


Brian R. Smith wrote:
> Hey list,
>
> 1. Proprietary parallel storage systems (like Panasas, etc.):  It 
> provides the per-node bandwidth, aggregate bandwidth, caching 
> mechanisms, fault-tolerance, and redundancy that we require (plus having 
> a vendor offering 24x7x365 support & 24-hour turnaround is quite a breath 
> of fresh air for us).  Price point is a little high for the amount of 
> storage that we will get though, little more than doubling our current 
> overall capacity.  As far as I can tell, I can use this device as a 
> permanent data store (like /home) and also as the user's scratch space 
> so that there is only a single point for all data needs across the 
> cluster.  It does, however, require the installation of vendor kernel 
> modules, which often adds overhead to system administration (as they 
> need to be compiled, linked, and tested before every kernel update).

If you like Panasas, go with them.
The kernel module thing isn't all that big a deal - they are quite 
willing to 'cook' the modules for you, but YMMV.


> 
> Our final problem is a relatively simple one though I am definitely a 
> newbie to the H.A. world.  Under this consolidation plan, we will have 
> only one point of entry to this cluster and hence a single point of 
> failure.  Have any beowulfers had experience with deploying clusters 
> with redundant head nodes in a pseudo-H.A. fashion (heartbeat 
> monitoring, fail-over, etc.) and what experiences have you had in
> adapting your resource manager to this task?  Would it simply be more 
> feasible to move the resource manager to another machine at this point 
> (and have both headnodes act as submit and administrative clients)?  My 
> current plan is unfortunately light on the details of handling SGE in 
> such an environment.  It includes purchasing two identical 1U boxes 
> (with good support contracts).  They will monitor each other for 
> availability and the goal is to have the spare take over if the master 
> fails.  While the spare is not in use, I was planning on dispatching 
> jobs to it.

I have constructed several clusters using HA.
I believe Joe Landman has also - as you are in the States, why not give 
some thought to contacting Scalable and getting them to do some more 
detailed designs for you?


The clusters I have built use Linux-HA and Heartbeat: an active/passive 
setup, with a primary and a backup head node. On failover, the backup 
head node starts up the cluster services.
Failing over SGE is (relatively) easy - the main part is making sure 
that the cluster spool directory is on shared storage,
and mounting that shared storage on one machine or the other :-)
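
To make that concrete, here is a rough sketch of a Heartbeat (v1 style)
configuration for the active/passive pair. All the host names, devices
and addresses are invented, and I'm assuming SGE's sgemaster init
script is installed on both nodes and the SGE directory lives on the
shared disk - adjust to taste.

  # /etc/ha.d/ha.cf  (same on both head nodes)
  keepalive 2               # seconds between heartbeats
  deadtime 30               # declare the peer dead after 30s of silence
  bcast eth1                # dedicated ethernet heartbeat link
  serial /dev/ttyS0         # second, independent heartbeat link
  auto_failback off
  node head1 head2

  # /etc/ha.d/haresources  (identical on both nodes, all one line)
  # floating IP, then the shared SGE filesystem, then the qmaster service
  head1 IPaddr::192.168.1.10/24/eth0 Filesystem::/dev/sdb1::/opt/sge::ext3 sgemaster

Heartbeat starts the resources left to right on takeover and stops them
in reverse on release, so the spool filesystem is mounted before
sgemaster starts. Point the exec hosts and submit clients at the
floating hostname rather than at head1 or head2 directly.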

The harder part is failing over NFS - again, I've done it.
I gather there is a wrinkle or two with NFSv4 on Linux-HA type systems.
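
The pattern I have used for NFS (again only a sketch - the init script
name varies by distribution, and the paths are made up) is to keep the
exported data, and ideally /var/lib/nfs as well, on the shared disk,
and fail the whole lot over as one resource group with the service IP
last:

  # /etc/ha.d/haresources  (all one line)
  # shared data disk, then the NFS server, then the IP the clients mount from
  head1 Filesystem::/dev/sdc1::/export/home::ext3 nfs-kernel-server IPaddr::192.168.1.11/24/eth0

The clients mount from the floating IP (or a hostname that points at
it), so after a failover they just see what looks like a rebooted NFS
server and carry on.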


The second way to do this would be to use shared storage combined with 
the Grid Engine qmaster failover (shadow master) mechanism. This is a 
different approach, in that you have two machines running, with either 
a NAS-type storage server or Panasas/Lustre behind them. The SGE spool 
directory lives on that shared storage, and the SGE qmaster will be 
started on the second machine if the first stops answering its 
heartbeat.
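
The piece of SGE that does this is the shadow master daemon,
sge_shadowd. A rough sketch, assuming $SGE_ROOT is on the shared
storage, the cell is "default" and the two head nodes are head1 and
head2:

  # $SGE_ROOT/default/common/shadow_masters
  # primary master first, then the hosts allowed to take over
  head1
  head2

  # on the shadow host, with the SGE environment sourced:
  . $SGE_ROOT/default/common/settings.sh
  sge_shadowd

sge_shadowd watches the heartbeat file that sge_qmaster keeps touching
in the qmaster spool directory; if it goes stale, shadowd starts a
qmaster locally and rewrites act_qmaster so the rest of the cluster
follows it. No Linux-HA needed for the qmaster itself, though you still
have to sort out /home and any NFS service separately.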


PS. 1U boxes? Think something a bit bigger - with hot-swap PSUs.
You may also have to fit a second network card for your HA heartbeat 
links (links plural - you need two) plus a SCSI card, so think slightly 
bigger boxes for the two head nodes.
You can spec 1U nodes for interactive login/compile/job submission 
duties. Maybe you could run a DNS round-robin type load balancer for 
redundancy across these boxes - they should all be similar, and if one 
stops working then ho-hum.
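
The DNS round robin bit is nothing more than several A records on one
name - a sketch, with an invented zone and addresses:

  ; "login" resolves to all of the login nodes in rotating order
  login   IN  A   192.168.1.21
  login   IN  A   192.168.1.22
  login   IN  A   192.168.1.23

Plain round robin won't notice a dead node, of course - users just get
a failed connection now and again and ssh in a second time, which is
about the level of redundancy a login node deserves.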

pps. "when the spare is not in use dispatching jobs to it"
Actually, we also do a cold failover setup which is just like that, and 
the backup node is used for running jobs when it is idle.
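
If you do run jobs on the backup node, it is worth being able to drain
it quickly when it has to take over head node duties. In SGE that is
just a matter of disabling its queue instance (the queue and host names
here are only examples):

  # stop new jobs landing on the spare before promoting it
  qmod -d all.q@head2

  # and when it goes back to being a spare/compute node
  qmod -e all.q@head2

Disabling the queue lets running jobs finish but stops anything new
being scheduled there; add a qdel if you need the node back in a hurry.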




-- 
      John Hearns
      Senior HPC Engineer
      Streamline Computing,
      The Innovation Centre, Warwick Technology Park,
      Gallows Hill, Warwick CV34 6UW
      Office: 01926 623130 Mobile: 07841 231235


