disadvantages of a linux cluster

Robert G. Brown rgb at phy.duke.edu
Wed Nov 6 06:18:38 PST 2002

On Tue, 5 Nov 2002, Mark Hahn wrote:

> > 256-processor Intel clusters (home grown apps). We run in parallel with
> > MPI Pro and Cluster Controller and Windows 2000. Reliability is 5-nines;
> > manageability tools have helped us to reduce systems administration
> > costs/staff.	
> so what would be the list price of that software?  do you have any
> data on how reliability would compare with a linux approach?
> also, .99999 is impressive, only 5 minutes a year; how long have 
> you had the cluster?  is that .99999 counted for all nodes,
> or do you mean "at least some nodes worked for .99999 of the time"?
> if you really mean that the sum of all downtime (across all 256 nodes)
> is 5 minutes/year, that's truly remarkable!

I agree.  In fact, hardware alone is a lot less reliable than that.
You've been amazingly lucky.  Even with Dell hardware we've never gone a
year without some sort of hardware failure that involved a day or so of
downtime (or expensive onsite service contracts and/or lots of spare
parts sitting around), and one day contains 1440 minutes, which is more
than five minutes per node averaged over 256 nodes.  Just diagnosing a
failed part (like a bad memory DIMM or crashed disk or burned motherboard)
usually takes a few hours.  So either you've really got (effectively) 258
systems with a couple of them functioning as more-or-less-hot spares, or
you've had phenomenally good luck.
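
For anyone who wants to check the arithmetic, here is a rough sketch of
it in Python.  The function names are made up for illustration, and it
assumes a 365-day year and that "5-nines" is counted over the sum of all
node-minutes, which is only one of the readings Mark asked about.

```python
# Back-of-envelope availability arithmetic (illustrative names only).
# Assumption: a 365-day year, and availability counted over the sum of
# all node-minutes in the cluster.

MINUTES_PER_DAY = 24 * 60                   # 1440
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY    # 525600


def downtime_budget_minutes(availability, minutes=MINUTES_PER_YEAR):
    """Minutes of downtime allowed per year at a given availability."""
    return (1.0 - availability) * minutes


def aggregate_availability(total_downtime_minutes, nodes,
                           minutes=MINUTES_PER_YEAR):
    """Availability over all node-minutes, given total node downtime."""
    return 1.0 - total_downtime_minutes / (nodes * minutes)


# Five nines allows a bit over five minutes of downtime per year.
budget = downtime_budget_minutes(0.99999)

# One node down for one day, averaged over 256 nodes.
per_node = MINUTES_PER_DAY / 256

# Aggregate availability of 256 nodes when one node loses one day.
avail = aggregate_availability(MINUTES_PER_DAY, 256)

print(f"five-nines budget:      {budget:.3f} min/year")
print(f"one node-day per node:  {per_node:.3f} min")
print(f"aggregate availability: {avail:.7f}")
```

Under those assumptions a single node-day of downtime in a year is
already enough to put a 256-node cluster just under five nines.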

If the latter, you might try computing your uptime including the hot


> thanks, mark hahn.

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
