disadvantages of a linux cluster

Paul Redfern red at tc.cornell.edu
Wed Nov 6 08:25:59 PST 2002

-----Original Message-----
From: Mark Hahn [mailto:hahn at physics.mcmaster.ca] 
Sent: Tuesday, November 05, 2002 7:32 PM
To: Paul Redfern
Cc: glockner at enscpb.fr; beowulf at beowulf.org
Subject: Re: disadvantages of a linux cluster

> 256-processor Intel clusters (home grown apps). We run in parallel
> MPI Pro and Cluster Controller and Windows 2000. Reliability is
> manageability tools have helped us to reduce systems administration
> costs/staff.	

so what would be the list price of that software?  do you have any
data on how reliability would compare with a linux approach?
also, .99999 is impressive, only 5 minutes a year; how long have 
you had the cluster?  is that .99999 counted for all nodes,
or do you mean "at least some nodes worked for .99999 of the time"?

if you really mean that the sum of all downtime (across all 256 nodes)
is 5 minutes/year, that's truly remarkable.
thanks, mark hahn.

On our first 256-processor Dell cluster, we worked with Intel, who
provided a special service that collected all machine errors, both
hardware and software, every night. Intel collected the error logs and,
at regular intervals, took the tags off them and sent them to MIT for
independent analysis. For the first four months (the initial period of
analysis) with Windows 2000 Advanced Server, MIT reported 99.9986%
uptime. Since then, the machine has been hardened, and reliability for
it and our other clusters has gotten better, not worse. We've been
operating Windows 2000 clusters since the server OS was first
introduced.

Typically, an outage is handled in less than ten minutes on one node
with spare memory and hard drives. An outage doesn't affect the overall
cluster; the scheduler works around it, and the cluster continues to
run.

The HPC manager at Microsoft recently made available special pricing
and packaging for HPC clusters through the OEM channel (such as Dell,
HP, IBM, Hitachi) that closes the gap in pricing between Windows-based
and Linux-based bundles. I'd be happy to introduce you to the
appropriate people at Microsoft or Dell.
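
For anyone who wants to sanity-check the uptime figures quoted above,
here is a rough Python sketch of the arithmetic. The function names, the
365-day year, and the unit count of 256 are my own assumptions for
illustration (the thread says "256-processor", so the real node count
may differ); this is not how Intel or MIT computed their numbers.

```python
# Rough availability arithmetic; all names here are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a non-leap year


def downtime_minutes_per_year(availability):
    """Minutes of downtime per year implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def aggregate_downtime_minutes(availability, units):
    """Total unit-minutes of downtime per year, assuming the availability
    figure is an average over all units (an assumption, not a given)."""
    return downtime_minutes_per_year(availability) * units


# "Five nines": roughly 5.3 minutes per year.
five_nines = downtime_minutes_per_year(0.99999)

# The reported 99.9986%: roughly 7.4 minutes per year.
reported = downtime_minutes_per_year(0.999986)

# If 99.9986% were a per-unit average over 256 units (assumed count),
# the summed downtime would be roughly 1,884 unit-minutes per year.
summed = aggregate_downtime_minutes(0.999986, 256)
```

The difference between `reported` and `summed` is the distinction Mark
asks about: the same percentage implies very different totals depending
on whether it describes the cluster as a whole or each node on average.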

Paul Redfern
Cornell Theory Center
Tel 607-254-8693
red at tc.cornell.edu
