disadvantages of a linux cluster

Mark Hahn hahn at physics.mcmaster.ca
Wed Nov 6 13:37:35 PST 2002


> Paul Redfern:
> On our first 256-processor Dell cluster, we worked with Intel who
> provided a special service that collected all machine errors, both
> hardware and software, every night. Intel collected the error logs and,
> at regular intervals, took the tags off them and sent them to MIT for
> independent analysis. The first four months (initial period of analysis)
> with Windows 2000 Advanced Server, MIT reported 99.9986% uptime.

OK, I take that as a "yes, all I meant is that we had at least two
working machines .99999 of the time".  obviously, my stricter interpretation
is impossible, since .99999 allows only about 105 seconds of downtime in
4 months, and I expect that even a single reboot would take almost that long.

so, your .99999 must mean "for a total of 105 seconds in 4 months,
absolutely nothing worked.".  that's actually not good at all.
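the 105-second figure is just arithmetic on the uptime percentage; a
quick sketch (assuming 30-day months, so the exact number shifts a bit
with real calendar months):

```python
# Downtime budget implied by an uptime percentage over a given period.
# Assumes 30-day months for simplicity.

def downtime_seconds(uptime_pct, months):
    period = months * 30 * 24 * 3600           # seconds in the period
    return period * (1 - uptime_pct / 100)

# five nines over 4 months -- roughly the 105 seconds above
print(round(downtime_seconds(99.999, 4)))      # -> 104

# the quoted 99.9986% over 4 months allows a somewhat larger budget
print(round(downtime_seconds(99.9986, 4)))     # -> 145
```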

ah, how about this gem:
	"CTC achieved 99.9986 percent availability (as independently verified
	by Massachusetts Institute of Technology) during the first three
	months it ran Windows 2000 on Velocity, and by late 2001, CTC reached
	99.99999 percent across the entire machine room."
from http://www.dell.com/us/en/slg/topics/power_ps4q01-ctccase.htm
so this is just a pointless marketing number which tells us NOTHING about
actual costs, reliability, or availability.

come to think of it, it's pretty hilarious: if they never had any
further outage after those 110 seconds, they'd only have to run 34
years without any outage in order to hit seven nines ;)
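the 34-year figure also checks out; a sketch assuming a fixed 110 seconds
of total downtime (the three-month figure implied by 99.9986%):

```python
# How long must a system run, accumulating no further outages beyond a
# fixed 110 s of total downtime, before availability reaches seven nines?

downtime = 110                    # seconds of total outage so far
target = 1 - 0.9999999           # allowed downtime fraction at seven nines

seconds_needed = downtime / target
years = seconds_needed / (365 * 24 * 3600)
print(round(years, 1))            # -> 34.9
```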

> 2000 clusters since the server OS was first introduced. Typically
> outages are handled in less than ten minutes on one node with spare
> memory and hard drives.

OK.  but hot-sparing is not OS-specific, so this is irrelevant.

> Outages don't affect the overall cluster; the
> scheduler works around it, and the cluster continues to run. 

sure, a bog-standard failover/HA feature.

> The HPC
> manager at Microsoft recently made available special pricing/package for
> HPC clusters through the OEM channel (such as Dell, HP, IBM, Hitachi)
> that closes the gap in pricing between windows-based and linux-based
> bundles.

since the alternative costs zero, closing the gap is pretty easy ;)

from crunching a few of the numbers on CTC's website, it looks like
their per-node prices are in the >$50k range, compared to something
like $3-5k for the linux-whitebox approach.  to me, it looks like they
simply front-load their costs to make sustaining costs appear low.
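scaled to the 256-processor cluster mentioned above, the gap is stark; a
back-of-the-envelope comparison (the per-node prices are the rough
estimates from the text, not quotes, and the whitebox figure is the
midpoint of the $3-5k range):

```python
# Rough cluster cost comparison using the per-node figures cited above.
# Both prices are ballpark estimates, not vendor quotes.

nodes = 256
ctc_per_node = 50_000        # ">$50k" range inferred from CTC's numbers
whitebox_per_node = 4_000    # midpoint of the $3-5k linux-whitebox range

print(f"CTC-style: ${nodes * ctc_per_node:,}")       # -> $12,800,000
print(f"whitebox:  ${nodes * whitebox_per_node:,}")  # -> $1,024,000
print(f"ratio: {ctc_per_node / whitebox_per_node:.1f}x")  # -> 12.5x
```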
