[Beowulf] Re: Cooling vs HW replacement

Fri Jan 21 12:51:12 PST 2005

> Humans don't live a megahour MTBF.  Disks damn sure don't.

that's an attractive analogy, but I think it misses the fact that
a disk is mostly in a steady state.  yes, there's a modest process
of wear, and even some more exotic things like electromigration.
but humans, by contrast, are always teetering on the edge of failure.
I'm tipping back in my chair right now, courting a broken neck.
I'm about to go out for my 4pm latte, which requires crossing a street.
none of my disks are doing foolish and risky things like this;
most of them are just sitting there, some not even spinning, the rest
only occasionally stirring themselves to glide a head across the platter.
I, at least, think of a seek as about as stressful as taking a breath
(which is not to deny that my breaths and a disk's seeks are both,
eventually, going to come to an end...)

one of my clusters has 96 nodes, each with a commodity disk in it.
at a rated 10^6 hr MTBF, that's 10^6/(24*365.2425) = 114.08 years for each disk,
and 114.08/96 = 1.19 years for the whole cluster.  since the
cluster has good cooling, and the disks aren't used much, I only expect
about one failure every 1.2 years.
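the arithmetic above can be sketched in Python.  this assumes the disks fail independently at a constant rate (exponential lifetimes), so per-disk failure rates simply add and the cluster MTBF is the disk MTBF divided by the disk count; the function names are just illustrative.

```python
# Assumes independent disks with a constant failure rate (exponential lifetimes).
HOURS_PER_YEAR = 24 * 365.2425  # mean Gregorian year, as used in the post


def cluster_mtbf_years(disk_mtbf_hours: float, n_disks: int) -> float:
    """Mean time between failures, in years, for n independent disks.

    With constant failure rates the rates add, so the combined MTBF is the
    single-disk MTBF divided by n.
    """
    return disk_mtbf_hours / n_disks / HOURS_PER_YEAR


def failures_per_year(disk_mtbf_hours: float, n_disks: int) -> float:
    """Expected number of disk failures per year across all n disks."""
    return 1.0 / cluster_mtbf_years(disk_mtbf_hours, n_disks)


print(f"per disk: {cluster_mtbf_years(1e6, 1):.2f} years")       # per disk: 114.08 years
print(f"96 disks: {cluster_mtbf_years(1e6, 96):.2f} years")      # 96 disks: 1.19 years
print(f"failures/year: {failures_per_year(1e6, 96):.2f}")        # failures/year: 0.84
```

at the rated MTBF that works out to roughly 0.84 failures per year for the 96-node cluster; real-world rates depend heavily on temperature and duty cycle.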

we're about to buy a cluster with 1536 nodes; assuming the new machine room
being built for it works out, we should expect about 13 disk failures per year,
or roughly 1 per month.
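the same constant-rate assumption gives the expected number of failures in any period as (disks × hours) / MTBF.  a minimal sketch for the 1536-node case, with a hypothetical helper name:

```python
# Assumes a constant per-disk failure rate; the helper name is hypothetical.
HOURS_PER_YEAR = 24 * 365.2425
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def expected_failures(disk_mtbf_hours: float, n_disks: int, period_hours: float) -> float:
    """Expected disk failures across n disks during a period of the given length."""
    return n_disks * period_hours / disk_mtbf_hours


per_month = expected_failures(1e6, 1536, HOURS_PER_MONTH)
per_year = expected_failures(1e6, 1536, HOURS_PER_YEAR)
print(f"{per_month:.2f} failures/month, {per_year:.1f} failures/year")
# 1.12 failures/month, 13.5 failures/year
```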

fortunately, I favor disk-free booting (PXE, NFS root), so even if we 
have 10x the failure rate, and it takes a week to replace each disk,
we shouldn't have any kind of problem.
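one way to see why a 10x worse failure rate plus a week-long replacement stays harmless is Little's law: the average number of disks awaiting replacement equals the cluster-wide failure rate times the replacement time.  this sketch assumes independent failures and a fixed 7-day turnaround; the function name is illustrative.

```python
# Little's law: average number waiting = arrival rate * time spent waiting.
# Assumes independent failures and a fixed replacement time; names are illustrative.


def expected_disks_awaiting_replacement(disk_mtbf_hours: float, n_disks: int,
                                        repair_hours: float) -> float:
    """Average number of failed disks not yet replaced at any moment."""
    failure_rate = n_disks / disk_mtbf_hours  # cluster-wide failures per hour
    return failure_rate * repair_hours


# 10x the rated failure rate means an effective MTBF of 1e6 / 10 hours.
waiting = expected_disks_awaiting_replacement(1e6 / 10, 1536, 7 * 24)
print(f"{waiting:.2f} of 1536 disks awaiting replacement on average")
# 2.58 of 1536 disks awaiting replacement on average
```

with disk-free booting, those few failed disks need not even take their nodes out of service.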

another new facility will be 200TB of nearline storage.  if we built it
from 147GB SCSI disks rated at 1.4e6 hrs MTBF, I'd expect to go 1022 hrs between failures.
I'd prefer to use 500GB SATA disks, even if they're only rated at 1e6 hrs: since only
400 of them are needed, that will let me go 2500 hours between failures (not to mention
saving around 5 kW of power!)
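the storage comparison boils down to: fewer, larger disks means fewer things to fail.  a minimal sketch, assuming decimal units, no spares or RAID overhead, and independent constant-rate failures; the helper name is hypothetical.  under these assumptions the SCSI case lands near 1029 hrs rather than exactly 1022, since the true disk count depends on formatted capacity and spares, which aren't specified here.

```python
import math

# Assumes decimal units, no spares/RAID overhead, and independent disks
# with constant failure rates. The helper name is hypothetical.
TB, GB = 10**12, 10**9


def storage_mtbf_hours(capacity_bytes: int, disk_bytes: int,
                       disk_mtbf_hours: float) -> tuple[int, float]:
    """Return (disk count, hours between failures) for the smallest array
    that holds capacity_bytes."""
    n_disks = math.ceil(capacity_bytes / disk_bytes)
    return n_disks, disk_mtbf_hours / n_disks


for label, size, mtbf in [("147 GB SCSI", 147 * GB, 1.4e6),
                          ("500 GB SATA", 500 * GB, 1e6)]:
    n, hours = storage_mtbf_hours(200 * TB, size, mtbf)
    print(f"{label}: {n} disks, {hours:.0f} hrs between failures")
```

the SATA option wins despite the lower per-disk rating because the disk count drops by more than a factor of three.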

regards, mark hahn.