[Beowulf] Re: failure trends in a large disk drive population

Fri Feb 16 14:15:49 PST 2007

At 12:50 PM 2/16/2007, David Mathog wrote:
>Eugen Leitl <eugen at leitl.org> wrote:
>
> > http://labs.google.com/papers/disk_failures.pdf
>
>Interesting.  However google apparently uses:
>
>   serial and parallel ATA consumer-grade hard disk drives,
>   ranging in speed from 5400 to 7200 rpm
>
>Not quite clear what they meant by "consumer-grade", but I'm assuming
>that it's the cheapest disk in that manufacturer's line.  I don't
>typically buy those kinds of disks, as they have only a 1 year
>warranty but rather purchase those with 5 year warranties.

But this is potentially a very interesting trade-off, and one right 
in line with the Beowulf concept of leveraging cheap consumer gear...

Say you need 100 widgets' worth of horsepower.  Are you better off 
buying 103 pro widgets at $500 with a 3% failure rate, or 110 consumer 
widgets at $450 with a 10% failure rate?  That's $51.5K vs. $49.5K: the 
cheap drives win.  And, in fact, if the failures are spread evenly 
through the year (not a valid assumption in general, but easy to 
calculate on the back of an envelope), then you actually get more 
compute power with the cheap drives (roughly 105 average vs. 101.5 
average over the year).
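
The envelope arithmetic above can be checked with a short Python sketch. 
The function name and its parameters are hypothetical, chosen only for 
illustration.  It assumes failures are spread evenly over the year and 
that failed units are not replaced, and it applies the failure rate to 
the number of units bought, so it gives about 104.5 and 101.5 rather 
than the rounded 105 and 101.5 quoted above.

```python
def fleet_cost_and_capacity(units, unit_price, annual_failure_rate):
    """Return (purchase cost, units alive at year end, average units alive).

    Assumes failures are spread evenly through the year and failed units
    are never replaced, so the live count falls linearly.
    """
    cost = units * unit_price
    failed = units * annual_failure_rate
    end_of_year = units - failed
    # Linear decline: the yearly average is the midpoint of start and end.
    average = (units + end_of_year) / 2
    return cost, end_of_year, average


pro = fleet_cost_and_capacity(103, 500, 0.03)
consumer = fleet_cost_and_capacity(110, 450, 0.10)

print("pro:      cost $%d, average %.1f units" % (pro[0], pro[2]))
print("consumer: cost $%d, average %.1f units" % (consumer[0], consumer[2]))
```

Note that under this stricter reading the consumer fleet ends the year 
at about 99 working units, just short of the 100 needed, so the margin 
is thinner than the rounded numbers suggest.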

This also assumes that the failure rate is "small" and that failures 
are "independent" (that is, you don't wind up with a bad batch that all 
fail simultaneously from some systemic flaw, which is the bane of any 
reliability calculation).

One failing I see in many cluster applications is that they are quite 
brittle: they depend on a particular number of processors toiling on 
the task, and on the complement of processors not changing during the 
"run".  But that makes a 100-node cluster no different from depending 
on a single supercomputer that is 100 times as fast.

I think it's pretty obvious that Google has figured out how to 
partition their workload in a "can use any number of processors" sort 
of way, in which case they probably should be buying the cheap 
drives and just letting them fail (and stay failed; it's probably 
cheaper to replace the whole node than to try to service one).
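
As a rough illustration of the "can use any number of processors" idea, 
here is a minimal Python sketch of a shared work queue in which a task 
held by a dying worker simply goes back on the queue.  Everything in it 
is hypothetical: run_elastic, the fail_at schedule, and the toy 
"square the number" workload are stand-ins, and this says nothing about 
how Google actually does it.

```python
from collections import deque


def run_elastic(tasks, n_workers, fail_at=None):
    """Finish every task using whichever workers are still alive.

    fail_at maps a round number to the id of the worker that dies in
    that round (a deterministic stand-in for random hardware failure).
    Returns (sorted results, number of surviving workers).
    """
    fail_at = fail_at or {}
    pending = deque(tasks)
    alive = set(range(n_workers))
    done = []
    round_no = 0
    while pending:
        if not alive:
            raise RuntimeError("all workers have failed")
        dead = fail_at.get(round_no)
        for worker in sorted(alive):
            if not pending:
                break
            task = pending.popleft()
            if worker == dead:
                pending.append(task)      # lost work goes back on the queue
            else:
                done.append(task * task)  # toy workload: square the input
        alive.discard(dead)               # the node stays failed
        round_no += 1
    return sorted(done), len(alive)


# Four workers, two of which die mid-run; the job still completes.
results, survivors = run_elastic(range(10), 4, fail_at={0: 1, 2: 3})
print(results, survivors)
```

The point of the design is that nothing in the job depends on the 
worker count: losing a node costs throughput, not correctness.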

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875