[Beowulf] Re: failure trends in a large disk drive population

David Mathog mathog at caltech.edu
Fri Feb 16 12:50:49 PST 2007


Eugen Leitl <eugen at leitl.org> wrote:

> http://labs.google.com/papers/disk_failures.pdf

Interesting. However, Google apparently uses:

  serial and parallel ATA consumer-grade hard disk drives,
  ranging in speed from 5400 to 7200 rpm

It's not quite clear what they meant by "consumer-grade", but I'm assuming
it's the cheapest disk in each manufacturer's line. I don't
typically buy those kinds of disks, since they carry only a 1-year
warranty; instead I buy drives with 5-year warranties, even
for workstations.

So I'm not too sure how useful their data is. I think everyone here
would have agreed, even without the study, that a disk reallocating blocks and
throwing scan errors is on the way out. The lack of a temperature
correlation is quite surprising, though. At the very least I would
have expected higher temperatures to lead to faster loss of bearing
lubricant. That tends to show up as a disk that has spun for 3 years
failing to restart after being powered off for half an hour.
Presumably you've all seen that. If they have excellent power and systems
management at their data centers, the systems may not have been
down long enough for this to be observed.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


