[Beowulf] real hard drive failures
kinghorn at pqs-chem.com
Tue Jan 25 08:16:33 PST 2005
I'm only partially interested in the thread "Cooling vs HW replacement" but
the problem with drive failures is a real pain for me. So, I thought I'd
share some of my experience.
I do clusters for computational chemistry, and every node has two drives
RAID-striped for scratch, since some comp chem procedures require huge amounts
of scratch space. Our older systems were typical rack mounts, but over the last
year and a half we have used a custom chassis with better cooling ...
We have used mostly Western Digital (WD) drives for > 4 years. We use the
higher rpm and larger cache varieties ...
We also used IBM 60GB drives for a while, and some of you will have experienced
that mess ... approx. 80% failure over a 1-year time frame!
Observations on WD drive failures (estimates):
WD 20, 40, 60 GB drives in the field for 3+ years, [~600 drives] very few
(<1%) failures; most machines have been retired.
WD 80GB drives in the field for 1+ years, [~500 drives] "ARRRRGGGG!" ~15%
failure and increasing. I send out 3-5 replacement drives every month.
WD 120 and 200GB SATA in the field <1 year, [~400 drives] one failure so far.
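As a rough cross-check of the 80GB figures above, the monthly replacement count can be turned into a yearly rate. This is only a sketch: the drive count and the 3-5 replacements per month are the estimates quoted above, while the function name and the assumption of a steady monthly rate over a fixed population are illustrative and not part of the original data.

```python
def annualized_failure_rate(replacements_per_month, drives_in_field):
    """Fraction of the population replaced per year.

    Assumes (hypothetically) a steady monthly replacement rate and a
    constant number of drives in the field.
    """
    return 12 * replacements_per_month / drives_in_field


# Estimates from the post: ~500 WD 80GB drives, 3-5 replacements per month.
low = annualized_failure_rate(3, 500)
high = annualized_failure_rate(5, 500)
print(f"{low:.1%} to {high:.1%} per year")
```

At that pace, a bit over a year in the field lands in the same range as the ~15% cumulative estimate.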
I'm moving to a 3-drive RAID5 setup on each node (drives are cheap, down time
is not) and considering changing to Seagate SATA drives. Anyone care to offer
opinions or more anecdotes? :-)
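To put a rough number on why moving from two striped drives to three drives in RAID5 helps with down time, here is a minimal sketch. It assumes independent drive failures, a single per-drive failure probability p over the period of interest, and no drive replacement during that period (which is pessimistic for RAID5). The function names and the choice of p = 0.15 (borrowed from the ~15% estimate for the 80GB drives) are illustrative assumptions, not measurements.

```python
from math import comb


def p_at_least_k_failures(n, k, p):
    """Binomial probability that at least k of n independent drives fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def raid0_loss(n, p):
    # A striped set is lost as soon as any one drive fails.
    return p_at_least_k_failures(n, 1, p)


def raid5_loss(n, p):
    # RAID5 survives one failure; it is lost once two or more drives fail.
    return p_at_least_k_failures(n, 2, p)


p = 0.15  # assumed per-drive failure probability over the period
r0 = raid0_loss(2, p)
r5 = raid5_loss(3, p)
print(f"2-drive stripe: {r0:.4f}, 3-drive RAID5: {r5:.4f}")
```

In practice a failed RAID5 member gets replaced and rebuilt, so the real exposure is the rebuild window rather than the whole period; the sketch only shows the direction of the effect.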
Best wishes to all
Dr. Donald B. Kinghorn
Parallel Quantum Solutions LLC