hard disk reliability
Bob Cat
ClusterHack@snet.net
Sat, 5 Jun 1999 04:44:59 -0400
> > 5) I think we'll find that power supply and CPU fans are the most common
> > failure points.
>
> But if you buy good ones, I have proof that they rarely fail. As I
> said before, out of my 290 machines (1/3 2 years old, most of the rest
> 4 months old), I haven't had a single power supply or cpu fan failure.
I have seen Micron, HP, and other "name-brand PC" fans and supplies fail.
Out of 3 Microns installed, I had 1 bad fan and 2 bad sectors on 1 HD. I submit
that personal experience is not a sufficient indicator of reliability.
(I would still recommend Micron)
Perhaps it was because they were in an office environment, not a nice clean lab?
I'm curious about the MTBFs for fans and power supplies - are they greater than those for disks?
Have you looked inside one of the "commodity" supplies? Ecchh!
BTW, dying supplies often take other components with them.
> > service multiple nodes.
> Where's your proof? I dislike multi-node anything because it reduces
> reliability.
If a single node fails while working on a sufficiently fine-grained parallel problem,
won't that stop the entire run anyway? Coarse-grained is different, of course.
But:
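If I have X components, and any one failing stops the system, I divide the
per-component MTBF by X to get the system MTBF.
Halve the components and you double the system MTBF (assuming the same
per-component MTBF and independent failures).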
I *said* statistics are fun.
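
For anyone who wants to play with the numbers, here is a rough Python sketch of
the divide-by-X arithmetic. It assumes identical components that fail
independently at a constant rate, and a system that stops when any one of them
fails. The function names and the 50,000-hour fan MTBF are made up for
illustration; they are not measured or vendor figures.

```python
import math


def series_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a system that stops when any one of n identical parts fails.

    Assumes independent parts with constant failure rates, so the rates add:
    system_rate = n * (1 / component_mtbf).
    """
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    return component_mtbf_hours / n_components


def survival_probability(mtbf_hours: float, run_hours: float) -> float:
    """Chance of finishing a run of run_hours with no failure (exponential model)."""
    return math.exp(-run_hours / mtbf_hours)


if __name__ == "__main__":
    fan_mtbf = 50_000.0  # hypothetical value, for illustration only
    for n in (64, 32):
        system = series_mtbf(fan_mtbf, n)
        print(f"{n} fans: system MTBF {system:.1f} h, "
              f"P(24 h run survives) = {survival_probability(system, 24.0):.4f}")
```

Under this model, halving the fan count doubles the system MTBF exactly. The
chance of surviving a given run improves too, but not by a factor of two.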
:ßobÇat.Bat 1.0 >^^< In base(one half) an infinite number approaches unity.
Echo f b800:0000 fff 32 00 e1 09 6f 0f 62 0f 80 04 61 0f 74 0f 32 00 > Bob.Cat
Echo q >> Bob.Cat
DeBug < Bob.Cat > Nul
@Erase Bob.Cat > Nul