[Beowulf] Cooling vs HW replacement
alvin at Mail.Linux-Consulting.com
Tue Jan 18 13:26:23 PST 2005
hi ya luc
On Tue, 18 Jan 2005, Luc Vereecken wrote:
> The first summer I had a failure rate of over 60%. Some motherboards
the normal failure rate is, say, 5% or so in the first 30 days or the first year..
- if you lose many more systems than that, then it's a vendor parts
problem ( wherever you or they get the parts to build systems )
> failed, plenty of powersupplies failed, I had 10 brandnew disks that ran so
> hot at times i couldn't put my hand on them at these ambient temperatures.
> 5 of them failed in the first 6 months, the other 5 a few months later.
the disks should be cool to the touch ... say no more than 30C
for the operating temp ( hddtemp seems to be a good measure )
- silly things like a $3 or $15 fan will keep a disk from
failing, and use 2 of 'em to avoid a single-fan-failure problem
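
a rough sketch of that temperature check in python, assuming hddtemp's
usual one-line-per-disk output ( "/dev/sda: MODEL: 35°C" ). the 30C limit
is just the rule of thumb above, and the function names and sample lines
are made up for illustration, not taken from any real box

```python
import re

# Rule-of-thumb limit from the text above, in degrees C (not a vendor spec).
LIMIT_C = 30

# Assumed hddtemp line format: "/dev/sda: SOME MODEL: 35°C".
# Lines without a numeric temperature (e.g. a sleeping drive) do not match.
LINE_RE = re.compile(
    r"^(?P<dev>/dev/\S+):\s*(?P<model>.*):\s*(?P<temp>-?\d+)\s*°?\s*C\s*$"
)


def parse_hddtemp(output):
    """Return {device: temp_c} for every line that carries a temperature."""
    temps = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            temps[match.group("dev")] = int(match.group("temp"))
    return temps


def hot_disks(temps, limit_c=LIMIT_C):
    """Return the sorted list of devices running above limit_c."""
    return sorted(dev for dev, temp in temps.items() if temp > limit_c)


if __name__ == "__main__":
    # Hypothetical sample; in practice, feed in the captured output of
    # something like `hddtemp /dev/sd?`.
    sample = (
        "/dev/sda: ST380011A: 28°C\n"
        "/dev/sdb: WDC WD800JB: 41°C\n"
        "/dev/sdc: WDC WD800JB: drive is sleeping\n"
    )
    for dev in hot_disks(parse_hddtemp(sample)):
        print(dev, "is running hot, check its fan")
```

run it from cron and mail yourself the list, so a dead fan gets noticed
before the disk behind it cooks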
yup.. after an AC failure, lots of disks will die within 2-3 months
if some died during the AC failure