[Beowulf] Cooling vs HW replacement

Alvin Oga alvin at Mail.Linux-Consulting.com
Tue Jan 18 13:26:23 PST 2005

hi ya luc

On Tue, 18 Jan 2005, Luc Vereecken wrote:

> The first summer I had a failure rate of over 60%. Some motherboards 

the normal failure rate is, say, 5% or so for the first 30 days or the first year..
	- if you lose many more systems than that, then it's a vendor parts
	problem ( where you or they get their parts to build systems )

> failed, plenty of powersupplies failed, I had 10 brandnew disks that ran so 
> hot at times i couldn't put my hand on them at these ambient temperatures. 
> 5 of them failed in the first 6 months, the other 5 a few months later.

the disks should be cool to the touch ... say no more than 30C
for operating temp ( hddtemp seems to be a good measure )
	- silly things like a $3 or $15 fan will keep a disk from
	failing, and use 2 of 'em to avoid the single fan failure problem
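
a rough sketch of how one might flag hot disks from hddtemp readings.
it assumes each output line looks like "/dev/hda: MODEL: 41°C" ( check
what your hddtemp version actually prints ), and the function names and
the 30C default limit are just illustrative, taken from the rule of
thumb above

```python
import re

# Assumed line format (verify against your hddtemp version):
#   /dev/hda: ST380011A: 41°C
LINE_RE = re.compile(
    r"^(?P<dev>/dev/\S+):\s*(?P<model>.*):\s*(?P<temp>\d+)\s*°?\s*C\s*$"
)

def parse_hddtemp(text):
    """Return {device: temperature_in_C} for every line that matches.

    Lines without a numeric temperature (e.g. a sleeping drive) are skipped.
    """
    readings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            readings[m.group("dev")] = int(m.group("temp"))
    return readings

def hot_disks(readings, limit_c=30):
    """Return the sorted list of devices running above limit_c."""
    return sorted(dev for dev, temp in readings.items() if temp > limit_c)

if __name__ == "__main__":
    # Hypothetical sample text; in practice this would be captured hddtemp output.
    sample = "/dev/hda: ST380011A: 41°C\n/dev/hdb: WDC WD800JB: 29°C\n"
    for dev in hot_disks(parse_hddtemp(sample)):
        print(dev, "is running hot")
```

feed it the captured output from cron or a monitoring script and send
mail when the list is not empty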

yup.. after an AC failure, lots of disks will die within 2-3 months
if some died during the AC failure

c ya
