[Beowulf] real hard drive failures

Mark Hahn hahn at physics.mcmaster.ca
Tue Jan 25 14:26:36 PST 2005

> > I'm only partially interested in the thread "Cooling vs HW replacement" but 
> > the problem with drive failures is a real pain for me. So, I thought I'd 
> > share some of my experience.
> i'd add 1 or 2 cooling fans per ide disk, esp if it's 7200rpm or 10,000 rpm
> disks 

I'm pretty dubious of this: adding two 50Khour moving parts to 
improve the airflow around a 1Mhour moving part which only dissipates
10W in the first place?  designing the chassis for proper airflow 
with minimum fanage is obviously smarter and probably safer.
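
A back-of-the-envelope sketch of the arithmetic behind that objection. It assumes the 50Khour and 1Mhour figures are MTBF values, and that failures are independent with constant rates (a simple exponential model); both are assumptions, not something the post states. It only counts how often some part in the assembly fails, and says nothing about whether a dead fan then harms the disk.

```python
# Rough reliability arithmetic for "disk alone" vs "disk plus two fans".
# Assumptions (not from the post): the quoted figures are MTBFs, failures
# are independent, and failure rates are constant (exponential model).

HOURS_PER_YEAR = 24 * 365

def failure_rate(mtbf_hours):
    """Failures per hour for a part with the given MTBF."""
    return 1.0 / mtbf_hours

def combined_mtbf(mtbf_list):
    """MTBF until the first failure of any part: failure rates add up."""
    return 1.0 / sum(failure_rate(m) for m in mtbf_list)

disk_mtbf = 1_000_000   # "1Mhour" disk, figure from the post
fan_mtbf = 50_000       # "50Khour" fan, figure from the post

disk_only = combined_mtbf([disk_mtbf])
disk_plus_fans = combined_mtbf([disk_mtbf, fan_mtbf, fan_mtbf])

print(f"disk alone:    {disk_only:,.0f} hours "
      f"({disk_only / HOURS_PER_YEAR:.1f} years)")
print(f"disk + 2 fans: {disk_plus_fans:,.0f} hours "
      f"({disk_plus_fans / HOURS_PER_YEAR:.1f} years)")
```

Under these assumptions the assembly's time to first failure drops from 1Mhour to roughly 24Khours, almost entirely because of the fans.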

> 	- if downtime is important, and should be avoidable, then raid
> 	is the worst thing, since it's 4x slower to bring back up than
> 	a single disk failure

eh?  you have a raid which is not operational while rebuilding?

> 	- raid will NOT prevent your downtime, as that raid box
> 	will have to be shut down sooner or later 
> 	( shutting down sooner ( asap )  prevents data loss )

huh?  hotspares+hotplug=zero downtime.

but yes, treating whole servers as your hotspare+hotplug element is 
a nice optimization, since hotplug ethernet is pretty cheap vs 
$50 hotplug caddies for each and every disk ;)

