[Beowulf] 32 nodes cluster price

Bill Rankin wrankin at ee.duke.edu
Sun Oct 7 08:47:59 PDT 2007


On Oct 5, 2007, at 4:17 PM, Leif Nixon wrote:

> "Geoff Galitz" <geoff at galitz.org> writes:
>
>> Why do you automatically distrust hardware raid?
>
> To some extent I share Mark's sentiment. I certainly trust the
> Linux kernel more than the firmware in a cheap raid controller.

Let me offer a concrete example of a problem with hardware
RAID.

A local group around here kept some Very Important Data on a hardware
RAID array.  For several reasons, some of that data had never been
backed up.  The array lost a drive and started an automagic rebuild
onto one of the hot spares.  The sudden beating that the surviving
drives took during the rebuild caused a second drive to fail.  That is
always a concern with RAID5, because single parity can reconstruct only
one missing drive.
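To make the RAID5 point concrete, here is a toy sketch of single-parity reconstruction. It assumes a simplified stripe of equal-length data blocks plus one XOR parity block. Real controllers rotate parity across drives and use their own on-disk formats, and the function names here are hypothetical. One missing block can be rebuilt from the survivors. Two missing blocks cannot.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID5-style parity, simplified)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe):
    """Reconstruct the single missing member (None) of a hypothetical stripe.

    `stripe` holds the data blocks followed by the parity block.  XOR of all
    surviving members yields the missing one, but only if exactly one is
    missing: single parity carries one equation, so two unknowns are lost.
    """
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members lost; single parity can rebuild only one")
    if not missing:
        return list(stripe)
    survivors = [blk for blk in stripe if blk is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    # One failed drive: the rebuild recovers the lost block.
    print(rebuild([data[0], None, data[2], parity])[1] == data[1])
    # A second failure during the rebuild: nothing can be recovered from parity.
    try:
        rebuild([data[0], None, None, parity])
    except ValueError as err:
        print(err)
```

Note that every rebuild has to read every surviving drive in full, which is where the "sudden beating" on the remaining disks comes from.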

Since the data was not fully backed up, the drives were sent out for
a Very Expensive Recovery.  Most of the data was recovered, but once
the drives were reinstalled in the enclosure, the hardware RAID
controller could not be made to understand that all the drives were
now okay.  It had essentially got itself into an unrecoverable state
that could not be changed by us mere mortals (since data formats and
such on hardware RAID tend to be proprietary).  So the entire array had
to be sent out for another, Even More Expensive Recovery to get the
data back.

Now, while this is something of a "perfect storm" in terms of hardware
and data failure, it does illustrate how much control you give up
when going with a hardware RAID solution.  I think that the higher-end
vendors (i.e. NetApp, EMC, et al.) have their reliability up to the
point where this is much less of a risk.  But for the low-end,
beer-budget cluster, software RAID is probably still the way to go.
As for the "mid-tier" vendors, I would be very cautious and pay close
attention to the worst-case data loss scenario.

Good luck,

-bill
