[Beowulf] 32 nodes cluster price
wrankin at ee.duke.edu
Sun Oct 7 08:47:59 PDT 2007
On Oct 5, 2007, at 4:17 PM, Leif Nixon wrote:
> "Geoff Galitz" <geoff at galitz.org> writes:
>> Why do you automatically distrust hardware raid?
> To some extent I share Mark's sentiment. I certainly trust the
> Linux kernel more than the firmware in a cheap raid controller.
Let me offer up a somewhat concrete example of a problem with
hardware raid.
A local group around here kept some Very Important Data on a hardware
raid array. Due to several factors, a backup was not made of certain
data. The device lost a drive and started an automagic rebuild on
one of the hot spares. The sudden beating that the other drives took
(because of the rebuild) caused a second hard drive to fail (always a
concern with RAID5).
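
For anyone wondering why that second failure is fatal: a RAID5 stripe
carries a single parity block, so it can only ever solve for one missing
block. Below is a minimal sketch in Python, assuming plain XOR parity over
one stripe. Real arrays rotate parity across the drives and work on much
larger blocks, and the function names and sample data here are made up for
illustration, not taken from any controller or from the md driver.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Rebuild the single missing block (marked None) in a stripe.

    The stripe holds the data blocks plus one parity block. Raises
    ValueError unless exactly one block is missing, because one parity
    block holds only enough information to recover one lost block.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks missing")
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    # XOR of all surviving blocks (data and parity) equals the lost block.
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Hypothetical stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

# One drive lost: its block is recovered from the survivors.
one_lost = [stripe[0], None, stripe[2], stripe[3]]
assert rebuild_missing(one_lost)[1] == b"BBBB"

# A second drive lost during the rebuild: nothing left to solve for.
two_lost = [stripe[0], None, None, stripe[3]]
try:
    rebuild_missing(two_lost)
except ValueError as err:
    print(err)
```

The rebuild has to read every surviving drive end to end, which is the
"sudden beating" described above.
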
Since the data was not fully backed up, the drives were sent out for
a Very Expensive Recovery. Most of the data was recovered but once
the drives were reinstalled in the enclosure, the hardware raid could
not be made to understand that all the drives were now okay. It
essentially got itself into an unrecoverable state that could not be
changed by us mere mortals (since data formats and such on hardware
raid tend to be proprietary). So the entire array had to be sent out
for another Even More Expensive Recovery to get the data back.
Now while this is kind of a "perfect storm" in terms of hardware and
data failure, it does illustrate the extent of control that you give
up when going with a hardware raid solution. I think that the higher
end vendors (e.g. NetApp, EMC, et al.) have their reliability up to the
point where this is much less of a risk. But for the low-end beer
budget cluster, software raid is probably still the way to go. As
for the "mid-tier" vendors, I would be very cautious and pay close
attention to the worst case data loss scenario.