[Beowulf] 32 nodes cluster price

Geoff Galitz geoff at galitz.org
Mon Oct 8 00:38:41 PDT 2007

I would argue that the situation you describe is the result of that
particular RAID adapter: that particular make and model is simply
inappropriate for the job (no offense).

I have certainly seen plenty of RAID arrays where multiple drives die at
approximately the same time, but I find that usually:

- the drives that died came from the same production batch, and my vendor
replaces them with no questions asked
- the drives that died exceeded their rated lifetime at approximately the
same time

I'm not sure how a software RAID solution would work around that.

It is clear that most of the folks on the list favor software RAID, but I
have worked with both hardware and software RAID, across different OS and
hardware vendors.  RAID setups are no different from any other component:
they should be spec'd carefully for their intended target, including the
disks.

I do favor hardware RAID, myself.  I have never had any unexplained data
corruption or unresolved performance or recovery issues on my watch.  I tend
to favor hardware RAID because I can rely on a level of conformity across an
install base (if needed), more flexible admin tools, better support (I
specifically choose adapters from trusted vendors) and lower admin overhead.
I could certainly choose a cheaper hardware RAID adapter, which would bring
back some of the problems I am trying to avoid... that is where doing the
research comes in.  I am one of those guys who likes to move as many
operations as close to the hardware layer as possible.

Having said that, I am also running software RAID in a medium-scale
environment now (Red Hat Linux and FreeBSD) and it works just fine
(alongside our hardware RAID systems).

I also observe that many vendors will fully populate a RAID-5 array and
create no hot spare (Dell) or only one.  I usually create two hot spares to
give myself the wiggle room to run a medium-large datacenter with a small
staff.  There is no need to rush in if a single disk craps out, and it also
avoids any kind of "rebuild storm."  The data component is so important that
I have never had a problem recommending a cluster with such a
configuration.
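For what it's worth, on the Linux software RAID side the two-hot-spare
layout described above is easy to sketch with mdadm.  This is only an
illustration; the device names (/dev/sdb through /dev/sdi) and array name
(/dev/md0) are hypothetical, so adjust for your own hardware:

```shell
# Sketch: build a 6-disk RAID-5 array with two hot spares using mdadm.
# Device names below are hypothetical examples.
mdadm --create /dev/md0 --level=5 --raid-devices=6 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg \
      --spare-devices=2 /dev/sdh /dev/sdi

# Watch array state (and any rebuild in progress) from userland:
cat /proc/mdstat
```

With two spares in the pool, md will start a rebuild onto the first spare
automatically when a member fails, and the second spare still covers you
until someone gets around to swapping the dead disk.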

Just my two cents.



Let me offer up a somewhat concrete example of a problem with hardware
RAID.

A local group around here kept some Very Important Data on a hardware  
raid array.  Due to several factors, a backup was not made of certain  
data.  The device lost a drive and started an automagic rebuild on  
one of the hot spares.  The sudden beating that the other drives took  
(because of the rebuild) caused a second hard drive to fail (always a  
concern with RAID5).
