[Beowulf] 32 nodes cluster price
bill at cse.ucdavis.edu
Sun Oct 7 12:54:28 PDT 2007
Geoff Galitz wrote:
> Why do you automatically distrust hardware raid?
Because they are low volume parts designed to handle failure modes in very
complicated environments. If you buy a hardware RAID card you very well could
have the only one on the planet with that exact config. Variables include
raid controller, hardware revision of the controller, which drives you have
(and their revision), the motherboard (and its BIOS version), etc.
So when a drive fails in a strange way you might very well have a problem that
nobody else on the planet has had. Additionally you have to gain expertise in
the particular details, quirks, and bugs of that RAID controller.
The higher end RAID setups of course do not let you pick your own drives and
some even change the firmware of the drives so they can guarantee that they
have tested the various failure modes. Of course the even higher end models put
the disks in their own enclosure so they can control 100% of the environment
of the drives including nasty little details like power quality, airflow/temp,
controller, vibration, etc.
I've seen a significant number of quirks in 3ware, StorageWorks, Areca, and
Dell PERC (LSI Logic?) controllers, and their support forums are full of
landmines across the huge variety of options. In one particular case I
bought a 3ware 6800 (the then current high end 3ware) which was advertised as
supporting RAID-5 and ended up losing a filesystem. I called support; they
said upgrade the firmware. Which I did, and lost another filesystem. I
called back; they said oh, try a newer driver... which I did, and lost another
filesystem. They then gave a nervous laugh and said "Yeah, they do that, we
recommend you buy the new 7xxx series; the 6800 wasn't really intended to run
RAID-5." Software RAID worked fine.
Linux software RAID on the other hand is popular, free, robust, and has likely
already encountered whatever strange and wacky behavior your motherboard,
disk revision, or broken hardware can throw at it. There are likely 1000 times
as many software RAIDs out in production as there are of any particular
combination of RAID card, RAID firmware, RAID driver, disk hardware, and disk
firmware.
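As one illustration of how little there is to a software RAID setup, a mirrored md array takes a couple of commands. This is only a sketch: the device names are placeholders for your real partitions, and everything here needs root.

```shell
# Create a two-disk RAID-1 mirror from ordinary partitions.
# /dev/sda1 and /dev/sdb1 are placeholders for your actual devices.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the array state and the initial resync progress.
cat /proc/mdstat
mdadm --detail /dev/md0

# Record the array in mdadm.conf so it assembles at boot.
mdadm --detail --scan >> /etc/mdadm.conf
```

All of the state lives on the member disks themselves (the md superblock), which is what makes the arrays portable between machines.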
Additionally you have to buy TWO hardware raids (so a dead controller doesn't
strand your disks), you often end up with significantly less performance, and
often the following questions are rather hard to answer:
* Can I migrate a RAID to another machine?
* Can I split disks so that different partitions can be in different RAIDs?
* Can I be emailed when the RAID changes state?
* Can I migrate the RAID to larger disks gradually (e.g. 2x 250GB disks
  to 2x 500GB disks without having 4 slots/ports)?
* Can I control RAID rebuild speed?
* Can I enable ECC scrubbing on my own schedule?
* Can I migrate the RAID to completely different hardware to debug if it's a
RAID controller issue?
* Can I grow/shrink RAID volumes as well as the space used per disk?
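For what it's worth, every one of those questions has a one-line answer with Linux md. A sketch of the usual knobs (device names and the email address are placeholders; all of this needs root):

```shell
# Migrate an array to another machine: plug the disks in and
mdadm --assemble --scan

# Get emailed on state changes (degraded, spare activated, etc.):
mdadm --monitor --scan --mail=admin@example.com --daemonise

# Throttle or accelerate a rebuild (KB/s per device):
echo 5000   > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max

# Kick off a scrub (consistency check) on your own schedule,
# e.g. from cron:
echo check > /sys/block/md0/md/sync_action

# After replacing members with bigger disks one at a time,
# grow the array to use the new space:
mdadm --grow /dev/md0 --size=max
```

The same commands work no matter which motherboard, HBA, or disks sit underneath, which is exactly the debugging escape hatch the hardware cards lack.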
Sure, they can be answered, but frankly it takes more time than I'm willing
to invest in the flavor-of-the-month card, firmware, and linux kernel
driver. Especially since it's a small market, there seem to be dramatic
differences in price/performance among those trying to gain market share.
Way back when it was Adaptec, then 3ware was the upstart, then Areca, and now
it seems like Adaptec is making a big push with their newer 16-port-ish cards.
I've found linux software RAID almost always faster than hardware RAID, much
more reliable, and pleasingly consistent. Uptimes on busy servers with a UPS
are often over 500 days, even back when the linux uptime counter still
rolled over. During a disaster I'd much rather debug and troubleshoot a
software RAID than try to find one of the few experts in the world on some
particular hardware configuration.