[Beowulf] hardware RAID versus mdadm versus LVM-striping

Tony Travis a.travis at abdn.ac.uk
Tue Jan 19 13:30:22 PST 2010


Joe Landman wrote:
> [...]
> Not entirely correct.  SATA where the hot swap (bring device in/out) 
> logic is.  And it does (at least in modern kernels) support physical 
> removal/addition of devices.  The MD system itself is event driven.  You 
> can "automate" device removal/insertion into a unit, and rebuild the 
> RAID as needed ... to a degree.  The issue we run into is that 
> occasionally, we have to force a bus scan on the scsi buses to see new 
> SATA drives.  Once that is done, some of our other tools automate the 
> incorporation of the new disk within the RAID.

Hello, Joe.

The "ahci" driver in the 2.6 kernel does not notify the kernel of a 
device change, nor does it flush the kernel buffers. Hot-swapping 
drives using the standard SATA driver is a great way to corrupt your 
disks: all it does on a SATA disconnect is try connecting again, on 
the assumption that the same drive is attached but the data rate is too 
high for the cable. I have practical experience of this problem ;-)

I started off my quest to build a COTS RAID5 believing what you just 
said to be true, but I think there is a popular misconception about 
SATA: it's true that most modern SATA controllers do support hot-swap 
electrically, but to my, albeit limited, knowledge SATA device drivers 
do not notify the kernel that a device has been removed or added. The 
3ware 'twe' 'hardware' RAID driver does, in response to events from the 
RAID controller firmware that is monitoring the physical drives.

I've looked at the SATA driver sources quite carefully because I do want 
to use hot-swap with "md" if that is a *safe* and reliable thing to do. 
However, I am not confident that it is (yet!). Please correct me if I am 
wrong, because it would be very useful to be able to *reliably* hot-swap 
SATA drives on an "md" RAID. I bought a lot of 3ware 8006-2's because I 
don't trust "md" hot-swapping. The 8006-2 is well supported under Linux.
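
For reference, here is a minimal sketch of the manual sequence usually 
described for swapping an "md" member by hand: fail and remove the 
member, detach the disk from the SCSI layer, rescan the host after 
fitting the new drive, and add it back. It only builds and prints the 
commands (a dry run). The array, device and host names are hypothetical 
examples, and it assumes the new disk is already partitioned to match. 
It does not settle whether pulling the drive is safe with a given 
driver.

```python
# Dry-run sketch: returns the steps as strings, does not execute them.
# All device, host and array names passed in are hypothetical examples.

def swap_plan(array, old_member, old_disk, scsi_host, new_member):
    """Return the shell commands for a manual md member swap, in order."""
    return [
        # Mark the member faulty, then detach it from the array.
        f"mdadm --manage {array} --fail {old_member}",
        f"mdadm --manage {array} --remove {old_member}",
        # Ask the SCSI layer to drop the disk before it is pulled.
        f"echo 1 > /sys/block/{old_disk}/device/delete",
        # After fitting the new drive, force a scan of that host.
        f"echo '- - -' > /sys/class/scsi_host/{scsi_host}/scan",
        # Add the new member; md then starts rebuilding onto it.
        f"mdadm --manage {array} --add {new_member}",
    ]


if __name__ == "__main__":
    for step in swap_plan("/dev/md0", "/dev/sdb1", "sdb", "host1", "/dev/sdb1"):
        print(step)
```

Printing the plan rather than running it keeps the sketch harmless on a 
live system; each line would need root and a check of /proc/mdstat 
before moving on to the next.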

> [...]
> In the many RAID cases we have dealt with over the years, we haven't run 
> into this as an issue.  That is, while touted as a real tangible benefit 
> of MD RAID, it is of dubious real value in most of the cases we have 
> encountered.

I've dealt with quite a few cases myself, where we have upgraded 
motherboards (esp. Tyan) with completely different on-board RAID, with 
hit and miss support under Linux. Typically, I've replaced an old or 
faulty motherboard and left everything else as it was. It's because I 
was using "md" RAIDs that this worked. Now I have a great big pile of 
3ware 8006-2's just in case, but I also use the on-board RAID 
controllers in SATA/AHCI mode to construct "md" RAIDs.

I responded to Rahul who started this thread because his requirements 
seemed to be similar to mine: i.e. a small-scale DIY Beowulf cluster. In 
this context, every penny counts and we do not throw things away until 
they are actually dead: Old servers become new compute nodes, and so on. 
I think that a lot of people reading this list are interested in running 
small Beowulf clusters for relatively small projects, like me. I've 
found the Beowulf list to be a mine of useful information, but we are 
not all running huge Beowulf clusters or supporting them commercially.

> Really the benefit is that of being against the change of business 
> conditions for your RAID vendor.  If you plan on keeping the same array 
> active until it dies (4-10 years), this could be a consideration. 
> However, you also have to worry about disk availability/compatibility, 
> etc.  That is, its not *just* a RAID card issue, its a full stack issue.

I agree, and I've been bitten by that: I used 'enterprise' grade disks 
that are no longer available, and ended up replacing faulty 250GB drives 
with 500GB drives just so I could rebuild the RAID after a disk failure. 
I've just repeated the trick, replacing 500GB drives with 1TB. It's OK as 
long as the replacement drive is bigger and you're using LBA, so that 
drive geometry doesn't matter.
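
A quick way to check the size condition before committing a 
replacement is to compare the capacities the kernel reports in sysfs. 
This is a minimal sketch, assuming whole disks are used as members; the 
function names are hypothetical, and for partition members the 
partition's own size file would be the one to compare.

```python
from pathlib import Path

# /sys/block/<dev>/size is reported in 512-byte units,
# whatever the drive's physical sector size.
SECTOR_BYTES = 512


def disk_bytes(dev, sys_block="/sys/block"):
    """Return the capacity of a block device in bytes, read from sysfs."""
    sectors = int(Path(sys_block, dev, "size").read_text().strip())
    return sectors * SECTOR_BYTES


def big_enough(old_dev, new_dev, sys_block="/sys/block"):
    """True if new_dev is at least as large as old_dev."""
    return disk_bytes(new_dev, sys_block) >= disk_bytes(old_dev, sys_block)
```

For example, big_enough("sda", "sdb") compares two disks on the running 
system; the sys_block argument is only there so the check can be pointed 
at a different directory.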

> MD allows you to reduce the risk in various portions of this stack.

Indeed it does, but I think it would be better with reliable hot-swap!

Bye,

   Tony.
-- 
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt


