[Beowulf] Software Raid
Robert G. Brown
rgb at phy.duke.edu
Tue Dec 13 15:57:42 PST 2005
On Mon, 12 Dec 2005, Paul wrote:
> I read in a post somewhere that it was not possible to use a Linux software
> RAID configuration for shared file storage in a cluster. I know that it is
> possible to use software RAID on individual compute nodes but the post stated
> that software RAID would not properly support simultaneous accesses
> on a file server. Is this true?
No, this is not true. Plenty of Linux people use software RAID NFS
servers to manage entire departments and shared project space across a
cluster. It works, and -- up to a point -- it works quite well. It CAN
require some tuning (e.g. a large enough number of nfsd's) and of course
is subject to various physical and electronic limitations in performance
-- on a slow network it won't suddenly get fast, it has to cope with
disk seeks and latency, and it is probably a good idea not to use your
NFS server as a compute engine at the same time or run it with marginal
memory. Still, for many many purposes this is an inexpensive workhorse
configuration. I use it myself at home, we use it at Duke on a
department-wide basis (probably on the order of 200 clients) with complete
satisfaction. I don't know how it would scale to 1000 clients, and I
don't know how it would perform if it were being hammered with a ton of
small packet traffic or with simultaneous accesses of large files that
vastly oversubscribed its resources. YMMV. The usual.
> Assuming that hardware RAID is required (or at least preferable) I was
> wondering if the built in RAID on some motherboards would be adequate or do
> we need to look into a dedicated piece of hardware. We will have about 10 -
> 12 cpus initially that will be connected with giganet network. We currently
> have about a terabyte of storage space and are planning to mount it using
> NFS in a RAID 5 configuration. Our applications for now will be database
> intensive bioinformatics apps. I would be very interested in any comments.
This is a delicate question. I personally don't think that hardware
RAID is called for or necessary in this configuration and would urge you
to try an md raid solution first. A TB soft RAID is pretty easy to
build this year -- a stack of 4-6 SATA or EIDE drives at ~$0.50/GB, a
box, a suitable case (with e.g. dual power and/or hot swap as desired
and supported by the hardware). I'm guesstimating less than $2K for a
solid configuration, less than $2.5K for a gold-plated configuration
(and as low as $1200 in a lowball homemade COTS configuration).
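The drive arithmetic behind that estimate can be sketched as follows. This is only a rough illustration: it assumes RAID 5 (one drive's worth of space goes to parity), a price of roughly $0.50 per GB, and a hypothetical 250 GB drive size. It covers the disks only, not the box, case, or power supplies.

```python
def raid5_usable_gb(num_drives, drive_gb):
    """Usable RAID 5 capacity: one drive's worth of space holds parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


def drive_cost(num_drives, drive_gb, dollars_per_gb=0.50):
    """Cost of the disks alone at an assumed price per GB."""
    return num_drives * drive_gb * dollars_per_gb


DRIVE_GB = 250  # hypothetical drive size, not from the original post

for n in (4, 5, 6):
    print(n, "drives:", raid5_usable_gb(n, DRIVE_GB), "GB usable,",
          "$%.0f in disks" % drive_cost(n, DRIVE_GB))
```

Under these assumptions five drives give about a TB of usable space, which leaves most of the budget for the rest of the server.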
In the past we've been less than thrilled by so-called RAID cards -- in
many cases they underperformed md-raid. Sometimes by a lot.
You still have various interface questions to settle, mind you, and
should proceed cautiously. However, another nice thing about a soft
RAID is that you can buy the components, build one for testing, and if
it doesn't work well enough for your application at the desired scale
you can just use the components in something else -- e.g. compute nodes.
I'm not familiar enough with bioinformatics apps to know if this is
reasonable for your case, but on this list somebody will know -- maybe
Joe, for example.
> Paul Mc Kenna
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu