[Beowulf] how cluster's storage can be flexible/expandable?

Duke Nguyen duke.lists at gmx.com
Sun Nov 11 19:50:36 PST 2012


On 11/9/12 7:26 PM, Bogdan Costescu wrote:
> On Fri, Nov 9, 2012 at 7:19 AM, Christopher Samuel
> <samuel at unimelb.edu.au> wrote:
>> So JBODs with LVM on top and XFS on top of that could be resized on
>> the fly.  You can do the same with ext[34] as well (from memory).
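Christopher's LVM-plus-XFS suggestion comes down to a short sequence of commands. The Python sketch below only assembles that sequence and runs nothing. It assumes the filesystem already sits on an LVM logical volume, and the disk /dev/sdx, volume group datavg, logical volume scratch and mount point /scratch are hypothetical names:

```python
import shlex
from typing import List


def lvm_grow_plan(new_disk: str, vg: str, lv: str, mountpoint: str,
                  fs_type: str = "xfs") -> List[List[str]]:
    """Return, in order, the commands that add a disk to an LVM volume group
    and grow a mounted filesystem onto it. Nothing is executed here."""
    lv_path = f"/dev/{vg}/{lv}"
    plan = [
        ["pvcreate", new_disk],                    # label the new disk as an LVM physical volume
        ["vgextend", vg, new_disk],                # add it to the existing volume group
        ["lvextend", "-l", "+100%FREE", lv_path],  # hand all new free extents to the logical volume
    ]
    if fs_type == "xfs":
        # XFS is grown while mounted, so xfs_growfs is given the mount point
        plan.append(["xfs_growfs", mountpoint])
    elif fs_type in ("ext3", "ext4"):
        # ext3/ext4 can be grown online (kernel permitting); resize2fs takes the device
        plan.append(["resize2fs", lv_path])
    else:
        raise ValueError(f"no online-grow recipe for {fs_type!r}")
    return plan


if __name__ == "__main__":
    # hypothetical names; print the plan for review instead of running it
    for cmd in lvm_grow_plan("/dev/sdx", "datavg", "scratch", "/scratch"):
        print(shlex.join(cmd))
```

Printing the plan rather than executing it is deliberate. Growing storage is usually done by hand, one step at a time. Note also that XFS can be grown but not shrunk.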

We also thought of using LVM on top of RAID disks, but never thought of
XFS. Why would we need XFS, and how does it compare with GPFS?

> It also works with hardware external RAID systems, I've done it ~5
> years ago - the key is firmware support in the RAID system. Swapped
> disks one by one, allowing one to be fully rebuilt before the next one
> is changed; here it helps if the firmware allows one disk to be a
> perfect copy of another, otherwise you just treat it as a failed disk
> which needs to be reconstructed. Once each larger disk is in, the
> volume is enlarged on the fly; the firmware will do a rebuild using
> (hopefully :)) only the disk areas which were not previously used. So
> far it is all done on the RAID system, the host computer doesn't know
> anything about it. Afterwards, the kernel needs to be informed that
> the volume has grown; IIRC this has required a rescan of that
> particular SCSI target. And finally the FS (I used ext3 at the time)
> needs to be enlarged (using resize2fs). All without unmounting, users
> noticed only that the FS suddenly became larger :)
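On a current Linux host, the "rescan that particular SCSI target" step can be done through sysfs. The following is a minimal sketch. It assumes the expanded RAID volume shows up as a SCSI disk such as sdb (a hypothetical name) and that the script runs as root. It writes to the device's rescan file and then reads the new size back:

```python
from pathlib import Path


def rescan_block_device(dev_name: str, sysfs_root: str = "/sys") -> Path:
    """Ask the kernel to re-read the capacity of an existing SCSI disk,
    equivalent to `echo 1 > /sys/block/<dev>/device/rescan`."""
    rescan = Path(sysfs_root) / "block" / dev_name / "device" / "rescan"
    if not rescan.exists():
        raise FileNotFoundError(f"{rescan} not found; is {dev_name} a SCSI disk?")
    rescan.write_text("1\n")
    return rescan


def read_size_bytes(dev_name: str, sysfs_root: str = "/sys") -> int:
    # /sys/block/<dev>/size is always reported in 512-byte sectors,
    # whatever the physical sector size of the disk
    sectors = int((Path(sysfs_root) / "block" / dev_name / "size").read_text())
    return sectors * 512
```

Once the reported size has grown, resize2fs can enlarge the filesystem as Bogdan describes, but only if the filesystem sits directly on the whole device. If a partition or an LVM physical volume lies in between, that layer has to be grown first.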

The RAID software (MegaRAID) also states that new disks can be added on
the fly, but I have no idea whether the added capacity can then be
formatted and used together with the existing storage. Everything is
still very new to me.

>
> The access will be slowed down throughout the whole process, as data
> needs to be copied between disks (during disk swapping phase), RAID
> volume reconstructed (during volume expansion phase) and FS enlarged
> (which for ext3 means creating extra inodes, etc.; for an FS without a
> fixed number of inodes this phase will probably be very short). The
> process will also take long... IIRC after each disk was inserted it
> took about 10h for it to be fully integrated, so I was able to
> exchange 2 disks/day; this can be different nowadays due to different
> disk sizes, disk read-write speed and controller speed. Up to you to
> decide whether it makes sense to do it this way or it becomes easier
> to declare a downtime :)
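Bogdan's figures can be turned into a back-of-the-envelope estimate. The sketch below makes three assumptions. Disks are swapped strictly one at a time. The next swap waits until the previous rebuild has completed. Only whole rebuilds are counted per day, which reproduces his "10h per disk, 2 disks/day". The rebuild rate passed to rebuild_hours is an assumed sustained figure, not a measured one. Controllers often throttle rebuilds while they serve user I/O, so real times can be longer.

```python
import math


def rolling_upgrade_days(n_disks: int, hours_per_disk: float,
                         hours_per_day: float = 24.0) -> int:
    """Days needed to replace every disk one at a time, counting only the
    whole rebuilds that fit into a day."""
    if n_disks < 0 or hours_per_disk <= 0:
        raise ValueError("need a non-negative disk count and a positive rebuild time")
    disks_per_day = math.floor(hours_per_day / hours_per_disk)
    if disks_per_day < 1:
        # a single rebuild spans more than one day
        return math.ceil(n_disks * hours_per_disk / hours_per_day)
    return math.ceil(n_disks / disks_per_day)


def rebuild_hours(disk_tb: float, rebuild_mb_s: float) -> float:
    # lower bound: the whole replacement disk must be written at the
    # rebuild rate (1 TB = 1e6 MB, decimal units as disk vendors use)
    return disk_tb * 1e6 / rebuild_mb_s / 3600
```

For example, a 12-disk array at 10 h per disk comes out at 6 days. A hypothetical 4 TB disk rebuilt at an assumed 100 MB/s needs a little over 11 h on its own.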
>
>> Then there are things like Panasas where you can buy more shelves and
>> add them to the bladeset and expand that way.
> ... but the expansion of volumes requiring a rebalancing of the
> objects distribution will also slow down the access. There's no magic
> bullet :)
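Here is a rough figure for the rebalancing cost. Suppose the data is spread evenly over N equal-sized shelves and M more are added. After rebalancing, each shelf should hold 1/(N+M) of the data. The new shelves must therefore receive M/(N+M) of it, and all of that comes from the existing shelves. This is a generic lower bound, not a description of how Panasas actually places objects:

```python
from fractions import Fraction


def fraction_moved(old_units: int, added_units: int) -> Fraction:
    """Minimum share of the stored data that must migrate so that every
    equal-sized unit (shelf, disk, server) holds the same share afterwards."""
    if old_units < 1 or added_units < 0:
        raise ValueError("need at least one existing unit and a non-negative addition")
    # the added units must end up holding added/(old+added) of all data,
    # and every byte of it has to be copied from the existing units
    return Fraction(added_units, old_units + added_units)
```

Adding one shelf to four means at least a fifth of the stored data is rewritten while users keep working. That background traffic is where the slowdown comes from.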
>
> Cheers,
> Bogdan
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>



