[Beowulf] Big storage

Loic Tortay tortay at cc.in2p3.fr
Wed Sep 5 06:46:39 PDT 2007

According to Bruce Allen:
> >>
> > This is (in my opinion) probably the only real issue with the X4500.
> > The system disk(s) must be with the data disks (since there are "only"
> > 48 disk slots) and the two bootable disks are on the same controller
> > which effectively makes this controller a single point of failure (there
> > are easy ways to move the second system disk to another controller, but
> > you still need a working "first" controller to boot).
> Can you boot from a USB device?  You can have an inexpensive RAID-1 USB
> device for the root and OS.
You can boot from a USB device; there are 4 ports available (2 on the
front, 2 on the back).

We booted a machine from an external DVD drive (there are also virtual
floppy and DVD drives available through the service processor).

> > Although in our experience, controller failures are rare on the X4500
> > (one failure in over a year with a few tens of X4500).
> Did you lose data with a controller failure?  I assume you can just move
> the 48 disks to another box.
We did not lose any data due to the controller failure.

The problem occurred a few days before a scheduled downtime.  The
mainboard was replaced during the downtime and the machine rebooted
just fine.

Even if we hadn't been close to a scheduled downtime, the applications
running on most of our X4500 are fault tolerant enough that we can
offline a machine for some time without a significant impact.

> It will take me some time to digest your other comments.  But I made a
> mistake in what I wrote.  I want to have a 48 disk box with 500 GB disks.
> From this (raw) 24 TB of storage I want to get 20 TB usable (eg, lose no
> more than 8 disks of the 40 for redundancy and the OS).  I mistakenly
> wrote 20/24 disks and 10 TB in my email.  How would you revise your
> recommendations for 20TB of usable storage?
With 48 disks, there are also many different possible configurations.

The default one (which only remains if you use the bundled Solaris
installation) is quite good and gives good results overall.
Sun has clearly given this a lot of thought.

If you really want 20 TB (10*2^41 bytes) of usable space, then you
either need to:
 . wait until Sun provides 750 GB or 1 TB disks (750 GB should be
   available soon if I'm not mistaken);
 . use a less redundant configuration that will leave the machine
   vulnerable to controller failures and probably less resistant to
   disk failures.
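
As a rough check of the arithmetic behind these two options, the sketch
below counts how many data disks are needed to reach 20 TB when a TB is
taken as 2^40 bytes (as in the figure above).  It assumes disk sizes are
quoted in decimal gigabytes (10^9 bytes) and ignores filesystem
overhead, so the real requirement is somewhat higher.  The function
name is only illustrative.

```python
import math

TIB = 2**40   # "TB" as used above: 20 TB = 10 * 2^41 bytes
GB = 10**9    # assumption: vendors quote disk sizes in decimal gigabytes


def data_disks_needed(usable_tib, disk_gb):
    """Smallest number of data disks whose raw capacity covers usable_tib.

    Parity disks, system disks and filesystem overhead are not counted.
    """
    return math.ceil(usable_tib * TIB / (disk_gb * GB))


for size_gb in (500, 750, 1000):
    print(f"{size_gb} GB disks: {data_disks_needed(20, size_gb)} data disks")
```

With 500 GB disks this leaves very few of the 46 non-system slots for
parity, which is why only a weakly redundant layout can reach 20 TB.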

We have an actual usable space of 16.9 TB on our machines (we mostly
use a minor variation of the Sun layout).

The largest possible usable space you can get from an X4500 with 48x500
GB disks, two system disks and "some" redundancy is 19.6 TB.  But this
is certainly NOT a configuration you want to use:
                |                Controllers                    |
                |   c5     c4      c7      c6      c1      c0   |
    ^       7   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
    |    -------+-----------------------------------------------+
    |       6   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
    |    -------+-----------------------------------------------+
    |       5   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
    |    -------+-----------------------------------------------+
    D       4   |  Sys2 |  v1   |  v1   |  v1   |  v1   |  v1   |
    i    -------+-----------------------------------------------+
    s       3   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
    k    -------+-----------------------------------------------+
    s       2   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
    |    -------+-----------------------------------------------+
    |       1   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
    |    -------+-----------------------------------------------+
    |       0   |  Sys1 |  v2   |  v2   |  v2   |  v2   |  v2   |

That's two "raidz1" (single parity) vdevs of 23 disks (2 x 22+P).

This is a very bad idea if you consider basic best practices, Sun
engineers' recommendations and, of course, current hardware reliability
as outlined in the (previously mentioned) article by Bianca Schroeder
and Garth Gibson.
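
To give a feel for why a 22+P vdev is so much more exposed than a 5+P
one, here is a minimal sketch of the chance that a second disk in the
same vdev fails while the first is being rebuilt.  The annual failure
rate and the rebuild time are hypothetical placeholders, not measured
values, and failures are assumed independent with a constant rate,
which is a simplification and likely optimistic.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_second_failure(surviving_disks, annual_rate, rebuild_hours):
    """Probability that at least one surviving disk fails during a rebuild.

    Assumes independent failures at a constant rate (a simplification).
    """
    p_one = 1 - math.exp(-annual_rate * rebuild_hours / HOURS_PER_YEAR)
    return 1 - (1 - p_one) ** surviving_disks


ANNUAL_RATE = 0.03    # hypothetical annual failure rate per disk
REBUILD_HOURS = 48    # hypothetical time to resilver one disk

wide = p_second_failure(22, ANNUAL_RATE, REBUILD_HOURS)    # 22+P, one disk lost
narrow = p_second_failure(5, ANNUAL_RATE, REBUILD_HOURS)   # 5+P, one disk lost
print(f"22+P: {wide:.4%}   5+P: {narrow:.4%}   ratio: {wide / narrow:.1f}")
```

With single parity, a second failure in the same vdev during the
rebuild loses the vdev, so the exposure grows with the vdev width (and a
wider vdev will usually take longer to rebuild as well).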

If you want roughly 20 TB but can cope with less, then I suggest you
use the Sun configuration or one of its minor variations: moving the
second system disk and/or having 7 identically sized vdevs instead of 6
(7 x 5+P + 1 x 3+P instead of 6 x 5+P + 2 x 4+P).
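
The layouts discussed in this message can be compared with a short
sketch like the one below.  It only counts disks and raw data capacity,
assuming 500 GB means 500 * 10^9 bytes and TB means 2^40 bytes; the
usable figures quoted above (16.9 TB and 19.6 TB) are measured values
and are lower, presumably because of filesystem overhead.  The function
and variable names are illustrative.

```python
DISK_BYTES = 500 * 10**9   # assumption: 500 GB in decimal gigabytes
TIB = 2**40


def summarize(vdevs, system_disks=2):
    """Summarize a layout given as a list of (count, data, parity) tuples.

    Returns (total disks, data disks, parity disks, raw data capacity in TiB).
    """
    data = sum(count * d for count, d, p in vdevs)
    parity = sum(count * p for count, d, p in vdevs)
    total = data + parity + system_disks
    return total, data, parity, data * DISK_BYTES / TIB


layouts = {
    "2 x 22+P": [(2, 22, 1)],
    "6 x 5+P + 2 x 4+P": [(6, 5, 1), (2, 4, 1)],
    "7 x 5+P + 1 x 3+P": [(7, 5, 1), (1, 3, 1)],
}

for name, vdevs in layouts.items():
    total, data, parity, raw_tib = summarize(vdevs)
    print(f"{name}: {total} disks, {data} data, {parity} parity, "
          f"{raw_tib:.1f} TiB before overhead")
```

Both Sun-style variants use the same number of data and parity disks,
so the choice between them is about vdev sizes and controller placement
rather than capacity.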

We have tested about 25 different ZFS configurations with various I/O
workloads and, unless you're willing to sacrifice available space or
data security, the Sun layout is the best balanced and also gives
good or acceptable performance for most workloads.

| Loïc Tortay <tortay at cc.in2p3.fr> -     IN2P3 Computing Centre     |
