[Beowulf] Big storage

Xavier Canehan canehan at gmail.com
Tue Apr 15 07:17:21 PDT 2008

On Mon, Apr 14, 2008 at 7:36 PM, Bruce Allen <ballen at gravity.phys.uwm.edu> wrote:

> Have you ever lost data on the X4500 systems?


Considering only the Solaris boxes in production: 10 HPSS servers use an SVM
disk configuration, while 131 use ZFS (101 dCache, 19 Xrootd, 11 SRB).
We never had a problem big enough on the SVM configuration to lose anything (or
to detect any corruption, to keep in line with previous posts in this thread).

With ZFS, we never lost a single bit.
We even tried to actively corrupt a ZFS filesystem by writing directly to the
underlying disks: we did not manage to lose anything.
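A test along those lines might look like the following sketch. The pool and device names are illustrative, not our actual configuration, and this must only ever be run as root against scratch disks, never production ones:

```shell
# Build a throwaway mirrored pool on two scratch disks (illustrative names).
zpool create testpool mirror c1t2d0 c1t3d0

# Inject corruption by writing garbage directly to one side of the mirror,
# bypassing ZFS. Seek past the start of the slice to avoid the disk labels.
dd if=/dev/urandom of=/dev/rdsk/c1t2d0s0 bs=1024k count=64 seek=100

# Force ZFS to re-read and verify every block: it detects the bad
# checksums and repairs the damaged side from the intact mirror copy.
zpool scrub testpool
zpool status -v testpool
```

The status output reports CKSUM errors on the corrupted device, but the data itself survives because the mirror copy still checksums correctly.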

About boot disk failure (even if it is not an X4500-specific issue): the worst
we had to go through was restoring boot_archive after a crash, with the SVM
mirrors out of sync. The situation was similar to the one described by Tim
Bradshaw here: <

> Would it be possible to get a (private) copy of your Jumpstart config file
> and the custom install scripts?  Reading these and modifying them will
> probably be quite a bit quicker than developing our own from scratch.

Sure. I'll send them in private after review.

> (PS: do you see any good reason NOT to boot the boxes with root on an
> NFS-exported file system? To me this makes sense, as it would permit an
> 'all-ZFS' and symmetric disk configuration.)

It seems a good idea, but standalone servers minimize dependencies: you don't
want several dozen terabytes going stale because of an NFS server outage.
Moreover, in our case the need for local filesystems, not to mention swap
space, argued against it. We had to use a UFS filesystem for the dCache local
cache directory. We also use OpenAFS and did not try a memcache configuration
(our setup may not fit a "standalone" definition well).

You can use a disk slice as a ZFS vdev. However, if you have to partition a
single disk that way, your configuration will no longer be symmetric, so we
went straight to mirroring the two boot disks.
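For illustration (device, pool, and metadevice names here are made up, and the slice layout depends on how you partitioned the disks in format), using a slice as a vdev while keeping mirrored UFS boot slices might look like:

```shell
# Use slice 7 of the boot disk as a single-device ZFS pool (illustrative).
zpool create localpool c0t0d0s7

# The root slices stay on UFS, mirrored with SVM:
# d10 is the mirror, d11/d12 the submirrors on each boot disk.
metainit -f d11 1 1 c0t0d0s0
metainit d12 1 1 c0t1d0s0
metainit d10 -m d11
metattach d10 d12
```

After metattach, the second submirror resyncs in the background; /etc/vfstab and the boot path then need updating to mount root from d10.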

I'd be grateful to hear your results after testing.

> PPS: We've also been doing some experiments with putting OpenSolaris+ZFS
> on some of our generic (Supermicro + Areca) 16-disk RAID systems, which were
> originally intended to run Linux.

I think DESY observed data corruption with such a configuration, which is why
they switched to OpenSolaris+ZFS. I can't give further details, but I can look
for a contact should you need one.

