[Beowulf] Big storage

Mon Apr 14 10:36:18 PDT 2008

Dear Xavier,

Thank you for the nice and detailed response.  This is very useful 
'intelligence', particularly regarding the reliability of the controllers 
and the IPMI firmware problems.

Have you ever lost data on the X4500 systems?

Would it be possible to get a (private) copy of your Jumpstart config file 
and the custom install scripts?  Reading these and modifying them will 
probably be quite a bit quicker than developing our own from scratch.

I am traveling at the moment (just picking up email from airport wireless) 
so I will need a bit of time to absorb everything below.  I'll probably 
write back with a few other questions and comments later.

(PS: do you see any good reason NOT to boot the boxes with root on an 
NFS-exported file system? To me this makes sense, as it would permit an 
'all-ZFS' and symmetric disk configuration.)

Cheers,
 	Bruce

PS: Loic: thanks for passing this on to your colleague!

PPS: We've also been doing some experiments with putting OpenSolaris+ZFS 
on some of our generic (Supermicro + Areca) 16-disk RAID systems, which 
were originally intended to run Linux.

On Mon, 14 Apr 2008, Xavier Canehan wrote:

> Bruce Allen wrote:
>> Hi Loic,
>>
> Hello,
>
> As I am one of Loic's colleagues and in charge of the Thumpers, I'm
> answering here.
>
>> (I'm picking up a 6-month old thread.)
>>
>> On Wed, 5 Sep 2007, Loic Tortay wrote:
>>
>>> As of today we have 112 X4500, 112U are almost 3 racks which is quite
>>> a lot due to our floor space constraints.
>>
>> We're now commissioning our new cluster, which includes about
>> 30 Linux storage server boxes of ~10TB each and (only!) 12 Sun X4500.
>>
> We now have 146 X4500. 3 machines are the spare/development kit, 2 are
> running Linux (Scientific Linux 4.5 x64), the remainder is running Solaris
> 10 (Update 3 with a recent patch level).
>
>> We have cloning scripts set up (Debian FAI) to automatically build the
>> Linux boxes, but are less familiar with cloning in a Solaris
>> environment. What method do you use to install an OS and patches onto
>> your X4500s and ensure a homogeneous environment?  If it's Sun's
>> Jumpstart, would it be possible to see the config and Jumpstart file(s)
>> that you use for this?
>>
> We use JumpStart for the system installation with several custom scripts
> launched automatically after the "post_install" (which does as little as
> possible).
>
> Everything (OS patches, software installation & patches, storage
> configuration, system services configuration) is done automatically except
> the actual storage applications (dCache, Xrootd, SRB or HPSS) installation
> and configuration which are left to the applications administrators.
>
> The system configuration is homogeneous simply because the installation is
> done with the same set of scripts for all Solaris installations (with
> machine, service & hardware specific configuration files).
> We only have ~2 Solaris admins (< 1 FTE), so things have to be kept simple.
>
>> Since stock Solaris cannot boot from ZFS, I'm a bit reluctant to throw
>> away drives and storage space to host the OS separately on each X4500.
>> One attractive alternative is to NFS-boot the Thumpers from a single
>> central OS image.  Have you tried this yourself?  Where do you boot your
>> X4500 systems from?
>>
> We still use two of the internal disks for the system (in software RAID-1).
> This has saved us once in fewer than 40 hardware incidents.
>
> We have not tried to NFS-boot the X4500 for daily operation (only for
> install and once for rescue).
>
>> I'd be grateful for any advice, anecdotes, war stories, etc.
>>
> The Marvell SATA controllers in the X4500s are much more reliable than we
> initially expected (one failure in 876 controllers over 18 months). So there
> is no need to be over-cautious and configure the zpools as we first did
> (i.e. in the Sun default configuration with 8 security disks).
> We now use a configuration with only 6 security disks which gives an extra
> terabyte.
>
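
(A rough check of the figures above, as a minimal sketch. The per-machine
numbers are assumptions, not stated in this message: 6 SATA controllers and
48 disks per X4500, and 500 GB per disk. They are chosen because they
reproduce the 876 controllers and the extra terabyte quoted above. The
function names are purely illustrative.)

```python
# Back-of-the-envelope check of the controller and capacity figures.
# ASSUMPTIONS (not stated in the message): 6 controllers and 48 disks per
# X4500, 500 GB per disk. The 2 system disks come from the message itself.
CONTROLLERS_PER_MACHINE = 6   # assumption
DISKS_PER_MACHINE = 48        # assumption
SYSTEM_DISKS = 2              # internal disks used for the OS mirror
DISK_TB = 0.5                 # assumption: 500 GB drives


def total_controllers(machines):
    """Number of SATA controllers across a fleet of X4500s."""
    return machines * CONTROLLERS_PER_MACHINE


def usable_tb(security_disks):
    """Usable zpool capacity per machine, in TB, for a given number of
    'security' (redundancy/spare) disks."""
    data_disks = DISKS_PER_MACHINE - SYSTEM_DISKS - security_disks
    return data_disks * DISK_TB


if __name__ == "__main__":
    controllers = total_controllers(146)
    print("controllers:", controllers)
    print("observed controller failure fraction:", 1 / controllers)
    print("extra TB per machine, 6 vs 8 security disks:",
          usable_tb(6) - usable_tb(8))
```

Under these assumptions, dropping two security disks frees two 500 GB drives
per machine, which matches the extra terabyte mentioned above.
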
> The war story is still raging.
> The (Linux based) service processor firmware in the X4500s (and apparently
> other Sun X4x00 servers) may become unusable after some time.
> This is an issue because when the service processor becomes unusable, IPMI
> no longer works (no serial-over-lan console, no ipmitool, etc.)
> At that point, more often than not, the OS eventually becomes (very)
> unresponsive. The only solution is then to unplug all power supplies (!)
> The current Sun-supported work-around is to reboot the SP every 30 days,
> 60 at most. This should not be a big deal, but it has sometimes triggered
> a reboot of the X4500 as well. As This Should Not Happen, Sun is actively
> investigating, suspecting a related but different issue.
> There is supposedly a corrected firmware for this but it's scheduled to be
> released "sometime" in the future (current version is 2.0.2.1).
>
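
(The periodic SP reboot described above could be tracked with something like
the following minimal sketch. The host names and dates are hypothetical.
The generated command is the generic IPMI "mc reset cold" call of ipmitool;
whether this is how Sun's work-around is actually carried out is not stated
in this message, so treat it as an assumption.)

```python
from datetime import date, timedelta

# Work-around interval quoted above: reboot the SP every 30 days.
MAX_SP_AGE = timedelta(days=30)


def sp_resets_due(last_reset, today, max_age=MAX_SP_AGE):
    """Return the SP host names whose last reset is at least max_age old,
    oldest first. last_reset maps host name -> date of last SP reset."""
    due = [host for host, when in last_reset.items()
           if today - when >= max_age]
    return sorted(due, key=lambda host: last_reset[host])


def reset_command(sp_host, user="root"):
    """Build (but do not run) a generic ipmitool command that cold-resets
    the management controller. -E makes ipmitool read the password from
    the IPMI_PASSWORD environment variable. ASSUMPTION: the SP accepts
    this generic IPMI command."""
    return ["ipmitool", "-I", "lanplus", "-H", sp_host,
            "-U", user, "-E", "mc", "reset", "cold"]


if __name__ == "__main__":
    # Hypothetical inventory of SP host names and last reset dates.
    inventory = {
        "thumper01-sp": date(2008, 3, 1),
        "thumper02-sp": date(2008, 4, 1),
        "thumper03-sp": date(2008, 3, 15),
    }
    for host in sp_resets_due(inventory, date(2008, 4, 14)):
        print(" ".join(reset_command(host)))
```

The sketch only prints the commands; since an SP reset has sometimes rebooted
the X4500 as well, running them by hand during a quiet period seems safer
than a blind cron job.
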
> As for anecdotes, the X4500s we have survived two catastrophic power
> failures (the whole machine room instantly quiet): once due to human error
> and once due to an almost melting/melted power generator after a power cut.
> The first one struck while 46 machines were in production: we lost only one
> hard drive out of 2208. The last one passed almost unnoticed. Almost.
>
> A good point for support, whose training is now perfect: we no longer
> encounter people asking to check the Hitachi disk array connected to the
> Sun server.
>
> Most of the issues we've had with the X4500 have been forwarded to Sun
> engineering and it seems that several of these remarks were taken into
> account for the next generation of X4500 (of course, several other X4500
> users provided input too).
>
>
> We are planning to set up a tender to get at least this volume of usable
> storage this year. Thumpers may be good candidates.
>
>
> X.
> --
> | Xavier Canehan <Xavier.Canehan at in2p3.fr <tortay at cc.in2p3.fr>> - IN2P3 Computing Centre |
>