
[Beowulf] Surviving a double disk failure


Orion Poplawski orion at cora.nwra.com
Fri Apr 10 10:16:20 PDT 2009


Bill Broadley wrote:
> Guy Coates wrote:
>> Yikes, epic recovery.
>>
>>> What are the lessons learnt?
>> You forgot the obvious one.
> 
> I suggest ditching silly old CentOS/Red Hat kernels and running something
> new enough to allow for scrubbing, so that your disks don't silently
> collect errors that wait to cascade into a lost RAID upon the first
> non-silent error.
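
On a kernel that supports it, Linux md software RAID exposes scrubbing through sysfs. Writing "check" to /sys/block/<array>/md/sync_action makes the driver read every member disk, and mismatch_cnt reports how many sectors were found inconsistent. The sketch below is a minimal illustration that assumes Linux md (hardware RAID controllers have their own tools) and root privileges. The array names are hypothetical, and the sysfs root is a parameter so the logic can be exercised against a fake directory.

```python
from pathlib import Path

# Assumption: Linux md software RAID. Array names like "md0" are hypothetical.
SYSFS_BLOCK = Path("/sys/block")


def sync_action_path(array: str, root: Path = SYSFS_BLOCK) -> Path:
    """Return the sysfs file that controls resync/check for `array`."""
    return root / array / "md" / "sync_action"


def start_scrub(array: str, root: Path = SYSFS_BLOCK) -> None:
    """Ask md to read and compare every stripe ("check" mode).

    Must run as root. Reading all members surfaces latent sector
    errors while the array still has redundancy to recover from them.
    """
    path = sync_action_path(array, root)
    state = path.read_text().strip()
    if state != "idle":
        # Don't interrupt a rebuild or a check that is already running.
        raise RuntimeError(f"{array} is busy: {state}")
    path.write_text("check\n")


def mismatch_count(array: str, root: Path = SYSFS_BLOCK) -> int:
    """Number of sectors the last check found inconsistent."""
    return int((root / array / "md" / "mismatch_cnt").read_text())


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:  # e.g. python scrub.py md0 md1
        start_scrub(name)
```

Run from a weekly cron job as, say, "python scrub.py md0". The check proceeds in the background, and its progress shows up in /proc/mdstat.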

As a stop-gap solution here, I periodically run "smartctl -t long
/dev/<blah>" on all the disks to check their status.  I have a daily
cron job that does one disk a day on my 26-disk servers, so each disk
gets checked about once a month.
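
The one-disk-a-day rotation can be sketched as a small script that cron runs daily. This is an illustration, not the poster's actual script. The /dev/sda through /dev/sdz names are an assumption standing in for the 26 disks. The script only starts the drive's long self-test; smartctl returns immediately while the drive tests itself in the background.

```python
import datetime
import string
import subprocess

# Hypothetical device names for a 26-disk server (/dev/sda .. /dev/sdz).
DISKS = [f"/dev/sd{c}" for c in string.ascii_lowercase]


def disk_for_day(disks: list[str], day: datetime.date) -> str:
    """Pick one disk per day, cycling through the list in a fixed order."""
    return disks[day.toordinal() % len(disks)]


def long_test_command(disk: str) -> list[str]:
    """Build the smartctl invocation that starts an extended self-test."""
    return ["smartctl", "-t", "long", disk]


if __name__ == "__main__":
    # Intended to run once a day from cron, as root.
    disk = disk_for_day(DISKS, datetime.date.today())
    subprocess.run(long_test_command(disk), check=True)
```

Indexing by the date's ordinal modulo the disk count tests each disk exactly every 26 days. Indexing by day of the month would also work, but it would run past the end of the list on days 27 to 31. The self-test results can be read later with "smartctl -l selftest" on the same device.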

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com


