[Beowulf] Surviving a double disk failure
Orion Poplawski orion at cora.nwra.com
Fri Apr 10 10:16:20 PDT 2009
Bill Broadley wrote:
> Guy Coates wrote:
>> Yikes, epic recovery.
>>
>>> What are the lessons learnt?
>> You forgot the obvious one.
>
> I suggest ditching silly old centos/redhat kernels and run something new
> enough to allow for scrubbing.  So that all your disks don't silently start
> collecting errors waiting to cascade into a lost RAID upon the first
> non-silent error.

As a stop-gap solution here I periodically run "smartctl -t long /dev/<blah>"
on all the disks to check their status.  I have a daily cron job that tests
one disk a day on my 26-disk servers, so each disk is checked once a month.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
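For anyone wanting to try the one-disk-a-day rotation described above, here is a rough sketch of how a daily cron job might pick its disk. This is not the actual script from the post. The device names (/dev/sda through /dev/sdz), the day-of-month mapping, and the helper names disk_for_day and smartctl_command are all assumptions made for illustration. Only the "smartctl -t long" invocation comes from the message itself.

```python
import datetime
import subprocess

# Assumed device list: 26 disks named /dev/sda .. /dev/sdz (hypothetical names).
DISKS = [f"/dev/sd{chr(ord('a') + i)}" for i in range(26)]


def disk_for_day(day, disks=DISKS):
    """Map the day of the month to one disk; return None once the list is exhausted."""
    index = day.day - 1  # day 1 -> first disk, day 26 -> last disk
    return disks[index] if index < len(disks) else None


def smartctl_command(disk):
    """Build the long self-test command quoted in the post."""
    return ["smartctl", "-t", "long", disk]


if __name__ == "__main__":
    disk = disk_for_day(datetime.date.today())
    if disk is None:
        print("No disk scheduled today")
    else:
        # smartctl only starts the self-test here; the drive runs it in the
        # background, so the result has to be read later.
        result = subprocess.run(smartctl_command(disk))
        print(f"Started long self-test on {disk} (exit status {result.returncode})")
```

Run once a day from cron, this tests each of the 26 disks once per calendar month and does nothing on days 27 to 31. The self-test itself only records a result on the drive, so something still has to read it afterwards, for example with "smartctl -l selftest /dev/<blah>".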
