[Beowulf] RE: real hard drive failures
David Mathog mathog at mendel.bio.caltech.edu
Wed Jan 26 08:25:23 PST 2005
- Previous message: [Beowulf] Problem with `mpi_init_' in MM5 MPP
- Next message: [Beowulf] help
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> > > - raid will NOT prevent your downtime, as that raid box
> > >   will have to be shutdown sooner or later
> > >   ( shutting down sooner ( asap ) prevents data loss )
> >
> > huh? hotspares+hotplug=zero downtime.
>
> you're assuming that "hot plug" work as its supposed to
>   - i usually get the phone calls after the raid didnt
>     do its magic for some odd reason

My impression, based solely on web research and not personal
experience, is that RAIDs that don't rebuild are often suffering
from "latent lost block" syndrome. That is, a block on disk 1
has gone bad, but hasn't been read yet, so that bad block is
"latent". Then disk 2 fails. The RAID tries to rebuild and
now tries to read the bad block on disk 1, gets a read error, and
that's pretty much all she wrote for that chunk of data.

The "fix" is to disk scrub, forcing reads of every block on every
disk periodically, and so converting the "latent" bad blocks into
"known" bad blocks at a time when the RAID still has sufficient
information to rebuild a lost disk block. Also use SMART to keep
track of disks which have started to swap out blocks and replace
them before they fail totally.

Deciding how many bad blocks is too many on a drive seems like it
might be a fairly complex decision in a storage environment
involving hundreds or thousands of disks.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
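
The latent-block scenario above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how any real RAID controller or md driver is implemented: it models a single-parity (XOR) array in memory, and the names Disk, make_array, scrub, rebuild and demo are all hypothetical. It assumes that rewriting a block makes it readable again, which stands in for the drive remapping the sector.

```python
from functools import reduce


class ReadError(Exception):
    """Raised when a block cannot be read."""


class Disk:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()      # latent bad blocks: unreadable, not yet noticed
        self.failed = False   # whole-disk failure

    def read(self, i):
        if self.failed or i in self.bad:
            raise ReadError(i)
        return self.blocks[i]

    def write(self, i, value):
        # Assumption: a rewrite remaps the sector, so it is readable again.
        self.blocks[i] = value
        self.bad.discard(i)


def xor(values):
    return reduce(lambda a, b: a ^ b, values, 0)


def make_array(stripes):
    """Build N data disks plus one XOR parity disk from rows of integers."""
    n = len(stripes[0])
    disks = [Disk(row[d] for row in stripes) for d in range(n)]
    disks.append(Disk(xor(row) for row in stripes))
    return disks


def reconstruct(disks, skip, i):
    """Recompute block i of disk `skip` from every other disk in the stripe."""
    return xor(d.read(i) for k, d in enumerate(disks) if k != skip)


def scrub(disks):
    """Read every block on every disk; repair unreadable ones from parity.

    A second unreadable block in the same stripe would raise ReadError
    here, because that data is already unrecoverable.
    """
    repaired = 0
    for k, disk in enumerate(disks):
        for i in range(len(disk.blocks)):
            try:
                disk.read(i)
            except ReadError:
                disk.write(i, reconstruct(disks, k, i))
                repaired += 1
    return repaired


def rebuild(disks, dead):
    """Rebuild a failed disk onto a replacement; return unrecoverable blocks."""
    replacement = Disk([0] * len(disks[dead].blocks))
    lost = []
    for i in range(len(replacement.blocks)):
        try:
            replacement.write(i, reconstruct(disks, dead, i))
        except ReadError:
            lost.append(i)
    disks[dead] = replacement
    return lost


def demo(scrub_first):
    disks = make_array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
    disks[0].bad.add(2)       # latent bad block on "disk 1"
    if scrub_first:
        scrub(disks)          # found and repaired while parity is intact
    disks[1].failed = True    # then "disk 2" fails
    return rebuild(disks, 1)


print(demo(scrub_first=False))  # [2]
print(demo(scrub_first=True))   # []
```

Without the scrub, the rebuild trips over the unread bad block and stripe 2 is lost. With the scrub, the same bad block is found and rewritten while the array can still reconstruct it, so the later rebuild completes cleanly.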
