[Beowulf] Surviving a double disk failure
Joe Landman landman at scalableinformatics.com
Sat Apr 11 19:38:16 PDT 2009
Stuart Midgley wrote:
> Thanks to all the responses, it has been interesting reading. We have
> started using raid6 on newer servers and will slowly get rid of our old
> raid5 servers.
>
> I found the comments about scrubbing very interesting. What do people
> do with their file systems? We couldn't afford the reduced performance

Software RAIDs (our DeltaV) are scrubbed once a week. Hardware RAIDs are
also scrubbed once a week. Basically, errors can accumulate.

Scrubbing isn't perfect, and as Michael and others have pointed out,
there can be bugs. But honestly, I am of the opinion that several hours
of scrubbing, with the reduced performance that results, are a heck of a
lot better than dealing with down time due to an "event". Scrubbing
occurs in the background, and you can limit its impact.

> and time for scrubbing. We run our Lustre setup almost flat out all the
> time. We regularly do over a PB of I/O in a week (we often have our
> total throughput at ~3GB/s for weeks on end). We use Lustre as our
> scratch space so backups are not possible. Nothing could get the data
> off fast enough between us creating/using/deleting it.
>
> Of course, the fact that we basically run at 95% full all the time is as
> good as scrubbing :)

Not quite ... Scrubbing is a bit more of a structured testing and repair.
The I/O may leave coverage holes ... even at 95% capacity.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
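
As an illustration of the point above that a scrub runs in the background and that its impact can be limited, here is a minimal sketch. It assumes a Linux md software RAID array and its sysfs interface (`sync_action`, `sync_speed_max`, `mismatch_cnt`), which may not be what the systems discussed in this message use. The array name `md0`, the function names, and the `sysfs_root` parameter are hypothetical and exist only for this example. Writing to the real files requires root.

```python
from pathlib import Path


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Return the md control directory for an array such as 'md0'."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(array: str, max_kib_per_s=None, sysfs_root: str = "/sys") -> None:
    """Start a background consistency check, optionally capping its speed.

    Assumption: Linux md RAID. 'check' reads every stripe and counts
    mismatches; sync_speed_max caps the per-array rate in KiB/s.
    """
    md = md_dir(array, sysfs_root)
    if max_kib_per_s is not None:
        (md / "sync_speed_max").write_text(f"{max_kib_per_s}\n")
    (md / "sync_action").write_text("check\n")


def scrub_status(array: str, sysfs_root: str = "/sys") -> dict:
    """Report the current sync action and the mismatch count."""
    md = md_dir(array, sysfs_root)
    return {
        "action": (md / "sync_action").read_text().strip(),
        "mismatch_cnt": int((md / "mismatch_cnt").read_text().strip()),
    }


if __name__ == "__main__":
    # Hypothetical usage: scrub md0 at no more than about 50 MB/s.
    start_scrub("md0", max_kib_per_s=50000)
    print(scrub_status("md0"))
```

A weekly schedule like the one described could be had by running such a script from cron. The cap is a trade-off: a lower value protects production I/O but lengthens the scrub.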
