[Beowulf] Re: failure trends in a large disk drive population

Justin Moore justin at cs.duke.edu
Wed Feb 21 15:50:41 PST 2007

>> How did they look for predictive models on the SMART data?  It sounds
>> like they did a fairly linear data decomposition, looking for first
>> order correlations.  Did they try to e.g. build a neural network on it,
>> or use fully multivariate methods (ordinary stats can handle it up to
>> 5-10 variables).
>> This is really an extension of David's questions below.  It would be
>> very interesting to add variables to the problem (if possible) until the
>> observed correlations resolve (in sufficiently high dimensionality) into
>> something significantly predictive.  That would be VERY useful.
> RGB, good idea, apply clustering/GA/MOGA analysis techniques to all of 
> this data. Now the question is, will we ever get access to this data? 
> ;)

As mentioned in an earlier e-mail (I think), there were 4 SMART variables 
whose values were strongly correlated with failure, and another 4-6 that 
were weakly correlated with failure.  However, of all the disks that 
failed, fewer than half (around 45%) had ANY of the "strong" signals and 
another 25% had some of the "weak" signals.  This means that over a 
third of disks that failed gave no appreciable warning.  Therefore, even 
combining the variables would give no better than a 70% chance of 
predicting failure.
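
To spell out the arithmetic behind that ceiling, here is a minimal 
sketch.  It takes the rounded figures above at face value and assumes 
the 45% and 25% groups do not overlap; the function name is made up for 
illustration and nothing here comes from the paper's actual analysis.

```python
def recall_ceiling(strong_pct: int, weak_only_pct: int) -> int:
    """Upper bound (in percent) on failures a SMART-only predictor can catch.

    A predictor that sees only SMART signals cannot flag a disk that showed
    none, so its best-case recall is the share of failed disks that showed
    at least one signal.  Assumes the two groups are disjoint.
    """
    return strong_pct + weak_only_pct


# Approximate figures from the message: ~45% of failed disks showed a
# "strong" signal, and another ~25% showed only "weak" ones.
ceiling = recall_ceiling(45, 25)
print(f"best-case recall: {ceiling}%")
```

No amount of clever modeling on these variables moves that number, 
because the remaining failed disks gave the model nothing to work with.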

To make things worse, many of the "weak" signals were found on a 
significant number of disks.  For example, among the disks that failed, 
many had a large number of seek errors; however, over 70% of disks in the 
fleet -- failed and working -- had a large number of seek errors.
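
A rough way to see why such a common signal is nearly useless is Bayes' 
rule.  In the sketch below only the 70% figure comes from the message 
above; the 5% failure rate and the 90% share of failed disks showing 
seek errors are hypothetical placeholders chosen purely for 
illustration, not numbers from the study.

```python
def precision(p_signal_given_fail: float, p_fail: float, p_signal: float) -> float:
    """P(fail | signal) via Bayes' rule: P(signal | fail) * P(fail) / P(signal)."""
    return p_signal_given_fail * p_fail / p_signal


P_SIGNAL = 0.70             # from the message: share of the fleet with many seek errors
P_FAIL = 0.05               # HYPOTHETICAL failure rate, for illustration only
P_SIGNAL_GIVEN_FAIL = 0.90  # HYPOTHETICAL share of failed disks with many seek errors

p = precision(P_SIGNAL_GIVEN_FAIL, P_FAIL, P_SIGNAL)
lift = p / P_FAIL  # how much the signal raises the odds over the base rate

print(f"P(fail | many seek errors) = {p:.3f}")
print(f"lift over base rate        = {lift:.2f}x")
```

Under these assumed inputs, a disk with many seek errors is only 
slightly more likely to fail than a randomly chosen disk, so an alarm 
based on it would fire on most of the fleet and be wrong almost every 
time.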

About all I can say beyond what's in the paper is that we're aware of 
the shortcomings of the existing work and possible paths forward.  In 
response, we are
Hello, this is the Google NDA bot.  In our massive trawling of the 
Internet and other data sources, I have detected a possible violation of 
the Google NDA.  This has been corrected.  We now return you to your 
regularly scheduled e-mail.
[ Continue ]  [ I'm Feeling Confidential ]

So that's our master plan.  Just don't tell anyone. :)

P.S. Unfortunately, I doubt that we'll be willing or able to release the 
raw data behind the disk drive study.

Department of Computer Science, Duke University, Durham, NC 27708-0129
Email:	justin at cs.duke.edu
Web:	http://www.cs.duke.edu/~justin/
