[Beowulf] Re: failure trends in a large disk drive population

Justin Moore justin at cs.duke.edu
Wed Feb 21 15:50:41 PST 2007


>> How did they look for predictive models on the SMART data?  It sounds
>> like they did a fairly linear data decomposition, looking for first
>> order correlations.  Did they try to e.g. build a neural network on it,
>> or use fully multivariate methods (ordinary stats can handle it up to
>> 5-10 variables).
>>
>> This is really an extension of David's questions below.  It would be
>> very interesting to add variables to the problem (if possible) until the
>> observed correlations resolve (in sufficiently high dimensionality) into
>> something significantly predictive.  That would be VERY useful.
>>
>
> RGB, good idea, apply clustering/GA/MOGA analysis techniques to all of 
> this data. Now the question is, will we ever get access to this data? 
> ;)

As mentioned in an earlier e-mail (I think), there were 4 SMART variables 
whose values were strongly correlated with failure, and another 4-6 that 
were weakly correlated with failure.  However, of all the disks that 
failed, fewer than half (around 45%) had ANY of the "strong" signals, and 
another 25% had only some of the "weak" signals.  This means that roughly 
a third of the disks that failed gave no appreciable warning.  Therefore 
even combining the variables would give no better than a 70% chance of 
predicting failure.
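
The ceiling comes from simple coverage arithmetic: a predictor that only sees SMART variables can at best flag the failed disks that showed some signal, however cleverly the signals are combined. Below is a minimal sketch, assuming the ~45% "strong" and ~25% "weak only" groups are disjoint fractions of all failed disks (which is how the percentages add up above); the function name is hypothetical.

```python
def max_recall(strong_frac: float, weak_only_frac: float) -> float:
    """Upper bound on the fraction of failures any SMART-based predictor
    can catch.  Disks that failed with no signal at all look identical
    to healthy disks, so no combination of variables can flag them.

    Assumes the two groups are disjoint subsets of the failed disks.
    """
    covered = strong_frac + weak_only_frac
    if strong_frac < 0 or weak_only_frac < 0 or covered > 1.0:
        raise ValueError("fractions must describe disjoint subsets of failures")
    return covered


# Figures quoted above: ~45% of failures showed a strong signal,
# ~25% more showed only weak signals.
ceiling = max_recall(0.45, 0.25)
silent = 1.0 - ceiling
print(f"best-case recall: {ceiling:.0%}, silent failures: {silent:.0%}")
# prints "best-case recall: 70%, silent failures: 30%"
```

Adding more variables or a fancier model (neural net, clustering, GA) can only redistribute predictions within the covered 70%; it cannot recover failures that left no trace in the data.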

To make things worse, many of the "weak" signals were common across the 
whole fleet.  For example, many of the disks that failed had a large 
number of seek errors; however, over 70% of all disks in the fleet (failed 
and working) had a large number of seek errors, so a high seek-error count 
by itself says little about whether a given disk will fail.
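
How little a widespread signal can tell you follows from Bayes' rule: the lift of a flag, P(fail | flag) / P(fail), equals P(flag | fail) / P(flag), and since P(flag | fail) can be at most 1, the lift is at most 1 / P(flag). A minimal sketch, using the 70% fleet-wide prevalence quoted above; the helper name is hypothetical.

```python
def max_lift(flag_prevalence: float) -> float:
    """Upper bound on how much a binary flag can raise the failure rate
    above the fleet-wide base rate.

    By Bayes' rule, P(fail | flag) / P(fail) = P(flag | fail) / P(flag),
    and P(flag | fail) <= 1, so the lift is at most 1 / P(flag).
    """
    if not 0.0 < flag_prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return 1.0 / flag_prevalence


# Over 70% of the fleet (failed and working) had many seek errors.
bound = max_lift(0.70)
print(f"a seek-error flag raises the failure rate by at most {bound:.2f}x")
```

Even in the best case, where every failed disk had logged many seek errors, seeing that flag on a disk would raise its failure rate by less than about 1.43x over the fleet average. Since the prevalence is "over 70%", the real bound is lower still.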

About all I can say beyond what's in the paper is that we're aware of 
the shortcomings of the existing work and possible paths forward.  In 
response, we are
<GOOGLE_NDA_BOT>
Hello, this is the Google NDA bot.  In our massive trawling of the 
Internet and other data sources, I have detected a possible violation of 
the Google NDA.  This has been corrected.  We now return you to your 
regularly scheduled e-mail.
[ Continue ]  [ I'm Feeling Confidential ]
</GOOGLE_NDA_BOT>

So that's our master plan.  Just don't tell anyone. :)
-jdm

P.S. Unfortunately, I doubt that we'll be willing or able to release the 
raw data behind the disk drive study.

Department of Computer Science, Duke University, Durham, NC 27708-0129
Email:	justin at cs.duke.edu
Web:	http://www.cs.duke.edu/~justin/


