Disk reliability (Was: Node cloning)
Josip Loncaric josip at icase.edu
Wed Apr 11 14:38:15 PDT 2001
Greg Lindahl wrote:
>
> > For IBM drives (IDE or SCSI), one can download and use the Drive Fitness
> > Test utility (see
> > http://www.storage.ibm.com/techsup/hddtech/welcome.htm). This program
> > can diagnose typical problems with hard drives. In many cases, bad
> > blocks can be 'healed' by erasing the drive using this utility (back up
> > your data first, and be prepared for the 'Erase Disk' to take an hour or
> > more). If that fails and your drive is under warranty, the drive ought
> > to be replaced.
>
> NOOOOOOOOOOOOOOOOO!
>
> If a sector returns the wrong result 0.01% of the time, it is bad, but
> testing is unlikely to be intensive enough to detect it (10,000
> reads...) If you "heal" it, it will appear to work at first, but it
> will eventually turn up bad again. So all you're doing is papering
> over the problem. You ought to just replace the disk.

Granted, testing once or twice is not perfect, but if that fails, you can always replace the disk later, which is what IBM suggests.

The suggestion to 'heal' the disk using IBM's Drive Fitness Test comes from its manual:

http://service.boulder.ibm.com/storage/hddtech/dft32ug.pdf

which says (on pg.27):

"[...] For example if during testing of your hard drive DFT reports a error code of 0x70 as shown on page 14, this indicates that your hard disk drive has one or more bad sectors. In most of these cases the drive can heal itself of these errors. To do this first back-up all your data from the problem drive (if possible) then run DFT again and select the Erase Disk option which is under the Utilities heading. [...] Once erase disk has completed you can then run one of the test options Quick or Advance to confirm htat the drive has been healed. The result code, which should be displayed, is 0x00 if the test returns another code then you should check with your drive/system vendor if the drive can be return for warranty replacement." (sic!)

Sincerely,
Josip

P.S.
I'm guessing that the manufacturer's list of bad blocks (written at the disk drive factory) is the result of very limited testing (a few passes at most). The drive you return for replacement will be subjected to further testing (again, only a few passes), and if it passes (with an updated list of bad blocks) it will probably be used as a refurbished drive, replacing the ones people sent in for replacement... No manufacturer can afford to perform 10,000 reads of an entire 30GB drive since that would take at least 6 months per drive...

--
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C            PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center          mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
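The two figures traded in this thread (a sector that misreads 0.01% of the time, and the cost of 10,000 full reads of a 30GB drive) can be checked with a few lines of arithmetic. This is a minimal Python sketch under stated assumptions that do not come from the posts: read failures are treated as independent (real failing sectors may well behave in a correlated way), "30GB" is taken as 30e9 bytes, and each full pass streams at a constant, hypothetical sustained transfer rate.

```python
# Back-of-envelope check of the two numbers quoted in this thread.
# Assumptions (not from the original posts): independent per-read failures,
# a "30GB" drive holding 30e9 bytes, and a constant sustained read rate.

SECONDS_PER_DAY = 86_400


def detection_probability(p_fail: float, reads: int) -> float:
    """Chance that a sector failing independently with probability p_fail
    per read shows at least one bad read in `reads` attempts."""
    return 1.0 - (1.0 - p_fail) ** reads


def full_read_days(capacity_bytes: float, passes: int, rate_bytes_per_s: float) -> float:
    """Days needed to read the whole drive `passes` times at a fixed rate."""
    return capacity_bytes * passes / rate_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    p = 1e-4  # "wrong result 0.01% of the time"
    for n in (1, 2, 10_000):
        print(f"{n:>6} reads: P(detect) = {detection_probability(p, n):.4f}")

    # Hypothetical sustained transfer rates, chosen only for illustration.
    for mb_per_s in (10, 20, 40):
        days = full_read_days(30e9, 10_000, mb_per_s * 1e6)
        print(f"{mb_per_s:>3} MB/s: {days:,.0f} days for 10,000 full reads")
```

Under these assumptions, one or two test passes catch such a sector only about 0.01% to 0.02% of the time, and even 10,000 passes catch it only about 63% of the time (roughly 1 - 1/e). At an assumed 20 MB/s, 10,000 passes over 30e9 bytes take about 174 days, close to the "at least 6 months" estimate above; the exact figure depends entirely on the assumed rate.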
More information about the Beowulf mailing list
