Disk reliability (Was: Node cloning)

Josip Loncaric josip at icase.edu
Wed Apr 11 20:40:04 PDT 2001

Donald Becker wrote:
> On Wed, 11 Apr 2001, Robert G. Brown wrote:
> > I would assume the "erase" option is really a name for a new low level
> > reformat that fixes the latter kind of error and MIGHT even help with
> When they say "heal", they actually mean "remap to substitute disk
> blocks reserved for this purpose".  They must have thought that the
> concept of remapping disk blocks was too confusing.

I've found a few web pages which may be of interest.  The low level
format on modern drives is created at the factory, possibly using very
precise disk drive servo track writing machines.  This process cannot be
duplicated by any utility commands to the hard drive.  However, each
sector ID contains a flag indicating whether it is defective or not. 
What IBM's Drive Fitness Test and similar tools do is not low level
formatting but defect detection, remapping of defective sectors, and
zero-fill of the data areas.

In a typical hard drive, embedded servo bursts (written at the factory)
are used to guide the disk heads (if those servo signals are erased, the
drive needs to be replaced).  They are followed by a gap, then sector
ID, sync pattern, data area, ECC field and another gap.  In the mid-1990s,
IBM developed the No-ID sector format which uses the disk space more
efficiently (by up to 30%).  The embedded servo bursts are still used to
provide the servo signals, but the ID fields are stored in solid state
memory rather than taking space from each sector.  Also, improved servo
tracking algorithms have reduced the problems caused by increased
vibration at 7200rpm. 


Bad sectors on a disk are, well, bad.  You do not want them, and if you
can get a good disk instead, doing so is a good idea.  However, there
are also reasonably good software solutions, which primarily apply when
(1) replacing 25% of the slightly troubled disks in your cluster is a
pain and (2) the data on these disks is replicated 64 times with at
least 75% of the copies being good. A regular application of 'e2fsck -c
...' and suitable 'rsync -ac ...' commands can keep such a cluster
operating with reasonable confidence.
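
As a rough illustration of the routine described above, here is a minimal
Python sketch that only builds the two command lines for one node; it does
not execute them.  The device name, the host holding a known-good copy, the
directory path, and the added -y flag (so that e2fsck can run unattended)
are all assumptions for the example, not part of the original suggestion.

```python
import shlex


def maintenance_commands(device, good_host, path):
    """Return (fsck, sync) argument lists for one node.

    device, good_host and path are hypothetical placeholders chosen by
    the caller; nothing here is executed.
    """
    # e2fsck -c uses badblocks(8) for a read-only scan and records any bad
    # blocks it finds in the filesystem's bad block inode.  -y (an assumption
    # added here) answers "yes" to all prompts so the check can be automated.
    # The filesystem should be unmounted when this is actually run.
    fsck = ["e2fsck", "-c", "-y", device]
    # rsync -a preserves permissions, times and so on; -c compares files by
    # checksum rather than size and mtime, so damaged copies are re-fetched.
    sync = ["rsync", "-ac", f"{good_host}:{path}/", f"{path}/"]
    return fsck, sync


if __name__ == "__main__":
    # Example values only: /dev/hda1, node01 and /usr/local are made up.
    for cmd in maintenance_commands("/dev/hda1", "node01", "/usr/local"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the commands as argument lists makes it easy to review them before
handing them to cron or to a remote shell on each node.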

Finally, keep in mind that if 25-35% of brand new IDE disks can
develop bad blocks, a similar percentage of the replacements could also
develop bad blocks.  A zero tolerance policy will mean at least an hour
of a system administrator's time per incident to replace the disk, reload
the software and do the paperwork to have it replaced.  This would be
repeated every time another unit develops a bad block.  The software
alternative (e2fsck -c ...) can be automated and can keep the entire
system operational until a group of seriously defective drives can be
replaced together. 
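
To put a rough number on the zero tolerance policy, the sketch below
estimates the administrator time involved.  It assumes (this is a
simplification, not a measured result) that every disk, original or
replacement, independently develops bad blocks with the same probability,
and it uses a 64-disk cluster purely as an example size.

```python
def expected_replacements(n_disks, p_bad):
    """Expected number of disk swaps under a zero tolerance policy.

    Assumes each disk, including every replacement, independently develops
    bad blocks with probability p_bad.
    """
    if not 0 <= p_bad < 1:
        raise ValueError("p_bad must be in [0, 1)")
    # Per slot, the expected number of swaps is the geometric series
    # p + p^2 + p^3 + ... = p / (1 - p).
    return n_disks * p_bad / (1 - p_bad)


def admin_hours(n_disks, p_bad, hours_per_incident=1.0):
    """Administrator time, at hours_per_incident per replaced disk."""
    return expected_replacements(n_disks, p_bad) * hours_per_incident


if __name__ == "__main__":
    # 64 disks is an assumed cluster size; 25% and 35% are the failure
    # fractions quoted in the text; one hour per incident is a lower bound.
    for p in (0.25, 0.35):
        print(f"p={p:.2f}: about {admin_hours(64, p):.1f} hours")
```

Under these assumptions the replacements of replacements add noticeably to
the total, since the per-slot cost is p/(1-p) rather than p.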

While most of the IDE drives can work flawlessly, it bears noting that
cheap IDE drives are designed for lighter duty than expensive SCSI
models.  The IDE drives are typically designed for 11 hours/day
operation, but Beowulf clusters operate them 24 hours/day, 365 days a
year, for years on end.  [Moreover, some IDE drive servos are designed
to calibrate certain parameters at powerup, so if the drive is never
powered down, its mechanical parameters might drift away from the
calibration.]  For a list of SCSI/IDE differences, see



Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
