Disk reliability (Was: Node cloning)

JackM meisterj at acm.org
Mon May 21 20:02:55 PDT 2001


You can try using hdparm to turn DMA off for that drive
(hdparm -d0 /dev/hda).  Of course, that slows data transfer
considerably.
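
As a side note, the two hex values in the quoted log lines can be decoded
by hand, which helps when the driver's text is missing or truncated. Below
is a minimal Python sketch. The bit labels are my approximation of what the
Linux IDE driver prints, and the function names (decode_status,
decode_error, decode_line) are made up for illustration, not part of any
existing tool.

```python
import re

# (bit mask, label) pairs for the IDE status register.
# Labels are assumed to match the Linux IDE driver's messages.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the labels of all bits set in a status register value."""
    return [name for bit, name in STATUS_BITS if value & bit]


def decode_error(value):
    """Return the labels of all bits set in an error register value."""
    names = []
    aborted = bool(value & 0x04)
    if aborted:
        names.append("DriveStatusError")
    if value & 0x80:
        # Bit 7 reads as a CRC error when the command was also aborted,
        # otherwise as a bad sector.
        names.append("BadCRC" if aborted else "BadSector")
    if value & 0x40:
        names.append("UncorrectableError")
    if value & 0x10:
        names.append("SectorIdNotFound")
    if value & 0x02:
        names.append("TrackZeroNotFound")
    if value & 0x01:
        names.append("AddrMarkNotFound")
    return names


# Matches e.g. "hda: dma_intr: status=0x51"
LINE_RE = re.compile(r"(\w+): dma_intr: (status|error)=0x([0-9a-fA-F]+)")


def decode_line(line):
    """Parse one kernel log line; return (drive, kind, labels) or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    drive, kind, hexval = m.groups()
    value = int(hexval, 16)
    names = decode_status(value) if kind == "status" else decode_error(value)
    return drive, kind, names
```

Feeding it the two lines from Jeff's message gives the same labels the
kernel printed, so status=0x51 plus error=0x84 is a CRC error on the
transfer with the command aborted, not a bad-sector report.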


Jeffrey B Layton wrote:
> 
> Hello,
> 
>   I hate to dredge up this topic again, but I've got a machine
> with an IBM drive that is giving me the following errors:
> 
> kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> 
> as discussed in previous emails on the list. I followed the pointers
> that Josip gave and ran the IBM code on the drive. It said the drive
> was fine. However, I'm still getting the same error messages.
> Anybody care to suggest anything else to look at? Perhaps cabling
> or a new motherboard (it's an Abit board).
> 
> TIA,
> 
> Jeff
> 
> Josip Loncaric wrote:
> 
> > Thanks to several constructive responses, the following picture emerges:
> >
> > (1) Modern IDE drives can automatically remap a certain number of bad
> > blocks.  While they are doing this correctly, the OS should not even see
> > a bad block.
> >
> > (2) However, the drive's capacity to do this is limited to 256 bad
> > blocks or so.  If more bad blocks exist, then the OS will start to see
> > them.  To recover from this without replacing the hard drive, one can
> > detect and map out the bad blocks using 'e2fsck -c ...' and 'mkswap -c
> > ...' commands.  Obviously, the partition where this is being done should
> > not be in use (turn swap off first, unmount the file system or reboot
> > after doing "echo '-f -c' >/fsckoptions").
> >
> > (3) In general, IDE cables should be at most 18" long with both ends
> > plugged in (no stubs), and preferably serving only one (master) drive.
> >
> > For IBM drives (IDE or SCSI), one can download and use the Drive Fitness
> > Test utility (see
> > http://www.storage.ibm.com/techsup/hddtech/welcome.htm).  This program
> > can diagnose typical problems with hard drives.  In many cases, bad
> > blocks can be 'healed' by erasing the drive using this utility (back up
> > your data first, and be prepared for the 'Erase Disk' to take an hour or
> > more).  If that fails and your drive is under warranty, the drive ought
> > to be replaced.
> >
> > For older existing drives (in less critical applications, e.g. to boot
> > Beowulf client nodes where the same data is mirrored by other nodes)
> > mapping out bad blocks as needed is probably adequate.
> >
> > Finally, the existing Linux S.M.A.R.T. utilities apparently do not
> > handle every SMART drive correctly.  Use with caution.
> >
> > Sincerely,
> > Josip
> >
> > --
> > Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
> > ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
> > NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
> > Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 



