Disk reliability (Was: Node cloning)
Josip Loncaric josip at icase.edu
Mon Apr 9 07:12:24 PDT 2001
Thanks to several constructive responses, the following picture emerges:

(1) Modern IDE drives can automatically remap a certain number of bad blocks. As long as the drive does this correctly, the OS should never see a bad block.

(2) However, the drive's capacity to do this is limited to roughly 256 bad blocks. Once more bad blocks exist, the OS will start to see them. To recover from this without replacing the hard drive, one can detect and map out the bad blocks using the 'e2fsck -c ...' and 'mkswap -c ...' commands. Obviously, the partition being checked must not be in use: turn swap off first, and unmount the file system (or, for a file system that cannot be unmounted, such as root, reboot after doing "echo '-f -c' >/fsckoptions").

(3) In general, IDE cables should be at most 18" long, with both ends plugged in (no stubs), and should preferably serve only one (master) drive.

For IBM drives (IDE or SCSI), one can download and use the Drive Fitness Test utility (see http://www.storage.ibm.com/techsup/hddtech/welcome.htm). This program can diagnose typical hard drive problems. In many cases, bad blocks can be 'healed' by erasing the drive with this utility (back up your data first, and be prepared for 'Erase Disk' to take an hour or more). If that fails and the drive is under warranty, it ought to be replaced. For older drives in less critical applications (e.g. booting Beowulf client nodes whose data is mirrored by other nodes), mapping out bad blocks as needed is probably adequate.

Finally, the existing Linux S.M.A.R.T. utilities apparently do not handle every SMART drive correctly. Use them with caution.

Sincerely,
Josip

--
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
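The bad-block recovery steps in point (2) can be sketched as a small Python helper that lists the commands for each kind of partition and, by default, only prints them. This is a minimal sketch, not an existing tool: remap_plan and run are hypothetical names, device names such as /dev/hda2 and the /scratch mountpoint are placeholders, and the "root" branch assumes boot scripts that read /fsckoptions, as described in the message.

```python
import shlex
import subprocess
from typing import List, Optional

Command = List[str]


def remap_plan(device: str, kind: str, mountpoint: Optional[str] = None) -> List[Command]:
    """Hypothetical helper: commands that map out bad blocks on one partition.

    kind is "swap", "ext2" or "root". The partition must be idle while it is
    scanned, so swap is turned off and file systems are unmounted first.
    """
    if kind == "swap":
        return [
            ["swapoff", device],       # take the swap area out of service
            ["mkswap", "-c", device],  # scan for bad blocks, then rebuild swap
            ["swapon", device],        # put it back in service
        ]
    if kind == "ext2":
        if mountpoint is None:
            raise ValueError("an ext2 partition needs its mountpoint")
        return [
            ["umount", mountpoint],          # e2fsck must not run on a mounted fs
            ["e2fsck", "-f", "-c", device],  # force a check; record bad blocks found
            ["mount", device, mountpoint],   # remount once the check is done
        ]
    if kind == "root":
        # The running root fs cannot be unmounted (device is unused here).
        # Assumption: the boot scripts pass the contents of /fsckoptions to fsck.
        return [
            ["sh", "-c", "echo '-f -c' >/fsckoptions"],
            ["reboot"],
        ]
    raise ValueError(f"unknown partition kind: {kind!r}")


def run(plan: List[Command], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))  # shell-quoted, safe to copy and paste
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run(remap_plan("/dev/hda2", "swap")) prints swapoff /dev/hda2, mkswap -c /dev/hda2 and swapon /dev/hda2 on separate lines without touching the disk. Defaulting to a dry run is deliberate: these commands should be reviewed by hand before anything is executed as root.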
More information about the Beowulf mailing list
