Disk reliability (Was: Node cloning)
Mark Hahn hahn at coffee.psychology.mcmaster.ca
Sun May 27 09:23:02 PDT 2001
> > > You can try using hdparm to turn the DMA off. Of course, it does slow
> > > down data transfer rates considerably.
> >
> > As Mark said, BadCRC only means that the transfer was retried. If a few
> > BadCRC messages are the only problem, I would not turn off DMA.
>
> What size of CRCs are being used? If it's a 32-bit CRC and the errors
> involved are likely to involve several bits, I think your chances of
> having an uncaught data error are only four billion to one. Four
> billion microseconds is about eighty minutes, a billion milliseconds
> is about a month and a half, and four billion seconds is about 125
> years.

hmm, I'll admit I never actually looked at the details. the CRC is 16b
(not really surprising, since ATA is that wide):

    G(X) = X^16 + X^12 + X^5 + 1

so I think your point was to be less blasé about BadCRC reports,
and you're certainly right.

hmm, so the chance of undetected errors depends on transfers/second, right?
so figuring a worst-case ATA100 and nothing but 4K transfers, we'd see
something like 20K t/s.

hmm, how do you go from those numbers to mean time to undetected failure?
I think your back-of-envelope numbers were assuming 1 transfer per us, right?
so with a 16b CRC, you'd expect an uncaught error in 64K/20K = ~3 s.
but is that assuming some particular distribution of errors?

thanks, mark hahn.
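
The back-of-envelope arithmetic in this message can be written out as a small sketch. It rests on assumptions that are not established in the thread: that a corrupted transfer is scrambled enough to pass an n-bit CRC with probability about 2^-n, and that transfers fail independently. The function name and the `corrupt_fraction` parameter are hypothetical, introduced only to show how the answer scales when not every transfer is corrupted.

```python
def mean_time_to_undetected_error(crc_bits, transfers_per_sec, corrupt_fraction=1.0):
    """Expected seconds until a corrupted transfer slips past the CRC.

    Assumption: a randomly corrupted transfer passes an n-bit CRC with
    probability about 2**-n, and transfers are independent.
    corrupt_fraction is the (hypothetical) share of transfers corrupted.
    """
    p_miss = 2.0 ** -crc_bits
    undetected_per_sec = transfers_per_sec * corrupt_fraction * p_miss
    return 1.0 / undetected_per_sec


# 16-bit CRC, 20K transfers/s, every transfer corrupted (the 64K/20K case).
worst_case_16 = mean_time_to_undetected_error(16, 20_000)

# Same link, but only one transfer in a million corrupted (illustrative value).
rare_errors_16 = mean_time_to_undetected_error(16, 20_000, corrupt_fraction=1e-6)

# 32-bit CRC at one corrupted transfer per microsecond (the quoted estimate).
quoted_case_32 = mean_time_to_undetected_error(32, 1_000_000)

print(f"16b, all transfers bad:   {worst_case_16:.2f} s")
print(f"16b, 1 in 1e6 bad:        {rare_errors_16 / 86400:.1f} days")
print(f"32b, 1 bad transfer/us:   {quoted_case_32 / 60:.1f} min")
```

The 3 s figure is therefore the time to an undetected error only if every transfer is corrupted. The estimate scales inversely with the fraction of transfers that actually arrive damaged, which is where the distribution of errors enters.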
More information about the Beowulf mailing list
