[Beowulf] Consumer vs. Enterprise Hard Drives in Clusters
Eugen Leitl eugen at leitl.org
Sat Jan 24 01:49:46 PST 2009
On Fri, Jan 23, 2009 at 02:13:02PM -0800, Bill Broadley wrote:

> I've seen little correlation between weight and vibration. After all even the
> built like a tank hardware is still noisy.

If yelling at a RAID array in a noisy data center causes a latency peak,
obviously the drives themselves are susceptible. The cover plate is thin,
after all. Another reason to look forward to SSDs.

> Just a delay between read/write and the answer. Usually there is a timeout,
> after all a completely dead drive might never answer.

Does anyone know whether WDTLER.EXE
(http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery) still works on
modern non-RAID-Edition WD Green lines? The price difference is some 50 EUR
for TByte drives.

> Well you don't want the drive hiding the fact that you had to retry 10 times
> to read a sector. Sure smartctl can track this kind of thing, strangely

I should make it a habit to read the SMART trend reports for my drive
population.

> hardware RAID controllers often hide that info from the operating system.
> Basically for a raid you want a yes you have this block or no you don't have a
> block within a fairly low time windows. Especially in the gruesome case of a
> manual rebuild where you don't want the marginal sectors sending your drive
> into la la land preventing you from getting the perfectly healthy blocks off.
>
> It all comes down to it's easier to deal with a sorry, can't get that block
> within 50ms then handle a drive that disappears for 10's of seconds at a time.
>
> The kind of nightmare scenarios I've seen is a 16 disk array bit rot starts,
> the array looks perfect, but of course the number of invisible retries starts
> increasing. If you are using a pathetically old kernel (like say the standard
> RHEL kernel) you don't have ECC scrubbing. Then of course a drive drops, you

Apropos scrubbing, is chipkill worth it? Some AMD systems I've seen have
ECC buffered DIMMs with chipkill.

> go to rebuild, then a 2nd drive hits an error (that has been silent till now).
> Then you are in a position where you want to scan all drives and hope that
> the errors that you find are not aligned with the errors on other drives.
> With RAID edition drives you can do such a rebuild in a reasonable amount of
> time, with desktop drives, even one that is 99% good blocks can lead to very
> high rebuild times.

I'm aware of the problem, and am looking at FreeNAS 0.7 (currently pre-alpha)
with scrubbing and ZFS/RAID-Z for self-healing.

> I'm guessing that when a 120MB/sec consumer drive is providing 20-30MB/sec
> that it's service life is shortened, but I've no numbers to back that up. In
> the same conditions a raid edition drive provided 75MB/sec or so with or
> without vibration.

As another anecdote, I had the 7200.11 TByte line perform awfully on DB-like
tasks, with a lot of issues reported by SMART and failures during use (one
RAID 1 failed to rebuild because the second drive died during reconstruction).

> Manufacturers are starting to mention the number of drives in a RAID... they
> seem to be differentiating between single drive, 2-4 drive arrays, and larger.

...
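To put rough numbers on the rebuild-time point quoted above, here is a back-of-the-envelope sketch. It assumes a rebuild is bounded by reading a whole 1 TByte drive once at a steady rate, plus a fixed stall for every marginal sector. The 75 MB/sec and 25 MB/sec rates are borrowed from the vibration figures in the thread (not measured rebuild rates), the 50 ms and 10 s stalls come from the quoted timeout discussion, and the count of 1000 marginal sectors is purely hypothetical.

```python
# Rough rebuild-time estimate. All inputs are illustrative assumptions,
# not measurements of any particular drive or controller.

def rebuild_hours(capacity_bytes, throughput_bytes_per_s, marginal_sectors, stall_s):
    """Hours to read a whole drive once, plus a fixed stall per marginal sector."""
    streaming_s = capacity_bytes / throughput_bytes_per_s
    stalled_s = marginal_sectors * stall_s
    return (streaming_s + stalled_s) / 3600.0

TB = 10**12
MB = 10**6
MARGINAL = 1000  # hypothetical number of marginal sectors on the drive

# Drive that reports failure after about 50 ms, streaming at 75 MB/sec
fast_fail = rebuild_hours(1 * TB, 75 * MB, MARGINAL, 0.05)

# Drive that retries internally for about 10 s, streaming at 25 MB/sec
slow_retry = rebuild_hours(1 * TB, 25 * MB, MARGINAL, 10.0)

print(f"fast-fail drive:  {fast_fail:.1f} h")
print(f"slow-retry drive: {slow_retry:.1f} h")
```

Under these assumptions the stalls add under a minute on the fast-fail drive but nearly three hours on the slow-retry one, on top of the slower streaming rate. Real rebuilds depend on the controller, array load, and how the RAID layer reacts to timeouts, so treat this only as an order-of-magnitude illustration.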
