[Beowulf] Re: real hard drive failures
Maurice Hilarius maurice at harddata.com
Sun Jan 30 09:28:42 PST 2005
Some observations:

>Date: Tue, 25 Jan 2005 13:42:05 -0800 (PST)
>From: Alvin Oga <alvin at Mail.Linux-Consulting.com>
>
>i'd add 1 or 2 cooling fans per ide disk, esp if its 7200rpm or 10,000 rpm
>disks

Adding fans makes some assumptions:

1) There is inadequate chassis cooling in the first place. If that is the
case, one should consider a better chassis. If the drives are not being
cooled, then what else is also not properly cooled?

2) To add a fan effectively, one must have sufficient input of outside air,
and sufficient exhaust capacity in the chassis to move out the heated air.
In my experience the biggest deficiency in most chassis is the latter.
Simply adding fans on the front input side, without sufficient exhaust
capacity, adds little real air flow. Think of most chassis as a funnel: you
can only push in as much air as there is capacity for it to escape at the
back. More fans do not add much more flow unless they can raise the pressure
inside the case enough to force more air out of the back. Your average small
axial fan generates extremely small pressure. In effect the air flow will be
stalled, in most cases.

3) Adding fans requires some place to mount them so that the airflow passes
over the hard disks. Most chassis used in clusters do not provide that space
and location.

4) Adding fans often creates additional maintenance issues and failure
points. Typical small fans have high spin rates and correspondingly high
failure rates. If the survival of a hard disk depends on the fan, and the
fan has a short life, what are you gaining in terms of lifespan? A fan with
a 1 year lifespan to cool a hard disk with a 5 year lifespan is a waste of
time or, at best, a huge maintenance burden.

>>We have used mostly Western Digital (WD) drives for > 4 years. We use the
>>higher rpm and larger cache varieties ...
>
>8MB cache versions tend to be better

True, which is why WD sells those as their "Special Edition" (JB) variant
with a 3 year warranty, and the 2MB (BB) variants with 1 year.

>>We also used IBM 60GB drives for a while and some of you will have experienced
>>that mess ... approx. 80% failure over 1 year time frame!
>
>80% failure is way way ( 15x) too high, but if its deskstar ( from
>thailand) than, those disks are known to be bad

The "bad drives" mainly came from their now defunct Hungarian plant. The
Thailand plant products had few problems.

>if it's not the deskstar, than you probably have a vendor problem
>of the folks that sold those disks to you

Maxtor drives have had very high failure rates over the last 3 years. That
probably prompted them to lead the rush to 1 year warranties 2.5 years ago.
WD did very well in the market by keeping the 3 year Special Edition drives
available, and recently Seagate, then Maxtor, came back with longer
warranties, now generally 5 years.

What is telling is that their product does not seem to have improved in
design reliability. This is ALL about marketing.

Also worth considering is whether the company will be around in 5 years to
honor that warranty. With Seagate and Maxtor on a diet of steady losses for
at least 3 years, that is a real question. WD, OTOH, have been making a
profit while selling 3-5 year warranty drives.

>>WD 80GB drives in the field for 1+ years, [~500 drives] "ARRRRGGGG!" ~15%
>>failure and increasing. I send out 3-5 replacement drives every month.
>
>probably running too hot ...
>needs fans cooling the disks
> - get those "disk coolers with 2 fans on it )

Agreed (but see comment above). Also, he probably has the "cheaper" BB model
rather than the better "JB" on those 80's.

>>I'm moving to a 3 drive raid5 setup on each node (drives are cheap, down time
>>is not) and considering changing to Seagate SATA drives anyone care to offer
>>opinions or more anecdotes? :-)

Average. WD are slightly more reliable in our experience (we sell several
thousand drives a year), as long as you stick to JB, JD, or SD models.
Hitachi and Seagate tie for 2nd, Maxtor are last. BTW, Hitachi took over the
IBM drive business, but most of the product line is new, so these are not
the same as the older infamous "deathstar" drives.

>== using 4 drive raid is better ... but is NOT the solution ==
>
> - configuring raid is NOT cheap ...

Why? Most modern boards support 4 IDE devices and 4 S-ATA devices. Using
mdadm to configure and maintain a RAID is trivial. Onboard "RAID" on
integrated controllers is not standardized, and is usually limited to RAID 0
and 1, whereas software RAID allows RAID 5, 6, and mixed RAID types on the
same disks. Configuring RAID10 on a system entails twice as many drives, but
provides much greater reliability of data with virtually no overhead or
performance loss.

> - fixing raid is expensive time ... (due to mirroring and syncing)
>
> - if downtime is important, and should be avoidable, than raid
>   is the worst thing, since it's 4x slower to bring back up than
>   a single disk failure

I disagree. You have no downtime on a RAID if you incorporate a redundant
RAID scheme. If the interface supports swapping out disks, you need never
shut down to deal with a failed disk. If you have to change drives
immediately when they fail, maybe you do need a better controller. OTOH,
shutdown time to change a disk on a decent chassis is under 1 minute.
Depends on your needs.
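
To put rough numbers on the drive-count trade-off between the RAID levels
mentioned above, here is a minimal Python sketch. It assumes identical
drives and the textbook layouts only; the function name raid_summary and the
80GB drive size are illustrative choices, and it says nothing about rebuild
time or performance.

```python
# Minimum drive counts for the textbook layouts (an assumption of this sketch).
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_summary(level, drives, size_gb):
    """Return (usable_gb, guaranteed_failures_survived) for one array.

    "Guaranteed" is the worst case: RAID10 is counted as surviving one
    failure, although it survives more when the failed drives happen to
    sit in different mirror pairs.
    """
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DRIVES[level]} drives")
    if level == 10 and drives % 2:
        raise ValueError("RAID10 needs an even number of drives")

    if level == 0:   # striping only, no redundancy
        return drives * size_gb, 0
    if level == 1:   # every drive holds a full copy
        return size_gb, drives - 1
    if level == 5:   # one drive's worth of parity
        return (drives - 1) * size_gb, 1
    if level == 6:   # two drives' worth of parity
        return (drives - 2) * size_gb, 2
    # level == 10: striped mirror pairs, half the raw capacity
    return (drives // 2) * size_gb, 1


if __name__ == "__main__":
    for level, drives in [(5, 3), (5, 4), (6, 4), (10, 4)]:
        usable, survives = raid_summary(level, drives, 80)
        print(f"RAID{level:<2} x{drives}: {usable} GB usable, "
              f"survives {survives} failure(s) guaranteed")
```

With 80GB drives, a 3 drive RAID5 and a 4 drive RAID10 both come out at
160GB usable, which is the "twice as many drives" cost of RAID10 relative to
the data it holds.
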

> - raid will NOT prevent your downtime, as that raid box
>   will have to be shutdown sooner or later

Simply not true. As long as the controller supports removing and adding
devices, and as long as your chassis has disk trays to support hot-swap,
there is ZERO downtime. If you have redundant RAID you can delay the
shutdown until a time that is convenient to you. You have to shut down for
some form of scheduled maintenance at least once in a while.

The price penalty is fairly light. For example, our 1U cluster node chassis
have 4 hotswap S-ATA or SCSI trays and redundant disk cooling fans; add a
4 port 3Ware controller and you pay a price premium of only $280. Not
including extra disks, of course.

What downtime is worth to you is the main question YOU have to answer.

With our best regards,

Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.            FAX:       01-780-456-9772
11060 - 166 Avenue        email:maurice at harddata.com
Edmonton, AB, Canada      http://www.harddata.com/
   T5X 1Y3
