[Beowulf] Re: failure trends in a large disk drive population (google fileing system)
momentics momentics at gmail.com
Mon Feb 19 01:00:26 PST 2007
- Previous message: [Beowulf] Re: failure trends in a large disk drive population (google fileing system)
- Next message: [Beowulf] Re: failure trends in a large disk drive population
On 2/19/07, matt jones <jamesjamiejones at aol.com> wrote:
> if one fails there
> are still 3, if another there are still 2. i've also read somewhere else
> that if one fails, it can automatically recreate the image from the
> remaining ones on a spare node.
[...]
> this approach is rather ott, but it works and works well.

Not sure about the Google gents, but we use a reliability model to calculate the number of nodes and their physical locations (continuous scheduling), so as to meet the expected reliability coefficient specified by the system operator/deployer/configurator (for the EE, SW and HW parts).

An HDD is an unreliable part of the system with a roughly known (actually, expected) reliability. Moreover, as we know, most HDDs have SMART metrics, which are a good way to correct the live coefficients within the math model in use. The upshot is to use adaptive techniques.

So Google is probably doing it the same way. A good company anyhow... ta-da! :)

Scal at Grid - http://sgrid.sourceforge.net/
// (the perfect doc - the amazing work)
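A rough sketch of the kind of calculation described in the message, not the actual Scal at Grid model: it assumes node (disk) failures are independent, so data replicated on n nodes survives with probability 1 - p^n, and it picks the smallest n that meets the operator's target. The names `replicas_needed`, `adjusted_p_fail` and the `smart_risk_factor` multiplier are hypothetical, invented here only to illustrate how a SMART-derived correction could feed back into the model.

```python
def adjusted_p_fail(base_p_fail: float, smart_risk_factor: float) -> float:
    """Scale the expected failure probability by a SMART-derived factor.

    smart_risk_factor is a hypothetical multiplier (1.0 = no change);
    how it would be derived from SMART attributes is not specified here.
    """
    if base_p_fail <= 0.0 or smart_risk_factor <= 0.0:
        raise ValueError("probability and factor must be positive")
    # Keep the result a valid probability strictly below 1.
    return min(base_p_fail * smart_risk_factor, 0.999999)


def replicas_needed(p_fail: float, target: float) -> int:
    """Smallest n such that 1 - p_fail**n >= target.

    Assumes failures are independent, which is an idealization.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be in (0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while 1.0 - p_fail ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    base = 0.05      # assumed failure probability over the planning period
    target = 0.9999  # reliability coefficient requested by the operator
    print(replicas_needed(base, target))
    # Same target after SMART data suggests the disks are 3x riskier.
    print(replicas_needed(adjusted_p_fail(base, 3.0), target))
```

With these made-up numbers, a worse SMART reading raises the required replica count, which is the adaptive behaviour the message alludes to. A real model would also have to account for correlated failures (shared rack, power, or batch of drives), which is presumably why physical location enters the scheduling.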
