
[Beowulf] Re: failure trends in a large disk drive population (Google file system)


momentics momentics at gmail.com
Mon Feb 19 01:00:26 PST 2007


On 2/19/07, matt jones <jamesjamiejones at aol.com> wrote:

> if one fails there
> are still 3; if another fails, there are still 2. I've also read somewhere else
> that if one fails, it can automatically recreate the image from the
> remaining ones on a spare node.

[...]

> This approach is rather OTT (over the top), but it works, and works well.
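The replicate-and-recover scheme quoted above can be sketched roughly as follows. This is a hypothetical illustration, not how Google's file system is actually implemented: the function name `handle_failure`, the node names, and the target of 4 copies (inferred from "if one fails there are still 3") are all assumptions.

```python
# Hypothetical sketch of the quoted scheme: each chunk keeps a fixed number
# of replicas, and when a node holding one fails, a new copy is made from a
# surviving replica onto a spare node.

TARGET_REPLICAS = 4  # assumed: "if one fails there are still 3" implies 4 copies


def handle_failure(replicas, failed_node, spare_nodes, target=TARGET_REPLICAS):
    """Return the replica set after failed_node is lost and re-replication runs.

    replicas    -- set of node names currently holding a copy of the chunk
    failed_node -- node that just died
    spare_nodes -- healthy nodes, in order of preference, that may take a copy
    """
    survivors = set(replicas) - {failed_node}
    if not survivors:
        # Nothing left to copy from: the data is gone.
        raise RuntimeError("all replicas lost; chunk cannot be recovered")
    for node in spare_nodes:
        if len(survivors) >= target:
            break
        if node != failed_node and node not in survivors:
            # A real system would copy the chunk from any survivor here.
            survivors.add(node)
    return survivors


if __name__ == "__main__":
    chunk = {"n1", "n2", "n3", "n4"}
    chunk = handle_failure(chunk, "n2", ["n5", "n6"])
    print(sorted(chunk))  # ['n1', 'n3', 'n4', 'n5']
```

Spare nodes that already hold a copy are skipped, so the replica count returns to the target without placing two copies on one node.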


I'm not sure about the Google folks, but we use a reliability model to
calculate the number of nodes and their physical locations (continuous
scheduling) needed to meet the expected reliability coefficient
specified by the system operator/deployer/configurator (for the EE, SW,
and HW parts).
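A minimal sketch of that kind of calculation, assuming the simplest possible model: each replica fails independently with the same probability `p_fail` over some period, so all n copies are lost with probability `p_fail**n`. The function name and the example numbers are hypothetical; a real model would also account for correlated failures, which is one reason to spread replicas across physical locations.

```python
import math


def replicas_needed(p_fail: float, target_availability: float) -> int:
    """Smallest replica count n such that 1 - p_fail**n >= target_availability.

    Assumes every replica fails independently with the same probability
    p_fail over the period of interest, so all n are lost with
    probability p_fail**n.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be in (0, 1)")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be in (0, 1)")
    # p_fail**n <= 1 - target  <=>  n >= log(1 - target) / log(p_fail)
    n = math.ceil(math.log(1.0 - target_availability) / math.log(p_fail))
    # Nudge upward if floating-point rounding left us just short.
    while 1.0 - p_fail ** n < target_availability:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative numbers only: a 5% per-period disk failure probability
    # and a "five nines" target.
    print(replicas_needed(0.05, 0.99999))  # 4
```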

The HDD is an unreliable system part whose reliability is roughly known
(expected, actually). Moreover, as we know, most HDDs report SMART
metrics, which are a good way to correct the live coefficients in the
math model we use. The upshot is to use adaptive techniques.
So Google probably works the same way; a good company anyhow... ta-da! :)
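One way the adaptive idea might look, as a sketch only: start from a baseline per-disk failure probability, scale it up when SMART counters are non-zero, and pick the healthiest disks until the chance that every replica fails fits the operator's budget. `BASE_P_FAIL`, the counter names, and the multipliers in `RISK_MULTIPLIERS` are placeholders, not values from the Google study or from this post, and the independence assumption is the same simplification as above.

```python
from dataclasses import dataclass, field

BASE_P_FAIL = 0.03  # assumed baseline failure probability per period
# Assumed, illustrative risk multipliers applied when a SMART counter is
# non-zero (placeholder keys, not raw SMART attribute IDs).
RISK_MULTIPLIERS = {
    "reallocated_sectors": 3.0,
    "pending_sectors": 4.0,
    "scan_errors": 5.0,
}


@dataclass
class Disk:
    name: str
    smart: dict = field(default_factory=dict)  # counter name -> raw count


def failure_probability(disk: Disk) -> float:
    """Baseline probability, scaled up for each non-zero SMART counter."""
    p = BASE_P_FAIL
    for attr, mult in RISK_MULTIPLIERS.items():
        if disk.smart.get(attr, 0) > 0:
            p *= mult
    return min(p, 0.99)


def choose_replicas(disks, target_availability):
    """Add the healthiest disks until P(all chosen replicas fail) fits the budget."""
    loss_budget = 1.0 - target_availability
    chosen, p_all_fail = [], 1.0
    for disk in sorted(disks, key=failure_probability):
        chosen.append(disk.name)
        p_all_fail *= failure_probability(disk)  # independence assumption
        if p_all_fail <= loss_budget:
            return chosen
    raise RuntimeError("not enough disks to meet the target")


if __name__ == "__main__":
    disks = [
        Disk("a"),
        Disk("b"),
        Disk("c", {"reallocated_sectors": 12}),
        Disk("d", {"pending_sectors": 2, "scan_errors": 1}),
    ]
    print(choose_replicas(disks, 0.999))   # ['a', 'b']
    print(choose_replicas(disks, 0.9999))  # ['a', 'b', 'c']
```

Re-running the selection whenever fresh SMART readings arrive is what makes it adaptive: a disk whose counters start climbing drifts down the ranking, and its replicas can be moved elsewhere.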

Scal at Grid http://sgrid.sourceforge.net/

//
(the perfect doc - the amazing work)



