[Beowulf] MD check/scrub

Tue Nov 13 11:52:58 PST 2007

Bill Broadley <bill at cse.ucdavis.edu> writes:

> Leif Nixon wrote:
>> Reconstruction. With raid 6, you can recover from single-disk
>> corruption (As opposed to *failures*, where you get read errors from a
>> disk. Raid 6 can handle two simultaneous disk *failures*.).
>> See section 4 in:
>> http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
>>
>
> I just read it.
>
>> Just recalculating the parity blocks does give you a consistent raid
>> stripe, but destroys your data (unless it actually was one of the
>> parity blocks that was corrupted).
>
> Er, that's not how I read it at all.  To quote:
>
>  In the case of data drive corruption, once the faulty drive has been
> identified, recover using the P drive in the same way as a one-disk
> erasure failure.

I think you misunderstood me. This quote is about how it *should* be
done. My point is that, as far as we can tell, many raid controllers,
as well as the current md driver, don't do this.

If they find an inconsistent stripe they don't try to identify the
corrupt block. Instead, they dumbly *recompute P and Q*, which of
course makes the stripe consistent, but *leaves the corrupt data in
place*.
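
To make the difference concrete, here is a rough Python sketch of the
two behaviours. It is only an illustration of the arithmetic in section
4 of the paper, not the md driver's code or any controller's firmware.
It assumes the usual RAID-6 field GF(2^8) with polynomial 0x11d and
generator g = 2, and that at most one block per stripe is silently
corrupt. The function names (syndromes, check_stripe, repair_stripe,
naive_resync) are made up for this example.

```python
# GF(2^8) exp/log tables, polynomial 0x11d, generator g = 2 (assumed field).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def xor(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def syndromes(data):
    """Compute (P, Q) for a list of equal-length data blocks."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for z, block in enumerate(data):
        coeff = EXP[z]  # g^z for data disk z
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def naive_resync(data):
    """What this thread complains about: trust the data, rewrite P and Q."""
    p, q = syndromes(data)
    return list(data), p, q


def check_stripe(data, p, q):
    """Classify a stripe, assuming at most one corrupt block.

    Returns ("ok", None), ("P", None), ("Q", None), ("data", z),
    or ("multiple", None) if no single-block explanation fits.
    """
    p_calc, q_calc = syndromes(data)
    p_star, q_star = xor(p, p_calc), xor(q, q_calc)
    if not any(p_star) and not any(q_star):
        return ("ok", None)
    if not any(q_star):
        return ("P", None)  # only P disagrees, so P itself is bad
    if not any(p_star):
        return ("Q", None)  # only Q disagrees, so Q itself is bad
    candidates = set()
    for ps, qs in zip(p_star, q_star):
        if ps == 0 and qs == 0:
            continue  # this byte position is undamaged
        if ps == 0 or qs == 0:
            return ("multiple", None)
        # Q*/P* = g^z, so z = log(Q*) - log(P*) mod 255
        candidates.add((LOG[qs] - LOG[ps]) % 255)
    if len(candidates) != 1:
        return ("multiple", None)
    z = candidates.pop()
    return ("data", z) if z < len(data) else ("multiple", None)


def repair_stripe(data, p, q):
    """Locate the bad block, then rebuild it like a one-disk erasure."""
    kind, z = check_stripe(data, p, q)
    if kind == "ok":
        return list(data), p, q
    if kind in ("P", "Q"):
        return naive_resync(data)  # data is fine, parity can be recomputed
    if kind == "multiple":
        raise ValueError("more than one block appears to be corrupt")
    p_star = xor(p, syndromes(data)[0])
    fixed = list(data)
    fixed[z] = xor(data[z], p_star)  # D_z = X_z xor P*
    return fixed, p, q
```

Flip one byte in a data block and feed the stripe to both functions:
naive_resync() hands back a stripe that check_stripe() reports as "ok"
with the flipped byte still in it, while repair_stripe() returns the
original data.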

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------