[Beowulf] RAID5 rebuild, remount with write without reboot?

Peter St. John peter.st.john at gmail.com
Tue Sep 5 10:58:51 PDT 2017


Aren't the drives in the RAID hot-swappable? Wouldn't removing the defective
drive and installing a new one have cycled power on those two? But I'm weak
at hardware, and have never knowingly relied on firmware on a disk.

On Tue, Sep 5, 2017 at 1:52 PM, Andrew Latham <lathama at gmail.com> wrote:

> Without a power cycle, updating the drive firmware would be the only way to
> trick the drives into power-cycling. Obviously very risky. A reboot should
> be low risk.
>
> On Tue, Sep 5, 2017 at 12:28 PM, mathog <mathog at caltech.edu> wrote:
>
>> Short form:
>>
>> An 8-disk (all 2 TB SATA) RAID5 on an LSI MR-USAS2 SuperMicro controller
>> (lspci shows " LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon]")
>> system was long ago configured with a small partition of one disk as /boot
>> and logical volumes for / (root) and /home on a single large virtual drive
>> on the RAID.  Due to disk problems and an own goal (see below) the array
>> went into a degraded=1 state (as reported by megacli) and write locked both
>> root and home.  When the failed disk was replaced and the rebuild completed
>> those were both still write locked.  "mount -a" didn't help in either
>> case.  A reboot brought them up normally but ideally that should not have
>> been necessary.  Is there a method to remount the logical volumes writable
>> that does not require a reboot?
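
For reference, "mount -a" generally skips filesystems that are already mounted, so the usual first thing to try is an explicit remount. A minimal sketch, assuming findmnt from util-linux is available; is_readonly and remount_rw are made-up names for illustration. The kernel may still refuse the remount if the filesystem itself recorded errors (for example an aborted journal), in which case an unmount and fsck, or a reboot, would still be needed.

```shell
#!/bin/sh
# Succeed (exit status 0) if the comma-separated mount options in $1
# contain the standalone option "ro".
is_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Remount the mount point $1 read-write, but only if it is currently
# mounted read-only.  Needs root.
remount_rw() {
    opts=$(findmnt -n -o OPTIONS "$1") || return 1
    if is_readonly "$opts"; then
        mount -o remount,rw "$1"
    fi
}

# Example (not run here):
#   remount_rw / && remount_rw /home
```

Checking the options first keeps the sketch from touching a filesystem that is already writable.
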
>>
>> Long form:
>>
>> Periodic testing of the disks inside this array turned up pending sectors
>> with this command:
>>
>>    smartctl -a  /dev/sda -d sat+megaraid,7
>>
>> A replacement disk was obtained and the usual replacement method applied:
>>
>> megacli -pdoffline -physdrv[64:7] -a0
>> megacli -pdmarkmissing -physdrv[64:7] -a0
>> megacli -pdprprmv -physdrv[64:7] -a0
>> megacli -pdlocate -start -physdrv[64:7] -a0
>>
>> The disk with the flashing light was physically swapped.  The smartctl
>> command was run again and unfortunately its values were unchanged.  I had
>> always assumed that the "7" in that smartctl command was a physical slot;
>> it turns out that it is actually the "Device ID".  In my defense the
>> smartctl man page does a very poor job describing this:
>>
>>   megaraid,N - [Linux only] the device consists of one or more SCSI/SAS disks
>>   connected to a MegaRAID controller.  The non-negative integer N (in
>>   the range of 0 to 127 inclusive) denotes which disk on the controller
>>   is monitored.  Use syntax such as:
>>
>> In this system, unlike the others I had worked on previously, Device IDs
>> and slots were not 1:1.
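
The slot-to-Device-ID mapping can be read off the controller's physical drive listing before smartctl is pointed at a disk. A sketch, assuming the listing prints a "Slot Number:" line followed by a "Device Id:" line for each drive (the labels MegaCli's -PDList output is expected to use; check against your version). slot_to_devid is just an illustrative name.

```shell
#!/bin/sh
# Read a MegaCli-style physical drive listing on stdin and print one
# "slot N -> device ID M" line per drive.
# Assumes each drive block has "Slot Number: N" before "Device Id: M".
slot_to_devid() {
    awk -F': *' '
        /^Slot Number/ { slot = $2 }
        /^Device Id/   { print "slot " slot " -> device ID " $2 }
    '
}

# Example (not run here):
#   megacli -PDList -a0 | slot_to_devid
```

The number in the "device ID" column is the one to pass as N in "-d sat+megaraid,N", while the slot number is the one used in -physdrv[64:slot].
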
>>
>> Anyway, about a nanosecond after this was discovered the disk at Device
>> ID 7 was marked as Failed by the controller, whereas previously it had been
>> "Online, Spun Up".  Ugh.  At that point the logical volumes were all set
>> read only and the OS became barely usable, with commands like "more" no
>> longer functioning.  Megacli and sshd, thankfully, still worked.  Figuring
>> that I had nothing to lose, I removed the replacement disk from slot 7 and
>> put the original, hopefully still good, disk back.  That put the system
>> into this state.
>>
>> slot 4 (device ID 7) failed.
>> slot 7 (device ID 5) is Offline.
>>
>> and
>>
>> megacli -PDOnline -physdrv[64:7] -a0
>>
>> put it at
>>
>> slot 4 (device ID 7) failed.
>> slot 7 (device ID 5) Online, Spun Up
>>
>> The logical volumes were still read only but "more" and most other
>> commands now worked again.  Megacli still showed the "degraded" value as
>> 1.  I'm still not clear how the two "read only" states differed to cause
>> this change.
>>
>> At that point the failed disk in slot 4 (not 7!) was replaced with the
>> new disk (which had been briefly in slot 7) and it immediately began to
>> rebuild.  Something on the order of 48 hours later that rebuild completed,
>> and the controller set "degraded" back to 0.  However, the logical volumes
>> were still read only.  "mount -a" didn't fix it, so the system was rebooted,
>> which worked.
>>
>>
>> We have two of these backup systems.  They are supposed to have
>> identical contents but do not.  Fixing that is another item on a long todo
>> list.  RAID 6 would have been a better choice for this much storage, but it
>> does not look like this card supports it:
>>
>>   RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with spanning,
>>   SRL 3 supported, PRL11-RLQ0 DDF layout with no span,
>>   PRL11-RLQ0 DDF layout with span
>>
>> That rebuild is far too long for comfort.  Had another disk failed in
>> those two days that would have been it. Neither controller has battery
>> backup, and the one in question is not even on a UPS, so a power glitch
>> could be fatal too. Not a happy thought while record SoCal temperatures
>> persisted throughout the entire rebuild! The systems are in different
>> buildings on the same campus, sharing the same power grid.  There are no
>> other backups for most of this data.
>>
>> Even though the controller shows this system as no longer degraded,
>> should I believe that there was no data loss?  I can run checksums on all
>> the files (even though it will take forever) and compare the two systems.
>> But as I said previously, the files were not entirely 1:1, so there are
>> certainly going to be some files on this system which have no match on the
>> other.
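
Since the two trees are not expected to match 1:1, a plain diff of checksum lists would be noisy. A sketch that builds a checksum manifest per system and then reports only the files present in both whose checksums differ. It assumes GNU coreutils sha256sum and file names without embedded newlines; make_manifest and changed_files are illustrative names.

```shell
#!/bin/sh
# Print "checksum  ./relative/path" for every regular file under directory $1.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + )
}

# Given two manifest files, print the paths that appear in both
# but with different checksums.  Files found on one side only are ignored.
changed_files() {
    awk '
        { sum = $1; path = substr($0, length($1) + 3) }
        FILENAME == ARGV[1] { seen[path] = sum; next }
        (path in seen) && seen[path] != sum { print path }
    ' "$1" "$2"
}

# Example (not run here):
#   make_manifest /home > a.sums      # on system A
#   make_manifest /home > b.sums      # on system B, then copy across
#   changed_files a.sums b.sums
```

This only shows where the two copies disagree. It cannot say which copy is the good one, and it says nothing about files that exist on one system only.
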
>>
>> Regards,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
>
>
> --
> - Andrew "lathama" Latham lathama at gmail.com http://lathama.com
> <http://lathama.org> -
>