[Beowulf] RAID5 rebuild, remount with write without reboot?
Peter St. John
peter.st.john at gmail.com
Tue Sep 5 10:58:51 PDT 2017
Aren't the drives in the RAID hot-swappable? Removing the defective drive
and installing a new one certainly cycled power on those two? But I'm weak
at hardware, and have never knowingly relied on firmware on a disk.
On Tue, Sep 5, 2017 at 1:52 PM, Andrew Latham <lathama at gmail.com> wrote:
> Without a power cycle, updating the drive firmware would be the only method
> of tricking the drives into a power cycle. Obviously very risky. A reboot
> should be low risk.
> On Tue, Sep 5, 2017 at 12:28 PM, mathog <mathog at caltech.edu> wrote:
>> Short form:
>> An 8 disk (all 2 TB SATA) RAID5 on an LSI MR-USAS2 SuperMicro controller
>> (lspci shows " LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon]")
>> system was long ago configured with a small partition of one disk as /boot
>> and logical volumes for / (root) and /home on a single large virtual drive
>> on the RAID. Due to disk problems and a self goal (see below) the array
>> went into a degraded=1 state (as reported by megacli) and write locked both
>> root and home. When the failed disk was replaced and the rebuild completed
>> those were both still write locked. "mount -a" didn't help in either
>> case. A reboot brought them up normally but ideally that should not have
>> been necessary. Is there a method to remount the logical volumes writable
>> that does not require a reboot?
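
For reference, the usual way to flip an already-mounted filesystem back to
read-write is mount's remount option. "mount -a" only mounts fstab entries
that are not mounted yet, so it leaves the options of a mounted filesystem
alone. Below is a minimal sketch, assuming a Linux /proc/mounts; the helper
name is_readonly is made up for illustration. Whether the remount succeeds
depends on the filesystem: if ext3/ext4 aborted its journal after I/O errors,
the kernel generally refuses remount,rw until the filesystem has been
unmounted and checked, which for / usually means a reboot anyway.

```shell
#!/bin/sh
# is_readonly MOUNTPOINT
# Hypothetical helper: reads /proc/mounts-format text on stdin and returns 0
# if MOUNTPOINT is listed with the "ro" option, non-zero otherwise.
is_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            ro = 0                      # the last matching entry wins
            n = split($4, opt, ",")     # field 4 holds the mount options
            for (i = 1; i <= n; i++) if (opt[i] == "ro") ro = 1
            found = 1
        }
        END { exit !(found && ro) }
    '
}

# Typical use on the live system (as root), once the array is healthy again:
#   if is_readonly /     < /proc/mounts; then mount -o remount,rw /;     fi
#   if is_readonly /home < /proc/mounts; then mount -o remount,rw /home; fi
```

If the remount is refused, dmesg normally says why.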
>> Long form:
>> Periodic testing of the disks inside this array turned up pending sectors
>> with this command:
>> smartctl -a /dev/sda -d sat+megaraid,7
>> A replacement disk was obtained and the usual replacement method applied:
>> megacli -pdoffline -physdrv[64:7] -a0
>> megacli -pdmarkmissing -physdrv[64:7] -a0
>> megacli -pdprprmv -physdrv[64:7] -a0
>> megacli -pdlocate -start -physdrv[64:7] -a0
>> The disk with the flashing light was physically swapped. The smartctl
>> command was run again and unfortunately its values were unchanged. I had
>> always assumed that the "7" in that smartctl command was a physical slot;
>> it turns out that it is actually the "Device ID". In my defense the
>> smartctl man page does a very poor job describing this:
>> megaraid,N - [Linux only] the device consists of one or more SCSI/SAS
>> connected to a MegaRAID controller. The non-negative integer N (in
>> the range of 0 to 127 inclusive) denotes which disk on the controller
>> is monitored. Use syntax such as:
>> In this system, unlike the others I had worked on previously, Device IDs
>> and slots were not 1:1.
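
One way to avoid mixing up slots and Device IDs is to print the mapping
before pulling anything. The sketch below filters the physical drive listing
with awk. The field labels it matches ("Slot Number", "Device Id", "Firmware
state") are how I recall them from MegaCli -PDList output, so treat them as
assumptions and check them against your version; the function name slot_map
is made up for illustration.

```shell
#!/bin/sh
# slot_map
# Hypothetical helper: reads text in the format printed by
# "megacli -PDList -a0" on stdin and prints one line per physical drive:
#   slot <N> -> device ID <M> (<firmware state>)
slot_map() {
    awk -F': *' '
        /^Slot Number/    { slot = $2 }
        /^Device Id/      { id = $2 }
        /^Firmware state/ { printf "slot %s -> device ID %s (%s)\n", slot, id, $2 }
    '
}

# Typical use on the live system (as root):
#   megacli -PDList -a0 | slot_map
# The number after "megaraid," in smartctl is the device ID, not the slot:
#   smartctl -a -d sat+megaraid,<device ID> /dev/sda
```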
>> Anyway, about a nanosecond after this was discovered the disk at Device
>> ID 7 was marked as Failed by the controller whereas previously it had been
>> "Online, Spun Up".
>> Ugh. At that point the logical volumes were all set read only and the OS
>> became barely usable, with commands like "more" no longer functioning.
>> Megacli and sshd, thankfully, still worked. Figuring that I had nothing to
>> lose, I removed the replacement disk from slot 7 and put the original,
>> hopefully still good, disk back. That put the system into this state.
>> slot 4 (device ID 7) failed.
>> slot 7 (device ID 5) is Offline.
>> megacli -PDOnline -physdrv[64:7] -a0
>> put it at
>> slot 4 (device ID 7) failed.
>> slot 7 (device ID 5) Online, Spun Up
>> The logical volumes were still read only but "more" and most other
>> commands now worked again. Megacli still showed the "degraded" value as
>> 1. I'm still not clear how the two "read only" states differed to cause
>> this change.
>> At that point the failed disk in slot 4 (not 7!) was replaced with the
>> new disk (which had been briefly in slot 7) and it immediately began to
>> rebuild. Something on the order of 48 hours later that rebuild completed,
>> and the controller set "degraded" back to 0. However, the logical volumes
>> were still readonly. "mount -a" didn't fix it, so the system was rebooted,
>> which worked.
>> We have two of these backup systems. They are supposed to have
>> identical contents but do not. Fixing that is another item on a long todo
>> list. RAID 6 would have been a better choice for this much storage, but it
>> does not look like this card supports it:
>> RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with
>> SRL 3 supported, PRL11-RLQ0 DDF layout with no span,
>> PRL11-RLQ0 DDF layout with span
>> That rebuild is far too long for comfort. Had another disk failed in
>> those two days that would have been it. Neither controller has battery
>> backup, and the one in question is not even on a UPS, so a power glitch
>> could be fatal too. Not a happy thought while record SoCal temperatures
>> persisted throughout the entire rebuild! The systems are in different
>> buildings on the same campus, sharing the same power grid. There are no
>> other backups for most of this data.
>> Even though the controller shows this system as no longer degraded,
>> should I believe that there was no data loss? I can run checksums on all
>> the files (even though it will take forever) and compare the two systems.
>> But as I said previously, the files were not entirely 1:1, so there are
>> certainly going to be some files on this system which have no match on the
>> other.
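
The checksum comparison described above can be sketched as two small shell
functions: one writes a sha256sum manifest for a directory tree, the other
lists paths that appear in both manifests with different checksums. Files
present on only one side are skipped, since the two systems are known not to
be 1:1. The function names and the example paths are made up for
illustration, and file names containing newlines are not handled.

```shell
#!/bin/sh
# manifest DIR
# Hypothetical helper: prints "<sha256>  <relative path>" for every regular
# file under DIR (the standard sha256sum output format).
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + )
}

# mismatches MANIFEST_A MANIFEST_B
# Prints each path listed in both manifests whose checksums differ.
mismatches() {
    awk '
        { hash = substr($0, 1, 64); path = substr($0, 67) }  # 64 hex chars, 2 separator chars
        NR == FNR   { sum[path] = hash; next }               # first file: remember checksums
        (path in sum) && sum[path] != hash { print path }    # second file: report differences
    ' "$1" "$2"
}

# Typical use (paths are examples only):
#   manifest /home > /tmp/thishost.sha256     # run once on each system
#   mismatches /tmp/thishost.sha256 /tmp/otherhost.sha256
```

A mismatch only says the two copies differ, not which copy is the damaged
one, so each reported file still has to be checked by hand.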
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
> - Andrew "lathama" Latham lathama at gmail.com http://lathama.com
> <http://lathama.org> -