[Beowulf] GPFS and failed metadata NSD

John Hanks griznog at gmail.com
Fri May 19 07:33:23 PDT 2017


There is a potentially lost PhD project there, which was to be defended this
month and which the person may simply give up on, and another project which
represents two years of work but was never copied anywhere else. We also
have many users who don't read notices and who probably aren't aware yet
that there was a problem, so it's possible there is more loss yet to be noticed.
Our environment has multiple scratch filesystems by design, so we are able
to limp along while trying to recover data from this one.

I've been at this for nearly 30 years now, and this is basically a 500 TB
version of the first 5 1/4" floppy disk I was handed with the plea "this is
the only copy of my thesis, can you please get it back..." I think the
lesson has been driven home, for them and me, but we'd still like to get
something back if we can. We will actually be completely down throughout
the month of June for unrelated reasons, so the outage is mitigated a bit by
that planned downtime.

Given the number of floppy disks, Zip disks, Jaz drive cartridges, USB
thumb drives, dead laptops, ... I've been handed over the years, the one
constant take-home lesson is that you can never stop educating people
(including yourself) about the importance of keeping multiple copies of
critical data no matter how stable the storage you use claims to be. It is
ALL going to die eventually.
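
To put a rough number on "stable", here is a minimal Python sketch of the
kind of naive estimate I quote at the bottom of the original message below.
The failure rate and time windows are made-up placeholders, not the inputs
behind the 0.00001758 figure, and the function name is hypothetical. The
model assumes independent failures at a constant rate, which two drives in
the same mirror rarely honor.

```python
import math


def pair_failure_probability(afr: float, window_days: float) -> float:
    """Naive chance that one mirror member fails during a year and its
    partner then fails within `window_days` of it.

    afr         -- assumed annualized failure rate of a single drive (0..1)
    window_days -- assumed length of the vulnerable window in days
    """
    # Convert the yearly failure probability into a constant hazard rate.
    rate_per_year = -math.log(1.0 - afr)
    # Probability that the partner dies inside the window.
    p_partner_in_window = 1.0 - math.exp(-rate_per_year * window_days / 365.0)
    return afr * p_partner_in_window


if __name__ == "__main__":
    ASSUMED_AFR = 0.02  # placeholder value, not a measured rate
    for window in (1, 7, 30):
        p = pair_failure_probability(ASSUMED_AFR, window)
        print(f"window {window:2d} days: {p:.8f}")
```

However small the result looks, it is only as good as the independence
assumption, which is the whole point of keeping a second copy elsewhere.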

jbh


On Fri, May 19, 2017 at 11:57 AM Bogdan Costescu <bcostescu at gmail.com>
wrote:

> Hello John,
>
> I'm quite curious to know what was the thinking behind taking a
> scratch filesystem down for so long, i.e. you wrote to the list on
> 29.04 and likely you experienced the problem some days before, and you
> expect the image from the recovery company to reach you no sooner than
> beginning of June, so a downtime of 6+ weeks? Is there some really
> precious data stored there? Did you do some kind of costs analysis
> comparing recovery costs with recomputing the data? Your description:
>
> "Of course this is a scratch filesystem, of course people were warned
> repeatedly about the risk of using a scratch filesystem that is not
> backed up and of course many ignored that."
>
> fits one of our large filesystems too :) The software side (BeeGFS)
> has been resilient in the face of hardware failures until now, but I'm
> worried about an extended hardware failure (i.e. several disks failing
> at the same time) which would take some part of the data with them...
>
> Cheers,
> Bogdan
>
> On Fri, May 19, 2017 at 9:39 AM, John Hanks <griznog at gmail.com> wrote:
> > Thanks Arif, I'm signed up there now.
> >
> > As a general update, the most recently failed disk of the pair is at a data
> > recovery company who thinks they can recover a workable image from it. We
> > should have that back in two or three weeks and will try to use it to
> > recover the filesystem.
> >
> > jbh
> >
> > On Thu, May 18, 2017 at 5:21 PM Arif Ali <mail at arif-ali.co.uk> wrote:
> >>
> >> Hi John,
> >>
> >> I would recommend joining the spectrumscale.org mailing list, where you
> >> will find very good experts from the HPC industry who know GPFS well,
> >> including vendors, users and integrators. More specifically, you'll
> >> find GPFS developers on there. Maybe someone on that list can help out.
> >>
> >> A more direct link to the mailing list is here:
> >>
> >> https://www.spectrumscale.org:10000/virtualmin-mailman/unauthenticated/listinfo.cgi/gpfsug-discuss/
> >>
> >>
> >> On 29/04/2017 08:00, John Hanks wrote:
> >>
> >> Hi,
> >>
> >> I'm not getting much useful vendor information so I thought I'd ask here
> >> in the hopes that a GPFS expert can offer some advice. We have a GPFS system
> >> which has the following disk config:
> >>
> >> [root at grsnas01 ~]# mmlsdisk grsnas_data
> >> disk         driver   sector     failure holds    holds                            storage
> >> name         type       size       group metadata data  status        availability pool
> >> ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
> >> SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_01   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_02   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_03   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_04   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_05   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_06   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_07   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_08   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_09   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_10   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_11   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_12   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_13   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_14   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_15   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_16   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_17   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_18   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_19   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_20   nsd         512         100 No       Yes   ready         up           system
> >> SAS_NSD_21   nsd         512         100 No       Yes   ready         up           system
> >> SSD_NSD_23   nsd         512         200 Yes      No    ready         up           system
> >> SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
> >> SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
> >> SSD_NSD_26   nsd         512         200 Yes      No    ready         up           system
> >>
> >> SSD_NSD_25 is a mirror in which both drives have failed due to a series of
> >> unfortunate events and will not be coming back. From the GPFS
> >> troubleshooting guide it appears that my only alternative is to run
> >> troubleshooting guide it appears that my only alternative is to run
> >>
> >> mmdeldisk grsnas_data  SSD_NSD_25 -p
> >>
> >> which the documentation warns is irreversible, the sky is
> >> likely to fall, dogs and cats sleeping together, etc. But at this point I'm
> >> already in an irreversible situation. Of course this is a scratch
> >> filesystem, of course people were warned repeatedly about the risk of using
> >> a scratch filesystem that is not backed up and of course many ignored that.
> >> I'd like to recover as much as possible here. Can anyone confirm or reject
> >> that deleting this disk is the best way forward, or suggest other
> >> alternatives for recovering data from GPFS in this situation?
> >>
> >> Any input is appreciated. Adding salt to the wound is that until a few
> >> months ago I had a complete copy of this filesystem that I had made onto
> >> some new storage as a burn-in test but then removed as that storage was
> >> consumed... As they say, sometimes you eat the bear, and sometimes, well,
> >> the bear eats you.
> >>
> >> Thanks,
> >>
> >> jbh
> >>
> >> (Naively calculated probability of these two disks failing close together
> >> in this array: 0.00001758. I never get this lucky when buying lottery
> >> tickets.)
> >>
> >> --
> >> ‘[A] talent for following the ways of yesterday, is not sufficient to
> >> improve the world of today.’
> >>  - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
> >>
> >>
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf
> >>
> >>
> >> --
> >> regards,
> >>
> >> Arif Ali
> >> Mob: +447970148122 <+44%207970%20148122>
> >>
> >
> >
>
-- 
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
 - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
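
For anyone wanting to spot a disk like SSD_NSD_25 programmatically, the
mmlsdisk listing quoted above can be scanned with a few lines of Python.
This is only a sketch: it assumes the nine-column layout shown in this
thread with each disk on a single unwrapped line, and the function and
field names are my own inventions. Other GPFS releases or command options
may print different columns.

```python
from typing import List, NamedTuple


class Disk(NamedTuple):
    name: str
    failure_group: int
    holds_metadata: bool
    holds_data: bool
    status: str
    availability: str
    pool: str


# A few rows copied from the listing in this thread (unwrapped).
SAMPLE = """\
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
"""


def parse_mmlsdisk(text: str) -> List[Disk]:
    """Parse disk rows; assumes the driver type column reads 'nsd'."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators and anything that is not a disk row.
        if len(fields) < 9 or fields[1] != "nsd":
            continue
        name, _driver, _sector, group, meta, data = fields[:6]
        # Status may be several words ("to be emptied"); the last two
        # fields are always availability and storage pool in this layout.
        *status_words, availability, pool = fields[6:]
        disks.append(Disk(name, int(group), meta == "Yes", data == "Yes",
                          " ".join(status_words), availability, pool))
    return disks


def find_problem_disks(disks: List[Disk]) -> List[Disk]:
    """Return disks that are not both 'ready' and 'up'."""
    return [d for d in disks if d.status != "ready" or d.availability != "up"]


if __name__ == "__main__":
    for disk in find_problem_disks(parse_mmlsdisk(SAMPLE)):
        kind = "metadata" if disk.holds_metadata else "data"
        print(f"{disk.name}: {disk.status} / {disk.availability} ({kind})")
```

Run against the sample rows, this flags only the failed metadata NSD. It
does nothing to repair the filesystem; it just makes the bad row harder to
miss in a long listing.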