[Beowulf] GPFS and failed metadata NSD

Arif Ali mail at arif-ali.co.uk
Thu May 18 06:57:08 PDT 2017


Hi John,

I would recommend joining the spectrumscale.org mailing list, where 
you will find very good experts from the HPC industry who know GPFS 
well, including vendors, users, and integrators. More specifically, 
you'll find GPFS developers on there. Maybe someone on that list 
can help out.

A more direct link to the mailing list: 
https://www.spectrumscale.org:10000/virtualmin-mailman/unauthenticated/listinfo.cgi/gpfsug-discuss/

On 29/04/2017 08:00, John Hanks wrote:
> Hi,
>
> I'm not getting much useful vendor information so I thought I'd ask 
> here in the hopes that a GPFS expert can offer some advice. We have a 
> GPFS system which has the following disk config:
>
> [root at grsnas01 ~]# mmlsdisk grsnas_data
> disk         driver   sector     failure holds    holds                            storage
> name         type       size       group metadata data  status        availability pool
> ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
> SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_01   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_02   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_03   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_04   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_05   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_06   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_07   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_08   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_09   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_10   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_11   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_12   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_13   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_14   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_15   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_16   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_17   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_18   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_19   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_20   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_21   nsd         512         100 No       Yes   ready         up           system
> SSD_NSD_23   nsd         512         200 Yes      No    ready         up           system
> SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
> SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
> SSD_NSD_26   nsd         512         200 Yes      No    ready         up           system
>
> SSD_NSD_25 is a mirror in which both drives have failed due to a 
> series of unfortunate events and will not be coming back. From the 
> GPFS troubleshooting guide it appears that my only alternative is to run
>
> mmdeldisk grsnas_data  SSD_NSD_25 -p
>
> which the documentation warns is irreversible, the sky is 
> likely to fall, dogs and cats sleeping together, etc. But at this 
> point I'm already in an irreversible situation. Of course this is a 
> scratch filesystem, of course people were warned repeatedly about the 
> risk of using a scratch filesystem that is not backed up and of course 
> many ignored that. I'd like to recover as much as possible here. Can 
> anyone confirm/reject that deleting this disk is the best way forward 
> or if there are other alternatives to recovering data from GPFS in 
> this situation?
>
> Any input is appreciated. Adding salt to the wound is that until a few 
> months ago I had a complete copy of this filesystem that I had made 
> onto some new storage as a burn-in test but then removed as that 
> storage was consumed... As they say, sometimes you eat the bear, and 
> sometimes, well, the bear eats you.
>
> Thanks,
>
> jbh
>
> (Naively calculated probability of these two disks failing close 
> together in this array: 0.00001758. I never get this lucky when buying 
> lottery tickets.)
>
> -- 
> ‘[A] talent for following the ways of yesterday, is not sufficient to 
> improve the world of today.’
>  - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
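
For what it's worth, here is the rough sequence I would expect for the
question above. This is a sketch only, not a tested procedure: it assumes
the filesystem and NSD names from John's mmlsdisk output, and it is
written dry-run style (it only echoes the commands) precisely because
these are destructive operations that should be checked against the GPFS
documentation for your release first.

```shell
#!/bin/sh
# Sketch of a recovery sequence for a permanently failed metadata NSD.
# Assumes filesystem "grsnas_data" and dead disk "SSD_NSD_25" as in the
# mmlsdisk output quoted above. With DRY_RUN=1 nothing is executed; the
# commands are only printed for review.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Delete the dead disk. The -p flag (from the troubleshooting guide)
#    tells GPFS the disk is permanently damaged, so it will not attempt
#    to migrate data off it first.
run mmdeldisk grsnas_data SSD_NSD_25 -p

# 2. Afterwards, check and repair the filesystem metadata. A full repair
#    with mmfsck typically requires the filesystem to be unmounted.
run mmfsck grsnas_data -y
```

Files whose metadata lived only on the failed failure group will still be
lost; the point of the -p/mmfsck pass is to get the filesystem consistent
again so the rest remains readable.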

-- 
regards,

Arif Ali
Mob: +447970148122


