[Beowulf] Lustre Upgrades

Paul Edmon pedmon at cfa.harvard.edu
Mon Jul 23 11:11:40 PDT 2018


Yeah, we've pinged Intel/Whamcloud to find out the upgrade paths, as we wanted
to know what the recommended procedure is.

Sure. So we have three systems that we want to upgrade: one that is 1 PB and
two that are 5 PB each.  I will just give you a description of one and
assume that everything scales linearly with size.  They all have the
same hardware.

The head nodes are Dell R620's, while the shelves are M3420 (MDS) and
M3260 (OSS).  The MDT is 2.2T with 466G used and 268M inodes used.  Each
OST is 30T, with each OSS hosting six.  The filesystem itself is 93% full.
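
For anyone who wants to pull the same numbers on their own system, here is a
minimal sketch using the stock lfs client tool; the /lustre mount point is
just a placeholder, and this is illustrative rather than our actual tooling.

#!/usr/bin/env python3
# Minimal sketch: summarize MDT/OST capacity and inode usage via `lfs df`.
# Assumes the Lustre client utilities are installed and the filesystem is
# mounted at MOUNT (placeholder path).
import subprocess

MOUNT = "/lustre"  # placeholder mount point

def lfs_df(flag):
    """Return the lines of `lfs df <flag>` for the mounted filesystem."""
    out = subprocess.run(["lfs", "df", flag, MOUNT],
                         check=True, capture_output=True, text=True)
    return out.stdout.splitlines()

# Per-target capacity (human readable) plus the filesystem summary line.
for line in lfs_df("-h"):
    if "MDT" in line or "OST" in line or "summary" in line:
        print(line)

# Inode usage; the MDT inode count is what drives e2fsck time on the MDS.
for line in lfs_df("-i"):
    if "MDT" in line or "summary" in line:
        print(line)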

-Paul Edmon-


On 07/23/2018 01:58 PM, Jeff Johnson wrote:
> Paul,
>
> How big are your ldiskfs volumes? What type of underlying hardware are 
> they? Running e2fsck (ldiskfs-aware) is wise and can be done in 
> parallel. It could be done within a couple of days; the time all depends 
> on the size and the underlying hardware.
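
A rough sketch of the kind of parallel, read-only check described above; the
device paths are placeholders, and -fn forces a check without making changes.

#!/usr/bin/env python3
# Rough sketch: run read-only e2fsck (-f force check, -n no changes) on all
# OSTs of one OSS in parallel and report per-device runtime.  Device paths
# are placeholders; use the ldiskfs-aware e2fsck from Lustre's e2fsprogs.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

OSTS = ["/dev/mapper/ost%d" % i for i in range(6)]  # placeholder devices

def check(dev):
    start = time.time()
    proc = subprocess.run(["e2fsck", "-fn", dev],
                          capture_output=True, text=True)
    return dev, proc.returncode, time.time() - start

with ThreadPoolExecutor(max_workers=len(OSTS)) as pool:
    for dev, rc, elapsed in pool.map(check, OSTS):
        print("%s: exit %d after %.1f hours" % (dev, rc, elapsed / 3600.0))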
>
> Going from 2.5.34 to 2.10.4 is a significant jump. I would make sure 
> there isn't a step upgrade advised. I know there have been step 
> upgrades in the past; I'm not sure about going between these two versions.
>
> --Jeff
>
> On Mon, Jul 23, 2018 at 10:34 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
>     Yeah, we've found out firsthand that it's problematic, as we have
>     been seeing issues :).  Hence the urge to upgrade.
>
>     We've begun exploring this, but we wanted to reach out to other
>     people who may have gone through the same thing to get their
>     thoughts.  We also need to figure out how significant an outage
>     this will be: if it takes a day or two of full outage to do
>     the upgrade, that is more acceptable than a week.  We also wanted
>     to know if people had experienced data loss/corruption in the
>     process, or hit any other kinks.
>
>     We were planning on playing around on VMs to test the upgrade
>     path before committing to upgrading our larger systems.  One of
>     the questions we had, though, was whether we need to run e2fsck
>     before/after the upgrade, as that could add significant time to
>     the outage.
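
The rough shape of that VM test is sketched below; the backing-file sizes,
paths, and fsname are made up, and the idea would be to format under the old
server stack, swap in the 2.10.4 server RPMs, remount, and time e2fsck -fn
before and after.

#!/usr/bin/env python3
# Sketch of a throwaway loopback Lustre filesystem for upgrade testing in a
# VM.  Assumes the (old) Lustre server packages are already installed; all
# names, sizes, and paths below are illustrative only.
import subprocess

def sh(cmd):
    """Run a shell command, echoing it first, and fail loudly on error."""
    print("+ " + cmd)
    subprocess.run(cmd, shell=True, check=True)

# Backing files for a tiny combined MGS/MDT and a single OST.
sh("truncate -s 2G /srv/testfs-mdt.img")
sh("truncate -s 10G /srv/testfs-ost0.img")

# Format the targets (mkfs.lustre sets up loop devices for file targets).
sh("mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /srv/testfs-mdt.img")
sh("mkfs.lustre --fsname=testfs --ost --index=0 "
   "--mgsnode=$(hostname)@tcp /srv/testfs-ost0.img")

# Mount the targets, populate from a client, then unmount, upgrade the
# server RPMs, remount, and time 'e2fsck -fn' on each backing file.
sh("mkdir -p /mnt/mdt /mnt/ost0")
sh("mount -t lustre -o loop /srv/testfs-mdt.img /mnt/mdt")
sh("mount -t lustre -o loop /srv/testfs-ost0.img /mnt/ost0")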
>
>     -Paul Edmon-
>
>
>     On 07/23/2018 01:18 PM, Jeff Johnson wrote:
>>     You're running 2.10.4 clients against 2.5.34 servers? I believe
>>     there are notable LNet attrs that don't exist in 2.5.34. Maybe a
>>     Whamcloud wiz will chime in, but I think that version mismatch
>>     might be problematic.
>>
>>     You can do a testbed upgrade to test taking an ldiskfs volume from
>>     2.5.34 to 2.10.4, just to be conservative.
>>
>>     --Jeff
>>
>>
>>     On Mon, Jul 23, 2018 at 10:05 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>
>>         My apologies, I meant 2.5.34, not 2.6.34.  We'd like to get up
>>         to 2.10.4, which is what our clients are running.  Recently we
>>         upgraded our cluster to CentOS 7, which necessitated the client
>>         upgrade.  Our storage servers, though, stayed behind on 2.5.34.
>>
>>         -Paul Edmon-
>>
>>
>>         On 07/23/2018 01:00 PM, Jeff Johnson wrote:
>>>         Paul,
>>>
>>>         2.6.34 is a kernel version. What version of Lustre are you
>>>         at now? Some updates are easier than others.
>>>
>>>         --Jeff
>>>
>>>         On Mon, Jul 23, 2018 at 8:59 AM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>>>
>>>             We have some old large-scale Lustre installs that are
>>>             running 2.6.34, and we want to get these up to the latest
>>>             version of Lustre.  I was curious whether people in this
>>>             group have any experience with doing this and, if so,
>>>             whether they could share it.  How do you handle upgrades
>>>             like this?  How much time does it take?  What are the
>>>             pitfalls?  How do you manage it with minimal customer
>>>             interruption?  Should we just write off upgrading and
>>>             stand up new servers that are on the correct version (in
>>>             which case we would need to transfer the several PBs'
>>>             worth of data over to the new system)?
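
If it came down to that copy-to-new-hardware route, the sort of thing we would
expect to scale out is one rsync per top-level directory, as in the sketch
below; the mount points and job count are placeholders, and a real migration
would also need a final catch-up pass.

#!/usr/bin/env python3
# Sketch: copy an old Lustre filesystem to a new one by running one rsync
# per top-level directory in parallel.  Mount points and the job count are
# placeholders; not a complete migration plan.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

OLD, NEW, JOBS = "/mnt/lustre_old", "/mnt/lustre_new", 8  # placeholders

def copy(subdir):
    src = os.path.join(OLD, subdir) + "/"
    dst = os.path.join(NEW, subdir) + "/"
    rc = subprocess.run(["rsync", "-aHAX", "--numeric-ids", src, dst]).returncode
    return subdir, rc

dirs = [d for d in os.listdir(OLD) if os.path.isdir(os.path.join(OLD, d))]
with ThreadPoolExecutor(max_workers=JOBS) as pool:
    for name, rc in pool.map(copy, dirs):
        print("%s: rsync exit %d" % (name, rc))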
>>>
>>>             Thanks for your wisdom.
>>>
>>>             -Paul Edmon-
>>>
>>
>>
>
> -- 
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.johnson at aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
