[Beowulf] CentOS 7.x for cluster nodes ?

John Hanks griznog at gmail.com
Thu Dec 29 23:36:22 PST 2016


I just deployed our first CentOS 7.3 cluster, which will be the template for
upgrading our current CentOS 6.8 cluster. We use Warewulf and boot nodes
statelessly from a single master image, and the hurdles we've had to deal
with so far are:

1. There is no concept of "last" in systemd. Our node configuration gets
started by rc.local in CentOS 6, and to mimic that in CentOS 7 we had to
make a service that depends on a lot of other units and then sleeps for a
minute or two before running (a sketch of such a unit follows this list). I
fail to see why systemd can't be made to understand "last", but *shrug*.

2. InfiniBand is now a load-once, never-unload thing. Previously our
configuration tool loaded IB dynamically so we could switch OFED stacks
without rebooting. That becomes a little more complicated now, but it's
not a show stopper.

3. rsyslog.conf needs some additional configuration to get all messages
forwarded to the central syslog server. I can't get to our exact config
from my phone, but the general shape is sketched after this list and I can
add ours later if you are interested.

4. The upgrade from 7.2 to 7.3 was a pretty big jump, perhaps more so for
us because it coincided with some ZFS changes which threw me for a moment.
I'd suggest starting with 7.3 if you don't have much invested in prior
versions yet.

5. As a rule we disable firewalld and install the old iptables service,
and we replace NetworkManager with the legacy network scripts (commands
sketched below). I wish all Linux distros would understand that we don't
run clusters of phones and laptops, but I think I lost that argument a
long time ago.
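
For item 1, the unit we ended up with looks roughly like the sketch below.
Treat it as a from-memory sketch, not our exact file; the service name and
script path here are made up:

    # /etc/systemd/system/node-config.service (hypothetical name and path)
    [Unit]
    Description=Site node configuration (poor man's "run last")
    # Order ourselves after the default target and the subsystems we need
    After=network-online.target remote-fs.target multi-user.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    # The sleep papers over anything systemd still starts in parallel
    ExecStartPre=/usr/bin/sleep 90
    ExecStart=/usr/local/sbin/configure-node.sh
    RemainAfterExit=yes

    [Install]
    WantedBy=multi-user.target

Enable it with "systemctl enable node-config" and it runs once per boot,
ordered after everything listed in After=.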
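
For item 3, again without our exact rsyslog.conf to hand, the forwarding
rule has this general shape (the server name is a placeholder, and the
queue lines are the usual don't-lose-messages boilerplate):

    # Queue messages on disk if the central server is unreachable
    $ActionQueueType LinkedList
    $ActionQueueFileName fwdq
    $ActionQueueMaxDiskSpace 1g
    $ActionQueueSaveOnShutdown on
    $ActionResumeRetryCount -1
    # Forward everything; @@ is TCP, a single @ would be UDP
    *.* @@syslog.example.com:514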
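
And for item 5, the swap back to the old services is just a handful of
commands, something like this (from memory, so treat it as a sketch):

    # Replace firewalld with the classic iptables service
    yum install -y iptables-services
    systemctl stop firewalld
    systemctl disable firewalld
    systemctl enable iptables
    systemctl start iptables

    # Replace NetworkManager with the legacy network scripts
    systemctl stop NetworkManager
    systemctl disable NetworkManager
    systemctl enable network
    systemctl start network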

In hindsight it wasn't nearly as scary as it seemed to be. One bonus which
I hope to explore in the future is that a simple chroot with a few lines in
an init can PXE boot statelessly, and systemd just generally does the Right
Thing(tm). This was a pleasant surprise and may end up with us writing our
own very simple provisioning tool around iPXE (dependent on Warewulf 4
direction/progress).
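
To give a flavour of how little iPXE needs for a stateless boot, the whole
script can be as short as this (the URLs and file names are placeholders,
not our actual setup):

    #!ipxe
    dhcp
    # Fetch the kernel and the single master image over HTTP
    kernel http://provision.example.com/vmlinuz console=tty0
    initrd http://provision.example.com/node-image.cpio.gz
    boot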

We've been running our CentOS 6 software stack on about three dozen heavily
used CentOS 7 workstations for about a year and have found only a few very
minor issues.

My advice would be to dive in. The sooner you do, the sooner your brief
window of having no glibc version problems starts :)

jbh

On Fri, Dec 30, 2016, 9:29 AM Andrew Mather <mathera at gmail.com> wrote:

> Thanks for this, Lachlan, and thanks for the reminder...
>
> Sorry, I should also have mentioned: we use NFS for /usr/local, for /home
> and for our shared data area /group.  Our scratch is local and in most
> cases jobs copy their datasets across before starting.  The time cost of
> that operation is really just a rounding error in the job runtimes.
>
> Cluster uses Torque/Moab for queue/resource management and scheduling.
>
> Andrew
>
> On Fri, Dec 30, 2016 at 5:17 PM, Lachlan Musicman <datakid at gmail.com>
> wrote:
>
> We use CentOS 7.2 exclusively in our cluster (SLURM, 12 nodes going up to
> 40 in the new year) and it works a treat. Same setup as you, but with some
> shared NFS mounts. Systemd is fine - a few more keystrokes, but not the end
> of the world.
>
> Very happy
>
> cheers
> L.
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On 30 December 2016 at 17:12, Andrew Mather <mathera at gmail.com> wrote:
>
> Hi All,
>
> Hope you're having/had time to relax and unwind with those near and dear.
>
> We are in the very early planning stages for our next cluster and I'm
> currently looking at the OS.  We're a CentOS shop and planning to stay that
> way for the foreseeable future, so please, no partisan OS wars :)
>
> When v7 of the Red Hat-based OSes appeared, the change to systemd in
> particular seemed to attract a lot of hate, but now that it's been out a
> while, there doesn't seem to be as much.
>
> So, has anyone got recent war stories, good experiences etc. to share
> about v7 of CentOS specifically as the OS for cluster nodes?
>
> We don't have InfiniBand interconnects and don't use MPI, shared memory
> and the like.  All our jobs stay within the confines of the nodes, and we
> have a variety of hardware configurations to accommodate different types
> of job (RAM, disk requirements etc.).
>
> I'd welcome any info.
>
> Thanks and hope 2017 is kind to you.
>
> Andrew
>
>
> --
> -
>  https://picasaweb.google.com/107747436224613508618
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "Voting is a lot like going to Bunnings really:
> You walk in confused, you stand in line, you have a sausage on the way out and
> at the end, you wind up with a bunch of useless tools"
> Joe Rios
> -
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
-- 
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
 - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC