[Beowulf] RIP CentOS 8

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Sat Dec 12 01:12:00 UTC 2020


On Fri, Dec 11, 2020 at 10:57 AM Douglas Eadline <deadline at eadline.org> wrote:
> Second, and most importantly, CentOS will not matter to HPC.
> (and maybe other sectors as well) Distributions will become
> second class citizens to containers.  All that is needed is a
> base OS to run the container (think Singularity)

As much as I agree that containers can, do and will continue to solve
problems for a lot of things in user-space (because they're just
another process), they fall short for anything related to
kernel-space. Because they're not VMs, they depend on the kernel of
the host they're running on, and on all of its driver stack. And HPC,
being focused on performance, is still very much about kernel-space:
think parallel file systems, interconnects, network drivers, GPU
drivers, etc. Or even MPI implementations, to some extent.

It only takes a quick look at the entry points of the containers that
some vendors provide to realize that making them work in multiple
environments is not that far from supporting applications on multiple
distributions in the first place: detecting which versions of which
drivers are present on the host, in order to select matching libraries
inside the container, is anything but straightforward or efficient.
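To illustrate the kind of host-probing this implies, here is a minimal
sketch of such an entry-point script. Everything in it is hypothetical
(the library paths, the version map, and the choice of
/proc/driver/nvidia/version as the probe), not taken from any real
vendor image:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: match host-side driver versions to bundled
library sets inside a container image. Paths and version map are
illustrative only."""

import re
from pathlib import Path

# Hypothetical mapping from host GPU driver major version to a
# library directory bundled inside the container image.
LIB_DIRS = {
    "470": "/opt/libs/driver-470",
    "510": "/opt/libs/driver-510",
}

def host_driver_major(version_file="/proc/driver/nvidia/version"):
    """Parse the host GPU driver major version, if the file exists."""
    try:
        text = Path(version_file).read_text()
    except OSError:
        return None
    m = re.search(r"Kernel Module\s+(\d+)\.", text)
    return m.group(1) if m else None

def select_lib_dir(major):
    """Pick the bundled library directory matching the host driver,
    or None if this host configuration is unsupported."""
    return LIB_DIRS.get(major)

if __name__ == "__main__":
    major = host_driver_major()
    lib_dir = select_lib_dir(major)
    if lib_dir is None:
        raise SystemExit(f"no bundled libraries match host driver "
                         f"{major!r}; container cannot run here")
    print(f"would prepend {lib_dir} to LD_LIBRARY_PATH")
```

Note that every new driver version on the host side means another
entry in that map, which is exactly the per-environment support burden
the containers were supposed to remove.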

Anything that interacts with a GPU or an interconnect adapter will
have specific requirements about drivers and system-level things (like
kernel parameters) where containers will be of little to no help. If
you want to make sure that your container works on any version of the
OFED stack, GPU driver versions and kernel versions, and still provide
decent performance, you'll probably need a massive amount of extra
work to support the multitude of possible host-side configurations.

> Years ago in the early days of Warewulf, Greg Kurtzer
> (Warewulf/Singularity) talked about the idea of bundling the
> essential/minimal OS and libraries with applications in custom
> Warewulf VNFS image. The scheduler would then boot the application
> image -- everything works.

This is the same approach that many HPC sites have taken over the
years, to decouple system-level software from user-level software as
much as possible: deploy compute nodes with a bare minimal OS
installation (to cover kernel, drivers and low-level hardware-related
stacks), and provide the user-level software (scientific applications)
as modules, over NFS and the like, independently of the OS
distribution.
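For readers less familiar with that approach: the user-level side is
typically exposed through environment modules. A minimal Lmod
modulefile might look like the sketch below (the application name,
version, and NFS paths are made up for illustration):

```lua
-- Hypothetical Lmod modulefile for an application served over NFS,
-- independent of the compute node's OS distribution.
help([[GROMACS 2021.3 built against the site toolchain]])
whatis("Name: gromacs")
whatis("Version: 2021.3")
prepend_path("PATH", "/nfs/apps/gromacs/2021.3/bin")
prepend_path("LD_LIBRARY_PATH", "/nfs/apps/gromacs/2021.3/lib64")
```

Users then run `module load gromacs/2021.3` and get the application
without the OS image on the node ever changing.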

> An open source project will release a container that "contains"
> everything thing it needs to run (along with the container recipe)
> Using Singularity you can also sign the container to assure
> provenance of the code. The scheduler runs containers. Simple.

Provided it can access the hardware resources it needs, yes. :)

> The need to maintain library version trees and Modules
> goes away. Of course, if you are a developer writing your own
> application, you need specific libraries, but not system-wide. Build
> the application in your working directory, include any specific
> libraries you need in the local source tree and fold it all into a
> container.

That's also a domain where containers only go part of the way:
containerized applications are fine as long as they are autonomous and
are happy being alone in their own world. But as soon as they need to
interact with another application (hello DL frameworks), things get
more complicated: what version of that container do you need? Will it
work with your own container? Will you end up building yet another
container that bundles both applications? What if there's a 3rd
element in the workflow?

Suddenly, it's back to selecting application versions and making them
work together. A little bit like a... distribution? :)

> Bottom line, it is all good, we are moving on.

Things are changing for sure. But the devil is in the details.

Cheers,
--
Kilian
