[Beowulf] Docker vs KVM paper by IBM
landman at scalableinformatics.com
Wed Jan 28 17:10:33 PST 2015
On 01/28/2015 07:02 PM, Christopher Samuel wrote:
> On 29/01/15 05:32, Joe Landman wrote:
>> Docker/VMs allow you to package your app, once, and be done with it.
>> New app, new package. Packaging can be done programmatically.
> This is the "appification" of HPC, à la mobile phones.
> It brings with it the same issue that has dogged the distro maintainers
> with things like the Chrome browser that bundles a whole heap of system
> libraries - when a critical vulnerability appears in a library, how do
> you make sure everything is fixed?
This is an issue, but it's an issue with distros as well. If you have a
build process for your containers, slotting in patches from the
mailing lists shouldn't be hard. With a distro, you have to wait
for the official package from the distro maintainer.
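As a sketch of what such a build process can look like (the image, app name,
and paths here are hypothetical, not any particular setup): a Dockerfile that
refreshes the base packages at build time, so rebuilding the image pulls in a
patched library everywhere the image is subsequently run.

```dockerfile
# Hypothetical HPC app image; rebuilding it picks up patched libraries.
FROM centos:6

# Refresh base packages so a security fix lands in every rebuilt image.
RUN yum -y update && yum clean all

# Install the app's own stack; "myapp" and its path are placeholders.
COPY myapp /opt/myapp
CMD ["/opt/myapp/bin/run"]
```

Something like `docker build -t myapp:$(date +%Y%m%d) .` then re-creates the
image, and jobs launched from the new tag no longer carry the vulnerable
library.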
> You can upgrade your base OS, but when a vulnerable container is run
> again then that vulnerability is back. Yes, theoretically the impact
> is constrained to the container (and any filesystems that it can see),
> but if that job is running for a few weeks (or longer) that's a few
> weeks (or longer) of vulnerability exposure you've got.
True, but since Docker containers are effectively immutable, you only
expose the ports you need exposed from the containers. Because the
containers are immutable, an attacker can't "infect" the storage, only
the memory. But ... as with everything, you get nothing for free and there
are engineering choices to be made for this.
For long-running apps, the concept of state-saving and graceful
shutdown/restart is pretty important. There have been attempts at this
via checkpointing tools that snapshot binary state to disk so you can
resume later. That approach (saving binary state) won't work well for
containers whose core parts change between checkpoint and restart. It
will work fine if the app provides a restart-type mode and saves its own
state, so it can resume from a specific point without depending on all
the binary state being preserved.
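A minimal sketch of that restart-type mode in Python (the state layout,
file name, and toy workload are made up for illustration): the app serializes
only the logical state it needs, so a resumed run doesn't care that the
surrounding binary or container image changed in between.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write atomically: dump to a temp file, then rename over the target,
    # so a crash mid-write never leaves a corrupt checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default):
    # Resume from a prior run if a checkpoint exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default

# Hypothetical long-running loop: accumulate a sum of squares,
# checkpointing every 10 iterations.
ckpt = os.path.join(tempfile.gettempdir(), "demo.ckpt")
state = load_checkpoint(ckpt, {"i": 0, "total": 0})
while state["i"] < 100:
    state["total"] += state["i"] ** 2
    state["i"] += 1
    if state["i"] % 10 == 0:
        save_checkpoint(ckpt, state)
os.remove(ckpt)
print(state["total"])
```

Kill the process at any point and rerun it: the loop picks up from the last
checkpoint rather than iteration zero, which is exactly the property that
survives a container swap.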
> It also means rather than needing the space for a distro plus all the
> HPC stack you put on there you end up with a copy of the distro and HPC
> stack for each container. Plus each version of the container.
With centralized storage of containers, this isn't so much of an
issue. For a distro, it does mean you can pare down un-needed
packages. For the HPC stack, it means you can keep much of it lean and
minimal.
> I prefer the (now defunct I believe) Globus Workspace idea of having a
> job that requests an OS version, a stack version and then supplies its
> own config info and GW built the VM for you from that, and threw the VM
> (but hopefully not your results) away at the end.
Way, way back in the day I wrote a little tool called DragonFly (I've
not moved it over to gitlab/github yet); see
https://scalability.org/?p=425 and https://scalability.org/?p=482 . It
did this (albeit on a fixed-configuration cluster): it generated the
run script from an XML description of the run, on the local hardware. I
had it do data transfers, and all manner of interesting things. I could
easily have added the OS spec as well, had I the ability to make the OS
version a detail of the job (which, curiously, I can do now).
> But in the end it's a balance of risk decision, weighing security needs
> and exposure risk against predictability, the reproducibility of results
> and provenance.
Most HPC codes aren't a major risk. Though as we get into big data of
unknown provenance, the IoT and IoH (internet of hacks), this becomes more
of an issue. Immutable containers are a good start in some ways.
Restartable apps are another good thing. Apps that provide a mechanism
to restart, by exposing enough state to make this easy, are something
worth building in from the start.
>> Need a version of a library your sysadmin has told you will never be allowed on
>> the system because it's not distro-approved? Fine, container/VM-ize it.
> I strongly suspect if the sysadmin won't install a library for you then
> you're probably going to have a much harder time getting them to
> consider containers. :-)
> I've got to say I'm puzzled by the assumption that these things need to
> affect the base OS. We install the libraries, packages and other tools
> that users want under /usr/local/$name/$version-$compiler and just use
> modules to manage that. It's visible across the whole cluster.
> Want a different version of GCC? We've got 4 different versions in
> addition to the system GCC. If it's not there ask for it. We've also
> got Intel v13, v14 and v15 installed for users to pick and choose from.
> Perl, Python? Check, one of the former, three of the latter.
> We only have Open-MPI, but if people really needed MVAPICH or MPICH we
> could get it but so far we've not been asked for it. COMSOL uses
> Intel MPI happily enough.
> Our oldest Intel cluster has 398 modules available, our newer Intel
> cluster has 293 modules and our BG/Q has 223.
> That's counted via the slightly clumsy way of:
> module avail 2>&1 | fgrep -v -- '---' | fgrep / | sed -re 's/\s+/\n/' | wc -l
>> Only your code/environment is at "risk." Need to install HP-MPI (c'mon,
>> we've all run into vendor apps that were built against one very esoteric
>> version of the library ... ) to run your code? Sure, do it in a container.
> AFAIK Docker containers don't (by default) get access to devices, so
> you'd need to negotiate with the sysadmin to get things like
> IB devices mapped into the container.
> Not impossible, but not available out of the box I suspect.
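For what it's worth, Docker's `--device` flag is one way to do that mapping
explicitly. A sketch (the device paths are examples and vary per system and
driver; the image name is a placeholder):

```shell
# Map InfiniBand character devices into the container
# (paths are system-specific; check /dev/infiniband on the host).
docker run --device=/dev/infiniband/uverbs0 \
           --device=/dev/infiniband/rdma_cm \
           myapp:latest
```

It still needs the sysadmin's blessing, since it punches a hole in the
default device isolation.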
> All the best,
>  - whenever I use the term theoretical in relation to security I can't
> help but remember Mudge et al. - "L0pht Heavy Industries. Making the
> theoretical practical since 1992".
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
p: +1 734 786 8423 x121
c: +1 734 612 4615