[Beowulf] Docker vs KVM paper by IBM

Thu Jan 29 03:17:08 PST 2015

It seems that in environments where you don't care about security, Docker
is a great enabler: scientists can make any kind of mess in a sandbox-type
environment and no one cares because you're not on a public-facing network.
There are, however, difficulties in using Docker with MPI, so it's probably
not suitable for clustering yet.

For people requiring security, Docker does not provide any added protection
or containerisation and, in fact, could be an impediment to building secure
systems. Red Hat have done some interesting work here by integrating Docker
with SELinux so that each container runs in its own SELinux domain.

On 29 January 2015 at 02:10, Joe Landman <landman at scalableinformatics.com>
wrote:

>
> On 01/28/2015 07:02 PM, Christopher Samuel wrote:
>
>> On 29/01/15 05:32, Joe Landman wrote:
>>
>>> Docker/VMs allow you to package your app, once, and be done with it.
>>> New app, new package.  Packaging can be done programmatically.
>>>
>> This is the "appification" of HPC, à la mobile phones.
>>
>> It brings with it the same issue that has dogged the distro maintainers
>> with things like the Chrome browser that bundles a whole heap of system
>> libraries - when a critical vulnerability appears in a library, how do
>> you make sure everything is fixed?
>>
>
> This is an issue, but it's an issue with distros as well.  If you have a
> build process for your containers, slotting in the patches from the mailing
> lists shouldn't be hard.  If you have a distro, you have to wait for the
> official package from the distro maintainer.
>
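To make the point about a container build process concrete, here is a minimal bash sketch of rebuilding every image after a patched library is released, so that no image keeps the vulnerable copy. It assumes a hypothetical layout with one directory per image under a build root, each holding a Dockerfile whose steps install the updated packages; the function name `rebuild_all` and the `DOCKER` override are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: rebuild every container image from its recipe after a security
# fix, so no image keeps the old library.  Assumes a hypothetical layout
# of one directory per image under $1, each with a Dockerfile; the
# directory name becomes the image tag.  Set DOCKER=echo for a dry run.
rebuild_all() {
    local build_root=$1 docker=${DOCKER:-docker} dir name
    for dir in "$build_root"/*/; do
        [[ -f ${dir}Dockerfile ]] || continue   # skip dirs without a recipe
        name=$(basename "$dir")
        # --pull fetches the newest base image; --no-cache re-runs every
        # step, including any package-update step in the Dockerfile.
        "$docker" build --pull --no-cache -t "${name}:latest" "$dir"
    done
}
```

For example, `DOCKER=echo rebuild_all /srv/images` prints the build commands without running them (`/srv/images` is a made-up path).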
>
>> You can upgrade your base OS, but when a vulnerable container is run
>> again then that vulnerability is back.  Yes, theoretically[1] the impact
>> is constrained to the container (and any filesystems that it can see),
>> but if that job is running for a few weeks (or longer) that's a few
>> weeks (or longer) of vulnerability exposure you've got.
>>
>
> While true, Docker containers are effectively immutable, and you only
> expose the ports you need from the containers.  Since the containers are
> immutable, you can't "infect" the storage, only the memory.  But ... as
> with everything, you get nothing for free and there are engineering choices
> to be made for this.
>
> See this: http://devops.com/blogs/docker-leaving-immutable-infrastructure-2/
>
> For long-running apps, the concept of a state-saving and graceful
> shutdown/restart is pretty important.  There have been attempts at this via
> checkpointing: apps snapshot their state to disk so that you can resume
> them later.  This concept (that is, saving binary state) won't work well
> for containers whose core parts change between checkpoint and restart.  It
> will work fine if you can provide a restart-type mode and save enough state
> to resume from a specific point without forcing preservation of all the
> binary state.
>
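A minimal bash sketch of the "restart mode" idea, assuming a hypothetical iterative job whose only state is a step counter and a running sum. Instead of snapshotting the process image, it saves just enough logical state to resume, so the code can change between checkpoint and restart. The file name `job.state` and the functions `save_state` and `run_job` are illustrative, not from any real tool.

```shell
#!/usr/bin/env bash
# Sketch of application-level checkpoint/restart for a hypothetical job
# that sums 1..N.  Only the logical state (step, sum) is saved, not the
# process image, so the binary may change between checkpoint and restart.
STATE_FILE=${STATE_FILE:-job.state}   # hypothetical checkpoint file

save_state() {
    # Write to a temp file, then rename: a rename within one filesystem
    # is atomic, so a crash never leaves a half-written checkpoint.
    printf 'step=%d\nsum=%d\n' "$1" "$2" > "${STATE_FILE}.tmp"
    mv -f "${STATE_FILE}.tmp" "$STATE_FILE"
}

run_job() {
    # $1 = total steps; $2 = steps to run before a simulated shutdown.
    local total=$1 budget=${2:-$1} step=0 sum=0 ran=0
    if [[ -f $STATE_FILE ]]; then
        # Resume from the last saved logical state.
        step=$(sed -n 's/^step=//p' "$STATE_FILE")
        sum=$(sed -n 's/^sum=//p' "$STATE_FILE")
    fi
    while (( step < total && ran < budget )); do
        step=$((step + 1))
        sum=$((sum + step))
        ran=$((ran + 1))
        save_state "$step" "$sum"   # checkpoint after every step
    done
    echo "$sum"
}
```

Here `run_job 10 4` stops after four steps and prints 10; a second `run_job 10` resumes at step 5 and prints 55.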
>
>> It also means rather than needing the space for a distro plus all the
>> HPC stack you put on there you end up with a copy of the distro and HPC
>> stack for each container.  Plus each version of the container.
>>
>
> For a centralized storage of containers, this isn't so much of an issue.
> For a distro, it does mean you can pare down un-needed packages.  For the
> HPC stack, it means you can keep much of it lean and mean.
>
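On the storage point: Docker images are built from layers, and images built on the same base layers share the stored copies instead of duplicating them. A rough bash sketch of the arithmetic, with made-up layer names and sizes in MB, counting each distinct layer once. It assumes a layer is only shared when it is the same content on the same parent layers, which is why the app on the v2 stack gets its own layer name.

```shell
#!/usr/bin/env bash
# Sketch: storage for several images when identical layers are stored
# once, versus a full copy of distro + HPC stack per image.
# Layer names and sizes (MB) are hypothetical.  Needs bash 4+.
images=(
    "distro-base:200 hpc-stack-v1:800 app-a:50"
    "distro-base:200 hpc-stack-v1:800 app-b:70"
    "distro-base:200 hpc-stack-v2:850 app-c:60"
)

storage_report() {
    local -A seen=()          # layer name -> size, each counted once
    local naive=0 shared=0 img layer id size
    for img in "${images[@]}"; do
        for layer in $img; do             # split the image on spaces
            id=${layer%%:*}
            size=${layer##*:}
            naive=$((naive + size))
            if [[ -z ${seen[$id]+x} ]]; then
                seen[$id]=$size
                shared=$((shared + size))
            fi
        done
    done
    echo "naive=${naive}MB shared=${shared}MB"
}
```

With these numbers `storage_report` prints `naive=3230MB shared=2030MB`: the base and v1 stack layers are paid for once rather than per image.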
>
>> I prefer the (now defunct I believe) Globus Workspace idea of having a
>> job that requests an OS version, a stack version and then supplies its
>> own config info and GW built the VM for you from that, and threw the VM
>> (but hopefully not your results) away at the end.
>>
>
> Way way back in the day I wrote this little tool called DragonFly (see
> https://git.scalableinformatics.com/?p=DragonFlyCloud.git;a=summary and
> I've not moved it over to gitlab/github yet).  See
> https://scalability.org/?p=425 and https://scalability.org/?p=482 . It
> did this (albeit on a fixed configuration cluster).  It generated the run
> script from an xml description of the run, on the local hardware.  I had it
> do data transfers, and all manner of interesting things.  I could have
> easily added in the OS spec as well had I an ability to make the OS version
> a detail of the job (which, curiously, I do now).
>
>
>> But in the end it's a balance of risk decision, weighing security needs
>> and exposure risk against predictability, the reproducibility of results
>> and provenance.
>>
>
> Most HPC codes aren't a major risk, though as we get into big data of
> unknown provenance, the IoT and the IoH (internet of hacks), this becomes
> more of an issue.  Immutable containers are a good start in some ways.
> Restartable apps are another good thing.  Apps that provide a mechanism to
> restart, by saving enough state to make this easy, are beneficial.
>
>
>>> Need a version of a library your sysadmin has told you will never be
>>> allowed on the system because it's not distro-approved?  Fine,
>>> container/VM-ize it.
>>>
>> I strongly suspect if the sysadmin won't install a library for you then
>> you're probably going to have a much harder time getting them to
>> consider containers. :-)
>>
>> I've got to say I'm puzzled by the assumption that these things need to
>> affect the base OS.  We install the libraries, packages and other tools
>> that users want under /usr/local/$name/$version-$compiler and just use
>> modules to manage that.  It's visible across the whole cluster.
>>
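The effect of loading one of these modules is essentially to put the chosen install prefix first on the search paths. A minimal bash sketch, assuming the /usr/local/$name/$version-$compiler layout described above; `fake_module_load` is a hypothetical helper, not the real `module` command, which can also reverse its changes on `module unload`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for what a modulefile's prepend-path lines do
# for a package installed under /usr/local/$name/$version-$compiler.
# Unlike the real `module` command, nothing here can be unloaded.
fake_module_load() {
    local name=$1 version=$2 compiler=$3
    local prefix="/usr/local/${name}/${version}-${compiler}"
    # Prepend so this build is found before the system copy;
    # ${VAR:+:$VAR} avoids a stray ':' when VAR is empty or unset.
    export PATH="${prefix}/bin${PATH:+:$PATH}"
    export LD_LIBRARY_PATH="${prefix}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

For instance, `fake_module_load openmpi 1.8.4 gcc-4.9` (made-up package and version strings) puts `/usr/local/openmpi/1.8.4-gcc-4.9/bin` at the front of `PATH`, leaving the base OS untouched.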
>> Want a different version of GCC?  We've got 4 different versions in
>> addition to the system GCC.  If it's not there, ask for it.  We've also
>> got Intel v13, v14 and v15 installed for users to pick and choose from.
>> Perl, Python? Check, one of the former, three of the latter.
>>
>> We only have Open-MPI, but if people really needed MVAPICH or MPICH we
>> could get it but so far we've not been asked for it.  COMSOL uses
>> Intel MPI happily enough.
>>
>> Our oldest Intel cluster has 398 modules available, our newer Intel
>> cluster has 293 modules and our BG/Q has 223.
>>
>> That's counted via the slightly clumsy way of:
>>
>> module avail 2>&1 | fgrep -v -- '---' | fgrep / | sed -re 's/\s+/\n/' | wc -l
>>
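For what it's worth, the sed expression in that one-liner only replaces the first run of whitespace on each line (it has no `g` flag), so on a multi-column listing it may undercount. A hedged alternative as a small bash function, assuming the classic Environment Modules layout of `---- /path ----` headers followed by whitespace-separated `name/version` entries; Lmod's output looks different and would need other filtering.

```shell
#!/usr/bin/env bash
# Count modules in `module avail` output read from stdin: drop the
# "---- /path ----" header lines, put every whitespace-separated entry
# on its own line, and count the entries containing '/'.
count_modules() {
    grep -v -- '---' | tr -s ' \t' '\n' | grep -c /
}
```

Usage: `module avail 2>&1 | count_modules` (the `2>&1` is needed because classic Environment Modules prints the listing on stderr).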
>>> Only your code/environment is at "risk."  Need to install HP-MPI (c'mon,
>>> we've all run into vendor apps that were built against one very esoteric
>>> version of the library ... ) to run your code?  Sure, do it in a
>>> container.
>>>
>> AFAIK Docker containers don't (by default) get access to devices, so
>> you'd need to negotiate with the sysadmin to get them to agree to map
>> things like IB devices into it.
>>
>> Not impossible, but not available out of the box I suspect.
>>
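As a sketch of what mapping IB devices in involves: Docker's `docker run --device` option passes a host device node into a container. The bash helper below builds one `--device` option per character device found in a directory (by default `/dev/infiniband`, where the verbs and rdma_cm nodes usually live) and prints the command rather than running it. The helper name and the image name `hpc-app:latest` are hypothetical, and real RDMA use may need further settings, such as a raised locked-memory limit, which is exactly the sort of thing to agree with the sysadmin.

```shell
#!/usr/bin/env bash
# Build (but do not run) a docker command exposing RDMA device nodes.
# The default directory and image name are assumptions for illustration.
ib_docker_cmd() {
    local dev_dir=${1:-/dev/infiniband} image=${2:-hpc-app:latest}
    local args=() dev
    for dev in "$dev_dir"/*; do
        # Pass through character devices only (e.g. uverbs0, rdma_cm).
        if [[ -c $dev ]]; then
            args+=("--device=$dev")
        fi
    done
    echo docker run "${args[@]}" "$image"
}
```

On a node without IB the glob matches nothing and the printed command has no `--device` options at all.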
>> All the best,
>> Chris
>>
>> [1] - whenever I use the term theoretical in relation to security I can't
>> help but remember Mudge et al. - "L0pht Heavy Industries. Making the
>> theoretical practical since 1992".
>>
>>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics, Inc.
> e: landman at scalableinformatics.com
> w: http://scalableinformatics.com
> t: @scalableinfo
> p: +1 734 786 8423 x121
> c: +1 734 612 4615
>