[Beowulf] Docker vs KVM paper by IBM

Wed Jan 28 17:09:20 PST 2015

On Wed, Jan 28, 2015 at 11:00 AM, Gavin W. Burris <bug at wharton.upenn.edu> wrote:
> I guess I would have to ask a few questions of the developer considering
> docker...  WHY do you need to be outside of a self-contained directory?

Given that this is mostly an HPC crowd, this answer may not be 100%
relevant, but I'll try to answer anyway and throw in a few opinions
along the way.

I'm finding that more and more developers and open source projects
have a heavy dependency on other services, libraries, and code.  Ask
any top Python programmer how to start a new project and it will start
with "virtualenv"--basically a chroot type of solution for Python,
including different Pythons and their libs/packages.
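
As a rough illustration of that isolation idea, the sketch below
creates a per-project environment with Python's standard-library venv
module, a close relative of virtualenv. Swapping in venv is an
assumption made so the example needs no third-party package, and the
directory name "project-a" is made up.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str) -> Path:
    """Create an isolated Python environment under parent/name."""
    env_dir = parent / name
    # with_pip=False keeps the sketch fast and offline; a real project
    # would normally want pip available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env = make_env(Path(tmp), "project-a")  # "project-a" is a made-up name
    # Each environment carries its own pyvenv.cfg describing its interpreter.
    has_config = (env / "pyvenv.cfg").is_file()
    print("isolated environment created:", has_config)
```

Each project gets its own directory of interpreter settings and
packages, which is the same "fence it off" idea that containers apply
to a whole service.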

I'm sure Ruby and node.js have something similar.

If the app uses Flask or Unicorn and needs to be fronted by a web
server, then you have the complexity of supporting many other
components and trying to get them to play together nicely.  Then
there are the databases, and trying to maintain all the different
tablespaces, security, etc...

It's not an impossible problem; sys admins have been dealing with this
for a very long time.  The challenge is that the number of
environments in which applications get deployed has exploded.  So your
admins have to know everything or limit what the users can develop.
In my DevOps environment I have to deal with Ruby, Python, node.js,
and Java, and each may require different versions.

VMs solve a lot of that problem, but at a greater cost.  VMs
usually have a static memory footprint (esp. in the cloud).  It's
possible to have 90% of your memory assigned to VMs, but not used by
the applications in the VMs.  Containers are just processes that use
what they need (and can be limited).  In my own experimentation I've
been able to reduce 20 1GB VMs running 20 services into 20 containers
on a single 8GB VM.  4GB of my RAM is still unused.  Sure, I could also
spend 100s of hours getting all 20 services to play nicely on a single
OS, but then one problem with one service can take down the others.  I also
have different admins assigned to different services.  With containers
I have them fenced off.
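
The arithmetic behind that consolidation can be sketched as follows.
The figures are the ones quoted above; attributing all of the used RAM
to the containers (ignoring the host OS itself) is a simplifying
assumption.

```python
# Figures quoted in the paragraph above.
services = 20
vm_size_gb = 1          # each original VM reserved 1 GB
host_size_gb = 8        # the single VM now hosting all containers
host_unused_gb = 4      # RAM still free on that host

reserved_before_gb = services * vm_size_gb      # statically assigned to VMs
used_after_gb = host_size_gb - host_unused_gb   # actually consumed now

# Assumption: all used RAM is attributed to the containers, not the host OS.
avg_per_service_mb = used_after_gb * 1024 / services

# Provisioned (paid-for) memory drops from the old total to one host.
provisioned_saving_pct = (
    (reserved_before_gb - host_size_gb) * 100 // reserved_before_gb
)

print(f"reserved before: {reserved_before_gb} GB")
print(f"used after:      {used_after_gb} GB")
print(f"avg per service: {avg_per_service_mb:.0f} MB")
print(f"provisioned RAM saved: {provisioned_saving_pct}%")
```

The point is the gap between memory reserved and memory used: VMs are
billed on the former, containers consume only the latter.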

Because I have to pay for my VMs in the cloud, using containers has
measurably reduced my cost.

Another benefit has been time.  I do not have to figure out how to get
Flask and Unicorn to play nicely with Nginx.  When I need a new GitLab
instance I just create a new set of three containers linked together
and it does not impact my other instances.
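
A minimal sketch of what a set of three linked containers could look
like, written as docker run command lines assembled in Python.  The
image names (example/postgres, example/redis, example/gitlab) and the
instance names are placeholders, not the images actually used here,
and the commands are only printed, never executed.  The --link option
is Docker's older container-linking mechanism.

```python
from typing import List


def linked_stack(instance: str) -> List[List[str]]:
    """Return docker run commands for one app made of three containers."""
    db = f"{instance}-db"
    cache = f"{instance}-cache"
    app = f"{instance}-app"
    return [
        # Backing services start first so the app container can link to them.
        ["docker", "run", "-d", "--name", db, "example/postgres"],
        ["docker", "run", "-d", "--name", cache, "example/redis"],
        # --link NAME:ALIAS makes the named container reachable as ALIAS.
        ["docker", "run", "-d", "--name", app,
         "--link", f"{db}:db", "--link", f"{cache}:cache",
         "example/gitlab"],
    ]


# Container names are prefixed per instance, so a second instance never
# collides with the first one.
for cmd in linked_stack("gitlab1") + linked_stack("gitlab2"):
    print(" ".join(cmd))
```

Because every name is derived from the instance prefix, standing up
another instance is just another call with a different prefix.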

Containers are not perfect, but for me they reduce my costs and
complexity while saving me a lot of time and hassle.  Every container
or cluster of containers is a single app.  Makes life really easy.
Yes, you can do this with VMs, but for me it's too costly.

Lastly, for developer productivity I use Docker on my Mac.  It's really
a VirtualBox VM running Linux with Docker installed.  I've got about 3-5
containers running various services and tools that I will use later in
upgrading my production environment.  I've tried that in VMs before.
It was slow, painful, and not as easy to automate.  Docker is stupid
simple to use with its Python APIs.  You can learn it in 5 min.

Anyway, to directly answer your question: containers are how I put
complexity into a self-contained directory with no limitations.

Oh, let me close with this: developers like to bring their own stack.  It's
not uncommon.  In 2003-2004 I worked on the TeraGrid.  Every week all
four of the original sites got on the phone and debated the SW stacks.
Only if they were the same could applications run across the grid.
That inspired me to explore stateless provisioning.  In 2005 I worked
with Adaptive Computing and we got Moab talking to xCAT so that we
could provision any stateless OS/stack on demand on bare-metal.
Bring-your-own-stack.  We call that cloud now.  There was demand for
it then as there is now.  Containers make this really easy for both
the admin and the developer.  The admin can provide some constraints
(it's not the free-for-all of VMs and bare metal, where you the developer
have to provide an entire OS image), and the developers get a bit of
structure, but also the freedom to be as lazy and dumb as they want so
that they can get results faster.  And the admin does not have to be
bothered with setting up libs, chroot, modules, etc...  And if the
admin has to provide a base image, well, Docker supports that too and you
just put it in a registry.

If you are a goal-oriented admin/developer, then containers are your
friends. :-)

Cheers,

Egan