[Beowulf] Docker vs KVM paper by IBM

Thu Jan 29 06:15:37 PST 2015

On 05:09PM Wed 01/28/15 -0800, Egan Ford wrote:
> On Wed, Jan 28, 2015 at 11:00 AM, Gavin W. Burris <bug at wharton.upenn.edu> wrote:
> > I guess I would have to ask a few questions of the developer considering
> > docker...  WHY do you need to be outside of a self-contained directory?
> 
> Given that this is mostly an HPC crowd, this answer may not be 100%
> relevant, but I'll try to answer anyway and throw in a few opinions
> along the way.
> 
> I'm finding that more and more developers and open source projects
> have a heavy dependency on other services, libraries, and code.  Ask
> any top Python programmer how to start a new project and it will start
> with "virtualenv"--basically a chroot type of solution for Python,
> including different Pythons and their libs/packages.

Yes, good stuff there.  We have largely "standardized" on Python to
minimize the ground we have to cover, and virtualenv was the stuff.
Developers can install any-and-every module at any version per git repo
/ project.  This is still a headache, though, when an application makes
the transition from dev to prod.  There is no hard answer about how to
best keep things patched and updated in a production environment if devs
can go crazy with a dozen previously unseen modules.  It is all in a
contained directory, but the problem of patching and updating without
breaking things remains.  Instead of one central module to maintain, there are
virtualenvs all over the place.  I would say the goal is to centralize
those env dependencies on production.  BUT, if it is forever
research/dev code, go crazy, in your own contained world.  This seems to
be the promise of Docker.
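
To make the "virtualenvs all over the place" patching problem concrete,
here is a minimal sketch that scans the pinned requirements of several
projects and reports packages pinned at more than one version.  The
project names, the pins, and the helper names collect_pins and divergent
are all hypothetical, and only exact "name==version" pins are handled.

```python
from collections import defaultdict


def collect_pins(requirements_by_project):
    """Map each package name to {version: [projects pinning that version]}."""
    pins = defaultdict(lambda: defaultdict(list))
    for project, text in requirements_by_project.items():
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and blank lines
            if "==" not in line:
                continue  # this sketch only looks at exact pins
            name, version = line.split("==", 1)
            pins[name.strip().lower()][version.strip()].append(project)
    return pins


def divergent(pins):
    """Keep only the packages that are pinned at more than one version."""
    return {name: dict(versions) for name, versions in pins.items()
            if len(versions) > 1}


# Hypothetical projects and pins, for illustration only.
sample = {
    "project_a": "requests==2.5.1\nnumpy==1.9.1\n",
    "project_b": "requests==2.3.0\nnumpy==1.9.1  # same as project_a\n",
}

report = divergent(collect_pins(sample))
for name, versions in sorted(report.items()):
    print(name, versions)
```

Every package in such a report is one more version that production has
to track and patch separately, which is the argument for centralizing
those dependencies.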

> 
> I'm sure Ruby and node.js have something similar.
> 
> If the app uses Flask or Unicorn and needs to be frontended with a
> web server, then you have the complexity of supporting many other
> components and trying to get them to play together nicely.  Then
> there's the databases and trying to maintain all the different table
> spaces and security, etc...
> 
> It's not an impossible problem, sys admins have been dealing with this
> for a very long time.  The challenge is that the number of
> environments to deploy applications has exploded.  So your admins have
> to know everything or limit what the users can develop.  In my DevOps
> env. I have to deal with Ruby, Python, node.js, and Java.  Each may
> require different versions.

The researcher/dev vs admin/ops dynamic is definitely at play here.  My
stance is still that devs should try to target known modules, and admins
should be flexible to support additional ones with a reasonable and
generous time commitment.  This should be true of all apps, not just
Python modules.

> 
> VMs solve a lot of that problem, however at a greater cost.  VMs
> usually have a static memory footprint (esp. in the cloud).  It's
> possible to have 90% of your memory assigned to VMs, but not used by
> the applications in the VMs.  Containers are just processes that use
> what they need (and can be limited).  In my own experimentation I've
> been able to reduce 20 1GB VMs running 20 services into 20 containers
> on a single 8GB VM.  4GB of my RAM is still unused.  Sure I could also
> spend 100s of hours getting all 20 services to play nice on a single
> OS, but one problem with one service can take down the others.  I also
> have different admins assigned to different services.  With containers
> I have them fenced off.

Thin provisioning goes a long way here for CPU, memory AND storage.
We've been pretty happy thin provisioning our VMs and our NFS shares.
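
The consolidation figures in the quoted paragraph above can be worked
through as plain arithmetic.  This sketch assumes "1GB VMs" means 1 GB of
RAM per VM and treats the "4GB still unused" figure as exact; both are
readings of the quoted text, not measurements.

```python
# Figures taken from the quoted message; the rest is arithmetic.
vm_count = 20    # original number of VMs, one service each
gb_per_vm = 1    # assumed: 1 GB of RAM statically assigned per VM
host_gb = 8      # single VM now hosting all 20 containers
unused_gb = 4    # RAM reported as still unused on that VM

reserved_as_vms = vm_count * gb_per_vm        # RAM reserved across the separate VMs
used_by_containers = host_gb - unused_gb      # RAM the containers actually consume
avg_per_service = used_by_containers / vm_count
paid_reduction = 1 - host_gb / reserved_as_vms  # drop in RAM that has to be paid for

print(f"reserved as separate VMs: {reserved_as_vms} GB")
print(f"used by 20 containers:    {used_by_containers} GB")
print(f"average per service:      {avg_per_service:.1f} GB")
print(f"reduction in paid RAM:    {paid_reduction:.0%}")
```

Thin provisioning attacks the same gap between reserved and used memory
from the hypervisor side rather than by consolidating into one guest.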

> 
> Because I have to pay for my VMs in the cloud, using containers has
> measurably reduced my cost.
> 
> Another benefit has been time.  I do not have to figure out how to get
> Flask and Unicorn to play nice with Nginx.  When I need a new gitlab
> instance I just create a new set of three containers linked together
> and it does not impact my other instances.
> 
> Containers are not perfect, but for me they reduce my costs and
> complexity while saving me a lot of time and hassle.  Every container
> or cluster of containers is a single app.  Makes life really easy.
> Yes, you can do this with VMs, but for me, it's too costly.
> 
> Lastly for developer productivity I use Docker on my Mac.  It's really
> a VirtualBox VM with Linux with Docker installed.  I've got about 3-5
> containers running various services and tools that I will use later in
> upgrading my production environment.  I've tried that in VMs before.
> It was slow, painful, and not as easy to automate.  Docker is stupid
> simple to use with its Python APIs.  You can learn it in 5 min.
> 
> Anyway, to directly answer your question: containers are how I put
> complexity into a self-contained directory with no limitations.

Docker has been our go-to solution for reproducibility of dev
environments, with virtualenvs inside.  Will have to give containers a
hard look in this area, too.  Thanks.

> 
> Oh, let me close with, developers like to bring their own stack.  It's
> not uncommon.  In 2003-2004 I worked on the TeraGrid.  Every week all
> four of the original sites got on the phone and debated the SW stacks.
> Only if they were the same could applications run across the grid.
> That inspired me to explore stateless provisioning.  In 2005 I worked
> with Adaptive Computing and we got Moab talking to xCAT so that we
> could provision any stateless OS/stack on demand on bare-metal.
> Bring-your-own-stack.  We call that cloud now.  There was demand for
> it then as there is now.  Containers make this really easy for both
> the admin and the developer.  The admin can provide some constraints
> (it's not the free for all with VMs and BM where you the developer
> have to provide an entire OS image), and the developers get a bit of
> structure, but the freedom to be as lazy and dumb as they want to so
> that they can get results faster.  And the admin does not have to be
> bothered with setting up libs, chroot, modules, etc...  And if the
> admin has to provide a base, well, Docker supports that too and you just
> put it in a registry.

I think this is where I start getting anxious, opening the doors to
support any OS with any stack.  I would much rather push it the other
way.  The language environment should be cross-platform and well
supported, so that production can support one OS well.  My inclination
is to Keep It Simple Stupid, and not add additional layers of
complexity.

> 
> If you are a goal oriented admin/developer, then containers are your
> friends. :-)

Noted.  Stop making SENSE, Egan.

> 
> Cheers,
> 
> Egan

Cheers.

-- 
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania