[Beowulf] Docker in HPC

Peter Clapham pc7 at sanger.ac.uk
Wed Nov 27 04:01:20 PST 2013


On 27/11/13 11:45, John Hearns wrote:
> Here is a Register article, which covers the same ground as Joe's post:
> http://www.theregister.co.uk/2013/11/26/docker_spreads_to_more_linux_distros/
> "For instance, Docker could be used to run a database in one 
> container and an app server in another, and the configurable isolation 
> properties"
> So can we think of batch schedulers which would reserve parts of big 
> NUMA machines, and run docker containers on them?
> Also from the blog, Offline Transfer:
> "The exported bundles are regular directories, and can be transported 
> by any file transfer mechanism, including ftp, physical media, 
> proprietary installers, etc. This feature is particularly interesting 
> for software vendors who need to ship their software as sealed 
> appliances to their “enterprise” customers.
> Using offline transfer, they can use docker containers as the delivery 
> mechanism for software updates"
> That is really interesting.
> Can we foresee users running on in-house clusters with Docker 
> containers, which may be commercial applications delivered 
> pre-packaged by an ISV,
> or locally developed?
> Then when they need more capacity in short timescales just exporting 
> those containers to run on a cloud (let's say AWS ) and be confident 
> they will run in the same way?
>
This is being strongly considered in-house. As we increasingly handle 
restricted data sets, the security model is very compelling. There is 
also a second, possibly equally important, aspect for our users.

In the bio-informatics arena the local software half-life is 
approximately 6-12 months. This, along with the wide range of 
applications in use, rapidly creates an environment where users can 
cross-link against, or pick up, binaries or libraries that they weren't 
expecting. Rolling containers with predefined environments would not 
only alleviate these pitfalls but could also provide an environment in 
which data can be re-analysed at a future date against the same 
predefined environment.

So in short, I would be very surprised if we are not running something 
along exactly these lines in the (hopefully) very near future. If there 
is interest, we'd be happy to pass on our war stories / experiences 
along the way.

Commercial options also exist.
As an aside, for those who pay to use IBM's Platform LSF: it has had an 
integrated cgroup environment for a while now. IBM also provide various 
supported options for managing such instances in their portfolio. So far 
we have only investigated the cgroup integration within LSF itself, and 
whilst we have found some interesting features / bugs, the patches 
provided and early results are very promising.
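
For anyone wanting to check which cgroups a job has actually been placed 
in, a minimal sketch along the following lines may help. It is plain 
Python and not LSF-specific: it assumes a Linux host and the documented 
/proc/<pid>/cgroup line format (hierarchy-ID:controller-list:path). The 
names CgroupEntry and parse_proc_cgroup are made up for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    """One line of /proc/<pid>/cgroup (hypothetical helper type)."""
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into structured entries.

    Each non-empty line is expected to look like
    'hierarchy-ID:controller-list:cgroup-path'.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice, since the cgroup path may itself contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                hierarchy_id=int(hierarchy_id),
                # The controller list is empty for the cgroup v2 hierarchy.
                controllers=[c for c in controllers.split(",") if c],
                path=path,
            )
        )
    return entries
```

Calling parse_proc_cgroup(open("/proc/self/cgroup").read()) from inside 
a job script should show, per hierarchy, the path the scheduler placed 
the job under, which is a quick sanity check that the job is confined 
where you expect.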

If anyone has similarly prodded the world of HPC and cgroups we'd be 
very interested in hearing how you got on.

Pete


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


