Christopher Samuel samuel at unimelb.edu.au
Wed Nov 27 16:53:55 PST 2013

On 27/11/13 23:01, Peter Clapham wrote:

> In the bio-informatics arena the local software half life is 
> approximately 6-12 months.

I feel your pain.

> This, along with the wide range of applications in use, rapidly
> creates an environment where users can cross link or pick up
> binaries or libraries that they weren't expecting. Rolling
> containers with predefined environments would not only alleviate
> these potential pitfalls but could also provide an environment in
> which data can be re-analysed at a future date against the same
> pre-defined environment.

This whole issue of provenance and reproducibility is a big one, and
hopefully this will help with that.

> So in short I would be very surprised if we are not running
> something along exactly these lines in the (hopefully) very near
> future. If there is the interest we'd be happy to pass on our war
> stories / experiences along the way.

I'd be very interested to hear about it.

> If anyone has similarly prodded the world of HPC and cgroups we'd
> be very interested in hearing how you get on.

I was one of the main agitators for cpuset support in Torque to
constrain jobs to just the cpus requested, back around 2006/2007.

Now that we've migrated to Slurm, we use its cgroups support to the
same effect, and it works nicely. I'm looking forward to the 14.03
release, which will also obtain stats about the job from cgroups.
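
For anyone wanting to check the confinement from inside a running job,
here is a minimal sketch in Python (Linux only). The helper name
allowed_cpus is made up for illustration; os.sched_getaffinity is the
standard library call. It reports the CPUs the calling process may run
on, which should reflect a cpuset or cgroup restriction if one has been
applied, whether by Torque, Slurm, or anything else.

```python
import os


def allowed_cpus():
    """Return the sorted list of CPU ids this process may run on.

    Hypothetical helper for illustration. On Linux the affinity mask
    is narrowed by any cpuset/cgroup the process has been placed in.
    """
    # pid 0 means "the calling process"
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    cpus = allowed_cpus()
    print("allowed CPUs:", cpus)
    print("allowed count:", len(cpus), "of", os.cpu_count(), "on the node")
```

Run inside a job that requested two cores, this would be expected to
list two CPU ids rather than every core on the node, assuming the
scheduler is configured to constrain cores.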

The one drawback of cgroups that I can see relates to memory
enforcement.  If you enable it (we don't at present; we just get Slurm
to set RLIMIT_AS for processes), then when the kernel realises you've
hit the memory limit for your control group it will invoke the OOM
killer on that cgroup rather than just making the memory request fail.
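
The RLIMIT_AS side of that can be seen with a small sketch in Python
(Linux assumed). The 1 GiB limit, the 2 GiB request, and the helper
name allocate_under_limit are arbitrary choices for illustration, not
anything Slurm does itself. A child process caps its own address space
and then tries to allocate more than the cap; the allocation fails and
the process carries on instead of being killed.

```python
import subprocess
import sys

# Code run in a child interpreter so the limit never touches the parent.
CHILD = """
import resource, sys
limit, request = int(sys.argv[1]), int(sys.argv[2])
# Cap the total address space (soft and hard limit) for this process.
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    buf = bytearray(request)   # the underlying memory request is refused
except MemoryError:
    print("allocation failed")
else:
    print("allocation succeeded")
"""


def allocate_under_limit(limit_bytes, request_bytes):
    """Hypothetical helper: report what happens to one allocation
    made by a child process running under an RLIMIT_AS cap."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes), str(request_bytes)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    gib = 1 << 30
    print(allocate_under_limit(1 * gib, 2 * gib))  # request above the cap
    print(allocate_under_limit(1 * gib, 1 << 20))  # request well below it
```

With cgroup memory enforcement there is no such error for the program
to catch: once the group's usage hits its limit, the OOM killer is
invoked, as described above.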

 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

