[Beowulf] Using Linux cgroups to enforce resources allocation limits?
Kilian CAVALOTTI kilian at stanford.edu
Thu Apr 17 11:19:16 PDT 2008
- Previous message: [Beowulf] Thermals question
- Next message: [Beowulf] Opteron 235X: mobos & coolers
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all,

With the fresh release of the 2.6.25 kernel, Linux cgroups (fka process containers) are getting more attention. For those unfamiliar with the concept, Control Groups are "a generic framework where several 'resource controllers' can plug in and manage different resources of the system such as process scheduling or memory allocation. [They] also offer a unified user interface, based on a virtual filesystem where administrators can assign arbitrary resource constraints to a group of chosen tasks."

2.6.24 introduced a CPU bandwidth allocation controller, and today, 2.6.25 features a memory resource controller. Patches for network and block I/O bandwidth control have also been submitted.

So it looks to me like everything is available to create real process containers, able to hold individual users' jobs and keep them inside defined limits. At kernel level.

One of the limitations of current resource schedulers is CPU usage limit enforcement on multi-core systems. On non-NUMA systems (hello Intel! :)), there's no mechanism to prevent a user who submits a job asking for, say, one core on an 8-core machine from actually spawning 8 threads, which will be spread over the 8 cores and make exclusive use of all the machine's CPU resources. This would impact the performance of other users' jobs in a sneaky way, and, as a rigid^Wrighteous sysadmin, I can't tolerate this.

I have been looking for the longest time for a way to "pin" a group of processes to a specific *number* of cores, and not to a specific list of cores (i.e. I don't want to limit a process to run on cores 0 and 1, but rather say that I'd like this process to use at most 2 cores on the system). And it looks like cgroups would be a good candidate to achieve this.

Another benefit would be memory resource allocation. Our current scheduler, and it's probably the case for the others as well, enforces memory limits by sampling the memory used by jobs every x minutes. So if a job has peak memory bursts, it can easily go unnoticed and continue to run, although it may already have either triggered the OOM killer or prevented another process's memory allocation. If the enforcement is done at kernel level, I assume that it will be in real time, and that this kind of problem would be avoided.

I have yet to try implementing cgroups to see if they could be used in an HPC environment to enforce reliable resource allocation limits, but I was wondering if anybody has tried this already, especially the integration with existing schedulers, or if anyone has ideas on the subject.

Thanks,
--
Kilian
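To make the idea in the message above concrete, here is a minimal sketch of how a scheduler prolog might translate "at most N cores and M bytes of memory" into cgroup control-file values. It is an illustration under stated assumptions, not something described in the original post: the file names `cpu.cfs_period_us` and `cpu.cfs_quota_us` belong to the CFS bandwidth controller, which only appeared in later kernels (3.2, cgroup v1) and was not available in 2.6.25, while `memory.limit_in_bytes` is the cgroup v1 memory controller's limit file. The function names `job_limits` and `write_limits`, and the idea of passing one directory per controller, are hypothetical.

```python
from pathlib import Path

# Assumption: 100 ms accounting period for CPU bandwidth.
CFS_PERIOD_US = 100_000


def job_limits(max_cores, max_memory_bytes, period_us=CFS_PERIOD_US):
    """Return {controller: {control_file: value}} for one job.

    A quota of max_cores * period_us per period caps the group at
    max_cores cores' worth of CPU time, whichever cores the threads
    actually run on.
    """
    if max_cores <= 0 or max_memory_bytes <= 0:
        raise ValueError("limits must be positive")
    return {
        "cpu": {
            "cpu.cfs_period_us": str(period_us),
            "cpu.cfs_quota_us": str(int(max_cores * period_us)),
        },
        "memory": {
            "memory.limit_in_bytes": str(max_memory_bytes),
        },
    }


def write_limits(controller_dirs, limits):
    """Write each value into the job's directory for that controller.

    controller_dirs maps a controller name ("cpu", "memory") to the
    job's cgroup directory in that controller's hierarchy (hypothetical
    layout; the directories must already exist).
    """
    for controller, files in limits.items():
        for name, value in files.items():
            (Path(controller_dirs[controller]) / name).write_text(value)
```

For a job that asked for 2 cores and 4 GiB, `job_limits(2, 4 * 1024**3)` yields a CPU quota of 200000 microseconds per 100000-microsecond period. With cgroup v1, a process is then placed in the group by writing its PID to the group's `tasks` file; how that step would hook into an existing scheduler is exactly the open question raised in the message.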
