[Beowulf] User resource limits

Prentice Bisbal prentice at ias.edu
Mon Jun 9 08:41:29 PDT 2008


This is slightly off topic, since it's not a Beowulf-specific problem,
but it is HPC-related:

I have several fat servers with 4 cores and 32 GB of RAM, for jobs that
aren't very parallel and need large amounts of RAM. They are not
clustered in any way. At the moment, users ssh into these systems to run
large jobs. Eventually, I will have these nodes managed by a queuing
system.

The problem: Every couple of days, one of these systems becomes
unresponsive due to OOM errors. If we wait long enough, the offending
job will complete, and everything will return to normal. Since these are
multi-user shared resources, I don't have the luxury of waiting for the
systems to clear themselves up, and I often have to hit the power button.


I would like to impose hard CPU and memory limits on users, limits that
they can't change or override. What is the best way to do this? The only
mechanisms I know of are environment variables or shell commands run as
the user (ulimit, for example).
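
To illustrate the per-process mechanism that ulimit exposes, here is a
minimal sketch in Python using the standard resource module. It assumes
a Linux system; the helper names lower_limit and run_limited and the
limit values are hypothetical, chosen only for illustration. It lowers
both the soft and the hard limit before starting a command. An
unprivileged process can lower its hard limit but cannot raise it again.

```python
import resource
import subprocess
import sys


def lower_limit(which, value):
    """Set the soft and hard limit for `which` to `value`.

    The value is clamped to the current hard limit, because an
    unprivileged process may lower a hard limit but never raise it.
    """
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def run_limited(cmd, cpu_seconds, address_space_bytes):
    """Run `cmd` with CPU-time and address-space limits applied.

    The limits are set in the child just before exec, so they apply to
    the command and everything it starts, not to the calling process.
    """
    def set_limits():
        lower_limit(resource.RLIMIT_CPU, cpu_seconds)
        lower_limit(resource.RLIMIT_AS, address_space_bytes)

    return subprocess.run(
        cmd, preexec_fn=set_limits, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Hypothetical values: 60 s of CPU time, 4 GiB of address space.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
    result = run_limited([sys.executable, "-c", probe], 60, 4 * 1024**3)
    print(result.stdout.strip())
```

Note that this only covers processes started through the wrapper. It is
a sketch of the limit mechanism itself, not a way to enforce limits on
every login session.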


-- 
Prentice


