[Beowulf] Users abusing screen
reuti at staff.uni-marburg.de
Fri Oct 21 08:24:32 PDT 2011
On 21.10.2011 at 15:10, Prentice Bisbal wrote:
> I have a question that isn't directly related to clusters, but I suspect
> it's an issue many of you are dealing with or have dealt with: users using
> the screen command to stay logged in on systems and running long jobs
> that they forget about. Have any of you experienced this, and how did
> you deal with it?
> Here's my scenario:
> In addition to my cluster, we have a bunch of "compute servers" where
> users can run the programs. These are "large" boxes with more cores
> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a
> desktop.
> Periodically, when I have to shutdown/reboot a system for maintenance,
> I find a LOT of shells being run through the screen command for users
> who aren't logged in. The majority are idle shells, but many are running
> jobs, that seem to be forgotten about. For example, I recently found
> some jobs running since July or August that were running under the
> account of someone who hasn't even been here for months!
> My opinion is that these are shared resources, and if you aren't
> interactively using them, you should log out to free up resources for
> others. If you have a job that can be run non-interactively, you should
> submit it to the cluster.
> Has anyone else here dealt with the problem?
> I would like to remove screen from my environment entirely to prevent
> this. My fellow sysadmins here agree. I'm expecting massive backlash
> from the users.
I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system, and GridEngine grants them access to a node. For the startup method you can use either the -builtin- one or, in case you need X11 forwarding, ssh with a separate sshd_config (GridEngine will start one daemon per task). One additional step is necessary for a tight integration of ssh.
For users who just want to check their jobs on a node, I have a dedicated queue: they can always log in there, but h_cpu is limited to 60 seconds, so they can't abuse it.
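
To illustrate what such a CPU-time cap does to a session, here is a minimal sketch in plain Python. It is not GridEngine configuration and not how h_cpu is set up in a queue; it only shows the underlying idea with the operating system's per-process CPU limit (RLIMIT_CPU) on a Unix-like system. The helper name run_with_cpu_limit is hypothetical and chosen for this example only.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv in a child process whose CPU time is capped at cpu_seconds.

    Hypothetical helper for illustration; returns the child's return code.
    A negative value means the child was terminated by a signal.
    """
    def set_limit():
        # Executed in the child just before exec: set soft and hard limit.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(argv, preexec_fn=set_limit).returncode


if __name__ == "__main__":
    # A command that does almost no work stays below the limit and exits normally.
    idle = run_with_cpu_limit([sys.executable, "-c", "pass"], 1)
    # A busy loop burns CPU time and is terminated once the limit is exceeded.
    busy = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    print("idle command return code:", idle)
    print("busy command return code:", busy)
```

The limit counts consumed CPU time, not wall-clock time, so an idle shell used for looking around is hardly affected, while anything compute-bound is stopped quickly. That is the property that makes a 60-second h_cpu login queue usable for checking jobs but useless for running them.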