[Beowulf] Users abusing screen

Prentice Bisbal prentice at ias.edu
Fri Oct 21 07:44:27 PDT 2011



On 10/21/2011 09:44 AM, Henning Fehrmann wrote:
> Hi Prentice,
> 
> On Fri, Oct 21, 2011 at 09:10:18AM -0400, Prentice Bisbal wrote:
>> Beowulfers,
>>
>> I have a question that isn't directly related to clusters, but I suspect
>> it's an issue many of you are dealing with or have dealt with: users using
>> the screen command to stay logged in on systems and running long jobs
>> that they forget about. Have any of you experienced this, and how did
>> you deal with it?
>>
>> Here's my scenario:
>>
>> In addition to my cluster, we have a bunch of "compute servers" where
>> users can run their programs. These are "large" boxes with more cores
>> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a
>> desktop.
>>
>> Periodically, when I have to shutdown/reboot a system for maintenance,
>> I find a LOT of shells being run through the screen command for users
>> who aren't logged in. The majority are idle shells, but many are running
>> jobs that seem to have been forgotten. For example, I recently found
>> some jobs running since July or August that were running under the
>> account of someone who hasn't even been here for months!
>>
>> My opinion is that these are shared resources, and if you aren't
>> interactively using them, you should log out to free up resources for
>> others. If you have a job that can be run non-interactively, you should
>> submit it to the cluster.
>>
>> Has anyone else here dealt with the problem?
>>
>> I would like to remove screen from my environment entirely to prevent
>> this. My fellow sysadmins here agree. I'm expecting massive backlash
>> from the users.
> 
> I wouldn't deinstall screen. It is a useful tool for many things and
> there are alternatives doing the same.  Instead one could enforce a
> maximum CPU time a job can take by setting ulimits.
> 
> Have you thought about queueing systems like condor or SGE? 

Yes, I have a cluster that uses SGE, and we allow users to run serial jobs
(non-MPI, etc.) there, so there is no need for them to use screen to
execute long-running jobs. Hence my frustration.
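
A minimal sketch of how the forgotten sessions described above might be
listed before a reboot: screen processes whose owner has no current login.
The function name forgotten_screens and the sample data are hypothetical.
The input is assumed to be text in the shape produced by something like
"ps -eo user,pid,etimes,comm" (etimes, elapsed seconds, is a procps-ng
keyword and may be missing on older systems) together with the output of
"who". Treat it as an illustration, not a tested admin tool.

```python
def forgotten_screens(ps_text, who_text, min_age_days=0):
    """Return (user, pid, age_days) for screen processes whose owner
    does not appear in the `who` output, oldest first."""
    # First column of each `who` line is the logged-in user name.
    logged_in = {line.split()[0] for line in who_text.splitlines() if line.strip()}

    found = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Expect exactly: user, pid, elapsed seconds, command name.
        # Lines with a different field count (e.g. command names that
        # contain spaces) are skipped.
        if len(fields) != 4:
            continue
        user, pid, etimes, comm = fields
        if not (pid.isdigit() and etimes.isdigit()):
            continue  # header line or unexpected format
        if comm.lower() != "screen":
            continue
        age_days = int(etimes) // 86400
        if user not in logged_in and age_days >= min_age_days:
            found.append((user, int(pid), age_days))

    return sorted(found, key=lambda row: row[2], reverse=True)


# Made-up sample data for illustration only.
sample_ps = """USER PID ELAPSED COMMAND
alice 1201 7776000 screen
bob 1450 3600 screen
bob 1451 3500 bash
carol 1502 172800 screen
"""
sample_who = "bob pts/0 2011-10-21 09:02 (host.example.org)\n"

for user, pid, age in forgotten_screens(sample_ps, sample_who):
    print(f"{user}: screen pid {pid}, running for {age} days, owner not logged in")
```

Passing min_age_days=7, for example, would narrow the report to sessions
that have been around for at least a week.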

Prentice


