[Beowulf] first cluster

Reuti reuti at staff.uni-marburg.de
Mon Jul 19 03:46:03 PDT 2010


On 19.07.2010 at 10:54, Tim Cutts wrote:

> 
> On 16 Jul 2010, at 6:11 pm, Douglas Guptill wrote:
> 
>> On Fri, Jul 16, 2010 at 12:51:49PM -0400, Steve Crusan wrote:
>>> We use a PAM module (pam_torque) to stop this behavior. Basically, if
>>> your job isn't currently running on a node, you cannot SSH into that node.
>>> 
>>> 
>>> http://www.rpmfind.net/linux/rpm2html/search.php?query=torque-pam
>>> 
>>> That way one is required to use the queuing system for jobs, so the cluster
>>> isn't like the wild wild west...
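
For the archives, roughly how such a PAM stack looks in /etc/pam.d/sshd. The module name and the pam_access fallback are assumptions here, so check the torque-pam package for the specifics on your distribution:

   # account phase: pass if the user has a Torque job on this node
   account    sufficient   pam_pbssimpleauth.so
   # otherwise defer to /etc/security/access.conf (admin staff listed there)
   account    required     pam_access.so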
>> 
>> Ah ha! The key.
> 
> It's a very neat idea, but it has the disadvantage - unless I'm misunderstanding - that if a job fails and leaves droppings in, say, /tmp on a cluster node, the user can't log in to diagnose the problem or clean up after themselves.

Yep. With GridEngine the $TMPDIR will be removed automatically, at least as long as the job honors the variable. I disable ssh and rsh on my clusters except for admin staff. Normal users can use an interactive job in SGE, which is limited to 60 seconds of CPU time, if they really want to peek at the nodes.
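
Roughly what that looks like here; the queue name is just an example:

   # hard CPU limit of 60 s on the interactive queue, set via "qconf -mq interactive.q"
   qtype     INTERACTIVE
   h_cpu     0:1:0

And jobs that stage their scratch files through the SGE-provided directory get the cleanup for free:

   #!/bin/sh
   #$ -cwd
   cd $TMPDIR   # created per job by sge_execd, removed when the job ends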

-- Reuti

> 
> Tim
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
