[Beowulf] first cluster

Gus Correa gus at ldeo.columbia.edu
Fri Jul 16 10:50:04 PDT 2010


Chris Dagdigian wrote:
> You want the honest answer?
> 
> There are technical things you can do to prevent users from bypassing 
> the scheduler and resource allocation policies. One of the cooler things 
> I've seen in Grid Engine environments was a cron job that did a "kill 
> -9" against any user process that was not a child of a sge_shepherd 
> daemon. Very effective.
> 
> Other people play games with pam settings and the like.
> 
> The honest truth is that technical countermeasures are mostly a waste of 
> time. A motivated user always has more time and effort to spend trying 
> to game the system than an overworked administrator.
> 
> My recommendation is to subject users to a cluster acceptable use 
> policy. Any abuses of the policy are treated as a teamwork and human 
> resources issue. The first time you screw up you get a warning, the 
> second time you get caught I'll send a note to your manager. After that 
> any abuses are treated with a loss of cluster access and a referral to 
> human resources for further action.
> 
> Simply put -- you don't have enough time in the day to deal with users 
> who want to game/abuse the system. It's far easier for all concerned to 
> have everyone agree on a fair use policy and treat any infractions via 
> management rather than cluster settings.
> 
> This is another reason why having a cluster governance body helps a lot. 
> A committee of cluster power users and IT staff is a great way to get 
> consensus on queue setup, cluster policies, disk quotas and the like. 
> They can also come down hard with peer pressure on pissy users.
> 
> my $.02
> 
> -Chris
> 
> 
Hi Chris, Douglas, list

Very wise words, and match my experience here,
particularly to have a small cluster committee
to share the responsibility of policies and their enforcement.

As Chris said, this is not really a technical issue; it is about hacking.
Resource managers rely on ssh, and you can tweak it, along with iptables
and PAM, or launch cron jobs to kill recalcitrant processes, etc.,
to prevent some abuse, but there will always be a back door
to be found by those so inclined.
Also, too many restrictions on the technical side
may become hurdles to legitimate use of the cluster.
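For what it is worth, here is a minimal sketch (in Python, assuming psutil
is installed) of the kind of cron job Chris described: kill any user process
that does not descend from the scheduler's shepherd daemon. The daemon name
sge_shepherd comes from his Grid Engine example; the exempt-user list and the
rest are purely illustrative, not a production script.

#!/usr/bin/env python3
# Sketch of a cron job that kills user processes not started by the scheduler.
# Illustrative only: SCHEDULER_DAEMON and SYSTEM_USERS must match your site.
import psutil

SCHEDULER_DAEMON = "sge_shepherd"    # parent process of legitimate batch jobs
SYSTEM_USERS = {"root", "sgeadmin"}  # accounts exempt from the policy

def descends_from_scheduler(proc):
    """Walk up the process tree looking for the scheduler's shepherd daemon."""
    try:
        parent = proc.parent()
        while parent is not None:
            if parent.name() == SCHEDULER_DAEMON:
                return True
            parent = parent.parent()
    except psutil.NoSuchProcess:
        pass
    return False

def main():
    for proc in psutil.process_iter(["pid", "name", "username"]):
        try:
            if proc.info["username"] in SYSTEM_USERS:
                continue
            # A real script would also whitelist sshd sessions, monitoring
            # agents, etc.; this only illustrates the basic idea.
            if not descends_from_scheduler(proc):
                proc.kill()  # SIGKILL, i.e. "kill -9"
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue

if __name__ == "__main__":
    main()

Of course, as said above, a determined user will still find a way around
something like this, so treat it as a deterrent, not a guarantee.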

Since Douglas is at a university, I would also suggest
that when you set up new accounts,
you have the user agree to the general IT policies of your
university.  Or, as I do, just send an email to the new user
saying the account is up and adding something like this:

"By accepting this account you are automatically
agreeing with the general IT regulations of our university, which I
encourage you to read at http://your.univ.it.regulations ,
and to abide by any other policies established by the cluster
committee and system administrators."

It is a bit like that lovely paradigm of Realpolitik:

"Speak softly and carry a big stick ..."   (Teddy Roosevelt)  :)

Gus Correa

> 
> Douglas Guptill wrote:
>> How does the presence of a job scheduler interact with the ability of 
>> a user to
>>    ssh to <head>,
>>    ssh to <compute-node-n>, and then type
>>    mpirun -np 64 my_application
>>
>> Intuition tells me there has to be something in a cluster setup, when
>> it has a scheduler, that prevents a user from circumventing the
>> scheduler by doing something like the above.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
