[Beowulf] question about enforcement of scheduler use

Mon May 22 14:15:24 PDT 2006

Sorry for not having specifics related to PBS, I'm usually using Grid  
Engine or LSF for this type of work.

I can give you one piece of advice which I've learned the hard way  
and have tested in several different deployments ...

In short, technical fixes or "sysadmin" approaches to mandating the  
use of a scheduler will never work in the long run. All you do is end  
up kicking off a technological arms race with your more savvy users.  
An upset user looking to game the system is always going to have far  
more time and motivation than an overworked cluster admin so  
generally it becomes a losing battle.

I've repeatedly found that it is far better in the long run to make  
the scheduler system (and proper use of it) a policy matter. Clear  
acceptable use policies need to be drafted with user input and  
clearly communicated to everyone.  After that, users who attempt to  
bypass or game the system are referred to their manager. A 2nd  
attempt to bypass the system gets reported up higher and a third  
attempt results in the loss of cluster login access and a possible  
referral to the HR department.

That said though, I work in commercial environments where scheduler  
policies are in place to enforce fairshare-by-user or are used to  
prioritize cluster resources according to very specific business,  
scientific or research goals. In those settings it is very easy to  
point out costs of dealing with users who repeatedly bypass the system.

Going back to the technical side ... One trick that I've seen done  
with Grid Engine takes advantage of the fact that every task launched  
by Grid Engine is a child process of an sge_shepherd daemon.  I've  
seen clusters where there was a recurring  
cron script that would search out and "kill -9" any user process that  
was not a child of a sge_shepherd. The end result was that nobody  
could run a job on a node unless it was under the control of the  
scheduler.   If PBS has the same sort of setup and you could discern  
between pbs-managed jobs and non-pbs-managed tasks then a similar  
approach could be taken.
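
As a rough illustration of that cron approach, here is a minimal Python
sketch. It is not the script used on those clusters; it assumes a Linux
compute node with /proc, assumes the shepherd's process name starts with
"sge_shepherd", and assumes ordinary users have UIDs of 1000 or higher
(the MIN_USER_UID cutoff and all function names are hypothetical). It
treats any descendant of a shepherd as scheduler-managed, and it defaults
to a dry run that only prints what it would kill.

```python
import os
import signal
from typing import Dict, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    uid: int
    name: str


SHEPHERD = "sge_shepherd"  # process name taken from the post
MIN_USER_UID = 1000        # assumption: lower UIDs are system accounts


def read_proc_table(proc_root: str = "/proc") -> Dict[int, Proc]:
    """Build a pid -> Proc map from a Linux /proc tree (empty if absent)."""
    table: Dict[int, Proc] = {}
    if not os.path.isdir(proc_root):
        return table
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        path = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(path, "stat")) as f:
                stat = f.read()
            uid = os.stat(path).st_uid
        except OSError:
            continue  # process exited while we were scanning
        # Format: "pid (comm) state ppid ..."; comm may contain spaces.
        lpar, rpar = stat.index("("), stat.rindex(")")
        name = stat[lpar + 1:rpar]
        fields = stat[rpar + 2:].split()  # fields[0]=state, fields[1]=ppid
        table[int(entry)] = Proc(int(entry), int(fields[1]), uid, name)
    return table


def under_shepherd(pid: int, table: Dict[int, Proc]) -> bool:
    """True if pid or any of its ancestors is a shepherd process."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        proc = table[pid]
        if proc.name.startswith(SHEPHERD):
            return True
        pid = proc.ppid
    return False


def find_rogues(table: Dict[int, Proc],
                min_uid: int = MIN_USER_UID) -> List[int]:
    """PIDs of ordinary-user processes that no shepherd started."""
    return sorted(p.pid for p in table.values()
                  if p.uid >= min_uid and not under_shepherd(p.pid, table))


def sweep(dry_run: bool = True) -> None:
    """One pass of the cron job: report, or kill -9, each rogue process."""
    for pid in find_rogues(read_proc_table()):
        if dry_run:
            print(f"would kill {pid}")
        else:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone
```

A cron entry would call sweep() every few minutes. Note that a sweep
like this also kills ordinary users' interactive login shells, so it
only makes sense on compute nodes, not on the head node.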

My $.02

-Chris

On May 22, 2006, at 8:45 AM, Larry Felton Johnson wrote:

>
> My apologies in advance if this is a FAQ, but I'm reading through the
> documentation and tinkering with the problem below simultaneously, and
> would appreciate help at least focussing the problem and avoiding
> going down useless paths (given my relative inexperience with clusters).
>
> I'm primarily a solaris sysadmin (and a somewhat specialized one at
> that).  I've been given the task of administering a cluster (40 nodes
> + head) put together by atipa, and have been scrambling to come up to
> speed on Linux on the one hand and the cluster-specific software and
> config files on the other.
>
> I was asked by the folks in charge of working with the end users to
> help migrate to enforcement of the use of a scheduler (in our case
> PBSpro).  In preparation for this I was asked to isolate four nodes
> and make those nodes only accessible to end users via PBSpro.
>
> The most promising means I found in my searches was the one used
> by Dr. Weisz, of modifying the PAM environment, limits.conf, and the
> PBS prologue and epilogue files.  I found his document describing the
> approach, but have not found his original prologue and epilogue scripts.
>
> However, I wrote prologue and epilogue scripts that did what he described
> (wrote a line of the form "${USER}   hard maxlogins 18  #${JOB_ID}"
> to the limits.conf file on the target node, and erased it after the job
> was completed).
>
> If we limit the job to one node the prologue and epilogue scripts run
> with the intended effect.  The problem is that when we put the other three
> target nodes in play, we get a failure on three of the nodes, which I
> suspect is due to an attempt by the application to communicate via ssh
> under the user's id laterally from node to node.
>
> PBS hands the job off to node037, which successfully runs its prologue
> file.
>
> Here's the contents of the output file:
>
> Starting 116.head Thu May 18 15:10:48 CDT 2006
> Initiated on node037
>
> Running on 4 processors: node037 node038 node039 node040
>
>
> Here's the error file:
>
> Connection to node038 closed by remote host.
> Connection to node039 closed by remote host.
> Connection to node040 closed by remote host.
> =>> PBS: job killed: walltime 159 exceeded limit 120
>
>
> To clean up my question a bit I'll break it into four chunks:
>
> 1) Is the general approach I'm using appropriate for my intended effect
>    (isolating four nodes and enforcing the use of the pbspro scheduler
>    on those nodes)?
>
> 2) If so what's the best way of allowing node-to-node communication, if
>    indeed that's my likely problem?
>
> 3) If not does anyone have any other strategies for achieving what I'm
>    after?
>
> 4) If the answer is RTFM could someone steer me towards the FMs or parts
>    thereof I need to be perusing :-)
>
> Thanks in advance.
>
> Larry
> -- 
> ========================================================
> "I learned long ago, never to wrestle with a pig. You
>  get dirty, and besides, the pig likes it."
>
>                               George Bernard Shaw
> ========================================================
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf