PBS and PGI interactions?
morton at cs.umt.edu
Sat Apr 7 19:46:56 PDT 2001
All, after some work and, due to "some" previous experience
in setting up PBS, I've answered my own question of how to
integrate PGI's HPF with PBS. The answer is certainly a result
of my putting various disparate pieces together - the info wasn't
in a single place so, if anybody is interested, I'll "try" to
put it together for others' reference.
So, below I will make short comments referring to my previous
Don Morton wrote:
> 1) Has anybody devised an approach (I would guess a wrapper) that
> allows parallel jobs to be run only via PBS? In other words,
> in my opinion, you can't have a production cluster if any old
> Joe Sixpack can come in and bypass PBS by typing
> mpirun -np 16 a.out
This is still unanswered - the only immediate (but somewhat difficult)
solution I can think of is to write a wrapper script (e.g. "mpprun")
which is intended to run any parallel programme, and simply catches
jobs where PBS_ENVIRONMENT is not set and says, "sorry pal, gotta use
PBS." Theoretically a simple thing to do - practically, I imagine it's
full of problems :)
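A minimal sketch of that wrapper idea, under stated assumptions: the
name "mpprun" is only the hypothetical name used above, and the check
relies on PBS setting PBS_ENVIRONMENT (to PBS_BATCH or PBS_INTERACTIVE)
inside a job. It is not a tested production script.

```shell
#!/bin/sh
# mpprun (hypothetical name): refuse to launch a parallel job unless we
# appear to be running inside a PBS job.
# Assumption: PBS exports PBS_ENVIRONMENT inside batch and interactive jobs.

mpprun() {
    if [ -z "${PBS_ENVIRONMENT:-}" ]; then
        # Not inside a PBS job: complain on stderr and refuse to run.
        echo "mpprun: sorry pal, gotta use PBS (submit the job with qsub)" >&2
        return 1
    fi
    # Inside a PBS job: run the real launcher with its arguments unchanged.
    "$@"
}
```

Inside a job script one would then write something like
"mpprun mpirun -np 16 a.out". Note the obvious holes: a user can still
call mpirun directly, or set PBS_ENVIRONMENT by hand, so this only
catches honest mistakes unless the real launchers are also hidden.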
> 2) Has anybody devised an approach for launching PGI HPF (and OpenMP)
> jobs via PBS, that does so correctly (i.e. keeps track of node
> allocations from other jobs, etc.)?
Finally got this figured out by looking at PGI HPF references -
much like using MPICH where you utilise the PBS_NODEFILE
> 3) With PGI's HPF runtime environment, is it possible to execute
> completely on "compute nodes?" I'm trying to reserve our "head"
> node for compilation, visualisation, etc., but it appears to me that
> when you run PGI HPF processes, they always put one on the "local" node,
> which isn't necessarily a good thing. I don't see a "clear" way around
This seems to fall into place when you set the PGHPF_HOST to
> MPI, PGI HPF, etc. In fact, Cray's "mpprun" command seemed to
> abstract away the details of dealing with various parallel libraries.
> You launched a job from a shell node - you could run an interactive job
> for maybe 30 minutes, and the allocation of those nodes would be
> coordinated with NQS. Anything longer required a batch NQS job
> and, again, you could use the same submission paradigm whether you
> were using PGI HPF, MPICH, etc.
Again, the above is "ideal" - have a single script, "mpprun", that
launches any parallel job, whether it's MPICH, PGHPF, PVM, etc.
Let the script take care of all these tedious details - has anybody
done something like this?
Don Morton http://www.cs.umt.edu/~morton/
Department of Computer Science The University of Montana
Missoula, MT 59812 | Voice (406) 243-4975 | Fax (406) 243-5139