[Beowulf] Do these SGE features exist in Torque?
Bogdan.Costescu at iwr.uni-heidelberg.de
Tue May 13 03:17:11 PDT 2008
On Mon, 12 May 2008, Glen Beane wrote:
> I know TORQUE USED to be much better than SGE at controlling MPI
> type jobs.
I think it still is, because SGE still lacks the long-awaited TM
(PBS Task Manager interface) support.
> If you use a PBS/TORQUE aware MPI job launcher it is pretty much
> impossible for any of the job processes to escape control of the
> batch system.
Hmm, not quite true. Just recently I've had several such instances
where I had to kill individual processes by hand (using Torque
2.1.10). One nice thing about SGE is that it uses setgroups() to add a
supplementary group, taken from a reserved range, to all the processes
of a job. Because this call is normally available only to "root", user
processes cannot modify their supplementary group list and therefore
cannot escape being killed. I used SGE in the past and don't remember
ever having to clean up processes by hand.
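A minimal, Linux-specific sketch of the group-tagging idea described above, for illustration only and not SGE's actual code. The job GID (20001 in the usage below) is a hypothetical value standing in for one taken from a reserved range, and the helper names tag_process_with_job_gid(), groups_line_contains() and find_job_pids() are made up here. The point it shows: an unprivileged process gets EPERM from setgroups(), so it cannot drop the tag, and the batch system can later find every tagged process by reading the "Groups:" line of /proc/<pid>/status.

```c
#define _DEFAULT_SOURCE          /* for setgroups() on glibc */
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace the caller's supplementary groups with a single job GID
 * (hypothetical, from a reserved range). Returns 0 or an errno value.
 * Without CAP_SETGID (ordinary users) setgroups() fails with EPERM,
 * which is why a job's processes cannot remove the tag themselves. */
int tag_process_with_job_gid(gid_t job_gid)
{
    if (setgroups(1, &job_gid) != 0)
        return errno;
    return 0;
}

/* Return 1 if a "Groups:" line from /proc/<pid>/status lists job_gid. */
int groups_line_contains(const char *line, gid_t job_gid)
{
    if (strncmp(line, "Groups:", 7) != 0)
        return 0;
    const char *p = line + 7;
    char *end;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            return 0;                     /* end of the GID list */
        unsigned long g = strtoul(p, &end, 10);
        if ((gid_t)g == job_gid)
            return 1;
        p = end;
    }
}

/* Scan /proc and store up to max PIDs whose supplementary groups include
 * job_gid. A real cleanup step would send SIGKILL to each one; this
 * sketch only collects them. Assumes the Groups line fits in the buffer.
 * Returns the number of PIDs found. */
size_t find_job_pids(gid_t job_gid, pid_t *pids, size_t max)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return 0;
    size_t n = 0;
    struct dirent *de;
    while (n < max && (de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                     /* not a PID directory */
        char path[300];
        snprintf(path, sizeof path, "/proc/%s/status", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                     /* process exited meanwhile */
        char line[4096];
        while (fgets(line, sizeof line, f) != NULL) {
            if (groups_line_contains(line, job_gid)) {
                pids[n++] = (pid_t)atoi(de->d_name);
                break;
            }
        }
        fclose(f);
    }
    closedir(proc);
    return n;
}
```

In this sketch the job's starter process would call tag_process_with_job_gid(20001) as root before dropping privileges, so every child inherits the tag; at job end, find_job_pids(20001, pids, max) would list the survivors even if they daemonized or left the process tree. Matching on the group rather than on the process tree is what makes the scheme robust.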
[ Please note that I'm considering here only the batch system proper,
not any prologue/epilogue scripts, which are the usual fixes applied
locally. IMHO job cleanup is basic functionality that should be
included in the batch system proper. ]
> Last time I used SGE, I found the MPI support much less
> sophisticated than TORQUE, but this was several years ago.
This is easy to explain once you look at how each of them started.
However, generally speaking, I can see that during the past few years
they have started to grow similar features (e.g. SGE is getting better
parallel job integration and possibly TM support, while Torque is
getting job-array support).
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de