[Beowulf] Do these SGE features exist in Torque?

Reuti reuti at staff.uni-marburg.de
Mon May 12 08:19:58 PDT 2008


Hiho,

On 12.05.2008 at 15:14, Prentice Bisbal wrote:

>>> At a previous job, I installed SGE for our cluster. At my current job
>>> Torque is the queuing system of choice. I'm very familiar with SGE, but
>>> only have a cursory knowledge of Torque (installed it for evaluation,
>>> and that's it). We're about to purchase a new cluster. I'd have to make
>>> a good argument for using SGE over Torque. I was wondering if the
>>> following SGE features exist in Torque:
>>>
>>> 1. Interactive shells managed by queuing system
>>> 2. Counting licenses in use (done using a contributed shell script in
>>> SGE)
>>> 3. Separation of roles between submit hosts, execution hosts, and
>>> administration hosts
>>> 4. Certificate-based security.
>>>
>>> Are there any notable features available in Torque that aren't available
>>> in SGE?
>>
>> what you can find in Torque but not in SGE: request a mixture of nodes,
>> i.e. one heavy node with much memory (or big I/O options) and 5 nodes
>> with less memory or less disk performance for a parallel job.
>
> Huh? Can you elaborate? My initial thought is "why would you need
> this?", but I think I see where you're going with this...

For some types of applications only the "master" of a parallel job is  
collecting all the information and accessing the disk to store the  
results. Therefore this, and only this, machine needs better I/O  
capabilities than the slave nodes.

It's still an RFE in SGE to request an arbitrary combination of
resources. For example, if one job needs 1 host with big I/O, 2 with huge
memory and 3 "standard" nodes, you could request this in Torque with:

-l nodes=1:big_io+2:mem+3:standard

(Although this syntax has its own pitfalls: as far as I observed in my
tests, -l nodes=4:ppn=1 might still allocate 2 or more slots on a node.)
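
To make the node specification above concrete, here is a minimal Python sketch that splits such a string into groups of (count, properties). It is only an illustration of how the syntax reads, not Torque's own parser; the function name parse_nodes_spec is hypothetical, and big_io, mem and standard are site-defined node properties taken from the example above.

```python
def parse_nodes_spec(spec):
    """Split a Torque-style 'nodes=' value into (count, properties) groups.

    Illustration only: groups are separated by '+', fields inside a
    group by ':'. A leading numeric field is read as the node count;
    otherwise the group is assumed to describe a single node.
    """
    groups = []
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        if parts[0].isdigit():
            count, props = int(parts[0]), parts[1:]
        else:
            count, props = 1, parts
        groups.append((count, props))
    return groups


groups = parse_nodes_spec("1:big_io+2:mem+3:standard")
total_nodes = sum(count for count, _ in groups)

for count, props in groups:
    print(count, "node(s) with properties", props)
print("total nodes requested:", total_nodes)
```

Read this way, the request asks for six nodes in three differently tagged groups, which is the mixture that is still an RFE in SGE.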

As Craig pointed out correctly, it can be set up in SGE, but there  
might be combinations where this gets convoluted. Thankfully I never  
needed such a type of allocation of nodes, but I just wanted to point  
out that this feature exists in Torque. If you don't need it for your  
type of jobs, ignore it ;-)

BTW @Craig: It should be possible to request -masterq  
qbigmem*@bigmem* in 6.0 which would shorten the line.

>>
>> OTOH, if you have parallel jobs:
>> http://www.beowulf.org/archive/2007-September/019269.html
>
> Thanks for the link. From my understanding of SGE, you can get tight
> integration with just about any MPI implementation. Is that true?

At least: all that I'm aware of. Nowadays I would go for Open MPI,
which should work out-of-the-box with both queuing systems. If you
need Linda or HP-MPI, it seems SGE is the only option for a Tight
Integration (between these two - I'm not aware of the features of LSF
and others).

>> What is conceptually different between them: in Torque you submit a job
>> into a queue, while in SGE you request resources and SGE will select an
>> appropriate queue for you.
>
> You'll have to elaborate on this, too. From my knowledge of SGE, you had
> to specify the correct queue, too, or it went into the default queue.

There is nothing like a default queue in SGE (even the all.q, defined  
at installation time, is just a queue without any special features -  
you can edit or remove it, if you don't like to have it in your  
cluster).

If you have some queues with different limits, e.g. 1h wallclock
vs. unlimited wallclock, and two queues for 8 GB vs. 16 GB, a job
requesting 2 hrs of wallclock time will definitely end up in the queue
with unlimited wallclock time, while a job requesting 30 minutes
might end up in either of the two queues if a slot is free. The same
holds for memory requests: a job requesting 4 GB might end up in either
of the two memory-limited queues, while a 12 GB request can only be
run in the 16 GB queue. You don't have to specify it, as SGE will
select a queue which fulfills your requested resources.
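
The selection logic in the paragraph above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not SGE's scheduler: the queue names (short.q, long.q, mem8.q, mem16.q) and the helper eligible_queues are hypothetical, and free slots, load and every other scheduling criterion are ignored.

```python
import math


def eligible_queues(queues, request):
    """Return the names of all queues whose limit covers the request."""
    return [name for name, limit in queues.items() if request <= limit]


# Hypothetical queues: wallclock limits in hours, memory limits in GB.
wallclock_hours = {"short.q": 1, "long.q": math.inf}
memory_gb = {"mem8.q": 8, "mem16.q": 16}

two_hours = eligible_queues(wallclock_hours, 2)      # only the unlimited queue
half_hour = eligible_queues(wallclock_hours, 0.5)    # either wallclock queue
four_gb = eligible_queues(memory_gb, 4)              # either memory queue
twelve_gb = eligible_queues(memory_gb, 12)           # only the 16 GB queue

print("2 h job:  ", two_hours)
print("30 min job:", half_hour)
print("4 GB job: ", four_gb)
print("12 GB job:", twelve_gb)
```

The job only states what it needs; which of the eligible queues it lands in is then up to the scheduler.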

--- Reuti


> In SGE you can specify resources such as mem >= 32 GB for a node, or
> arch=AMD64. You can't do this with Torque? Seems like a very basic
> queuing system feature.


