[Beowulf] Do these SGE features exist in Torque?

Craig Tierney Craig.Tierney at noaa.gov
Mon May 12 09:01:03 PDT 2008


Reuti wrote:
> Hiho,
> 
> On 12.05.2008 at 15:14, Prentice Bisbal wrote:
> 
>>>> At a previous job, I installed SGE for our cluster. At my current job
>>>> Torque is the queuing system of choice. I'm very familiar with SGE, but
>>>> only have a cursory knowledge of Torque (installed it for evaluation,
>>>> and that's it). We're about to purchase a new cluster. I'd have to make
>>>> a good argument for using SGE over Torque. I was wondering if the
>>>> following SGE features exist in Torque:
>>>>
>>>> 1. Interactive shells managed by queuing system
>>>> 2. Counting licenses in use (done using a contributed shell script in
>>>> SGE)
>>>> 3. Separation of roles between submit hosts, execution hosts, and
>>>> administration hosts
>>>> 4. Certificate-based security.
>>>>
>>>> Are there any notable features available in Torque that aren't
>>>> available in SGE?
>>>
>>> what you can find in Torque but not in SGE: request a mixture of nodes,
>>> i.e. one heavy node with much memory (or big I/O options) and 5 nodes
>>> with less memory or less disk performance for a parallel job.
>>
>> Huh? Can you elaborate? My initial thought is "why would you need
>> this?", but I think I see where you're going with this...
> 
> For some types of applications only the "master" of a parallel job is 
> collecting all the information and accessing the disk to store the 
> results. Therefore this, and only this, machine needs better I/O 
> capabilities than the slave nodes.

It isn't just collecting IO, it is also reading it.  There are many
MPI codes out there that don't do distributed IO.  We have several codes
like this.  We also have codes where the head-node reads all of the input
in first, then distributes it.  These codes need extra RAM in the head
node just to start up.  Yes, it is stupid, but I cannot rewrite everyone's
codes.

> 
> It's still an RFE in SGE to get any arbitrary combination of resources, 
> e.g. you need for one job 1 host with big I/O, 2 with huge memory and 3 
> "standard" type of nodes you could request in Torque:
> 
> -l nodes=1:big_io+2:mem+3:standard
> 
> (Although this syntax has its own pitfalls: -l nodes=4:ppn=1 might still 
> allocate 2 or more slots on a node AFAIO in my tests.)
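
For readers unfamiliar with the request syntax above, here is a minimal
sketch in Python of how such a string breaks down into groups. This is not
Torque's own parser; the function name is hypothetical, and it assumes every
group starts with a node count (it does not handle other forms a real
scheduler may accept).

```python
def parse_nodes_spec(spec):
    """Split 'count:prop[:prop...]+...' into (count, properties, ppn) tuples.

    Illustrative only: assumes the first field of every group is an
    integer node count.
    """
    groups = []
    for chunk in spec.split("+"):
        count_str, *attrs = chunk.split(":")
        count = int(count_str)
        ppn = None
        props = []
        for attr in attrs:
            if attr.startswith("ppn="):
                # processors per node requested for this group
                ppn = int(attr[len("ppn="):])
            else:
                # anything else is treated as a node property label
                props.append(attr)
        groups.append((count, props, ppn))
    return groups


groups = parse_nodes_spec("1:big_io+2:mem+3:standard")
total_nodes = sum(count for count, _, _ in groups)
```

With the example request, `groups` holds three entries (one per node type)
and `total_nodes` is 6.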

You mean the syntax has its pitfalls in Torque, or how SGE may implement
it?  I personally like the way SGE allocates nodes.  I can control how
users get nodes.  When a user asks for 16 processors (cores, slots, whatever)
they should get N nodes that have M processors, and N*M=16.  If a user
needs to specify ppn=2 (or 4 or 8), it means they will mess it up, causing
jobs to share nodes and adversely impact each other, which I don't want.
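
The N*M arithmetic can be sketched in a few lines of Python. This is only an
illustration of the whole-node idea, not how SGE or Torque implements it; the
function name and the choice to reject requests that would leave a node
partly filled are assumptions.

```python
def whole_nodes(slots_requested, cores_per_node):
    """Return N such that N * cores_per_node == slots_requested.

    Hypothetical helper: refuses requests that would leave a node
    partially used, since a partly used node could be shared.
    """
    if slots_requested <= 0 or cores_per_node <= 0:
        raise ValueError("slot and core counts must be positive")
    nodes, leftover = divmod(slots_requested, cores_per_node)
    if leftover:
        raise ValueError(
            f"{slots_requested} slots do not fill whole "
            f"{cores_per_node}-core nodes"
        )
    return nodes
```

For example, 16 slots on 4-core nodes gives 4 nodes and on 8-core nodes gives
2 nodes, while 10 slots on 4-core nodes is rejected.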


> 
> As Craig pointed out correctly, it can be set up in SGE, but there might 
> be combinations where this gets convoluted. Thankfully I never needed 
> such a type of allocation of nodes, but I just wanted to point out that 
> this feature exists in Torque. If you don't need it for your type of 
> jobs, ignore it ;-)
> 
> BTW @Craig: It should be possible to request -masterq qbigmem*@bigmem* 
> in 6.0 which would shorten the line.
> 

Thanks for pointing that out.  In 5.3, not all options supported wildcards.
I didn't try changing the scripts when we upgraded.

Craig



>>>
>>> OTOH, if you have parallel jobs:
>>> http://www.beowulf.org/archive/2007-September/019269.html
>>
>> Thanks for the link. From my understanding of SGE, you can get tight
>> integration with just about any MPI implementation. Is that true?
> 
> At least: all that I'm aware of. Nowadays I would go for Open MPI, which 
> should work out-of-the-box with both queuing systems. If you need Linda 
> or HP-MPI, seems SGE is the only option for a Tight Integration (between 
> these two - I'm not aware of the features of LSF and others).
> 
>>> What is different between them from the idea: in Torque you submit a job
>>> into a queue, while in SGE you request resources and SGE will select an
>>> appropriate queue for you.
>>
>> You'll have to elaborate on this, too. From my knowledge of SGE, you had
>> to specify the correct queue, too, or it went into the default queue.
> 
> There is nothing like a default queue in SGE (even the all.q, defined at 
> installation time, is just a queue without any special features - you 
> can edit or remove it, if you don't like to have it in your cluster).
> 
> If you have some queues with different limits, e.g. 1h wallclock 
> vs. unlimited wallclock and two queues for 8 GB vs. 16 GB, a job 
> requesting 2 hrs of wallclock time will for sure end up in the queue 
> with unlimited time constraints, while a job requesting 30 minutes might 
> end up in either of the two queues if a slot is free. The same holds for 
> memory requests: a job requesting 4 GB might end up in either of the two 
> memory limited queues, while a 12 GB request can only be run in the 16 
> GB queue. You don't have to specify it, as SGE will select the correct 
> one which fulfills your requested resources.
> 
> --- Reuti
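
The selection described above can be sketched in Python as a filter over
queue limits. This is a toy model under stated assumptions, not SGE's
scheduler: the queue names, the `Queue` class, and the two separate queue
lists are made up to mirror the wallclock and memory examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queue:
    name: str                                # hypothetical queue name
    max_wallclock_h: Optional[float] = None  # None means unlimited
    max_mem_gb: Optional[float] = None       # None means unlimited


def eligible_queues(queues, wallclock_h=None, mem_gb=None):
    """Return names of queues whose limits cover the requested resources."""
    def fits(limit, request):
        # An unlimited queue or an unspecified request always fits.
        return limit is None or request is None or request <= limit

    return [
        q.name
        for q in queues
        if fits(q.max_wallclock_h, wallclock_h) and fits(q.max_mem_gb, mem_gb)
    ]


TIME_QUEUES = [Queue("short.q", max_wallclock_h=1.0), Queue("long.q")]
MEM_QUEUES = [Queue("mem8.q", max_mem_gb=8), Queue("mem16.q", max_mem_gb=16)]
```

A 2-hour request against `TIME_QUEUES` matches only `long.q`, a 30-minute
request matches both, a 4 GB request against `MEM_QUEUES` matches both, and a
12 GB request matches only `mem16.q`.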
> 
> 
>> In SGE you can specify resources such as mem >= 32 GB for a node, or
>> arch=AMD64. You can't do this with Torque? Seems like a very basic
>> queuing system feature.


-- 
Craig Tierney (craig.tierney at noaa.gov)


