[Beowulf] Queue Systems
Reuti reuti at staff.uni-marburg.de
Thu Sep 6 14:16:47 PDT 2007
On 06.09.2007 at 18:27, Chris Dagdigian wrote:
>
> { Declaration of bias; I run the http://gridengine.info site in my
> spare time ... }
>
> I'm quite familiar with both LSF and SGE, using both products in my
> professional work and helping clients with queue system selection,
> deployment, application integration and training. I'm less
> familiar with PBS/Torque/etc. having only run those in small
> virtualized lab environments. At the time when I was looking at
> open source solutions, none of the PBS variants supported array
> jobs so I went with SGE and never looked back.
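For readers unfamiliar with array jobs: one submission fans out into many numbered tasks, and each task picks its own slice of the work. A minimal Python sketch, assuming Grid Engine's `SGE_TASK_ID` environment variable (set for each task of a `qsub -t` submission); the input file names and the `pick_input` helper are hypothetical:

```python
import os

def pick_input(inputs, env=None):
    """Return the input assigned to this array task (task IDs start at 1 here)."""
    env = os.environ if env is None else env
    # Outside an array job the variable is missing or not numeric.
    raw = env.get("SGE_TASK_ID", "undefined")
    if not raw.isdigit():
        raise RuntimeError("not running as an array task")
    task_id = int(raw)
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task {task_id} has no matching input")
    return inputs[task_id - 1]

# Hypothetical inputs; the environment is passed explicitly so the
# example also runs outside a cluster.
files = ["sample_a.dat", "sample_b.dat", "sample_c.dat"]
print(pick_input(files, env={"SGE_TASK_ID": "2"}))
```

Inside a real job the `env` argument would be left out, so the task ID comes from the environment the queuing system sets up.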
Another point is Tight Integration of parallel jobs, which PBS/Torque
offers for LAM-MPI and OpenMPI, but not for HP-MPI, Linda or PVM. You
can of course still run those under PBS/Torque, but the slave
processes will not be under the queuing system's control, and the
accounting will not be correct. SGE offers an rsh replacement called
qrsh, which makes Tight Integration possible for these parallel
environments as well.
-- Reuti
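To make the qrsh point concrete: under Tight Integration the remote processes of a parallel job are started through `qrsh -inherit <host> <command>` instead of plain rsh, so they run under the control of the queuing system and show up in the accounting. A rough Python sketch of such a wrapper; the function names are hypothetical, and `start_slave` assumes it is called from inside a running SGE parallel job:

```python
import subprocess

def qrsh_command(host, remote_argv):
    """Build the argv that starts remote_argv on host as a task of the current job."""
    if not remote_argv:
        raise ValueError("no remote command given")
    return ["qrsh", "-inherit", host] + list(remote_argv)

def start_slave(host, remote_argv):
    """Launch the slave process; only meaningful inside an SGE parallel job."""
    return subprocess.Popen(qrsh_command(host, remote_argv))
```

In practice this wrapper is usually a small script installed in place of rsh in the parallel environment's start-up procedure, so the parallel library needs no changes.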
> The current state of the art is quite good. For 90% of use cases
> and end-user requirements you really can't go wrong with any of the
> available products.
>
> Everything out there (open source or commercial) is capable of
> doing the standard sort of "policy based resource management on
> distributed systems" that we all care about.
>
> So with all products capable of doing just about everything you
> would need, making an actual product selection comes down to areas
> other than the functionality of the queueing core.
>
> Things like:
>
> - Administrative burden (if keeping PBS from falling over requires
> a full-time employee, the cost of LSF looks far more attractive,
> for instance ...)
> - Cost
> - Quality of support
> - Quality of technical documentation
> - Quality of training / professional services
> - Layered products that enhance base functionality
>
> Platform LSF is the gold standard. Low administrative burden, great
> documentation/support and resiliency features that competitors
> still have a tough time matching and all wrapped up with additional
> (at extra cost of course) layered products that nobody else can
> really touch. The downside? Cost of course. In particular the
> current Linux pricing model punishes you for putting more than 4GB
> of RAM into a compute node or using a non X86/X86_64 architecture
> -- in both cases you'll get bounced out of the "cheap" license
> category and into a far more expensive one where the cost of the
> software license is in the same ballpark as the cost of the server
> hardware.
>
> Platform will happily sell you additional layered products that can
> do things like:
>
> - Tight integration with FlexLM license servers; more powerful than
> the standard load sensor (SGE) and elim (LSF) methods that people
> do "for free"
> - Seriously hardcore reporting and analytic tools suitable for the
> largest enterprises
> - Tight integration with parallel environments and high speed
> interconnects (plus support for these environments, which is
> non-trivial)
> - SLA-aware scheduling
> - Multi-cluster aware scheduling
> - etc. etc.
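For comparison, the "for free" load sensor approach on the SGE side is just a small script that reports a number to the scheduler. A minimal Python sketch of the load sensor protocol (wait for a line on stdin, answer with a begin/end block of host:resource:value lines, stop on "quit"); the `free_licenses` probe and the resource name `flexlm_free` are made up for illustration, and a real sensor would query the license server at that point:

```python
import sys

def free_licenses():
    """Hypothetical probe; a real sensor would ask the license server."""
    return 4

def report(values, host="global"):
    """Format one load report block."""
    lines = ["begin"]
    lines += [f"{host}:{name}:{value}" for name, value in values.items()]
    lines.append("end")
    return "\n".join(lines) + "\n"

def run(stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request from the execution daemon until it sends 'quit'."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(report({"flexlm_free": free_licenses()}))
        stdout.flush()
```

A deployed sensor would end with a call to `run()`. The weakness of this approach is visible in the sketch: the scheduler only sees a periodically sampled count, which is the gap the tighter (paid) integration is meant to close.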
>
> The base version of LSF also ships with a basic reporting module
> and a tomcat-driven web interface that is suitable for users
> (submit and monitor jobs) as well as admins (manage queues and
> hosts). SGE in particular does not really have anything like this
> except for ARCo on the reporting side, and ARCo is no match for even
> the "free" reporting module you get with LSF 7.x.
>
> That said though, it's been my experience that a vast majority of
> the "market" does not need and will not likely ever need some of
> the advanced/enterprise level add-ons that integrate so cleanly
> with the base Platform LSF products.
>
> So this drops me back down into my original argument that just
> about any of the available products will perform well at doing what
> you need. The key advice I have is to understand that everyone is
> pretty good at the basic functions so you'll have to make your
> selection decision based on some of the other criteria I tried to
> list above.
>
>
> My general rule of thumb for new projects is to start with the
> assumption that I'll be using Grid Engine. Then, once a more formal
> understanding of the workflows and customer requirements is
> achieved, it may become clear that Platform LSF is a better choice.
>
> For all of 2007, my guess is that I've worked on 20+ Grid Engine
> systems and deployed LSF just once, for a large enterprise
> customer.
>
>
> My $.02 of course!
>
> Regards,
> Chris (posting from my non-corporate address)
>
> On Sep 6, 2007, at 5:30 AM, andrew holway wrote:
>
>> Hi,
>>
>> We are trying to work out the differences between these queue
>> systems.
>>
>> Can anyone shed any light? Pros and Cons...
>>
>> SGE, Torque (with Maui), PBSPro and LSF
>>
