[Beowulf] Reliable Job Queueing and Notification
Bernard Li bernard at vanhpc.org
Wed Oct 17 11:31:26 PDT 2007
Hi Sean:

On 10/16/07, Sean Ward <SeanWard at msn.com> wrote:
> I've started work on a web service which contains several potentially
> long running processing steps (molecular dynamics), which are perfect to
> farm out to the fairly large (90 node) Beowulf I have access to. The
> primary issue is translating requests from the event driven web service,
> to job queues, and back again upon completion. Specifically, the major
> queuing systems I have immediate access to (Sun Grid Engine and Condor)
> only support e-mail based notification of job completion. Starting jobs
> isn't an issue, as my service can simply ssh over and execute shell
> scripts as needed to start things up, the problem is reliably being
> informed when the jobs fail or complete, via any programmatic method
> (such as executing a shell script, calling a web service via SOAP/etc,
> or an asynchronous message library). My other problem, ensuring that
> these web service requests don't starve in house jobs on the Beowulf is
> easily handled via the priority levels built into all the various job
> managers, although being able to checkpoint a long running job would be
> a plus (such as is supported by Condor).
>
> I am currently investigating modifications to either Condor (more
> complex to update, but checkpoint is useful) or Ruby Queue (very easy to
> update for reliable notification) to solve this issue, but wanted to be
> sure I wasn't overlooking any existing solutions to programmatic based
> queuing and receiving notifications on jobs in a Beowulf environment...

If you plan to stay with the SGE/Condor route, you should take a look at DRMAA:

http://drmaa.org/wiki/

Perhaps you will find something useful there.

Cheers,

Bernard
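
For illustration, here is a rough sketch of what programmatic completion notification through DRMAA might look like from Python. It is not taken from the thread. It assumes the third-party `drmaa` Python binding and a scheduler that ships a DRMAA library; `run_and_notify`, `JobResult`, `submit_with_real_drmaa`, the script path and the input file name are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class JobResult:
    job_id: str
    succeeded: bool
    exit_status: Optional[int]  # None if the job did not exit normally


def run_and_notify(session, command: str, args: Sequence[str],
                   on_done: Callable[[JobResult], None],
                   timeout: int = -1) -> JobResult:
    """Submit one job through a DRMAA-style session, block until it
    finishes, then hand the outcome to the on_done callback.

    `session` is expected to behave like drmaa.Session. The default
    timeout of -1 is assumed to mean "wait forever", as in the DRMAA
    bindings; pass the binding's own constant when using the real library.
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command
        jt.args = list(args)
        job_id = session.runJob(jt)
    finally:
        # The template is only needed for submission.
        session.deleteJobTemplate(jt)

    info = session.wait(job_id, timeout)
    exited = bool(info.hasExited)
    status = int(info.exitStatus) if exited else None
    result = JobResult(job_id, exited and status == 0, status)
    on_done(result)  # e.g. call back into the web service
    return result


def submit_with_real_drmaa() -> None:
    """Hypothetical wiring against the real binding; not called here."""
    import drmaa  # third-party; needs the scheduler's DRMAA library

    with drmaa.Session() as session:
        run_and_notify(session, "/path/to/run_md.sh", ["input.conf"], print,
                       timeout=drmaa.Session.TIMEOUT_WAIT_FOREVER)
```

Since the wait call blocks, an event-driven service would probably run `run_and_notify` in a worker thread or separate process per job, with `on_done` posting the result back to the service.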
