[Beowulf] Please help to setup Beowulf

Glen Beane Glen.Beane at jax.org
Fri Feb 20 05:22:50 PST 2009




On 2/19/09 4:28 PM, "Bill Broadley" <bill at cse.ucdavis.edu> wrote:

Granted, this was 5+ years ago and I assume things are better now, and I have heard
quite a few good things about torque lately.  SGE does seem to be the default for
many small/medium clusters these days (and is the Rocks default), but it does make
some things strangely hard, though usually with a workaround.


I started with OpenPBS plus a few dozen community patches, eventually moved to PBS Pro through their .edu program, and then switched to SPBS (which later became TORQUE).  I still use TORQUE now and have been contributing code to it for quite some time.

I looked into SGE a long time ago, but I found its MPI support terrible compared to TORQUE/PBS Pro.  There were several native PBS/TORQUE job launchers (using the TM API) for many MPI flavors, so ssh/rsh was not needed to launch the remote processes and every process was under the control of the batch system.  With an ssh-based job launcher, the batch system only knew about the mpirun process, so if a job hit its walltime or a user qdel'd it, the batch system would only send the TERM/KILL signals to that one process, and it was common for the other processes to stick around.  TM-based MPI job launchers made that a thing of the past (there is a sketch of a typical TM-style job script below).  When I looked at SGE I thought its MPI support was clumsy and I had no interest in it, but I'm sure things have improved.

Fast forward to today, and my only real complaints about TORQUE are incomplete job arrays (I wrote the code for the current primitive job arrays quite some time ago and have not had a chance to finish them) and no way to request X processors spread across any combination of nodes (instead of nodes=X:ppn=Y), although I believe that is being worked on.  If you use Moab you can set it to interpret nodes=X:ppn=Y as X*Y processors distributed any way, but then you lose the ability to specify precisely how you want your job laid out.
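To make the TM vs. ssh difference concrete, here is a rough sketch of the kind of job script I have in mind (the node/processor counts, walltime, and binary name are made up, and I'm assuming an MPI such as Open MPI built with TM support, so mpirun launches through the TM API instead of ssh'ing out to the nodes):

#!/bin/bash
#PBS -N tm_example
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
# With a TM-aware launcher the ranks are spawned through pbs_mom on
# each node, so every remote process belongs to the batch system and
# is killed properly on qdel or when the walltime expires.
mpirun -np 8 ./my_mpi_app

With an ssh-based launcher the same script would run, but only the mpirun process itself would be known to TORQUE.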

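And just so it is clear what I mean by the current primitive job arrays, they are driven by a -t range on qsub; something along these lines (the script name, range, and file naming are only examples, and the index variable has been PBS_ARRAYID in the TORQUE versions I've worked with):

#!/bin/bash
#PBS -N array_example
#PBS -t 0-9

cd $PBS_O_WORKDIR
# each sub-job picks its input file by its array index
./process input.$PBS_ARRAYID

Submitting the same range from the command line with qsub -t 0-9 works as well.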

--
Glen L. Beane
Software Engineer
The Jackson Laboratory
Phone (207) 288-6153


