Parallel batch jobs on beowulf?
Drake.Diedrich at anu.edu.au
Thu Oct 4 04:44:18 PDT 2001
On Tue, Oct 02, 2001 at 12:16:06AM +0300, Eray Ozkural wrote:
> We have a small research cluster at our CS dept., it's got 32 compute nodes.
> We run debian and the setup is a typical beowulf. (locally installed
> software, nfs, nis, mpich, lam, etc.)
> An instructor asked us whether it would be possible to run a parallel job
> system. I know a regular batch system (like pbs) could be used to that end,
> but what is the recommended way of providing parallel batch jobs on a Beowulf
DQS is already available in Debian (non-free), and has some tie-ins to
PVM. To submit a parallel job, use "qsub -l qty.eq.67" to request 67 nodes,
and add "-par PVM" to set up pvm across them. The pvm setup can conflict with
other pvms run by the same user on the cluster, so run only one parallel job
per user at a time, and no interactive parallel jobs.
MPI and other systems can be set up using the $HOSTS_FILE variable, which
points to a file containing the list of hosts allocated to the job.
The script you qsub runs only on the master node, and has to set up the rest
of the allocated nodes itself.
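As a rough sketch of what such a script might look like (an illustration, not
taken from a real setup): it assumes $HOSTS_FILE lists one host per line, and
that MPICH's mpirun with its -machinefile option is on the path. The program
name my_mpi_program and the helper function names are made up for the example.

```shell
#!/bin/sh
# Job script sketch: runs on the master node only and starts the
# parallel program on the hosts the batch system allocated.
# Assumption: $HOSTS_FILE contains one hostname per line.

# Print the number of non-empty lines (hosts) in the given file.
count_hosts() {
    grep -c . "$1"
}

# Print the mpirun command line for a hosts file and a program.
# Assumption: MPICH-style mpirun that accepts -np and -machinefile.
build_mpirun_cmd() {
    hosts_file=$1
    program=$2
    nprocs=$(count_hosts "$hosts_file")
    echo "mpirun -np $nprocs -machinefile $hosts_file $program"
}

# Launch only when the batch system has provided a readable host list.
if [ -n "${HOSTS_FILE:-}" ] && [ -r "$HOSTS_FILE" ]; then
    nprocs=$(count_hosts "$HOSTS_FILE")
    echo "starting $nprocs processes on hosts from $HOSTS_FILE"
    # my_mpi_program is a hypothetical binary name.
    mpirun -np "$nprocs" -machinefile "$HOSTS_FILE" ./my_mpi_program
fi
```

With LAM instead of MPICH the same host list would be used to boot the LAM
daemons before starting the program, so only the launch step changes.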
I find it useful to put two queues on each node: a low priority queue
intended for bulk serial jobs, subordinated to a higher priority queue
where the parallel jobs can go.