job queue/farm out suggestions

Bill Comisky bcomisky at pobox.com
Mon Jan 8 20:10:07 PST 2001


hi all,

I've been using some homegrown code for quite a while now to farm out
independent jobs (no interprocess communication required) to the nodes of
a cluster.  I was starting to add features and improve scalability, and I
thought I would look around to see if there was something better elsewhere
I could use instead, maybe with some simple front end scripts.

I'd like to find out what other "embarrassingly parallel" cluster users use
to farm jobs out.  My jobs are the fitness evaluations of the population
of a single generation of a genetic algorithm optimization.

Some of the features I would like are:

- some way to know when a particular set of jobs (one generation of
individuals in my case) is done

- a way to partition the cluster so that one user gets nodes 1:N, and
another user gets nodes N+1:M.

- allow multiple users to share the entire cluster, while limiting the
runs started on each node to that node's memory/processor limits.

- tolerance of node failures (node goes down) or job failures (run hangs)

- an open source/free solution would be preferred

- process migration might be interesting to try out, but I don't think it
is essential for what I'm doing.
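
To make the list above concrete, here is a minimal Python sketch of such a
farm, not a description of any existing queueing package.  The names
farm_out, run_job and slots_per_node are hypothetical.  It assumes the
caller supplies run_job(node, job), which starts one run on a node and
raises an exception if the run fails or exceeds its own timeout.  Each node
gets a fixed number of slots, a failed job is retried on whichever slot
frees up next (a bad node is not blacklisted here), and the call returns
only when the whole set is done.  Giving two users disjoint slots_per_node
tables would partition the cluster.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def farm_out(jobs, slots_per_node, run_job, max_attempts=3):
    """Run every job on some node; return the results in job order.

    slots_per_node: e.g. {"node1": 2, "node2": 1}, the per-node run limits.
    run_job(node, job): caller-supplied; must raise on failure or timeout.
    """
    jobs = list(jobs)
    free = queue.Queue()
    for node, slots in slots_per_node.items():
        for _ in range(slots):
            free.put(node)  # one token per allowed concurrent run

    def attempt(job):
        last_error = None
        for _ in range(max_attempts):
            node = free.get()  # block until some node has a free slot
            try:
                return run_job(node, job)
            except Exception as exc:
                last_error = exc  # remember the failure and try again
            finally:
                free.put(node)  # always hand the slot back
        raise RuntimeError(f"job {job!r} failed {max_attempts} times") from last_error

    total_slots = sum(slots_per_node.values())
    with ThreadPoolExecutor(max_workers=total_slots) as pool:
        # list() consumes the iterator, so this line returns only when
        # every job in the set has finished (or one has given up).
        return list(pool.map(attempt, jobs))
```

For a genetic algorithm, one call per generation would act as the barrier:
the fitness values come back in the same order as the individuals.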

Right now I use some scripts and rsh, and I redirect the program input
through rsh and either redirect the output through rsh or write remotely
to an NFS mounted directory.  Do the queueing systems you use require NFS?
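
The rsh arrangement described above (input and output both redirected
through the remote shell, so no NFS is needed) could be driven from Python
roughly as follows.  This is a sketch under assumptions: run_remote and its
parameters are hypothetical names, and the remote_shell argument exists
only so that ssh or a local stand-in can be substituted for rsh.

```python
import subprocess


def run_remote(node, command, input_text, timeout=600, remote_shell=("rsh",)):
    """Run `command` on `node`, feed input_text to its stdin, return its stdout."""
    done = subprocess.run(
        [*remote_shell, node, command],
        input=input_text,     # sent to the remote program's standard input
        capture_output=True,  # collect stdout/stderr instead of writing via NFS
        text=True,
        timeout=timeout,      # raises subprocess.TimeoutExpired if the run hangs
        check=True,           # raises CalledProcessError on a nonzero exit status
    )
    return done.stdout
```

One caveat: classic rsh generally does not pass back the remote command's
exit status, so check=True mainly catches failures of rsh itself.  The
timeout is what guards against a hung run.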

I've started looking at GNU Queue; does anyone have experience with it?  Is
there a way to tell if a set of jobs has been completed (would you have to
collect PIDs and check the process list continually?).
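
On the PID question in general (this says nothing about how GNU Queue
itself works): when the launching script starts the jobs as its own child
processes, it can keep the process handles and wait on them, with no need
to scan the process list.  A minimal Python sketch, with hypothetical
helper names launch_batch and wait_for_batch:

```python
import subprocess


def launch_batch(commands):
    """Start every command (each an argv list) and return the process handles."""
    return [subprocess.Popen(argv) for argv in commands]


def wait_for_batch(procs, timeout=None):
    """Block until every process has exited; return exit codes in launch order.

    timeout applies to each wait separately and raises
    subprocess.TimeoutExpired if a process is still running when it expires.
    """
    return [proc.wait(timeout=timeout) for proc in procs]
```

The batch is complete exactly when wait_for_batch returns, and a nonzero
entry in the returned list identifies a failed run by its position.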

MOSIX sounds interesting, but it sounds like I would still need some
queueing routine.  Meaning, if I have 1000 jobs and 20 processors, MOSIX
won't wait for the first 20 to finish before starting another 20,
right?  It just tries to balance all 1000 over the cluster.

thanks for any input!
bill

-- 
Bill Comisky
bcomisky at pobox.com




