Is Beowulf what I need?

Greg Lindahl glindahl at hpti.com
Thu May 18 22:03:57 PDT 2000


> Our need is not parallelization within the programs but distribution of the
> processes.
>
> We would like access to a single system, but one with a lot of CPUs.

It sounds like you have a large number of single-process jobs that you would
like to run. This is often called "capacity computing". The only sharing
between these processes would be via disk.

There are a couple of basic ways to solve this kind of problem, and several
systems aimed at it. One already mentioned is Mosix, which will take
arbitrary processes and migrate them between nodes. The user does nothing
different. A second class of systems requires you to do something special to
start the process (rather than just running it). Condor falls into that
category, as would something like a PBS wrapper. The wrapper could be just a
few lines.
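
To give a feel for how small such a wrapper can be, here is a rough sketch in
Python. It is only an illustration, not a tested site setup: the function
names and the default job name are made up, and the "nodes=1:ppn=1" resource
line assumes an OpenPBS/Torque-style installation (other PBS variants spell
this differently). It relies only on qsub reading a job script from standard
input and on PBS setting PBS_O_WORKDIR for the job.

```python
import shlex
import subprocess


def build_job_script(command, job_name="capacity_job"):
    """Return the text of a PBS job script that runs one serial command.

    `command` is a list such as ["./simulate", "input.dat"].
    The job name and resource request are illustrative assumptions.
    """
    quoted = " ".join(shlex.quote(arg) for arg in command)
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        # Ask for one CPU on one node; adjust to your site's PBS syntax.
        "#PBS -l nodes=1:ppn=1",
        # Start in the directory from which qsub was invoked.
        'cd "$PBS_O_WORKDIR"',
        quoted,
    ]
    return "\n".join(lines) + "\n"


def submit(command, job_name="capacity_job"):
    """Pipe the generated script to qsub and return whatever it prints."""
    script = build_job_script(command, job_name)
    result = subprocess.run(
        ["qsub"], input=script, text=True, capture_output=True, check=True
    )
    # qsub normally prints the identifier of the newly queued job.
    return result.stdout.strip()
```

A user would then call something like submit(["./simulate", "input.dat"])
instead of running ./simulate directly, and PBS decides which node runs it.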

The advantage of a simpler approach is that it does less violence to the
overall system. Mosix requires a kernel module. Condor is very clever at
checkpoint/restart, but it can take a lot of hacking to port it to a
particular system and to keep up with Linux kernel and libc releases. (The
Condor guys support Linux pretty well, so I hear.) PBS, though, is pretty
simple, and rarely needs to be changed when you update the kernel. So you can
choose your level of pain and benefit by looking at the trade-offs.

-- greg




