Uses for a beowulf cluster?
Daniel J. Frasnelli
dfrasnel@csee.wvu.edu
Sun, 13 Sep 1998 00:22:26 -0400
Greetings,
On Sun, 13 Sep 1998, Shachar Tal wrote:
> The university I work for is considering switching from its monstrous
> multi-CPU computers to Linux Beowulf clusters. I've been reading about the
> subject, and I know that clusters are used to distribute computation
> between nodes.
> What we want to do is make, let's say, a 16-node cluster and let users log
> in to the cluster (via telnet, ssh, whatever) and do whatever they usually
> do: read their mail, telnet elsewhere, run netscape, use gcc, you get the
> picture.
I think a brief tutorial on the principles of implementation may
be in order. First, most freely available software packages are not
"parallel-ready" (Sorry, this sounds like a marketing buzzword), meaning
they do not include calls to parallel communication libraries or provide
their own mechanisms for doing so. Notable exceptions are some ray
tracing packages and compilers.
There are several ways of executing tasks in parallel:
1) Provide communication of data and process status through parallel
communication libraries, such as PVM and MPICH. This is done on an
application-by-application basis, and generally is only possible if you
have access to the source code.
2) Implement low-level support in the kernel for things like distributed
memory, node-to-node communication, data passing, etc.
3) Provide functionally equivalent libraries which allow transparent
process distribution. In other words, rewrite the system libraries
(again, you need the source code) to provide hooks for node-to-node
communication, then drop in the new libc, libm, libcrypt, etc.
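To make option 1 concrete, here is a minimal sketch of explicit message
passing between a master and several workers. It is not PVM or MPICH code:
Python threads and queues stand in for nodes and the network, and the names
worker and parallel_sum_of_squares are made up for this illustration. The
point is only that the application itself has to be written to split the
work, send it out, and collect the replies.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (chunk_id, numbers) messages and reply with a partial sum."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message from the master
            break
        chunk_id, numbers = msg
        outbox.put((chunk_id, sum(x * x for x in numbers)))


def parallel_sum_of_squares(data, n_workers=4):
    """Master: split the data, send chunks to workers, combine the replies."""
    inbox = queue.Queue()   # master -> workers
    outbox = queue.Queue()  # workers -> master
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()

    # One chunk per worker; each chunk takes every n_workers-th element.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    for chunk_id, chunk in enumerate(chunks):
        inbox.put((chunk_id, chunk))

    # Collect exactly one reply per chunk, in whatever order they arrive.
    total = 0
    for _ in chunks:
        _, partial = outbox.get()
        total += partial

    # Tell every worker to exit, then wait for them.
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()
    return total
```

A program like mail or telnet has no such split/send/collect structure in
it, which is why it gains nothing from simply being started on a cluster.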
Keep in mind that not every task benefits from running in parallel.
Some are inherently friendly to "parallelization"; others will be slowed
down by splitting the task across multiple nodes. It is my understanding
that almost any program can benefit from a distributed shared memory
implementation.
Please take a look through the Parallel-Processing HOWTO from the
Linux project at your local sunsite mirror for more information.
> My question is: Will a beowulf do the job or is it not up to the job? Has
> anyone done such a thing?
I seriously doubt that a Beowulf cluster is what you are seeking.
I recently had a discussion with the systems group on this very problem.
We currently average 50-70 users per day on our main shell server, and
with the freshman class growing each year, this figure will easily
increase.
At this informal discussion, I proposed that we purchase a
cluster of servers, say 4-6. Instead of attempting to re-write our
applications for distribution across the nodes, we are planning to write a
basic daemon which passes information about the number of processes,
average system load, memory in use, etc. to a "smart" NAT box. The NAT
box will pass incoming telnet/ftp traffic to any of the servers based on
an algorithm taking into account the variables listed above (system load
et al.) passed by the specialized daemon.
This is transparent load sharing (or "load balancing"), which will
likely fit your requirements for reduced system load and improved
performance.
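The selection step on the NAT box could look roughly like the sketch
below. Everything in it is an assumption made for illustration: the
NodeReport fields, the host names, and especially the weights in score(),
which would need tuning against real usage. It only shows the idea of
ranking servers by the figures each daemon reports and sending the next
incoming session to the least busy one.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """One status message from a server's daemon (hypothetical format)."""
    host: str
    load_avg: float       # average system load
    n_procs: int          # number of running processes
    mem_used_frac: float  # fraction of memory in use, 0.0 to 1.0


def score(report, w_load=1.0, w_procs=0.01, w_mem=1.0):
    """Lower is better. The weights are arbitrary placeholders."""
    return (w_load * report.load_avg
            + w_procs * report.n_procs
            + w_mem * report.mem_used_frac)


def pick_server(reports):
    """Return the host that should receive the next incoming session."""
    if not reports:
        raise ValueError("no server reports available")
    return min(reports, key=score).host


# Example: three shell servers reporting in.
reports = [
    NodeReport("shell1", load_avg=2.5, n_procs=180, mem_used_frac=0.7),
    NodeReport("shell2", load_avg=0.4, n_procs=60, mem_used_frac=0.3),
    NodeReport("shell3", load_avg=1.1, n_procs=90, mem_used_frac=0.5),
]
chosen = pick_server(reports)
```

With these made-up figures the next telnet or ftp session would go to
shell2. No application has to be rewritten; each login still runs entirely
on one machine.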
Best regards,
Daniel
---
Daniel J. Frasnelli Remote sensing scientist
dfrasnel @ wvu.edu Imaging spectroscopy researcher
Explore terrestrial physics! http://ltpwww.gsfc.nasa.gov/