Uses for a beowulf cluster?

Shachar Tal shachar@vipe.technion.ac.il
Sun, 13 Sep 1998 02:22:01 -0400


Hi,

On Sun, 13 Sep 1998, Daniel J. Frasnelli wrote:

> 	I think a brief tutorial on the principles of implementation may
> be in order.  First, most freely available software packages are not
> "parallel-ready" (Sorry, this sounds like a marketing buzzword), meaning 
> they do not include calls to parallel communication libraries or provide
> their own mechanisms for doing so.  Notable exceptions are some ray
> tracing packages and compilers.  
>   
> There are several ways of executing tasks in parallel:
> 1) Provide communication of data and process status through parallel
>  communication libraries, such as PVM and MPICH.  This is done on an
>  application-by-application basis, and generally is only possible if you
>  have access to the source code.  
> 2) Implement low-level support in the kernel for things like distributed
>  memory, node to node communication, data passing, etc.
> 3) Provide functionally equivalent libraries which allow transparent 
>  process distribution.  In other words, rewrite the system libraries 
>  (Again, you need the source code) to provide hooks to node-node
>  communication and drop in the new libc, libm, libcrypt, etc.  
>
> Keep in mind that not all tasks are benefitted by running in parallel.
> Some are inherently friendly to "parallelization", others will be slowed
> down by splitting the task across multiple nodes.  It is my understanding
> that almost any program can benefit from a distributed shared memory
> implementation.  
> 	Please take a look through the Parallel-Processing HOWTO from the
> Linux project at your local sunsite mirror for more information.      	 
> > My question is: Will a beowulf do the job or is it not up to the job? Has
> > anyone done such a thing?
> 	I seriously doubt that a Beowulf cluster is what you are seeking.
> I recently had a discussion with the systems group on this very problem.
> We currently average 50-70 users on our main shell server per day, but
> with the growing freshman class size per year, this figure easily will
> increase.  
> 	At this informal discussion, I proposed that we purchase a
> cluster of servers, say 4-6.  Instead of attempting to re-write our
> applications for distribution across the nodes, we are planning to write a
> basic daemon which passes information about the number of processes,
> average system load, memory in use, etc. to a "smart" NAT box.  The NAT
> box will pass incoming telnet/ftp traffic to any of the servers based on
> an algorithm taking into account the variables listed above (system load
> et al.) passed by the specialized daemon.  
> 	This is transparent load sharing (or "load balancing"), which will
> likely fit your requirements for reduced system load and improved
> performance.           

Transparent load sharing is exactly what I need, but I want a bit more: I
want process migration, I want lastlog to be unified, etc. I don't need
*anything* to run in parallel (not that it wouldn't be nice to have).
What I have in mind is the following scenario: each user logs in to a
random (for argument's sake) node and does his work there. Most of the
work (pine, gcc, mathematica, flex, yacc, etc.) needs no interaction
between nodes. Sharing memory might be fine, but I suspect swapping pages
to a local hard drive is a bit less expensive than shared memory across a
100Mb/s network.
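
For what it's worth, the node-selection step of the load-sharing scheme
quoted above (a daemon on each server reports process count, load average
and memory in use, and the NAT box picks a server) could be sketched
roughly like this. This is only an illustration under my own assumptions:
the NodeStats fields, the score() weights and the pick_node() name are all
hypothetical, and a real NAT box would also have to deal with stale
reports and dead nodes.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """One report from the (hypothetical) per-node daemon."""
    name: str
    load_avg: float           # e.g. the 1-minute load average
    processes: int            # number of running processes
    mem_used_fraction: float  # memory in use, between 0.0 and 1.0


def score(stats, w_load=1.0, w_procs=0.01, w_mem=2.0):
    """Combine the reported figures into one cost; lower is better.

    The weights are arbitrary assumptions and would need tuning.
    """
    return (w_load * stats.load_avg
            + w_procs * stats.processes
            + w_mem * stats.mem_used_fraction)


def pick_node(reports):
    """Return the name of the node with the lowest combined cost."""
    if not reports:
        raise ValueError("no node reports available")
    return min(reports, key=score).name


reports = [
    NodeStats("node1", load_avg=2.5, processes=180, mem_used_fraction=0.80),
    NodeStats("node2", load_avg=0.4, processes=60, mem_used_fraction=0.35),
    NodeStats("node3", load_avg=1.1, processes=90, mem_used_fraction=0.50),
]
print(pick_node(reports))
```

Replacing pick_node() with a random choice gives the "random node"
scenario I described; the weighted version just biases new logins away
from busy machines.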

Shachar Tal
-------------
Taub Computer Center, Technion, Israel Institute of Technology
KeyID 0481FEF1 fingerprint = 52 1B 97 6A F2 77 AE C6  64 B6 5A 5E 14 28 8E 7E