[Beowulf] best architecture / tradeoffs

Fri Aug 26 08:56:02 PDT 2005

Hello everyone. I am Seth Keith, and this is my first mailing.

I am new to the distributed computing thing, but despite this I find 
myself constructing a Beowulf system. I have put together a few 
different types and experimented and read enough to realize it is time 
to solicit outside opinions. I really hope to get some good advice. If 
you have the time please help me out...

My requirements are easy, I think, since my program is already broken up 
into a lot of different programs communicating via STDIN/OUT. I 
benchmarked and found my problem is CPU intensive. The overall data 
transfer is small, but all the different parts need to be assembled 
before the final pass on the data. The final pass cannot be broken up, 
but the final pass is fast. So my model is  input data -> break up into 
N workers -> assemble results -> process -> done.
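
To make the shape of that model concrete, here is a minimal sketch in 
Python rather than Perl. Everything in it is an assumption made for 
illustration: `split`, `work`, `final_pass`, and `run` are hypothetical 
names, the squaring step stands in for the real CPU-heavy work, and local 
threads stand in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n):
    """Break the input into at most n roughly equal chunks."""
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def work(chunk):
    """Hypothetical CPU-heavy step; here it just squares each value."""
    return [x * x for x in chunk]


def final_pass(parts):
    """Hypothetical final step that needs every part before it can run."""
    return sum(x for part in parts for x in part)


def run(data, n_workers=4):
    chunks = split(data, n_workers)
    # Threads stand in for worker nodes; map() returns results in input order,
    # so the assembled parts line up with the original chunks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(work, chunks))
    return final_pass(parts)
```

The only synchronization point is the assembly step before `final_pass`, 
which matches the requirement that all parts arrive before the last pass.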

I need advice on a few of the tradeoffs:

1) diskless vs disk

I am thinking diskless is better. I don't worry about network traffic as 
much as power consumption, overall node cost, reliability, and ease of 
management. My nodes are all identical, so I figure diskless, right?  
Well I am having a few problems...

I still don't understand how to handle swap. One of the clusters I set 
up used an NFS-mounted root file system and did something with swap on 
/dev/loop0, but I don't really understand it: is the swap going onto 
the NFS drive, or is it just going back into memory? What is the best 
(fastest) way to handle swap on diskless nodes that might sometimes 
process jobs that need more than the physical RAM?

Also, is it really true that you need a separate copy of the NFS root 
file system for every node? I don't see why this is. I have it working 
with just one for all nodes, but am I missing something here?

2) message passing vs roll yer own

I have played with a few different packages, written a bunch of Perl 
networking code, and read a lot, and I am still not sure which approach 
is better. Please chime in:

    - What is the fastest way to run Perl on worker nodes? Remember, I 
don't need to do anything too fancy: just grab a bunch of workers, send 
jobs to them, assemble the results, send the results to another worker, 
etc. I don't need to broadcast to all nodes or anything else.

    - What is the easiest way to do it? I already wrote the whole thing 
in Perl, and I was not really impressed with the speed or reliability. 
Certainly this was at least partially programmer error, but my question 
stands: what is the easiest way to reliably control a cluster of worker 
nodes running different Perl programs and assemble the data? This 
includes load balancing.

    - I saw some information on clusters that are linked in the kernel 
and act as a single machine. Is this a working reality? How does the 
performance of such a system compare with message passing for dedicated 
processing such as mine?

    - I was playing with MPICH-2. Is this better than LAM? What about 
other message passing libraries: which is the best one? Are there any 
with direct hooks into Perl?

    - How fast are NFS and RSH? If I were to change the code so it works 
with an NFS-mounted file instead of STDIN/OUT and used RSH to 
communicate, how would the speed compare with message passing? With 
direct Perl networking?
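
As a reference point for the questions above, one simple shape for "grab 
a bunch of workers, send jobs to them, assemble the results" with load 
balancing is a shared job queue that each node pulls from. The sketch 
below is in Python rather than Perl and is only an illustration: 
`WORKER_CMD`, `node_loop`, and `dispatch` are hypothetical names, the 
worker is a stand-in that sums numbers read from STDIN, and the command 
prefix is empty so everything runs locally. On a real cluster the prefix 
might be a remote-shell command, which would need its own quoting and 
error handling.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical worker: reads numbers on STDIN, writes their sum to STDOUT.
WORKER_CMD = [sys.executable, "-c",
              "import sys; print(sum(int(t) for t in sys.stdin.read().split()))"]


def node_loop(prefix, jobs, results):
    """Pull jobs until the queue is empty; a faster node simply takes more."""
    while True:
        try:
            job_id, payload = jobs.get_nowait()
        except queue.Empty:
            return
        # Feed the job on STDIN and collect STDOUT. A failed worker raises
        # here (check=True); this sketch does no retry.
        proc = subprocess.run(prefix + WORKER_CMD, input=payload,
                              capture_output=True, text=True, check=True)
        results[job_id] = proc.stdout.strip()


def dispatch(payloads, prefixes):
    """Run every payload through some node; return results in job order."""
    jobs = queue.Queue()
    for item in enumerate(payloads):
        jobs.put(item)
    results = {}
    threads = [threading.Thread(target=node_loop, args=(p, jobs, results))
               for p in prefixes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(payloads))]
```

For example, `dispatch(["1 2 3", "10 20", "5"], [[], []])` runs three 
jobs across two local stand-in nodes. Because nodes pull work instead of 
being assigned a fixed share, a slow node does not hold up the others.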

3) Distribution and kernel

I created my NFS root by copying directories from my RH9 installation. I 
had lots of problems and could never get everything working. I think it 
would be loads easier if I could find a standard distribution image 
already constructed somewhere out there... I don't really care which 
distribution, as long as I can run Perl.

I keep seeing people advising against the NFS root option and advocating 
RAM disk images. Opinions here? Where can I get RAM disk images? It would 
be nice to download a basic, complete RAM disk image that boots with 
root rsh already working.

Well I guess that is enough for one day. Thank you for taking the time 
to read this email. If you have the time please send me your opinions on 
this stuff.

Thanks again.

-Seth