[Beowulf] Daisychained rcp script

Mon Mar 21 15:51:24 PST 2005

Here's a script for copying a file across a list of nodes.

ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/pdist_file.sh

It uses a daisychain method similar to that in "dolly".  I'm
curious how it holds up at larger sites with different network
hardware.  We have a switched 100baseT network, with data starting
on the headnode and going to up to 20 identical nodes.
Here are some timings with nothing else running:

Nodes  Time (s)    MB/s          Repeater Nodes
1      8           10.8          0
2      8.9 - 9.3   9.7 - 9.3     1
1-5    13.5-14.8   6.4 - 5.8     4
1-10   13.5-17.4   6.4 - 5.0     9
1-20   19.5-20.5   4.4 - 4.2    19

The test file was 86.4 MB (for no good reason).
1-5 means the first 5 nodes were written.
Repeater nodes are those that read from the net, store locally,
and also write to the net.  There's always a first node (which only
writes to the net) and a last node (which only reads from the net
and stores to disk).
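
As a rough check on the table above, the rate column can be recomputed
from the file size and the elapsed times.  This is only a sketch: it
assumes the file size is in megabytes and the rates are in megabytes
per second, and the names (FILE_SIZE_MB, TIMINGS, throughput) are made
up for the illustration.

```python
# Recompute the rate column from the file size and the measured times.
# Assumption: 86.4 is megabytes, and rates are megabytes per second.
FILE_SIZE_MB = 86.4

# nodes written -> (fastest, slowest) elapsed seconds, copied from the table
TIMINGS = {
    1: (8.0, 8.0),
    2: (8.9, 9.3),
    5: (13.5, 14.8),
    10: (13.5, 17.4),
    20: (19.5, 20.5),
}


def throughput(size_mb, seconds):
    """Average transfer rate in MB/s for one complete run."""
    return size_mb / seconds


for nodes, (fast, slow) in TIMINGS.items():
    best = throughput(FILE_SIZE_MB, fast)
    worst = throughput(FILE_SIZE_MB, slow)
    print(f"{nodes:>2} nodes: {best:.1f} - {worst:.1f} MB/s")
```

The rate is simply the file size divided by the total elapsed time, so
any stall anywhere in the chain shows up directly as lost throughput.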

Ideally the daisychain method would scale up better than
this.  I think what's happening is that there is more
and more chance of wasted time on the repeater nodes (for example,
node N+1 is writing to N+2 when it should be reading from N).
Also, the writes to disk and the reads/writes on the network
are not synchronized very well, so again a node is probably
doing the wrong thing at the wrong time, which introduces
progressively more delays.  Consequently it uses less and less
of the available bandwidth as the number of nodes in the
chain increases.
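
To make the suspected stall concrete, here is a minimal sketch of a
single-threaded repeater loop.  It is not taken from pdist_file.sh;
relay(), CHUNK_SIZE and the in-memory "links" are hypothetical
stand-ins for the real sockets and disk files.  The point is only that
each step blocks, so while the loop is writing a chunk to disk or to
the next node it is not reading the next chunk from upstream.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, not taken from the script


def relay(upstream, local_copy, downstream=None, chunk_size=CHUNK_SIZE):
    """Copy upstream to local_copy and, if given, also to downstream.

    Every call below blocks, so reading, storing and forwarding
    never overlap within one node.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)   # read from node N
        if not chunk:
            break
        local_copy.write(chunk)             # store locally
        if downstream is not None:
            downstream.write(chunk)         # forward to node N+2
        total += len(chunk)
    return total


# In-memory stand-ins for the network links and local disks.
payload = bytes(range(256)) * 1000
source = io.BytesIO(payload)
disk_a, link_ab, disk_b = io.BytesIO(), io.BytesIO(), io.BytesIO()

relay(source, disk_a, link_ab)   # repeater node: store and forward
link_ab.seek(0)
relay(link_ab, disk_b)           # last node: store only
print(disk_a.getvalue() == payload, disk_b.getvalue() == payload)
```

The demo runs the two nodes one after the other just to check that both
copies come out identical; in a real chain every node would run this
loop at the same time on its own sockets.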

That said, it's still a lot faster moving data out this way than
with 20 sequential rcp's.  It also doesn't massacre the NFS
server as would 20 of these simultaneously:

  rsh remotenode "cp /nfsmount/data /localdisk"
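
For a rough sense of "a lot faster", here is a back-of-the-envelope
estimate.  It assumes each sequential rcp would take about as long as
the single-node run in the table (8 s); that is an assumption, not
something that was measured.

```python
SINGLE_COPY_S = 8.0        # one node, from the table
NODES = 20
CHAIN_S = (19.5, 20.5)     # 20-node daisychain, from the table

# Assumption: every sequential rcp takes as long as the single-node run.
sequential_s = NODES * SINGLE_COPY_S

speedups = [sequential_s / t for t in CHAIN_S]
print(f"sequential estimate: {sequential_s:.0f} s")
print(f"daisychain speedup: {min(speedups):.1f}x - {max(speedups):.1f}x")
```

Under that assumption the chain finishes roughly eight times sooner
than 20 copies done one after another.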

I also timed this using my variant of dolly 0.57C, which should
be about the same as 0.58.  Interestingly, even though dolly
reports that it is moving

Time: 8.935656
MBytes/s: 9.674

when I use "time" to measure the transfer, it actually takes
16.0 seconds of total elapsed time, for 5.4 MB/s.
(And that doesn't count the 1 second or so for rsh to set up
the 20 slave dolly processes.)  So dolly is a little better
than my simple script, but it also can't keep the network
running flat out.
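
The gap between the reported and the effective rate can be worked out
as below.  This is a sketch only: timed_run() and rate_mb_s() are
hypothetical helper names, and the command in the demo is a
placeholder rather than a real dolly invocation.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd to completion and return wall-clock seconds, like `time`."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def rate_mb_s(size_mb, seconds):
    """Average rate in MB/s over the given elapsed time."""
    return size_mb / seconds


# Figures quoted from the dolly run above.
DOLLY_TIME_S = 8.935656
DOLLY_RATE_MB_S = 9.674
ELAPSED_S = 16.0

implied_size_mb = DOLLY_TIME_S * DOLLY_RATE_MB_S    # size dolly thinks it moved
effective = rate_mb_s(implied_size_mb, ELAPSED_S)   # rate over the full run
print(f"implied size: {implied_size_mb:.1f} MB")
print(f"effective rate: {effective:.1f} MB/s")

if __name__ == "__main__":
    # Placeholder command; substitute the real transfer command here.
    elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"placeholder command took {elapsed:.3f} s")
```

Dolly's own time and rate multiply out to the 86.4 MB file, so its
figure looks like the rate over only part of the run, while the
wall-clock number covers the whole thing.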

Anybody have a better "daisychain" (or other) data replicator?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech