[Beowulf] Daisychained rcp script
Felix Rauch Valenti felix.rauch.valenti at gmail.com
Mon Mar 21 23:04:12 PST 2005
- Previous message: [Beowulf] Daisychained rcp script
- Next message: [Beowulf] newbie question about mpich2 on heterogenous cluster
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, 21 Mar 2005 15:51:24 -0800, David Mathog <mathog at mendel.bio.caltech.edu> wrote:
> Here's a script for copying a file across a list of nodes.
>
> ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/pdist_file.sh
>
> It uses a daisychain method similar to that in "dolly". I'm
> a bit curious how it holds up on larger sites with different network
> hardware. We have a switched 100baseT network with data starting
> on the headnode and going to up to 20 nodes, all nodes are identical.
> Here are some timings with nothing else running:
>
> Nodes   Time (s)     Mb/s        Repeater Nodes
> 1       8            10.8        0
> 2       8.9 - 9.3    9.7 - 9.3   1
> 1-5     13.5-14.8    6.4 - 5.8   4
> 1-10    13.5-17.4    6.4 - 5.0   9
> 1-20    19.5-20.5    4.4 - 4.2   19

I only had a quick look at your script, but it seems that it uses
(named) pipes and "tee", so I'd guess it does more data copies than
"dolly" (which implements the whole replication in a single C program).
That could explain the difference in throughput between 1 node and
multiple nodes, because the repeater nodes limit the performance.

A reason for the further decrease in performance with higher numbers
of nodes might be that the pipes and your network connection don't use
the same block size (I'm not sure though), which could result in
"hiccups" in the daisychain due to bad synchronisation between the
data streams.

[...]

> I also timed this using my variant of dolly 0.57C, which should
> be about the same as 0.58. Interestingly even though dolly
> reports that it is moving
>
> Time: 8.935656
> MBytes/s: 9.674
>
> when I use "time" to measure the actual elapsed time the transfer
> actually takes 16.0 seconds total elapsed time, for 5.4 Mb/s.
> (And that doesn't count the 1 second or so for rsh to set up
> the 20 slave dolly processes.) So dolly is a little better
> than my simple script but it also can't keep the network
> running flat out.
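The two dolly figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it derives the file size from dolly's own report and recomputes the rate over the 16.0 s measured with "time". The variable and function names are made up for this example, and reading the gap between the two times as startup and teardown overhead is an assumption, not something measured in the thread.

```python
# Figures quoted in the message above (dolly 0.57C variant, 20 nodes).
reported_time_s = 8.935656     # "Time:" as printed by dolly
reported_rate_mb_s = 9.674     # "MBytes/s:" as printed by dolly
wall_clock_s = 16.0            # total elapsed time measured with "time"


def effective_rate(size_mb, elapsed_s):
    """Throughput in MBytes/s over a given elapsed time."""
    return size_mb / elapsed_s


# File size implied by dolly's own report (rate * time).
file_size_mb = reported_rate_mb_s * reported_time_s

# Rate seen by the user when the whole run is counted.
wall_clock_rate = effective_rate(file_size_mb, wall_clock_s)

# Time not covered by dolly's report. Attributing it to startup and
# teardown is an assumption made for this illustration.
unaccounted_s = wall_clock_s - reported_time_s

print(f"implied file size: {file_size_mb:.1f} MBytes")
print(f"wall-clock rate:   {wall_clock_rate:.1f} MBytes/s")
print(f"unaccounted time:  {unaccounted_s:.1f} s")
```

The recomputed wall-clock rate agrees with the 5.4 figure in the quoted text, so the two measurements describe the same transfer and differ only in which part of the run they time.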
I didn't check dolly's code, but I guess it doesn't measure the
startup and teardown phases, because I was mostly interested in
throughput for very large files (that's what dolly was written for).

To get rid of the startup phase in dolly -- and thus achieve higher
throughput for medium-sized files -- one might want to use a dolly
daemon. Such a daemon would be started once, set up all the daisychain
connections, and then wait for files to transmit. Thus, the file
replication could start immediately after writing the file to the
dolly server daemon, without any setup or teardown delays.

- Felix
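For readers unfamiliar with the daisychain idea discussed in this message, here is a minimal sketch of what a single repeater node does: it reads fixed-size blocks from its upstream neighbour, keeps a local copy, and forwards the same block downstream. This is not the code of dolly or of pdist_file.sh. The names `relay` and `simulate_chain` are hypothetical, in-memory streams stand in for network connections, and the nodes run one after another here, whereas a real chain streams through all nodes concurrently.

```python
import io


def relay(upstream, local_copy, downstream=None, block_size=64 * 1024):
    """Copy upstream to local_copy and, if given, to downstream.

    Returns the number of bytes relayed. A single block size is used
    for reading and for both writes.
    """
    total = 0
    while True:
        block = upstream.read(block_size)
        if not block:          # end of stream
            break
        local_copy.write(block)
        if downstream is not None:
            downstream.write(block)
        total += len(block)
    return total


def simulate_chain(data, node_count, block_size=64 * 1024):
    """Pass data through node_count nodes; return each node's copy.

    Simplification: each node finishes before the next one starts.
    """
    copies = []
    upstream = io.BytesIO(data)
    for index in range(node_count):
        is_last = index == node_count - 1
        local_copy = io.BytesIO()
        downstream = None if is_last else io.BytesIO()
        relay(upstream, local_copy, downstream, block_size)
        copies.append(local_copy.getvalue())
        if downstream is not None:
            # The forwarded bytes become the next node's input.
            upstream = io.BytesIO(downstream.getvalue())
    return copies


payload = bytes(range(256)) * 1000
node_copies = simulate_chain(payload, node_count=5, block_size=4096)
print(all(copy == payload for copy in node_copies))
```

Every node except the last writes each block twice (once locally, once downstream), which is the extra work that makes repeater nodes the likely bottleneck in the timings above.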
