[Beowulf] Daisychained rcp script
David Mathog mathog at mendel.bio.caltech.edu
Mon Mar 21 15:51:24 PST 2005
Here's a script for copying a file across a list of nodes:

ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/pdist_file.sh

It uses a daisychain method similar to that in "dolly". I'm a bit curious how it holds up at larger sites with different network hardware. We have a switched 100baseT network, with data starting on the headnode and going to up to 20 nodes; all nodes are identical. Here are some timings with nothing else running:

  Nodes   Time (s)      MB/s        Repeater nodes
  1       8             10.8        0
  2       8.9 - 9.3     9.7 - 9.3   1
  1-5     13.5 - 14.8   6.4 - 5.8   4
  1-10    13.5 - 17.4   6.4 - 5.0   9
  1-20    19.5 - 20.5   4.4 - 4.2   19

The test file was 86.4 MB (for no good reason). 1-5 means the first 5 nodes were written. Repeater nodes are those that read from the net, store locally, and also write to the net. There's always a first node (which only writes to the net) and a last node (which only reads from the net and stores to disk).

Ideally the daisychain method would scale up better than this. I think what's happening is that there is more and more chance of wasted time on the repeater nodes (because N+1 is writing to N+2 when it should be reading from N). Also, the writes to disk and the reads/writes on the network are not synchronized very well, so again it's probably doing the wrong thing at the wrong time, which introduces progressively more delays. Consequently it uses less and less of the available bandwidth as the number of nodes in the chain increases.

That said, it's still a lot faster moving data out this way than with 20 sequential rcp's. It also doesn't massacre the NFS server as would 20 of these running simultaneously:

  rsh remotenode "cp /nfsmount/data /localdisk"

I also timed this using my variant of dolly 0.57C, which should be about the same as 0.58. Interestingly, even though dolly reports

  Time: 8.935656 MBytes/s: 9.674

when I use "time" to measure it, the transfer actually takes 16.0 seconds total elapsed time, for 5.4 MB/s.
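As a quick sanity check, the throughput figures quoted here are just the file size divided by the elapsed time. A minimal sketch of that arithmetic, assuming the 86.4 file size is in megabytes (the helper name throughput_mb_per_s is made up for illustration and is not part of pdist_file.sh or dolly):

```python
def throughput_mb_per_s(size_mb: float, elapsed_s: float) -> float:
    """Average throughput in MB/s for a transfer of size_mb taking elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mb / elapsed_s


FILE_MB = 86.4  # test file size from the post; units assumed to be megabytes

# Elapsed times quoted in the post.
single_node = throughput_mb_per_s(FILE_MB, 8.0)               # one node, no repeaters
dolly_wall_clock = throughput_mb_per_s(FILE_MB, 16.0)         # dolly, measured with "time"
dolly_self_reported = throughput_mb_per_s(FILE_MB, 8.935656)  # dolly's own timer

print(f"{single_node:.1f} {dolly_wall_clock:.1f} {dolly_self_reported:.2f}")
```

Using dolly's own timer gives a rate close to the 9.674 it prints, while the wall-clock time gives the much lower 5.4, so the gap comes from time that dolly's internal timer does not cover.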
(And that doesn't count the 1 second or so for rsh to set up the 20 slave dolly processes.) So dolly is a little better than my simple script, but it also can't keep the network running flat out.

Anybody have a better "daisychain" (or other) data replicator?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
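For readers unfamiliar with the daisychain layout described in the message, the repeater role (read from upstream, store locally, forward downstream) can be sketched roughly as follows. This is only an illustration and not how pdist_file.sh or dolly is implemented: the names relay and simulate_chain, the 64 KB block size, and the in-memory buffers standing in for network links are all assumptions.

```python
import io
from typing import BinaryIO, List, Optional

CHUNK = 64 * 1024  # hypothetical block size, chosen arbitrarily


def relay(upstream: BinaryIO, local: BinaryIO,
          downstream: Optional[BinaryIO] = None,
          chunk_size: int = CHUNK) -> int:
    """Copy upstream to local storage, forwarding each block downstream if given.

    A repeater node passes a downstream stream; the last node in the
    chain passes None and only stores the data. Returns bytes copied.
    """
    total = 0
    while True:
        block = upstream.read(chunk_size)
        if not block:
            break
        local.write(block)           # store locally
        if downstream is not None:
            downstream.write(block)  # pass the same block along the chain
        total += len(block)
    return total


def simulate_chain(data: bytes, n_nodes: int) -> List[bytes]:
    """Run relay() on n_nodes in-memory 'nodes' in turn; return each stored copy.

    Real nodes would run concurrently over sockets; here they run one
    after another, so this shows the data flow only, not the timing.
    """
    copies = []
    link = io.BytesIO(data)  # what the head node writes to the first node
    for i in range(n_nodes):
        is_last = i == n_nodes - 1
        local = io.BytesIO()
        next_link = None if is_last else io.BytesIO()
        relay(link, local, next_link)
        copies.append(local.getvalue())
        if next_link is not None:
            next_link.seek(0)  # rewind so the next node reads from the start
            link = next_link
    return copies


payload = bytes(range(256)) * 1000
copies = simulate_chain(payload, 5)
print(len(copies), all(c == payload for c in copies))
```

In this sketch each repeater finishes writing a block to disk and downstream before it reads the next one, which is one simple way the kind of stalls described in the message could arise; overlapping the three operations would need threads or non-blocking I/O.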
