[Beowulf] copying big files
henning.fehrmann at aei.mpg.de
Fri Aug 8 08:37:13 PDT 2008
Copying a big file onto all nodes in a cluster is a rather common problem.
I would have thought that there might be a standard tool for
distributing the files in an efficient way. So far, I haven't found one.
Assume one has a network design which allows non-blocking, full-duplex
wire-speed connections between N/2 pairs of nodes, where N is the number
of nodes in the cluster. This is basically a non-blocking core switch.
In this case the following scheme would be convenient and rather simple:
The file is placed on node n1 and one builds a chain of nodes n1, n2, ..., nN.
One splits the file into many packets (p1..pM); let's say one fragment fits
into one TCP packet. In the first step n1 transmits packet p1 to node n2.
In the second step n1 transmits packet p2 to n2 while n2 transmits p1 to node n3,
and so on down the chain.
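
To make the stepping explicit, here is a minimal sketch in Python that lists
which node sends which fragment in each step. The function name chain_schedule
and the 1-based numbering are only assumptions for illustration; it models the
idealized lock-step scheme described above, not a real network.

```python
def chain_schedule(num_nodes, num_packets):
    """Return, for every step, the list of (sender, receiver, packet) transfers.

    Nodes and packets are numbered from 1, as in the text (n1..nN, p1..pM).
    """
    steps = []
    # Packet p leaves node i (towards node i+1) in step p + i - 1,
    # so the last packet reaches the last node in step M + N - 2.
    last_step = num_packets + num_nodes - 2
    for step in range(1, last_step + 1):
        transfers = []
        for sender in range(1, num_nodes):
            packet = step - sender + 1
            if 1 <= packet <= num_packets:
                transfers.append((sender, sender + 1, packet))
        steps.append(transfers)
    return steps


if __name__ == "__main__":
    # Small example: 4 nodes, 3 packets.
    for number, transfers in enumerate(chain_schedule(4, 3), start=1):
        line = ", ".join(f"n{s}->n{r}: p{p}" for s, r, p in transfers)
        print(f"step {number}: {line}")
```

With M packets and N nodes the schedule has M + N - 2 steps, i.e. only N - 2
steps more than a plain copy between two nodes.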
The transmission of a single packet is fast. The time it takes to pass a particular
packet through the whole chain of nodes is short compared with the time of the
entire copying process. E.g., using jumbo frames a packet can have a size of ca. 10 kB.
In a Gbit network the transmission time of a single packet between two nodes is
of the order of 0.1 ms. Even in a cluster with 1024 nodes it takes,
in the ideal case, just 0.1 s to pass a packet from node n1 through all nodes to n1024.
Each node stores the packets it receives and, at the end, reassembles the file.
For big files (size >> 10Mb) the required time is approximately
the same as one needs for copying the file between two nodes plus 0.1s.
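
The estimate can be checked with a few lines of Python. This is only a
back-of-the-envelope sketch under the idealized assumptions above (10 kB
fragments, 1 Gbit/s links, no latency, protocol or disk overhead); the
function name chain_copy_time and its parameters are made up for illustration.

```python
def chain_copy_time(file_bytes, num_nodes,
                    packet_bytes=10_000, link_bits_per_s=1e9):
    """Return (two_node_time, chain_overhead) in seconds.

    two_node_time:  time to send all packets over one link.
    chain_overhead: extra time until the last packet reaches the last node.
    """
    packet_time = packet_bytes * 8 / link_bits_per_s
    num_packets = -(-file_bytes // packet_bytes)  # ceiling division
    two_node_time = num_packets * packet_time
    # The whole chain needs M + N - 2 steps instead of M.
    chain_overhead = (num_nodes - 2) * packet_time
    return two_node_time, chain_overhead


if __name__ == "__main__":
    two_node, overhead = chain_copy_time(10**9, 1024)
    print(f"two nodes: {two_node:.2f} s, chain overhead: {overhead:.3f} s")
```

For a 1 GB file and 1024 nodes this gives about 8 s for the plain copy and
less than 0.1 s of additional time for the chain.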
Basically, one needs a daemon which handles copy requests and establishes
the connection to the next node in the chain.
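
The core of such a daemon could look roughly like the following Python
sketch. It is not an existing tool; the names relay, demo_chain and
CHUNK_SIZE are hypothetical. To keep it runnable on a single machine,
socket.socketpair() stands in for the TCP connections between nodes and
in-memory buffers stand in for the files on disk. A real daemon would
instead listen on a port, accept a copy request and connect to the next host.

```python
import io
import socket
import threading

CHUNK_SIZE = 9000  # assumed fragment size, roughly one jumbo frame


def relay(upstream, downstream, out_file, chunk_size=CHUNK_SIZE):
    """Store everything read from upstream and pass it on to downstream.

    downstream is None on the last node of the chain.
    """
    while True:
        chunk = upstream.recv(chunk_size)
        if not chunk:  # upstream finished sending: the file is complete
            break
        out_file.write(chunk)
        if downstream is not None:
            downstream.sendall(chunk)
    if downstream is not None:
        downstream.shutdown(socket.SHUT_WR)  # tell the next node we are done


def demo_chain(data, num_receivers):
    """Push data through a chain of relays inside one process.

    Returns the bytes stored by each receiving node, in chain order.
    """
    links = [socket.socketpair() for _ in range(num_receivers)]
    stores = [io.BytesIO() for _ in range(num_receivers)]
    threads = []
    for i in range(num_receivers):
        upstream = links[i][1]
        downstream = links[i + 1][0] if i + 1 < num_receivers else None
        threads.append(threading.Thread(
            target=relay, args=(upstream, downstream, stores[i])))
    for thread in threads:
        thread.start()
    head = links[0][0]  # plays the role of node n1, which holds the file
    head.sendall(data)
    head.shutdown(socket.SHUT_WR)
    for thread in threads:
        thread.join()
    for a, b in links:
        a.close()
        b.close()
    return [store.getvalue() for store in stores]


if __name__ == "__main__":
    copies = demo_chain(b"x" * 100_000, 4)
    print([len(c) for c in copies])
```

Each relay forwards a chunk as soon as it has stored it, so all links are
busy at the same time, which is the point of the chain scheme.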
Has somebody written such a tool?