[Beowulf] copying big files
landman at scalableinformatics.com
Fri Aug 8 08:59:24 PDT 2008
Henning Fehrmann wrote:
> Hi everybody,
> Copying a big file onto all nodes in a cluster is a rather common problem.
> I would have thought that there might be a standard tool for
> distributing the files in an efficient way. So far, I haven't found one.
> Assume one has a network design which allows non-blocking, full-duplex,
> wire-speed connections between N/2 pairs of nodes, where N is the number
> of nodes in the cluster. It is basically a non-blocking core switch.
> In this case the following scheme would be convenient and rather simple:
> The file is placed on node n1 and one builds a chain of nodes n1, n2, ..., nN.
> One splits the file into many packets (p1..pM); let's say a fragment fits
> into one TCP packet. In the first step n1 transmits packet p1 to node n2.
> In the second step n1 transmits packet p2 to n2 and n2 transmits p1 to node n3.
Someone has implemented this bucket-brigade model for data transfer.
It's not the only one available, though, as each NIC has two neighbors to
communicate with and thus winds up at effectively 1/2 the bandwidth, or
with its packets serialized. Not that this is a bad thing, but for
big file distribution it could be a problem.
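
For readers who want to see the bucket-brigade schedule concretely, here is
a minimal Python sketch that simulates it step by step. It assumes an ideal
network in which every node can forward exactly one packet per step; the
function name simulate_chain and the zero-based node and packet numbering
are illustrative and not taken from any existing tool.

```python
def simulate_chain(num_nodes, num_packets):
    """Simulate store-and-forward copying along a chain of nodes.

    Node 0 is the source. Returns (steps, received), where received[i]
    is the list of packet indices held by node i in arrival order.
    """
    # The source starts with every packet; all other nodes start empty.
    received = [list(range(num_packets))]
    received += [[] for _ in range(num_nodes - 1)]

    steps = 0
    while any(len(r) < num_packets for r in received):
        # Walk from the tail toward the head so that a packet advances
        # by only one hop per step (a node cannot forward a packet in
        # the same step in which it receives it).
        for i in range(num_nodes - 1, 0, -1):
            have = len(received[i])
            if have < len(received[i - 1]):
                received[i].append(received[i - 1][have])
        steps += 1
    return steps, received
```

In this idealized model, with N >= 2 nodes and M packets, the last node
finishes after M + N - 2 steps: N - 1 steps for the first packet to reach
it, then one more packet per step.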
> The transmission of a single packet is fast. The time to pass a particular
> packet through the whole chain of nodes is short compared with the time of the
> entire copying process. E.g., using jumbo frames, a packet can have a size of about 10 kB.
> In a Gb network the transmission time of a single packet between nodes is
> of the order of 0.1 ms. Even in a cluster with 1024 nodes it takes,
> in the ideal case, just 0.1 s to pass a packet from node n1 through all nodes to n1024.
> On each node the packet is stored and, in the end, one reassembles the file.
> For big files (size >> 10 MB) the required time is approximately
> the same as one needs for copying the file between two nodes plus 0.1 s.
> One basically needs a daemon which handles copying requests and establishes
> the connection to the next node in the chain.
> Has somebody written such a tool?
I saw something like this several years ago.
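
The timing estimate in the quoted text can be checked with a few lines of
Python. This is only a back-of-the-envelope sketch: it assumes 10 kB
packets on an ideal 1 Gb/s link and ignores latency, protocol overhead and
disk speed. The function and parameter names are made up for illustration.

```python
def chain_copy_time(file_bytes, num_nodes,
                    packet_bytes=10_000, link_bits_per_s=1e9):
    """Estimate the ideal chain-copy time in seconds.

    Returns (two_node_time, chain_overhead):
      two_node_time  - time to stream the file between two nodes
      chain_overhead - extra time for the pipeline to reach the last node
    """
    # Time to put one packet on the wire.
    per_packet = packet_bytes * 8 / link_bits_per_s
    # Number of packets, rounding up for a partial last packet.
    num_packets = -(-file_bytes // packet_bytes)
    two_node_time = num_packets * per_packet
    # The chain needs (num_nodes - 2) additional hops beyond a plain
    # two-node copy before the last packet reaches the last node.
    chain_overhead = max(num_nodes - 2, 0) * per_packet
    return two_node_time, chain_overhead
```

Under these assumptions a 10 kB packet takes 0.08 ms, which the quoted text
rounds to 0.1 ms. For a 1 GB file on 1024 nodes,
chain_copy_time(10**9, 1024) gives about 8 s for the two-node copy plus
about 0.08 s of chain overhead, consistent with the "plus 0.1 s" figure.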
We were working on a different type of tool that exploited the fact that
you have N/2 pairs, and tried to maximize the flow to these N/2 pairs.
It included error correction and a few other nice things (multi-sourcing
was on the roadmap). Never could find interested customers/users for
it, so it fell off the radar. We called it xcp, and you used it as

    xcp [set of files] cluster://name/path/to/deposit/files/into

and it handled it all for you.
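
The post does not say how xcp scheduled its transfers, so the following is
not xcp's algorithm. It is only a generic sketch of one well-known way to
keep many sender/receiver pairs busy at once: in each round every node that
already holds the file sends it to one node that does not. The function
name doubling_schedule and the zero-based node numbering are assumptions
made for illustration.

```python
def doubling_schedule(num_nodes):
    """Build a round-by-round broadcast schedule from node 0.

    Returns a list of rounds; each round is a list of
    (sender, receiver) pairs that can run concurrently.
    """
    have = [0]                         # nodes that already hold the file
    need = list(range(1, num_nodes))   # nodes still waiting for it
    rounds = []
    while need:
        pairs = []
        # Every current holder serves at most one new node per round.
        for sender in list(have):
            if not need:
                break
            pairs.append((sender, need.pop(0)))
        # Receivers become senders from the next round on.
        have.extend(receiver for _, receiver in pairs)
        rounds.append(pairs)
    return rounds
```

In this sketch the number of holders doubles each round, so 1024 nodes are
covered in 10 rounds, and the final round uses N/2 concurrent pairs. Unlike
the chain, each round here transfers the whole file, so the two schemes
trade off differently for large files.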
Prior to that, we had a system that used multicast, but after seeing
what this did to other traffic on the gigabit switches, we moved away
from it. That was mcp, and it dates from around 2000 or so.
You can use bittorrent to do something approximately like xcp though at
> Henning Fehrmann
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615