[Beowulf] copying big files (Henning Fehrmann)

David Mathog mathog at caltech.edu
Fri Aug 8 09:11:46 PDT 2008


Henning Fehrmann <henning.fehrmann at aei.mpg.de> wrote:

> Copying a big file onto all nodes in a cluster is a rather
> common problem. I would have thought that there might be a
> standard tool for distributing the files in an efficient way. 
> So far, I haven't found one.

This is what I use:

  http://saf.bio.caltech.edu/nettee.html

The production version is pretty much what you described.  The
development version is more flexible, allowing processing on each data
chunk, and data flow in either direction along the chain.
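
For readers who have not seen a chain copy before, here is a minimal
sketch of the idea in Python.  It is not nettee's code and does not use
its options; the names relay, run_chain and CHUNK_SIZE are made up for
illustration.  Each node reads a chunk from its upstream neighbour,
keeps a local copy, and passes the same chunk to the next node.  The
simulation uses in-memory buffers and runs the nodes one after another,
whereas in a real chain all nodes run at the same time over the network.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from nettee


def relay(upstream, local_copy, downstream=None, chunk_size=CHUNK_SIZE):
    """Read chunks from upstream, keep a local copy, pass each chunk on.

    Returns the number of bytes handled.  The last node in the chain
    has no downstream neighbour and passes downstream=None.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:  # an empty read means the sender is done
            break
        local_copy.write(chunk)
        if downstream is not None:
            downstream.write(chunk)
        total += len(chunk)
    return total


def run_chain(data, node_count, chunk_size=CHUNK_SIZE):
    """Simulate a chain of node_count nodes with in-memory buffers."""
    copies = []
    incoming = io.BytesIO(data)
    for index in range(node_count):
        local_copy = io.BytesIO()
        is_last = index == node_count - 1
        outgoing = None if is_last else io.BytesIO()
        relay(incoming, local_copy, outgoing, chunk_size)
        copies.append(local_copy.getvalue())
        if outgoing is not None:
            outgoing.seek(0)  # the next node reads from the start
            incoming = outgoing
    return copies
```

Calling run_chain(data, 20) returns a list of 20 byte strings, one per
simulated node, each of which should equal data.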

The biggest problem with chain methods is that it is difficult to
recover if something breaks in the middle of a transfer.  My
cluster is only 20 nodes and it has not been an issue, but on a
2000-node cluster it probably would be.  It is also important that
all of the nodes in the distribution chain have sufficient free network
and CPU resources.  If there are any slow nodes the whole chain will be
slow, since the slowest node sets the rate for the whole chain.
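
Both effects can be put into rough numbers.  The sketch below is a
back-of-the-envelope model, not a measurement: it assumes each node
fails independently with the same probability during a transfer, and it
ignores the short time needed to fill the pipeline.  The function names
and the example figures are made up for illustration.

```python
def chain_rate(node_rates):
    """Sustained rate of a chain: that of its slowest node."""
    if not node_rates:
        raise ValueError("chain needs at least one node")
    return min(node_rates)


def transfer_seconds(file_bytes, node_rates):
    """Approximate transfer time, ignoring pipeline fill latency."""
    return file_bytes / chain_rate(node_rates)


def chain_survival(per_node_failure, node_count):
    """Probability that no node fails, assuming independent failures."""
    if not 0.0 <= per_node_failure <= 1.0:
        raise ValueError("per_node_failure must be between 0 and 1")
    return (1.0 - per_node_failure) ** node_count
```

With an assumed failure probability of 0.001 per node per transfer,
chain_survival gives roughly 0.98 for 20 nodes and roughly 0.14 for
2000 nodes.  Likewise, a single node limited to 10 MB/s in a chain of
100 MB/s nodes brings the whole transfer down to 10 MB/s.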

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


