[Beowulf] fast file copying
geoff at galitz.org
Thu May 10 15:06:31 PDT 2007
Thanks to all for responding. Here is a follow-up:
We push our datasets out as part of a service deployment routine
which includes a bunch of "other stuff" in addition to just getting
the data to the nodes. I went ahead and modified our service
deployment program to use dolly. Here is what we do:
- enter deployment phase
- check for member nodes that are alive
- dynamically build the config file
- bring the ring up
- start the transfer
- tear everything down
- enter next phase
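As a rough illustration of the "check for member nodes that are alive"
and "dynamically build the config file" steps, here is a minimal Python
sketch. It is not the actual deployment program and it does not write
dolly's real config file format; it only shows how the set of live
nodes could be turned into an ordered list of hops. The function name
plan_chain, the is_alive callback and the hostnames are all
hypothetical.

```python
from typing import Callable, List, Tuple


def plan_chain(server: str, candidates: List[str],
               is_alive: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Return the (sender, receiver) hops for the currently live nodes."""
    # Keep only the nodes that pass the liveness check, preserving order.
    alive = [node for node in candidates if is_alive(node)]
    # The server feeds the first node; each node forwards to the next one.
    order = [server] + alive
    return list(zip(order, order[1:]))


if __name__ == "__main__":
    # Hypothetical cluster in which node02 is currently offline.
    up = {"node01", "node03"}
    hops = plan_chain("master", ["node01", "node02", "node03"],
                      lambda node: node in up)
    for sender, receiver in hops:
        print(f"{sender} -> {receiver}")
```

Because the plan is rebuilt from the liveness check on every
deployment, a node that is down is simply left out of that run.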
With this system, we can support a dynamic environment where nodes go
online and offline at (our) will.
We use pdsh to do as much of the configuration and command execution
as possible. This made dolly a better choice for us than nettee,
as we can issue the exact same command to all nodes in
parallel. Nettee required more specific commands on each node.
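To make the "same command to all nodes" point concrete, here is a
small Python sketch that builds a single pdsh invocation for a list of
nodes using pdsh's -w option, which takes a comma-separated target
list. The helper name pdsh_command, the hostnames and the remote
command string are placeholders for illustration, not the commands
actually used in the deployment.

```python
import shlex
from typing import List


def pdsh_command(nodes: List[str], remote_cmd: str) -> List[str]:
    """Build an argv list that runs one command on every node via pdsh."""
    if not nodes:
        raise ValueError("no target nodes given")
    # pdsh -w takes a comma-separated list of target hosts.
    return ["pdsh", "-w", ",".join(nodes), remote_cmd]


if __name__ == "__main__":
    # Placeholder hosts and command; substitute the real client start-up.
    argv = pdsh_command(["node01", "node02", "node03"], "uptime")
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run() once the
placeholder command is replaced with the real one.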
In our testing environment, we're getting as much as 45MB/sec and as
little as 11MB/sec in our various scenarios (mismatched hardware,
busy network, different types of data). We did achieve our primary
goal of reducing load on the master/server system. In our old setup,
our load would increase to 25+. With dolly, our load never exceeds 1.5.
I also plan to run the same test with torrent.
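To put the 45MB/sec and 11MB/sec figures in perspective, here is a
back-of-the-envelope sketch in Python. It assumes a simplified model
in which a chain of nodes moves data no faster than its slowest hop,
and it uses a hypothetical 9900 MB dataset. Neither the model nor the
dataset size comes from the measurements above.

```python
from typing import Sequence


def chain_rate(hop_rates_mb_s: Sequence[float]) -> float:
    """Simplified model: the chain runs at the speed of its slowest hop."""
    if not hop_rates_mb_s:
        raise ValueError("need at least one hop")
    return min(hop_rates_mb_s)


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time to push size_mb through a link sustaining rate_mb_s."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_s


if __name__ == "__main__":
    size_mb = 9900.0  # hypothetical dataset size
    for rate in (45.0, 11.0):  # best and worst rates reported above
        print(f"{rate:.0f} MB/sec: {transfer_seconds(size_mb, rate):.0f} s")
    # One slow hop (11 MB/sec) limits an otherwise fast chain.
    print(f"chain rate: {chain_rate([45.0, 45.0, 11.0]):.0f} MB/sec")
```

Under these assumptions the same dataset takes about 220 seconds at
the best rate and about 900 seconds at the worst.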
> The normal ring based disadvantages, reliability and performance.
> Seems like any number of things could make the ring based approach
> a poor choice, where the worst case of the ring could dramatically
> slow things down. Things like:
> * Head node's network connection is 10 times faster
> * A single node dies during the transfer
> * A single node joins late
> * A single node is very busy (I/O, memory constrained, or CPU)