[Beowulf] copying data between clusters

Michael Di Domenico mdidomenico4 at gmail.com
Fri Mar 5 09:32:37 PST 2010


As I'd expect from the smartest sysadmins on the planet, everyone has
overanalyzed the issue... :)

Let's see if I can clarify.

Assume there are two clusters, clusterA and clusterB.

Each cluster has 32 nodes and 50TB of storage attached.

The aggregate network bandwidth between the clusters is 800MB/sec.

The problem is that the per-node bandwidth on clusterB is only 30MB/sec.

So if I use a single node to copy the 20TB of data from clusterB, yes,
it's going to take me about 7 days to copy everything.
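
For reference, the arithmetic behind that estimate can be sketched as below. It uses only the figures stated above and assumes decimal units (1 TB = 1,000,000 MB) and that every node sustains its full 30MB/sec; real transfers will likely fall short of both numbers.

```python
# Back-of-the-envelope transfer times from the figures in this message.
# Assumption: decimal units, 1 TB = 1,000,000 MB.

DATA_TB = 20          # data to copy from clusterB
PER_NODE_MB_S = 30    # per-node bandwidth on clusterB
AGGREGATE_MB_S = 800  # aggregate bandwidth between the clusters
NODES = 32            # nodes per cluster


def transfer_seconds(data_tb, rate_mb_s):
    """Seconds needed to move data_tb terabytes at rate_mb_s MB/sec."""
    return data_tb * 1_000_000 / rate_mb_s


# One node doing all the work.
single_days = transfer_seconds(DATA_TB, PER_NODE_MB_S) / 86_400

# All nodes in parallel: capped by the inter-cluster link.
parallel_rate = min(NODES * PER_NODE_MB_S, AGGREGATE_MB_S)
parallel_hours = transfer_seconds(DATA_TB, parallel_rate) / 3_600

# Fewest nodes whose combined rate reaches the link limit (ceiling division).
nodes_to_saturate = -(-AGGREGATE_MB_S // PER_NODE_MB_S)

print(f"single node: {single_days:.1f} days")
print(f"parallel:    {parallel_hours:.1f} hours at {parallel_rate} MB/sec")
print(f"nodes needed to saturate the link: {nodes_to_saturate}")
```

Under these assumptions the link, not the node count, becomes the limit once roughly 27 nodes are copying at the same time.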

I'd like to parallelize that across multiple nodes to drive the aggregate up.

I was hoping someone would pop up and say, "hey, use this magical piece of
software" (which I've been unable to locate).
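
Failing a ready-made tool, one way to spread the copy over several nodes is to split the file list into per-node lists of roughly equal total size. The sketch below is not a tool anyone on the list recommended; `partition_by_size` and the sample file names and sizes are hypothetical, and the greedy largest-first split is just one reasonable balancing strategy.

```python
import heapq


def partition_by_size(files, n_workers):
    """Split (path, size) pairs into n_workers lists of similar total size.

    Greedy largest-first: each file goes to the currently lightest worker.
    """
    heap = [(0, i) for i in range(n_workers)]  # (bytes assigned, worker index)
    buckets = [[] for _ in range(n_workers)]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, i = heapq.heappop(heap)
        buckets[i].append(path)
        heapq.heappush(heap, (total + size, i))
    return buckets


# Hypothetical sample data: (relative path, size in GB).
files = [("a", 50), ("b", 30), ("c", 20), ("d", 20), ("e", 10)]
buckets = partition_by_size(files, 2)

for i, paths in enumerate(buckets):
    print(f"node {i}: {paths}")
```

Each list could then be written to a file and handed to a separate node, for example with rsync's `-a` and `--files-from` options. `-a` keeps permissions and timestamps, and keeps owner and group when run as root, provided the accounts and groups match on both clusters.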



On Fri, Mar 5, 2010 at 11:30 AM, kyron <kyron at neuralbs.com> wrote:
> On Fri, 05 Mar 2010 11:22:14 -0500, Mike Davis <jmdavis1 at vcu.edu> wrote:
>> Michael Di Domenico wrote:
>>> How does one copy large (20TB) amounts of data from one cluster to
>>> another?
>>>
>>> Assuming that each node in the cluster can only do about 30MB/sec
>>> between clusters and i want to preserve the uid/gid/timestamps, etc
>>>
>> If the clusters are co-lo I wouldn't copy I would use shared storage. If
>> they are not co-located I would use patience.
>>
>> Seriously though, for a one time copy, I would consider copying to an
>> external system and then physically moving that system. To do this and
>> preserve ownerships you will need to duplicate accounts and groups.
>
>
> ...and we are all assuming non-compressibility; otherwise, use pbzip2 ;)
>


