[Beowulf] copying data between clusters

kyron kyron at neuralbs.com
Fri Mar 5 08:18:54 PST 2010


On Fri, 05 Mar 2010 11:00:03 -0500, Joe Landman
<landman at scalableinformatics.com> wrote:
> Michael Di Domenico wrote:
>> How does one copy large (20TB) amounts of data from one cluster to
>> another?
>> 
>> Assuming that each node in the cluster can only do about 30MB/sec
>> between clusters, and I want to preserve the uid/gid/timestamps, etc.
>> 
>> I know how I do it, but I'm curious what methods other people use...

Could you clarify? Are you actually sending from NodeXX-ClusterA to
NodeXX-ClusterB? Are we to assume an aggregate bandwidth of NodeCount*BW (as
long as you don't saturate the switch fabric)? Also, given my comment below, I
am assuming the 20TB of data is actually segmented across the nodes
(20TB/NodeCount per node) and not 20TB*NodeCount.
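To put numbers on the aggregate-bandwidth question, here is a rough sketch. It assumes the data is evenly segmented across the nodes, each node streams at a fixed per-node rate, and an optional fabric limit (switch or inter-site link) caps the total. The function name and the 16-node, 250 MB/s figures are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Rough wall-clock estimate for a node-to-node parallel copy.
# Hypothetical model: data evenly segmented across `nodes`, each node
# streaming at `per_node_rate` bytes/s, optional `fabric_limit` cap.


def aggregate_transfer_seconds(total_bytes: float, nodes: int,
                               per_node_rate: float,
                               fabric_limit: Optional[float] = None) -> float:
    if nodes <= 0 or per_node_rate <= 0:
        raise ValueError("nodes and per_node_rate must be positive")
    aggregate = nodes * per_node_rate  # ideal NodeCount*BW
    if fabric_limit is not None:
        aggregate = min(aggregate, fabric_limit)  # fabric saturates first
    # Each node moves total_bytes / nodes at aggregate / nodes bytes/s,
    # so every node finishes after total_bytes / aggregate seconds.
    return total_bytes / aggregate


if __name__ == "__main__":
    TB, MB = 10**12, 10**6
    # Hypothetical 16-node cluster at 30 MB/s per node.
    t = aggregate_transfer_seconds(20 * TB, 16, 30 * MB)
    print(f"unconstrained: {t:,.0f} s ({t / 3600:.1f} h)")
    t = aggregate_transfer_seconds(20 * TB, 16, 30 * MB, fabric_limit=250 * MB)
    print(f"250 MB/s fabric cap: {t:,.0f} s ({t / 3600:.1f} h)")
```

With 16 nodes the ideal aggregate is 480 MB/s, so 20TB takes about 11.6 hours. Once the fabric caps the total, adding nodes stops helping.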

> I am biased of course, but Fedex-net with one of these: 
> http://scalableinformatics.com/jackrabbit
> 
> 1GB @ 30 MB/s is about 33s.  1TB @ 30 MB/s is about 33000s.  Or more 
> than 1/3 of a day.  20TB @ 30 MB/s ... you are looking at ~7 days to
> write.
> 
> If you have a 1GB/s disk write speed (less than the above unit can do), 
> 1TB takes ~1000s, 20TB takes 20000s, about 1/4 of a day.
> 
> If the clusters are close enough (same data center) this could be a 
> shared storage but you will need a fast network between them.  If the 
> clusters are far enough to avoid direct connection, chances are 30 MB/s 
> may be optimistic on getting data between them.
> 
> BTW: 30 MB/s sounds suspiciously like either a) 1GbE sustained NFS speed
> for some nodes or b) the speed of an IDE drive.
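For anyone who wants to redo the single-stream figures quoted above, here is a minimal sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a perfectly sustained rate, with filesystem and protocol overhead ignored. `transfer_seconds` is just an illustrative helper name.

```python
# Back-of-envelope copy times, decimal units, sustained rate assumed.
MB, GB, TB = 10**6, 10**9, 10**12
DAY = 86_400  # seconds per day


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    cases = [
        ("1 GB @ 30 MB/s", 1 * GB, 30 * MB),
        ("1 TB @ 30 MB/s", 1 * TB, 30 * MB),
        ("20 TB @ 30 MB/s", 20 * TB, 30 * MB),
        ("1 TB @ 1 GB/s", 1 * TB, 1 * GB),
        ("20 TB @ 1 GB/s", 20 * TB, 1 * GB),
    ]
    for label, size, rate in cases:
        s = transfer_seconds(size, rate)
        print(f"{label:>16}: {s:>10,.0f} s = {s / DAY:.2f} days")
```

At 30 MB/s, 20TB works out to about 666,667 s, or roughly 7.7 days. At 1 GB/s it takes 20,000 s, just under a quarter of a day.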

Given that I haven't seen a single 20TB drive out there yet, I doubt that is
the case. I wouldn't point to NFS as the limiting factor (just yet), as I have
been able to sustain 250MB/s data transfer rates (2xGigE using channel
bonding). And that figure is without jumbo frames, so I do take some
protocol overhead loss. The sending server has a PERC 5/i RAID with
4 x 300GB 15k RPM drives, while the receiving end, well... was loading into RAM ;)
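Two bonded GigE links give a raw line rate of 2 Gbit/s, i.e. 250 MB/s. Here is a rough sketch of how per-frame overhead cuts into that at the standard 1500-byte MTU versus 9000-byte jumbo frames. It assumes Ethernet wire overhead (preamble/SFD, MAC header, FCS, inter-frame gap) plus bare IPv4 and TCP headers with no options. NFS/RPC overhead and bonding-mode effects are not modelled, so treat the results as optimistic upper bounds.

```python
# Upper bound on TCP payload throughput over bonded Ethernet links.
# Assumed per-frame costs: 8 B preamble/SFD + 14 B MAC header + 4 B FCS
# + 12 B inter-frame gap on the wire, plus 20 B IPv4 + 20 B TCP headers
# (no options) inside the MTU. NFS/RPC overhead is not counted.
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time carrying TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_WIRE_OVERHEAD)


def bonded_goodput_mb_s(links: int, link_gbit_s: float, mtu: int) -> float:
    """Best-case payload rate in MB/s (10**6 B/s) across bonded links."""
    raw_mb_s = links * link_gbit_s * 1e9 / 8 / 1e6  # 2 x 1 Gbit/s -> 250 MB/s
    return raw_mb_s * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"2xGigE, MTU {mtu}: {bonded_goodput_mb_s(2, 1.0, mtu):.1f} MB/s")
```

Under these assumptions the ceiling is roughly 237 MB/s at MTU 1500 and 248 MB/s with jumbo frames. Jumbo frames would therefore win back on the order of 10 MB/s on this link.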


Eric Thibodeau


