[Beowulf] Torrents for HPC

Joe Landman landman at scalableinformatics.com
Mon Jun 11 11:10:35 PDT 2012


On 06/11/2012 02:02 PM, Jesse Becker wrote:

> I looked into doing something like this on a 50-node cluster to
> synchronize several hundred GB of semi-static data used in /scratch.
> I found that the time to build the torrent files--calculating checksums
> and such--was *far* more time consuming than the actual file
> distribution.  This is on top of the rather severe IO hit on the "seed"
> box as well.
>

A long while ago, we developed 'xcp', which distributed data from one
machine to many machines and was quite fast (non-broadcast). We built it
specifically for moving genomic/proteomic databases to remote nodes.
It didn't see much interest, so we shelved it. It worked like this:

	xcp file remote_path [--nodes node1[,node2....]] [--all]

We were working on generalizing it for directories and other things as 
well, but as I noted, people were starting to talk (breathlessly at the 
time) about torrents for distribution, so we pushed it off and forgot 
about it.
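For readers who want the shape of that interface, here is a minimal, hypothetical sketch in Python of a naive one-to-many copier that accepts the same arguments. It is NOT how xcp worked internally (this post doesn't say). It simply runs one scp per target node in parallel. The node inventory ALL_NODES and the max_parallel limit are made-up assumptions for illustration.

```python
import argparse
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster inventory used when --all is given (assumption).
ALL_NODES = ["node01", "node02", "node03", "node04"]


def parse_args(argv):
    """Parse: xcp file remote_path [--nodes node1[,node2...]] [--all]"""
    p = argparse.ArgumentParser(prog="xcp")
    p.add_argument("file")
    p.add_argument("remote_path")
    # Comma-separated node list; empty entries are dropped.
    p.add_argument("--nodes", type=lambda s: [n for n in s.split(",") if n])
    p.add_argument("--all", action="store_true")
    return p.parse_args(argv)


def target_nodes(args):
    """Resolve the set of destination nodes from the parsed arguments."""
    if args.all:
        return list(ALL_NODES)
    if args.nodes:
        return args.nodes
    raise SystemExit("xcp: specify --nodes or --all")


def build_commands(args):
    """One scp invocation per target node (naive fan-out from one source)."""
    return [["scp", "-q", args.file, f"{node}:{args.remote_path}"]
            for node in target_nodes(args)]


def run(commands, max_parallel=8):
    """Run the copies concurrently and return each command's exit code."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode,
                             commands))
```

Note the design limitation: every byte leaves the source node once per target, so the source's network link and disks become the bottleneck. That is the same kind of "seed" load discussed in the quoted message below.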

> I fought with it for a while, but came to the conclusion that *for
> _this_ data*, and how quickly it changed, torrents weren't the way to
> go--largely because of the cost of creating the torrent in the first
> place.
>
> However, I do think that similar systems could be very useful, if
> perhaps a bit less strict in their tests.  The peer-to-peer model is
> useful, and (in some cases) a simple size/date check could be enough to
> determine when to (re)copy a file.
>
> One thing torrents don't handle is file deletions, which opens up a
> few new problems.
>
> Eventually, I moved to a distributed rsync tree, which worked for a
> while, but was slightly fragile.  Eventually, we dropped the whole
> thing when we purchased a sufficiently fast storage system.

This is one of the things that drove us to build fast storage systems.
Data motion is hard, and a good, fast storage unit with some serious
data-movement cannons and high-power storage can solve the problem with
greater ease and elegance.
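The size/date check and the deletion problem raised in the quoted message can be sketched together. Below is a minimal Python sketch, assuming both trees are readable locally (e.g. the destination is mounted) and that copies preserve modification times (as rsync -t or cp -p do); otherwise every file would look changed. The names FileMeta, scan, and plan_sync are hypothetical. No checksums are computed, which is exactly what avoids the up-front hashing cost of building a torrent.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Cheap per-file metadata used instead of a content checksum."""
    size: int
    mtime: float


def scan(root):
    """Map each path (relative to root) to its size and modification time."""
    meta = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            meta[os.path.relpath(full, root)] = FileMeta(st.st_size, st.st_mtime)
    return meta


def plan_sync(source, dest):
    """Return (to_copy, to_delete) from size/mtime alone.

    to_copy:   files missing at dest, or whose size or mtime differ.
    to_delete: files present at dest but no longer present at source
               (the case torrents don't handle).
    """
    to_copy = sorted(path for path, m in source.items()
                     if dest.get(path) != m)
    to_delete = sorted(path for path in dest if path not in source)
    return to_copy, to_delete
```

Usage would be something like plan_sync(scan("/data/master"), scan("/scratch")), with both paths being placeholders. Because the check trusts metadata, a file modified in place with its size and mtime preserved would be missed; that is the "less strict" trade-off.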


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
