[Beowulf] Torrents for HPC

Bernard Li bernard at vanhpc.org
Mon Jun 11 11:17:53 PDT 2012


Hi all:

I'd also like to point you guys to pcp:

http://www.theether.org/pcp/

It's a bit old, but should still build on modern systems.  It would be
nice if somebody picked up development after all these years (hint,
hint) :-)

Cheers,

Bernard

On Mon, Jun 11, 2012 at 11:10 AM, Joe Landman
<landman at scalableinformatics.com> wrote:
> On 06/11/2012 02:02 PM, Jesse Becker wrote:
>
>> I looked into doing something like this on a 50-node cluster to
>> synchronize several hundred GB of semi-static data used in /scratch.
>> I found that the time to build the torrent files--calculating checksums
>> and such--was *far* more time consuming than the actual file
>> distribution.  This is on top of the rather severe IO hit on the "seed"
>> box as well.
>>
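The up-front cost Jesse describes comes from the way a .torrent is built: every byte of the payload has to be read and SHA-1-hashed, piece by piece, before the seed can serve anything.  A minimal sketch of that step (the piece size here is an assumption, not taken from any particular client):

```python
import hashlib

PIECE_SIZE = 256 * 1024  # a common torrent piece size; an assumption here

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """SHA-1 digest of every fixed-size piece, as a .torrent metainfo requires."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]

# For several hundred GB this loop must touch every byte on the seed box
# before distribution can even begin -- hence the severe IO hit.
```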
>
> A long while ago, we developed 'xcp' which did data distribution from 1
> machine to many machines, and was quite fast (non-broadcast).
> Specifically for moving some genomic/proteomic databases to remote
> nodes.  Didn't see much interest in it, so we shelved it.  It worked
> like this:
>
>        xcp file remote_path [--nodes node1[,node2....]] [--all]
>
> We were working on generalizing it for directories and other things as
> well, but as I noted, people were starting to talk (breathlessly at the
> time) about torrents for distribution, so we pushed it off and forgot
> about it.
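xcp itself was shelved and never released, but the 1-to-many push pattern it implemented is easy to approximate.  A rough stand-in (not the original tool; host names, paths, and a working passwordless scp setup are all assumptions) might look like:

```python
# Fan a single file out to many nodes in parallel, xcp-style.
# This is a sketch, not Scalable Informatics' actual implementation.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def push(src, dest, nodes, copy_cmd=("scp", "-q"), workers=8):
    """Copy src to dest on every node in parallel; return node -> exit code.

    copy_cmd is swappable so the fan-out logic can be exercised without ssh.
    """
    def copy(node):
        proc = subprocess.run([*copy_cmd, src, f"{node}:{dest}"])
        return node, proc.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(copy, nodes))

# e.g. push("nr.fasta", "/scratch/db/nr.fasta", ["node1", "node2", "node3"])
```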
>
>> I fought with it for a while, but came to the conclusion that *for
>> _this_ data*, and how quickly it changed, torrents weren't the way to
>> go--largely because of the cost of creating the torrent in the first
>> place.
>>
>> However, I do think that similar systems could be very useful, if
>> perhaps a bit less strict in their tests.  The peer-to-peer model is
>> useful, and (in some cases) a simple size/date check could be enough to
>> determine when to (re)copy a file.
>>
>> One thing torrents don't handle is file deletions, which opens up a
>> few new problems.
>>
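The "less strict" test Jesse suggests is essentially rsync's quick check: skip the expensive hashing entirely and recopy only when size or modification time differ.  A sketch of that idea (a simplification, not any tool's actual logic):

```python
import os

def needs_copy(src: str, dest: str) -> bool:
    """True if dest is missing, or differs from src in size or mtime."""
    try:
        s, d = os.stat(src), os.stat(dest)
    except FileNotFoundError:
        return True  # missing destination always needs a (re)copy
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

This is far cheaper than torrent-style checksumming, at the cost of trusting metadata; it also says nothing about deletions, which still need a separate pass over the destination tree.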
>> Eventually, I moved to a distributed rsync tree, which worked for a
>> while but was slightly fragile.  We dropped the whole thing when we
>> purchased a sufficiently fast storage system.
>
> This is one of the things that drove us to building fast storage
> systems.  Data motion is hard, and a good fast storage unit with some
> serious data movement cannons and high power storage can solve the
> problem with greater ease/elegance.
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


