[Beowulf] Rsync - checksums

Bill Wichser bill at princeton.edu
Tue Oct 1 06:26:47 PDT 2019


I used xxHash-0.7.0 to build against.  You'll need to grab a version and 
install.  For the actual rsync I have a diff, xxhash.patch along with 
the rpms for rsync in

https://tigress-web.princeton.edu/~bill/

If I get time I'll try and pass this to the upstream rsync folks.  It is 
performing about the same speed as using --checksum so we are happy. 
This has been in production and seems to work fine.

Bill
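
For readers who want a feel for the cost gap discussed in this thread, here is a
minimal Python sketch, not the actual rsync patch. It hashes the same stream in
chunks with md5 and with a non-cryptographic checksum. xxHash is not in the
Python standard library, so zlib.crc32 stands in for it here purely as an
assumption for illustration. The function names and the 1 MiB chunk size are
hypothetical, and relative timings will vary by machine.

```python
import hashlib
import io
import time
import zlib

CHUNK = 1 << 20  # 1 MiB read size; an arbitrary choice for this sketch


def md5_of_stream(stream):
    """Cryptographic digest, read in chunks so huge files never sit in memory."""
    h = hashlib.md5()
    for block in iter(lambda: stream.read(CHUNK), b""):
        h.update(block)
    return h.hexdigest()


def crc32_of_stream(stream):
    """Non-cryptographic checksum (stand-in for xxHash), also chunked."""
    crc = 0
    for block in iter(lambda: stream.read(CHUNK), b""):
        crc = zlib.crc32(block, crc)  # running CRC carried across chunks
    return crc & 0xFFFFFFFF


def time_checksum(func, data):
    """Return (result, elapsed seconds) for one pass over an in-memory buffer."""
    start = time.perf_counter()
    result = func(io.BytesIO(data))
    return result, time.perf_counter() - start
```

Calling time_checksum(md5_of_stream, data) and time_checksum(crc32_of_stream, data)
on the same buffer gives a rough side-by-side comparison. A non-cryptographic
checksum only guards against accidental differences, not deliberate tampering,
which is the trade-off the thread accepts for backup speed.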

On 9/30/19 8:55 PM, Stu Midgley wrote:
> That's pretty awesome, are you going to make it available?  or push it 
> upstream?
> 
> If not... how can we get it?
> 
> On Tue, Oct 1, 2019 at 1:09 AM Bill Wichser <bill at princeton.edu> wrote:
> 
>     Just wanted to circle back on my original question.  I changed the rsync
>     code adding xxhash and we see about a 3x speedup.  Good enough since it
>     is very close to not using any checksum speedups.
> 
>     Bill
> 
>     On 6/17/19 9:43 AM, Bill Wichser wrote:
>      > We have moved to a rsync disk backup system, from TSM tape, in order to
>      > have a DR for our 10 PB GPFS filesystem.  We looked at a lot of options
>      > but here we are.
>      >
>      > md5 checksums take a lot of compute time with huge files and even with
>      > millions of smaller ones.  The bulk of the time for running rsync is
>      > spent in computing the source and destination checksums and we'd like to
>      > alleviate that pain of a cryptographic algorithm.
>      >
>      > Googling around, I found no mention of using a technique like this to
>      > improve rsync performance.  I did find reference to a few hashing
>      > algorithms though which could certainly work here (xxhash, murmurhash,
>      > sbox, cityhash64).
>      >
>      > Rsync has certainly been around for a few years!  We are going to pursue
>      > changing the current checksum algorithm and using something much faster.
>      > If anyone has done this already and would like to share their
>      > experiences that would be wonderful. Ideally this could be some optional
>      > plugin for rsync where users could choose which checksummer to use.
>      >
>      > Bill
>      > _______________________________________________
>      > Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>      > To change your subscription (digest mode or unsubscribe) visit
>      > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> 
> 
> 
> -- 
> Dr Stuart Midgley
> sdm900 at gmail.com

