[Beowulf] Rsync - checksums

Bill Wichser bill at princeton.edu
Tue Jun 18 08:05:03 PDT 2019


No.  Using the rsync daemon on the receiving end.

Bill

On 6/18/19 11:03 AM, Stu Midgley wrote:
> Are you rsyncing over ssh?  If so, get HPN-SSH and use the NONE cipher.
> MUCH faster again :)
> 
> On Tue, Jun 18, 2019 at 11:00 PM Bill Wichser <bill at princeton.edu> wrote:
> 
>     Well thanks for THAT pointer!  Using --checksum-choice=none results in
>     a speedup of 2 to 3 times.  That's my validation of the checksum
>     theory things have been pointing towards.  Now to get xxhash into
>     rsync and I think we are all set.
> 
>     Thanks,
>     Bill
> 
>     On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
>      > On 6/18/19 9:16 AM, Bill Wichser wrote:
>      >> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64.  We've tried a
>      >> number of recompiles with gcc and Intel.  The only difference
>      >> between otherwise identical compiles was md4 vs md5.
>      >>
>      >> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
>      >> --delete-after --files-from=...
>      >>
>      >> I'm not asking for help.  I'm just asking whether anyone has
>      >> attempted to swap the algorithm for something much faster.
>      >>
>      >> I refer you to this project, https://cyan4973.github.io/xxHash/,
>      >> where there is a table of speeds.  Regardless of what anyone might
>      >> speculate, we are pursuing this route of changing out the
>      >> algorithm.  Maybe it's all for naught.  Maybe it isn't.  But in a
>      >> few weeks we'll hopefully know.
>      >
>      > Very interesting.  From the rsync man page:
>      >
>      > "Note that rsync always verifies that each transferred file was
>      > correctly reconstructed on the receiving side by checking a
>      > whole-file checksum that is generated as the file is transferred,
>      > but that automatic after-the-transfer verification has nothing to
>      > do with this option’s before-the-transfer "Does this file need to
>      > be updated?" check."
>      >
>      > So it sounds like you have sufficient churn in large files that the
>      > post-transfer checksum validation is your bottleneck.  Short of
>      > hacking rsync to use a faster algorithm, your remaining choice is
>      > to use --checksum-choice=STR, set it to none, and then perform your
>      > own hashing out-of-band to check the transferred data using the
>      > list you have provided via --files-from.  This will nerf rsync's
>      > ability to do delta-transfer, which may be ok depending on the
>      > nature of your churning files.  If your pipes are huge (atypical
>      > for DR), your CPU is weak, and your churning data is mostly
>      > completely new or completely changed files,
>      > --checksum-choice=none may work very well for you.
>      >
>      > Best,
>      >
>      > ellis
>      >
>     _______________________________________________
>     Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>     To change your subscription (digest mode or unsubscribe) visit
>     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> 
> 
> 
> -- 
> Dr Stuart Midgley
> sdm900 at gmail.com
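
For anyone wanting to try the out-of-band hashing Ellis suggests above, here is a minimal sketch in Python.  It is not anything from this thread: the function names (file_digest, build_manifest, mismatches) are made up for illustration, it assumes the --files-from list holds one relative path per line, and it uses sha256 from the standard library only because that is always available.  A faster hash such as xxhash would need a third-party module and is not shown.

```python
import hashlib
from pathlib import Path


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash one file in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(files_from, root, algo="sha256"):
    """Map each path named in the files-from list to its digest under root.

    Hypothetical helper: assumes one relative path per line. Entries that
    are not regular files (directories, missing paths) map to None.
    """
    manifest = {}
    for line in Path(files_from).read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue  # skip blank lines
        target = Path(root) / rel.lstrip("/")
        manifest[rel] = file_digest(target, algo) if target.is_file() else None
    return manifest


def mismatches(source_manifest, dest_manifest):
    """Return the sorted paths whose digests differ or that exist on one side only."""
    paths = set(source_manifest) | set(dest_manifest)
    return sorted(p for p in paths if source_manifest.get(p) != dest_manifest.get(p))
```

In practice the two manifests would be built separately on the sending and receiving hosts with the same list file, then one of them shipped across and compared with mismatches().  Any path it returns is a candidate for re-transfer.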

