[Beowulf] Rsync - checksums

Stu Midgley sdm900 at gmail.com
Tue Jun 18 08:06:43 PDT 2019


If you get it working, I'd be interested in it :)

I personally like the way tar does it, where you can provide your own
"compression" which I've used to insert checksums into the stream.
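
A minimal sketch of that idea, not the actual filter I used: a pass-through
"compressor" that copies its input to its output unchanged while hashing the
bytes on the way past. The function name copy_with_digest and the choice of
sha256 are assumptions for illustration only.

```python
import hashlib
import io


def copy_with_digest(src, dst, algo="sha256", bufsize=1 << 20):
    """Copy src to dst unchanged; return the hex digest of the bytes copied.

    Hypothetical helper: src and dst are binary file-like objects.
    """
    h = hashlib.new(algo)
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        h.update(chunk)   # hash the data as it streams past
        dst.write(chunk)  # pass the data through untouched
    return h.hexdigest()


# Small in-memory demonstration of the pass-through behaviour.
source = io.BytesIO(b"example tar stream bytes")
sink = io.BytesIO()
digest = copy_with_digest(source, sink)
print(digest)
```

To use something like this with GNU tar's --use-compress-program option, the
script would have to read stdin, write stdout, record the digest somewhere
out of the data path (stderr or a side file), and also cope with being
invoked with -d on extraction. None of that is shown here.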

On Tue, Jun 18, 2019 at 11:03 PM Stu Midgley <sdm900 at gmail.com> wrote:

> Are you rsyncing over ssh?  If so, get HPN-SSH and use the non-cipher.
> MUCH faster again :)
>
> On Tue, Jun 18, 2019 at 11:00 PM Bill Wichser <bill at princeton.edu> wrote:
>
>> Well thanks for THAT pointer!  Using --checksum-choice=none results in
>> a speedup of somewhere between 2 and 3 times.  That's my validation of
>> the checksum theory things have been pointing towards.  Now to get
>> xxhash into rsync and I think we are all set.
>>
>> Thanks,
>> Bill
>>
>> On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
>> > On 6/18/19 9:16 AM, Bill Wichser wrote:
>> >> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64.  We've tried a
>> >> number of recompiles.  gcc, Intel.  The only thing between identical
>> >> compiles was the md4 vs md5.
>> >>
>> >> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
>> >> --delete-after --files-from=...
>> >>
>> >> I'm not asking for help.  Just if anyone had attempted to change the
>> >> algorithm into something much faster.
>> >>
>> >> I refer you to this project https://cyan4973.github.io/xxHash/ where
>> >> there is a table of speeds.  Regardless of what anyone might
>> >> speculate, we are pursuing this route of changing out the algorithm.
>> >> Maybe it's all for naught.  Maybe it isn't.  But in a few weeks
>> >> hopefully we'll have determined.
>> >
>> > Very interesting.  From the rsync man page:
>> >
>> > "Note that rsync always verifies that each transferred file was
>> > correctly reconstructed  on  the  receiving  side  by checking  a
>> > whole-file checksum that is generated as the file is transferred, but
>> > that automatic after-the-transfer verification has nothing to do with
>> > this option’s before-the-transfer "Does this file need to be updated?"
>> > check."
>> >
>> > So it sounds like you have sufficient churn in large files that the
>> > checksum validation post-transfer is your bottleneck.  Short of hacking
>> > rsync to use a faster algorithm, your remaining choice is to use
>> > --checksum-choice=STR and set it to none, and then perform your own
>> > hashing out-of-band to check the transferred data using the list you
>> > have provided via --files-from.  This will nerf rsync's ability to do
>> > delta-transfer, which may be ok depending on the nature of your
>> > churning files.  If your pipes are huge (atypical for DR), your CPU is
>> > weak, and your churning data is mostly completely new or completely
>> > changed files, --checksum-choice=none may work very well for you.
>> >
>> > Best,
>> >
>> > ellis
>> >
>>
>
>
> --
> Dr Stuart Midgley
> sdm900 at gmail.com
>
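
For the out-of-band hashing Ellis describes in the quoted text, a rough
sketch might look like the following. It assumes the same list of relative
paths that was given to --files-from is available on both sides; the helper
names (file_digest, build_manifest, mismatches) are made up for
illustration, and sha256 is only a placeholder. A faster non-cryptographic
hash such as xxhash is not in Python's standard library and would need a
third-party module.

```python
import hashlib
import os


def file_digest(path, algo="sha256", bufsize=1 << 20):
    """Return the hex digest of one file, read in fixed-size chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, relpaths, algo="sha256"):
    """Map each relative path (as in a files-from list) to its digest.

    Paths that are not regular files under root are skipped.
    """
    manifest = {}
    for rel in relpaths:
        full = os.path.join(root, rel)
        if os.path.isfile(full):
            manifest[rel] = file_digest(full, algo)
    return manifest


def mismatches(src_manifest, dst_manifest):
    """Return the sorted paths whose destination digest is missing or differs."""
    return sorted(
        rel for rel, digest in src_manifest.items()
        if dst_manifest.get(rel) != digest
    )
```

The idea would be to run build_manifest on the source and on the
destination, ship one manifest across, and re-transfer whatever mismatches
reports.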


-- 
Dr Stuart Midgley
sdm900 at gmail.com