[Beowulf] Rsync - checksums

Peter Kjellström cap at nsc.liu.se
Tue Jun 18 02:02:31 PDT 2019


On Mon, 17 Jun 2019 08:29:53 -0700
Christopher Samuel <chris at csamuel.org> wrote:

> On 6/17/19 6:43 AM, Bill Wichser wrote:
> 
> > md5 checksums take a lot of compute time with huge files and even
> > with millions of smaller ones.  The bulk of the time for running
> > rsync is spent in computing the source and destination checksums
> > and we'd like to alleviate that pain of a cryptographic algorithm.  
> 
> First of all I would note that rsync only uses checksums if you tell
> it to, otherwise it just uses file times and sizes to determine what
> to transfer.

As Chris says, rsync only uses checksums when told to. With -c it decides
whether a file needs to be synced based on the content of the file (by
hashing it on both the source and destination side).

It does _NOT_ protect the transfer with said checksum nor does it
verify the destination side write with it.

In the end, the (significant) performance cost of using -c boils down to
the cost of doing an open+read of each file on both the source and
destination side (instead of just a stat). The hashing algorithm is not
the main problem.
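
To make the difference concrete, here is a rough Python sketch of the two
kinds of comparison. It is only an illustration of the I/O pattern, not
rsync's actual implementation; the function names are made up for this
example, and md5 is used only because it is the hash mentioned in this
thread.

```python
import hashlib
import os


def quick_check_differs(src, dst):
    """Metadata-only comparison: one stat() per side, no file data is read."""
    try:
        s, d = os.stat(src), os.stat(dst)
    except FileNotFoundError:
        return True  # a missing file always needs a transfer
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Hash a whole file; every byte has to be opened and read from storage."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_differs(src, dst):
    """Content comparison: reads both files in full when their sizes match."""
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True  # different sizes, no need to hash
    return file_digest(src) != file_digest(dst)
```

The first function touches only metadata, so its cost is roughly constant
per file. The second has to pull the full contents of both copies through
the filesystem, and that read is where the time goes regardless of which
hash function is plugged into file_digest.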

/Peter

