[Beowulf] Rsync - checksums

Christopher Samuel chris at csamuel.org
Mon Jun 17 08:29:53 PDT 2019


On 6/17/19 6:43 AM, Bill Wichser wrote:

> md5 checksums take a lot of compute time with huge files and even with 
> millions of smaller ones.  The bulk of the time for running rsync is 
> spent in computing the source and destination checksums and we'd like to 
> alleviate that pain of a cryptographic algorithm.

First of all I would note that rsync only uses checksums if you tell it 
to, otherwise it just uses file times and sizes to determine what to 
transfer.
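
To illustrate the difference, here is a minimal Python sketch of a size-and-time comparison of the kind described above. It is not rsync's actual code; the function name `quick_check_differs` is hypothetical and the comparison is simplified to whole-second modification times. The point is that no file data is read, whereas forcing checksums (rsync's `-c` / `--checksum` option) means reading every file on both sides.

```python
import os


def quick_check_differs(src_path, dst_path):
    """Return True if the file looks changed, judging by size and mtime only.

    Hypothetical helper for illustration; it reads metadata, never file data.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # nothing at the destination yet, so it must be sent
    src = os.stat(src_path)
    # Assumption: compare whole-second timestamps to keep the sketch simple.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)
```

A check like this costs one stat() call per file on each side, which is why the default mode is so much cheaper than checksumming when most files are unchanged.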

rsync is also single-threaded, so I would take a look at what was 
previously called parsync, but is now parsyncfp :-)

http://moo.nac.uci.edu/~hjm/parsync/

There is the caveat there though:

# As a warning, the main use case for parsyncfp is really only
# very large data transfers thru fairly fast network connections
# (>1Gb). Below this speed, rsync itself can saturate the
# connection, so there’s little reason to use parsyncfp and in
# fact the overhead of testing the existence of and starting more
# rsyncs tends to worsen its performance on small transfers to
# slightly less than rsync alone.
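
The general idea of running several rsyncs side by side can be sketched as below. This is only an illustration under stated assumptions, not how parsyncfp itself is implemented: the helper names `split_by_size` and `rsync_command` are hypothetical, and the file list is assumed to be (path, size) pairs with paths relative to the source directory. The only rsync options used are `-a` and `--files-from`.

```python
import heapq


def split_by_size(files, n_jobs):
    """Greedily spread (path, size) pairs over n_jobs buckets of similar total size."""
    heap = [(0, i) for i in range(n_jobs)]  # (bytes assigned so far, bucket index)
    buckets = [[] for _ in range(n_jobs)]
    # Place the largest files first, always into the currently lightest bucket.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, i = heapq.heappop(heap)
        buckets[i].append(path)
        heapq.heappush(heap, (total + size, i))
    return buckets


def rsync_command(list_file, src_dir, dest):
    """Build the argument list for one rsync that transfers only the listed paths."""
    # Note: with --files-from, -a does not imply recursion, so list files, not directories.
    return ["rsync", "-a", f"--files-from={list_file}", src_dir, dest]
```

Each bucket would be written to its own list file and handed to a separate rsync process, for example via `subprocess.Popen`. As the warning above says, this only pays off when a single rsync cannot fill the link.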

Good luck!
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

