[Beowulf] Rsync - checksums

Bill Wichser bill at princeton.edu
Mon Jun 17 06:43:51 PDT 2019


We have moved from TSM tape to an rsync-based disk backup system in 
order to have a DR copy of our 10 PB GPFS filesystem.  We looked at a 
lot of options, but here we are.

MD5 checksums take a lot of compute time, both with huge files and with 
millions of smaller ones.  The bulk of rsync's run time is spent 
computing the source and destination checksums, and we'd like to avoid 
paying the cost of a cryptographic algorithm for that.

Googling around, I found no mention of anyone swapping in a faster, 
non-cryptographic hash to improve rsync performance.  I did find 
references to a few hashing algorithms that could certainly work here 
(xxhash, murmurhash, sbox, cityhash64).
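
As a rough way to put numbers on the difference, here is a minimal 
Python sketch that times MD5 against two non-cryptographic checksums. 
It is only an illustration and has nothing to do with rsync's own C 
code.  The helper name throughput_mib_per_s is made up for this 
example, and zlib's crc32 and adler32 are stand-ins for the hashes 
listed above, which would need third-party packages.  The buffer is 
all zeros, so results on real file data and real hardware will differ.

```python
import hashlib
import time
import zlib


def throughput_mib_per_s(checksum, data, repeats=3):
    """Return the best observed throughput of checksum(data) in MiB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        checksum(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer on a tiny input.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


if __name__ == "__main__":
    # 16 MiB of zeros as an in-memory stand-in for file contents.
    payload = bytes(16 * 1024 * 1024)
    candidates = {
        "md5": lambda buf: hashlib.md5(buf).digest(),
        "crc32": zlib.crc32,      # stand-in, not one of the hashes named above
        "adler32": zlib.adler32,  # stand-in, not one of the hashes named above
    }
    for name, func in candidates.items():
        rate = throughput_mib_per_s(func, payload)
        print(f"{name:8s} {rate:10.1f} MiB/s")
```

This measures hashing alone, on data already in memory, so it says 
nothing about disk or network limits during a real backup run.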

Rsync has certainly been around for a few years!  We are going to pursue 
replacing the current checksum algorithm with something much faster.  If 
anyone has done this already and would like to share their experiences, 
that would be wonderful.  Ideally this could be an optional plugin for 
rsync where users could choose which checksummer to use.
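
To make the plugin idea concrete, here is a small Python sketch of a 
selectable checksummer.  It is not rsync code and does not reflect any 
existing rsync interface.  The names CHECKSUMMERS, read_chunks and 
file_checksum are hypothetical, and crc32 and adler32 are only there 
because they ship with Python; a faster hash such as xxhash would be 
registered the same way.

```python
import hashlib
import zlib


def _md5(chunks):
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def _crc32(chunks):
    value = 0  # zlib's initial CRC-32 value
    for chunk in chunks:
        value = zlib.crc32(chunk, value)
    return format(value, "08x")


def _adler32(chunks):
    value = 1  # zlib's initial Adler-32 value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return format(value, "08x")


# Hypothetical registry: the user picks a checksummer by name.
CHECKSUMMERS = {"md5": _md5, "crc32": _crc32, "adler32": _adler32}


def read_chunks(path, chunk_size=1 << 20):
    """Yield the file in chunk_size pieces so huge files never sit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def file_checksum(path, algorithm="md5"):
    """Return the hex checksum of the file at path using the named algorithm."""
    try:
        checksummer = CHECKSUMMERS[algorithm]
    except KeyError:
        raise ValueError(f"unknown checksum algorithm: {algorithm!r}") from None
    return checksummer(read_chunks(path))
```

Source and destination would of course have to agree on the algorithm, 
and a 32-bit checksum is far more collision-prone than a 128-bit one, 
so the choice of replacement matters beyond raw speed.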

Bill

