[Beowulf] Rsync - checksums

Loncaric, Josip josip at lanl.gov
Mon Jun 17 09:34:33 PDT 2019


Why not use the existing pftool?

https://github.com/pftool/pftool

-Josip

On 6/17/19 10:07 AM, Michael Di Domenico wrote:
> Just out of morbid curiosity, I popped through the rsync code. It
> doesn't look terribly difficult to wedge in a new algorithm, but honestly,
> if I were going to go through the trouble, I'd write a new tool that
> walks the file tree in parallel and logs the checksums to a database.
> I've had problems rsync'ing big filesystems in the past, so I try to
> avoid it for DR (disaster recovery) or poor-man's snapshotting.
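A tool along those lines could start from something like the minimal Python sketch below. It is an illustration under stated assumptions, not an existing program: the directory walk with os.walk is serial and only the hashing runs on a thread pool (CPython's hashlib releases the GIL while hashing large buffers, so the threads can overlap), and the function names and the SQLite table layout (checksums: path, size, mtime, algo, digest) are hypothetical choices.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Stream a file through hashlib in fixed-size chunks so large files never sit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_tree(root, db_path, algo="md5", workers=8):
    """Walk `root` serially, hash regular files on a thread pool, record results in SQLite.

    Only the main thread touches the database connection. Returns the file count.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(p) and not os.path.islink(p):
                paths.append(p)

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL, algo TEXT, digest TEXT)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields digests in the same order as `paths`.
        digests = pool.map(lambda p: file_digest(p, algo), paths)
        for p, d in zip(paths, digests):
            st = os.stat(p)
            conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?, ?, ?)",
                (p, st.st_size, st.st_mtime, algo, d),
            )
    conn.commit()
    conn.close()
    return len(paths)
```

For example, checksum_tree("/data", "/tmp/checksums.db", algo="sha1", workers=16) would record one row per regular file (paths illustrative); running it on both sides of a copy and joining the two tables on path would flag mismatches.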
>
> On Mon, Jun 17, 2019 at 11:30 AM Christopher Samuel <chris at csamuel.org> wrote:
>> On 6/17/19 6:43 AM, Bill Wichser wrote:
>>
>>> MD5 checksums take a lot of compute time, both with huge files and with
>>> millions of smaller ones.  The bulk of rsync's run time is
>>> spent computing the source and destination checksums, and we'd like to
>>> avoid the cost of a cryptographic algorithm.
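If a cryptographic hash is not actually needed (detecting accidental changes rather than resisting deliberate tampering), a cheaper checksum is one option. The sketch below is an illustration, not anything rsync itself does: it computes a streaming CRC-32 with Python's zlib.crc32 alongside a streaming MD5 so the two can be timed on the same file. CRC-32 is only 32 bits and is not collision-resistant, and relative speed depends on the platform and build, so measuring on the real data is the safer guide. Reading the file once beforehand warms the page cache; otherwise disk I/O tends to dominate both numbers.

```python
import hashlib
import time
import zlib

CHUNK = 1 << 20  # read 1 MiB at a time so huge files never sit fully in memory


def crc32_file(path):
    """Non-cryptographic CRC-32 of a file, updated incrementally per chunk."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc


def md5_file(path):
    """Cryptographic MD5 of a file, for comparison."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def time_checksums(path):
    """Return wall-clock seconds spent by each checksum function on `path`."""
    timings = {}
    for name, fn in (("crc32", crc32_file), ("md5", md5_file)):
        start = time.perf_counter()
        fn(path)
        timings[name] = time.perf_counter() - start
    return timings
```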
>> First of all, I would note that rsync only compares whole-file checksums
>> if you tell it to (-c/--checksum); otherwise it just uses file
>> modification times and sizes to decide what to transfer.
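The size-and-time comparison can be illustrated with a few lines of Python. This is a simplified sketch of the idea, not rsync's implementation: the function name is hypothetical, and comparing whole seconds is an assumption (how finely rsync compares times depends on its version, the filesystems involved, and the --modify-window option).

```python
import os


def quick_check_differs(src, dst):
    """Return True if `dst` looks out of date relative to `src`.

    Simplified illustration of a size + modification-time "quick check":
    a file is treated as unchanged when both values match. A missing
    destination always needs a transfer.
    """
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True
    s = os.stat(src)
    # Assumption: compare whole seconds, since not every filesystem
    # stores sub-second modification times.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

Files that pass a check like this are skipped without being read, so no checksum is computed for them at all.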
>>
>> rsync is also single-threaded, so I would take a look at what was
>> previously called parsync but is now parsyncfp :-)
>>
>> http://moo.nac.uci.edu/~hjm/parsync/
>>
>> There is a caveat, though:
>>
>> # As a warning, the main use case for parsyncfp is really only
>> # very large data transfers thru fairly fast network connections
>> # (>1Gb). Below this speed, rsync itself can saturate the
>> # connection, so there’s little reason to use parsyncfp and in
>> # fact the overhead of testing the existence of and starting more
>> # rsyncs tends to worsen its performance on small transfers to
>> # slightly less than rsync alone.
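One common way to run several rsync processes side by side is to split the file list into chunks of roughly equal total size and hand each chunk to its own rsync with --files-from. The Python sketch below shows only that splitting step; it is a hypothetical helper, not parsyncfp's code, and the greedy largest-first heuristic is an assumption. Paths are kept relative to a source root because rsync interprets --files-from entries relative to the source directory given on its command line.

```python
import heapq
import os


def partition_by_size(root, relpaths, n_chunks):
    """Split `relpaths` (relative to `root`) into `n_chunks` size-balanced lists.

    Greedy longest-processing-time heuristic: visit files from largest to
    smallest and add each one to whichever chunk currently holds the fewest bytes.
    """
    sized = sorted(
        ((os.path.getsize(os.path.join(root, p)), p) for p in relpaths), reverse=True
    )
    heap = [(0, i) for i in range(n_chunks)]  # (bytes so far, chunk index)
    chunks = [[] for _ in range(n_chunks)]
    for size, p in sized:
        total, i = heapq.heappop(heap)  # lightest chunk so far
        chunks[i].append(p)
        heapq.heappush(heap, (total + size, i))
    return chunks


def write_file_lists(chunks, prefix):
    """Write one newline-separated list per chunk, for use with rsync --files-from."""
    names = []
    for i, chunk in enumerate(chunks):
        name = f"{prefix}.{i}"
        with open(name, "w") as f:
            f.writelines(p + "\n" for p in chunk)
        names.append(name)
    return names
```

Each list could then drive its own concurrent run, e.g. rsync -a --files-from=/tmp/chunk.0 /src/root/ host:/dest/ (paths illustrative).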
>>
>> Good luck!
>> Chris
>> --
>>     Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


-- 
Dr. Josip Loncaric, LANL, MS-T001, P.O. Box 1663, Los Alamos, NM 87545
mailto:josip at lanl.gov   Cell: +1-505-412-8490   Phone: +1-505-412-6538
--
E Pluribus Unum


