[Beowulf] 10G and rsync

Bill Abbott babbott at rutgers.edu
Thu Jan 2 07:56:42 PST 2020


If you have no choice but to use a single rsync, then either set up an 
rsyncd server on the other end to bypass ssh, or use something like 
hpn-ssh for better performance.
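
For example, a minimal daemon setup looks something like this (the 
module name is a placeholder; see rsyncd.conf(5) for the full set of 
options):

    # /etc/rsyncd.conf on the receiving host
    [bigdata]
        path = /dir2
        read only = false

    # start the daemon on the receiver
    rsync --daemon

    # push from the sender; the double colon selects daemon mode, no ssh
    rsync -rav --progress /dir1/ desthost::bigdata/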

Bill

On 1/2/20 10:52 AM, Bill Abbott wrote:
> Fpsync and parsyncfp both do a great job with multiple rsyncs, although
> you have to be careful about --delete.  The best performance for fewer,
> larger files, if it's an initial or one-time transfer, is bbcp with
> multiple streams.
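> 
> A multi-stream bbcp run looks something like this (stream count and
> window size are illustrative, and bbcp must be installed on both ends):
> 
>     bbcp -P 2 -s 16 -w 8M /dir1/bigfile desthost:/dir2/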
> 
> Also jack up the TCP send/receive buffers and turn on jumbo frames.
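> 
> For example (values and interface name are illustrative, not tuned
> recommendations; jumbo frames require MTU 9000 on every hop in the path):
> 
>     # raise the TCP buffer ceilings (bytes)
>     sysctl -w net.core.rmem_max=268435456
>     sysctl -w net.core.wmem_max=268435456
>     sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
>     sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
>     # enable jumbo frames on the 10G interface
>     ip link set dev eth0 mtu 9000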
> 
> Bill
> 
> On 1/2/20 10:48 AM, Paul Edmon wrote:
>> I also highly recommend fpsync.  Here is a rudimentary guide to this:
>> https://www.rc.fas.harvard.edu/resources/documentation/transferring-data-on-the-cluster/
>>
>>
>> I can get line speed with fpsync but single rsyncs usually only get up
>> to about 0.3-1 GB/s.  You really want that parallelism.  We use fpsync
>> for all our large scale data movement here and Globus for external
>> transfers.
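>>
>> A typical invocation looks something like this (the worker count and
>> files-per-job batch size are placeholders to tune for your hardware):
>>
>>     fpsync -n 8 -f 2000 -o "-a --inplace" /dir1/ /dir2/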
>>
>> -Paul Edmon-
>>
>> On 1/2/20 10:45 AM, Joe Landman wrote:
>>>
>>> On 1/2/20 10:26 AM, Michael Di Domenico wrote:
>>>> does anyone know or has anyone gotten rsync to push wire speed
>>>> transfers of big files over 10G links?  i'm trying to sync a directory
>>>> with several large files.  the data is coming from local disk to a
>>>> lustre filesystem.  i'm not using ssh in this case.  i have 10G
>>>> ethernet between both machines.  both endpoints have more than
>>>> enough spindles to handle 900MB/sec.
>>>>
>>>> i'm using 'rsync -rav --progress --stats -x --inplace
>>>> --compress-level=0 /dir1/ /dir2/' but each file (which is hundreds of
>>>> GBs) is getting choked at 100MB/sec
>>>
>>> A few thoughts
>>>
>>> 1) are you sure your traffic is traversing the high bandwidth link?
>>> Always good to check ...
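>>>
>>> A quick sanity check looks something like this (10.0.0.2 stands in
>>> for the destination address, eth0 for the 10G interface):
>>>
>>>     ip route get 10.0.0.2          # which interface carries the traffic?
>>>     ethtool eth0 | grep -i speed   # did the link negotiate 10000Mb/s?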
>>>
>>> 2) how many files are you xfering?  Are these generally large files or
>>> many small files, or a distribution with a long tail towards small
>>> files?  The latter two will hit your metadata system fairly hard, and
>>> in the case of Lustre, performance will depend critically upon the
>>> MDS/MDT architecture and implementation.  FWIW, on the big system I
>>> was setting up late last year we hit MIOP-level reads/writes, but then
>>> again, that one was architected correctly.
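>>>
>>> A quick way to see the shape of the file-size distribution (GNU find
>>> assumed; the 1MB cutoff is arbitrary):
>>>
>>>     find /dir1 -type f -printf '%s\n' | \
>>>         awk '{ if ($1 < 1048576) s++; else b++ }
>>>              END { print s+0 " files under 1MB, " b+0 " files 1MB or larger" }'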
>>>
>>> 3) wire speed xfers are generally the exception unless you are doing
>>> large sequential single files.   There are tricks you can do to enable
>>> this, but they are often complex.  You can use an array of
>>> writers/readers and leverage parallelism, but you risk invoking
>>> congestion/pause throttling on your switch.
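>>>
>>> A rough sketch of the parallel approach, one rsync per top-level entry
>>> (worker count is illustrative; note that --delete becomes unsafe once
>>> the tree is split across workers like this):
>>>
>>>     cd /dir1 && ls -1 | xargs -P8 -I{} rsync -a --inplace "{}" /dir2/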
>>>
>>>
>>>>
>>>> running iperf and dd between the client and the lustre hits 900MB/sec,
>>>> so i fully believe this is an rsync limitation.
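>>>>
>>>> (for reference, the baseline tests were something along these lines;
>>>> the host and path here are placeholders:
>>>>
>>>>     iperf3 -c desthost -P 4
>>>>     dd if=/dev/zero of=/lustre/testfile bs=1M count=10000 oflag=direct
>>>> )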
>>>>
>>>> googling around hasn't yielded any solid advice; most of the articles
>>>> are from people who don't check the network first...
>>>>
>>>> with the prevalence of 10G these days, i'm surprised this hasn't come
>>>> up before, or my google-fu really stinks, which doesn't bode well
>>>> given it's the first work day of 2020 :(
>>>

