[Beowulf] 10G and rsync

Bill Abbott babbott at rutgers.edu
Thu Jan 2 07:52:18 PST 2020


Fpsync and parsyncfp both do a great job of running multiple rsyncs, 
although you have to be careful with --delete.  For fewer, larger 
files, if it's an initial or one-time transfer, the best performance 
comes from bbcp with multiple streams.

Also jack up the TCP send buffer and turn on jumbo frames.
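Roughly, the tuning looks like this (buffer sizes, interface name, and the bbcp flags below are illustrative, not prescriptive; size the buffers for your bandwidth-delay product and run on both endpoints):

```shell
# Raise the TCP send/receive buffer ceilings (example values; run as root).
sysctl -w net.core.wmem_max=67108864
sysctl -w net.core.rmem_max=67108864
sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"

# Jumbo frames: every hop in the path, switches included, must support
# the larger MTU or you'll get blackholed traffic.
ip link set eth0 mtu 9000

# bbcp with multiple parallel TCP streams (-s), a large window (-w),
# and progress reports every 10s (-P). Host and paths are hypothetical.
bbcp -P 10 -s 8 -w 8M /local/dir/bigfile user@remote:/lustre/dest/
```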

Bill

On 1/2/20 10:48 AM, Paul Edmon wrote:
> I also highly recommend fpsync.  Here is a rudimentary guide to this: 
> https://www.rc.fas.harvard.edu/resources/documentation/transferring-data-on-the-cluster/ 
> 
> 
> I can get line speed with fpsync but single rsyncs usually only get up 
> to about 0.3-1 GB/s.  You really want that parallelism.  We use fpsync 
> for all our large scale data movement here and Globus for external 
> transfers.
> 
> -Paul Edmon-
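
A typical fpsync invocation along the lines Paul describes might look like this (job count, chunk size, and paths are illustrative):

```shell
# Run 8 concurrent rsync workers, each syncing a chunk of ~2000 files;
# -o passes options through to every underlying rsync.
fpsync -n 8 -f 2000 -o "-a --numeric-ids" /src/dir/ /dst/dir/
```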
> 
> On 1/2/20 10:45 AM, Joe Landman wrote:
>>
>> On 1/2/20 10:26 AM, Michael Di Domenico wrote:
>>> does anyone know or has anyone gotten rsync to push wire speed
>>> transfers of big files over 10G links?  i'm trying to sync a directory
>>> with several large files.  the data is coming from local disk to a
>>> lustre filesystem.  i'm not using ssh in this case.  i have 10G
>>> ethernet between both machines.  both end points have more than
>>> enough spindles to handle 900MB/sec.
>>>
>>> i'm using 'rsync -rav --progress --stats -x --inplace
>>> --compress-level=0 /dir1/ /dir2/' but each file (which is hundreds of
>>> GB) is getting choked at 100MB/sec
>>
>> A few thoughts
>>
>> 1) are you sure your traffic is traversing the high bandwidth link? 
>> Always good to check ...
>>
>> 2) how many files are you xfering?  Are these generally large files or 
>> many small files, or a distribution with a long tail towards small 
>> files?  The latter two will hit your metadata system fairly hard, and 
>> in the case of Lustre, performance will depend critically upon the 
>> MDS/MDT architecture and implementation. FWIW, on the big system I was 
>> setting up late last year we hit million-IOP-level reads/writes, but 
>> then again, it was architected correctly.
>>
>> 3) wire speed xfers are generally the exception unless you are doing 
>> large sequential single files.   There are tricks you can do to enable 
>> this, but they are often complex.  You can use the array of 
>> writers/readers, and leverage parallelism, but you risk invoking 
>> congestion/pause throttling on your switch.
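
A crude way to get that array of readers/writers with stock tools is to fan several rsyncs out over subdirectories via xargs (a sketch only; the paths are hypothetical, and -P sets the process count):

```shell
# One rsync per top-level subdirectory, 8 running at a time.
cd /src && ls -d */ | xargs -n1 -P 8 -I{} \
    rsync -a --inplace {} remote:/lustre/dest/{}
```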
>>
>>
>>>
>>> running iperf and dd between the client and the lustre hits 900MB/sec,
>>> so i fully believe this is an rsync limitation.
>>>
>>> googling around hasn't yielded any solid advice; most of the articles
>>> are by people who don't check the network first...
>>>
>>> with the prevalence of 10G these days, i'm surprised this hasn't come
>>> up before, or my google-fu really stinks.  which doesn't bode well
>>> given it's the first work day of 2020 :(
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit 
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf 
>>>
>>


More information about the Beowulf mailing list