[Beowulf] NFS over RDMA performance confusion

Ellis H. Wilson III ellis at cse.psu.edu
Thu Sep 13 08:38:11 PDT 2012


On 09/13/2012 11:21 AM, holway at th.physik.uni-frankfurt.de wrote:
>> I assume so, but just to be clear you witnessed this behavior even with
>> the -I (directio) parameter?
>
> Yes.

Sorry for the confusion; this question was aimed at Joe, not you, Andrew.
I was wondering if Joe had seen caching effects even when using IOzone
with the -I parameter.

>> Can you rerun those tests with, 16 and 32 procs?  I've run into some
>
> 1 proc	Children see throughput for 1 random writers 	=   46036.32 KB/sec
> 2 proc	Children see throughput for 2 random writers 	=   82828.13 KB/sec
> 4 proc	Children see throughput for 4 random writers 	=  126709.65 KB/sec
> 8 proc	Children see throughput for 8 random writers 	=  190070.96 KB/sec
> 16 proc	Children see throughput for 16 random writers 	=  273970.94 KB/sec
>
> 1 proc	Children see throughput for 1 random readers 	=  109169.52 KB/sec
> 2 proc	Children see throughput for 2 random readers 	=  202556.82 KB/sec
> 4 proc	Children see throughput for 4 random readers 	=  381504.25 KB/sec
> 8 proc	Children see throughput for 8 random readers 	=  719108.27 KB/sec
> 16 proc	Children see throughput for 16 random readers 	= 1152648.13 KB/sec

Ah, this looks great!  You added roughly 50% more throughput with the 
doubling from 8 to 16 procs (about 44% for the writes and 60% for the 
reads), and I would bet you could squeeze a little more out by going to 
24 or 32 procs.
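
For anyone who wants to check the scaling themselves, here is a minimal 
sketch that computes the gain for each doubling of procs.  The numbers 
are copied from the quoted IOzone output; the function and variable 
names are just mine for illustration.

```python
# Throughput figures (KB/sec) copied from the quoted IOzone output.
random_write = {1: 46036.32, 2: 82828.13, 4: 126709.65,
                8: 190070.96, 16: 273970.94}
random_read = {1: 109169.52, 2: 202556.82, 4: 381504.25,
               8: 719108.27, 16: 1152648.13}


def doubling_gains(results):
    """Return {(n, 2n): percent throughput gain} for each doubling of procs."""
    procs = sorted(results)
    return {
        (a, b): 100.0 * (results[b] / results[a] - 1.0)
        for a, b in zip(procs, procs[1:])
    }


if __name__ == "__main__":
    for name, results in (("random write", random_write),
                          ("random read", random_read)):
        for (a, b), gain in doubling_gains(results).items():
            print(f"{name}: {a:>2} -> {b:>2} procs: +{gain:.0f}%")
```

The gain per doubling shrinks as procs are added, which is what you 
would expect as you get closer to whatever the real bottleneck is.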

> I am quite sure I am not doing any local caching.

Ok, great!  But remote caching is likely still happening unless you blow 
away those files between runs, so make sure you're doing that. 
Obviously this is harder for the reads, but if you have root permissions 
on the Nexenta gear, just nuke the kernel buffer cache on that end.

> Why is each process IO limited like that?

Any time a process is forced to wait, or does so voluntarily, you are 
going to run into this type of limiting.  By increasing the number of 
threads or processes you are able to "hide" some of this turn-around gap, 
because another process that is ready to run jumps right in and uses 
the bandwidth.
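
To make that concrete, here is a deliberately idealized toy model, not a 
description of what NFS or IOzone actually do internally.  It assumes 
each process loops forever doing "wait, then transfer", and that the 
link serves one transfer at a time.  The function name and all the 
numbers are made up for illustration.

```python
def aggregate_throughput(nprocs, wait_s, transfer_s, bytes_per_request):
    """Toy model: bytes/sec moved by `nprocs` identical processes.

    Each process repeatedly waits `wait_s` seconds (turn-around gap) and
    then occupies the link for `transfer_s` seconds.  Other processes can
    use the link during that wait, until the link itself is never idle.
    """
    per_proc = bytes_per_request / (wait_s + transfer_s)  # one process alone
    link_max = bytes_per_request / transfer_s             # link always busy
    return min(nprocs * per_proc, link_max)


if __name__ == "__main__":
    # Hypothetical values: 0.3 ms turn-around gap, 0.1 ms on the wire,
    # 64 KiB per request.
    for n in (1, 2, 4, 8, 16):
        rate = aggregate_throughput(n, 0.0003, 0.0001, 64 * 1024)
        print(f"{n:>2} procs: {rate / 1e6:.1f} MB/s")
```

In this model a single process can never reach the link's limit, since 
it spends most of each cycle waiting.  Real measurements, like the ones 
quoted above, scale less cleanly because of contention on the server 
and elsewhere.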

My dissertation /should/ fix this such that a single process can get 
full bandwidth, but that's some 2 years and a bunch of sleepless nights 
away, so don't hold your breath, ;D.

Best,

ellis

