[Beowulf] network transfer issue to disk, old versus new hardware

David Mathog mathog at caltech.edu
Mon Jun 4 10:02:13 PDT 2007


Bogdan Costescu wrote:

> On Sat, 2 Jun 2007, David Mathog wrote:
> 
> > I can't quite wrap my head around a recent nettee result, perhaps
> > one of the network gurus here can explain it.
> 
> IMHO, it's not a network issue, as is shown by your G results.
> 
> >    sync; accudate; dd if=/dev/zero bs=512 count=1000000 of=test.dat;
> 
> All your tests use bs=512 - why ? This makes unnecessary trips to 
> kernel code and back which result in an increased number of context 
> switches and significant slowdown.

It's a convenient number.  It may slow things down slightly, but it
clearly isn't rate limiting, since piping that straight to /dev/null
gives rates of 650Mb/sec or higher.
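For the record, a rough way to check that on any node is to compare the
same pipe with a small and a large block size and look at dd's own
transfer summary; the 1M size and the counts below are just
illustrative values, not anything I measured for this post:

   sync; dd if=/dev/zero bs=512 count=1000000 | cat > /dev/null
   sync; dd if=/dev/zero bs=1M  count=500     | cat > /dev/null
   # if both runs report roughly the same rate, the 512 byte blocks
   # are not what is limiting the network transfer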

In any case, I figured the problem out.  The issue was that the
distro (Mandriva 2007.0) installed a while back on the older
machines turns on "athcool".  Athcool does cut the idle temperatures
of the nodes considerably, but apparently it also prevents them from
performing this sort of transfer at full speed, whether or not a
buffer is used.  When I turned athcool off, on just the receiving
node, the transfer rate for:

 sender:
   dd if=/dev/zero  bs=512 count=1000000 | \
   nettee -in - -v 63 -next next_node
  
 receiver:
   nettee -out test.dat

jumped from 7.7Mb/sec to 11.6Mb/sec.  So apparently athcool gets in the
way by preventing rapid shifts between disk and network IO, no matter
which process is doing them.  That is interesting, because it didn't
have any measurable effect on CPU bound processes.  I had thought it
would shut itself off and get out of the way when the CPU load was
high, but apparently not.  Athcool isn't running when nodes are being
imaged, but I'll have to keep this in mind when doing routine transfers
of data across the nodes.
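For anyone who wants to try this on their own nodes, the sequence on
the receiving side is essentially the following; I'm assuming athcool
takes on/off/stat arguments, which may vary with the version shipped by
your distro:

   athcool stat            # show whether power saving is enabled
   athcool off             # disable it for the duration of the transfer
   nettee -out test.dat    # receive the data at full speed
   athcool on              # restore the cooler idle temperatures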

On the newer machines cpufreq runs instead of athcool, and it made very
little difference whether that was running or not.  Apparently this
power saver does a much better job of detecting a high CPU load and
"getting out of the way" when it's present.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


