[Beowulf] MPI and Redhat9 NFS slow down

Tue Aug 24 08:51:05 PDT 2004

Jack Chen chimou at mail.wsu.edu  wrote:

>If I start the same job but write the output to any other nfs mounted
>drives besides the master node, the job will be extremely slow.  In
>this case the same job took 10962 sec.

There could be a lot of different things going on here, off the
top of my head:

1.  If the NFS is served from a slave node AND that slave node
is running your code as well as managing the disks, AND your
code sucks up 99.9% of the CPU time, the code itself could be
competing with the NFS daemon.  Try it again with the "other
node" serving the NFS disks but NOT running your code locally.

2.  Are the disks on the master node and "other node" the same?
If you have SCSI 320 disks on the master and el cheapo ATA on
the slave you might see an effect like this.  Ditto if the
disk buffers are radically different in size.  Ditto if the
system memory available is much larger on one than the other
(because of the built-in disk caching in Linux).

3.  Have you verified that the bandwidth you can achieve from
"other node" -> "master node" == "other node 1" -> "other node 2"?

4.  I have often observed code that writes a lot of small messages
to the output file at a very high rate.  This sort
of code tends to overwhelm anything marginal in a network/NFS
configuration.  If it's possible, try reconfiguring the
program to write output to the local disk /tmp/filename and
then when done, copy the completed output files in one
operation from that disk to the final location.  Note, this works
best if the nodes complete asynchronously.  If they all finish
simultaneously then you'll want to copy the output files
sequentially (more or less, you may be able to do 2 or 3 at once
without penalty).  Otherwise it will be less efficient, as
the NFS server disk heads jump all over the place trying to
write 8 files at once.
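
As a rough illustration of point 4, here is a minimal Python sketch of
the "write locally, then copy once" approach.  The function names
(write_locally_then_copy, copy_in_sequence) and the record format are
made up for this example and are not from the original program; the
scratch directory defaults to /tmp as suggested above.

```python
import os
import shutil
import tempfile


def write_locally_then_copy(records, final_path, scratch_dir="/tmp"):
    """Write many small records to local disk, then copy the file once.

    'records' is any iterable of strings (an assumption made for this
    sketch); 'final_path' would normally sit on the NFS mount.
    """
    fd, local_path = tempfile.mkstemp(dir=scratch_dir, suffix=".out")
    try:
        with os.fdopen(fd, "w") as out:
            for record in records:
                out.write(record)  # small writes only touch the local disk
        # One bulk transfer to the final location instead of many tiny ones.
        shutil.copyfile(local_path, final_path)
    finally:
        os.remove(local_path)  # clean up the scratch copy
    return final_path


def copy_in_sequence(pairs):
    """Copy (source, destination) pairs one after another.

    Doing the copies in turn keeps the server from having to write
    several files at the same time.
    """
    for src, dst in pairs:
        shutil.copyfile(src, dst)
```

Each node would call something like write_locally_then_copy() for its
own output.  If all nodes finish together, a single collector could
instead call copy_in_sequence() over the finished files so the copies
happen one at a time.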

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech