[Beowulf] MPI and Redhat9 NFS slow down
Tod Hagan tod at gust.sr.unh.edu
Thu Aug 26 11:32:09 PDT 2004
On Mon, 2004-08-23 at 18:52, Jack Chen wrote:
> We recently built a 8-node PC Linux cluster running RedHat 9 (kernel:
> 2.4.20-8smp #1 SMP). We use this system to run EPA's CMAQ
> photochemical grid model...However when I tried to run any program
> that write output to other nfs mounted drives I get very long delay.
> I'm not sure where the problem is. I know the NFS automount is
> working fine because if I start the job with just one processor
> (mpirun -np 1), I don't experience the slow down.
>
> If I start the job on master node using 4 processors (mpirun -np 4)
> and write to the master node (master2 0)...the run takes 168 sec
>
> If I start the same job but write the output to any other nfs mounted
> drives besides the master node, the job will be extremely slow. In
> this case the same job took 10962 sec.

Jack,

We've started running CCTM in parallel here too. It's my impression
that CCTM is very I/O intensive, and for that reason you should do all
node 0 I/O locally rather than on an NFS-mounted partition.

Note that while all nodes pass data via MPI to node 0, which does the
netCDF writes, the compute nodes read data via NFS/netCDF/ioapi
throughout the run. It's possible that some of the read-only files
holding input data could be located on an NFS mount somewhere other
than node 0 with little penalty, but I haven't tried this.

If you really need to use an NFS mount for data, Kumaran Rajaram's
suggestion to disable the 'noac' mount option is worth trying, since
MPI-IO isn't being used. The other NFS optimizations that were
suggested are worth trying as well.

I've been working under the assumption that all data should be local
to node 0, so I'll be interested to see how you fare with NFS.

By the way, I queried the CMAQ help desk for information regarding
CCTM parallel I/O. Here's the URL:

http://bugz.unc.edu/show_bug.cgi?id=1120

The CMAQ help desk (which is really just a product entry in the CMAS
bugzilla database) is likely the best place to ask your question. I've
had good responses there on a couple of issues.
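
Before changing anything, it may help to confirm where the output
directory actually lives and whether 'noac' is set on that mount. The
following is only a rough sketch in Python, not something taken from
CMAQ or from this thread: the function names (find_mount,
describe_output_dir, read_mounts) are made up for illustration, and it
assumes a Linux node where /proc/mounts uses the usual
"device mountpoint fstype options ..." layout, with no spaces in the
mount point names.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount holding path.

    mounts_text is text in /proc/mounts format. The longest matching
    mount point wins; on a tie, the later entry is preferred.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    return best


def describe_output_dir(path, mounts_text):
    """Classify path as 'local', 'nfs', 'nfs+noac', or 'unknown'."""
    mount = find_mount(path, mounts_text)
    if mount is None:
        return "unknown"
    _, fs_type, options = mount
    if not fs_type.startswith("nfs"):
        return "local"
    return "nfs+noac" if "noac" in options else "nfs"


def read_mounts(mounts_file="/proc/mounts"):
    """Read the kernel's mount table (Linux only)."""
    with open(mounts_file) as f:
        return f.read()
```

On node 0, something like
describe_output_dir("/path/to/output", read_mounts()) would then report
whether the directory is local or on NFS, where "/path/to/output" is a
placeholder for the real output directory. A result of 'nfs+noac'
would point at the mount option discussed above.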
