[Beowulf] problems with G95 and mpich-1.2.6 --with-device=ch_p4mpd

Ted Sariyski tsariysk at craft-tech.com
Tue May 24 07:11:03 PDT 2005


Hi,

I have problems running Fortran code compiled with G95 and mpich-1.2.6
--with-device=ch_p4mpd (GCC 4.0.0 20050129 May 20 2005; i386 GNU/Linux
box running RedHat EL4 with kernel 2.6.9-5.0.5). Let me first say that
all problems disappear if mpich is compiled without mpd. It looks as if
mpd redirects or buffers STDOUT: any output appears only after the
application exits. As far as I can tell, I have buffering disabled on
the G95 side:

G95_ABORT=TRUE
G95_UNBUFFERED_ALL=TRUE
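
A quick sanity check on these settings, as a minimal sketch assuming a
Bourne-style shell: a variable that is set but not exported is not passed
on to child processes, so it is worth confirming that a child shell
actually sees both values. Whether mpd then forwards the environment to
the processes it starts is a separate question that this check does not
answer.

```shell
# Export the G95 runtime settings so that child processes inherit them.
export G95_ABORT=TRUE
export G95_UNBUFFERED_ALL=TRUE

# A child shell sees only exported variables; "unset" is printed otherwise.
sh -c 'echo "G95_ABORT=${G95_ABORT:-unset}"'
sh -c 'echo "G95_UNBUFFERED_ALL=${G95_UNBUFFERED_ALL:-unset}"'
```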

I use the following arguments to configure mpich:
export FC=g95
export F90=g95
#> ./configure --with-flibname=mpich-g95 \
     --prefix=/usr/local/mpich-1.2.6_g95 --with-device=ch_p4mpd -fc=g95 -f90=g95

Compilation and installation go without errors. All the examples in
$(MPICH)/examples also compile without errors or warnings. The problems
appear at execution time. I'll use pi3.f for illustration.

1. With mpich compiled without mpd I'm able to execute pi3 both with and
without mpirun:

#> ./pi3
 Process             0  of             1  is alive
  pi is approximately: 3.1415926535898580  Error is: 0.0000000000000644
FORTRAN STOP

2. With mpd, pi3 started directly runs but doesn't return, and it
produces no output even after the process is killed. To run pi3 with mpd
I HAVE TO use mpirun, and even then output comes only after the
processes exit:

#>  mpirun -np 3 ./pi3
222       #--> there should be output
333       #--> there should be output
444       #--> there should be output
0         #--> quits, and only now is all of the output above printed:
 Process  0  of  3  is alive
Enter the number of intervals: (0 quits)
  pi is approximately: 3.1415943444698600  Error is: 0.0000016908800688
Enter the number of intervals: (0 quits)
  pi is approximately: 3.1415934050920500  Error is: 0.0000007515022533
Enter the number of intervals: (0 quits)
  pi is approximately: 3.1415930763098100  Error is: 0.0000004227200181
Enter the number of intervals: (0 quits)
 Process  1  of  3  is alive
 Process  2  of  3  is alive

It looks as if mpd buffers the output.
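
One way to see when each line actually arrives, sketched under
assumptions: pipe the output through a small filter that stamps every
line with the time it was read. The `stamp` helper below is hypothetical
(it is not part of mpich or g95), it relies on `date +%s`, which is
common but not strictly POSIX, and the subshell is only a stand-in
producer. The real test would be something like
`mpirun -np 3 ./pi3 | stamp`. If every line carries the same late
timestamp, the output arrived in one batch. Note that piping changes
stdout from a terminal to a pipe, which by itself can change how a
runtime buffers.

```shell
#!/bin/sh
# stamp: hypothetical helper that prefixes each input line with the
# time (seconds since the epoch) at which the line was read.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s)" "$line"
    done
}

# Stand-in producer: two lines written one second apart.
( echo first; sleep 1; echo second ) | stamp
```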

I also compiled mpich with pgf90. Again, with mpd it behaves the same
way, as if mpd buffers the output. The difference between the versions of
mpich compiled with pgf90 and g95 (both with mpd) is that with pgf90 I
am able to run mycode, while with g95 I get runtime errors:

#> mpirun -np 1 mycode_dbg_g95  data/jet_a.inp --
[mpdcon]: console failed to retrieve msg from control stream, errno = 104
#> mpirun -np 2 mycode_dbg_g95  data/jet_a.inp --

For np>1 it returns without complaints and without output.

Any ideas what's wrong?
Thanks in advance, Ted



