mpich error (on 2.4.7 intel)

tekka99 at libero.it
Wed Aug 1 05:35:57 PDT 2001


Hello,

I have an Intel machine running kernel 2.4.7, with mpich 1.2.1 and pgf90.
The system is RH 7.1.
I'm trying to run a program with this command:

/usr/local/mpich/bin/mpirun -np 4 hydrompi

At the moment I'm using only one SMP machine with 8 CPUs and
2.5 GB of RAM.

I receive this kind of error:

PGFIO/stdio: Resource temporarily unavailable
PGFIO-F-/list-directed write/unit=6/error code returned by host stdio - 11.
File name = stdout     formatted, sequential access   record = 32
In source file main.F90, at line number 74 
rm_l_1_3349:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_3354:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_3_3359:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_3344:  p4_error: net_recv read:  probable EOF on socket: 1
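
Note: the "error code returned by host stdio - 11" in the PGFIO message above looks like a plain errno value, and on Linux errno 11 is EAGAIN, whose standard message is the same "Resource temporarily unavailable" text. A minimal sketch for checking that mapping on a given machine follows. Python is used here only for illustration and is not part of the original program, and the helper name describe_errno is hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'EAGAIN'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for that value.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the code reported in the PGFIO message above.
    name, message = describe_errno(11)
    print(name, "-", message)
```

If the lookup agrees, the write to unit 6 failed because stdout could not accept data at that moment. This only identifies the error code; it does not show why stdout was in that state.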


At line 74 of main.F90 I have:

          if(mype.eq.0)then
            year=told*tnow*year_in_secs
            write(*,*)'Calculating step : ',nstep
            write(*,99)'t_i, t_f, dt : ',told,t,dt
            write(*,99)'a_i, a_f     : ',at,atnew
            write(*,99)'z_i, z_f     : ',redshiftold,redshift
            write(*,99)'Hubble const.: ',hubble
            write(*,*)'age of the universe (Myears) : ',year
            write(*,*)''        <<<<<THIS IS LINE 74>>>>>
99 format(1x,a31,3(1x,e13.7))
          endif

Can anyone give me some hints on where to look?

Since the lines above only print to stdout, I commented them out,
recompiled and relaunched. Now I receive:

1 - MPI_SENDRECV_REPLACE : Null communicator
[1]  Aborting program !
[1] Aborting program!
p1_3714:  p4_error: : 197
rm_l_1_3715:  p4_error: interrupt SIGINT: 2
0 - MPI_SENDRECV_REPLACE : Null communicator
[0]  Aborting program !
rm_l_3_3725:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_3720:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_3710:  p4_error: net_recv read:  probable EOF on socket: 1

Both rsh and rlogin work when I try them within the machine.
Any suggestions?

PS: I have two NICs on the machine (only one is UP anyway). Can this be 
part of the problem? Am I using ipc or tcp?

Thanks in advance.
Bye,
Gianluca Cecchi
