mpich error (on 2.4.7 intel)
tekka99 at libero.it, Wed Aug 1 05:35:57 PDT 2001
Hello,
I have an Intel machine running kernel 2.4.7, with mpich 1.2.1 and pgf90.
The system is RH 7.1.
I'm trying to run a program with this command:
/usr/local/mpich/bin/mpirun -np 4 hydrompi
At the moment I'm using only one SMP machine, with 8 CPUs and
2.5 GB of RAM.
I receive this kind of error:
PGFIO/stdio: Resource temporarily unavailable
PGFIO-F-/list-directed write/unit=6/error code returned by host stdio - 11.
File name = stdout formatted, sequential access record = 32
In source file main.F90, at line number 74
rm_l_1_3349: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_3354: p4_error: net_recv read: probable EOF on socket: 1
rm_l_3_3359: p4_error: net_recv read: probable EOF on socket: 1
bm_list_3344: p4_error: net_recv read: probable EOF on socket: 1
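A side note on the first message: "Resource temporarily unavailable" is the standard text for EAGAIN, which is errno 11 on Linux and matches the "error code returned by host stdio - 11" line. A write returns it when the descriptor is in non-blocking mode and cannot accept more data right now. The sketch below is only an illustration under that assumption; it does not touch mpich or the PGI runtime and is not a diagnosis. It reproduces the same errno with an ordinary pipe that nobody reads, and the function names are made up for the example.

```python
import errno
import os


def describe_errno(code):
    """Return (code, message) for an errno value, as the C library words it."""
    return code, os.strerror(code)


def write_until_blocked(chunk=b"x" * 4096):
    """Fill a non-blocking pipe that nobody reads; return the errno of the failing write."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # same effect as setting O_NONBLOCK on the descriptor
    try:
        while True:
            os.write(write_fd, chunk)  # succeeds until the pipe buffer is full
    except BlockingIOError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    # Value and message of EAGAIN on this system.
    print(describe_errno(errno.EAGAIN))
    # Errno actually raised once the pipe is full.
    print(describe_errno(write_until_blocked()))
```

If stdout of the MPI processes were in non-blocking mode, a burst of prints could fail the same way, but that is an assumption to check, not something the logs above prove.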
At line 74 of main.F90 I have:
if(mype.eq.0)then
   year=told*tnow*year_in_secs
   write(*,*)'Calculating step : ',nstep
   write(*,99)'t_i, t_f, dt : ',told,t,dt
   write(*,99)'a_i, a_f : ',at,atnew
   write(*,99)'z_i, z_f : ',redshiftold,redshift
   write(*,99)'Hubble const.: ',hubble
   write(*,*)'age of the universe (Myears) : ',year
   write(*,*)''   ! <<<<< THIS IS LINE 74 >>>>>
99 format(1x,a31,3(1x,e13.7))
endif
Can anyone give me some hints on where to look?
As the lines above are only prints to stdout, if I comment them out,
recompile, and relaunch, I now receive:
1 - MPI_SENDRECV_REPLACE : Null communicator
[1] Aborting program !
[1] Aborting program!
p1_3714: p4_error: : 197
rm_l_1_3715: p4_error: interrupt SIGINT: 2
0 - MPI_SENDRECV_REPLACE : Null communicator
[0] Aborting program !
rm_l_3_3725: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_3720: p4_error: net_recv read: probable EOF on socket: 1
bm_list_3710: p4_error: net_recv read: probable EOF on socket: 1
If I try rsh or rlogin from the machine to itself, both work.
Any suggestions?
PS: I have two NICs on the machine (only one is UP anyway). Can this be
part of the problem? Am I using ipc or tcp?
Thanks in advance.
Bye,
Gianluca Cecchi
More information about the Beowulf mailing list
