[Beowulf] MPICH2 error on 8 nodes

Diego Moreno diegovmorenor at gmail.com
Sat Feb 18 10:05:37 PST 2006


Dear All,

I keep getting this error message and I couldn't find out why. Has anybody
had a similar experience?

Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406): MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76):
MPIC_Sendrecv(152):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(489):
connection_recv_fail(1836):
MPIDU_Socki_handle_read(658): connection failure (set=0,sock=2,errno=104:Connection reset by peer)
aborting job:

On 7 nodes, however, the job runs fine with no errors.

Can you help me?

Thanks!

