[Beowulf] mpich2 error

Ru-Zhen Li r.li at qmul.ac.uk
Tue Jan 31 02:54:49 PST 2006


Dear all,

I keep getting the error message below and can't work out why. Has anybody had a similar experience? Thanks!

aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406): MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76):
MPIC_Sendrecv(161):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(489):
connection_recv_fail(1836):
MPIDU_Socki_handle_read(658): connection failure (set=0,sock=1,errno=104:Connection reset by peer)
rank 9 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 9: killed by signal 9
rank 7 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 7: killed by signal 9
rank 10 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 10: return code 13
rank 11 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 11: killed by signal 9