[Beowulf] newbie question about mpich2 on heterogeneous cluster
baenni at kiecks.de, Tue Mar 22 05:03:50 PST 2005
- Previous message: [Beowulf] Daisychained rcp script
- Next message: [Beowulf] Alternative to MPI ABI
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear List,

I installed mpich2-1.0 on my little cluster (2 Linux nodes and 3 Solaris nodes). At first I worked only on the two Linux nodes, where the programs ran without trouble. But when I also use the Solaris nodes, i.e. when I run the programs on a heterogeneous cluster, the run ends with the error messages shown below. For some reason, the -arch parameter is not implemented in mpich2-1.0.

Does anyone have experience with such problems? Can I run mpich2 on a heterogeneous cluster?

Thanks in advance for any help.

mpiexec -n 1 -host shaw -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi _cpi : \
    -n 1 -host devienne -path /home1/00117cfd/CFD_3D/example/PARALLEL/cpi _cpi : \
    -n 1 -host gallay -path /export/home/baenni/example/PARALLEL/cpi _cpi : \
    -n 2 -host gallay1 -path /export/home/baenni/example/PARALLEL/cpi _cpi

aborting job:
Fatal error in MPI_Bcast: Other MPI error, error stack:
MPI_Bcast(821): MPI_Bcast(buf=0x8145480, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(229):
MPIC_Send(48):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(492):
connection_recv_fail(1728):
MPIDU_Socki_handle_read(590): connection closed by peer (set=0,sock=1)

aborting job:
Fatal error in MPI_Bcast: Internal MPI error!, error stack:
MPI_Bcast(821): MPI_Bcast(buf=1786e0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(197):
MPIC_Recv(98):
MPIC_Wait(308):
MPIDI_CH3_Progress_wait(207): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(849): [ch3:sock] received packet of unknown type (369098752)

rank 4 in job 19 shaw_33110 caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
