BeoMPI doesn't work with more than 3 nodes
Carlos J. García Orellana carlos at nernet.unex.es
Wed Mar 28 01:09:25 PST 2001
- Previous message: Master node install stops during "performing post install configuration"
- Next message: help please
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello,

I want to use BeoMPI on our cluster, so I started with the examples. At first nothing worked because I was using the wrong 'mpirun' script. After fixing that, the 'cpi' example works fine with 1, 2 or 3 processors. However, when I try to use more nodes, it doesn't work. Why? What is the right setup for BeoMPI?

Thanks.
Carlos.

P.S.: Output of running 'cpi' with 3 and 4 nodes (p4dbg=10):

[root@nereapc mpiex]# mpirun -np 3 ./cpi -p4dbg 10
10: xm_30944: (-) using procgroup file /proc/self/fd/3
10: p0_30944: (0.000007) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.001176) hostname in first line of procgroup is -1
10: p0_30944: (0.001228) hostname for first entry in proctable is -1
10: p0_30944: (0.001257) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.001421) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30946: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.050455) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30948: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.098959) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.101561) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101661) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.101728) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101772) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.101811) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101868) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.141063) received type=1010101010, from=1
10: p0_30944: (0.141094) received type=1010101010, from=2
10: p0_30944: (0.141155) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.141198) sent msg of type 1010101010 from 0 to 2 via socket 7
Process 0 on -1
10: p0_30944: (0.143268) sent msg of type 0 from 0 to 2 via socket 7
10: p0_30944: (0.143493) sent msg of type 0 from 0 to 1 via socket 6
Process 2 on 2
Process 1 on 1
10: p0_30944: (0.143752) received type=0, from=1
10: p0_30944: (0.143826) received type=0, from=2
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000780
10: p0_30944: (0.143927) sent msg of type 0 from 0 to 2 via socket 7
10: p0_30944: (0.143973) sent msg of type 0 from 0 to 1 via socket 6
10: p0_30944: (0.144176) received type=0, from=2
10: p0_30944: (0.144249) sent msg of type 0 from 0 to 1 via socket 6
10: p0_30944: (0.144301) received type=0, from=1
10: p0_30944: (0.144346) sent msg of type 0 from 0 to 2 via socket 7
[root@nereapc mpiex]#
[root@nereapc mpiex]# mpirun -np 4 ./cpi -p4dbg 10
10: xm_30951: (-) using procgroup file /proc/self/fd/3
10: p0_30951: (0.000012) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001121) hostname in first line of procgroup is -1
10: p0_30951: (0.001220) hostname for first entry in proctable is -1
10: p0_30951: (0.001265) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001425) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001529) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30953: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.050602) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30955: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.099172) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30957: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.147644) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.151243) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151315) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30951: (0.151384) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151449) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30951: (0.151493) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151557) sent msg of type 1010101010 from 0 to 2 via socket 7
p1_30953: p4_error: Timeout in establishing connection to remote process: 0
More information about the Beowulf mailing list
