BeoMPI doesn't work with more than 3 nodes

Carlos J. García Orellana carlos at nernet.unex.es
Wed Mar 28 01:09:25 PST 2001


Hello,

I want to use BeoMPI on our cluster, so I started with the examples.
At first nothing worked because I was using the wrong 'mpirun' script.
After fixing that, the 'cpi' example works fine with 1, 2 or 3 processors.

However, when I try to use more nodes, it doesn't work. Why?

What is the right setup for working with BeoMPI?

Thanks.

Carlos.

PS: Output of running 'cpi' with 3 and 4 nodes (-p4dbg 10):

[root at nereapc mpiex]# mpirun -np 3 ./cpi -p4dbg 10
10: xm_30944: (-) using procgroup file /proc/self/fd/3
10: p0_30944: (0.000007) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.001176) hostname in first line of procgroup is -1
10: p0_30944: (0.001228) hostname for first entry in proctable is -1
10: p0_30944: (0.001257) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.001421) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30946: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.050455) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30948: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.098959) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30944: (0.101561) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101661) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.101728) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101772) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.101811) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.101868) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30944: (0.141063) received type=1010101010, from=1
10: p0_30944: (0.141094) received type=1010101010, from=2
10: p0_30944: (0.141155) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30944: (0.141198) sent msg of type 1010101010 from 0 to 2 via socket 7
Process 0 on -1
10: p0_30944: (0.143268) sent msg of type 0 from 0 to 2 via socket 7
10: p0_30944: (0.143493) sent msg of type 0 from 0 to 1 via socket 6
Process 2 on 2
Process 1 on 1
10: p0_30944: (0.143752) received type=0, from=1
10: p0_30944: (0.143826) received type=0, from=2
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000780
10: p0_30944: (0.143927) sent msg of type 0 from 0 to 2 via socket 7
10: p0_30944: (0.143973) sent msg of type 0 from 0 to 1 via socket 6
10: p0_30944: (0.144176) received type=0, from=2
10: p0_30944: (0.144249) sent msg of type 0 from 0 to 1 via socket 6
10: p0_30944: (0.144301) received type=0, from=1
10: p0_30944: (0.144346) sent msg of type 0 from 0 to 2 via socket 7
[root at nereapc mpiex]#

[root at nereapc mpiex]# mpirun -np 4 ./cpi -p4dbg 10
10: xm_30951: (-) using procgroup file /proc/self/fd/3
10: p0_30951: (0.000012) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001121) hostname in first line of procgroup is -1
10: p0_30951: (0.001220) hostname for first entry in proctable is -1
10: p0_30951: (0.001265) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001425) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.001529) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30953: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.050602) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30955: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.099172) Beowulf: using beowulf version of gethostbyname_p4
10: rm_30957: (-) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.147644) Beowulf: using beowulf version of gethostbyname_p4
10: p0_30951: (0.151243) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151315) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30951: (0.151384) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151449) sent msg of type 1010101010 from 0 to 2 via socket 7
10: p0_30951: (0.151493) sent msg of type 1010101010 from 0 to 1 via socket 6
10: p0_30951: (0.151557) sent msg of type 1010101010 from 0 to 2 via socket 7
p1_30953:  p4_error: Timeout in establishing connection to remote process: 0
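For reference, the result line of the 3-process run ("pi is approximately 3.1416009869231249, Error is 0.0000083333333318") is what the midpoint rule gives for the integral of 4/(1+x^2) over [0, 1] with 100 intervals, since that rule's error is about h^2/12 = 8.33e-6 for h = 0.01. Below is a minimal serial sketch of that computation. It assumes cpi deals the intervals out to the ranks round-robin and sums the partial results; the interval count n = 100 is inferred from the reported error, and the plain Python loop only stands in for the MPI reduction, so no MPI call is made here. The function names are hypothetical.

```python
import math


def f(a: float) -> float:
    # Integrand whose integral over [0, 1] equals pi.
    return 4.0 / (1.0 + a * a)


def partial_sum(rank: int, nprocs: int, n: int) -> float:
    """Midpoint-rule contribution of one rank (hypothetical helper).

    Rank r handles intervals r+1, r+1+nprocs, r+1+2*nprocs, ... (assumed
    round-robin split), evaluating f at each interval's midpoint.
    """
    h = 1.0 / n
    s = 0.0
    for i in range(rank + 1, n + 1, nprocs):
        s += f(h * (i - 0.5))
    return h * s


def simulated_cpi(nprocs: int, n: int = 100) -> float:
    # Serial stand-in for the parallel sum: add every rank's partial result.
    return sum(partial_sum(r, nprocs, n) for r in range(nprocs))


if __name__ == "__main__":
    for p in (1, 2, 3, 4):
        est = simulated_cpi(p)
        print(f"np={p}: pi is approximately {est:.16f}, "
              f"Error is {abs(est - math.pi):.16f}")
```

The estimate comes out the same (up to rounding) for 1, 2, 3 or 4 simulated processes, so the arithmetic itself does not depend on the process count.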