[bproc] MPI chokes

Jag agrajag at linuxpower.org
Thu Mar 15 07:44:48 PST 2001


On Thu, 15 Mar 2001, Arthur H. Edwards wrote:

> Erik Arjan Hendriks wrote:
> 
> > On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote:
> > 
> >> I've installed Scyld on a small cluster and I'm trying to
> >> run the test programs that come with beompi
> >> 
> >> The codes run on one node. However, when I try to run
> >> on multiple nodes I get the following error
> >> 
> >> jarrett/home/edwardsa>mpirun -np 2 pi3p
> >> p0_28682:  p4_error: net_create_slave: bproc_rfork: -1
> >>     p4_error: latest msg from perror: Invalid argument
> >> jarrett/home/edwardsa>bm_list_28683:  p4_error: interrupt SIGINT: 2
> >> 

<snip>

> > 
> > BProc doesn't use any host names anywhere so nothing involving
> > hostnames will affect whether or not an rfork works.
> > 
> > There's some other MPI issue going on here.
> > 
> > - Erik
> > 
> 
> Thanks for the reply. The program dies in the PMPI_INIT phase. What 
> should I be doing to figure this out?

Based on the error messages in your earlier post (bproc_rfork returning -1
with "Invalid argument"), it looks like MPI is trying to rfork to a node
that is down.  What does the output of 'bpstat' on your cluster look like?
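
If it helps to check this from a script, here is a rough sketch that
scans saved status text for nodes that are not up. The line format it
expects (one node per line, node number first, status word second) is
an assumption for illustration and may not match what your bpstat
actually prints; nodes_not_up and the sample text are made-up names
and data, not part of BProc.

```python
def nodes_not_up(status_text):
    """Return node numbers whose status field is anything other than 'up'.

    ASSUMPTION: each data line looks like '<node> <status> ...', one node
    per line. Real bpstat output may be laid out differently.
    """
    not_up = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines whose first field is not a number.
        if len(fields) < 2 or not fields[0].lstrip("-").isdigit():
            continue
        if fields[1] != "up":
            not_up.append(int(fields[0]))
    return not_up


# Hypothetical sample text, not captured from a real cluster.
sample = """\
Node Status
0 up
1 down
2 up
"""

print(nodes_not_up(sample))  # prints [1]
```

If a node shows up in that list, asking for more processes than there
are live nodes could plausibly lead to the rfork failure you saw.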


Jag