[bproc]MPI chokes

Jag agrajag at linuxpower.org
Thu Mar 15 07:44:48 PST 2001


On Thu, 15 Mar 2001, Arthur H. Edwards wrote:

> Erik Arjan Hendriks wrote:
> 
> > On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote:
> > 
> >> I've installed Scyld on a small cluster and I'm trying to
> >> run the test programs that come with beompi
> >> 
> >> The codes run on one node. However, when I try to run
> >> on multiple nodes I get the following error
> >> 
> >> jarrett/home/edwardsa>mpirun -np 2 pi3p
> >> p0_28682:  p4_error: net_create_slave: bproc_rfork: -1
> >>     p4_error: latest msg from perror: Invalid argument
> >> jarrett/home/edwardsa>bm_list_28683:  p4_error: interrupt SIGINT: 2
> >> 

<snip>

> > 
> > BProc doesn't use any host names anywhere so nothing involving
> > hostnames will affect whether or not an rfork works.
> > 
> > There's some other MPI issue going on here.
> > 
> > - Erik
> > 
> 
> Thanks for the reply. The program dies in the PMPI_INIT phase. What 
> should I be doing to figure this out?

Based on the error messages in your previous message, it looks like MPI
is trying to rfork to a node that is down.  What does the output of
'bpstat' on your cluster look like?
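
If it helps, here is a rough sketch of that check as a script. It runs
'bpstat' with no arguments on the master node and prints any output line
that contains the word "down". The helper name lines_mentioning_down is
made up for illustration, and the assumption that a down node shows up as
a line containing the standalone word "down" is mine; adjust it to match
what your bpstat actually prints.

```python
from __future__ import annotations

import shutil
import subprocess


def lines_mentioning_down(bpstat_output: str) -> list[str]:
    """Return the lines that contain the standalone word 'down'.

    Assumption: bpstat reports a node's state as a whitespace-separated
    word on that node's line. This is a plain text filter, not a parser
    for any documented bpstat output format.
    """
    return [
        line
        for line in bpstat_output.splitlines()
        if "down" in line.lower().split()
    ]


def main() -> None:
    # bpstat only exists on a BProc/Scyld master node.
    if shutil.which("bpstat") is None:
        print("bpstat not found on PATH; run this on the master node")
        return

    # Run bpstat with no arguments and capture whatever it prints.
    result = subprocess.run(
        ["bpstat"], capture_output=True, text=True, check=False
    )

    down = lines_mentioning_down(result.stdout)
    if down:
        print("Lines reporting a node as down:")
        for line in down:
            print("  " + line)
    else:
        print("No line in the bpstat output mentions 'down'.")


if __name__ == "__main__":
    main()
```

If any node is reported as down, it is worth comparing the number of
nodes that are up against the process count you pass to 'mpirun -np'.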


Jag