Bizarre problems when adding a PPC machine...

John Nelson john at computation.com
Sun Jan 13 21:57:27 PST 2002


Hi all,

I really hate to bother the mailing list, but this one has me somewhat 
stumped.  I have a four-node cluster comprising Linux machines and one 
PPC machine.  The Linux machines have been adequately tested and play 
well together.  The PPC machine is another matter.  When I include it 
(a Mac 8500 running YellowDog Linux) in the cluster, things fall apart.  
Here's what appears on the console after running a simple test on my 
"root" node:


[john@adenine examples]$ ./mpirun -np 4 simpleio
p2_9722:  p4_error: Could not allocate memory for commandline args: 553648128
bm_list_24602: (4.056938) Listener: Unable to interrupt client pid=24601.
Connection failed for reason: : Connection refused
p1_1962:  p4_error: net_recv read:  probable EOF on socket: 1
[john@adenine examples]$ Connection failed for reason: : Connection refused
p3_1283:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_24602: (4.076335) Listener: Unable to interrupt client pid=24601.
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Broken pipe
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Broken pipe
Connection failed for reason: : Connection refused
Broken pipe
bm_list_24602:  p4_error: net_recv read:  probable EOF on socket: 1
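
A quick check on the number in the first p4_error line, offered as a hunch rather than a diagnosis: 553648128 is 0x21000000 in hex, and reading those same four bytes in the opposite byte order gives 33, a far more plausible size for command-line arguments. The short Python sketch below only demonstrates that arithmetic. It assumes the value is a 32-bit integer and that the other nodes are little-endian (for example x86), neither of which is stated in the post.

```python
import struct

# Value printed in the "Could not allocate memory" error.
reported = 553648128

# Pack as a 32-bit big-endian integer, then reinterpret the
# same four bytes as a little-endian integer (assumption: the
# field is 32 bits wide and the two ends disagree on byte order).
swapped = struct.unpack("<I", struct.pack(">I", reported))[0]

print(hex(reported))  # 0x21000000
print(swapped)        # 33
```

If that reading is right, it would be consistent with the big-endian PPC node and the other nodes exchanging integers without byte-order conversion, but the log alone does not prove it.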


"Connection refused" is a strange message, because rsh seems to be 
working well, as do other networking applications.  I imagine that one 
reason could be MPICH version differences between the two 
architectures.  These are the versions of the RPM packages installed:

    PPC: mpich-1.2.0-1a
    Linux: mpich-1.2.0-12

But I also compiled and installed the source code on both classes of 
machines.
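
To see exactly where the two package strings listed above differ, they can be split into name, version, and release. This is a minimal Python sketch that assumes the usual name-version-release layout of RPM package names; the helper `split_nvr` is made up for illustration and is not part of any RPM tool.

```python
def split_nvr(package):
    """Split 'name-version-release' into its three parts.

    Assumes the last two hyphen-separated fields are the
    version and the release, as in typical RPM package names.
    """
    name, version, release = package.rsplit("-", 2)
    return name, version, release


ppc = split_nvr("mpich-1.2.0-1a")
linux = split_nvr("mpich-1.2.0-12")

# Compare the upstream version and the packager's release separately.
same_version = ppc[1] == linux[1]
same_release = ppc[2] == linux[2]

print(ppc, linux)
print("same upstream version:", same_version)
print("same package release:", same_release)
```

Under that assumption both packages carry the same upstream version (1.2.0) and differ only in the release field, which is set by the packager, so the strings alone do not show a difference in MPICH itself.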

Any ideas?  It's probably something simple, but being a Beowulf newbie, 
it's beyond me right now.

-- John

-- 
_________________________________________________________

John T. Nelson
President       |      Computation.com Inc.
mail:           |      john at computation.com
company:        |      http://www.computation.com/
journal:        |      http://www.computation.org/

_________________________________________________________




