[Beowulf] Update on mpi problem

Joe Landman landman at scalableinformatics.com
Wed Jul 9 20:58:32 PDT 2008


Ok ... thought this would be interesting for some folks.  As a reminder, 
I am using Open-MPI 1.2.6 for a customer code, and seeing different 
behavior than in the past.  Scratching my head over it (the failures are 
seemingly non-deterministic).

I tried using '--mca btl ^sm' (turn off shared memory usage) on the 
non-infiniband machine, and ... it runs.  Repeatedly.  To completion.

Ok, over to the Infiniband machine.  I tried using '--mca btl ^sm'.  No 
dice (the tcp and openib transports are still available).

Next I tried turning off tcp (ethernet) as well:

	--mca btl ^sm,tcp

Nope.  Still doesn't work right.  Hmmm....  One left.  Turn off openib 
(infiniband).


	--mca btl ^sm,openib

Yup.  It works.  Repeatedly.  To completion.
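
For anyone wanting to repeat this elimination, here is a minimal sketch of the three runs above as a script. The binary name ./customer_code, the process count of 8, and the DRY_RUN switch are placeholders assumed for illustration; none of them come from the original runs. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: APP and NP are hypothetical placeholders, not the real job.
APP="${APP:-./customer_code}"
NP="${NP:-8}"

# Build the mpirun command line for one BTL exclusion list, e.g. "^sm,tcp".
# The leading ^ tells Open MPI to exclude the listed BTL components.
btl_cmd() {
    echo "mpirun -np $NP --mca btl $1 $APP"
}

# Walk through the same three exclusion lists tried in this post.
# With DRY_RUN unset or set to 1, the commands are printed but not run.
run_btl_sweep() {
    for excl in '^sm' '^sm,tcp' '^sm,openib'; do
        cmd=$(btl_cmd "$excl")
        echo "$cmd"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            $cmd || echo "FAILED with btl $excl"
        fi
    done
}

run_btl_sweep
```

Setting DRY_RUN=0 actually launches each run in turn. Since the failures looked non-deterministic, each setting probably wants to be repeated several times before drawing a conclusion.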

It looks like this is an MPI stack issue of some sort.  I'll ping the 
Open-MPI list and see what they think.

Thanks to all for the suggestions and comments.

FWIW, I also pulled down the DDT tool from Allinea, with the thought of 
testing it, and seeing if I could figure out where the problem was with 
the code.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


