[Beowulf] Update on mpi problem
Joe Landman landman at scalableinformatics.com
Wed Jul 9 20:58:32 PDT 2008
Ok ... thought this would be interesting for some folks.

As a reminder, using Open-MPI 1.2.6 for a customer code, seeing different behavior than in the past. Scratching my head over it (seemingly non-deterministic).

I tried using '--mca btl ^sm' (turn off shared memory usage) on the non-infiniband machine, and ... it runs. Repeatedly. To completion.

Ok, over to the Infiniband machine. I tried using '--mca btl ^sm'. No dice (the tcp and openib are still available).

Next I tried turning off the tcp (ethernet)

    --mca btl ^sm,tcp

Nope. Still doesn't work right. Hmmm....

One left. Turn off openib (infiniband).

    --mca btl ^sm,openib

Yup. It works. Repeatedly. To completion.

It looks like this is an MPI stack issue of some sort. I'll ping the Open-MPI list and see what they think.

Thanks to all the suggestions and comments. FWIW, I also pulled down the DDT tool from Allinea, with the thought of testing it, and seeing if I could figure out where the problem was with the code.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
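
For anyone wanting to repeat the elimination above, here is a rough sketch that builds one mpirun command line per BTL exclusion set. The binary name `./customer_code`, the process count of 4, and the helper `btl_exclusion_commands` are placeholders assumed for illustration; only the '--mca btl ^...' exclusion syntax comes from the post itself.

```python
import shlex


def btl_exclusion_commands(binary, nprocs, exclusions):
    """Build one mpirun argument list per set of excluded BTL components.

    Each entry of `exclusions` is a list of BTL names to turn off.
    An empty list means "run with the default BTL selection".
    """
    commands = []
    for excluded in exclusions:
        argv = ["mpirun", "-np", str(nprocs)]
        if excluded:
            # A leading '^' tells Open MPI to exclude the listed components.
            argv += ["--mca", "btl", "^" + ",".join(excluded)]
        argv.append(binary)
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Same order as the trials in the post: baseline, then sm off,
    # then sm and tcp off, then sm and openib off.
    trials = [[], ["sm"], ["sm", "tcp"], ["sm", "openib"]]
    # "./customer_code" and 4 ranks are hypothetical stand-ins.
    for argv in btl_exclusion_commands("./customer_code", 4, trials):
        print(shlex.join(argv))
```

The printed lines are shell-quoted, so the '^' survives a copy and paste into a shell. Since the failure is seemingly non-deterministic, each variant should be run several times before drawing a conclusion about a given transport.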
