[Beowulf] Update on mpi problem
landman at scalableinformatics.com
Wed Jul 9 20:58:32 PDT 2008
Ok ... thought this would be interesting for some folks. As a reminder,
I am using Open-MPI 1.2.6 for a customer code, and seeing different behavior
than in the past. Scratching my head over it (seemingly non-deterministic).
I tried using '--mca btl ^sm' (turn off the shared memory transport) on the
non-InfiniBand machine, and ... it runs. Repeatedly. To completion.
Ok, over to the InfiniBand machine. I tried using '--mca btl ^sm' there. No
dice (with sm excluded, the tcp and openib transports are still available).
Next I tried turning off tcp (Ethernet) as well:
    --mca btl ^sm,tcp
Nope. Still doesn't work right. Hmmm.... One left. Turn off openib instead:
    --mca btl ^sm,openib
Yup. It works. Repeatedly. To completion.
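
For anyone who wants to repeat the elimination above, here is a minimal sketch that prints the three mpirun command lines. The application name (./customer_app) and process count (8) are placeholders I made up for illustration; they are not from the actual runs. The leading '^' applies to the whole comma-separated list, so '^sm,openib' excludes both sm and openib.

```shell
#!/bin/sh
# Hypothetical placeholders: APP and NP are assumptions, not values from the post.
APP=${APP:-./customer_app}
NP=${NP:-8}

# Build the mpirun command line for a given BTL exclusion list.
# The leading ^ tells Open MPI to exclude every component in the list.
btl_cmd() {
    echo "mpirun -np $NP --mca btl ^$1 $APP"
}

# Print the three exclusion lists tried in this post, in order.
for excl in sm sm,tcp sm,openib; do
    btl_cmd "$excl"
done
```

The script only prints the commands, so they can be checked before being run by hand, several times each, since the failure looks non-deterministic.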
It looks like this is an MPI stack issue of some sort. I'll ping the
Open-MPI list and see what they think.
Thanks to all for the suggestions and comments.
FWIW, I also pulled down the DDT tool from Allinea, with the thought of
testing it, and seeing if I could figure out where the problem was with
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615