[Beowulf] IB problem/using IB diagnostics
prentice at ias.edu
Fri Jun 19 09:48:37 PDT 2009
Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>> John Hearns wrote:
>>> > Can you log into node36 and run ibstat or ibstatus?
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> The head scratching continues...
> Hi Prentice, list
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to use -mca mpi_leave_pinned 0.
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem,
> and I haven't looked at the error messages,
> so your problem may be different.
> However, memory leaks can produce weird errors, hard to diagnose.
I'm using OpenMPI 1.2.8
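
For anyone who wants to check an install against the versions Gus
mentions, here is a minimal sketch. It only encodes what is stated in
this thread (1.3.0 and 1.3.1 affected, fixed in 1.3.2). The function
names are hypothetical, and the sample version string assumes that
`mpirun --version` prints something like "mpirun (Open MPI) 1.3.1".
The workaround is passed as separate key and value arguments, which is
the form mpirun's --mca option takes.

```python
import re

# Versions named in this thread as having the IB memory leak.
# This is an assumption taken from the thread, not an official list.
AFFECTED = {(1, 3, 0), (1, 3, 1)}


def parse_version(text):
    """Return the first x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no x.y.z version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def leak_workaround_args(version_text):
    """Return extra mpirun arguments for the workaround, or [] if the
    version is not one of those named in the thread."""
    if parse_version(version_text) in AFFECTED:
        return ["--mca", "mpi_leave_pinned", "0"]
    return []


if __name__ == "__main__":
    # Hypothetical sample strings; substitute real `mpirun --version` output.
    for sample in ("mpirun (Open MPI) 1.3.1", "mpirun (Open MPI) 1.2.8"):
        print(sample, "->", leak_workaround_args(sample))
```

By this check 1.2.8 gets no extra arguments, so the leak described
above would not explain the errors seen here.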