[Beowulf] IB problem/using IB diagnostics

Prentice Bisbal prentice at ias.edu
Fri Jun 19 09:48:37 PDT 2009


Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>>
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>>
>>>     John Hearns wrote:
>>>     > Can you log into node36 and run ibstat or ibstatus?
>>>     >
>>>
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>>
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> problem.
>>
>> The head scratching continues...
>>
> 
> Hi Prentice, list
> 
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to use -mca mpi_leave_pinned 0.
> See:
> 
> http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
> https://svn.open-mpi.org/trac/ompi/ticket/1853
> 
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem,
> and I haven't looked at the error messages,
> so your problem may be different.
> However, memory leaks can produce weird errors, hard to diagnose.
> 

I'm using OpenMPI 1.2.8


-- 
Prentice


