[Beowulf] IB problem/using IB diagnostics
Prentice Bisbal prentice at ias.edu
Fri Jun 19 09:48:37 PDT 2009
- Previous message: [Beowulf] IB problem/using IB diagnostics
- Next message: [Beowulf] IB problem/using IB diagnostics
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>>
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>>
>>>     John Hearns wrote:
>>>     > Can you log into node36 and run ibstat or ibstatus?
>>>     >
>>>
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>>
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> problem.
>>
>> The head scratching continues...
>>
>
> Hi Prentice, list
>
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to use -mca mpi_leave_pinned 0.
> See:
>
> http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
> https://svn.open-mpi.org/trac/ompi/ticket/1853
>
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem,
> and I haven't looked at the error messages,
> so your problem may be different.
> However, memory leaks can produce weird errors that are hard to diagnose.
>

I'm using OpenMPI 1.2.8

--
Prentice
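A minimal Python sketch of the two checks discussed in this thread, under stated assumptions. First, it parses `ibstat`-style text to confirm that a port is Active/LinkUp and has a nonzero SM LID (taken here as "it sees a subnet manager"). Second, it decides whether the `mpi_leave_pinned` workaround applies to a given OpenMPI version. The sample `ibstat` text is hypothetical and its field layout is assumed; real output varies between OFED releases and may list several HCAs. The function names are illustrative only, and the affected versions (1.3.0, 1.3.1) come from Gus's message, not from the OpenMPI release notes.

```python
import re

# Hypothetical ibstat-style output for one HCA (layout is an assumption;
# real output varies by OFED release and may list several CAs).
SAMPLE_IBSTAT = """\
CA 'mlx4_0'
\tCA type: MT25418
\tNumber of ports: 2
\tPort 1:
\t\tState: Active
\t\tPhysical state: LinkUp
\t\tRate: 20
\t\tBase lid: 3
\t\tSM lid: 1
\tPort 2:
\t\tState: Down
\t\tPhysical state: Polling
\t\tRate: 10
\t\tBase lid: 0
\t\tSM lid: 0
"""

# Versions named in the thread as having the IB memory-leak bug.
AFFECTED_OPENMPI = {(1, 3, 0), (1, 3, 1)}


def parse_ports(text):
    """Return {port_number: {field: value}} from ibstat-style text.

    Assumes a single HCA; fields seen before the first "Port N:" line are skipped.
    """
    ports = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        m = re.match(r"Port (\d+):$", stripped)
        if m:
            current = ports.setdefault(int(m.group(1)), {})
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return ports


def port_healthy(fields):
    """Link is up and an SM LID is known (assumed: SM lid 0 means no subnet manager)."""
    return (
        fields.get("State") == "Active"
        and fields.get("Physical state") == "LinkUp"
        and fields.get("SM lid", "0") not in ("", "0")
    )


def leave_pinned_workaround(version):
    """Extra mpirun arguments for an OpenMPI version string such as '1.3.1'."""
    parts = tuple(int(p) for p in version.split("."))
    parts = parts + (0,) * (3 - len(parts))  # treat "1.3" as 1.3.0
    if parts in AFFECTED_OPENMPI:
        # Workaround from the thread; upgrading to 1.3.2 is the actual fix.
        return ["-mca", "mpi_leave_pinned", "0"]
    return []


if __name__ == "__main__":
    for port, fields in sorted(parse_ports(SAMPLE_IBSTAT).items()):
        print(port, "ok" if port_healthy(fields) else "check")
    print(leave_pinned_workaround("1.2.8"))
```

Under the rule given in the thread, the OpenMPI 1.2.8 that Prentice reports gets no extra arguments, so the 1.3.0/1.3.1 leak, as described, would not account for his HPL failure.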
