[Beowulf] IB problem/using IB diagnostics


Paulo Afonso Lopes pal at di.fct.unl.pt
Fri Jun 19 10:28:20 PDT 2009


>
> It's actually an MPI job (HPL using OpenMPI) which is reporting the
> problem.
>
> The head scratching continues...
>

It seems, from the ongoing discussion, that you do not have a hardware
problem but an (Open)MPI one; I have seen Open MPI fail because a
user-level (or kernel; in my case it was user-level) verbs/etc. library
was missing.

Suggestions:

1) check whether the job runs with, say, -mca btl ^udapl (which excludes
uDAPL), or with an explicit list, e.g., -mca btl openib,tcp,sm,self

or

2) more tediously, check that all libraries present on a non-failing node
are also available on the failing one... There is a "Getting Started with
InfiniBand" page which has the names of the libraries/products that you
should have loaded to have a fully functioning IB stack - it solved my
problem :-)
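
The two suggestions above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not a tested recipe: the
helper names (mpirun_command, parse_ldconfig, missing_libraries) are made
up for this example, the binary name ./xhpl and the process count are
placeholders, and the library lists are assumed to have been captured on
each node with `ldconfig -p` (for instance over ssh) and saved as text.

```python
import shlex


def mpirun_command(btl, program, nprocs=2):
    """Build an mpirun argument list that selects or excludes BTL components.

    `btl` is passed through unchanged, e.g. "^udapl" or "openib,tcp,sm,self".
    """
    return ["mpirun", "-np", str(nprocs), "-mca", "btl", btl, program]


def parse_ldconfig(listing):
    """Return the set of library names found in `ldconfig -p` style text.

    Only lines containing "=>" describe a library; the header line is skipped.
    """
    names = set()
    for line in listing.splitlines():
        if "=>" in line:
            names.add(line.split()[0])
    return names


def missing_libraries(good_listing, bad_listing):
    """Libraries present on the working node but absent on the failing one."""
    return sorted(parse_ldconfig(good_listing) - parse_ldconfig(bad_listing))


if __name__ == "__main__":
    # Suggestion 1: command lines to try (./xhpl is a placeholder binary name).
    for btl in ("^udapl", "openib,tcp,sm,self"):
        cmd = mpirun_command(btl, "./xhpl", nprocs=4)
        print(" ".join(shlex.quote(arg) for arg in cmd))

    # Suggestion 2: hypothetical listings standing in for real node output.
    good = "\tlibibverbs.so.1 (libc6,x86-64) => /usr/lib64/libibverbs.so.1\n"
    bad = ""
    print(missing_libraries(good, bad))
```

Comparing sets of names only shows which libraries are absent; it says
nothing about version mismatches or kernel modules, which would need a
separate check.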

HTH

paulo

-- 
Paulo Afonso Lopes                        | Tel: +351- 21 294 8536
Departamento de Informática               | 294 8300 ext.10702
Faculdade de Ciências e Tecnologia        | Fax: +351- 21 294 8541
Universidade Nova de Lisboa               | e-mail: pal at di.fct.unl.pt
2829-516 Caparica, PORTUGAL





