[Beowulf] IB troubles - mca_mpool_openib_register
Michael Huntingdon hunting at ix.netcom.com
Thu Jun 22 11:08:19 PDT 2006
Bill,

If you are going to look into a different MPI implementation, consider
HP-MPI. Support for the common interconnects (GigE, Myrinet, IB, and
Quadrics) is built into it, so you can create a single (common)
operating environment for your programmers. I had a look at the
benchmarks a few months ago, and they appear pretty consistent across
the board.

Michael

At 10:37 AM 6/22/2006, Bill Wichser wrote:
>Thanks.
>
>No, I have not tried a different version of MPI to test but will do
>so. As for a later version of OpenIB, there is incentive to do so
>but I don't know how quickly that can be accomplished.
>
>Bill
>
>Lombard, David N wrote:
>>More memory in your nodes? Not sure what size of queues and such
>>openmpi allocates, but you could simply be running out of memory if
>>openmpi allocates large queue depths.
>>Have you tried an alternate MPI to see if you have the same problem?
>>Intel MPI, MVAPICH, MVAPICH2, as well as others support OpenIB.
>>Can you consider moving to a newer version of OpenIB?
>>--
>>David N. Lombard
>>My statements represent my opinions, not those of Intel Corporation
>>
>>>-----Original Message-----
>>>From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
>>>On Behalf Of Bill Wichser
>>>Sent: Thursday, June 22, 2006 6:02 AM
>>>To: beowulf at beowulf.org
>>>Subject: [Beowulf] IB troubles - mca_mpool_openib_register
>>>
>>>
>>>Cluster with dual Xeons and Topspin IB adapters running a RH
>>>2.6.9-34.ELsmp kernel (x86_64) with the RH IB stack installed, each
>>>node w/8G of memory.
>>>
>>>Updated firmware as per Mellanox in the IB cards.
>>>
>>>Updated /etc/security/limits.conf to have memlock be 8192, both soft
>>>and hard limits, to overcome the initial trouble of pool allocation.
>>>
>>>Application is cpi.c.
>>>
>>>I can run across the 64 nodes using nodes=64:ppn=1 without trouble,
>>>except for the
>>>
>>>[btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp]
>>>ibv_create_qp: returned 0 byte(s) for max inline data
>>>
>>>error messages, to be fixed I suppose in the next release. These I
>>>can live with, perhaps, for now.
>>>
>>>The problem is that when I run with nodes=64:ppn=2 and only use -np 64
>>>with my openmpi (v 1.0.2 gcc compiled), it still runs fine, but when I
>>>run with -np 65 I get megabytes of error messages and the job never
>>>completes. The errors all look like this:
>>>
>>>mca_mpool_openib_register: ibv_reg_mr(0x2a96641000,1060864)
>>>failed with error: Cannot allocate memory
>>>
>>>I've submitted to the openib-general mailing list with no responses.
>>>I'm not sure if this is an openmpi problem, an openib problem, or some
>>>configuration problem with the IB fabric. Other programs fail with
>>>even fewer processors being allocated with these same errors. Running
>>>over TCP, albeit across the GigE network and not over IB, works fine.
>>>
>>>I'm stuck here not knowing how to proceed. Has anyone found this
>>>issue and, more importantly, found a solution? I don't believe it to
>>>be a limits.conf issue as I can allocate both processors on a node up
>>>to 32 nodes (-np 64) without problems.
>>>
>>>Thanks,
>>>Bill
>>>_______________________________________________
>>>Beowulf mailing list, Beowulf at beowulf.org
>>>To change your subscription (digest mode or unsubscribe) visit
>>>http://www.beowulf.org/mailman/listinfo/beowulf
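
The quoted message sets memlock to 8192 in /etc/security/limits.conf and reports ibv_reg_mr failing on a 1060864-byte region. The thread does not establish whether the locked-memory limit is the cause, and the original poster doubts it, so the following is only a rough diagnostic sketch. It assumes that limits.conf expresses memlock in KB (as limits.conf(5) documents) and that registered memory counts against RLIMIT_MEMLOCK. The helper names regions_that_fit and describe are made up for illustration. The script prints how many regions of the quoted size fit under the configured value, then the limit the current process actually sees.

```python
import resource

# Size of one failed registration, copied from the quoted error message:
#   ibv_reg_mr(0x2a96641000,1060864)
REG_BYTES = 1060864

# memlock value quoted from /etc/security/limits.conf.
# Assumption: the unit is KB, per limits.conf(5).
MEMLOCK_KB = 8192


def regions_that_fit(limit_bytes, region_bytes=REG_BYTES):
    """Hypothetical helper: whole regions of region_bytes under limit_bytes."""
    return limit_bytes // region_bytes


def describe(limit):
    """Hypothetical helper: render an rlimit value, or 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    configured = MEMLOCK_KB * 1024
    print(f"configured memlock: {configured} bytes")
    print(f"regions of {REG_BYTES} bytes that fit: {regions_that_fit(configured)}")

    # The limit this process really runs under (Unix only).
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"effective soft limit: {describe(soft)}")
    print(f"effective hard limit: {describe(hard)}")
```

Running the script the same way the MPI job is launched, rather than from a login shell, shows the limit the job's processes inherit, which may differ from the value in limits.conf.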
