[Beowulf] Odd Infiniband scaling behaviour - *SOLVED* - MVAPICH2 problem
Chris Samuel csamuel at vpac.org
Mon Oct 8 17:20:55 PDT 2007
On Mon, 8 Oct 2007, Chris Samuel wrote:

> If I then run 2 x 4 CPU jobs of the *same* problem, they all run at
> 50% CPU.

With big thanks to Mark Hahn, this problem is solved. Infiniband is exonerated, it was the MPI stack that was the problem!

Mark suggested that this sounded like a CPU affinity problem, and he was right.

Turns out that when you build MVAPICH2 (in our case mvapich2-0.9.8p3) on an AMD64 or EM64T system, it defaults to compiling in and enabling CPU affinity support.

So if we take an example of 4 x 2 CPU jobs, it has the unfortunate effect of binding all those MPI processes to the first 2 cores in the system. Eight processes then share two cores, hence why we see only 25% CPU utilisation per process (watched via top, and evident from the comparative run time).

Fortunately though, it does check the user's environment for the variable MV2_ENABLE_AFFINITY, and if that is set to 0 then the affinity setting is bypassed.

So simply modifying my PBS script to include:

    export MV2_ENABLE_AFFINITY=0

before using mpiexec [1] to launch the jobs results in a properly performing system again!

I'm currently running 4 x 2 CPU NAMD jobs and they're back to properly consuming 100% CPU per process.

Phew!

Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
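For anyone wanting to try the same workaround, the PBS script change described in the message might look roughly like the sketch below. Only the MV2_ENABLE_AFFINITY=0 setting and the use of mpiexec come from the message; the resource request, the APP variable and the program name ./my_mpi_program are hypothetical placeholders, and the dry-run guard is there only so the sketch can be tried on a machine without an MPI stack.

```shell
#!/bin/sh
# Sketch of a PBS job script applying the workaround from the message.
# ASSUMPTION: the resource request below is a placeholder, not the
# author's actual job geometry.
#PBS -l nodes=1:ppn=2

# Ask MVAPICH2 to bypass its built-in CPU affinity setting, so process
# placement is left to the kernel scheduler instead of pinning ranks
# to the first cores of the node.
export MV2_ENABLE_AFFINITY=0

# ASSUMPTION: hypothetical application binary; substitute the real
# MPI program.
APP="${APP:-./my_mpi_program}"

# Launch only if mpiexec and the program are present; otherwise report
# what would have been run.
if command -v mpiexec >/dev/null 2>&1 && [ -x "$APP" ]; then
    mpiexec "$APP"
else
    echo "dry run: would run 'mpiexec $APP' with MV2_ENABLE_AFFINITY=$MV2_ENABLE_AFFINITY"
fi
```

The export has to come before the mpiexec line so the launched MPI processes inherit it. Besides watching top as described above, running taskset -p against the PID of a running rank should show whether its affinity mask is still restricted to a couple of cores.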
