[Beowulf] OFED/IB for FC8

Michael H. Frese Michael.Frese at NumerEx-LLC.com
Wed Jun 4 15:52:14 PDT 2008


Following Jeff Layton's post to this list [Cheap SDR IB] on January 28,
we purchased 8 InfiniHost LX HCAs and an 8-port switch, and began trying to
get the OpenFabrics (OFED) release of MVAPICH for Fedora Core 6 to run on our
new machines.  We develop and run a multiphysics code in a relatively
fine-grained parallel mode where latency dominates the performance scaling,
so it seemed like a good thing to try.

This is our first exposure to InfiniBand, though we have considerable
experience with MPI, both in-memory and over GigE, including using netpipe to
measure latency and bandwidth.

Those machines have AMD Athlon X2 6000+'s on Asus M2N-SLI Deluxe motherboards
with an open PCI Express slot that will handle x4.

The main issue is that we are presently running Fedora Core 8 and the 2.6.21
SMP kernel, but there is no OFED release for FC8 yet.  Is anyone else working
on this?  Has anyone succeeded at getting it to work?

We started with OFED version 1.2.5 from
http://www.openfabrics.org/downloads/OFED/ofed-1.2.5/OFED-1.2.5-RPMS/
We downloaded all the rpms from the redhat-release-4AS-6.1 version.
In particular, the kernel rpms are kernel-ib-devel-1.2-2.6.9_55.ELsmp and
kernel-ib-1.2-2.6.9_55.ELsmp.

We used the 1.2.5 version because there don't seem to be any rpms for the
1.3 version.

All the OFED rpms for FC6 installed on FC8 without difficulty, except for
opensm-3.0.3-0.ppc64.rpm.  It didn't say "missing dependencies ..."; it just
got stuck.  We had to kill the 'rpm -ivh', remove the lock file, and rebuild
the rpm database.  After that,

# lsmod | grep ib

shows about 15 IB-related kernel modules.
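
A caveat on that count: 'grep ib' also matches any module whose name merely
contains "ib" (libata, for example), so "about 15" may be an overestimate.
A narrower filter can be sketched as below. The function name ib_modules and
the prefix list (ib_, rdma_, iw_) are assumptions chosen for illustration,
not an official list of OFED modules.

```shell
# Print the names of loaded modules that start with an InfiniBand-related
# prefix. Reads `lsmod`-style text on stdin.
# NOTE: the prefix list is an assumption; adjust it for the drivers in use.
ib_modules() {
    # Field 1 of each lsmod line is the module name; match on its prefix only.
    awk '$1 ~ /^(ib_|rdma_|iw_)/ { print $1 }'
}
```

It would be used as 'lsmod | ib_modules', or 'lsmod | ib_modules | wc -l' to
get a count.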

Even so, at this point, some of the IB stuff works.  We can run ibnetdiscover
and see the HCAs on the two machines that have the rpms installed, and the
switch, too.  We could use that to make a topology file, but we don't know
where to put it, or even if we should put it somewhere.  We can run
ibchecknet, and though it finds 4 nodes, it says they are all bad.  It also
reports "lid 0 address resolution: FAILED".  We have not succeeded in getting
ibping to work, and aren't really sure how to specify the remote address
for it.

We found

/usr/share/doc/ofed-docs-1.2/README.txt
/usr/share/doc/ofed-docs-1.2/OFED_Installation_Guide.txt

and, as described there, did

# /etc/init.d/openibd start
Loading QLogic InfiniPath driver:                          [FAILED]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
Failed to configure IPoIB connected mode for ib0
Bringing up interface ib0:                                 [FAILED]
Setting up service network . . .                           [  done  ]
Loading ib_sdp                                             [FAILED]
Loading ib_vnic                                            [FAILED]
Module ib_vnic not loaded.
Bringing up VNIC interfaces                                [FAILED]

That mostly looks bad.
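
To make it easier to compare runs on the two machines, the failing steps can
be pulled out of that output with a small filter. This is only a sketch: the
function name failed_steps is hypothetical, and it assumes each status line
ends in a literal "[FAILED]" tag as in the output shown above.

```shell
# Print only the steps that the init script reported as failed.
# Reads the script's output on stdin.
# ASSUMPTION: a failed step is a line ending in "[FAILED]".
failed_steps() {
    # Strip the padding and the "[FAILED]" tag; print only lines that had one.
    sed -n 's/[[:space:]]*\[FAILED\][[:space:]]*$//p'
}
```

For example, '/etc/init.d/openibd start 2>&1 | failed_steps' would list just
the failing steps, one per line.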

Does anyone have any suggestions?

We are willing to try a build from source, but we are unsure of what
challenges might lie down that path.

We'd rather not fall back to FC6, but we may have to do that.

Thanks for your help.


Mike Frese