[Beowulf] Mellanox ConnectX-3 MT27500 problems
jcatana at gmail.com
Sat Apr 27 16:52:46 PDT 2013
I noticed that on systems running the Xen kernel netback driver for virtualization,
bandwidth drops to very low rates.
On Apr 27, 2013 6:19 PM, "Brice Goglin" <brice.goglin at gmail.com> wrote:
> These cards are QDR and even FDR, you should get 56Gbit/s (we see about
> 50Gbit/s in benchmarks iirc). That's what I get on Sandy Bridge servers
> with the exact same IB card model.
> $ ibv_devinfo -v
> active_width: 4X (2)
> active_speed: 14.0 Gbps (16)
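
The two fields above multiply out to the link rate: 4 lanes at 14.0 Gbps give the 56 Gbit/s mentioned earlier. A minimal Python sketch of that arithmetic follows. The function name ib_link_rates is hypothetical, the sample text is just the two lines quoted above, and the encoding rule (8b/10b for lanes up to 10 Gbps, 64b/66b for faster lanes) is an assumption that covers SDR/DDR/QDR and FDR only.

```python
import re

def ib_link_rates(devinfo_text):
    """Return (signalling_gbps, data_gbps) parsed from `ibv_devinfo -v` text."""
    width = int(re.search(r"active_width:\s*(\d+)X", devinfo_text).group(1))
    lane = float(re.search(r"active_speed:\s*([\d.]+)\s*Gbps", devinfo_text).group(1))
    # Assumption: lanes at 10 Gbps or below (SDR/DDR/QDR) use 8b/10b encoding,
    # faster lanes (FDR) use 64b/66b.
    payload, total = (8, 10) if lane <= 10.0 else (64, 66)
    signalling = width * lane
    return signalling, signalling * payload / total

# Sample input: the two lines quoted in this thread.
sample = """\
active_width: 4X (2)
active_speed: 14.0 Gbps (16)
"""
sig, data = ib_link_rates(sample)
print(f"{sig:.1f} Gbit/s signalling, about {data:.1f} Gbit/s after encoding")
```

With the same rule, a 4X DDR link (5.0 Gbps per lane) works out to 20 Gbit/s signalling and 16 Gbit/s of data, so benchmark numbers will always sit somewhat below the figure that the IB status tools report.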
> These nodes have been running Debian testing/wheezy (default kernel and
> IB packages) for 9 months without problems.
> I had to fix the cables to get 56Gbit/s link state. Without Mellanox FDR
> cables, I was only getting 40. So maybe check your cables. And if you're
> not 100% sure about your switch, try connecting the nodes back-to-back.
> You can try upgrading the IB card firmware too. Mine is 2.10.700 (likely
> not up to date anymore, but at least this one works fine).
> Where does your "8.5Gbit/s" come from? IB status or benchmarks? If
> benchmarks, it could be related to the PCIe link speed. Upgrading the
> BIOS and IB firmware helped me too (some reboots gave PCIe Gen1 instead of
> Gen3). Here's what you should see in lspci if you get PCIe Gen3 x8 as
> expected:
> $ sudo lspci -d 15b3: -vv
> LnkSta: Speed 8GT/s, Width x8
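
To see why the PCIe generation matters for benchmark numbers, the LnkSta line can be turned into an upper bound on host bandwidth. This is only a sketch: the helper name pcie_data_gbps is hypothetical, the Gen1 line is a made-up input for comparison, and the encoding table covers Gen1 to Gen3 only. It ignores PCIe protocol overhead, so real throughput is lower.

```python
import re

# Encoding efficiency keyed by transfer rate in GT/s:
# Gen1 and Gen2 use 8b/10b, Gen3 uses 128b/130b.
PCIE_ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}

def pcie_data_gbps(lnksta_line):
    """Upper bound in Gbit/s for the link described by an lspci LnkSta line."""
    m = re.search(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)", lnksta_line)
    rate, lanes = float(m.group(1)), int(m.group(2))
    return rate * lanes * PCIE_ENCODING[rate]

gen3 = pcie_data_gbps("LnkSta: Speed 8GT/s, Width x8")    # line quoted above
gen1 = pcie_data_gbps("LnkSta: Speed 2.5GT/s, Width x8")  # hypothetical Gen1 case
print(f"Gen3 x8: {gen3:.1f} Gbit/s, Gen1 x8: {gen1:.1f} Gbit/s")
```

A Gen3 x8 slot leaves headroom above a 56 Gbit/s IB link, while a slot that trained at Gen1 x8 tops out at 16 Gbit/s before overhead, whatever the IB link state says.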
> On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
> > Dear all,
> > I was wondering whether somebody has/had similar problems as I have.
> > We have recently purchased a bunch of new nodes. These are Sandy Bridge
> > with Mellanox ConnectX-3 MT27500 InfiniBand connectors and this is where
> > I have problems.
> > I am usually using Debian Squeeze for my clusters (kernel
> > Unfortunately, as it turned out I cannot use that kernel as my Intel NIC is
> > not supported here. So I upgraded to 3.2.0-0.bpo.2-amd64 (backport kernel to
> > squeeze). Here I got network but the InfiniBand is not working. The device is
> > not even recognized by ibstatus. Thus, I decided to do an upgrade (not
> > dist-upgrade) to wheezy to get the newer OFED stack.
> > Here I get the InfiniBand working but only with 8.5 Gb/sec. A simple reseating
> > of the plug increases that to 20 Gb/sec (4X DDR), which is still slower than
> > the speed of the older nodes (40 Gb/sec (4X QDR)).
> > So I upgraded completely to wheezy (dist-upgrade now) but the problem does not
> > vanish.
> > I re-installed squeeze again and installed a vanilla kernel (3.8.8) and the
> > latest OFED stack from their site. And guess what: same experience here:
> > After a reboot the InfiniBand speed is 8.5 Gb/sec and reseating the plug increases
> > that to 20 Gb/sec. It does not matter whether I connect to the edge switch or
> > to the main switch, in both cases I got the same result.
> > Frankly, I am out of ideas now. I don't think the observed speed change after
> > reseating the plug should happen. I am in touch with the technical support
> > here as well but I think we both are a bit confused.
> > Now, am I right to assume that the Mellanox ConnectX-3 MT27500 are QDR and
> > so I should get 40 Gb/sec and not 20 Gb/sec?
> > Has anybody had similar experiences? Any ideas?
> > All the best from London
> > Jörg