[Beowulf] IPoIB failure
skylar.thompson at gmail.com
Tue Jan 27 15:34:25 PST 2015
On 01/27/2015 02:24 PM, Christopher Samuel wrote:
> On 24/01/15 01:29, Lennart Karlsson wrote:
>> This reminds me of when we upgraded to SL-6.6 (approximately the same as
>> CentOS-6.6 and RHEL-6.6).
>> The new kernel we got could not handle our IPoIB for storage traffic,
>> which broke down within a few hours.
> Interesting, we use GPFS over IPoIB and upgraded to RHEL 6.6 in early
> November and haven't seen any issues at all (and with a lot of
> bioinformatics users we'd notice problems pretty quickly).
> Is your IB running in connected mode or datagram mode?
> We're in connected mode everywhere because of our BG/Q.
We've had some problems with the RHEL-provided OFED stack interfering
with the Mellanox one. One of the symptoms we've experienced in the past
is that some IB services (like RDMA) work while others (like IPoIB) don't.
Using the Mellanox install script in the MLNX OFED package clears this
up. I wonder if this is what's going on?
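On Chris's connected-vs-datagram question: here is a minimal sketch for checking which mode each IPoIB interface on a node is in. It assumes the Linux ipoib driver exposes /sys/class/net/<iface>/mode (containing "connected" or "datagram") and that InfiniBand interfaces report type 32 (ARPHRD_INFINIBAND) in /sys/class/net/<iface>/type. The function name ipoib_modes is just illustrative, and the script only reads the mode; it doesn't change it.

```python
import os

SYSFS_NET = "/sys/class/net"
ARPHRD_INFINIBAND = 32  # link type the kernel reports for IPoIB interfaces


def ipoib_modes(sysfs_root=SYSFS_NET):
    """Return {interface: mode} for every InfiniBand interface under sysfs_root.

    Assumes the ipoib driver's sysfs layout: a 'type' file holding the ARPHRD
    link type and a 'mode' file holding 'connected' or 'datagram'.
    """
    modes = {}
    if not os.path.isdir(sysfs_root):
        return modes
    for iface in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, iface)
        try:
            # Skip anything that isn't an InfiniBand link (Ethernet, loopback, ...).
            with open(os.path.join(base, "type")) as f:
                if int(f.read().strip()) != ARPHRD_INFINIBAND:
                    continue
            with open(os.path.join(base, "mode")) as f:
                modes[iface] = f.read().strip()
        except (OSError, ValueError):
            # Missing or unreadable attribute: not something we can report on.
            continue
    return modes


if __name__ == "__main__":
    for iface, mode in ipoib_modes().items():
        print(f"{iface}: {mode}")
```

Running it across nodes (e.g. via pdsh) would quickly show whether any have ended up in a different mode from the rest.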