[Beowulf] Re: Bugfix for Broadcom NICs losing connectivity


Tina Friedrich Tina.Friedrich at diamond.ac.uk
Fri Jun 4 01:39:38 PDT 2010


We've had that happen on some of our servers. Currently using the 
disable_msi workaround, which seems to have stopped it. I believe 
there's supposed to be a fix in the latest Red Hat kernel but we haven't 
really tested that yet.
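
For reference, a minimal sketch of what the disable_msi workaround
looks like on a CentOS/RHEL 5 box. It assumes the NICs use the bnx2
driver mentioned further down this thread and that module options live
in /etc/modprobe.conf (an assumption; newer distributions use a file
under /etc/modprobe.d/ instead). The Red Hat article linked in the
quoted message is the authoritative source:

```
# /etc/modprobe.conf  (assumed location on CentOS/RHEL 5)
# Ask bnx2 to use legacy interrupts instead of MSI
options bnx2 disable_msi=1
```

The option is only read when the module is loaded, so it needs a
module reload (from a console, since that drops the network) or a
reboot. If bnx2 is loaded from the initrd, the initrd may need
rebuilding as well.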

You lose all network connectivity (including IPMI) to the server, but 
not all connectivity: a serial console (not SOL; a proper serial 
console, or one via a console server) still works, as would a locally 
attached keyboard/monitor. Unless you require network to log in :) . If 
one runs into this, it's a really weird one (before you find the bug 
report): to all appearances the server is working happily, with no 
strangeness in the logs, just the network gone completely.

It's not easy to trigger, so it's the hard-to-track-down sort of thing. 
We had 610s and 710s for a while before this first happened (and there 
are still loads we have never seen it on). We first saw it on a rather 
heavily used NFS server (i.e. lots of network I/O).

Tina


Cris Rhea wrote:
>> In case it helps anyone using Dell R410 / 610 / 710 etc. servers: I have had
>> machines lose their eth connections periodically (CentOS 5.4 bnx2 driver).
>> Seems like a bug with the Broadcom NIC drivers. [luckily read of it on a
>> Dell mailing list]
>>
>> Bug Reports:
>>
>> http://kbase.redhat.com/faq/docs/DOC-26837
>> http://patchwork.ozlabs.org/patch/51106
>>
>> Not sure yet if this is exactly my issue but I'm giving it a shot now.
>> Thought I'd post since, anecdotally I've seen many people use these servers
>> on the list.
>>
>> -- 
>> Rahul
> 
> I've been following this on the Dell list as I have approx. 50 R410s  
> in our cluster.
> 
> One thing that isn't clear--  When this happens, do you lose all 
> connectivity to the node (i.e., do you have to reboot the node to 
> re-establish eth0)?
> 
> My R410s are running CentOS 5.2 - 5.4 and I rarely have one go 
> down.
> 
> --- Cris
> 
> 


-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442


