[Beowulf] tcp error: Need ideas!
gerry.creager at tamu.edu
Fri Jan 23 05:49:23 PST 2009
First, thanks to all who've responded. I've been looking a bit this
morning and am trying to grok the results.
Joe Landman wrote:
> Hi Gerry
> Gerry Creager wrote:
>> History/background/description of the cluster
>> * 126 node Dell 1950 cluster with dual-quad core Xeons
>> * HP 5412zl switch for gigabit cluster backplane and 10GBE
>> interconnect to selected services (file server, etc)
>> * Gigabit interconnect
>> * Hand compiled 2.6.26 kernel
>> * bnx2 module loaded for the Broadcom onboard nics
>> * Switch, compute nodes, head node set to 9000 byte MTU
> We have had *lots* of problems with Broadcom nics and jumbo frames. From
> 2.6.9 timeframe onwards.
Marvelous. I'd prefer to not have to back-rev if I can avoid it...
>> We're seeing the following error in WRF compiled with openMPI and the
>> PGI 7.2 compiler:
>> mca_btl_tcp_frag_send:writev failed with errno=104
>> While all nodes were accessible prior to the run and returned
>> appropriate "stuff" when queried with, eg., ssh and a command, two
>> nodes now return something like this:
>> [gerry at brazos SCOOP12km]$ ssh c0522
>> Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.
> Hmmm... sounds like a link tried re-negotiating. Can you get on via
> serial/console and
My guess is that the driver wandered across memory boundaries. This
stinks of a buffer problem to me. Typically, after this happens, I
can't log into the node via any interface, nor on console. It requires
an IPMI or physical reboot.
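A quick sanity check on the number in that ssh disconnect message: printing it as hex shows what bytes landed where the SSH packet-length field was expected. This is only a sketch; `len_to_hex` is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: print a 32-bit value as eight hex digits.
len_to_hex() {
    printf '%08x\n' "$1"
}

# The "Bad packet length" value reported by ssh above.
len_to_hex 808464432
# prints 30303030, i.e. four bytes of 0x30, which is ASCII "0000"
```

Four ASCII '0' characters in place of a binary length field is at least consistent with stray or misplaced payload data in the stream, though it does not by itself say whether the driver, the NIC, or something else put it there.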
> root at lightning:~# ethtool eth0
-bash-3.2# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
Advertised auto-negotiation: Yes
Port: Twisted Pair
Supports Wake-on: g
Link detected: yes
> You might want to
> ethtool eth0 autoneg off
> to force it not to renegotiate its speed. Also, look at
-bash-3.2# ethtool -A eth1 autoneg off
autoneg unmodified, ignoring
no pause parameters changed, aborting
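The "no pause parameters changed" reply suggests the command above touched pause (flow control) autonegotiation rather than link autonegotiation: `ethtool -A` sets pause parameters, while `ethtool -s` sets speed, duplex, and link autoneg. A dry-run sketch of the two forms follows. The interface name and the speed/duplex values are assumptions, and many gigabit copper drivers may refuse `autoneg off` at 1000 Mb/s, so the `-s` form may simply fail.

```shell
# Dry run only: prints the commands, does not execute them.
# show_autoneg_cmds is a hypothetical helper; eth1 is taken from the session above.
show_autoneg_cmds() {
    # -s changes link settings (speed, duplex, autoneg)
    echo "ethtool -s $1 speed 1000 duplex full autoneg off"
    # -A changes pause (flow control) settings only
    echo "ethtool -A $1 autoneg off"
}

show_autoneg_cmds eth1
```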
> root at lightning:~# ethtool -g eth0
-bash-3.2# ethtool -g eth1
Ring parameters for eth1:
RX Mini: 0
RX Jumbo: 4080
Current hardware settings:
RX Mini: 0
RX Jumbo: 765
> See if you can do something like
> ethtool -G eth0 rx-jumbo 100
> if you have zero jumbo ring rx entries.
Doesn't look like this requires much change.
Also, while I'm in the neighborhood, to respond to Mark's suggestions:
-bash-3.2# ethtool -k eth1
Offload parameters for eth1:
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
Hmmm. Might be worth changing TCP segmentation offload here.
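A minimal sketch of checking and then disabling TSO, assuming the `ethtool -k` output format shown above (newer ethtool versions print hyphenated names, which the pattern also accepts). `tso_state` is a hypothetical helper name.

```shell
# Hypothetical helper: read "ethtool -k" output on stdin, print the TSO state.
tso_state() {
    awk -F': ' '/^[ \t]*tcp[- ]segmentation[- ]offload:/ { print $2 }'
}

# Sample lines copied from the session above.
printf 'Offload parameters for eth1:\ntcp segmentation offload: on\n' | tso_state

# On a live node, as root, the assumed sequence would be:
#   ethtool -k eth1 | tso_state
#   ethtool -K eth1 tso off
```

Note the case difference: lowercase `-k` queries offload settings, uppercase `-K` changes them.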
-bash-3.2# ethtool -S eth1
-bash-3.2# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:1E:C9:AC:27:FB
inet addr:192.168.200.154 Bcast:192.168.203.255
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:574 errors:0 dropped:0 overruns:0 frame:0
TX packets:265 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:44422 (43.3 KiB) TX bytes:54606 (53.3 KiB)
>> I'm stumped and looking for causes and solutions. Yeah, the WRF as
>> compiled did run before the change to Jumbos.
>> Do I reduce the size of the frames to something smaller, like 8800
>> bytes? 7500? 1500?
> In the past I had heard that jumbo frames may work on Broadcom NICs
> around 6000 byte length. We haven't tried this in a while ... YMMV.
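One way to try the candidate frame sizes is to set the MTU and then send don't-fragment pings sized to fill exactly one frame. The sketch below only prints the commands. It assumes IPv4 with a 20-byte header and no options, plus an 8-byte ICMP header; `PEER` is a placeholder for another node, and `icmp_payload` is a hypothetical helper.

```shell
# Largest ICMP echo payload that fits one unfragmented IPv4 frame:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Dry run over the sizes mentioned in the thread; nothing is executed.
for mtu in 9000 8800 7500 6000 1500; do
    echo "ip link set dev eth1 mtu $mtu"
    echo "ping -M do -c 3 -s $(icmp_payload "$mtu") PEER"
done
```

The switch ports and the peer node need an MTU at least as large as the one under test, otherwise the ping fails for reasons unrelated to the NIC.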
>> I'm not completely out of ideas but stumped.
>> Thanks, gerry
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843