[Beowulf] tcp error: Need ideas!

Joe Landman landman at scalableinformatics.com
Wed Jan 21 15:23:17 PST 2009

Hi Gerry

Gerry Creager wrote:
> History/background/description of the cluster
> * 126 node Dell 1950 cluster with dual-quad core Xeons
> * HP 5412zl switch for gigabit cluster backplane and 10GBE interconnect 
> to selected services (file server, etc)
> * Gigabit interconnect
> * Hand compiled 2.6.26 kernel
> * bnx2 module loaded for the Broadcom onboard nics
> * Switch, compute nodes, head node set to 9000 byte MTU

We have had *lots* of problems with Broadcom NICs and jumbo frames, from
the 2.6.9 timeframe onwards.

> We're seeing the following error in WRF compiled with openMPI and the 
> PGI 7.2 compiler:
> mca_btl_tcp_frag_send:writev failed with errno=104
> While all nodes were accessible prior to the run and returned 
> appropriate "stuff" when queried with, eg., ssh and a command, two nodes 
> now return something like this:
> [gerry@brazos SCOOP12km]$ ssh c0522
> Received disconnect from 2: Bad packet length 808464432.
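The two numbers in these error reports can be decoded with a few lines of Python. This is a sketch for reading the symptoms, not a diagnosis. On Linux, errno 104 is ECONNRESET (the peer reset the TCP connection). The ssh "bad packet length" 808464432 is 0x30303030, i.e. the four ASCII bytes "0000", which hints that ssh was reading something other than a valid length field. The helper names below are hypothetical.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value on this platform."""
    # errno numbering is platform specific; 104 is ECONNRESET on Linux.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def length_as_bytes(length: int) -> bytes:
    """Reinterpret a 32-bit packet length as the four raw bytes it was read from.

    SSH sends packet_length as a big-endian (network order) uint32 (RFC 4253).
    """
    return length.to_bytes(4, "big")


if __name__ == "__main__":
    print(describe_errno(104))         # writev() errno reported by Open MPI
    print(length_as_bytes(808464432))  # "Bad packet length" reported by ssh
```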

Hmmm... sounds like a link tried renegotiating. Can you get on via the
serial console and run ethtool, e.g.:

root@lightning:~# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: g
	Wake-on: d
	Current message level: 0x000000ff (255)
	Link detected: yes
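With 126 nodes, checking each link by eye gets tedious. A minimal Python sketch, assuming you can collect the `ethtool eth0` text from each node (via console or ssh), that pulls out the Speed, Duplex and Link detected fields and flags anything that is not a 1000Mb/s full-duplex link. The helper names and the expected values are assumptions, not part of ethtool itself.

```python
SAMPLE = """\
Settings for eth0:
\tSupported link modes:   10baseT/Half 10baseT/Full
\t                        100baseT/Half 100baseT/Full
\tSpeed: 1000Mb/s
\tDuplex: Full
\tAuto-negotiation: on
\tLink detected: yes
"""


def parse_ethtool(output: str) -> dict:
    """Parse 'Key: value' lines from `ethtool <iface>` output into a dict.

    Continuation lines (extra link modes) contain no colon and are skipped,
    as are header lines such as 'Settings for eth0:' that have no value.
    """
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def link_problems(fields: dict, expected_speed: str = "1000Mb/s") -> list:
    """List deviations from a healthy full-duplex link at the expected speed."""
    problems = []
    if fields.get("Link detected") != "yes":
        problems.append("no link detected")
    speed = fields.get("Speed")
    if speed != expected_speed:
        problems.append(f"speed is {speed}, expected {expected_speed}")
    if fields.get("Duplex") != "Full":
        problems.append(f"duplex is {fields.get('Duplex')}, expected Full")
    return problems


if __name__ == "__main__":
    print(link_problems(parse_ethtool(SAMPLE)) or "link looks healthy")
```

A node that quietly renegotiated down to 100Mb/s or half duplex would show up as a non-empty list.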


You might want to

	ethtool -s eth0 autoneg off

to keep it from renegotiating its speed. Also, look at the ring buffer
settings:

root@lightning:~# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		200
RX Mini:	0
RX Jumbo:	0
TX:		511

See if you can do something like

	ethtool -G eth0 rx-jumbo 100

if the current hardware settings show zero RX Jumbo ring entries. The
value cannot exceed the "Pre-set maximums" figure for RX Jumbo.
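The same check can be scripted. A rough Python sketch, assuming the `ethtool -g` text format shown above, that parses both sections and suggests an rx-jumbo value only when the driver's pre-set maximum allows one. The helper names and the default of 100 entries (taken from the suggestion above) are illustrative, not part of ethtool.

```python
SAMPLE = """\
Ring parameters for eth0:
Pre-set maximums:
RX:\t\t511
RX Mini:\t0
RX Jumbo:\t0
TX:\t\t511
Current hardware settings:
RX:\t\t200
RX Mini:\t0
RX Jumbo:\t0
TX:\t\t511
"""


def parse_rings(output: str) -> dict:
    """Parse `ethtool -g <iface>` output into {section: {ring: size}}."""
    sections = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if not value:
            # A bare "Name:" line starts a new section (e.g. "Pre-set maximums").
            current = sections.setdefault(key.strip(), {})
        elif current is not None and value.isdigit():
            # Skip non-numeric fields such as "n/a" printed by newer ethtool.
            current[key.strip()] = int(value)
    return sections


def jumbo_ring_advice(rings: dict, iface: str = "eth0", wanted: int = 100) -> str:
    """Suggest an rx-jumbo setting, never exceeding the driver's pre-set maximum."""
    maximum = rings["Pre-set maximums"].get("RX Jumbo", 0)
    current = rings["Current hardware settings"].get("RX Jumbo", 0)
    if current > 0:
        return f"RX Jumbo ring already has {current} entries"
    if maximum == 0:
        return "pre-set maximum for RX Jumbo is 0; rx-jumbo cannot be raised"
    return f"ethtool -G {iface} rx-jumbo {min(wanted, maximum)}"


if __name__ == "__main__":
    print(jumbo_ring_advice(parse_rings(SAMPLE)))
```

On the sample output above, the pre-set maximum is 0, so the sketch reports that the jumbo ring cannot be raised on that NIC.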

> I'm stumped and looking for causes and solutions.  Yeah, the WRF as 
> compiled did run before the change to Jumbos.
> Do I reduce the size of the frames to something smaller, like 8800 
> bytes? 7500?  1500?

In the past, I had heard that jumbo frames may work on Broadcom NICs at
an MTU of around 6000 bytes. We haven't tried this in a while... YMMV.
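To put the 9000 vs. 6000 vs. 1500 question in numbers, here is a rough Python calculation of the fraction of gigabit wire time that carries TCP payload for a full-size segment. It assumes untagged IPv4 frames, TCP without options, and standard Ethernet per-frame overhead (8-byte preamble/SFD, 14-byte header, 4-byte FCS, 12-byte interframe gap). It ignores per-packet CPU and interrupt cost, which is often where jumbo frames help most.

```python
# Per-frame overhead on Ethernet, in bytes (untagged frames assumed).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
IPV4_HEADER = 20
TCP_HEADER = 20  # no TCP options assumed; timestamps would add 12 more


def tcp_efficiency(mtu: int) -> float:
    """Fraction of wire time carrying TCP payload for a full-size segment."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    on_wire = mtu + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return payload / on_wire


if __name__ == "__main__":
    for mtu in (1500, 6000, 7500, 8800, 9000):
        print(f"MTU {mtu:5d}: {tcp_efficiency(mtu):.2%}")
```

Under those assumptions the payload fraction is roughly 94.9% at 1500, 98.7% at 6000 and 99.1% at 9000. Dropping from 9000 to about 6000 therefore gives up well under one percent of raw bandwidth, while going back to 1500 costs about four.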

> I'm not completely out of ideas but stumped.
> Thanks, gerry

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
