[Beowulf] tcp error: Need ideas!
kilian.cavalotti.work at gmail.com
Thu Jan 22 01:15:18 PST 2009
On Wednesday 21 January 2009 23:40:26 Gerry Creager wrote:
> History/background/description of the cluster
> * 126 node Dell 1950 cluster with dual-quad core Xeons
> * bnx2 module loaded for the Broadcom onboard nics
> Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.
It may also be worth making sure you're using the latest bnx2 version.
The output of "modinfo bnx2" should give you that, the latest one being 1.7.6b3.
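
If you want to run that check across many nodes, here is a minimal Python sketch. It assumes modinfo is on the PATH and prints a "version:" field for the module. The helper names parse_module_version and installed_module_version are made up for illustration.

```python
import subprocess


def parse_module_version(modinfo_output):
    """Return the value of the 'version:' field in modinfo output, or None."""
    for line in modinfo_output.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact field name so 'srcversion:' is not picked up.
        if sep and key.strip() == "version":
            return value.strip()
    return None


def installed_module_version(module="bnx2"):
    """Run 'modinfo <module>' and return its version string, or None on failure."""
    try:
        result = subprocess.run(
            ["modinfo", module],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # modinfo is missing, or the module is not installed on this node.
        return None
    return parse_module_version(result.stdout)


if __name__ == "__main__":
    print(installed_module_version() or "bnx2 version not found")
```

Run on each node (for example over ssh), this prints the version string to compare against 1.7.6b3, or a short message if the module cannot be queried.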
I've been using those PE1950s for a while, and had my share of weird issues with
them, including complete kernel panics under high traffic load. Upgrading to the
Dell-provided bnx2 version proved helpful.
If you use SuSE or a Red Hat-ish distro, the Dell Linux repository
<http://linux.dell.com/repo/hardware/> may be useful, and upgrading the module
version would be extremely simple to handle with the help of DKMS.