[Beowulf] tcp error: Need ideas!

Gerry Creager gerry.creager at tamu.edu
Sun Jan 25 07:08:18 PST 2009

Nifty Tom Mitchell wrote:
> On Sat, Jan 24, 2009 at 09:36:09AM -0600, Gerry Creager wrote:
>> Couple of follow-up notes.
>> MTU=4500:  Had one node fall over with the same overflow errors.
>> MTU=3000:  A WRF model is running, but single timesteps are executing  
>> 2.5x slower than MTU=1500
>> I'll go snag the new driver and compile it.  After all: What can it hurt!
>> Thanks, Guy!
>> Regards, Gerry
>> Guy Coates wrote:
>>> Hi,
>>> We have also seen problems with the bnx2 drivers.
>>> I got a more recent set of bnx2 drivers from Broadcom:
> ......
> Has the traffic been snooped to see if all
> is as expected?

I've not done a formal snoop but I know that the data appear "normal" up 
to the point where a node falls over with larger frames.  In other 
words, when I look at the data that are written, and those remaining on 
the compute nodes that were read over, yeah, they look good.  It's not 
rigorous but sufficient to tell _me_ that the problem isn't in data i/o 
that were working before we went to jumbos.  I really suspect the driver.

> If you are seeing a natural MTU running faster than a jumbo MTU
> then something is fragmenting or causing fragmentation of the data.  

Yeah.  My thought too.  I don't think it's necessarily at the 
application, though, unless Open MPI is telling it to use smaller 
packets, and the Open MPI source code doesn't seem to do more than use 
the kernel's network stack.  Again: The driver is suspect.
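For scale: jumbo frames should cut the number of packets needed per message, which is why a run that gets slower at a larger MTU suggests something on the path is going wrong. The sketch below only does that arithmetic. It assumes IPv4 and TCP headers with no options (40 bytes total, so MSS = MTU - 40) and no fragmentation; the 1 MB message size is hypothetical, not a WRF measurement.

```python
# Assumptions: IPv4 and TCP headers with no options, 20 bytes each,
# so MSS = MTU - 40. Linux usually adds a 12-byte TCP timestamp option,
# which would shrink the payload per segment slightly.
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_segments(message_bytes, mtu):
    """Full-size TCP segments needed to carry message_bytes at a given MTU,
    assuming no IP fragmentation anywhere on the path."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    return (message_bytes + mss - 1) // mss  # integer ceiling division


if __name__ == "__main__":
    message = 1_000_000  # hypothetical 1 MB exchange, not a WRF measurement
    for mtu in (1500, 3000, 4500, 9000):
        print(f"MTU {mtu}: {tcp_segments(message, mtu)} segments")
```

For the hypothetical 1 MB message this gives 685 segments at MTU 1500 and 112 at MTU 9000, so per-packet overhead alone should favor the jumbo case.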

> If MTU=4500 causes overflow errors, it might be related to fragmentation.
> Both the sender and receiver have to keep all the bits on a reliable 
> transfer until the data has been acknowledged.   At one time fragmentation
> could only be done once to a minimum MTU in the life of a packet.

Yeah, and that day wasn't really too long ago.  I was a network engineer 
6 years ago (gotta have money for gas and coffee!) and had to chase 
fragmentation problems.  Ask me sometime about IOS 11.4 thru 11.9 and 
the fragmentation problems in Cisco hardware...

It's conceivable that the HP 5412zl is causing problems with packet/frag 
reordering but I would be surprised.
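To put numbers on the fragmentation concern, here is a minimal sketch of IPv4 fragmentation arithmetic, assuming a 20-byte header with no options. It is illustrative only: with path MTU discovery enabled, TCP segments are sent with the Don't Fragment bit set, so this models the case where something on the path is fragmenting anyway.

```python
from typing import List

IPV4_HEADER = 20  # assumption: no IP options


def ipv4_fragments(datagram_bytes: int, link_mtu: int) -> List[int]:
    """Return the total length of each IPv4 fragment produced when a
    datagram (header included) crosses a link with a smaller MTU."""
    if datagram_bytes <= link_mtu:
        return [datagram_bytes]  # fits as-is, no fragmentation
    payload = datagram_bytes - IPV4_HEADER
    # Fragment offsets are counted in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 payload bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + IPV4_HEADER)  # each fragment gets its own header
        payload -= chunk
    return sizes


if __name__ == "__main__":
    for size in (4500, 9000):
        frags = ipv4_fragments(size, 1500)
        print(size, "->", len(frags), "fragments:", frags)
```

A 9000-byte datagram crossing a 1500-byte link becomes seven fragments (six of 1500 bytes and one of 120). The receiver has to hold all of them until reassembly completes, and losing any single fragment loses the whole datagram.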

> In addition to snooping packets try "tracepath" to and from all 
> the involved boxes to discover what is going on.

One hop to anything.  It's only 126 nodes, and the switch is line-rate.
Oh, and no errors are showing up on the switch over the last several 
days (I reset counters to make my troubleshooting life easier).
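Since the switch counters stay clean, the host side is the other place to look. The sketch below is a minimal example, assuming Linux and the standard /proc/net/dev layout (eight receive counters followed by eight transmit counters per interface). Which of these columns, if any, the bnx2 overflow errors show up in is not established here; the driver-specific counters from `ethtool -S` may be more telling.

```python
import os
from typing import Dict

# Column order of /proc/net/dev on Linux (assumed standard layout).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev contents into per-interface counter dicts."""
    stats = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        counters = {"rx_" + k: v for k, v in zip(RX_FIELDS, values[:8])}
        counters.update({"tx_" + k: v for k, v in zip(TX_FIELDS, values[8:16])})
        stats[name.strip()] = counters
    return stats


def error_counters(stats: Dict[str, Dict[str, int]], iface: str) -> Dict[str, int]:
    """Pick out the error, drop, and overrun-related counters for one interface."""
    c = stats[iface]
    keys = ("rx_errs", "rx_drop", "rx_fifo", "rx_frame", "tx_errs", "tx_drop")
    return {k: c[k] for k in keys}


if __name__ == "__main__":
    if os.path.exists("/proc/net/dev"):  # only meaningful on Linux
        with open("/proc/net/dev") as f:
            stats = parse_net_dev(f.read())
        for iface in sorted(stats):
            print(iface, error_counters(stats, iface))
```

Taking a snapshot before and after a failing run, and diffing the two, separates new errors from whatever accumulated earlier.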

Thanks, Gerry
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
