[Beowulf] tcp error: Need ideas!

Nifty Tom Mitchell niftyompi at niftyegg.com
Sat Jan 24 11:03:18 PST 2009


On Sat, Jan 24, 2009 at 09:36:09AM -0600, Gerry Creager wrote:
> 
> Couple of follow-up notes.
>
> MTU=4500:  Had one node fall over with the same overflow errors.
> MTU=3000:  A WRF model is running, but single timesteps are executing  
> 2.5x slower than MTU=1500
>
> I'll go snag the new driver and compile it.  After all: What can it hurt!
>
> Thanks, Guy!
>
> Regards, Gerry
>
> Guy Coates wrote:
>> Hi,
>>
>> We have also seen problems with the bnx2 drivers.
>>
>> I got a more recent set of bnx2 drivers from Broadcom:
>>
......

Has the traffic been snooped to see whether all
is as expected?

If you are seeing the standard MTU (1500) run faster than a jumbo MTU,
then something along the path is probably fragmenting the data.

If MTU=4500 causes overflow errors, that might also be related to
fragmentation.  On a reliable transfer, both the sender and the receiver
have to hold on to all the bits until the data has been acknowledged,
and the receiver also has to buffer every fragment until the whole
packet can be reassembled.  At one time fragmentation could only be done
once, down to a minimum MTU, in the life of a packet.
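
To put rough numbers on the fragmentation idea, here is a minimal sketch
of how one IPv4 packet would be cut up if some hop could only pass 1500
bytes.  The 1500-byte limit, the 20-byte header with no IP options, and
the function name ipv4_fragments are all assumptions for illustration;
nothing in this thread has shown yet that such a hop exists.

```python
def ipv4_fragments(packet_len, path_mtu, header_len=20):
    """Return the on-wire size of each fragment of one IPv4 packet.

    Assumes a fixed header_len (20 bytes, i.e. no IP options) and that
    fragmentation is permitted (the DF bit is not set).
    """
    if packet_len <= path_mtu:
        return [packet_len]  # fits as-is, no fragmentation needed

    payload = packet_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts 8-byte units.
    chunk = (path_mtu - header_len) // 8 * 8
    if chunk <= 0:
        raise ValueError("path_mtu too small to carry any payload")

    sizes = []
    while payload > 0:
        take = min(chunk, payload)
        sizes.append(take + header_len)  # each fragment repeats the header
        payload -= take
    return sizes
```

Under those assumptions ipv4_fragments(4500, 1500) gives four fragments
and ipv4_fragments(3000, 1500) gives three, with a small trailing
fragment in each case.  Every one of them has to arrive and be held
before the receiver can reassemble the original packet.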

In addition to snooping packets, try "tracepath" to and from all
the involved boxes to discover what the path MTU really is.
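
If there are many boxes to check, the tracepath runs can be scripted.
This is only a sketch: it assumes the iputils tracepath binary is on the
PATH and that its output reports the path MTU as "pmtu <number>", which
may differ between versions.  The function names are made up for this
example.

```python
import re
import subprocess

# Assumed output format: tracepath prints "pmtu <number>" on some lines.
PMTU_RE = re.compile(r"pmtu (\d+)")


def pmtu_values(tracepath_output):
    """Return every pmtu value found in tracepath's text output, in order."""
    return [int(value) for value in PMTU_RE.findall(tracepath_output)]


def path_mtu(host):
    """Run tracepath to host and return the smallest pmtu reported.

    Returns None if no pmtu value appears in the output.
    """
    result = subprocess.run(
        ["tracepath", "-n", host],  # -n: print addresses, skip DNS lookups
        capture_output=True,
        text=True,
        check=False,
    )
    values = pmtu_values(result.stdout)
    return min(values) if values else None
```

Calling path_mtu() for each peer, from each node in turn, and comparing
the results against the configured interface MTU should show whether
some direction of some path is narrower than expected.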


-- 
	Regards,
	T o m   M i t c h e l l



