oversized ethernet and channel-bonding

Jon Tegner tegner at nada.kth.se
Sat Jan 20 03:10:22 PST 2001


I have a small 9-node cluster with 100 Mbit D-Link DFE-530TX NICs and a
DES-3225G switch.

I had it working quite well with channel bonding, and for my application
(CFD), using a code developed in house, a substantial increase in
performance was noted. But when I tried a commercial code on a similar
application, the performance dropped substantially.

Both codes use the same version of mpich (1.1.2), but in the second case
the log files get flooded by messages of the type

"Oversized Ethernet frame c7fce480 vs c7fce480"

and

"Oversized Ethernet frame spanned multiple buffers, entry 0x146509
length 0 status 0600!"
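
To put a number on "flooded", the two message types could be tallied
separately. A minimal sketch in Python, assuming the kernel messages end up
as one message per line in a plain-text log; the function name
count_oversized and the regular expressions are hypothetical and based only
on the two samples quoted above:

```python
import re
from collections import Counter

# Patterns are assumptions derived from the two sample messages quoted above.
PATTERNS = [
    ("spanned", re.compile(r"Oversized Ethernet frame spanned multiple buffers")),
    ("oversized", re.compile(r"Oversized Ethernet frame [0-9a-f]+ vs [0-9a-f]+")),
]

def count_oversized(lines):
    """Count how many lines match each of the two message types."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS:
            if pattern.search(line):
                counts[name] += 1
                break  # a line is counted under at most one type
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python count_oversized.py < logfile
    for name, n in count_oversized(sys.stdin).items():
        print(name, n)
```

Running this on the logs from a bonded and a non-bonded run of each code
would show whether the messages appear only with bonding enabled.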

After I switched off the bonding, the performance of the commercial code
went back to what can be expected (with a slight decrease for the
in-house one).

The system runs kernel 2.2.17 (with the drivers supplied with the
kernel), and I will update to the latest via-rhine driver from Scyld's home
page. Still, I think it is odd that the bonding works with one
application and not the other (both using the same version of mpich),
and I'm wondering if anyone else has experienced similar problems?

Thanks,

Jon Tegner




