[Beowulf] 10GbE LACP question for a cluster
landman at scalableinformatics.com
Sat Jun 28 07:26:04 PDT 2014
I stumbled onto this problem yesterday. Somewhat surprising, and I
figured I'd tap the broader community.
Installed one of our Unison storage units at a customer site. They
have a Brocade 10GbE switch pair and set up a multi-port LAG across both
switches. Unison and management/support nodes all have Chelsio T5
cards. Everything links to the switch, no issues on the physical
connectivity. Link lights, whole 9 yards.
I set up the Chelsio as 802.3ad with default miimon (200ms) across
ports. Without the bonding setup, I saw LACP packets, about 1/sec
coming in from the switch. That's fine. But with bonding configured, I
could not see any traffic at all. Switch reported no LACP bond formation.
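For anyone wanting to check the same condition on the Linux side, here is a minimal sketch (not what I ran on site) that reads the bonding driver's status text and reports whether an LACP partner was ever learned. It assumes the status file carries "Partner Mac Address:" lines, as the 802.3ad section of /proc/net/bonding/<bond> normally does, and that an all-zero partner MAC means no LACPDU from the switch was accepted. The bond name bond0 and the function names are placeholders.

```python
import re

# An all-zero partner MAC is assumed to mean "no LACP partner learned".
NULL_MAC = "00:00:00:00:00:00"

# Matches lines such as "Partner Mac Address: 00:00:00:00:00:00".
PARTNER_RE = re.compile(
    r"Partner Mac Address:\s*([0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)


def lacp_partner_macs(bonding_text):
    """Return every partner MAC found in the bonding status text, lowercased."""
    return [mac.lower() for mac in PARTNER_RE.findall(bonding_text)]


def lacp_formed(bonding_text):
    """True if at least one non-zero partner MAC is present."""
    macs = lacp_partner_macs(bonding_text)
    return any(mac != NULL_MAC for mac in macs)


def check_bond(path="/proc/net/bonding/bond0"):
    """Read the status file for a bond (bond0 is an assumed name) and test it."""
    with open(path) as handle:
        return lacp_formed(handle.read())
```

Calling check_bond() on the storage unit should return False in the failure case described above and True once the switch and host agree on an aggregator.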
I varied LACP settings. Tried it on a different machine with a
different kernel. No luck.
I had them turn off LACP and configured the ports individually. That
worked, sort of. Still looks like a MAC hashing/cache issue (that
really makes no sense, but I am looking into it) on the switch side
(that I might be able to affect on the port side). But I can pass
traffic from some of the ports to the others all of the time. I can
disable ports and pass it from others.
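To illustrate why a MAC hashing problem would show up as "some port pairs work, others don't", here is a small sketch of MAC-XOR port selection. It is a simplified form of the Linux bonding driver's layer2 transmit hash (the packet type term is left out), and it is only an assumption that the Brocade does something comparable; I do not know its actual algorithm. The function name and the MAC addresses in the usage note are made up.

```python
def layer2_port_index(src_mac, dst_mac, port_count):
    """Pick an egress port by XORing the last byte of each MAC address.

    Simplified illustration only: real implementations may mix in more
    fields (packet type, IP addresses, L4 ports).
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % port_count
```

With two ports, layer2_port_index("00:07:43:00:00:01", "00:07:43:00:00:02", 2) selects port 1, and the same pair of addresses always selects that same port. If the peer does not expect that pair on that port, that one conversation fails while others pass.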
Anyone ever run into this? I am not familiar with Brocade switches. I've
used this setup with Arista, Mellanox, and others with no problems.
This is the first time I've run into LACP failing to form. Oh, and it
worked in our lab on our Arista switch, so ...
Any thoughts? I was thinking of doing something like a bridge or
vswitch formation and bonding the two ports to that. I'll play with
that in the lab and see if that works. Otherwise might just need to go
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615