[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

Bogdan Costescu bcostescu at gmail.com
Fri Dec 4 13:23:06 PST 2009


On Thu, Dec 3, 2009 at 9:17 PM, Greg Keller <Greg at keller.net> wrote:
> Essentially, once the port
> has a physical link light it may take a while before spanning tree allows
> traffic to actually flow through the port.  Longer than a typical timeout.

The time taken before the port starts forwarding is around 60s, but
I've been told that it can be even higher. I've often seen laptops
randomly fail to get addresses via DHCP because the DHCP timeout and
the STP delay on a Cisco switch were both around 60s, which makes for
very frustrating network diagnostics.
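
To make the race concrete, here is a minimal sketch that checks whether
any DHCP attempt falls after the port has started forwarding. The
function names are hypothetical, and the retry schedule (4s initial
timeout, doubling each time) is only an assumption loosely modelled on
the backoff suggested in RFC 2131; real clients and PXE ROMs differ.

```python
from typing import List, Optional, Sequence


def dhcp_attempt_times(initial_timeout: float, attempts: int,
                       backoff: float = 2.0) -> List[float]:
    """Times (seconds after link-up) at which a client sends a DISCOVER.

    Assumed schedule: first attempt at t=0, then wait `initial_timeout`,
    multiplying the wait by `backoff` after every attempt.
    """
    times = []
    t = 0.0
    timeout = initial_timeout
    for _ in range(attempts):
        times.append(t)
        t += timeout
        timeout *= backoff
    return times


def first_attempt_after_forwarding(stp_delay: float,
                                   attempt_times: Sequence[float]) -> Optional[float]:
    """Return the first attempt sent once the port forwards, or None.

    `stp_delay` is the time between link-up and the port forwarding
    traffic; attempts sent earlier are silently dropped by the switch.
    """
    for t in attempt_times:
        if t >= stp_delay:
            return t
    return None


# Assumed client: 4 attempts at t = 0, 4, 12 and 28 seconds.
schedule = dhcp_attempt_times(initial_timeout=4.0, attempts=4)

# With a 30s delay before forwarding, every attempt is lost.
lost = first_attempt_after_forwarding(30.0, schedule)

# With a 10s delay, the attempt at t=12 gets through.
ok = first_attempt_after_forwarding(10.0, schedule)
```

When the client's whole retry window is about the same length as the
switch's delay, success depends on small timing differences, which
matches the "random" failures described above.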

>  When loading/reloading the driver there seems to be an instantaneous drop
> of the link that forces a new delay cycle.

Most likely the PXE stack doesn't reset the link; the link is up soon
after the computer is powered on so, by the time the POST has
finished, the port is already forwarding. Again most likely, the Linux
driver does a link reset as part of its initialization; I remember that
the 3c59x driver was changed ~6 years ago to not do this anymore (at
Don Becker's suggestion, IIRC), which allowed the established link to
remain active and made DHCP succeed every time.
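
If the driver cannot be changed, one client-side way around a link
reset is to keep retrying for longer than the switch's delay. The
sketch below is only an illustration under that assumption:
`retry_until` and its parameters are hypothetical names, and `action`
stands in for whatever actually requests a lease.

```python
import time
from typing import Callable


def retry_until(action: Callable[[], bool], window: float, interval: float,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `action` every `interval` seconds until it returns True.

    Gives up once the next attempt would fall after `window` seconds.
    `window` should be chosen longer than the switch's forwarding delay,
    so that at least one attempt is made after the port forwards.
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + window
    while True:
        if action():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)


def simulate(window: float, forwarding_after: float, interval: float = 5.0) -> bool:
    """Run retry_until against a fake clock and a fake switch port."""
    now = [0.0]

    def fake_clock() -> float:
        return now[0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    def fake_dhcp() -> bool:
        # Succeeds only once the simulated port has started forwarding.
        return now[0] >= forwarding_after

    return retry_until(fake_dhcp, window, interval, fake_clock, fake_sleep)
```

For example, `simulate(window=90.0, forwarding_after=30.0)` succeeds,
while `simulate(window=20.0, forwarding_after=30.0)` gives up before the
port forwards.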

Bogdan



