[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

Michael Lewis mclewis at ucdavis.edu
Wed Dec 2 13:06:43 PST 2009


On Wed, Dec 02, 2009 at 12:36:17PM -0800, Bill Broadley wrote:
> Art Poon wrote:
> > I've tried resetting the SMC switch to factory defaults (with
> > auto-negotiate on).  I've checked the /etc/beowulf/modprobe.conf and it
> > doesn't seem to be demanding anything exotic.  We've tried swapping out to
> > another SMC switch but that didn't change anything.
> 
> I had a very unpleasant experience with an SMC switch awhile back.  I was
> having problems trying to bootstrap a rocks cluster.  Turns out the SMC (and
> Dell relabel) was so evil that it warranted a mention in the Rocks FAQ.

I run the cluster that Bill is describing here.  Indeed, the SMC switches
shipped with spanning-tree enabled on every port by default.  The symptom we
had was that the machines would PXE boot fine and load a kernel, but then
fail to DHCP later.  That fits the usual spanning-tree behavior of holding a
port out of forwarding for a while after its link comes up, so a DHCP request
sent right after the kernel brings the NIC up can time out.  Even worse, the
switches would occasionally revert to this setting if they lost power.

Also, as Bill notes, there is a Dell rebrand of the same switch, which runs
slightly different firmware.  If you've got one of those, get the firmware
from the Dell site, not from SMC.

> I believe the solution was to manually turn on edge node routing or similar on
> each port.  Unfortunately there was a bug and you could only turn on the first
> 16 ports.  There was a fix with new firmware, but there were 2 firmware images
> and you couldn't tell which from looking at the switch.  Said firmware upgrade
> caused other problems.

Here is the fix we used.  Repeat for each port (replace 1/5 with 1/N for N=1..):

        Console#config
        Console(config)#interface ethernet 1/5
        Console(config-if)#spanning-tree edge-port

And don't forget to write the configuration back to flash when you're done,
or the change will not survive a power cycle.
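
On a switch with many ports, typing the above by hand gets tedious.  A minimal
Python sketch that prints the same two commands for every port, ready to paste
into the console session, might look like this.  The function name, the
default unit number 1, and the port count of 24 are illustrative assumptions,
as is the idea that the switch accepts the next "interface" line while still
in interface mode; check those against your own switch.

```python
def edge_port_commands(num_ports, unit=1):
    """Build the console lines that set spanning-tree edge-port on
    ports unit/1 through unit/num_ports (hypothetical helper)."""
    if num_ports < 1:
        raise ValueError("num_ports must be at least 1")
    lines = ["config"]  # enter configuration mode once
    for port in range(1, num_ports + 1):
        # Same two commands as in the console session, one pair per port.
        lines.append(f"interface ethernet {unit}/{port}")
        lines.append("spanning-tree edge-port")
    return lines


if __name__ == "__main__":
    # 24 is an assumed port count; use the number of ports on your switch.
    print("\n".join(edge_port_commands(24)))
```

The output deliberately leaves out the save step, so writing the configuration
to flash still has to be done by hand afterwards.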

Since the firmware updates, the configuration has not reset itself again, but
we have also moved the cluster to a better switch.  The SMCs now serve other,
non-cluster machines.

-- 
Michael Lewis                       |   mclewis at ucdavis.edu
Systems Administrator               |   Voice: (530) 754-7978
Genome Center                       |  
University of California, Davis     |   


