[Beowulf] Re: Booting nodes with PXE...

Cris Rhea crhea at mayo.edu
Wed Dec 2 11:14:29 PST 2009


> > What's got me and the IT guys stumped is that while the compute nodes
> > boot via PXE from the head node without trouble on the NetGear, they
> > barf with the SMC.  To be specific, after the initial boot with a
> > minimal Linux kernel, there is a "fatal error" with "timeout waiting
> > for getfile" when the compute node attempts to download the
> > provisioning image from head.  However, when they were running Rocks
> > before I arrived, the cluster worked fine with the SMC switch.
> 
> Switches sometimes have broadcast storm suppression turned on, or worse, 
> sometimes they have spanning tree turned on.  You want the switch to be 
> as dumb as you can possibly make it for most linux clusters.  Fast, but 
> dumb.

As others have already commented, the first step is to test each service
(DHCP, tftp, etc.) on its own; I'm assuming you have done that.

My bet is on "spanning tree", as mentioned above. Watch the Ethernet link
lights on the node while it boots and see whether the port comes up and
stays stable before you get the timeout. I've seen this in spades when
"spanning tree portfast" isn't set on Cisco switches: after each link reset
on the GbE interface, the port just takes too long to start passing traffic
(with classic spanning tree and default timers, roughly 30 seconds).
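
If watching the lights is inconclusive, a rough way to put a number on the
delay is to bounce the link on a node that is already running Linux (unplug
and replug the cable, or take the interface down and up) and time how long
it takes before the head node answers again. Below is a minimal sketch in
Python, not a tested tool. It assumes a Linux node with the iputils "ping"
command (the -c and -W options); the function names and the head node
address in the usage note are made up for illustration.

```python
import subprocess
import time


def ping_once(host):
    """Return True if a single ICMP echo to host is answered.

    Assumes Linux iputils ping: -c 1 sends one packet, -W 1 waits
    at most about one second for the reply.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def seconds_until_reachable(probe, timeout=60.0, interval=0.5,
                            clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True.

    Returns the elapsed time in seconds at the first success, or None
    if timeout seconds pass without one. clock and sleep can be
    replaced, which makes the loop easy to exercise without a network.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Start it right after bouncing the link, for example with a hypothetical
head node at 10.1.1.1:

    elapsed = seconds_until_reachable(lambda: ping_once("10.1.1.1"))
    print(elapsed)

A result of a second or two suggests the port forwards almost immediately.
A result near 30 seconds is consistent with spanning tree holding the port
in its listening and learning states, though other causes are possible.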


--- Cris

-- 
 Cristopher J. Rhea                     
 Mayo Clinic - Research Computing Facility
 200 First St SW, Rochester, MN 55905
 crhea at Mayo.EDU
 (507) 284-0587


