[Beowulf] Re: Booting nodes with PXE...
crhea at mayo.edu
Wed Dec 2 11:14:29 PST 2009
> > What's got me and the IT guys stumped is that while the compute nodes
> > boot via PXE from the head node without trouble on the NetGear, they
> > barf with the SMC. To be specific, after the initial boot with a
> > minimal Linux kernel, there is a "fatal error" with "timeout waiting
> > for getfile" when the compute node attempts to download the
> > provisioning image from head. However, when they were running Rocks
> > before I arrived, the cluster worked fine with the SMC switch.
> Switches sometimes have broadcast storm suppression turned on, or worse,
> sometimes they have spanning tree turned on. You want the switch to be
> as dumb as you can possibly make it for most linux clusters. Fast, but
As some have already commented, I'm assuming you have tested each service
(DHCP, TFTP, etc.).
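As a sketch of what testing the TFTP stage by hand might look like, here is a minimal Python client for a plain RFC 1350 read request (octet mode, default 512-byte blocks, no option negotiation). The function names, the 5-second timeout and the example boot file name are illustrative assumptions, not part of any provisioning tool. It does not exercise DHCP, and it does not exercise whatever transport sits behind the "getfile" step.

```python
import socket
import struct

# Opcodes from RFC 1350 (TFTP revision 2).
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
TFTP_PORT = 69
BLOCK_SIZE = 512  # default block size when no options are negotiated


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a read request: 2-byte opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_reply(packet: bytes):
    """Decode a reply into ("data", block, payload) or ("error", code, message)."""
    opcode, number = struct.unpack("!HH", packet[:4])
    if opcode == OP_DATA:
        return ("data", number, packet[4:])
    if opcode == OP_ERROR:
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return ("error", number, message)
    raise ValueError(f"unexpected TFTP opcode {opcode}")


def fetch_file(server: str, filename: str, timeout: float = 5.0) -> bytes:
    """Download one file; raises socket.timeout if the server stops answering."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        chunks, expected = [], 1
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            kind, number, body = parse_reply(packet)
            if kind == "error":
                raise RuntimeError(f"TFTP error {number}: {body}")
            # The server replies from a new port (its transfer ID); ACK to that port.
            sock.sendto(struct.pack("!HH", OP_ACK, number), peer)
            if number != expected:
                continue  # duplicate of a block we already have; re-ACKing is enough
            chunks.append(body)
            expected = (expected + 1) & 0xFFFF  # block numbers wrap at 16 bits
            if len(body) < BLOCK_SIZE:
                return b"".join(chunks)  # a short block marks end of file
    finally:
        sock.close()
```

Usage, with a placeholder head-node address and file name: `data = fetch_file("192.168.1.1", "pxelinux.0")`. If this completes quickly from a host that is already up on the SMC switch, the TFTP daemon itself is probably fine, which leaves link-level delays such as spanning tree as a suspect.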
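My bet is on spanning tree, as mentioned above. Watch the Ethernet lights
on the node while it boots and see whether the port comes up and stays stable
before you get the timeout. I've seen this in spades on Cisco switches when
"spanning-tree portfast" isn't set: after the GbE link comes up, the port just
takes too long to start forwarding. That would fit your symptoms, since the link
typically drops and renegotiates when the minimal kernel loads its own NIC
driver, so the switch makes the port wait again right before the image download.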
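A rough worked figure, assuming classic IEEE 802.1D spanning tree with its default timers (Rapid Spanning Tree and per-switch settings will differ): a port whose link has just come up passes through the listening and learning states, each lasting one forward-delay period of 15 s, before it forwards any frames.

```latex
\[
t_{\text{forward}} = t_{\text{listen}} + t_{\text{learn}}
                   = 2\,T_{\text{fwd delay}}
                   = 2 \times 15\ \text{s} = 30\ \text{s}
\]
```

If the step that reports "timeout waiting for getfile" gives up sooner than that, it would fail even with every service on the head node working.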
Cristopher J. Rhea
Mayo Clinic - Research Computing Facility
200 First St SW, Rochester, MN 55905
crhea at Mayo.EDU