[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

Ashley Pittman ashley at pittman.co.uk
Thu Dec 3 01:20:58 PST 2009


On Wed, 2009-12-02 at 14:58 -0500, Joe Landman wrote:
> David Mathog wrote:
> > What's got me and the IT guys stumped is that while the compute nodes
> > boot via PXE from the head node without trouble on the NetGear, they
> > barf with the SMC.  To be specific, after the initial boot with a
> > minimal Linux kernel, there is a "fatal error" with "timeout waiting for
> > getfile" when the compute node attempts to download the provisioning
> > image from head.  However, when they were running Rocks before I
> > arrived, the cluster worked fine with the SMC switch.
> 
> Wondering aloud whether or not the ethernet driver has been correctly 
> included in the kernel/initrd for the PXE booted image.  I've 
> seen/experienced this before, PXE works fine, the kernel boots, and is 
> missing the ethernet driver.

Or the new distro you are trying enumerates the ethernet devices
differently and it's trying to load the getfile from a different
unconnected ethernet port.  That's fairly common as well.  It could even
be worse than that, in that the enumeration could be non-deterministic,
just to really confuse you.
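
One way to check both theories (missing driver, or the wrong port being
used) is to look at what the booted kernel actually sees.  What follows is
only a rough sketch, and it assumes Python is available on the node and that
the standard Linux sysfs layout under /sys/class/net is present; the function
name list_interfaces and its sysfs_root parameter are made up for
illustration.  It reports the MAC address, link state and bound driver for
each interface, so you can compare names against MACs across reboots.

```python
import os


def _read(base, attr):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(os.path.join(base, attr)) as handle:
            return handle.read().strip()
    except OSError:
        # e.g. 'carrier' cannot be read while the interface is down
        return None


def list_interfaces(sysfs_root="/sys/class/net"):
    """Describe each network interface found under sysfs_root.

    Hypothetical helper: returns a list of dicts with the keys
    name, mac, carrier ("1" means link detected) and driver.
    """
    result = []
    for name in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, name)
        if not os.path.isdir(base):
            continue  # skip plain files such as bonding_masters
        driver_link = os.path.join(base, "device", "driver")
        driver = None
        if os.path.exists(driver_link):
            # the symlink target's last component is the driver name
            driver = os.path.basename(os.path.realpath(driver_link))
        result.append({
            "name": name,
            "mac": _read(base, "address"),
            "carrier": _read(base, "carrier"),
            "driver": driver,
        })
    return result
```

If the NIC you expect is missing from the list, or has no driver, that
points at the initrd.  If it is there but "carrier" is "1" on a different
name than the one the provisioning step uses, that points at enumeration.
Without Python in the image, the same files can be read by hand with cat.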

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



