[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
ashley at pittman.co.uk
Thu Dec 3 01:20:58 PST 2009
On Wed, 2009-12-02 at 14:58 -0500, Joe Landman wrote:
> David Mathog wrote:
> > What's got me and the IT guys stumped is that while the compute nodes
> > boot via PXE from the head node without trouble on the NetGear, they
> > barf with the SMC. To be specific, after the initial boot with a
> > minimal Linux kernel, there is a "fatal error" with "timeout waiting for
> > getfile" when the compute node attempts to download the provisioning
> > image from head. However, when they were running Rocks before I
> > arrived, the cluster worked fine with the SMC switch.
> Wondering aloud whether or not the ethernet driver has been correctly
> included in the kernel/initrd for the PXE booted image. I've
> seen/experienced this before, PXE works fine, the kernel boots, and is
> missing the ethernet driver.
Or the new distro you are trying enumerates the ethernet devices
differently and it's trying to load the getfile from a different,
unconnected ethernet port. That's fairly common as well. It could even
be worse than that, in that the enumeration could be non-deterministic,
to really confuse you.
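
One way to check both theories (missing driver, or the wrong port being
picked) is to look at what the booted kernel actually sees under
/sys/class/net. What follows is only a rough sketch, not something taken
from any provisioning tool: list_interfaces and _read are names made up for
this example, and it assumes a Linux node with sysfs mounted and Python
available. Comparing the MAC addresses it prints against the one that did
the PXE boot shows whether the names moved around.

```python
import os


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # e.g. 'carrier' cannot be read while the interface is down
        return None


def list_interfaces(sysfs_root="/sys/class/net"):
    """Hypothetical helper: report name, MAC, bound driver and link state per NIC."""
    if not os.path.isdir(sysfs_root):
        return []
    found = []
    for name in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, name)
        # Skip loopback and plain files such as bonding_masters
        if name == "lo" or not os.path.isdir(base):
            continue
        # device/driver is a symlink to the bound driver; absent if none is bound
        driver_link = os.path.join(base, "device", "driver")
        driver = None
        if os.path.islink(driver_link):
            driver = os.path.basename(os.readlink(driver_link))
        found.append({
            "name": name,
            "mac": _read(os.path.join(base, "address")),
            "driver": driver,
            "link": _read(os.path.join(base, "carrier")) == "1",
        })
    return found


if __name__ == "__main__":
    for nic in list_interfaces():
        print("{name:10} {mac}  driver={driver}  link={link}".format(**nic))
```

If the port cabled to the switch does not show up at all, the driver is
probably missing from the initrd. If it shows up under a different name
than the installer expects, or with link=False while another port has
link=True, the enumeration is the more likely culprit. Where the initrd has
no Python, the same files can be read by hand, e.g.
cat /sys/class/net/*/address.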
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing