redhat and pxe

Donald Becker becker at scyld.com
Wed Jan 29 11:49:29 PST 2003


On Wed, 29 Jan 2003, Mark Hahn wrote:

> anyway, the thing I observed is that my tftp server winds up receiving
> a corrupted filename.  if dhcpd.conf says
> 	option bootfile-name "/pxelinux.0";
> then the tftp server winds up receiving "/pxelinux.0\xff".  (which looks
> like a y-umlaut iirc.)  I think I tcpdumped both dialogs, and concluded
> that the fault is in the bios.

Hmmm, I would blame the DHCP server, or perhaps the "proxy PXE" (sic)
server.  While the bug is in the client machine, this is a well-known
issue that has a trivial work-around.  Any PXE server that doesn't
implement it has obviously not been tested in real life!

The TFTP protocol uses null-terminated file names, while the
bootp/DHCP options are length-specified strings without a trailing
null.  Put this together with the end-of-options value of 255 (0xff),
and you get the result you observe.
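
A minimal sketch of that failure mode, in Python. It assumes the client
firmware treats the option value as a C string instead of honoring the
length byte, and that the packet is zero-padded after the end-of-options
marker. The helper names (encode_option, read_c_string) are made up for
illustration and are not taken from any real DHCP or PXE implementation.

```python
# Illustrative only: models a client that ignores the option length byte.
BOOTFILE_NAME = 67   # DHCP option code for the boot file name (RFC 2132)
END = 255            # end-of-options marker, 0xff

def encode_option(code, value):
    """Encode one option as code, length, value (no trailing null)."""
    return bytes([code, len(value)]) + value

def read_c_string(buf, start):
    """Hypothetical buggy client: read from start up to the first null byte."""
    return buf[start:buf.index(0, start)]

name = b"/pxelinux.0"
# Option 67, then the end marker, then zero padding to fill out the packet.
options = encode_option(BOOTFILE_NAME, name) + bytes([END]) + bytes(8)

value_start = 2                                   # skip code and length bytes
correct = options[value_start:value_start + options[1]]
buggy = read_c_string(options, value_start)

print(correct)   # the length-specified value
print(buggy)     # the value plus the 0xff end marker that followed it
```

The length-specified read returns the name as configured, while the
null-terminated read runs on into the 0xff marker, which is the extra
byte the TFTP server ends up seeing.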

The work-around is to put a '0' (pad) option, a single zero byte,
immediately after the boot file name option.  The result is a boot file
name that is both correctly length-specified and null-terminated.
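
A rough sketch of the work-around from the server side, again in Python
and again with made-up helper names (encode_option, parse_options,
read_c_string).  It assumes option code 0 is the one-byte pad option with
no length field, and shows that both a strict length-based parser and a
sloppy null-terminated reader recover the same file name.

```python
# Illustrative only: a pad byte after option 67 satisfies both readers.
PAD, BOOTFILE_NAME, END = 0, 67, 255

def encode_option(code, value):
    """Encode one option as code, length, value (no trailing null)."""
    return bytes([code, len(value)]) + value

def parse_options(buf):
    """Strict parser: honors length bytes, skips pad bytes, stops at END."""
    opts, i = {}, 0
    while i < len(buf):
        code = buf[i]
        if code == END:
            break
        if code == PAD:          # pad is a single byte with no length field
            i += 1
            continue
        length = buf[i + 1]
        opts[code] = buf[i + 2:i + 2 + length]
        i += 2 + length
    return opts

def read_c_string(buf, start):
    """Hypothetical buggy client: read from start up to the first null byte."""
    return buf[start:buf.index(0, start)]

name = b"/pxelinux.0"
# The pad byte sits between the file name and the end-of-options marker.
options = encode_option(BOOTFILE_NAME, name) + bytes([PAD, END])

strict = parse_options(options)[BOOTFILE_NAME]
sloppy = read_c_string(options, 2)
print(strict, sloppy)
```

The option length is unchanged, so a correct client is unaffected; the
pad byte only gives the broken client a null to stop on.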

We have recently done work implementing the whole protocol suite
(bootp/DHCP, TFTP and PXE servers) along with implementing the clients.
Why the clients?  We needed to simulate large-scale simultaneous boots.

I'm pretty much convinced that most other current implementations are
lacking or flawed.  A cluster needs PXE boot services that are
   protocol-correct
   reliable with hundreds of simultaneous clients
   able to relate configuration errors to their source

PXE was obviously developed around a hacked-up DHCP server, with the
intent to re-use existing servers.  But the details of the specification
don't isolate the functionality.  The result is that the only correct
way to implement it is with unified servers that are built together.

> I've also only been able to get the nodes
> to boot off their builtin eepro100 interface, rather than the e1000
> (also-builtin - these are tyan s2720 nodes).

This is very common right now.  The typical server-class board now has a
10/100 and a gigabit interface.  Network booting, Wake-On-LAN and IPMI
management are implemented only on the 10/100 interface.  But most users
want to operate using the gigabit interface.

Thus we have had to split our boot services from the operational
services, and go through a second discovery phase.  Despite the node
having its IP configuration passed from the boot client, it must discard
that info to figure out which interface it should operate on.

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993



