PXE/TFTP (was Re: Bonded head nodes)

Sat Nov 9 08:58:07 PST 2002

> >    The basic idea behind my SEDA-TFTP is that there are two queues,
> > one for incoming connections, and one for current connections.  The TFTP
> > server only services a set number of connections, and will hold off
> > sending the first packet to a new connection until it has completed one
> > of the old connections.
>
> Good idea -- I didn't know about Matt's PhD project.  A complexity is
> that you cannot hold off making the TFTP connection for too long.  If a
> PXE client times out on the request it usually does Something Pointless,
> like giving up and not rebooting.

   Well, there's always going to be a situation in which the server gets
overloaded with requests.  The question is what you do to deal with
this overload.  I say that if you've got X clients making requests, but
only enough bandwidth/CPU/whatever to handle Y clients, you should let Y
clients succeed and have (X - Y) fail completely, rather than giving
each client Y/Xth of their packets. :)
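
   To make the two-queue idea concrete, here is a minimal sketch in
Python.  It is only an illustration, not code from any existing TFTP
server: the class name, the method names, and the max_active limit (the
"Y" above) are all made up for the example, and it ignores the PXE
timeout problem entirely.

```python
from collections import deque


class AdmissionQueue:
    """Hypothetical two-queue admission control: a bounded set of
    active transfers plus a FIFO of requests waiting for a slot."""

    def __init__(self, max_active):
        self.max_active = max_active  # Y: transfers we can really service
        self.active = set()           # clients currently being served
        self.pending = deque()        # clients held off until a slot frees

    def request(self, client):
        """Handle a (possibly retransmitted) request.

        Returns True if the first data packet may be sent now,
        False if the client has to wait."""
        if client in self.active:
            return True
        if client in self.pending:
            return False  # duplicate request, still waiting
        if len(self.active) < self.max_active:
            self.active.add(client)
            return True
        self.pending.append(client)
        return False

    def complete(self, client):
        """A transfer finished; promote the oldest waiting client.

        Returns the promoted client, or None if nobody was waiting."""
        self.active.discard(client)
        if self.pending and len(self.active) < self.max_active:
            promoted = self.pending.popleft()
            self.active.add(promoted)
            return promoted
        return None
```

   With max_active set to 2, a third client is parked until one of the
first two finishes, so the clients that do get in never share bandwidth
with more than one other transfer.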

> > This would help to limit the problems with
> > current TFTP server implementations: clients, once they got through,
> > would be guaranteed quick and complete service.
>
> A better approach is detecting overload and round-robin servicing the
> clients.  But I haven't figured out what "detecting overload" means,
> especially when there are no RFC-suggested timeout values.
>
> Hmmm, but SEDA could play a role here: if your round-robin gets close to
> the per-packet timeout, you want to give up on new connections.

   SEDA can actually do a bit better than that.  There are two reasons
that the RTT would approach the timeout value: either the server is
overloaded with requests, or one particular client is slowing down
(e.g., Greg's PCMCIA card).  SEDA can detect the first situation when its
internal request queues are getting too long, i.e., packets are arriving
too fast.  The second situation is when the queues are of a reasonable
length, but the RTT for one or more clients is getting high.
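
   Roughly, the distinction could be sketched like this in Python.
Everything here is an assumption for the sake of illustration: the
function name, the max_queue and timeout parameters, and the 0.8 margin
are invented, and since there are no RFC-suggested timeout values the
real numbers would have to come from measurement.

```python
def classify(queue_len, rtts, max_queue, timeout, margin=0.8):
    """Hypothetical check separating server overload from slow clients.

    queue_len: current length of the internal request queue
    rtts:      dict mapping client id -> most recent round-trip time (s)
    max_queue: queue length beyond which we call the server overloaded
    timeout:   assumed per-packet client timeout (s)
    margin:    fraction of the timeout at which an RTT counts as "high"
    """
    # Case 1: the queue is too long, so packets are arriving faster
    # than we can service them. Stop admitting new connections.
    if queue_len > max_queue:
        return "overload", []

    # Case 2: the queue looks fine, but some clients are slow anyway.
    slow = sorted(c for c, rtt in rtts.items() if rtt >= margin * timeout)
    if slow:
        return "slow_clients", slow

    return "ok", []
```

   For example, classify(3, {"a": 0.1, "b": 0.9}, max_queue=32,
timeout=1.0) reports client "b" as slow without blaming the server.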

> There is also the issue of monotonic progress: if you start losing clients
> to timeouts, you might want to throw away fairness to make certain that
> some clients succeed, while letting others stay just under the timeout
> limit.

   *nod*

> >    If no one else writes such a beast (*nudge**nudge*) I plan to, but
> > only in my Copious Free Time. :)
>
> Great!

   Well, I wouldn't start holding my breath just now. :)

> But there is a very low reward.
> Even around here I get "why write another TFTP program?"

   That's ok.  I described my research to you at Cluster2002 and got a
confused "why would you do that" look. :)

> These things often take more time than you expect.  Writing a simple
> server is simple.  Creating a test setup that causes failures in a way
> that lets you figure out what went wrong is Really Very Difficult.

   I have no doubt this would take a while.  But what are grad students
for, if not to do useless busy-work that no one in their right mind
would undertake?

-jdm

Department of Computer Science, Duke University, Durham, NC 27708-0129
Email:	justin at cs.duke.edu
Web:	http://www.cs.duke.edu/~justin/