how many nodes can a lan support?

Fri Jan 10 10:22:53 PST 2003

On Fri, 10 Jan 2003, Mike Eggleston wrote:

> My guess is this question has been asked before, but I've not been
> able to find it in the archive file. The question is given a typical
> 10Mb/s lan how many nodes can a cluster support? Assume the cluster
> has its own switch, the head and nodes are connected in a star with
> the switch, the cluster lan is isolated from all other non-cluster
> network traffic, the only way to reach a node is through the head,
> ignore extra traffic from TCP handshakes and such, and the
> data packet for a work unit is 1KB with a 100B results packet back
> to the head.
> 
> How do I calculate this?

Are you serious about the 10 Mbps lan, as in 10BT ethernet?  Just
asking...;-)

What you want to do is visit:

   http://www.ethermanage.com/ethernet/ethernet.html

(Charles Spurgeon's Ethernet Page).  Yes, it still has 10base
configuration data and full explanations (vampire taps, anyone? :-).

Over the years, it has been THE place to learn about raw ethernet,
although there are several other ethernet-related sites on the brahma
page:

   http://www.phy.duke.edu/brahma

that are also most informative.

Some years ago I would have done a tour of the site, refreshed my own
memory, and fully answered your question as well as given you the link.
At this point though, 10-base isn't worth considering any more.
Literally.  So my first suggestion would be to ask your question
assuming switched 100-base from the beginning (also documented on this
site, of course) -- it isn't any more expensive, after all.

In some sense, however, the answer at 10 base is going to be "more nodes
than you'll ever be able to connect or use", depending on just how much
computation to communication you do for your < 1KB (single packet)
messages.  This is because as long as you use switched technology, you
can pretty much stack up the switches and/or routers indefinitely as far
as ethernet is concerned.  That is not strictly true, since the
latencies will eventually add up enough to cause problems, but it is
true enough to get you far more nodes than your master node, with its
incredibly bottlenecked 10BT connection, is likely to be able to keep
fed and retrieve data from.  Some tasks can be run across serial
connections, or on different continents, and still proceed very
satisfactorily.

Once upon a time not so very long ago, "most" lans were 10BT, and
interconnected by switching/routing/bridging layers.  Even then,
something like SETI or RC5 could be scaled out to tens of thousands of
hosts or more, because they involve a lot of EP computation for a tiny
bit of communication (once the task software itself is distributed).
OTOH, you haven't indicated how LONG a node will work in between those
two packets, and whether the work has to be done to a barrier (so all
nodes have to finish and return the result before any node can get the
next work unit) or independently.  All of this matters.
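
To see why the barrier question matters, here is a toy sketch in Python.
The per-node work times are made up for illustration, the function names
are hypothetical, and communication time is ignored entirely; it only
shows how one slow node drags down a barrier-synchronized run.

```python
# Toy comparison of barrier vs. independent dispatch.
# All numbers are invented for illustration; communication is ignored.

def barrier_time(work_times, rounds):
    """With a barrier, every round lasts as long as the slowest node."""
    return rounds * max(work_times)


def independent_units(work_times, total_time):
    """Without a barrier, each node starts a new unit as soon as it
    finishes the previous one, so count whole units per node."""
    return sum(int(total_time // t) for t in work_times)


work_times = [1.0, 1.0, 1.0, 2.0]  # seconds per work unit on four nodes
rounds = 10

elapsed = barrier_time(work_times, rounds)
units_barrier = rounds * len(work_times)
units_independent = independent_units(work_times, elapsed)

print(elapsed, units_barrier, units_independent)
```

In the same elapsed time the independent nodes finish more work units,
because the three fast nodes never sit idle waiting for the slow one.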

If the node-work time is very short -- say, order of a millisecond or
less -- you may not be able to scale to TWO nodes with 10BT in between
and realize a time saving relative to your master node just doing both
work units itself.  This is simple arithmetic -- you have around 1
MB/second total bandwidth, and around 1.1 KB of data (in two packets) to
move around per task, so roughly 1000 tasks per second saturates the
master's bandwidth.  If it is very long -- say a day -- you could likely
cover a fair chunk of the eastern seaboard with your tasks and still get
near-linear speedup, even over 10BT (how many millisecond time slices in
24 hours?  About 10^8).  Even allowing for considerable chaos and
overhead (one full second to connect to a node, start a task, and
collect results), you might be able to use as many as 10^5 nodes...
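
The arithmetic above can be sketched in a few lines of Python.  This is
a back-of-envelope model only: the ~1 MB/second usable bandwidth is a
round-number assumption, protocol overhead is ignored as the question
asked, tasks are assumed independent (no barrier), and the function
names are hypothetical.

```python
# Back-of-envelope model of a master feeding workers over one 10BT link.
# Assumptions: ~1 MB/s usable bandwidth, no protocol overhead,
# independent tasks (no barrier).

LINK_BYTES_PER_SEC = 1.0e6  # assumed usable bandwidth of the master's link
WORK_UNIT_BYTES = 1000      # 1KB work unit sent to a node
RESULT_BYTES = 100          # 100B result returned to the head


def comm_time_per_task(overhead_sec=0.0):
    """Seconds the master spends per task: wire time plus fixed overhead."""
    wire_time = (WORK_UNIT_BYTES + RESULT_BYTES) / LINK_BYTES_PER_SEC
    return wire_time + overhead_sec


def max_busy_nodes(work_sec, overhead_sec=0.0):
    """Rough number of nodes the master can keep fed if each task
    computes for work_sec seconds on its node."""
    t_comm = comm_time_per_task(overhead_sec)
    return int((work_sec + t_comm) / t_comm)


if __name__ == "__main__":
    # Tasks per second at which the master's link saturates.
    print(round(1.0 / comm_time_per_task()))
    # One millisecond of work per task.
    print(max_busy_nodes(0.001))
    # One day of work per task, one full second of overhead per task.
    print(max_busy_nodes(86400.0, overhead_sec=1.0))
```

With these assumptions the millisecond case cannot keep even two nodes
busy, while the one-day case lands just under 10^5 nodes, in line with
the estimates above.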

  HTH,

     rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu