How many nodes can a lan support?

Mike Eggleston mikee at mikee.ath.cx
Fri Jan 10 10:38:50 PST 2003


On Fri, 10 Jan 2003, Robert G. Brown wrote:

> On Fri, 10 Jan 2003, Mike Eggleston wrote:
> 
> > My guess is this question has been asked before, but I've not been
> > able to find it in the archive file. The question is: given a typical
> > 10Mb/s lan, how many nodes can a cluster support? Assume the cluster
> > has its own switch, the head and nodes are connected in a star with
> > the switch, the cluster lan is isolated from all other non-cluster
> > network traffic, the only way to reach a node is through the head,
> > ignore extra traffic from TCP handshakes and such, and the
> > data packet for a work unit is 1KB with a 100B results packet back
> > to the head.
> > 
> > How do I calculate this?
> 
> Are you serious about the 10 Mbps lan, as in 10BT ethernet?  Just
> asking...;-)
> 
> What you want to do is visit:
> 
>    http://www.ethermanage.com/ethernet/ethernet.html
> 
> (Charles Spurgeon's Ethernet Page).  Yes, it still has 10base
> configuration data and full explanations (vampire taps, anyone? :-).
> 
> Over the years, it has been THE place to learn about raw ethernet,
> although there are several other ethernet-related sites on the brahma
> page:
> 
>    http://www.phy.duke.edu/brahma
> 
> that are also most informative.
> 
> Some years ago I would have done a tour of the site, refreshed my own
> memory, and fully answered your question as well as given you the link.
> At this point though, 10-base isn't worth considering any more.
> Literally.  So my first suggestion would be to ask your question
> assuming switched 100-base from the beginning (also documented on this
> site, of course) -- it isn't any more expensive, after all.
> 
> In some sense, however, the answer at 10 base is going to be "more nodes
> than you'll ever be able to connect or use", depending on just how high
> a ratio of computation to communication you have for your < 1KB (single
> packet) messages.  This is because as long as you use switched
> technology, you can pretty much stack up the switches and/or routers
> indefinitely as far as ethernet is concerned.  That is not strictly
> true, since the latencies will eventually add up enough to cause
> problems, but it is true enough to get you far more nodes than your
> master node, with its incredibly bottlenecked 10BT connection, will be
> able to keep fed with work and collect results from.  Some tasks can be
> run across serial connections, or on different continents, and still
> proceed very satisfactorily.
> 
> Once upon a time not so very long ago, "most" lans were 10BT, and
> interconnected by switching/routing/bridging layers.  Even then,
> something like SETI or RC5 could be scaled out to tens of thousands of
> hosts or more, because they involve a lot of EP (embarrassingly
> parallel) computation for a tiny bit of communication (once the task
> software itself is distributed).
> OTOH, you haven't indicated how LONG a node will work in between those
> two packets, and whether the work has to be done to a barrier (so all
> nodes have to finish and return the result before any node can get the
> next work unit) or independently.  All of this matters.
> 
> If the node-work time is very short, say of order a millisecond or
> less, you may not be able to scale to TWO nodes with 10BT in between and
> realize a time saving relative to your master node just doing both work
> units itself.  This is simple arithmetic: you have around 1 MB/second
> total bandwidth, and around 1K of data (in two packets) to move around
> per task, so about 1000 tasks per second saturates the master's
> bandwidth.  If it is very long, say a day, you could likely cover a fair
> chunk of the eastern seaboard with your tasks and still get near-linear
> speedup, even over 10BT (how many millisecond time slices in 24 hours?
> About 10^8).  Even allowing for considerable chaos and overhead (one
> full second to connect to a node, start a task, and collect results),
> you might be able to use as many as 10^5 nodes...
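
The arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope model only, assuming ~1 MB/s usable on 10BT and ~1.1 KB of payload per task (the 1KB/100B packets from the question, protocol overhead ignored); the function names are illustrative, not from any library.

```python
# Back-of-the-envelope limits for a master feeding workers over one 10BT link.
# Assumptions (illustrative, not measured): ~1 MB/s usable throughput,
# 1 KB work packet + 100 B result packet per task, protocol overhead ignored.

USABLE_BYTES_PER_S = 1_000_000   # rough usable rate of 10 Mb/s ethernet
BYTES_PER_TASK = 1_000 + 100     # work unit out + result back


def max_tasks_per_second(link_bytes_per_s: float = USABLE_BYTES_PER_S,
                         bytes_per_task: float = BYTES_PER_TASK) -> float:
    """Task rate at which the master's link is saturated."""
    return link_bytes_per_s / bytes_per_task


def nodes_link_can_feed(task_seconds: float) -> float:
    """How many nodes the link alone could keep busy when each task runs
    task_seconds on a node (tasks/s the link carries x seconds per task)."""
    return max_tasks_per_second() * task_seconds


def nodes_overhead_allows(task_seconds: float, overhead_seconds: float) -> float:
    """How many nodes the master can keep busy if each task costs it
    overhead_seconds of serialized connect/start/collect work."""
    return task_seconds / overhead_seconds


if __name__ == "__main__":
    day = 24 * 3600  # seconds
    print(f"link saturates at ~{max_tasks_per_second():.0f} tasks/s")
    print(f"1 ms tasks: link can feed ~{nodes_link_can_feed(0.001):.2f} nodes")
    print(f"millisecond slices per day: {day * 1000:.2e}")
    print(f"day-long tasks, 1 s overhead each: ~{nodes_overhead_allows(day, 1.0):.0f} nodes")
```

With 1 ms tasks the link cannot keep even one remote node fully busy, which is the "not even TWO nodes" case; with day-long tasks the per-task overhead at the master, not the wire, is what caps things at roughly 10^5 nodes.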

That does help, thanks. Currently I am seeing 10000 units of work
completing in ~300 seconds on a single cpu box with the multi-threaded
app. That works out to ~33 work units per second. For my app I would like
to see 10000 units complete in ~20 seconds, or 500 units/sec. If the
amount of work per node is consistent, that would mean ~15 nodes? So if
on 15 nodes each node processed 667 units, would that amount of network
traffic saturate the cluster's lan such that in the end it would have
been better to stay on a single (or smp) box?
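
The sizing question can be checked with the same kind of arithmetic. A rough sketch, assuming the work splits evenly over nodes that each match the single box, counting only payload bytes (as the original question specifies), and taking ~1 MB/s usable for 10BT and ~10 MB/s for 100BT; the helper names are hypothetical.

```python
# Sizing sketch for the measurements above. Assumptions: work splits evenly
# over identical nodes, only payload bytes are counted (as in the original
# question), and ~1 MB/s (10BT) or ~10 MB/s (100BT) is usable at the head.
import math

UNITS = 10_000
BYTES_PER_UNIT = 1_000 + 100   # 1 KB work packet + 100 B result


def nodes_needed(single_box_seconds: float, target_seconds: float) -> int:
    """Nodes needed if each node matches the single box's throughput."""
    return math.ceil(single_box_seconds / target_seconds)


def link_utilization(units_per_s: float, link_bytes_per_s: float) -> float:
    """Fraction of the head's link taken by work and result payloads."""
    return units_per_s * BYTES_PER_UNIT / link_bytes_per_s


if __name__ == "__main__":
    n = nodes_needed(300, 20)          # measured ~300 s, target ~20 s
    target_rate = UNITS / 20           # 500 units/s
    print(f"{n} nodes, ~{UNITS / n:.0f} units each, {target_rate:.0f} units/s")
    for name, link in (("10BT", 1_000_000), ("100BT", 10_000_000)):
        print(f"{name}: head link ~{link_utilization(target_rate, link):.1%} busy")
```

On these assumptions the payload alone keeps roughly half of a 10BT head link busy at 500 units/s, before framing, TCP and PVM overhead are added back, while 100BT leaves about a tenfold margin.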

I have also seen the time required to process 10000 units reach
2200-2400 seconds, or roughly 4.2-4.5 units per second.

There is no barrier within the 10000 units, but all 10000 must complete
before the next round of 10000 is worked. On a previous attempt using
PVM I implemented a round-robin scheme where the head sent a unit to
A, then B, then C, and so on. The first node that returned its results
received the next unit. The head continued down the list of units until
reaching the bottom. At the bottom it looped back to the top and looked
for any lingering units that had not yet returned results, sending those
units to waiting nodes. In this way I could deal with nodes of different
speeds, and even with nodes that crashed, since not every node would
crash at the same time.
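
The dealing scheme described above can be modeled as a small discrete-event simulation, which makes it easy to see how slow or crashed nodes affect when a round of units finishes. This is a sketch in plain Python with no PVM calls; `run_round`, the node names, per-unit times and crash times are all hypothetical inputs, not the original PVM code.

```python
# Discrete-event sketch of the dealing scheme described above: hand units out
# in order, give the next unit to whichever node reports first, then wrap to
# the top and re-send any unit still missing a result. No PVM calls; the node
# names, per-unit times and crash times are hypothetical inputs.
import heapq
from typing import Dict, List, Optional, Set, Tuple


def run_round(n_units: int, seconds_per_unit: Dict[str, float],
              crash_at: Optional[Dict[str, float]] = None) -> float:
    """Return the simulated time at which every unit has a result."""
    crash_at = crash_at or {}
    done: Set[int] = set()
    events: List[Tuple[float, str, int]] = []   # (finish_time, node, unit)
    cursor = 0                                   # position in the unit list
    now = 0.0

    def pick_unit() -> Optional[int]:
        nonlocal cursor
        # First pass goes straight down the list; later passes wrap to the
        # top and hand out units that still have no result (lingering units).
        for _ in range(n_units):
            unit = cursor % n_units
            cursor += 1
            if unit not in done:
                return unit
        return None   # every unit already has a result

    def dispatch(node: str, t: float) -> None:
        unit = pick_unit()
        if unit is None:
            return
        finish = t + seconds_per_unit[node]
        if finish < crash_at.get(node, float("inf")):
            heapq.heappush(events, (finish, node, unit))
        # Otherwise the node dies mid-unit and never reports back.

    for node in seconds_per_unit:
        dispatch(node, 0.0)
    while len(done) < n_units and events:
        now, node, unit = heapq.heappop(events)
        done.add(unit)           # duplicate results from re-sent units are harmless
        dispatch(node, now)
    if len(done) < n_units:
        raise RuntimeError("every node crashed before the round finished")
    return now


if __name__ == "__main__":
    speeds = {"A": 0.03, "B": 0.03, "C": 0.06, "D": 0.03}   # hypothetical s/unit
    t = run_round(10_000, speeds, crash_at={"D": 5.0})
    print(f"round of 10000 units finished at t = {t:.2f} s (simulated)")
```

Because a unit can be re-sent while its first copy is still running, the sketch accepts whichever result arrives first and ignores later ones; a crashed node's unit is only recovered once the sweep wraps around to it.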

Oh, and I do mean 10baseT as a 10Mb/s network. I can go 100Mb/s if I
need to.

Mike



More information about the Beowulf mailing list