how many nodes can a lan support?
Robert G. Brown rgb at phy.duke.edu
Fri Jan 10 12:38:57 PST 2003
On Fri, 10 Jan 2003, Mike Eggleston wrote:

> That does help, thanks. Currently I am seeing 10000 units of work
> completing in ~300 seconds on a single cpu box with the multi-threaded
> app. That works to ~33 work units per second. For my app I would like to

Multi-threaded app?  Task swapping between lots of threads adds
significant kernel overhead.  You'd much rather work in completely
serial fashion through the work units and avoid the overhead.  Forking
is expensive, context switches and so forth are expensive.

> see 10000 units complete in ~20 seconds or 500 units/sec. If the amount

Well, in principle you can achieve this on 10Base -- that is about 500
KB/sec of bandwidth required by the master to deliver the data and
retrieve results, out of about a MB/sec of theoretical bandwidth.  I'm
pretty sure you're safely within the practical latency limits as well --
the 1000 byte packet should be no problem as it is close to the MTU and
hence bandwidth dominated; the small data packet might well be latency
dominated but you should be OK.

> of work is consistent on each node that would mean ~15 nodes? So if on
> 15 nodes each node processed 667 units, would that amount of network
> traffic saturate the cluster's lan such that in the end it would have
> been better to stay on a single (or smp) box?

I "think" you'll be able to scale to fifteen or sixteen nodes.  Right
now you're working about 30,000 microseconds per task.  Your packet
exchange "should" take on the order of 1000 microseconds on 10BT (feel
free to correct this estimate, anyway) -- roughly one microsecond per
byte, although of course you do NOT send it a byte at a time to achieve
this but all at once in a single packet.  10Base is slow enough that you
can probably neglect TCP and PVM overhead relative to the physical
network time.
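
The arithmetic behind the estimate above can be written out as a small Python sketch. This is only a back-of-envelope model, not a measurement: the function name and parameters are made up for illustration, the ~1000 bytes exchanged per work unit is inferred from the 500 KB/sec figure, and the model assumes a node sits idle during its packet exchange and that nodes never contend for the master.

```python
def estimate_10bt(target_units_per_sec=500.0,
                  bytes_per_unit=1000.0,     # assumed: ~1000 bytes exchanged per unit
                  link_bytes_per_sec=1.0e6,  # "about a MB/sec" theoretical on 10BT
                  compute_us=30_000.0,       # ~30,000 microseconds of work per unit
                  exchange_us=1_000.0):      # ~1 microsecond per byte on 10BT
    """Return (required_bw, link_fraction, per_node_rate, nodes_needed)."""
    # Bandwidth the master must sustain to feed the target rate (bytes/sec).
    required_bw = target_units_per_sec * bytes_per_unit
    # Fraction of the theoretical link capacity that this consumes.
    link_fraction = required_bw / link_bytes_per_sec
    # Units/sec one node completes if each unit costs compute + exchange time.
    per_node_rate = 1.0e6 / (compute_us + exchange_us)
    # Nodes required to reach the target, ignoring contention at the master.
    nodes_needed = target_units_per_sec / per_node_rate
    return required_bw, link_fraction, per_node_rate, nodes_needed


if __name__ == "__main__":
    bw, frac, rate, nodes = estimate_10bt()
    print(f"master bandwidth: {bw / 1000:.0f} KB/sec ({frac:.0%} of link)")
    print(f"per-node rate:    {rate:.1f} units/sec")
    print(f"nodes needed:     {nodes:.1f}")
```

With the default numbers the model needs a bit more than fifteen nodes, which is where the "fifteen or sixteen" figure comes from. Any extra per-unit overhead pushes the per-node rate down and the node count up.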
The additional overhead required by the communications will slow down
your task completion rate so it is less than 33/second (maybe you'll get
28-30) but that should still provide you with a generous amount of
parallel speedup.

>
> I have seen the time required to process 10000 units to reach 2200-2400
> seconds, or 4.5 units per second.
>
> There is no barrier within the 10000 units, but all 10000 must complete
> before the next round of 10000 are worked. On a previous attempt using
> pvm I implemented a round-robin deal where the head sends a unit to
> A, then B, then C, and so on. The first node that returned its results
> received the next unit. The head continued down the list of units until
> reaching the bottom. At the bottom loop to the top and look for any
> lingering units that have not received results, sending those units to
> waiting nodes. In this way I could deal with nodes of different speeds
> and even nodes that crashed as not every node would crash at the same
> time.
>
> Oh, and I do mean 10baseT as a 10Mb/s network. I can go 100Mb/s if I
> needed.

Things that can mess up the "simple" estimate of (say) 30 task
units/second sustained include overhead and networking inefficiencies.
A lot of them might end up connected to your cycle -- you really want
the master node to be idle when each node tries to send its last result
back and collect its next unit.  You also definitely want an ethernet
switch, not a hub, as collisions in 10BT can significantly attenuate
real world bandwidth.

At that point, you have to face the economics of 10 vs 100.  100Base
cards cost $10-50 each, and even a rotten card like a cheapo RTL8139
(what you're likely to get for $10) will in principle outperform any
10Base card.  A sixteen port fast ethernet switch costs what, $80?
$100?  Again, less than $10/port -- little enough that I have one in my
house for my personal LAN.  So for an investment of $320 to maybe $700,
your nodes could all be on switched 100BT.
Suddenly, your bandwidth would be about 10 MB/sec (you only need 0.5
MB/sec) and the time to send your packets would be down in the 100-200
microsecond range.  The probability of collision would be greatly
reduced, and so would the time for which any such collision affects
traffic.  You could also at least consider scaling up BEYOND just 16
nodes.

Either way you're going to want to try to keep task distribution and
execution as smooth and synchronous as possible.  But I >>think<< you
could scale out as far as 15-16 nodes on 10BT with at least positive
gain (maybe or maybe not reaching your design goal), and am pretty sure
you could reach your design goal with 100BT.  You're coming fairly close
to saturating 10Base with your data stream, and 10Base cards (especially
ISA cards, which I HOPE are not involved!) aren't really stellar
performers and tend to require a lot of system resources as they
function (i.e. they may not support any sort of DMA and may require the
full attention of the CPU/kernel while sending or receiving a stream).

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
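
The dispatch scheme described in the quoted message (one unit to each node in turn, then the next unit to whichever node returns first) can be sketched as a small event simulation in Python. This is an illustrative model, not the poster's PVM code: the function name and the per-node timings are hypothetical, network time is ignored, and the final pass that re-sends lingering units to cover crashed nodes is left out.

```python
import heapq


def simulate_dispatch(num_units, node_unit_times):
    """Simulate self-scheduling of work units across nodes.

    node_unit_times: seconds each (hypothetical) node needs per unit.
    Returns (makespan_seconds, units_done_per_node).
    """
    done = [0] * len(node_unit_times)
    next_unit = 0
    events = []  # min-heap of (finish_time, node_index)

    # Initial round-robin pass: hand one unit to each node, in order.
    for node, unit_time in enumerate(node_unit_times):
        if next_unit < num_units:
            heapq.heappush(events, (unit_time, node))
            next_unit += 1

    clock = 0.0
    while events:
        # The node that finishes earliest reports its result first...
        clock, node = heapq.heappop(events)
        done[node] += 1
        # ...and immediately receives the next pending unit, if any remain.
        if next_unit < num_units:
            heapq.heappush(events, (clock + node_unit_times[node], node))
            next_unit += 1
    return clock, done


if __name__ == "__main__":
    makespan, per_node = simulate_dispatch(10, [1.0, 2.0])
    print(makespan, per_node)
```

With two nodes taking 1 and 2 seconds per unit, the faster node ends up with 7 of the 10 units, which is the point of handing work to whichever node returns first: nodes of different speeds balance themselves without the master knowing their speeds in advance.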
More information about the Beowulf mailing list
