Switch recommendations?
Alexander J. Flueck flueck at iit.edu
Mon Nov 27 07:39:39 PST 2000
Miles,

Here's the latest on switch recommendations.

--Alex

-------------------------------------------

On Fri, 24 Nov 2000, Steven Berukoff wrote:

> Hi,
>
> We're putting together a small (~15) node cluster of Alphas (21264 @
> 667MHz) for use in a data analysis application. Our use of the cluster
> requires high performance computational capability (hence the use of the
> Alphas with their high memory capability) but doesn't involve high network
> traffic. Basically, each node will grab a large chunk of data, do FFTs on
> pieces of it, store the results locally, then only occasionally contact
> the master for more.
>
> So my question: can anyone provide good recommendations for a switch?
> Like I said, high network traffic is not to be a concern, but the cluster
> will be augmented at a later time, to perhaps 64 nodes. Obviously, the
> switch solution should be able to scale appropriately. Are there
> models/manufacturers definitely to avoid? Are there good cost/performance
> compromises?
>
> Any input is greatly appreciated!

For the scale and kind of operation you describe, it sounds like you are not likely to be either bandwidth choked or contention choked at the switch level -- if you are choked anywhere it would be at the point of connection between your main master/server and the switch, and it sounds like even that isn't likely to be much of a problem (although you'd have to provide more detail to be sure, see below).

If these assumptions are correct, you are in the happy position of being able to buy damn near anything and not having your choice greatly affect ultimate performance or your ability to scale to 64 nodes. For example, you can likely consider a stack of three 24-port switches with or without gigabit uplinks, and if you are choked on the server either put multiple NICs in the server (three, with one port to each switch, seems like a nice possibility), or channel-bond NICs to one switch, or use a Gbps NIC to one switch.
Alternatively, you can consider a larger switch with a fabric that can support 64+ ports -- one that is frequently mentioned on this list as a good price/performance/feature switch is the HP ProCurve switch, which can be purchased over the counter for $1600-1800 with 40 ports. To go beyond that, I think it was Don Becker who recommended early this year that one consider buying two 40-port HP ProCurves and putting all 80 ports and the second power supply in one chassis (they support dual power). That gives you more than enough ports and dual power for perhaps $3400, or $42.50/port (plus the cost of the node NIC).

The stackable solutions will cost less than this per port -- you could even get four 16-port non-stackable switches and plan to put 4 NICs in your master/server to get to 60 nodes plus the master. 16-port switches are dirt cheap, and since you don't know when you will get more nodes, buying only what you need now lets you take advantage of the even lower switch prices that will likely hold if and when you ever need more ports. It is usually a good idea to spend as little as possible now, as Moore's Law will buy you far more for far less money later when you eventually need it.

The ProCurve solution (or similar solutions from other vendors) will provide better scaled symmetric internode communications, but if you are indeed in a master-slave paradigm the extra cost isn't likely to improve performance.

The only technical questions I can think of that should inform the purchase decision are:

a) Try to get a fair idea of the ratio between the time each node spends computing and the time each node spends getting the next chunk of data to work on or returning results. To avoid contention or waiting on a network resource you need a ratio of AT LEAST N:1, where N is the number of nodes you expect to use, and you'll only avoid contention at exactly N:1 if the calculation is perfectly organized to proceed predictably and synchronously.
If the ratio is 6400:1 (you spend 6400 seconds computing and 1 second getting the next set to compute) and you plan to go to 64 nodes, you're pretty safe -- even without a bit of deliberate antibunching of starting times, the probabilities suggest that the network will "never" be congested. If it is 16:1, you can't get to 64 nodes at all -- you'll have 3 nodes waiting for the fourth to get its data, all the time.

b) Try to accurately estimate the number of small-packet transfers of data required per node per FFT cycle. If it is a large number (and cannot be reduced by sensibly rewriting your code) then you MIGHT be performance sensitive to switch latency. In that event you should think about a higher-end switch -- the cheap switches are store-and-forward switches with an aggregate latency that is often in the 200 microsecond range (0.2 milliseconds). This can cost you a lot of performance if you have to send 500 small packets back and forth to start each FFT AND your ratio from a) isn't favorable to begin with. Cut-through switches are much more expensive but also have much lower latency.

I doubt that manageability is very important to you, although it doesn't greatly add to the cost of the switches either. The higher-end switches will be manageable and have slightly better latencies and so forth, but that probably isn't worth it for a coarse-grain non-synchronous master/slave paradigm.

Hope this helps.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list
Beowulf at beowulf.org
http://www.beowulf.org/mailman/listinfo/beowulf

Dr. Alexander J. Flueck                 email: flueck at iit.edu
Electrical and Computer Engineering     phone: 312.567.3625
Illinois Institute of Technology        FAX:   312.567.8976
Chicago, IL 60616-3793                  http://www.ece.iit.edu/~flueck/
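
The back-of-the-envelope checks in points a) and b) of the message can be sketched in a few lines of Python. This is only a rough illustration, not anything from the original thread: the function names are made up for this example, the "offered load" is a deliberately simplified average (nodes x transfer time / compute time) that ignores bunching and synchronization effects, and the latency estimate assumes the small packets go out strictly one after another with one switch latency each.

```python
def max_nodes_without_contention(compute_s, transfer_s):
    """Largest N that satisfies the rule of thumb compute:transfer >= N:1."""
    return int(compute_s // transfer_s)


def offered_load(nodes, compute_s, transfer_s):
    """Average number of nodes wanting the master's link at the same time.

    Simplified assumption: each node needs the link for transfer_s seconds
    once per compute_s seconds of computation. Values well below 1 suggest
    the link is mostly idle; values above 1 mean nodes queue for it.
    """
    return nodes * transfer_s / compute_s


def small_packet_overhead_s(packets, switch_latency_s):
    """Time lost to switch latency alone if packets are sent serially."""
    return packets * switch_latency_s


if __name__ == "__main__":
    nodes = 64
    # The two compute:transfer ratios used as examples in the message.
    for compute_s, transfer_s in ((6400, 1), (16, 1)):
        limit = max_nodes_without_contention(compute_s, transfer_s)
        load = offered_load(nodes, compute_s, transfer_s)
        verdict = "fine" if nodes <= limit else "oversubscribed"
        print(f"{compute_s}:{transfer_s} ratio, {nodes} nodes: "
              f"limit {limit} nodes, offered load {load:.2f} ({verdict})")

    # 500 small packets through a store-and-forward switch at ~200 microseconds.
    overhead = small_packet_overhead_s(500, 200e-6)
    print(f"latency overhead per FFT start: {overhead:.3f} s")
```

With the 16:1 ratio and 64 nodes the offered load comes out at 4, which lines up with the "3 nodes waiting for the fourth" picture in the message; whether the latency overhead matters depends on how it compares with the compute time per FFT cycle.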
