[Beowulf] choosing a high-speed interconnect
Dan Kidger daniel.kidger at quadrics.com
Tue Oct 12 16:44:28 PDT 2004
Chris,

> I'm sure posing this may raise more questions than answer but which
> high-speed interconnect would offer the best 'bang for the buck':
>
> 1) myrinet
> 2) quadrics qsnet
> 3) mellanox infiniband
>
> Currently, our 30 node dual Opteron (MSI K8D Master-FT boards) cluster
> uses Gig/E and are looking to upgrade to a faster network.

Well, I am from one of the vendors that you cite, so perhaps my reply is biased. But hopefully I can reply without it seeming like a sales pitch.

Our QsNetII interconnect sells for around $1700 per node (card=$999, the rest is cable and a share of the switch). A 4U-high 32-way switch would be the nearest match in terms of size for a 30-node cluster (c. $14K, IIRC).

MPI bandwidth is 875MB/s on Opteron (higher on, say, IA64/Nocona, but the AMD PCI-X bridge limits us). MPI latency is 1.5us on Opteron, only slightly better than the Cray/OctigaBay Opteron product (usually quoted as 1.7us).

Infiniband bandwidth is only a little less than ours, and its latency is not much worse than twice ours. Myrinet lags a fair bit currently, but they do have a new, faster product soon to hit the market which you should look out for.

All vendors have a variety of switch sizes, either as a fixed-size configuration or as a chassis that takes one or more line cards and can be upgraded if your cluster gets expanded. Some solutions, such as Myrinet revE cards, need two switch ports per node, but otherwise you just need a switch big enough for your node count, allowing for possible future expansion.

Very large clusters have multiple switch cabinets, arranged as node-level switches, which have links to the nodes, and top-level 'spine' switch cabinets that interconnect the node-level cabinets. If you have the same number of links to the spine switches as you do to the actual nodes, then you should have 'full bisectional bandwidth'. However, you can save money by cutting back on the amount of spine switching you buy.

Many interconnect vendors offer a choice of copper or fibre cabling. The former is often cheaper (no expensive lasers), but the latter can be used for longer cable runs and is often easier to physically manage, particularly when installing very large clusters.

What to buy depends very much on your application. Maybe you haven't proved that your GigE is the limiting factor. I do have figures for Fluent on ours and other interconnects, but the Beowulf list is not the correct place to post these.

As Robert pointed out, most vendors will loan equipment for a month or so, and indeed many can provide external access to clusters for benchmarking purposes. Also, for example, the AMD Developer Center has large Myrinet and Infiniband clusters that you can ask to get access to.

Hope this helps,
Daniel

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------
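
For anyone trying to relate the latency and bandwidth figures quoted above to their own message sizes, here is a rough sketch in Python. It assumes the simple linear model time = latency + size / bandwidth, which is a common first approximation and not something stated in the post. The 1.5us and 875MB/s values are the QsNetII-on-Opteron figures from the message, with MB taken to mean 10^6 bytes (an assumption). The function names are purely illustrative.

```python
# Back-of-envelope message timing under a linear latency/bandwidth model.
# The model itself is an assumption; real MPI performance depends on
# protocol switch points, contention, and the application's traffic pattern.

LATENCY_S = 1.5e-6       # MPI latency quoted in the post: 1.5 us
BANDWIDTH_BPS = 875e6    # MPI bandwidth quoted in the post: 875 MB/s (assumed 10^6 bytes)


def transfer_time(message_bytes, latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Estimated one-way time in seconds: latency + size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bps


def effective_bandwidth(message_bytes, latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Bytes per second actually achieved for a message of the given size."""
    return message_bytes / transfer_time(message_bytes, latency_s, bandwidth_bps)


def half_performance_size(latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Message size (bytes) at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth_bps


if __name__ == "__main__":
    for size in (8, 1024, 65536, 1048576):
        t_us = transfer_time(size) * 1e6
        bw_mb = effective_bandwidth(size) / 1e6
        print(f"{size:>8} bytes: {t_us:9.2f} us, {bw_mb:7.1f} MB/s effective")
    print(f"half-performance message size: {half_performance_size():.1f} bytes")
```

Under this model, messages much smaller than the half-performance size are dominated by latency and much larger ones by bandwidth, which is one way to judge which figure matters more for a given application. It is no substitute for benchmarking the real code on loaned or remotely accessed hardware.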
