[Beowulf] RDMA NICs and future beowulfs
Joe Landman landman at scalableinformatics.com
Mon Apr 25 17:26:33 PDT 2005
Vincent Diepeveen wrote:
> At 06:02 PM 4/25/2005 -0400, Mark Hahn wrote:
>
>>>Would anyone on this list have pointers to
>>>which network cards on market support
>>>RDMA (Remote Direct Memory Access)?
>>
>>ammasso seems to have real products. afaikt, you link with their
>>RDMA-enabled MPI library and get O(15) microsecond latencies.
>>to me, it's hard to see why this would be worth writing home...

FWIW, we installed one of the first Ammasso based clusters. Still
working on building some stuff for it, mostly for StarCD. I won't
comment on positioning other than to say that they occupy an
interesting price/performance niche.

Infiniband is dropping rapidly in price, and is getting more attractive
over time. As recently as 6 months ago, it added about $2k/node
($1kUS/HCA + $1kUS/port) to the cost of a cluster. More recently the
cost per port appears to be quickly moving towards 400 $US, and the
HCAs are dropping so that you can add only $1kUS to the price per node
for your cluster. If you select the right switch, which you need anyway
for your command and control net, you can get good port-port latencies.

I think the next reasonable question to answer is fundamentally: what
is the cost benefit analysis? If you are performance bound and have an
infinite budget, you need to look at the highest performance fabrics.
Currently the Ammasso is not that. If you need to optimize performance
versus cost constraints, and your code gets some boost from the lower
latencies vs ethernet, the question is whether or not the value of that
performance is enough to justify the added cost of the cards.

>>>Would anyone have hands on experience
>>>with performance, usability, and cost aspects
>>>of this new RDMA technology?
>>
>>they work,

Agreed. I would like to see a LAM implementation in addition to the
MPICH. The installation is actually not that hard, and I have a simple
perl script to auto-generate an rnic_cfg file from your host IP
according to some simple rules.

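One way to frame the cost-benefit question above is throughput per
dollar. The following is only a minimal sketch under stated
assumptions: the $1k/node interconnect adder is the figure quoted
above, while the base node cost and the function names are
hypothetical placeholders, and the model ignores everything except
hardware cost and application speedup.

```python
def perf_per_dollar(base_cost, nodes, adder_per_node=0.0, speedup=1.0):
    """Relative application throughput per dollar spent on the cluster.

    base_cost      -- cost of one node with plain gigabit (hypothetical value)
    nodes          -- number of nodes
    adder_per_node -- extra interconnect cost per node (card + switch port)
    speedup        -- application speedup from the faster fabric (1.0 = none)
    """
    total_cost = nodes * (base_cost + adder_per_node)
    return speedup / total_cost


def breakeven_speedup(base_cost, adder_per_node):
    """Speedup at which the faster fabric merely matches gigabit on
    throughput per dollar; anything above this is a net win."""
    return 1.0 + adder_per_node / base_cost


if __name__ == "__main__":
    base = 4000.0   # hypothetical node price in $US, not from the post
    adder = 1000.0  # the roughly $1k/node adder mentioned above
    need = breakeven_speedup(base, adder)
    print(f"break-even speedup: {need:.2f}x")
    for s in (1.05, 1.30):
        verdict = "worth it" if s > need else "not worth it"
        print(f"speedup {s:.2f}x -> {verdict} on cost alone")
```

The break-even point depends only on the ratio of the adder to the
base node cost, so cheaper cards or pricier nodes both lower the bar.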
> but it's very unclear where their natural niche is.

Not sure I agree with this. If there were no value in TCP offload, then
why would Intel announce (recently) that they want to include this
technology in their future chipsets?

Basically the argument that I make here is that I think there is a
natural place for them, but it is on the motherboards. Much in the same
way you have a Graphics Processing Offload Engine in desktop systems,
though in the case of motherboards, they found value in supplying the
high performance interface rather than the offload engines.

I personally think that the offload engine concept is a very good one.
I like this model. With PCI-e (and possibly HTX), I think it has some
very interesting possibilities.

>>if you want high bandwidth, you don't want gigabit.

Agreed. Out of sheer curiosity, what codes are more bandwidth bound
than latency bound over the high performance fabrics? Most of the codes
we play with are latency bound. I wrote a message passing example for
my class with a humorous name to illustrate passing vectors (or
matrices). I could easily turn this into a bandwidth test by using huge
vectors. But I am not sure that most folks are doing that in their
codes.

>>if you want low latency, you don't want gigabit,

agreed ...

>> even RDMA-gigabit.

I think a CBA is worth doing (and we may do this). If it gives a 5%
boost for a 2% increase in cluster cost, is that worth it? If it gives
a 30% boost for a 2% increase in cost, is that worth it? What
fundamentally is the right cutoff? (Rhetorical question; the cutoff
varies based upon needs, funds, application, ...)

[...]

> In reality the bandwidth/latency hunger gets even bigger in future when the
> cell type processors arrive. Correct me if i'm wrong, it really needs a
> branch prediction table for my branch intensive integer code, but even then
> such a processor is kicking butt. I mean 8 processing help units (SPE's) at
> 1 cpu and a main power pc processor.
>
> For floating point that's like 250 Gflop or so practical to their avail.

The cell will not automagically give you 250 Gflop. It will not be easy
to program.

>
> That *really* will make the networks the weakest chain.

I haven't looked at the design in detail, but it looks like you are
going to need a multistage resource scheduler to handle streaming data
into the cell. Think of it as a super-multi-core NPU that has a more
general instruction set.

The Itanium has been out for a while and compilers for it are still
maturing. VLIW^H^H^H^HEPIC is hard. Anyone remember Trace Multiflows?

I would not expect to see a gcc for the cell (and now watch IBM make me
eat my words). I would expect that programming it is going to be a
challenge.

[...]

> So obviously cell processor is kind of a step back for such software, but
> even then we can see a single cell 4.0Ghz probably like a 8 processor
> 2.8Ghz Xeon MP machine.

Again, this is going to be difficult to program for in all likelihood
(and if there are IBMers out there with this who know I am wrong,
please let me know, or even better, let me at it :) ).

Good compilers are hard. Very good compilers are rare. Suboptimal
compilers are the norm.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
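As an aside on the latency-bound versus bandwidth-bound question
raised in this message: a common first-order model treats the time to
send a message as a fixed latency plus size divided by bandwidth. The
sketch below is only that simple model, not a measurement. The 15
microsecond latency is the O(15) figure quoted in the thread, 125e6
bytes/s is the raw gigabit line rate, and the function names are made
up for illustration.

```python
def transfer_time(nbytes, latency_s, bandwidth_Bps):
    """First-order model: fixed startup latency plus serialization time."""
    return latency_s + nbytes / bandwidth_Bps


def crossover_bytes(latency_s, bandwidth_Bps):
    """Message size at which latency and serialization time are equal.
    Below this size the latency term dominates; above it, bandwidth does."""
    return latency_s * bandwidth_Bps


def bound(nbytes, latency_s, bandwidth_Bps):
    """Classify a message size as 'latency' or 'bandwidth' dominated."""
    if nbytes < crossover_bytes(latency_s, bandwidth_Bps):
        return "latency"
    return "bandwidth"


if __name__ == "__main__":
    lat = 15e-6   # ~15 microseconds, the figure quoted in the thread
    bw = 125e6    # 1 Gbit/s expressed in bytes per second (raw line rate)
    print(f"crossover at about {crossover_bytes(lat, bw):.0f} bytes")
    for n in (8, 1024, 64 * 1024, 8 * 1024 * 1024):
        t = transfer_time(n, lat, bw)
        print(f"{n:>9} bytes: {t * 1e6:10.1f} us, {bound(n, lat, bw)} bound")
```

Under these assumed numbers, a code exchanging a few doubles at a time
sits far below the crossover, which is consistent with the remark that
only huge vectors would turn a message passing example into a
bandwidth test.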
