[Beowulf] fast interconnects
Jim Lux James.P.Lux at jpl.nasa.gov
Sat May 20 14:54:42 PDT 2006
- Previous message: [Beowulf] fast interconnects
- Next message: [Beowulf] Re: noob understanding
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
At 10:16 AM 5/20/2006, Mark Hahn wrote:
> > As for bit error rates.. 10^-15 is the going in worst case, and 10^-18 is
> > the typical design point. A bit of a challenge to test the latter however
> > (10^18 bits at 10^10 bits/sec takes 10^8 seconds)
>
>I did a little research, and could only find reference to 10^-12
>as the target BER for 10gbase-t. I'm not sure how much this would
>matter though - surely people would still use the usual higher-level
>checksum/retransmission, no?

Not necessarily, because frame errors reduce channel capacity and require larger buffer space, especially if the transit time through the link is greater than a message length (e.g. a 15 kbit frame takes only 15 nanoseconds at 1 Tbps, and that's a couple of meters of fiber). You really would rather the data get to the other end un-errored than rely on any sort of ack/nak/go-back-N kind of protocol. A similar problem exists today with long-latency links for TCP/IP.

So, at some point, it really starts to pay to do some "coding", particularly Forward Error Correction. The 10GigE thing, for instance, contemplates the use of (2500,1700) Low Density Parity Check coding (numbers approximate, but it's something like 2500-bit code blocks carrying 1700 data bits).

> also, from what I read, the main
>concern wrt BER is length-related insertion loss. obviously, if
>the system can manage 10^-12 at 100M, it'll have a much easier time
>for inside-machineroom runs (say, 15M) or in-cluster (<10 most of
>the time).

At a low level (chip to chip, say) and even at a box-to-box level, avoiding any sort of retry mechanism has great value (imagine having a parity check in the middle of your CPU pipeline, where a bit error would trigger a pipeline flush.. you'd rather do error correcting codes and keep the pipeline running).

At high data rates and relatively short runs (<300 m), the error rate tends not to scale with length because, particularly for lower speed implementations today (1 Gbps and lower), the dominant source of problems is related to interfaces (connectors, etc.) and Tx/Rx crosstalk (near end and far end), which are "end point" related. There is a length issue, because of the attenuation in the medium, but to a certain extent that can be overcome by just increasing the power pushed into the link (which aggravates the crosstalk problem, but in a more manageable way). Basically, once the bits are in the wire, the distance they travel isn't as much of an issue.

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875
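
The back-of-the-envelope numbers in the message above (BER test time, frame duration, and how much fiber a frame occupies) can be checked with a minimal Python sketch. The function names are hypothetical, the 2e8 m/s propagation speed in fiber is an assumed round figure that does not come from the post, and the frame error estimate assumes independent bit errors, which real links only approximate.

```python
import math

# Assumed propagation speed of light in fiber (roughly 2/3 of c).
# This value is an illustration, not a figure from the original message.
FIBER_SPEED_M_PER_S = 2.0e8


def ber_test_seconds(bits_needed: float, rate_bps: float) -> float:
    """Time needed to push bits_needed bits through a link running at rate_bps."""
    return bits_needed / rate_bps


def frame_time_seconds(frame_bits: float, rate_bps: float) -> float:
    """Serialization time of one frame on the link."""
    return frame_bits / rate_bps


def frame_length_m(frame_bits: float, rate_bps: float,
                   speed: float = FIBER_SPEED_M_PER_S) -> float:
    """Physical length of fiber occupied by one frame while in transit."""
    return frame_time_seconds(frame_bits, rate_bps) * speed


def frames_in_flight(link_m: float, frame_bits: float, rate_bps: float,
                     speed: float = FIBER_SPEED_M_PER_S) -> float:
    """Number of frames simultaneously on a link of length link_m."""
    return link_m / frame_length_m(frame_bits, rate_bps, speed)


def frame_error_probability(ber: float, frame_bits: float) -> float:
    """P(at least one bit error in a frame), assuming independent bit errors.

    Computes 1 - (1 - ber)**frame_bits using log1p/expm1 so that very small
    BER values do not vanish in floating-point rounding.
    """
    return -math.expm1(frame_bits * math.log1p(-ber))


if __name__ == "__main__":
    # 10^18 bits at 10^10 bit/s, as in the quoted text
    print("BER test time (s):", ber_test_seconds(1e18, 1e10))
    # 15 kbit frame at 1 Tbps
    print("Frame time (s):", frame_time_seconds(15e3, 1e12))
    print("Frame length in fiber (m):", frame_length_m(15e3, 1e12))
    # Hypothetical 100 m link, to show why several frames are in flight at once
    print("Frames in flight on 100 m:", frames_in_flight(100.0, 15e3, 1e12))
    print("Frame error prob at BER 1e-12:", frame_error_probability(1e-12, 15e3))
```

When a link holds many frames in flight, a go-back-N retry has to resend everything sent after the damaged frame, which is the buffer and capacity cost the message describes.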
