[Beowulf] Broadcast - not for HPC - or is it?

Matt Hurd matthurd at acm.org
Thu Oct 7 17:55:12 PDT 2010


Daniel,

> Are you really claiming <9ns port to port ?

Yep, hard to measure without using an oscilloscope.  10G Endace timers
give about 20ns of accuracy on packets, which is not quite enough.  As
an aside, I would love to know about any more convenient methods for
measuring sub-10ns latency.

> (Quadrics used to think they were leading edge with 40ns latency port to
> port latency on their switches)
>
> At <9ns for the 'switch' then surely the speed of light in copper (a massive
> 1ns per foot) will dominate over the switch itself?

Quadrics products did much more useful things than Opticast, which is
not even a hub, just one-way replication from port A to ports B1...Bn.

It would indeed be difficult to do any two-way serdes at all in sub
10ns if you needed to look inside the packets or deal with contention.
It is just an n-way coupler at the core, splitting the photon stream.
The optical fibre path internally is just a box with 0.3m tails, for
0.6m = 3ns of fibre path.  My experience corresponds with Jim's
comments:  roughly 5 microseconds per km for both copper and fibre; we
measure 0.65 to 0.69 c for a variety of twisted pair and optic fibre
media.  Have a handy 200m fibre as a 1 microsecond timing sanity
checker ;-)  The opto-electronic modules take a bit under a nanosecond
of propagation time to polish up the signal, and you end up going
through four of those from the cable in to the cable out on the box.
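
A rough sketch of that arithmetic in Python, for anyone who wants to
plug in their own numbers.  The fibre length, the 0.65 to 0.69 c range
and the count of four modules are the figures above; MODULE_NS is my
assumption, since "a bit under a nanosecond" is all that is known, and
the function name is just illustrative.

```python
C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def propagation_ns(length_m, velocity_factor):
    """One-way propagation delay in ns for a medium at velocity_factor * c."""
    return length_m / (velocity_factor * C_M_PER_NS)


# Assumed per-module delay; the text only says "a bit under a nanosecond".
MODULE_NS = 0.9
MODULES_IN_PATH = 4       # modules traversed from cable in to cable out
INTERNAL_FIBRE_M = 0.6    # two 0.3 m tails

for vf in (0.65, 0.69):
    fibre = propagation_ns(INTERNAL_FIBRE_M, vf)
    total = fibre + MODULES_IN_PATH * MODULE_NS
    print(f"vf={vf}: fibre {fibre:.2f} ns, box total {total:.2f} ns")

# The 200 m sanity-check fibre should come out close to 1 microsecond.
print(f"200 m fibre at 0.67 c: {propagation_ns(200, 0.67):.0f} ns")
```

With those assumptions the box lands between 6 and 7ns, which is
consistent with the sub-9ns claim but is not a measurement.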

The thing that makes it work is the fact that the signal integrity on
optics is much more flexible than electrical as a 64-way split via
optics gives you a bit under a 21dB loss in a link budget of 39dB or
so.  If you control all aspects of the link, such as putting a small
link in a box, it leaves enough headroom to clean things up and
re-present the signal.  Something like that is a lot harder in the
electrical domain, as a 21dB loss is a bit nasty.
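
The split-loss figure can be checked with a few lines of Python.  This
is only a sketch: the 64-way split, 21dB loss and 39dB budget are the
numbers above, the ideal loss is the usual 10*log10(N) for an even
power split, and treating the difference as "excess" loss is my own
bookkeeping rather than a datasheet value.

```python
import math


def ideal_split_loss_db(n_ways):
    """Loss per output of a lossless, even 1-to-n power split, in dB."""
    return 10 * math.log10(n_ways)


LINK_BUDGET_DB = 39.0        # "39dB or so" from the text
QUOTED_SPLIT_LOSS_DB = 21.0  # upper bound: "a bit under 21dB"

ideal = ideal_split_loss_db(64)
# Whatever is left between the ideal split and the quoted figure is
# real-world excess (coupler imperfections, connectors); an upper bound.
excess = QUOTED_SPLIT_LOSS_DB - ideal
headroom = LINK_BUDGET_DB - QUOTED_SPLIT_LOSS_DB

print(f"ideal 64-way split loss: {ideal:.2f} dB")
print(f"implied excess loss:     up to {excess:.2f} dB")
print(f"remaining link headroom: at least {headroom:.1f} dB")
```

So about 18dB of the loss is the split itself, and roughly 18dB of
budget is left over for everything else in the link.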

It is fun to put it all together into a box of convenience with a
single-digit nanosecond latency against it, even if it is only
moderately useful.  Certainly makes sense for a stock exchange to take
the load off their network infrastructure and also speed things up.

>
> Plus as others say it is not the broadcast that is the hard bit - it is
> getting the consolidated acks back.

Indeed.

It's been an interesting thread, but I think I've come to the
conclusion that, except for a few financial market uses, such a device
is not really useful for Beowulf or HPC, as the MISD model doesn't seem
to be of much practical use unless you can get something cute for
virtually free to suit occasional use, like those mentioned earlier
such as the PAPERS or integrated Blue Gene.

 --Matt.


