[Beowulf] fast interconnects, HT 3.0 ...

Tue May 23 20:35:56 PDT 2006

Jim Lux wrote:
> At 06:49 AM 5/23/2006, Richard Walsh wrote:
>> Eugen Leitl wrote:
>>> In regards to keeping the wires short, does this IBM trick of keeping all
>>> wires equal-length work well on 3d lattices, and above? This would seem to
>>> be a must for those coming (hopefully) Hypertransport motherboards
>>> with connectors.
>>>
>>    Speaking of HyperTransport 3.0 and its AC chassis-to-chassis capabilities
>>    and 10 to 20 Gbps performance maximums one-way (non-coherent, off chassis
>>    I believe), what do the people that know say about scalability?  Are we
>>    looking at coherency within the board complex and basic reference ability
>>    off board or something else?
> 
> Where coherency means precisely what?  I don't see lockstep execution,

Hmmm ... I think he is talking about HT coherency.  I could be wrong on
this, as I see bandwidth being bandied about.

Basically Opteron processors are (strongly) cache coherent when 
connected via coherent HT links.

You connect IO devices via non-coherent HT, and processors to each other
via coherent HT.  There is a little fluff bit that explains this at
http://www.amd.com/us-en/Processors/ComputingSolutions/0,,30_288_13265_13295%5E13340,00.html#HyperTransport
about 3/4 of the way down, just above "Making the Connection".

> because when you're talking tens or hundreds of picoseconds per bit, 
> "simultaneous" is hard to define.  Two boxes a foot apart are going to 
> be many bit times different.  Something's got to take up the timing 
> slack, and while the classic "everything synchronous to a master clock" 
> has a simplicity of design, it has real speed limits (the light time 
> across the physical extent of the computational unit being but one).

I don't think he is talking about simultaneity...
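
That said, the "many bit times" arithmetic in the quoted text is easy to
check.  Here is a rough sketch, assuming (as a simplification) that the 10
to 20 Gbps figures are treated as per-wire serial rates and that signals
travel at the vacuum speed of light.  The function name and the
velocity_factor parameter are mine, purely for illustration; real cables
and traces are slower, so the true count is higher.

```python
# Back-of-envelope: how many bit periods elapse while a signal crosses a gap.
C_M_PER_S = 299_792_458.0   # speed of light in vacuum, m/s
FOOT_M = 0.3048             # one foot in metres

def bit_times_in_flight(distance_m, gbps, velocity_factor=1.0):
    """Bit periods that pass during the flight time over distance_m.

    velocity_factor is an illustrative knob (fraction of c); 1.0 is the
    best case, real copper is slower.
    """
    flight_s = distance_m / (C_M_PER_S * velocity_factor)
    bit_time_s = 1.0 / (gbps * 1e9)
    return flight_s / bit_time_s

for rate_gbps in (10, 20):
    n = bit_times_in_flight(FOOT_M, rate_gbps)
    print(f"{rate_gbps} Gbps: about {n:.1f} bit times per foot")
```

Even in the best case a foot of separation is on the order of ten bit
times at 10 Gbps, which is why a single master clock stops being a
sensible model at these rates.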

HT is coherent as long as the link is coherent.  There are real
physical restrictions on this link; I would have to look them up in the specs.

Coherency in this sense is analogous to strong cache coherency in SMP
systems.  It does get harder as you increase the number of CPUs, the
clock speed, and the chassis distance ...

> 
> 
>>    Sounds like the Cray X1E pGAS memory model.  Is there a role for
>>    switches?  And then there is its intersection with the pGAS language
>>    extensions (UPC and CAF) ... raising the prospect of much better
>>    performance in a commodity regime, with possible implications for MPI use.
> 
> Get to nanosecond latencies for messages, and corresponding fine grained 
> computation, and you're looking at algorithm design that is latency 
> tolerant, or, at least, latency aware.  As long as your "computational 
> granules" are microseconds, propagation delay can probably be subsumed 
> into a basic "computation interval", some part of which is the time 
> required to get the data from one place to another.

As I remember, the NEC SX-q (q >= 6) had a backplane crossbar
thingamajig switch that hit nanosecond latencies.  Maybe someone from NEC
could correct my recollection.

Basically whenever you have length scales start approaching each other, 
you can get some interesting and spectacular effects.  One example of
such effects is superlinear speedup, where the working set size on a
particular CPU starts approaching the size of the cache as you increase 
the number of CPUs.
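
A toy model makes the cache effect concrete.  This is only a sketch under
made-up assumptions: a fixed total working set split evenly across CPUs,
one private cache per CPU, and a flat 10x cost for touching main memory
instead of cache.  The names and numbers are hypothetical, not measurements
of any real machine.

```python
# Toy model of superlinear speedup from aggregate cache growth.
def run_time(total_mb, n_cpus, cache_mb, t_cache=1.0, t_mem=10.0):
    """Per-CPU time to touch its share of the working set once.

    The part of the share that fits in cache costs t_cache per MB,
    the remainder costs t_mem per MB (both in arbitrary units).
    """
    per_cpu_mb = total_mb / n_cpus
    in_cache_mb = min(per_cpu_mb, cache_mb)
    in_mem_mb = per_cpu_mb - in_cache_mb
    return in_cache_mb * t_cache + in_mem_mb * t_mem

def speedup(total_mb, n_cpus, cache_mb):
    """Speedup relative to one CPU under the same toy model."""
    return run_time(total_mb, 1, cache_mb) / run_time(total_mb, n_cpus, cache_mb)

for n in (2, 8, 64):
    print(f"{n:3d} CPUs: speedup {speedup(64.0, n, 1.0):.1f}")
```

In this model the speedup exceeds the CPU count, and it jumps once each
CPU's share (64 MB / N here) fits entirely inside its 1 MB cache.  Real
codes add communication cost, which this sketch ignores.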

> 
> At finer times, you're looking at things like systolic arrays and 
> pipelines.

Large system design is hard.  And expensive.  So are the boxen it plugs 
into.  If the Opterons can plug into a chipset that "bridges" or somehow 
enables more coherent HT links, you can, in theory, build larger systems 
out of them.  The problem with this is that building good large systems 
is hard.  It might be far easier to make the bandwidth/latency hungry 
people/codes happy by pretending the non-coherent HT is a network (it is 
a point to point network), and passing messages over it.  Of course if 
you can make all the links coherent, then you can do all sorts of direct 
puts into a foreign address space ....  shared memory ....  (ooooh ahhhh).

>>    Anyone have a crystal ball or insights on this?

Nope ... got a silicon wafer if that helps... ;)

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615