[Beowulf] fast interconnects, HT 3.0 ...

Eugen Leitl eugen at leitl.org
Wed May 24 12:17:58 PDT 2006

On Wed, May 24, 2006 at 09:09:23AM -0500, Richard Walsh wrote:

> Jim, I meant cache coherence.  As we know, HT provides cache-coherent
> and non-cache-coherent memory management.  Typically, within the board
> complex on an SMP device, we want cache coherency.

You cannot have cache coherency across a large number of systems *and*
have temporally unconstrained execution. There is no free lunch.
There are already coherency issues in distributing something as simple
as a clock signal over an area as small as a single die. (Which is why
global clocks will go away one day.)

> The HT 3.0 standard, as I understand it, offers off-chassis memory
> access at lower bit rates using AC power, but without cache coherence.
> This is quite similar to the approach taken on the Cray X1 with
> cache-coherent on-board images and non-coherent access off-board.
> The Cray X1

I think cache coherency on 4-16 CPUs on-board does make some sense.

> support the partitioned Global Address Space (pGAS) programming
> models of UPC and CAF.  The question here

pGAS assumes shared memory. There is no such thing as shared memory
beyond multiport memory, and there the "crossbars do not scale" problem
applies.
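
A rough sketch of the scaling argument, assuming a full N-by-N crossbar
whose cost is dominated by its crosspoint count (an illustrative
assumption; the function name below is hypothetical and not taken from
any HT or vendor document):

```python
def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints in a full crossbar joining `ports` inputs to `ports` outputs.

    Assumption for illustration: every input can reach every output
    through a dedicated crosspoint, so the count is ports * ports.
    """
    if ports < 0:
        raise ValueError("ports must be non-negative")
    return ports * ports


if __name__ == "__main__":
    # Port count grows linearly; crosspoint count grows quadratically.
    for n in (4, 16, 64, 256):
        print(f"{n:4d} ports -> {crossbar_crosspoints(n):6d} crosspoints")
```

Under that assumption, going from 16 to 256 ports multiplies the port
count by 16 but the crosspoint count by 256.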

> was: What do those that understand HT 3.0 better than I do think
> about its ability to similarly support the pGAS programming style
> efficiently?  The follow-up question was: What might be the
> implications for commodity parallel programming in MPI?  I want to
> get a feel for HT 3.0's scalability in this context, the need/density
> of potential HT switches, etc.
> The discussion on signal coherence was of course interesting ... ;-) ...

