Kidger's comments on Quadric's design and performance

Wed Apr 24 09:16:01 PDT 2002

James Cownie wrote:
> 
> Sorry if you get something like this message twice, I submitted it
> once and nothing has come back, although my correction to one of the
> www addresses went through :-(
> 
> Joachim Worringen <joachim at lfbs.RWTH-Aachen.DE> wrote
> 
> > > This message also reminded me to ask if a long-held opinion is valid - and
> > > that opinion is "that a cache coherent interconnect would offer performance
> > > enhancement when applications are at the 'more tightly coupled' end of the
> > > spectrum."  I know that present PCI based interfaces can't do that without
> > > invoking software overhead and latencies.  Anyone have data - or an argument
> > > for invalidating this opinion?
> >
> > You would need another programming model than MPI for that (see below),
> > maybe OpenMP as you basically have the characteristics of a SMP system
> > with cc-NUMA architecture.
> 
> No, you are confusing two completely different issues. To support
> OpenMP you need a single address space which spans the processors.

You are right, this is completely different. However, I did not mean
that connecting nodes of a cluster with a cache-coherent interface
"gives you an SMP", but more precisely "gives the shared parts of the
distributed distinct address spaces nearly SMP-like access
characteristics", with respect to a suitable programming model. 

This would enable a matching OpenMP-Compiler/run-time-lib to generate
and run code with (more or less) SMP-like performance as does the OMNI
OpenMP-Compiler (currently on top of a software DSM library SCASH on top
of SCore, see http://www.hpcc.jp/Omni - this is all software which is
much more perfomance-sensitive to bad data-placement and has generally a
much higher overhead than such a hw-based solution would have).

There is something similar on top of SCI, namely the HAMSTER project
(http://hamster.informatik.tu-muenchen.de/), but w/o OpenMP, IIRC, and
still some software-overhead to "simulate" cachable remote memory on top
of SCI-connected PCs.

With Quadrics, this should be possible in an even more efficient manner
due to the hardware-MMU and -TLB on the adapter.

To have a real cc-NUMA-SMP, the integration needs to be higher (HP
X-Class, DG/IBM NUMA-Q, ...), this is for sure.  The question is: are
large-scale SMPs as sold by IBM, Sun, ... not the better solution for
such tasks? Quadrics is expensive, and you still have to manage a bunch
of PCs instead a nice, single SMP.

  Joachim

-- 
|  _  RWTH|  Joachim Worringen
|_|_`_    |  Lehrstuhl fuer Betriebssysteme, RWTH Aachen
  | |_)(_`|  http://www.lfbs.rwth-aachen.de/~joachim
    |_)._)|  fon: ++49-241-80.27609 fax: ++49-241-80.22339