Kidger's comments on Quadric's design and performance
joachim at lfbs.RWTH-Aachen.DE
Wed Apr 24 09:16:01 PDT 2002
James Cownie wrote:
> Sorry if you get something like this message twice, I submitted it
> once and nothing has come back, although my correction to one of the
> www addresses went through :-(
> Joachim Worringen <joachim at lfbs.RWTH-Aachen.DE> wrote
> > > This message also reminded me to ask if a long-held opinion is valid - and
> > > that opinion is "that a cache coherent interconnect would offer performance
> > > enhancement when applications are at the 'more tightly coupled' end of the
> > > spectrum." I know that present PCI based interfaces can't do that without
> > > invoking software overhead and latencies. Anyone have data - or an argument
> > > for invalidating this opinion?
> > You would need another programming model than MPI for that (see below),
> > maybe OpenMP as you basically have the characteristics of a SMP system
> > with cc-NUMA architecture.
> No, you are confusing two completely different issues. To support
> OpenMP you need a single address space which spans the processors.
You are right, this is completely different. However, I did not mean
that connecting nodes of a cluster with a cache-coherent interface
"gives you an SMP", but more precisely "gives the shared parts of the
distributed distinct address spaces nearly SMP-like access
characteristics", with respect to a suitable programming model.
This would enable a matching OpenMP compiler and run-time library to
generate and run code with (more or less) SMP-like performance, as the OMNI
OpenMP compiler does (currently on top of the software DSM library SCASH on
top of SCore, see http://www.hpcc.jp/Omni - this is all software, which is
much more performance-sensitive to bad data placement and generally has a
much higher overhead than such a hardware-based solution would have).
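A minimal sketch of the kind of code in question, assuming nothing about
Omni, SCASH or HAMSTER internals (the function name shared_sum is made up
for illustration). Every thread dereferences the same pointer, so the
compiler and run-time must present the array in one address space, whether
that is done by SMP hardware, a cache-coherent interconnect or a DSM layer:

```c
#include <stddef.h>

/* Hypothetical example: sum a shared array with an OpenMP work-sharing loop.
 * All threads read through the same pointer `a`, which only works if they
 * all see the array at the same address. */
double shared_sum(const double *a, size_t n)
{
    double sum = 0.0;
    long i;

    /* Each thread accumulates a private partial sum; the reduction clause
     * combines them at the end of the loop. If the compiler has no OpenMP
     * support, the pragma is ignored and the loop simply runs serially. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += a[i];

    return sum;
}
```

On an SMP the loads of a[i] are ordinary cached memory accesses. On a
page-based software DSM, touching a page that lives on another node
typically goes through the DSM library first, which is where data placement
starts to matter.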
There is something similar on top of SCI, namely the HAMSTER project
(http://hamster.informatik.tu-muenchen.de/), but w/o OpenMP, IIRC, and
still with some software overhead to "simulate" cacheable remote memory on
top of SCI-connected PCs.
With Quadrics, this should be possible in an even more efficient manner
due to the hardware-MMU and -TLB on the adapter.
To have a real cc-NUMA-SMP, the integration needs to be higher (HP
X-Class, DG/IBM NUMA-Q, ...), this is for sure. The question is: are
large-scale SMPs as sold by IBM, Sun, ... not the better solution for
such tasks? Quadrics is expensive, and you still have to manage a bunch
of PCs instead of a nice, single SMP.
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339