[Beowulf] slightly [OT] smp boxes

Steffen Persvold sp at numascale.com
Mon Oct 20 16:38:58 PDT 2008


Mark Hahn wrote:
>> http://www.numascale.com.
>> Large scale NUMA architecture at cluster pricing.
> 
> that's fairly exciting!  is this stuff somehow related to the newisys 
> horus system?

Hi Mark,

HORUS and NumaChip share the same "vision" I'd say, but there are lots 
of architectural differences.

> 
>> Disclaimer: Yes I work for them. If you have any questions, I'll be 
>> glad to answer them.
> 
> OK, what's the memory profile like?  ie, latency for memory from l1 
> cache hit all the way to all-misses-farthest-node.

Worst case scenario is an RMC (Remote Memory Cache) miss, which is ~1.2us 
to the farthest node in a 3D torus (2 dimension jumps, 4 hops total). If 
you get an RMC hit, however, you are in practice accessing memory on a 
remote Opteron on the local node (~150ns). We support an RMC size of 
16 GByte.
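
To put those numbers in perspective, here is a rough back-of-the-envelope 
sketch of effective remote latency as a function of RMC hit rate (only the 
150ns/1.2us figures come from above; the hit rates themselves are 
illustrative assumptions on my part):

/* Back-of-the-envelope estimate of effective remote-memory latency
 * for a given RMC hit rate.  The 150 ns (hit) and 1200 ns (worst-case
 * miss) figures are the ones quoted above; the hit rates themselves
 * are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const double hit_ns  = 150.0;   /* RMC hit: remote Opteron, local node */
    const double miss_ns = 1200.0;  /* RMC miss: farthest node in the torus */

    for (int i = 5; i <= 10; i++) {
        double hit_rate = i / 10.0;
        double eff_ns = hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns;
        printf("hit rate %.1f -> ~%.0f ns effective remote latency\n",
               hit_rate, eff_ns);
    }
    return 0;
}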

> it would be nice to 
> know a bit more about the actual fabric (dolphinics-sci-like?
> the whitepaper mentions "PCI Express Gen-2 type signals", but what
> does that mean?

Without going into too much detail, the fabric physical layer is PCIe 
based, using six x4 (i.e. 1 GByte/sec) links. The transport layer, 
however, is torus-based SCI (we're using the IEEE SCI coherency 
protocol) with counter-rotating rings.
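
For the curious, a minimal sketch of how hop counts fall out of such a 
torus (the 4x4x4 dimensions and example coordinates are assumptions for 
illustration, not a NumaChip configuration):

/* Minimal hop-count sketch for a 3D torus: with wraparound links
 * (counter-rotating rings per dimension) the distance in each
 * dimension is at most half the ring length.  Dimensions and
 * coordinates below are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>

static int ring_dist(int a, int b, int len)
{
    int d = abs(a - b);
    return d < len - d ? d : len - d;   /* shorter way around the ring */
}

int main(void)
{
    const int X = 4, Y = 4, Z = 4;      /* assumed torus dimensions */
    int src[3] = {0, 0, 0};
    int dst[3] = {2, 2, 0};             /* farthest in two dimensions */

    int hops = ring_dist(src[0], dst[0], X)
             + ring_dist(src[1], dst[1], Y)
             + ring_dist(src[2], dst[2], Z);

    /* prints 4: two dimension jumps of two hops each, matching the
     * "2 dimension jumps, 4 total" worst case mentioned above */
    printf("hops from (0,0,0) to (2,2,0): %d\n", hops);
    return 0;
}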

>  also, the WP seems pretty HT/Opteron-specific;
> plans for intel support?

HT/AMD-Opteron was a natural interface for us to start with since a) it 
is available today, and b) the interface is proven and standardized 
(ever tried interfacing an ASIC to the P6 processor bus, which changes 
every 3 months?).

However, as we all know, Intel has announced QPI and it would be pretty 
"un-clever" of us not to look into that platform as well :)

>  I know of only a few systems which ship
> with HTX slots - not an endangered species?

There are only a few systems (as in complete servers from HP/IBM etc.) 
that ship with HTX slots today, yes. There are, however, quite a few 
motherboard vendors that make HTX-ready boards (Tyan/Supermicro/Asus 
etc.).

>  also, <1us MPI latency
> sounds good, but I'm not clear on the bandwidth: what kind of single
> pair bandwidth is possible, as well as all-pairs?
> 

Pair bandwidth is today limited by the 1 GByte/sec links we're using, 
meaning that in practice, with protocol overhead etc., you're looking at 
roughly 1.8 GByte/sec bidirectional pair bandwidth.
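
If anyone wants to measure pair bandwidth on their own setup, the usual 
ping-pong approach works; here is a minimal, generic MPI sketch (nothing 
NumaChip-specific; message size and iteration count are arbitrary):

/* Minimal MPI ping-pong to estimate single-pair bandwidth.
 * Run with exactly two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int size  = 4 * 1024 * 1024;   /* 4 MByte message */
    const int iters = 100;
    int rank;
    char *buf = malloc(size);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* two messages of 'size' bytes per iteration (one each way) */
        double gbytes = 2.0 * size * iters / (1024.0 * 1024.0 * 1024.0);
        printf("~%.2f GByte/sec bidirectional pair bandwidth\n",
               gbytes / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}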

Cheers,
Steffen


