[Beowulf] Is there really a need for Exascale?

Fri Nov 30 11:34:32 PST 2012

On 11/30/12 10:57 AM, "Mark Hahn" <hahn at mcmaster.ca> wrote:
>
>>> stacking is great, but not that much different from MCMs, is it?
>>
>> Real memory stacking a la TSV has smaller geometries, way more
>> wire density, lower power burn, and seems to boost memory bandwidth
>> by one order of magnitude
>
>sorry, do you have some reference for this?  what I'm reading is that
>TSV and chip-on-chip stacking is fine, but not dramatically different
>from chip-bumps (possibly using TSV) connecting to interposer boards.
>obviously, attaching chips to fine, tiny, low-impedance, wide-bus
>interposers gives a lot of flexibility in designing packages.
>
>> 
>>http://nepp.nasa.gov/workshops/etw2012/talks/Tuesday/T08_Dillon_Through_Silicon_Via.pdf
>
>that's useful, thanks.  it's a bit high-end-centric - no offence, but
>NASA and high-volume mass production are not entirely aligned ;)

Actually, not as misaligned as you might think. We'd like to get away
from custom low-volume processes, because they tend to be "workmanship"
sensitive (i.e., each instance varies from the next), so to the extent that
something like this might wind up being mass-manufactured, that's great.

One significant issue that it helps with is behavior over large
temperature ranges. Right now, a large die on a large substrate/carrier
(with umpty bazillion pins/balls/columns) is a recipe for failure because
the coefficient of thermal expansion (CTE) of the carrier (typically some
kind of ceramic) differs from that of the board it's stuck to (typically
some glass-epoxy composite). If you can break the large die up into many
smaller dice with some sort of regular, wide interconnect (which maps well
onto the "sea of cells/modules" architecture of a typical FPGA), the
substrate can be made of something whose CTE differs from the dice's,
because the mismatch accumulated across each small die is small.
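
A first-order way to see why smaller dice help: the shear displacement at
the outermost joint is roughly the CTE mismatch times the temperature swing
times the joint's distance from the neutral point (DNP, the package centre).
The sketch below assumes typical textbook CTEs (about 2.6 ppm/K for silicon,
7 for an alumina-like ceramic, 16 for glass-epoxy board in-plane), a
hypothetical 100 K swing, and hypothetical 45 mm and 10 mm square parts. The
helper names are made up, and it ignores warpage and joint compliance, so
treat it as an illustration, not a reliability model.

```python
import math


def corner_displacement_um(cte_a_ppm: float, cte_b_ppm: float,
                           delta_t_k: float, dnp_mm: float) -> float:
    """First-order shear displacement (micrometres) at the outermost joint.

    delta = |alpha_a - alpha_b| * delta_T * DNP. Ignores warpage and the
    compliance of the joints themselves.
    """
    delta_alpha = abs(cte_a_ppm - cte_b_ppm) * 1e-6  # ppm/K -> 1/K
    return delta_alpha * delta_t_k * dnp_mm * 1e3    # mm -> um


def half_diagonal_mm(side_mm: float) -> float:
    """DNP of the corner joint of a square part with the given side length."""
    return side_mm * math.sqrt(2) / 2


# Assumed, illustrative values (not from the original post):
CTE_SILICON = 2.6   # ppm/K
CTE_CERAMIC = 7.0   # ppm/K, alumina-like carrier
CTE_BOARD = 16.0    # ppm/K, glass-epoxy board, in-plane
DELTA_T = 100.0     # K, hypothetical day/night swing

# Big ceramic carrier soldered straight to the board.
big = corner_displacement_um(CTE_CERAMIC, CTE_BOARD, DELTA_T,
                             half_diagonal_mm(45.0))
# Small silicon die on a substrate assumed to match the board's CTE.
small = corner_displacement_um(CTE_SILICON, CTE_BOARD, DELTA_T,
                               half_diagonal_mm(10.0))
print(f"45 mm carrier: {big:.1f} um, 10 mm die: {small:.1f} um")
```

Even with the larger silicon-to-board mismatch, the small die's corner moves
roughly a third as far, because displacement scales linearly with DNP.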

For what it's worth, large temperature cycles every sol are the big
problem on Mars (or the Moon). It's not unlike the environment faced by an
engine control unit in a car parked outside in the winter. By comparison,
things in "space" are in a pretty benign thermal environment.

I don't know what the commercial volume is for big FPGAs (like the
Virtex-7 mentioned in the slides), but it must be fairly large or Xilinx
wouldn't be making them. Probably not like the market for CPUs, but still
pretty big (after all, distributors quote prices for Qty:1000 in their
online catalogs).

>
>it paints 2.5d as quite ersatz, but I didn't see a strong data argument.
>sure, TSVs will operate on a finer pitch than solder bumps, but
>the xilinx silicon interposer also seems very attractive.  do you
>actually get significant power/speed benefits from pure chip-chip
>contacts versus an interposer?  I guess not: that the main win is
>staying in-package.

Yes. Although you might be able to go faster chip-to-chip just because
there's one fewer interface/impedance discontinuity?
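
To first order, each impedance step reflects Gamma = (Z2 - Z1)/(Z2 + Z1)
of the incident voltage and passes 1 - Gamma^2 of the incident power, so
every extra discontinuity costs a little signal. This sketch assumes a
hypothetical 50 ohm line with 40 ohm bump/via regions and treats each
interface independently (no re-reflections, loss, or frequency dependence);
it only shows the per-interface penalty, not real interposer numbers.

```python
def reflection_coefficient(z_from: float, z_to: float) -> float:
    """Voltage reflection coefficient for a wave going from z_from into z_to (ohms)."""
    return (z_to - z_from) / (z_to + z_from)


def first_pass_power(impedances: list[float]) -> float:
    """Fraction of incident power transmitted through a chain of impedance steps.

    Multiplies (1 - Gamma^2) at every step; ignores re-reflections, loss and
    frequency dependence, so it is a first-order estimate only.
    """
    fraction = 1.0
    for z_from, z_to in zip(impedances, impedances[1:]):
        gamma = reflection_coefficient(z_from, z_to)
        fraction *= 1.0 - gamma * gamma
    return fraction


# Hypothetical impedance profiles (ohms), not measured values:
direct = first_pass_power([50.0, 40.0, 50.0])                      # one bump region
via_interposer = first_pass_power([50.0, 40.0, 50.0, 40.0, 50.0])  # two bump regions
print(f"direct: {direct:.4f}, via interposer: {via_interposer:.4f}")
```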

>
>it is interesting to think, though: if you can connect chips with
>extremely wide links, does that change your architecture?  for instance,
>dram is structured as 2d array of bit cells that are read out into a
>1d slice (iirc, something like 8kbits).  cpu r/w requests are satisfied
>from within this slice faster since it's the readout from 2d that's
>expensive.  but suppose a readout pumped all 8kb to the cpu -
>sort of a cache line 16x longer than usual.  considering the proliferation
>of 128-512b-wide SIMD units, maybe this makes perfect sense.  this would
>let you keep vector fetches from flushing all the non-vector stuff out
>of your normal short-line caches...
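
For scale, the quoted numbers work out as follows, assuming an exactly
8 Kibit (8192-bit) row and the common 64-byte cache line (both assumptions;
real row sizes vary by part): one row readout is 1 KiB, i.e. 16 normal
cache lines, or 16 vectors of 512 bits.

```python
ROW_BITS = 8 * 1024      # "something like 8kbits" per row (assumed exact)
CACHE_LINE_BYTES = 64    # assumed conventional cache-line size
SIMD_WIDTHS_BITS = (128, 256, 512)

row_bytes = ROW_BITS // 8                       # bytes in one row readout
lines_per_row = row_bytes // CACHE_LINE_BYTES   # normal cache lines per row
vectors_per_row = {w: ROW_BITS // w for w in SIMD_WIDTHS_BITS}  # SIMD loads per row

print(row_bytes, lines_per_row, vectors_per_row)
```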

Or even better, it allows mixing chips built in different processes (e.g.,
CPU logic and DRAM) in one package, and the "outside the package"
interconnects can then be fewer in number. Instead of 32 address lines and
32 data lines between CPU and RAM, plus another batch of address and data
lines to a multi-gigabit serial interconnect with 4 wires, you can have
just the 4 wires.
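
A toy pin count makes the point concrete. The 32 address and 32 data lines
and the 4 serial wires come from the paragraph above; the size of the
"other batch" feeding the serial interconnect isn't given, so the 64 used
here is a placeholder, and the Link class and helpers are hypothetical names
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    signal_pins: int
    in_package: bool  # True if both ends sit in the same package


def external_pins(links: list[Link]) -> int:
    """Count signal pins that have to leave the package and cross the board."""
    return sum(link.signal_pins for link in links if not link.in_package)


def build(integrated: bool) -> list[Link]:
    """Same three links, either as separate packages or combined in one package."""
    return [
        Link("CPU-RAM (32 addr + 32 data)", 64, in_package=integrated),
        Link("CPU-serial interface (placeholder size)", 64, in_package=integrated),
        Link("serial link to the outside world", 4, in_package=False),
    ]


discrete = external_pins(build(integrated=False))
stacked = external_pins(build(integrated=True))
print(f"discrete: {discrete} pins, stacked: {stacked} pins")
```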

Sometimes I wish you could buy/use large FPGAs with most of their
balls/columns removed, so I wouldn't have to worry about pin-to-pin shorts
or inspectability of the solder *under the package*. Unfortunately, all
those pins are also the thermal path out of the chip.

I'd love a BIG FPGA packaged in something like a power diode (with a big
copper bolt or fin sticking out each side) with a few dozen signal wires.
I don't need 1152 pins: I need 20 signal pins.
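
The thermal catch mentioned above (the balls are also the heat path) can be
sketched as parallel thermal resistances: N similar ball paths give roughly
theta_ball / N, so dropping to 20 pins only works if something like the bolt
or fin takes over. All values here are made up (1000 K/W per ball path,
0.5 K/W for a bolt-style path, 20 W dissipation) and spreading resistance is
ignored, so only the trend means anything.

```python
def parallel(*resistances_k_per_w: float) -> float:
    """Equivalent thermal resistance (K/W) of heat paths in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances_k_per_w)


THETA_PER_BALL = 1000.0  # K/W per ball path (hypothetical)
THETA_BOLT = 0.5         # K/W for a bolt/fin heat path (hypothetical)
POWER_W = 20.0           # hypothetical dissipation

# Temperature rise above the board for each packaging option.
rise_1152 = POWER_W * parallel(*([THETA_PER_BALL] * 1152))
rise_20 = POWER_W * parallel(*([THETA_PER_BALL] * 20))
rise_20_bolt = POWER_W * parallel(*([THETA_PER_BALL] * 20), THETA_BOLT)
print(f"1152 balls: {rise_1152:.1f} K, 20 balls: {rise_20:.0f} K, "
      f"20 balls + bolt: {rise_20_bolt:.1f} K")
```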

>