[Beowulf] Teraflop chip hints at the future
eugen at leitl.org
Wed Feb 14 10:15:28 PST 2007
On Wed, Feb 14, 2007 at 09:51:21AM -0800, Jim Lux wrote:
> I'm not sure you could put any processor (except maybe something like
> a microcontroller) into a DRAM design and keep the densities
> up. There are all sorts of things that might bite you.. aside from
IBM has just announced at ISSCC a 1-transistor eDRAM
substitute for the 6T-SRAM cell used in caches. (Others
demonstrated 1T-SRAM years ago, AMD has Z-RAM,
Intel Floating Body Cells, T-RAM doesn't need a capacitor,
etc. -- embedded RAM is reasonably common in network
It's 45 nm SOI (starting 2008), 1.5 ns access (SRAM does 0.8 to 1 ns),
and is supposed to be far more dissipation-friendly. Theoretically
this gives you 6 times as much eDRAM as SRAM cache in the same area,
which is at least 12 MBytes, and possibly up to 48 MBytes (Power6 dual-core has 8 MBytes
> thermal issues, I suspect that the number of mask layers, etc. is
> fairly small for DRAM. The actual materials on the chip (doping
> levels, etc.) may not allow for a reasonably performing processor
> with reasonable feature sizes and thermal properties. Getting the
> heat away from the junction is a big deal.
> I think DRAMs are built with a maximum of 4 layers of interconnect
> with vias, while processors have a lot more layers and a much more
> sophisticated interconnect structure.
Above processes are compatible with CPU processes, so there's some
hope the piggybacking in Terascale doesn't have to be forever.
> Each and every switch has some non-zero power associated with
> changing state. Sure, the core swings smaller voltages and energies,
> but a DRAM cell is a lot smaller than a flipflop or half-adder in the
> CPU, and only one is changing at a time, as opposed to thousands.
On the horizon, there's MRAM, which can also do logic with a small
extension to each cell (a kind of nonvolatile FPGA). It's not
hugely fast, but it's static and very low power.
> A big advantage of integrating CPU and memory, though, is that you
> don't have to "go offchip" which saves a huge amount in
> drivers/receivers, etc. Of course, this is why everyone is looking
Yes, this is a major advantage. No pads, too, but a few serial
> to integrated photonics and/or real high speed serial
> interconnects. The I/O buffer might consume a hundred or thousand
> times more power than the onchip logic driving it. Trading some more
> logic inside to serialize and deserialize, and do adaptive
> equalization, in exchange for fewer "wires out of the chip" is a good deal.
> Then, there's the speed of light problem. Put two chips 10cm apart
Increasing density to true 3D integration is a very good way
to reduce the average distance. Stacking computation modules
on a 3D lattice also minimizes dead space. Of course, with
current cooling you won't get more than a few tens of MW out of
a wastepaper-basket-sized volume before the cluster goes China syndrome.
> on a board, and the round trip time (say for address to get there and
> data to get back) is going to be in the nanoseconds area, even if the
> chip itself were infinitely fast.
The mammalian CNS has a 120 m/s signalling limit, yet it can process pretty
complex stimuli in a few tens of ms.
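
As a back-of-envelope check on the figures in this exchange, here is a
rough sketch of the propagation arithmetic. The velocity factor of 0.5
for a board trace and the 10 ms window are assumptions chosen for
illustration, not measured values; only the 10 cm distance and the
120 m/s limit come from the text above.

```python
# Rough signal-propagation arithmetic. The velocity factor and the
# 10 ms window are illustrative assumptions, not measurements.
C = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_ns(distance_m, velocity_factor=0.5):
    """Round-trip time in ns over distance_m at velocity_factor * c."""
    return 2.0 * distance_m / (velocity_factor * C) * 1e9


def reach_m(speed_m_per_s, time_s):
    """Distance covered at a constant speed within time_s."""
    return speed_m_per_s * time_s


# Two chips 10 cm apart: address out, data back.
vacuum_rt = round_trip_ns(0.10, velocity_factor=1.0)  # hard lower bound
board_rt = round_trip_ns(0.10)                        # assumed ~c/2 trace

# How far a 120 m/s signal gets in an assumed 10 ms window.
cns_reach = reach_m(120.0, 0.010)

print(f"vacuum round trip: {vacuum_rt:.2f} ns")
print(f"board round trip:  {board_rt:.2f} ns")
print(f"CNS reach in 10 ms: {cns_reach:.2f} m")
```

Under these assumptions the board round trip comes out slightly above
one nanosecond before any chip latency is added, and a 120 m/s signal
covers roughly a metre in 10 ms.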
Eugen* Leitl http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE