[Beowulf] Teraflop chip hints at the future

Mark Hahn hahn at mcmaster.ca
Wed Feb 14 09:17:59 PST 2007


>>> Intel is stacking dram dice above the cpu as an L4 cache, but the article
>> 
>> stacking seems like a major hack - I'd rather think about how to do 
>> processor-in-memory (perhaps zram?).
>
> It's a technology thing.. you can't get DRAM densities with processes used 
> for CPUs and the like. Different fabs, different processes, even though the

1Gb (128MB) seems to be the current state-of-production for normal DRAM;
Intel has 24 MB on some chips, though we mightn't call those production - 
the mass-market chips are at a "mere" 8MB onchip.

so, waving hands wildly, there's about a 16x density advantage; this is 
a bit more than one might expect from transistors per bit (~1 for dram vs
~6 for sram, iirc), but as you say, dram is highly tweaked for density.
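
For reference, the hand-waving above can be written out as a rough sketch. It compares capacity per chip only (a 1Gb DRAM die against 8MB of on-chip cache) and ignores die area, which the post gives no figures for; the variable names are illustrative.

```python
# Back-of-envelope check of the per-chip capacity figures quoted in the post.
# This compares capacity per chip, not bits per mm^2 (die areas are not given).
DRAM_CHIP_BITS = 1 * 2**30        # 1 Gb commodity DRAM chip
ONCHIP_CACHE_BYTES = 8 * 2**20    # 8 MB on-chip cache on a mass-market CPU

dram_chip_bytes = DRAM_CHIP_BITS // 8
capacity_ratio = dram_chip_bytes / ONCHIP_CACHE_BYTES

# Transistors per bit, as recalled in the post: ~1 (DRAM) vs ~6 (SRAM).
transistor_ratio = 6 / 1

print(f"DRAM chip: {dram_chip_bytes // 2**20} MB")
print(f"capacity ratio: {capacity_ratio:.0f}x, "
      f"transistor-count ratio: {transistor_ratio:.0f}x")
```

The gap between the two ratios is the part the post attributes to DRAM processes being tuned for density.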

> feature sizes are similar.  There's also some thermal issues.  If you use a 
> CPU process to build ram, it's not very dense (think cache on current

actually, I was more thinking of putting more memory (not necessarily 
standard dram) onto a CPU-oriented process.

> don't know that you can even build a big CPU on a DRAM process.  DRAMs are 
> pretty highly optimized (read, they've spent billions of dollars on tweaking 
> the device models to within a gnats eyelash of the physics limits).. for

that's not the point, of course - even a small CPU on each dram chip would 
add up to a profoundly powerful system.  for instance, take a pretty mundane
2-socket, 16GB workstation today and notice it's got probably 128 separate
dram chips.  imagine if each of those had even a small onchip processor
(say, 2-4M transistors).  the potential is there for something quite useful (I admit 
practical problems to getting dram vendors/industry to do such a thing...)
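
A quick sketch of the arithmetic behind that workstation example, assuming the 16GB is built entirely from 1Gb chips (which is what yields 128 chips) and reading "Mt" as millions of transistors. The names are illustrative, and no ECC chips are counted.

```python
# How many DRAM chips are in the 16GB workstation, and what a small
# per-chip processor would add up to across all of them.
TOTAL_MEMORY_BYTES = 16 * 2**30     # 16 GB of main memory
CHIP_BYTES = (1 * 2**30) // 8       # assumed 1 Gb (128 MB) chips, no ECC

chips = TOTAL_MEMORY_BYTES // CHIP_BYTES

# Aggregate logic budget for the 2-4 million transistors per chip
# suggested in the post.
aggregate_mt = {per_chip: chips * per_chip for per_chip in (2, 4)}

print(f"{chips} DRAM chips")
for per_chip, total in aggregate_mt.items():
    print(f"{per_chip} Mt per chip -> {total} Mt in aggregate")
```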

> instance, because with DRAM you only read or write one location at time, very

well, I have the impression that a lot of the power dissipated by modern
chips is actually the external clock/PLL and drivers.  then again, a dram 
chip only dissipates a fraction of a watt (I looked at a Micron 1Gb ddr2/667 -
it could possibly dissipate <.5W (all banks interleave), but normal
back-to-back sequential activity would be only ~.3W).  that's for ddr2 at
1.8V - ddr3 is 1.5V and I imagine the trend to lower voltages will continue.
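
To put those per-chip numbers in system terms, here is a rough sketch that scales them to the 128 chips of the workstation example. The voltage factor is an assumption not made in the post: it uses the textbook rule that dynamic power goes as V^2 at fixed capacitance and frequency, which real DDR3 parts will not follow exactly.

```python
# Scale the per-chip DDR2 power figures from the post to a 128-chip system.
CHIPS = 128          # 16 GB built from 1 Gb chips
TYPICAL_W = 0.3      # ~.3W, back-to-back sequential activity
WORST_W = 0.5        # <.5W upper bound, all banks interleaved

typical_total_w = CHIPS * TYPICAL_W
worst_total_w = CHIPS * WORST_W

# ASSUMPTION (not from the post): dynamic power scales with V^2,
# everything else held equal.
DDR2_V, DDR3_V = 1.8, 1.5
voltage_scale = (DDR3_V / DDR2_V) ** 2

print(f"typical: {typical_total_w:.1f} W, worst case: {worst_total_w:.1f} W")
print(f"V^2 scaling from {DDR2_V} V to {DDR3_V} V: {voltage_scale:.2f}x")
```

Even the worst-case total is for the existing memory alone; any added per-chip processor would dissipate on top of it.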

> few transistors change state on any given cycle, so the power dissipation is 
> low.  Compare with a CPU where you have thousands of transistors changing 
> state on a cycle.

that's still a good point.  a single transaction on a current dram would 
only warm up one row of one bank.  probably modelable by ignoring the
dissipation of the array itself, and just counting the control/sense/io
logic.

> Go to the IEEE High Speed Digital Interconnect Workshop in Santa Fe this 
> year... there's amazing stuff that people are doing.

alas, my day-job is sys admin/programmer/dogsbody, not designing new,
cutting-edge compute architectures ;(

regards, mark hahn.


