[Beowulf] CPU Startup Combines CPU+DRAM—And A Whole Bunch Of Crazy

Mon Jan 23 07:38:39 PST 2012

If you read this PDF from Venray Technology, which is linked to in the
article, you'll see where the "Whole Bunch of Crazy" part comes from. After
reading it, Venray lost a lot of credibility in my book.

https://www.venraytechnology.com/economics_of_cpu_in_DRAM2.pdf

--
Prentice

On 01/23/2012 08:45 AM, Eugen Leitl wrote:
> (Old idea, makes sense, will they be able to pull it off?)
>
> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/
>
> CPU Startup Combines CPU+DRAM—And A Whole Bunch Of Crazy
>
> Sunday, January 22, 2012 - by Joel Hruska
>
> The CPU design firm Venray Technology announced a new product design this
> week that it claims can deliver enormous performance benefits by combining
> CPU and DRAM onto a single piece of silicon. We spent some time last
> fall discussing the new TOMI (Thread Optimized Multiprocessor) with company
> CTO Russell Fish, but while the idea is interesting, its presentation is
> marred by crazy conceptualizing and deeply suspect analytics.
>
> The Multicore Problem:
>
> There are three factors, or walls, that limit the scaling of modern
> microprocessors. First, there's the memory wall, defined as the gap between
> CPU and DRAM clock speeds. Second, there's the ILP (Instruction Level
> Parallelism) wall, which refers to the difficulty of decoding enough
> instructions per clock cycle to keep a core completely busy. Finally, there's
> the power wall--the faster a CPU is and the more cores it has, the more power
> it consumes.
>
> Attempting to compensate for one wall often risks running afoul of the other
> two. Adding more cache to decrease the impact of the CPU/DRAM speed
> discrepancy adds die complexity and draws more power, as does raising CPU
> clock speed. Combined, the three walls are a set of fundamental
> constraints--improving architectural efficiency and moving to a smaller
> process technology may make the room a bit bigger, but they don't remove the
> walls themselves.
>
> TOMI attempts to redefine the problem by building a very different type of
> microprocessor. The TOMI Borealis is built using the same transistor
> structures as conventional DRAM; the chip trades clock speed and performance
> for ultra-low leakage. Its design is, by necessity, extremely simple. Not
> counting the cache, TOMI is a 22,000-transistor design, compared to 30,000
> transistors for the original ARM2. The company's early prototypes, built on
> legacy DRAM technology, ran at 500MHz on a 110nm process.
>
> Instead of surrounding a CPU core with a substantial amount of L2 and L3
> cache, Venray inserted a CPU core directly into a DRAM design. Each TOMI
> Borealis chip connects eight TOMI cores to 1Gbit of DRAM, and a 2GB DIMM
> carries 16 of these ICs. This works out to 128 processor cores per DIMM.
> Because they're built using ultra-low-leakage processes and are so small,
> such cores cost very little to build and consume vanishingly small amounts of
> power (Venray claims power consumption is as low as 23mW per core at 500MHz).
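A quick sanity check of the DIMM arithmetic above, as a minimal Python sketch. The per-IC figures (8 cores, 1Gbit of DRAM, 16 ICs per DIMM) come from the article; the 23mW-per-core number is Venray's own claim, not a measurement, and the power total below counts only the cores, not the DRAM arrays or I/O.

```python
# Back-of-envelope check of the TOMI Borealis DIMM figures quoted above.
# Assumptions: 8 cores and 1 Gbit of DRAM per IC, 16 ICs per DIMM, and
# Venray's claimed 23 mW per core at 500 MHz (not independently verified).

CORES_PER_IC = 8
DRAM_GBIT_PER_IC = 1
ICS_PER_DIMM = 16
CLAIMED_MW_PER_CORE = 23.0
CLOCK_HZ = 500e6

cores_per_dimm = CORES_PER_IC * ICS_PER_DIMM          # 8 x 16 = 128 cores
dimm_gbyte = DRAM_GBIT_PER_IC * ICS_PER_DIMM / 8      # 16 Gbit -> 2.0 GB
# Core power only; DRAM refresh/access and I/O power are not included.
core_watts_per_dimm = cores_per_dimm * CLAIMED_MW_PER_CORE / 1000
# Energy per core per clock cycle implied by the claimed power and clock.
picojoules_per_cycle = CLAIMED_MW_PER_CORE / 1000 / CLOCK_HZ * 1e12

print(f"cores per DIMM: {cores_per_dimm}")
print(f"DRAM per DIMM: {dimm_gbyte:.1f} GB")
print(f"core power per DIMM: {core_watts_per_dimm:.2f} W")
print(f"energy per core per cycle: {picojoules_per_cycle:.0f} pJ")
```

It prints 128 cores, 2.0 GB, about 2.94 W of core power per DIMM, and roughly 46 pJ per core per cycle, so the article's 128-core figure is internally consistent with its per-chip numbers.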
>
> It's an interesting idea.
>
> The Bad:
>
> When your CPU has fewer transistors than an architecture that debuted in
> 1986, there's a good chance that you left a few things out--like an FPU, branch
> prediction, pipelining, or any form of speculative execution. Venray may have
> created a chip with power consumption an order of magnitude lower than
> anything ARM builds and more memory bandwidth than Intel's highest-end Xeons,
> but it's an ultra-specialized, ultra-lightweight core that trades 25 years of
> flexibility and performance for scads of memory bandwidth.
>
> The last few years have seen a dramatic surge in the number of low-power,
> many-core architectures being floated as the potential future of computing,
> but Venray's approach relies on the manufacturing expertise of companies
> that have no experience building microprocessors and don't normally serve as
> foundries. This imposes fundamental restrictions on the CPU's ability to
> scale; DRAM is manufactured with only three metal layers rather than the
> 10-12 layers Intel and AMD use for their CPUs. Venray itself acknowledges that
> these constraints imposed substantial limitations on the original TOMI design.
>
> Of course, there's still a chance that the TOMI uarch could be effective in
> certain bandwidth-hungry scenarios--but that's where the Venray Crazy Train
> goes flying off the track.
>
> The Disingenuous and Crazy:
>
> Let's start here. In a graph like this, you expect the two bars to represent
> the same systems being compared across three different characteristics.
> That's not the case. When we spoke to Russell Fish in late November, he
> pointed us to this publicly available document and claimed that the results
> came from a customer with 384 2.1GHz Xeons. There's no such thing as an S5620
> Xeon and even if we grant that he meant the E5620 CPU, that's a 2.4GHz chip.
>
> The "Power consumption" graphs show Oracle's maximum power consumption for a
> system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB (yes, TB)
> of Flash and 15x 10,000 RPM hard drives. It's not only a worst-case figure,
> it's a figure utterly unrelated to the workload shown in the Performance
> comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, ten of
> them come to only 1.3kW--Oracle's 17.7kW figure means that the
> overwhelming majority of the cabinet's power consumption is driven by
> components other than its CPUs.
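The CPU-versus-cabinet comparison above reduces to one division. This minimal Python sketch assumes the article's figures (10 Xeon E7-8870s at 130W TDP each, 17.7kW cabinet maximum); TDP is a thermal design ceiling rather than measured draw, so the result is only a rough bound.

```python
# Share of the quoted Oracle cabinet maximum that its CPUs could account for.
# Assumptions (from the article): 10 Xeon E7-8870s at a 130 W TDP each,
# against a 17.7 kW maximum for the whole cabinet. TDP is not measured power.

NUM_CPUS = 10
TDP_WATTS = 130
CABINET_MAX_WATTS = 17_700

cpu_watts = NUM_CPUS * TDP_WATTS             # 10 x 130 W = 1300 W
cpu_share = cpu_watts / CABINET_MAX_WATTS    # fraction of the cabinet total
other_watts = CABINET_MAX_WATTS - cpu_watts  # flash, disks, SQL nodes, etc.

print(f"CPU TDP total: {cpu_watts / 1000:.1f} kW")
print(f"CPU share of cabinet maximum: {cpu_share:.1%}")
print(f"Everything else: {other_watts / 1000:.1f} kW")
```

It prints 1.3 kW, 7.3%, and 16.4 kW: even at full TDP the Xeons account for well under a tenth of the quoted figure.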
>
> From here, things rapidly get worse. Fish makes his points about power walls
> by referring to unverified claims that prototype 90nm Tejas chips drew 150W
> at 2.8GHz back in 2004. That's like arguing that Ford can't build a decent
> car because the Edsel sucked.
>
> After reading about the technology, you might think Venray was planning to
> market a small chip to high-end HPC niche markets... and you'd be wrong. The
> company expects the following to occur as a result of this revolutionary
> architecture (organized by least-to-most creepy):
>
>     Computer speech will be so common that devices will talk to other devices
> in the presence of their users.
>
>     Your cell phone camera will recognize the face of anyone it sees and scan
> the computer cloud for background red flags as well as six degrees of
> separation.
>
>     Common commands will be reduced to short verbal cues like clicking your
> tongue or sucking your lips
>
>     Your personal history will be displayed for one and all to see...women
> will create search engines to find eligible, prosperous men. Men will create
> search engines to qualify women. Criminals will find their jobs much more
> difficult because their history will be immediately known to anyone who
> encounters them.
>
>     TOMI Technology will be built on flash memories creating the elemental
> unit of a learning machine... the machines will be able to self organize,
> build robust communicating structures, and collaborate to perform tasks.
>
>     A disposable diaper company will give away TOMI enabled teddy bears that
> teach reading and arithmetic. It will be able to identify specific
> children... and from time to time remind Mom to buy a product. The bear will
> also diagnose a raspy throat, a cough, or runny nose.
>
> Conclusion:
>
> Fish has spent decades in the microprocessor industry--he co-invented, with
> Chuck H. Moore, the first CPU to use a clock multiplier--but his vision of
> the future is crazy enough to scare mad dogs and Englishmen.
>
> His idea for a CPU architecture is interesting, even underneath the
> obfuscation and misrepresentation, but too limited in practice to ever
> take off. Google, an enthusiastic and dedicated proponent of energy-efficient
> multicore research, said it best in a paper titled "Brawny cores
> still beat wimpy cores, most of the time."
>
> "Once a chip’s single-core performance lags by more than a factor of two or
> so behind the higher end of current-generation commodity processors, making a
> business case for switching to the wimpy system becomes increasingly
> difficult... So go forth and multiply your cores, but do it in moderation, or
> the sea of wimpy cores will stick to your programmers’ boots like clay."
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>