[Beowulf] Is there really a need for Exascale?

Einar Rustad er at numascale.com
Thu Nov 29 15:32:05 PST 2012


On 29. nov. 2012, at 15:52, "Lux, Jim (337C)" <james.p.lux at jpl.nasa.gov> wrote:

> Okay.. So SRAM instead of Cache..
> 
> Or at least cache that doesn't care about off chip coherency (e.g. No bus
> snooping, and use delayed writeback)
> 
> A good paged virtual memory manager might work as well.
> 
> But here's a question... Would a Harvard architecture with separate code
> and data paths to memory be a good idea. It's pretty standard in the DSP
> world, which is sort of a SIMD (except it's not really a single
> instruction... But you do the same thing to many sets of data over and
> over.. And a lot of exascale type applications: finite element codes,
> would have the same pattern)
> 

With separate L1 I-caches and D-caches, modern processors are pretty much
Harvard architecture already, at least at the L1 level. The L1 I-cache has a very
high hit rate for any program with a significant runtime (otherwise the program
would have to be gazillions of lines long), so the few I-cache misses will not
affect the performance of the common data paths very much.
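
To put rough numbers on the hit-rate argument, here is a minimal sketch of a loop body being fetched repeatedly through a toy direct-mapped I-cache. The cache size (32 KiB), line size (64 bytes) and the function name icache_hit_rate are assumptions picked for illustration, not figures from any particular processor; real L1 I-caches are set-associative and prefetch, so they generally do better than this model.

```python
def icache_hit_rate(loop_bytes, iterations, cache_bytes=32 * 1024, line_bytes=64):
    """Fraction of cache-line fetches that hit when a loop body of
    loop_bytes is executed sequentially `iterations` times.

    Toy model (assumed, not a real CPU): direct-mapped cache, one lookup
    per cache line of code, no prefetching.
    """
    n_sets = cache_bytes // line_bytes
    tags = [None] * n_sets          # which code line each cache slot holds
    hits = misses = 0
    for _ in range(iterations):
        for addr in range(0, loop_bytes, line_bytes):
            line = addr // line_bytes
            idx = line % n_sets     # direct-mapped placement
            if tags[idx] == line:
                hits += 1
            else:
                misses += 1
                tags[idx] = line    # fill the slot on a miss
    return hits / (hits + misses)


if __name__ == "__main__":
    # A 4 KiB inner loop run 1000 times: only the first pass misses.
    print(icache_hit_rate(4 * 1024, 1000))
    # A 64 KiB loop body in a 32 KiB direct-mapped cache: every pass evicts itself.
    print(icache_hit_rate(64 * 1024, 10))
```

In this model the hit rate only collapses once the hot code no longer fits in the cache, which is the "gazillions of lines" case.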

> On 11/29/12 6:47 AM, "Eugen Leitl" <eugen at leitl.org> wrote:
> 
>> On Thu, Nov 29, 2012 at 02:19:26PM +0000, Lux, Jim (337C) wrote:
>>> 
>>> 
>>> On 11/28/12 11:46 PM, "Eugen Leitl" <eugen at leitl.org> wrote:
>>> 
>>>> On Thu, Nov 29, 2012 at 01:14:39AM -0500, Mark Hahn wrote:
>>>> 
>>>> I've been waiting for cache to die and be substituted by
>>>> on-die SRAM or MRAM. Yet to happen, but if it happens,
>>>> it will be with embedded-like systems.
>>> 
>>> 
>>> When running, SRAM consumes a lot more power and space than almost any
>>> kind of DRAM.  2-4 transistors per cell vs 1, if nothing else.
>> 
>> Yes, but we're talking cache. Cache is SRAM with extra logic.
>> Even a cache hit is slower than it would take to access on-die
>> SRAM. Cache coherency doesn't scale due to relativistically
>> constrained signalling. There also cannot be any such thing
>> as a global memory, unless you want it to be slow and spend
>> a lot of silicon real estate to make multiple writes to the
>> same location consistent.
>> 
>>> A big problem is that the CMOS process for dense, low power, fast RAM is
>>> different than what you want to use for a CPU. And even between DRAM and
>>> SRAM there's a pretty big difference. (trenches, etc.)
>> 
>> This is why we need stacked memories. Notice that MRAM might be compatible
>> with CPU fabbing processes. ST-MRAM
>> http://www.computerworld.com/s/article/9233516/Everspin_ships_first_ST_MRAM_memory_with_500X_performance_of_flash
>> should have very good scaling in terms of performance and power
>> dissipation and can potentially be fabricated on top of an
>> ordinary CPU core http://www.cs.utexas.edu/~cart/publications/tr01-36.pdf 
> 

Einar Rustad, VP Business Development
er at numascale.com
Mob: +47 9248 4510





