[Fwd: Re: [Beowulf] Cell in HPC]

Wed May 31 14:38:55 PDT 2006

> execution models to share instruction code, but splitting L2 data
> across cores is bound to be a destructive use of the cache in any
> data parallel model.  Obviously, user control of the cache is a large

"data parallel model" basically means you're streaming in/out of dram,
right?  why are these cases not nicely covered by the placement
instructions introduced with sse (prefetchnta, movntq/movntps) and its
follow-ons?  you can control how a load or store behaves wrt the
different levels of cache.
IIRC, Intel introduced some new stuff to make the cache shared
by cores more effective this way (per-core victim traffic writes through?)
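
to make the placement-instruction point concrete, here is a minimal sketch
in C using the SSE intrinsics from <xmmintrin.h> (_mm_prefetch with
_MM_HINT_NTA, _mm_stream_ps, _mm_sfence).  the helper name scale_stream,
the 64-element prefetch distance and the alignment requirements are my own
assumptions for illustration, not anything measured or taken from the
thread; on a non-SSE target it just falls back to a plain loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE: _mm_prefetch, _mm_stream_ps, _mm_sfence */
#define HAVE_SSE_HINTS 1
#else
#define HAVE_SSE_HINTS 0
#endif

/* Hypothetical helper: dst[i] = a * src[i] for a streaming (use-once) pass.
 * Assumes dst and src are 16-byte aligned and n is a multiple of 4. */
static void scale_stream(float *dst, const float *src, size_t n, float a)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(n % 4 == 0);

#if HAVE_SSE_HINTS
    const size_t ahead = 64;            /* prefetch distance: a guess, tune it */
    const __m128 va = _mm_set1_ps(a);

    for (size_t i = 0; i < n; i += 4) {
        /* Non-temporal prefetch hint: asks the cpu to fetch the line while
         * minimizing cache pollution.  It is only a hint. */
        if (i + ahead < n)
            _mm_prefetch((const char *)(src + i + ahead), _MM_HINT_NTA);

        __m128 v = _mm_mul_ps(_mm_load_ps(src + i), va);

        /* Non-temporal store: written with a hint to bypass the caches. */
        _mm_stream_ps(dst + i, v);
    }

    /* Streaming stores are weakly ordered; fence before others read dst. */
    _mm_sfence();
#else
    /* Portable fallback without any cache-placement hints. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a * src[i];
#endif
}
```

whether this actually helps a shared L2 depends on the part and on how the
hints are implemented there, so treat it as something to benchmark rather
than a guaranteed win.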