[Beowulf] Nehalem and Shanghai code performance for our rzf example

richard.walsh at comcast.net
Tue Jan 20 15:24:42 PST 2009

----- Original Message ----- 

From: "Bill Broadley" bill at cse.ucdavis.edu 

>If gallium arsenide or some other material gave us 10x the clock rate per 
>watt, but 1/2 the transistors, would it really matter?  Seemed like even Intel 
>is begrudgingly admitting it's the memory bus, and finally the Nehalem is 
>blessed with dramatically more bandwidth. 
>Seems like increasingly cores are turning latency limited workloads (for the 
>parallel jobs of course) into bandwidth limited ones.  Without a memory bus 
>that allows for 10x the bandwidth it doesn't really seem like 10x the clock 
>rate would be of particular use. 

Right.  Except for the potential to improve the performance of serial codes 
or serial pieces of code (and perhaps badly written code), delivering 10x by 
clock or by core would not seem to change the bandwidth problem that both 
create.  Manycore promises even greater multiples.  For bandwidth-limited 
data-parallel codes, you might as well stay on the path of least economic 
resistance. 
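
A rough way to see the point is a roofline-style bound, where attainable 
performance is the smaller of peak compute and (memory bandwidth x flops 
per byte).  The sketch below is only an illustration: the node numbers and 
the 0.25 flops/byte intensity are made-up assumptions, not measurements of 
Nehalem, Shanghai, or any real part, and the function names are hypothetical. 

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: capped by peak compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical baseline node (illustrative values only, not measurements).
BASE_PEAK = 40.0   # GFLOP/s
BASE_BW = 20.0     # GB/s
INTENSITY = 0.25   # flops per byte, assumed for a streaming data-parallel kernel


def speedup(peak_scale, bw_scale, flops_per_byte=INTENSITY):
    """Speedup over the baseline when peak compute and bandwidth are scaled."""
    base = attainable_gflops(BASE_PEAK, BASE_BW, flops_per_byte)
    scaled = attainable_gflops(BASE_PEAK * peak_scale,
                               BASE_BW * bw_scale,
                               flops_per_byte)
    return scaled / base


if __name__ == "__main__":
    print(speedup(10, 1))    # 10x by clock or by core, same memory bus
    print(speedup(1, 10))    # same compute, 10x the bandwidth
    print(speedup(10, 10))   # both scaled together
```

Under these assumed numbers the kernel is already pinned to the bandwidth 
roof, so scaling compute alone (whether by clock or by core count) leaves 
the bound unchanged, while scaling bandwidth moves it until the compute 
roof takes over. 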

