[Beowulf] Opinions of Hyper-threading?

Thu Feb 28 05:20:01 PST 2008

Bill Broadley wrote:
>> The problem with many (cores|threads) is that memory bandwidth wall.  
>> A fixed size (B) pipe to memory, with N requesters on that pipe ...
> 
> What wall?  Bandwidth is easy, it just costs money, and not much at 
> that. Want 50GB/sec[1] buy a $170 video card.  Want 100GB/sec... buy a 

Heh... if it were that easy, we would spend extra on more bandwidth for 
Harpertown and Barcelona ...

The point is that the design determines your hard, fixed per-socket 
limit, and no programming technique is going to get you around that 
limit within a socket.  You need to change your programming technique 
to go multi-socket.  That per-socket limit is the bandwidth wall.
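
A minimal sketch of that arithmetic, assuming the fixed pipe is shared 
evenly: with B GB/s per socket and N requesters, each requester gets 
B/N.  The function name and the 10 GB/s figure are made up for 
illustration, not measured on any of the chips named here.

```python
def per_requester_bandwidth(socket_bandwidth_gbs, requesters):
    """Even share of a fixed per-socket memory pipe, in GB/s."""
    if requesters < 1:
        raise ValueError("need at least one requester")
    return socket_bandwidth_gbs / requesters


# Hypothetical 10 GB/s socket: the share halves each time N doubles.
for n in (1, 2, 4, 8):
    print(n, per_requester_bandwidth(10.0, n))
```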

> better video card.  Want 200GB/sec buy 2.  Sure video cards have 
> minimal memory (512-768MB), no double precision on the normal cards 
> [2] (although I'm not sure if the now shipping 9600GT fixed that), and 
> are harder to program (CUDA vs the normal compilers).  Has anyone 
> programmed CUDA or the IBM Cell chip who could comment on how hard it 
> is to do something useful?  In any case, the reality and market 
> acceptance of this approach seem to be aggressively closing.  Thus 
> machines with 16-32 threads/cores are becoming rather common (Sun 
> T1000/T2000, quad socket quad core Intel, and hopefully RSN 4-8 socket 
> 4 core AMDs).

My point was that it is going to get harder and harder to make effective 
use of those cores.

Basically, I have postulated elsewhere that all computing technology 
evolves to a point where it is bandwidth limited.  Each core on a 
Clovertown can completely swamp the memory controller, yet we put 8 of 
them on a motherboard.  This means that under particular workloads, the 
Clovertown (and Harpertown and Woodcrest) cores are being starved of 
data.  If you can't feed all of the cores fast enough, you can't make 
efficient use of all of them.

I do agree that 16/32 core machines are showing up, and we are excited 
about this, but I am concerned that expectations are not going to be 
met.  Doubling the number of cores won't necessarily double the 
processing power of the machines, especially if a few cores are idling 
while the system is under load, as they cannot get data to compute with.
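
As a rough illustration of why doubling cores need not double 
throughput, here is a sketch under a deliberately simple assumption: 
every core wants the same streaming rate, and the socket delivers at 
most a fixed total.  The function name and the 10 GB/s and 4 GB/s 
figures are hypothetical, not measurements.

```python
def effective_speedup(cores, socket_bandwidth_gbs, demand_per_core_gbs):
    """Speedup over one core for a purely bandwidth-bound workload."""

    def delivered(n):
        # The socket cannot deliver more than its fixed pipe allows.
        return min(n * demand_per_core_gbs, socket_bandwidth_gbs)

    return delivered(cores) / delivered(1)


# Hypothetical socket: 10 GB/s total, each core wants 4 GB/s.
# Speedup flattens once the pipe is full; extra cores just wait.
for n in (1, 2, 4, 8):
    print(n, effective_speedup(n, 10.0, 4.0))
```

Real workloads mix compute with memory traffic and benefit from cache, 
so the flattening is usually softer than this model suggests.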

> 
> Seems like additional cores|threads are an excellent way to make use of 
> tons of memory bandwidth in a latency tolerant fashion to get reasonable 
> real world performance on applications that people actually care about 
> (read that as willing to pay for).  All the while utilizing more 
> commodity technology than the vector machines of yesteryear.
> 
> Latency on the other hand (especially when measured in clock cycles) is 
> a wall, extremely hard to fix, and those nasty laws of physics keep 
> getting in the way.

Bandwidth is just as much a physical issue, but latency is harder.  We 
can overcome the bandwidth issue once we make the transition away from 
using fermions (spin 1/2 particles that follow Pauli's exclusion 
principle) to using bosons (integer-spin particles that can "sit atop" 
each other in configuration space; photons, with spin 1, are the 
obvious example).  As light-based ALUs and logic units are not being 
developed rapidly at this point, I expect us to languish for a while in 
an indirect band gap semiconductor (silicon).  Even a direct band gap 
semiconductor would be faster ... GaAs (as the joke goes) is the 
material of the future ... and always will be (of the future).

> I don't see any particular reason why memory bandwidth can't go through 
> a few doublings in the near future if there were a market for it; last 
> I checked nvidia was doing pretty well ;-)

I would like a 512 bit memory bus ... please!!!
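
For a sense of what a wider bus buys, peak bandwidth is simply bus 
width in bytes times the transfer rate.  A small sketch follows; the 
2 GT/s rate is an assumed round number for illustration, not the 
specification of any particular part.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed 2e9 transfers/s on buses of different widths.
for width in (128, 256, 512):
    print(width, peak_bandwidth_gbs(width, 2e9))
```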

> [1] Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
>     yet.  I hope to work on one though.  If anyone has numbers please speak
>     up.

Just saw one here, pretty impressive.

> [2] The nvidia 8600/8800 are single precision AFAIK, no idea if the 9600GT
>     is one of the new generation DP capable chips.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615