[Beowulf] Opinions of Hyper-threading?

Tue Feb 26 16:13:32 PST 2008

Mark Hahn wrote:
>>> And today memory access can stall up to hundreds of cycles, so any 
>>> processor can hide this latency by switching to another thread.
>>
>> My gosh ... we have re-invented the Tera MTA.  ...
> 
> I think the reason we both know what that name means is that they had 
> (have?) a nugget of truth.  after all, a multiplier unit on a chip 

had ... morphed into "The New Cray".  Burton Smith is long gone, now at 
Microsoft.

> doesn't really care on which thread's behalf it's doing work.  MTA is 
> perhaps a bit far towards the pure gatling-gun approach, but I think we

It was very interesting when I first heard about it at an SC9x 
conference.  Spoke to Burton for a few minutes on it.

OoO was a (very weak) version of something like this.  SMT is a little 
stronger.  Register renaming and all those fancy OoO optimizations were 
in there to break those dependencies down and enable better IPC 
... which is the name of the game in the end anyway ...

You are absolutely right, in that the functional units don't (and 
shouldn't) care which thread they are doing work for.
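
A back-of-the-envelope way to see the latency-hiding argument (a rough 
sketch only, not a model of the MTA or of any real SMT core): assume each 
thread computes for some number of cycles, then stalls on memory, and that 
switching threads is free.  The function names and the 10/200 cycle 
figures below are made up for illustration.

```python
import math

def threads_to_hide_latency(work_cycles, stall_cycles):
    """Threads needed so that some thread is always ready to issue.

    Assumes every thread alternates work_cycles of compute with
    stall_cycles of memory wait, and that thread switches cost nothing.
    """
    return math.ceil(1 + stall_cycles / work_cycles)

def utilization(n_threads, work_cycles, stall_cycles):
    """Fraction of cycles the functional unit is busy under the same assumptions."""
    busy = n_threads * work_cycles
    return min(1.0, busy / (work_cycles + stall_cycles))

if __name__ == "__main__":
    # Hypothetical numbers: 10 cycles of work per 200-cycle memory stall.
    print(threads_to_hide_latency(10, 200))
    print(utilization(2, 10, 200))
```

With stalls in the hundreds of cycles, two threads per core barely dent 
the idle time under these assumptions, which is roughly why the MTA went 
for so many hardware threads.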

> can all agree that ultimately
> any program is just a big hairy dataflow graph.

I would like to use that as a quote ... :)

> 
>>> But then you have to make sure the processor has enough cache and 
>>> memory bandwidth to handle the increased memory traffic (like Sun 
>>> Niagara).
>>
>> The problem with many (cores|threads) is that memory bandwidth wall.  
>> A fixed size (B) pipe to memory, with N requesters on that pipe ...
> 
> I think that's why almost everyone agrees with the elegance of AMD's 
> system architecture - memory attached to and thus scaling with ncpus.
> and yes, there's a lot of work already going on regarding making caches
> more intelligent - predicting the multireference or sharing properties
> of a cache block, for instance, to choose when to move it and between
> which caches in a big system.

I seem to remember hearing about the processors-in-core idea many moons 
ago.  It seemed hard to program.  But compare that to the big honking 
pile-o-ram, with many processors, few pipes, and bandwidth limits ...

The AMD model is elegant.  As you expand the number of cores you expand 
the number of memory connections.  This is part of the reason the 2350s 
at 2GHz give some 3+ GHz Intel 5472s a run for the money on a number of 
real world tests.  Sort of like the Alpha ... you can make the CPU 
"infinitely fast", but there is the little matter of the rest of the 
system to worry about.
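
To put toy numbers on the two models (a sketch under simplifying 
assumptions, not measurements of the 2350 or the 5472): one fixed pipe 
of bandwidth B shared by N cores gives each core B/N, while memory 
attached per socket keeps the per-core share constant as sockets are 
added.  The 10 GB/s figure and the function names are hypothetical, and 
the per-socket case ignores the cost of remote (non-local) accesses.

```python
def per_core_bw_shared(total_bw, n_cores):
    """One fixed pipe to memory, split evenly among all cores."""
    return total_bw / n_cores

def per_core_bw_attached(bw_per_socket, cores_per_socket, n_sockets):
    """Each socket brings its own memory pipe; assumes all accesses are local."""
    aggregate_bw = bw_per_socket * n_sockets
    return aggregate_bw / (cores_per_socket * n_sockets)

if __name__ == "__main__":
    bw = 10.0             # GB/s, hypothetical figure
    cores_per_socket = 4  # quad-core parts
    for sockets in (1, 2, 4):
        cores = cores_per_socket * sockets
        shared = per_core_bw_shared(bw, cores)
        attached = per_core_bw_attached(bw, cores_per_socket, sockets)
        print(f"{cores:2d} cores: shared {shared:.3f} GB/s/core, "
              f"attached {attached:.3f} GB/s/core")
```

The shared number falls as 1/N while the attached number stays flat, 
which is the whole point of hanging memory off each socket.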

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615