question about Intel P4 versus Alphas

Mark Hahn hahn at physics.mcmaster.ca
Fri Jan 10 16:57:42 PST 2003


> > Is HT anything more than a thinly-veiled attempt at luring more software
> >...
> 
> I would say no as well.  In a single sentence, hyperthreading (it's better
> than "super" or "multi" threading ;-) ) delivers idle work slots from each
> functional unit inside all superscalar processors in an SMP system to
> any 'needs-to-be-run' thread of work from any process.

that makes HT sound rather general; it's not.  HT pretty much splits
your CPU into two CPUs (which the OS has to boot separately); the two
pseudo-CPUs then share the same resources, so you'll probably notice
quite a bit higher cache miss rates, for instance.  on the other hand,
the CPU will tolerate misses better, since it has a whole thread's worth
of other work to do.
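To see why sharing one cache can push miss rates up, here is a toy sketch. It assumes a fully associative LRU cache and two threads that each loop over their own working set; the 64-line capacity and 48-line working sets are made-up numbers, not P4 parameters. Each working set fits in the cache alone but the pair does not. The all-miss result is partly an artifact of LRU meeting a cyclic access pattern, so real caches usually degrade less abruptly.

```python
from collections import OrderedDict

def miss_rate(trace, capacity):
    """Fraction of accesses in `trace` that miss a fully associative LRU cache
    holding `capacity` lines (a deliberate simplification of a real cache)."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)      # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)

def interleave(a, b):
    """Alternate accesses from two threads, as if both issue every cycle."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out

CAPACITY = 64  # hypothetical cache size, in lines
# Each thread repeatedly walks its own 48-line working set (disjoint addresses).
thread_a = [i for _ in range(100) for i in range(48)]
thread_b = [1000 + i for _ in range(100) for i in range(48)]

alone = miss_rate(thread_a, CAPACITY)
shared = miss_rate(interleave(thread_a, thread_b), CAPACITY)
print(f"alone: {alone:.3f}  shared: {shared:.3f}")  # alone: 0.010  shared: 1.000
```

Alone, only the first pass misses; interleaved, 96 distinct lines cycle through a 64-line LRU cache, so every line is evicted before it is reused.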

> There is some scheduling overhead, of course.

extremely minor; in general, you have to push quite hard to expose
kernel performance issues.  the IBM NUMA boys, for instance, typically
resort to 16+ CPUs before their kernel changes show noticeable benefits.

> The following article does an excellent job of explaining this.
> 
> http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html

I find it annoying.  reading one of the excellent academic papers on
SMT is a lot more informative, clearer and more compact.

> However, YMMV depending on job type/work mix.  In the worst case,
> where two processes execute identical instructions at the same time,
> there will be no benefit over multithreading. On the other hand,

the better way to think of it is simply that SMT needs a lot of pipeline
bubbles in order to show any benefit: units left idle by one thread that
the other thread might conceivably use.  if one thread actually keeps a
resource busy, HT isn't going to help at all unless the other thread
is doing something quite different (not contending for that resource).
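A toy issue-slot model makes the bubble argument concrete. It is a sketch under heavy assumptions: one unit of each kind, one op per thread per cycle, and each op either names the unit it needs or is `None` for a bubble (the thread waiting, e.g. on a miss). The unit names and op mixes are hypothetical, not a model of the real P4 pipeline.

```python
def cycles_needed(threads, smt):
    """Cycles to retire every op. Each op names the functional unit it needs,
    or is None for a bubble (the thread is stalled and uses no unit).
    With smt=True the threads share a cycle unless they need the same unit."""
    if not smt:
        # Without SMT the threads run back to back; every op costs one cycle.
        return sum(len(t) for t in threads)
    pos = [0] * len(threads)
    cycle = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        busy = set()
        # Rotate priority so neither thread always wins a contended unit.
        order = [(cycle + k) % len(threads) for k in range(len(threads))]
        for i in order:
            if pos[i] >= len(threads[i]):
                continue  # this thread has already finished
            unit = threads[i][pos[i]]
            if unit is None or unit not in busy:
                if unit is not None:
                    busy.add(unit)
                pos[i] += 1  # op issued (or bubble elapsed) this cycle
            # otherwise the thread lost the unit and retries next cycle
        cycle += 1
    return cycle

alu_bound = ["alu"] * 100
mem_bound = ["mem", None, None, None] * 25  # one load, then 3 cycles waiting

print(cycles_needed([alu_bound, alu_bound], smt=False),
      cycles_needed([alu_bound, alu_bound], smt=True))   # 200 200
print(cycles_needed([mem_bound, alu_bound], smt=False),
      cycles_needed([mem_bound, alu_bound], smt=True))   # 200 100
```

Two threads fighting over the same unit gain nothing, while a stall-heavy thread paired with one that needs a different unit fills the other's idle slots and finishes in half the cycles.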

if you can somehow partition your program into one thread that does
nothing but pointer-chasing (cache misses) and another that does
nothing but register-only multiplies, for instance, then you'll
see a very nice speedup.
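One common way to build the pointer-chasing half of such a split (our choice here, not something the post specifies) is a random single-cycle permutation built with Sattolo's algorithm: each load's address depends on the previous load, so neither the out-of-order core nor a prefetcher can run ahead. In Python the interpreter overhead hides any cache effects, so treat this only as a description of the two access patterns. The function names are hypothetical.

```python
import random

def make_chain(n, seed=0):
    """Return nxt, a random permutation of range(n) that forms a single cycle
    (Sattolo's algorithm), so following nxt[] from any index visits all n."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i strictly is what guarantees one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps):
    """Pointer-chasing kernel: each load depends on the result of the last."""
    p = 0
    for _ in range(steps):
        p = nxt[p]
    return p

def multiply(steps, x=1.000001):
    """Stand-in for register-only work: a dependent chain of multiplies
    that touches no array."""
    acc = 1.0
    for _ in range(steps):
        acc *= x
    return acc
```

To run the actual experiment you would presumably write both kernels in C over an array far larger than the cache, start them as two threads, and pin them to sibling logical CPUs (for example with `taskset`); that setup is an assumption about how one would test it.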





More information about the Beowulf mailing list