Using hyperthreading on 2 Proc Xeon cluster nodes

Bill Broadley bill at math.ucdavis.edu
Sat Jun 8 16:46:12 PDT 2002


> BTW, the 2 virtual processors share the same FPU, so not interesting for 
> HPC.

In the case of the P4 I'd agree.  In general, though, even with a shared FPU
I could see hyperthreading being very useful.  Keep in mind that even a single
flop per cycle is often a big improvement over real-world performance.
If thread A blocks and thread B can get some work done during the cache
miss, without the expense of a context switch, that can be a big win.
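
To put rough numbers on that argument, here is a toy model, not a
measurement.  It assumes each hardware thread is independently stalled
(say, on a cache miss) for some fraction of cycles and that switching
between hardware threads is free.  The function name and the stall
fractions are made up for illustration.

```python
def fpu_utilization(stall_fraction: float, threads: int = 1) -> float:
    """Fraction of cycles in which a shared FPU has at least one ready thread.

    Toy assumption: each thread is independently stalled for
    `stall_fraction` of cycles, and switching between hardware threads
    costs nothing.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # The FPU sits idle only when every thread is stalled at the same time.
    return 1.0 - stall_fraction ** threads


# Hypothetical stall fractions, chosen only to show the trend.
for stall in (0.2, 0.5, 0.8):
    one = fpu_utilization(stall, 1)
    two = fpu_utilization(stall, 2)
    print(f"stall={stall:.1f}  1 thread: {one:.2f}  2 threads: {two:.2f}")
```

In this model a thread that stalls half the time keeps the FPU busy 50%
of the time on its own, and two such threads keep it busy 75% of the
time.  Real code will do worse, since the threads also compete for cache
and memory bandwidth.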

But alas, for whatever reason the P4 doesn't have enough resources
to get much advantage from 2-way SMT.  At least not on any code I've
found, but I'm still looking.  Someone posted an article saying that Rambus
was necessary for the advantage.  I'm not sure whether any of the 2-bank DDR
P4s can actually have 2 separate outstanding requests at the same time.
Rambus, I believe, does support multiple outstanding misses.
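
A quick sketch of why the number of outstanding requests matters.  This
is a simplified model with hypothetical numbers, not a description of
any particular chipset: every miss costs the same latency, and the
memory system overlaps at most `max_outstanding` of them at once.

```python
import math


def miss_service_time_ns(misses: int, latency_ns: float, max_outstanding: int) -> float:
    """Time to service `misses` independent cache misses.

    Toy assumption: each miss takes `latency_ns`, and at most
    `max_outstanding` misses can be in flight at the same time.
    """
    if misses < 0:
        raise ValueError("misses must be non-negative")
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    # Misses are serviced in batches of at most `max_outstanding`.
    batches = math.ceil(misses / max_outstanding)
    return batches * latency_ns


# Two hardware threads each take one miss; 100 ns is a made-up latency.
serialized = miss_service_time_ns(2, 100.0, max_outstanding=1)
overlapped = miss_service_time_ns(2, 100.0, max_outstanding=2)
print(f"one request at a time: {serialized} ns, two in flight: {overlapped} ns")
```

If the two misses have to be serviced one after the other, the second
thread just waits behind the first, and most of the benefit of having
two threads disappears.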

So as usual the first Intel implementation isn't that exciting,
but I expect better from the next iteration.  Currently the common
case seems to be that running 4 processes on 2 hyperthreaded CPUs is
slower than running 2.

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis


