[Beowulf] Re: Opteron 275 performance
Robert G. Brown
rgb at phy.duke.edu
Fri Jul 29 16:22:18 PDT 2005
Philippe Blaise writes:
> Hopefully, you show that dual core is quite superior to
> "hyper"-threading for some scientific programs,
> and is cost-friendly, very nice ! but page 14 you write
Be very careful. Hyper TRANSPORT is what AMD (and the HT consortium)
push as a replacement for the traditional bus -- it amounts to putting the
CPUs, memory, and all peripherals on a very high bandwidth low latency
network, IIRC. It enables a sort of SMP design very similar to a
"cluster" inside a single box, whether the processors are single or
dual core, and is the way AMD is going quite heavily in their future
designs. It is very useful and relevant to SMP, multicore, and single
core designs and important to HPC.
Hyper THREADING is Intel's solution to what amounts to an overlong
instruction pipeline on the CPU itself. Single-threaded code that runs
straight through keeps a long pipeline full, but if the code branches
all over the place the resulting pipeline flushes are very costly, so
Intel added multiple pipelines that share, and are bottlenecked at, e.g.
the execution units. There is also a related cache-thrashing that can
occur if the memory spaces of the multiple threads are disjoint. This
sort of HT is reminiscent of Sparc's onetime hardware context switches.
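To put rough numbers on the flush cost described above, here is a back-of-the-envelope sketch using the usual textbook model (average cycles per instruction = base CPI + branch fraction x mispredict rate x flush penalty). Every number in it is a hypothetical value picked for illustration, not a measurement of any real Intel or AMD part, and the function name is my own.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are counted.

    base_cpi             -- cycles per instruction with a perfectly full pipeline
    branch_fraction      -- fraction of instructions that are branches
    mispredict_rate      -- fraction of those branches that are mispredicted
    flush_penalty_cycles -- cycles lost per flush (grows with pipeline depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted.
# Hypothetical penalties: 12 cycles (shorter pipeline) vs 30 cycles (longer one).
short_pipe = effective_cpi(1.0, 0.20, 0.10, 12)
long_pipe = effective_cpi(1.0, 0.20, 0.10, 30)

print(f"shorter pipeline: {short_pipe:.2f} cycles/instruction")
print(f"longer pipeline:  {long_pipe:.2f} cycles/instruction")
print(f"slowdown at constant clock: {long_pipe / short_pipe:.2f}x")
```

The model ignores cache misses and anything a second hardware thread might do to fill the bubbles; it only shows why the same branchy code loses more on a deeper pipeline.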
<editorial comment>In my personal opinion Hyperthreading is pretty much
useless to "most" HPC applications and seems to be mostly irrelevant to
SMP design as well. If anything, I'd expect hyperthreading to really
complicate SMP kernel design, as it adds a whole layer of complexity to
the already complex question of processor affinity for independent
threads, and of memory locality and ITS possible processor affinities.
It really isn't at all clear that Hyperthreading needs to live (as my
kids would put it:-). AMD just uses shorter instruction pipelines (less
to flush and fill) and seems to outperform Hyperthreaded Intel at
constant clock, at any rate. At MOST it yields a 30% or so speedup that
is relevant to somebody doing work with lots of independent things going
on on a single CPU -- typically (unsurprisingly) a desktop user watching
a movie or the like -- decoding video and audio at the same time, or
servers handling multiple service threads. In other cases, it yields a
DECREASE of say 10% due to the aforementioned cache-thrashing.</editorial comment>
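On the affinity complication mentioned above: a minimal sketch of how a job might restrict itself to one logical CPU per physical core, so that two of its threads never share a hyperthreaded core. It assumes a Linux system that exposes the sysfs file thread_siblings_list under /sys/devices/system/cpu/cpuN/topology/, and it uses Python's os.sched_getaffinity and os.sched_setaffinity, which are Linux-only. The helper names are my own, and the fallback (treat every CPU as its own core) is an assumption, not a recommendation.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,4" or "0-3,8" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu):
    """Logical CPUs that share a physical core with `cpu`.

    Falls back to [cpu] when the sysfs file is missing (assumed: no SMT info).
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as handle:
            return parse_cpu_list(handle.read())
    except OSError:
        return [cpu]


def one_cpu_per_core(allowed):
    """From the allowed logical CPUs, keep one representative per physical core."""
    chosen = {}
    for cpu in sorted(allowed):
        core_key = tuple(thread_siblings(cpu))
        chosen.setdefault(core_key, cpu)  # first (lowest) allowed CPU wins
    return sorted(chosen.values())


def pin_to_physical_cores():
    """Restrict the calling process to one logical CPU per core (Linux only)."""
    allowed = os.sched_getaffinity(0)
    keep = one_cpu_per_core(allowed)
    os.sched_setaffinity(0, keep)
    return keep
```

Calling pin_to_physical_cores() at the start of a run leaves the sibling logical CPUs idle for that process, which is roughly equivalent to turning hyperthreading off for that job alone.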
If you google around you'll find multiple articles and white papers on
the web discussing all this. Which is fortunate, as otherwise it is all
very confusing, this hyper-t[ransport,hreading] with both being touted
as doing everything but eating your meat loaf for you.
Note that these are all just my personal opinions as noted, based on
spending most of a full day googling and reading all about it for one
reason or another -- I think writing an article for Linux magazine. I
may be misremembering some parts of it as I'm on the far end of a long
vacation and drinking a beer as I type, but I think the gist is correct.
If not, as always, I welcome an informative whomp upside the head from
the more knowledgeable.