[Beowulf] Not all cores are created equal
Nifty Tom Mitchell
niftyompi at niftyegg.com
Mon Dec 29 16:11:59 PST 2008
On Wed, Dec 24, 2008 at 09:03:38PM +1100, Chris Samuel wrote:
> ----- "John Hearns" <hearnsj at googlemail.com> wrote:
> > SGI Altix have 'bootcpusets' which means you can slice
> > off one or two processors to take care of OS housekeeping
> > tasks,
> Now that cpusets have been in the mainline kernel for
> some time you should be able to do this with any modern
> I contemplated doing this on our Barcelona cluster, but
> sacrificing 1 core in 8 was a bit too much of a high price
> to pay. But people with higher core counts per node might
> find it attractive.
This seems like a decision to be settled by benchmarking, based on the
application load and the 'implied IO+OS' load, as well as the ability to
localize the IO+OS activity to the sacrificed CPU core.
Of interest: CPU and system designers and OS engineers are set on the
SMP model, where all the parts are considered equal. This simplification
ignores the reality that interrupts, networking, encryption and file IO
are not floating point intensive and thus leave FPU core transistors idle.
The decisions are different when dedicated IO channel processors or vector
processors are built into the hardware of the system.
Today the apparent cut-and-paste model of multi-core CPU design, where the most
critical design issues are at the memory (cache) interface, pushes the
issue out to the cluster user/manager and perhaps into the batch system.
Outside of heat issues, adding yet another FPU core is almost free given today's
For a long time I felt that Intel Hyper-Threading was an interesting
decision in that it all but stated that floating point was a second-class
activity in the system. However, the complexity of adding more execution
units may have nixed further hyper-threading efforts.
The benchmarking (combined with CPU affinity) work might be interesting.
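For anyone who wants to try the affinity side of such a benchmark, here is a minimal sketch in Python. It assumes Linux (os.sched_getaffinity and os.sched_setaffinity are not available elsewhere) and assumes core 0 is the one given up to the OS and IO; the helper names compute_cores and pin_current_process are made up for illustration. It only restricts where the calling process may run. Steering interrupts onto the reserved core is a separate step (on Linux, via /proc/irq/<n>/smp_affinity, which needs root) and is not shown.

```python
import os


def compute_cores(available, reserved):
    """Return the cores left for user-space ranks after reserving some.

    `available` and `reserved` are iterables of integer core ids.
    """
    remaining = set(available) - set(reserved)
    if not remaining:
        raise ValueError("reserving these cores leaves none for compute")
    return remaining


def pin_current_process(reserved=(0,)):
    """Restrict the calling process to every allowed core except `reserved`.

    Assumption: Linux only. Core 0 as the reserved core is an arbitrary
    choice for illustration, not something the thread prescribes.
    """
    available = os.sched_getaffinity(0)   # cores this process may use now
    target = compute_cores(available, reserved)
    os.sched_setaffinity(0, target)       # pid 0 means the calling process
    return target


if __name__ == "__main__":
    # On an eight-core node this would leave cores 1-7 for the rank.
    print(sorted(compute_cores(range(8), {0})))
```

A rank would call pin_current_process() once at startup, before any worker threads are created, so that they inherit the mask. Running the same benchmark with and without the call gives the per-rank delta discussed next.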
Leaving 12.5% of the FPU resource on the table might look like a lot
at first, but since the other seven cores might be idled by a slow rank
sidetracked by interrupts and IO, the benchmark FPU delta per rank need
only be about one seventh of that (i.e. roughly 2%) to generate a net gain.
This might be an easy percentage to gain by localizing interrupts and
IO so that they do not conflict with the affinity of user-space activity.
But this is not strictly SMP, so the hardware and OS design may limit the gains.
Two percent in an eight-core system does not seem intuitive. Did I get
this turned inside out?
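One way to check the arithmetic is to write down an explicit model. The sketch below is an assumption, not a measurement: a fixed amount of work is split evenly across the ranks, every rank runs at the same rate, and the job finishes when the slowest rank does. The function names are made up for illustration.

```python
def naive_share(total_cores, reserved=1):
    """Reserved fraction spread over the remaining ranks (12.5% / 7 here)."""
    compute = total_cores - reserved
    return (reserved / total_cores) / compute


def break_even_slowdown(total_cores, reserved=1):
    """Slowdown of the slowest rank on the full node at which reserving
    `reserved` cores ties, under the fixed-work model described above."""
    return reserved / total_cores


def required_speedup(total_cores, reserved=1):
    """Per-rank speedup the remaining ranks need to match the full node."""
    compute = total_cores - reserved
    return total_cores / compute - 1.0


def runtime(work, ranks, rate, slowest_rank_slowdown=0.0):
    """Wall time when the job waits on its slowest rank."""
    return (work / ranks) / (rate * (1.0 - slowest_rank_slowdown))


if __name__ == "__main__":
    print(f"naive share:         {naive_share(8):.4f}")
    print(f"break-even slowdown: {break_even_slowdown(8):.4f}")
    print(f"required speedup:    {required_speedup(8):.4f}")
```

Under these assumptions 12.5% / 7 is about 1.8%, but the break-even point sits at a 12.5% slowdown of the slowest rank (equivalently, the seven remaining ranks each need to run about 14.3% faster). So with this particular model the 2% figure does look optimistic. A model in which interrupts stall all ranks at every synchronization point, or in which work per rank is fixed rather than total work, could give a different answer, which is why the benchmark is the real test.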
T o m M i t c h e l l
Found me a new hat, now what?