[Beowulf] The Walmart Compute Node?
diep at xs4all.nl
Fri Nov 9 11:38:41 PST 2007
Kilian, compared to the hardware guys we are still at layman level
when I write the following:
a) SSE2 is a vector instruction set: each instruction operates on a
128-bit vector containing either 4 x 32-bit single-precision or
2 x 64-bit double-precision values, or 4 x 32-bit or 2 x 64-bit integers.
So executing one such instruction means you do 2 double-precision
operations at a time.
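To make the lane counts concrete, here is a small sketch in plain Python. It only models a 128-bit register as 16 bytes and mimics what a packed double add (ADDPD) does; it is not real SIMD code, and the helper names `lanes` and `packed_add_pd` are hypothetical.

```python
import struct

REGISTER_BYTES = 16  # an SSE2 XMM register is 128 bits wide

# struct format codes: 'f' = 32-bit float, 'd' = 64-bit double,
# 'i' = 32-bit int, 'q' = 64-bit int ('<' = little-endian, no padding)
LANE_FORMATS = {"4 x float32": "f", "2 x float64": "d",
                "4 x int32": "i", "2 x int64": "q"}

def lanes(fmt: str) -> int:
    """Number of elements of the given type that fit in one 128-bit register."""
    return REGISTER_BYTES // struct.calcsize("<" + fmt)

def packed_add_pd(a: bytes, b: bytes) -> bytes:
    """Plain-Python model of a packed double add (ADDPD in hardware):
    both 64-bit lanes are added by one instruction."""
    x = struct.unpack("<2d", a)
    y = struct.unpack("<2d", b)
    return struct.pack("<2d", x[0] + y[0], x[1] + y[1])

if __name__ == "__main__":
    for name, fmt in LANE_FORMATS.items():
        print(name, "->", lanes(fmt), "lanes")
    reg = packed_add_pd(struct.pack("<2d", 1.0, 2.0),
                        struct.pack("<2d", 10.0, 20.0))
    print(struct.unpack("<2d", reg))  # (11.0, 22.0): 2 DP operations, 1 instruction
```

In real code a compiler would emit ADDPD from intrinsics or auto-vectorization; the point here is only that one packed DP instruction carries two operations.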
b) The Core 2 is a processor that can execute up to 4 instructions per
cycle, but it doesn't have 4 SIMD units; it has more like 2, whereas a
Barcelona core has 3. So a Core 2 can retire at most 2 SIMD
instructions per cycle.
That gives the Core 2 a theoretical 4 double-precision operations per
cycle, versus 6 for a Barcelona core.
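The arithmetic behind those two numbers is simply SIMD units times double-precision lanes per packed instruction. A minimal sketch, assuming the unit counts stated above (2 for Core 2, 3 for Barcelona); the function name is hypothetical, and it deliberately ignores the restriction that some instructions, such as multiply, issue on only one unit.

```python
DP_LANES_PER_SSE2_OP = 2  # 128-bit register / 64-bit double

# SIMD unit counts as stated in the post (the author's figures, not vendor data).
SIMD_UNITS = {"Core 2": 2, "Barcelona": 3}

def peak_dp_flops_per_cycle(simd_units: int, lanes: int = DP_LANES_PER_SSE2_OP) -> int:
    """Upper bound: every SIMD unit retires one packed DP instruction each cycle."""
    return simd_units * lanes

for core, units in SIMD_UNITS.items():
    print(core, peak_dp_flops_per_cycle(units), "DP flops/cycle")
# Core 2 4 DP flops/cycle
# Barcelona 6 DP flops/cycle
```

A multiply-heavy loop would be capped below this bound, because multiplies can only go to one of the units.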
Furthermore, certain instructions, such as multiply, can only be
executed by 1 unit.
So 'flop-maximizing software' is never the same as software that is
going to multiply for you, which is the end goal of most number
crunchers: multiplying either a big number or a matrix.
Processors have become very complex in past years; so many factors
influence their IPC that, without help from the manufacturer in
question, it's impossible to write the fastest code for them in
assembler.
On Nov 9, 2007, at 7:11 PM, Kilian CAVALOTTI wrote:
> Hi Vincent,
> On Friday 09 November 2007 09:21:41 am Vincent Diepeveen wrote:
>> OK, an easy theoretical calculation, and still a very rough one of course:
>> 1 core: 2.4 GHz * 3 instructions a cycle * (sse)2 = 7.2 * 2 = 14.4 GFlops
> Could you please elaborate a little on the "3 instructions a
> cycle * (sse)2" part? I thought Intel's quad-core line was able to
> do 4 ops/cycle, but you're talking about 6, which makes a huge
> difference in the final numbers.
> For instance, the theoretical performance of one core of an E5345
> (Clovertown 2.33 GHz) would be 9.32 GFlops assuming 4 ops/cycle, and
> 13.98 GFlops assuming 6 ops/cycle. Given that the actual measured
> HPL performance on one core is about 7.86 GFlops, that would give an
> efficiency of a mere 56%, which would suck badly (84% efficiency is
> more in line with what I'd consider reasonable for a single core).
> But I may have missed something.
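The numbers traded in this thread all follow from peak = clock x flops per cycle x cores, and efficiency = measured / peak. A small sketch reproducing them, using only the figures quoted above (2.4 GHz with 3 x 2, the E5345 at 2.33 GHz, and 7.86 GFlops measured HPL on one core); the function names are illustrative.

```python
def peak_gflops(ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak = clock (GHz) x flops per cycle x cores."""
    return ghz * flops_per_cycle * cores

def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of theoretical peak actually achieved (e.g. by HPL)."""
    return measured_gflops / peak

# Vincent's rough estimate: 2.4 GHz * 3 instructions/cycle * 2 (SSE2)
print(round(peak_gflops(2.4, 3 * 2), 2))  # 14.4

# Kilian's E5345 (2.33 GHz) single-core figures, measured HPL = 7.86 GFlops
HPL_ONE_CORE = 7.86
for fpc in (4, 6):
    p = peak_gflops(2.33, fpc)
    print(fpc, round(p, 2), f"{efficiency(HPL_ONE_CORE, p):.0%}")
# 4 9.32 84%
# 6 13.98 56%
```

Which flops-per-cycle value is right is exactly the question under discussion; the code only shows how strongly that one factor moves the efficiency figure.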