[Beowulf] The Walmart Compute Node?

Vincent Diepeveen diep at xs4all.nl
Fri Nov 9 11:38:41 PST 2007


Kilian, keep in mind that compared to the hardware guys we are both
still at layman level here. With that said:

a) SSE2 is a vector (SIMD) instruction set; each instruction operates
on a 128-bit vector.
That is either 4 x 32-bit single precision or 2 x 64-bit double
precision floats, or it can be 4 x 32-bit or 2 x 64-bit integers.

So executing 1 such instruction means you do 2 double precision
operations at a time.
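
A quick sketch of that lane arithmetic in Python, in case it helps. The
names VECTOR_BITS and lanes() are mine, purely for illustration:

```python
# How many elements of a given width fit in one 128-bit SSE2 vector.
# VECTOR_BITS and lanes() are illustrative names, not part of any API.
VECTOR_BITS = 128

def lanes(element_bits):
    """Number of elements of the given bit width in one vector."""
    return VECTOR_BITS // element_bits

single_lanes = lanes(32)  # single precision floats or 32-bit integers
double_lanes = lanes(64)  # double precision floats or 64-bit integers
print(single_lanes, double_lanes)
```

The double precision lane count is the factor that goes into the per-cycle
numbers.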

b) The Core 2 is a processor that can execute up to 4 instructions per
cycle, but it doesn't have 4 SIMD units. It has more like 2, whereas a
Barcelona core has 3. So the Core 2 can retire at most 2 SIMD
instructions per cycle.

That gives the Core 2 a theoretical 4 DP operations per cycle,
versus 6 for a Barcelona core.
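
To make the arithmetic concrete, here is a rough sketch using the E5345
figures from Kilian's message quoted below (2.33 GHz, 7.86 GFlops measured
HPL on one core). The function names are just mine, and the model assumes
every SIMD instruction is a 2-wide double precision operation:

```python
def peak_gflops(clock_ghz, simd_per_cycle, lanes_per_instr=2):
    """Theoretical peak: clock * SIMD instructions/cycle * DP lanes."""
    return clock_ghz * simd_per_cycle * lanes_per_instr

def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that was actually measured."""
    return measured_gflops / peak

CLOCK_GHZ = 2.33      # E5345 clock, from the quoted message
MEASURED_HPL = 7.86   # GFlops on one core, from the quoted message

for simd in (2, 3):   # 2 SIMD instr/cycle = 4 DP ops, 3 = 6 DP ops
    peak = peak_gflops(CLOCK_GHZ, simd)
    eff = efficiency(MEASURED_HPL, peak)
    print(f"{simd} SIMD/cycle: peak {peak:.2f} GFlops, efficiency {eff:.0%}")
```

With 2 SIMD instructions per cycle this lands on Kilian's 9.32 GFlops and
roughly 84%; with 3 it lands on 13.98 GFlops and roughly 56%.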

Furthermore, certain instructions, such as multiply, can be executed
by only 1 of the units.

So 'flop maximizing' software is never the same as software that
actually multiplies for you, which is the end goal of most number
crunchers: multiplying either a big number or a matrix.
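
A toy model of why that matters. This is only a sketch under my own
assumptions: 2 SIMD units, multiplies can go to just one of them, every
other SIMD instruction can go to either, and dependencies and latencies are
ignored. The function name is hypothetical:

```python
def simd_per_cycle(n_mul, n_other, units=2):
    """Toy upper bound on SIMD instructions issued per cycle.

    Assumes multiplies run on one unit only and all other SIMD
    instructions can run on any unit; ignores latency and dependencies.
    """
    total = n_mul + n_other
    if total == 0:
        return 0.0
    # Bounded below by the single multiply unit and by total issue width.
    cycles = max(n_mul, -(-total // units))  # -(-a // b) is ceil(a / b)
    return total / cycles

for n_mul, n_other in ((100, 0), (50, 50), (0, 100)):
    rate = simd_per_cycle(n_mul, n_other)
    print(f"{n_mul} mul, {n_other} other: {rate * 2:.1f} DP ops/cycle")
```

In this model a pure multiply stream reaches only half of the 4 DP
operations per cycle, while an even mix reaches all of it.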

Processors have become very complex in the past few years. So many
factors influence IPC that, without help from the manufacturer in
question, it's impossible to write the fastest possible code for a
processor in assembler.



On Nov 9, 2007, at 7:11 PM, Kilian CAVALOTTI wrote:

> Hi Vincent,
>
> On Friday 09 November 2007 09:21:41 am Vincent Diepeveen wrote:
>> OK, an easy theoretical calculation, and it's still very crude of course:
>>
>> 1 core 2.4 GHz * 3 instructions a cycle * (sse)2 = 7.2 * 2 = 14.4 Gflop
>
> Could you please elaborate a little bit about the "3 instructions a
> cycle * (sse)2" part? I thought Intel's quad-core line was able to do 4
> ops/cycle, but you're talking about 6, which makes a huge difference in
> the final numbers.
>
> For instance, theoretical performance using one core of an E5345
> (Clovertown 2.33GHz) would be 9.32 GFlops considering 4 ops/cycle, and
> 13.98 GFlops if we consider 6 ops/cycle. Given that the actual measured
> HPL performance on one core is about 7.86 GFlops, that would give an
> efficiency of a mere 56%, which would suck badly (an 84% efficiency is
> more in line with reasonable values to me, for one single core).
>
> But I may have missed something.
>
> Cheers,
> -- 
> Kilian
>



