[Beowulf] General thoughts on Xeon 56xx versus E5 series?

Vincent Diepeveen diep at xs4all.nl
Mon Sep 17 03:54:46 PDT 2012


Let me email you a latency test that uses all cores at the same time.

All those claims based on using just one core are not so relevant for HPC;
if a single core were enough, we wouldn't need multicore CPUs.

In a perfect world you would be right; regrettably, that is not how software
usually works. It usually hits a hundred other problems where a higher-clocked
CPU has a major advantage over a lower-clocked one.

The L3 cache has an important function, yet for most software it is also a
very big bottleneck on the path to memory, because it adds considerable latency.

I'll mail you my test. Most memory bandwidth tests only use one core; this one
measures the latency of 8-byte reads on n cores at the same time.

I typically take a gigabyte or so and spread it over the different cores for
the benchmark.
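
To give an idea of what the test does, here is a rough sketch in C with
pthreads. It is not my actual code; the thread count, slice size, and read
count below are just placeholders you would tune per machine:

/*
 * Rough sketch of the idea only, not my actual test code.
 * Each thread gets its own slice of the buffer and walks it with
 * dependent 8-byte loads, so every read has to wait for the previous
 * one -- that measures latency, not bandwidth.
 * Build: gcc -O2 -pthread latency.c -o latency
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NTHREADS 8                       /* placeholder: one thread per core */
#define SLICE    (128UL * 1024 * 1024)   /* bytes per thread, ~1 GB total at 8 threads */
#define NREADS   (20 * 1000 * 1000L)     /* dependent reads per thread */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static uint64_t xorshift64(uint64_t *s)  /* small thread-local RNG */
{
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

static void *worker(void *arg)
{
    uint64_t  seed = (uint64_t)(uintptr_t)arg * 0x9E3779B97F4A7C15ULL + 0x12345ULL;
    size_t    n    = SLICE / sizeof(uint64_t);
    uint64_t *buf  = malloc(n * sizeof(uint64_t));
    if (!buf) { perror("malloc"); return NULL; }

    /* Sattolo's algorithm: turn the identity array into one single cycle,
     * so the walk really covers the whole slice instead of spinning
     * inside a small, cache-resident sub-cycle. */
    for (size_t i = 0; i < n; i++)
        buf[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t   j = (size_t)(xorshift64(&seed) % i);
        uint64_t t = buf[i]; buf[i] = buf[j]; buf[j] = t;
    }

    double   t0  = now_sec();
    uint64_t idx = 0;
    for (long r = 0; r < NREADS; r++)
        idx = buf[idx];                  /* the 8-byte dependent read */
    double ns = (now_sec() - t0) * 1e9 / (double)NREADS;

    /* printing idx keeps the compiler from throwing the loop away */
    printf("thread %ld: %.1f ns per read (end index %llu)\n",
           (long)(uintptr_t)arg, ns, (unsigned long long)idx);
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t th[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, (void *)(uintptr_t)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(&th[i], NULL);
    return 0;
}

On a real run you would also pin each thread to its own core (e.g. with
pthread_setaffinity_np) and watch NUMA placement, otherwise the numbers
jump around.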

You typically see that the latency of the old Core 2 is pretty decent, around
60 ns. Two-socket i7 machines are considerably slower there; with
higher-frequency CPUs (3.4 GHz Xeons, 12 cores in total, hyperthreading turned
off of course), latencies typically improve to around 90 ns.

The fastest single-socket i7s get to around 70 ns.

Could you run this test on differently clocked CPUs and tell me your
conclusions?

This test was designed to measure latency on shared-memory supercomputers with
all cores running at the same time. We typically see, also when you increase
the number of bytes read, that manufacturers have designed a hundred tricks to
fool all the single-core latency tests. When using all cores at the same time,
which is realistic for many software loads, latencies are suddenly easily up
to a factor of 12 worse than the manufacturers claim (the SGI Origin and Altix
series being one example).

Bandwidth gets totally overruled by other concerns.

As for the single socket machines:

You can see how AMD's Bulldozer with 4 modules completely falls apart on this test.

Its latency is already very bad when running the test on 4 cores, but when
using all 8 mini-cores at the same time, latencies suddenly climb towards
160-200 ns.

That is more than a factor of 2.5 worse than Intel.

We are talking here about the highest-clocked production CPUs.

You also typically see that latencies get worse as the memory buffer grows.
In HPC, and especially for software that works the way this test simulates,
one typically uses the maximum amount of RAM available, so using many
gigabytes is not a theoretical example.

This is a practical test that simulates very accurately how things work in,
for example, game tree search.

Typically older hardware has more problems there than newer hardware.

Paul Hsieh later tried to redo this test using just pointer arithmetic, but it
was never coded up for more than one core.
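
The single-core pointer version looks roughly like the sketch below. This is
my reconstruction of the idea, not Hsieh's code: every element stores the
address of the next element, so the inner loop is one dependent load with no
index arithmetic at all.

/*
 * Single-core pointer-chase sketch -- my reconstruction of the idea,
 * not Paul Hsieh's actual code.  Every element stores the address of
 * the next element, so the inner loop is one dependent load with no
 * index arithmetic.  Build: gcc -O2 chase.c -o chase
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NELEM  (16UL * 1024 * 1024)      /* 128 MB of pointers on a 64-bit machine */
#define NREADS (20 * 1000 * 1000L)

static uint64_t xorshift64(uint64_t *s)
{
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

int main(void)
{
    uint64_t seed = 0x12345;
    void   **buf  = malloc(NELEM * sizeof(void *));
    if (!buf) { perror("malloc"); return 1; }

    /* self-pointers first, then a Sattolo shuffle links them into one big cycle */
    for (size_t i = 0; i < NELEM; i++)
        buf[i] = &buf[i];
    for (size_t i = NELEM - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i);
        void  *t = buf[i]; buf[i] = buf[j]; buf[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    void **p = buf;
    for (long r = 0; r < NREADS; r++)
        p = (void **)*p;                 /* the dependent load */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                 (t1.tv_nsec - t0.tv_nsec)) / (double)NREADS;
    printf("%.1f ns per read (end %p)\n", ns, (void *)p);
    free(buf);
    return 0;
}

Either formulation works; the point of my test is simply to run it on all
cores at the same time.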

In every single result, higher-clocked CPUs tend to do better.



On Sep 14, 2012, at 5:16 PM, Steffen Persvold wrote:

> Vincent,
>
> Your statement only holds true for the cache bandwidth (which somewhat
> scales with the core frequency), not the DDR3 memory controller
> bandwidth (or latency for that matter). The main limiting factor for the
> DDR3 memory bandwidth is the # of channels (i.e. how much data you can
> get in parallel) and how fast the DRAM is (i.e. the frequency the DDR3
> interface runs at).
>
> cheers,
> --Steffen
>
> On 9/14/2012 17:08, Vincent Diepeveen wrote:
>> Yes,
>>
>> You can easily see this in the latency numbers of higher-clocked
>> processors: they're faster than lower-clocked i7s of the same kind.
>>
>> Let me email directly to you a test i wrote for that some years ago.
>>
>>
>>
>>
>>
>> On Sep 14, 2012, at 5:04 PM, Orion Poplawski wrote:
>>
>>> On 09/14/2012 08:54 AM, Vincent Diepeveen wrote:
>>>> The memory controller is on die, so the bandwidth that the CPU
>>>> itself delivers, independent of the number of channels, is
>>>> dependent on the CPU frequency.
>>>>
>>>> Higher frequency means more bandwidth simply with the given memory
>>>> channels
>>>> available.
>>>>
>>>
>>> Really?
>>>
>>> http://ark.intel.com/compare/64590,64591,64587
>>>
>>> Clock Speed		2 GHz		2.5 GHz		3.3 GHz
>>> Max Turbo Frequency	2.8 GHz		3 GHz		3.5 GHz
>>> # of Memory Channels	4		4		4
>>> Max Memory Bandwidth	51.2 GB/s	42.6 GB/s	51.2 GB/s
>>>
>>>>
>>>> On Sep 14, 2012, at 4:41 PM, Orion Poplawski wrote:
>>>>
>>>>> On 09/14/2012 05:00 AM, Igor Kozin wrote:
>>>>>> if memory bandwidth is your concern then there are models which
>>>>>> boost
>>>>>> it quite significantly. e.g.
>>>>>> http://ark.intel.com/products/64584/Intel-Xeon-Processor-
>>>>>> E5-2660-20M-Cache-2_20-GHz-8_00-GTs-Intel-QPI
>>>>>>
>>>>>> probably very few codes are going to benefit from AVX without  
>>>>>> extra
>>>>>> efforts but BW is a clear win.
>>>>>> i'm seeing a good speed up on some applications which can be
>>>>>> attributed to higher BW.
>>>>>
>>>>> There are 6.4, 7.2, and 8 GT/s chips
>>>>>
>>>>> This is an interesting puzzle at the mid-tier price point:
>>>>>
>>>>> DUAL INTEL XEON 6C E5-2640 (2.5GHz/7.2GT/s/15MB) CPU  [+  
>>>>> $1,810.00]
>>>>> DUAL INTEL XEON 4C E5-2643 (3.3GHz/8GT/s/10MB) CPU  [+ $1,798.00]
>>>>> DUAL INTEL XEON 8C E5-2650 (2GHz/8GT/s/20MB) CPU  [+ $2,270.00]
>>>>>
>>>>> So for BW limited one would go with the second two, but you have
>>>>> a big choice
>>>>> between low cores/cache high MHz and high cores/cache low MHz.
>>>>>
>>>>> --
>>>>> Orion Poplawski
>>>>> Technical Manager                     303-415-9701 x222
>>>>> NWRA, Boulder Office                  FAX: 303-415-9702
>>>>> 3380 Mitchell Lane                       orion at nwra.com
>>>>> Boulder, CO 80301                   http://www.nwra.com
>>>
>>>
>>> --
>>> Orion Poplawski
>>> Technical Manager                     303-415-9701 x222
>>> NWRA, Boulder Office                  FAX: 303-415-9702
>>> 3380 Mitchell Lane                       orion at nwra.com
>>> Boulder, CO 80301                   http://www.nwra.com
>>
>>
>
>
> -- 
> Steffen Persvold, Chief Architect NumaChip
> Numascale AS - www.numascale.com
> Tel: +47 92 49 25 54 Skype: spersvold



