[Beowulf] Re: dual core (latency)

Mikhail Kuzminsky kus at free.net
Tue Jul 19 05:17:34 PDT 2005


In message from Stuart Midgley <sdm900 at gmail.com> (Tue, 19 Jul 2005 
11:05:44 +0800):
>The first thing to note is that as you add cpu's the cost of the cache 
>snooping goes up dramatically.  The latency of a 4 cpu (single core) 
>opteron system is (if my memory serves me correctly) around 120ns.
AFAIK, cache coherence for the dual-core Athlon64 is resolved using the
SRQ/SRI, i.e. it doesn't involve the switch.

Opteron, I believe, has the same capability (if I remember correctly,
I asked this question on comp.arch, but the answer was "probably yes",
not 100% yes :-)). Then, theoretically, "2-level cache snooping" may be
implemented: the answer to a broadcast request from the 2nd core of 
the same chip may be returned using the SRI, while the other cores send
their answers through the switch. 

If such a scheme is implemented, cache snoop traffic through the switch
decreases for 4 cores/2 CPUs in comparison with 4 single-core CPUs (to
about 2/3 of the usual, since 1/3 of the coherence traffic is absent
from the switch), but the available throughput to the switch is only
half of the usual for 4 single-core CPUs.

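Relative to the available switch throughput, this gives an increase in
cache snoop traffic of 30+% ((2/3)/(1/2) = 4/3), which looks like the
30%+ figure reported by Vincent. It would be interesting to know whether
my estimate is really right or whether I'm wrong somewhere.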
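
The arithmetic behind this estimate can be sketched in a few lines of
Python. This is only a toy model under stated assumptions, not a
measurement: every core answers every broadcast probe, answers from a
core on the same chip return via the SRI and never reach the switch, and
the aggregate switch throughput scales with the number of chips. The
function name switch_load and its parameters are hypothetical.

```python
from fractions import Fraction

def switch_load(chips, cores_per_chip):
    """Snoop answers crossing the switch, per unit of switch throughput.

    Toy model (assumptions, not AMD documentation): same-chip answers
    use the SRI; throughput is proportional to the number of chips.
    """
    cores = chips * cores_per_chip
    answers = cores - 1                  # every other core answers a probe
    via_sri = cores_per_chip - 1         # answers that stay on the chip
    via_switch = answers - via_sri       # answers that cross the switch
    traffic = Fraction(via_switch, answers)
    throughput = Fraction(chips)         # assumed proportional to chip count
    return traffic / throughput

single = switch_load(chips=4, cores_per_chip=1)  # 4 single-core CPUs
dual = switch_load(chips=2, cores_per_chip=2)    # 2 dual-core CPUs
increase = dual / single - 1                     # (2/3)/(1/2) - 1

print(f"relative increase in switch snoop load: {float(increase):.0%}")
```

With these assumptions the ratio is (2/3)/(1/2) = 4/3, i.e. about one
third more snoop traffic per unit of switch throughput. Whether real
hardware behaves this way depends on whether the 2-level scheme is
actually implemented.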

Yours
Mikhail

   

>Which is significantly higher than the latency of a dual 
>processor system (I think it scales roughly as O(n^2) where n is the 
>number of cpu's).
>
>Now, with a dual core system, you are effectively halving the 
>bandwidth/cpu over the hyper transport AND increasing the cpu count, 
>thus increasing the amount of cache snooping required.  The end 
>result is drastically blown-out latencies.
>
>Stu.
>
>
>On 19/07/2005, at 10:50, Vincent Diepeveen wrote:
>
>> Hello Stuart,
>>
>> Thanks for your answer regarding numactl tools.
>>
>> Your answer doesn't necessarily explain why the dual core latency
>> (with or without numactl) is far worse, yes 30%+ worse, than that of
>> single cpu opterons of the same speed, when benchmarking just 1 core
>> (so the others sitting idle).
>>
>> Any thoughts on that?
>>
>> Thanks,
>> Vincent
>>
>
>
>--
>Dr Stuart Midgley
>Industry Uptake Program Leader
>iVEC, 'The hub of advanced computing in Western Australia'
>26 Dick Perry Avenue, Technology Park
>Kensington WA 6151
>Australia
>
>Phone: +61 8 6436 8545
>Fax: +61 8 6436 8555
>Email: industry at ivec.org
>WWW:  http://www.ivec.org
>
>
>
