[Beowulf] Re: dual core (latency)
Mikhail Kuzminsky kus at free.net
Tue Jul 19 05:17:34 PDT 2005
In message from Stuart Midgley <sdm900 at gmail.com> (Tue, 19 Jul 2005 11:05:44 +0800):
>The first thing to note is that as you add cpu's the cost of the cache
>snooping goes up dramatically. The latency of a 4 cpu (single core)
>opteron system is (if my memory serves me correctly) around 120ns.

AFAIK, cache coherence within a dual-core Athlon64 is resolved using the
SRQ/SRI, i.e. it doesn't involve the switch. The Opteron, I believe, has
the same capability (if I remember correctly, I asked this question on
comp.arch, but the answer was "probably yes", not 100% yes :-)).

Then, theoretically, "2-level cache snooping" could be implemented: the
reply to a broadcast request from the 2nd core of the same chip may be
returned over the SRI, while the other cores send their replies through
the switch.

If such a scheme is implemented, cache snoop traffic through the switch
decreases for 4 cores/2 CPUs in comparison with 4 single-core CPUs (to
about 2/3 of the usual, since 1/3 of the coherence traffic is absent from
the switch), but the throughput to the switch available to each core is
only half of that for 4 single-core CPUs. Relative to the available
throughput, this gives an increase in cache snoop traffic of 30+%, which
looks like the figure Vincent reported.

Interesting: is my estimate really right, or am I wrong somewhere?

Yours
Mikhail

>Which is significantly higher than the latency of a dual
>processor system (I think it scales roughly as O(n^2) where n is the
>number of cpu's).
>
>Now, with a dual core system, you are effectively halving the
>bandwidth/cpu over the hyper transport AND increasing the cpu count,
>thus increasing the amount of cache snooping required. The end
>result is drastically blown-out latencies.
>
>Stu.
>
>
>On 19/07/2005, at 10:50, Vincent Diepeveen wrote:
>
>> Hello Stuart,
>>
>> Thanks for your answer regarding numactl tools.
>>
>> Your answer doesn't necessarily explain why the dual core latency
>> (with or without numactl) is far worse, yes 30%+ worse, than that
>> of single cpu opterons of the same speed, when benchmarking just
>> 1 core (so the others sitting idle).
>>
>> Any thoughts on that?
>>
>> Thanks,
>> Vincent
>>
>
>
>--
>Dr Stuart Midgley
>Industry Uptake Program Leader
>iVEC, 'The hub of advanced computing in Western Australia'
>26 Dick Perry Avenue, Technology Park
>Kensington WA 6151
>Australia
>
>Phone: +61 8 6436 8545
>Fax: +61 8 6436 8555
>Email: industry at ivec.org
>WWW: http://www.ivec.org
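The 30+% estimate in the message can be checked with a few lines of arithmetic. The sketch below is only an illustration of that back-of-envelope model, not a measurement or a description of how the hardware actually behaves: the function name `relative_snoop_load` and its parameters are hypothetical, and it assumes that each snoop broadcast draws one reply from every other core, that replies from cores on the same chip stay off the switch, and that the cores of a chip share the chip's throughput to the switch equally.

```python
from fractions import Fraction


def relative_snoop_load(cores_per_chip: int, total_cores: int) -> Fraction:
    """Snoop traffic per unit of available switch throughput, relative to
    a system with the same number of single-core CPUs (hypothetical model)."""
    peers = total_cores - 1            # cores that reply to one broadcast
    on_chip_peers = cores_per_chip - 1  # replies assumed to stay on the SRI
    # Fraction of the replies that still cross the switch.
    traffic_fraction = Fraction(peers - on_chip_peers, peers)
    # Share of the chip's switch throughput left for each core.
    throughput_fraction = Fraction(1, cores_per_chip)
    return traffic_fraction / throughput_fraction


load = relative_snoop_load(cores_per_chip=2, total_cores=4)
print(f"relative load: {load}, increase: {float(load - 1):.0%}")
```

For 4 cores on 2 dual-core chips this yields (2/3) / (1/2) = 4/3, i.e. roughly a one-third increase, in line with the "30+%" figure in the message. Under the same assumptions, 4 single-core CPUs give a relative load of exactly 1.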
