[Beowulf] Re: dual core (latency)
Joe Landman landman at scalableinformatics.com
Tue Jul 19 03:30:08 PDT 2005
On Tue, 19 Jul 2005 06:42:02 +0200, Vincent Diepeveen wrote
> At 11:05 AM 7/19/2005 +0800, Stuart Midgley wrote:
> >The first thing to note is that as you add cpu's the cost of the
> >cache snooping goes up dramatically. The latency of a 4 cpu (single
> >core) opteron system is (if my memory serves me correctly) around
> >120ns. Which is significantly higher than the latency of a dual
> >processor system (I think it scales roughly as O(n^2) where n is the
> >number of cpu's).
> >
> >Now, with a dual core system, you are effectively halving the
> >bandwidth/cpu over the hyper transport AND increasing the cpu count,
> >thus increasing the amount of cache snooping required. The end
> >result is drastically blown-out latencies.
> >
> >Stu.
>
> This doesn't answer even remotely accurate things.

Actually it was a very well written and quite accurate discussion of what you were seeing.

> A) my test is doing no WRITES, just READS.

Doesn't matter, unless you turn off all cache effects on the memory you are dealing with. A memory write is a read-modify-write operation, and a memory read is a read operation. You still require that initial "snoop" to grab the cache line. You basically ask all the other processors that have the potential of sharing that cache line to look into which lines they have in cache, and if they have the line in question, please flush that line if it is dirty (e.g. a pending but uncommitted write exists). Otherwise, please hand over the cache line with all due speed.

It's not "complex" with 2 CPUs, just a little costly. It gets complex and time consuming with 4. At 4 and higher it is one of the issues you take into consideration when optimizing code.

This is also why processor affinity is so important, as you can (to a degree) pre-bias where the pages (and hence cache lines) are sitting relative to the CPU, and tie the memory and processor together.
This increases the likelihood of the line being local, as well as potentially decreasing the likelihood of the line being needed remotely.

> B) snooping might be for free.

Absolutely not.

> C) all other cores are just idle when such a latency test for just 1
> core happens and the rest of the system is idle.

The only way you can guarantee that the other cores are "idle" is to turn them off.

> D) in all cases a
> dual core processor has a SLOWER latency and it doesn't make sense.

Makes a great deal of sense as Stuart has pointed out. Your snooping algorithm is somewhat better than O(N**2) on a system with a directory. Without a directory it is closer to O(N**2). The more snooping you need to do before getting a cache line, the more latency you pay to get that initial cache line. A directory based system is effectively a hash table.

> E) you don't seem to grasp the difference between LATENCY and BANDWIDTH;

Hmmmm. I think Stuart gets it very well. I am not convinced that you get the issue of how important and expensive cache line processing via snoopy algorithms is, and what its impact upon overall processing time is.

Joe Landman
--
Scalable Informatics LLC
http://www.scalableinformatics.com
phone: +1 734 786 8423
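
The processor-affinity point above can be tried out on Linux with something like the following. This is only a minimal sketch, not anything from the original test: `pin_to_cpu` is a hypothetical helper name, it relies on Python's Linux-only `os.sched_getaffinity` / `os.sched_setaffinity`, and it assumes the kernel's usual default of allocating pages on the node of the CPU that first touches them.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the calling process to a single CPU (Linux only).

    `pin_to_cpu` is an illustrative helper, not part of any benchmark
    discussed in this thread. Returns the resulting affinity set.
    """
    allowed = os.sched_getaffinity(0)  # CPUs this process may run on now
    if cpu is None:
        cpu = min(allowed)             # arbitrary but deterministic choice
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})     # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    after = pin_to_cpu()
    print("affinity before:", sorted(before))
    print("affinity after: ", sorted(after))
    # Memory first touched from here on is, under the assumed default
    # policy, likely to be placed close to the CPU we are pinned to.
    buf = bytearray(64 * 1024 * 1024)
    os.sched_setaffinity(0, before)    # restore the original mask
```

Pinning before the buffer is allocated and touched is the part that matters for a latency test; pinning afterwards leaves the pages wherever they were first placed.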
