[Beowulf] Re: dual core (latency)
Stuart Midgley sdm900 at gmail.com
Mon Jul 18 22:22:09 PDT 2005
I like your email style :)

a) Reading doesn't prevent snooping; it causes it. You need to snoop all the caches to make sure the cache line isn't on some other CPU before you go to main memory.

b) Nothing is free. Cache snooping costs a lot (even more advanced methods like page caches, see SGI Altix systems, cost a lot).

c) Cores being idle has absolutely nothing to do with cache snooping (unless you have to flush from a higher-level cache or register). A CPU doesn't know a priori that another CPU has no process on it, or that it isn't holding an old cache line.

d) I would expect dual cores to have a larger latency, as per my previous argument.

e) I guess this is an interesting point. Actually, you would be surprised how MUCH bandwidth and latency have to do with each other in computers. They are VERY tightly coupled.

For example, say you have a CPU with dual-channel PC3200 (DDR-400) memory attached. So you think your bandwidth is 6.4GB/s... then why does STREAM show a maximum of around 3-4GB/s? Where did the other ~2.5GB/s go?

Now look at the actual bandwidth of loading a single cache line. A cache line is 128 bytes, which can be accessed at 6.4GB/s, so it takes 128/(6.4 × 1024^3) s ≈ 18.6ns to transfer. Add the ~125ns latency and you get the 128-byte cache line in about 143ns, which gives a bandwidth of about 0.83GB/s.

Given that the Pentium can have 4 outstanding cache load misses, you can in effect overlap 4 operations and cut the latency to a quarter (about 31ns), so each 128-byte cache line costs around 50ns, giving around 2.4GB/s.

Now take into account all the other factors: some memory is already in fast caches; you can't quite cut the latency to a quarter; the 4 operations don't quite happen simultaneously because of the 18ns it takes to transfer the data; etc. The end result is that latency has a MASSIVE impact on real bandwidth.

Stu.

> This doesn't answer anything even remotely accurately.
>
> A) My test is doing no WRITES, just READS.
> B) Snooping might be for free.
> C) All other cores are just idle when such a latency test for just 1 core
> happens, and the rest of the system is idle.
> D) In all cases a dual core processor has a SLOWER latency, and it doesn't
> make sense.
> E) You don't seem to grasp the difference between LATENCY and BANDWIDTH.
>
> For example, your BANDWIDTH to Mars might be GREAT, but your LATENCY to Mars
> is really ugly, as it takes 200 years for them to return.
>
> You keep mixing latency and bandwidth. That's ugly, to put it politely.
>
> I'm speaking of LATENCY here, not bandwidth.
>
> To be correct, the total BANDWIDTH that my program uses on a dual core is:
>
> 8 bytes / 147 ns = 8 * 10^9 / 147 bytes/s = 54MB/s
>
> In fact, with some luck your gigabit ethernet card might be able to handle
> 54MB/s.
>
> Vincent

--
Dr Stuart Midgley
sdm900 at gmail.com
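The cache-line arithmetic in the reply, and Vincent's 54MB/s figure, can be reproduced with a small back-of-envelope model. This is a minimal sketch, assuming (as the reply does) that each miss costs the full latency plus the line transfer time and that up to N outstanding misses overlap perfectly. The function names are hypothetical; the 128-byte line, 6.4GB/s peak, 125ns and 147ns latencies are the figures quoted in the thread, not measurements.

```python
# Back-of-envelope model of latency-bound memory bandwidth (a sketch, not a
# measurement). Assumptions: each cache miss costs the full latency plus the
# time to transfer the line, and up to `outstanding` misses overlap perfectly.

GIB = 1024 ** 3  # the reply divides by 1024^3, so "GB" here means GiB


def transfer_ns(line_bytes: int, peak_gib_s: float) -> float:
    """Time in ns to move one cache line at the peak channel rate."""
    return line_bytes / (peak_gib_s * GIB) * 1e9


def effective_gib_s(line_bytes: int, peak_gib_s: float,
                    latency_ns: float, outstanding: int = 1) -> float:
    """Bandwidth in GiB/s when overlapped misses share the latency cost."""
    per_line_ns = latency_ns / outstanding + transfer_ns(line_bytes, peak_gib_s)
    return line_bytes / (per_line_ns * 1e-9) / GIB


def dependent_read_mb_s(bytes_per_read: int, latency_ns: float) -> float:
    """Bandwidth in decimal MB/s when each read waits out the full latency."""
    return bytes_per_read / latency_ns * 1e3  # bytes/ns == GB/s; x1000 -> MB/s


if __name__ == "__main__":
    # Figures quoted in the thread: 128-byte lines, 6.4GB/s peak, 125ns latency.
    print(f"line transfer: {transfer_ns(128, 6.4):.1f} ns")
    print(f"1 miss in flight: {effective_gib_s(128, 6.4, 125):.2f} GB/s")
    print(f"4 misses in flight: {effective_gib_s(128, 6.4, 125, 4):.2f} GB/s")
    # Vincent's latency test: one 8-byte read every 147ns.
    print(f"dependent reads: {dependent_read_mb_s(8, 147):.0f} MB/s")
```

Even the perfect-overlap case stays well below the 6.4GB/s peak, which is the reply's point: when latency dominates each line, the number of misses in flight, not raw channel speed, sets the achievable bandwidth. A latency test that issues one 8-byte read at a time, as Vincent's figure implies, gets no such overlap.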
