[Beowulf] Re: dual core (latency)

Mon Jul 18 19:50:09 PDT 2005

Hello Stuart,

Thanks for your answer regarding numactl tools.

Your answer doesn't explain, though, why the dual-core latency (with or
without numactl) is far worse, 30%+ worse, than that of single-core
Opterons of the same speed when benchmarking just 1 core (with the others
sitting idle).

Any thoughts on that?

Thanks,
Vincent
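
For reference, the latency test described in the quoted message below
(dependent 8-byte reads scattered across a large buffer) is not included
in this thread. A minimal sketch of that kind of measurement, assuming a
pointer-chasing loop over one random cycle, could look like the following.
The function names (build_chain, chase, measure_latency_ns) and the
xorshift generator are illustrative assumptions, not the actual test
program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift generator so the sketch does not depend on rand()'s range. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/*
 * Fill buf with one random cycle through `slots` 8-byte entries
 * (Sattolo's algorithm), so every load depends on the previous one
 * and lands at an unpredictable address.
 */
void build_chain(uint64_t *buf, size_t slots, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* xorshift state must be nonzero */
    assert(buf != NULL);
    for (size_t i = 0; i < slots; i++)
        buf[i] = i;
    if (slots < 2)
        return;
    for (size_t i = slots - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i);   /* 0 <= j < i */
        uint64_t tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads; return the final index. */
uint64_t chase(const uint64_t *buf, uint64_t start, size_t steps)
{
    uint64_t idx = start;
    for (size_t s = 0; s < steps; s++)
        idx = buf[idx];
    return idx;
}

/* Average nanoseconds per dependent load over `steps` loads. */
double measure_latency_ns(const uint64_t *buf, size_t steps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile uint64_t sink = chase(buf, 0, steps);   /* keep the loop alive */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec))
           / (double)steps;
}
```

For a 250MB buffer, slots would be 250*1024*1024/8: build the chain once,
then call measure_latency_ns() with a few million steps. Because the chain
is a single cycle, the loads cover the whole buffer before repeating, which
is what defeats both the caches and the TLB.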

At 08:17 AM 7/19/2005 +0800, Stuart Midgley wrote:
>The numactl tools won't generally help latency.  Latency isn't the  
>issue with Opteron based systems (or any system with multiply  
>connected distributed memory controllers).
>
>The real issue is page locality (which is the case with most numa  
>based systems).
>
>If you run 2 processes on a dual-cpu (single-core) system and they
>both happen to allocate their pages on the same memory controller,
>they will each see only 1/2 the memory bandwidth while 1 controller
>sits idle.  That's the real issue (and the extreme pathological case).
>
>Linux 2.6 generally does a good job of putting the pages on the memory
>controller attached to the cpu that the process is running on.  However,
>it can't get it perfect.  There is always more than 1 process/cpu on
>a system, so there is always a little noise and always the
>chance that some pages get spread around.  Also, the system buffer
>cache will get spread around, affecting everyone.
>
>Add into the mix the possibility of suspending processes and you can
>end up with a process's pages all over the place.  Since Linux
>doesn't yet have page migration, once a page is allocated it won't be
>moved to a different memory controller unless it is swapped out.
>
>With numactl tools you will force the pages to be allocated on the
>right memory/cpu.  The process's buffer cache will also be locked
>down (which is another VERY important issue)...
>
>I have used numa tools to double the performance of some codes (or
>perhaps it's more correct to say to get back to the correct performance).
>
>Stu.
>
>
>On 18/07/2005, at 22:38, Vincent Diepeveen wrote:
>
>> I've been toying some with numactl on dual core and it doesn't
>> really seem to help much. It helps 0.00
>>
>> System: Ubuntu on a quad opteron dual core 1.8GHz, 2.6.10-5 smp kernel.
>>
>> Latencies as measured by my own program (TLB thrashing read of 8 bytes,
>> each cpu 250MB buffer):
>>
>> #cpu latency
>> 1   144-147 ns
>> 2   174 ns
>> 4   206 ns
>> 8   234 ns
>>
>> That single cpu figure is pretty ugly bad if I may say so.
>>
>> All kinds of numa calls just didn't help a thing. I've tried for example:
>>
>>   if(numa_available() < 0 ) {
>>     setitnuma = 0;
>>   }
>>   else {
>>     int i,back;
>>     nodemask_t nt,n2,rnm;
>>     maxnodes = numa_max_node()+1; // () returns 3 when 4 controllers
>>     printf("numa=%i maxnodes=%i\n",setitnuma,maxnodes);
>>
>>     nt = numa_get_interleave_mask();
>>     for( i = 0 ; i < maxnodes ; i++ )
>>       printf("node = %i mask = %i\n",i,nodemask_isset(&nt,i));
>>     // nodemask_t.n[] holds bitmask words, not one entry per node,
>>     // so clear the masks with nodemask_zero() instead of n[i] = 0
>>     nodemask_zero(&nt);
>>     nodemask_zero(&n2);
>>     numa_set_interleave_mask(&nt);
>>     nt = numa_get_interleave_mask();
>>     for( i = 0 ; i < maxnodes ; i++ )
>>       printf("checking memory interleave node = %i mask = %i\n",
>>              i,nodemask_isset(&nt,i));
>>
>>     rnm = numa_get_run_node_mask();
>>     // rnm is a struct; print its first mask word rather than the struct
>>     printf("numa get run node mask = %lx\n",rnm.n[0]);
>>     back = numa_run_on_node(0);
>>     if( !back )
>>       printf("set to run on node 0\n");
>>     else
>>       printf("failed to set run on node 0\n");
>>
>>   }
>>
>> Whatever I try, single cpu latency stays at 144-147 ns.
>>
>> A dual opteron dual core with 2.2GHz dual core controllers shows similar
>> latencies. 200 ns for example when running 4 processes with the same
>> testprogram.
>>
>> This single cpu latency behaviour of dual core opteron is ugly bad
>> compared to other dual opterons which are not dual core.
>>
>> Nearly identical Tyan mainboard with dual opteron 2.2GHz gives, single cpu,
>> with SAME kernel and SAME program, 115 ns latency. When turning off ECC on
>> that dual opteron it even gets down to 113 ns.
>>
>> The frustrating thing is, the dual opteron 2.2GHz has pc2700,
>> whereas the quad opteron dual core has all banks filled
>> with pc3200 registered ram, a-brand.
>>
>> Vincent
>
>
>--
>Dr Stuart Midgley
>sdm900 at gmail.com
>
>
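
The page-locality point in the quoted reply (Linux 2.6 normally places a
page on the memory controller of the cpu that first touches it) can be
illustrated without the numactl tools. The sketch below is an assumption
about one way to do it, not something used in this thread: it pins the
process with sched_setaffinity() and then touches every page itself, so
that under the default local-allocation policy the pages should end up on
the local controller. The names pin_to_cpu and alloc_first_touch are
hypothetical helpers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sched_setaffinity and CPU_SET */
#endif
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Pin the calling process to one cpu. Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Allocate `bytes` and write one byte into every page from the calling
 * (already pinned) process. The kernel only backs a page with physical
 * memory on first touch, normally on the node of the cpu doing the write.
 * Returns NULL if the allocation fails.
 */
void *alloc_first_touch(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = malloc(bytes);

    if (p == NULL)
        return NULL;
    if (page <= 0)
        page = 4096;           /* fallback if the page size is unavailable */
    /* volatile writes so the compiler cannot drop or merge the touches */
    for (size_t off = 0; off < bytes; off += (size_t)page)
        p[off] = 0;
    return (void *)p;
}
```

Call pin_to_cpu() before alloc_first_touch(); pinning after the buffer has
been touched does not move pages that are already allocated. Unlike a
memory policy set with the numa tools, this only influences where the first
touch happens and does not lock anything down.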