[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Wed Jul 13 11:52:42 PDT 2005

Within one node there is no need to use MPI; that is just unnecessary memory
overhead.

Processor affinity is handled automatically when you run a NUMA kernel. More
important for the software than processor affinity is that each processor
works on memory attached to its own memory controller and does not work on
data behind remote memory controllers.
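
As a rough illustration of keeping a process and its memory on the same node,
the sketch below only builds a numactl command line and does not run it. The
--cpunodebind and --membind options are taken from current numactl releases;
older versions such as the 0.6.4 package mentioned in this thread may spell
them differently. The program name ./my_solver is a hypothetical placeholder.

```python
def numactl_command(node, program_args):
    """Build an argv list that runs program_args bound to one NUMA node.

    Both the CPUs and the memory allocations are restricted to `node`,
    so the process should not touch a remote memory controller.
    """
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *program_args]


# ./my_solver is a made-up program name used only for illustration.
print(" ".join(numactl_command(0, ["./my_solver"])))
```

The resulting list could be passed to subprocess.run on a machine where
numactl is installed.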

Vincent

At 12:56 PM 7/13/2005 -0400, Joe Landman wrote:
>Hi Mikhail:
>
>   If you use numactl, you should have control over processor affinity 
>for a particular process.  I am not sure how this ties in to MPI though, 
>so there may need to be some work there.
>
>Joe
>
>Mikhail Kuzminsky wrote:
>> In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul 
>> 2005 12:24:27 +0200):
>> 
>>>  1) Gerry Creager wrote "Hoowa!"
>>>     Since the results seem useful, I would like to add the following.
>>>     On dual-CPU boards with Athlon32 CPUs, the program "bolam" was slow
>>>     if both CPUs on the board were used; it was better to have one MPICH
>>>     process per compute node.  This problem did not appear in another
>>>     cluster that had Opteron dual-CPU boards (single-core), that is, two
>>>     processes for each node did not cause a slowdown.  This is an
>>>     indication that "bolam" is at a threshold for memory access being a
>>>     bottleneck.
>> 
>> The original post by S.Gorelsky (re-sent by E.Leitl) was about the good
>> scalability of a 4-core/dual-CPU Opteron 275 server on the Gaussian 03
>> DFT/test397 test. I am currently testing a similar Supermicro server
>> w/2*Opteron 275, but w/DDR333 instead of the DDR400 used by S.Gorelsky.
>> I used SuSE 9.0 w/2.4.21 kernel.
>> 
>> As I understand it, the original results of S.Gorelsky were probably
>> obtained with shared memory parallelization! If I use G03 w/Linda (which
>> is the main parallelization tool for G03; parallelization in the shared
>> memory model of G03 is available only for a more restricted subset
>> of quantum-chemical methods), then the results are much worse.
>> 
>> On 4 cores I obtained a speedup of only 2.95 for Linda vs 3.6 for
>> shared memory. The difference is, as I understand it, simply because
>> of data exchanges through RAM in the Linda case; in the shared memory
>> model such memory traffic is absent.
>> FYI: the speedup obtained by S.Gorelsky for 4 CPUs is 3.4 (I hope that
>> I calculated it properly :-)).
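
To put the three speedups quoted above on a common scale, the following small
sketch converts each one to parallel efficiency (speedup divided by the number
of cores). The function name is made up for illustration; the numbers are the
ones given in the message.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved on `cores` cores."""
    return speedup / cores


# Speedups on 4 cores as quoted in the message above.
for label, speedup in [("Linda", 2.95), ("shared memory", 3.6), ("S.Gorelsky", 3.4)]:
    eff = parallel_efficiency(speedup, 4)
    print(f"{label}: speedup {speedup}, efficiency {eff:.0%}")
```

On these figures Linda reaches roughly three quarters of linear scaling, while
the two shared memory results sit at 85% or better.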
>> 
>> I also obtained similar results for other quantum-chemical methods, which
>> show that using Linda/G03 may give bad scalability on
>> dual-core Opteron.
>> We also have a quantum-chemical application (which we are developing) that
>> is bandwidth-limited under parallelization, and using 1 CPU (1 MPI
>> process) per dual Xeon node with Myrinet/MPICH is strongly preferred. In
>> the case of Opteron nodes with dual single-core CPUs the situation is better.
>> 
>> But now, for Opteron nodes with 4 cores on 2 CPUs, forcing the use of
>> only 2 cores (of the 4), one on each chip, will require
>> CPU affinity support in Linux.
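
A minimal sketch of that "one core per chip" selection follows. It assumes a
hypothetical numbering in which cores 0-1 sit on chip 0 and cores 2-3 on
chip 1; the real mapping depends on the BIOS and kernel and has to be checked
on the machine. The helper names are invented for this example.
os.sched_setaffinity is available in Python on Linux only, so the pinning
helper does nothing elsewhere.

```python
import os


def one_core_per_socket(core_to_socket):
    """Return the lowest-numbered core of each socket, sorted."""
    chosen = {}
    for core, socket in sorted(core_to_socket.items()):
        # Keep only the first (lowest-numbered) core seen for each socket.
        chosen.setdefault(socket, core)
    return sorted(chosen.values())


def pin_current_process(cores):
    """Restrict the calling process to the given cores (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cores))


# Hypothetical numbering: cores 0-1 on chip 0, cores 2-3 on chip 1.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(one_core_per_socket(topology))
```

Each of the two processes would then call pin_current_process with a single
core from that list, so that the two chips each run one process.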
>> 
>> Yours
>> Mikhail
>> 
>>> A complication for this
>>>     interpretation is that the Athlon32 nodes use Linux kernel 2.4.21.
>>>  2) Mikhail Kuzminsky asked: do you have "node interleave memory"
>>>     switched off?
>>>     Reading the BIOS:
>>>     Bank interleaving "Auto"; there are two memory modules per CPU, so
>>>        there should be bank interleaving.
>>>     Node interleaving "Disable"
>>>  3) In an email Guy Coates asked
>>>     > Did you need to use numa-tools to specify the CPU placement, or
>>>     > did the kernel "do the right thing" by itself?
>>>     The kernel did the right thing by itself.
>>>     I have a question: what are numa-tools?
>>>     On the computer I find
>>>     man -k numa
>>>        numa   (3)  - NUMA policy library
>>>        numactl(8)  - Control NUMA policy for processes or shared memory
>>>     rpm -qa | grep -i numa
>>>        numactl-0.6.4-1.13
>>>     Is numactl the "numa-tools"?  Is there another package to consider
>>>     installing?
>>>     I see that numactl has many "man" pages.
>>>
>>> Reference, previous message:
>>> >In all cases, 4 MPI processes on a machine with 4 cores (two
>>> >dual-core CPUs).
>>> >Meteorology program 1, "bolam"    CPU time, real time (in seconds)
>>> >      Linux kernel 2.6.9-11.ELsmp     122        128
>>> >      Linux kernel 2.6.12.2            64         77
>>> >
>>> >Meteorology program 2, "non-hydrostatic"
>>> >      Linux kernel 2.6.9-11.ELsmp     598        544
>>> >      Linux kernel 2.6.12.2           430        476
>>>
>>>
>>> -- 
>>>
>>>  Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
>>>  Center for Advanced Studies, Research, and Development in Sardinia
>>>
>>>  Postal Address:               |  Physical Address for FedEx, UPS, DHL:
>>>  ---------------               | -------------------------------------
>>>  Alan Scheinine                |  Alan Scheinine
>>>  c/o CRS4                      |  c/o CRS4
>>>  C.P. n. 25                    |  Loc. Pixina Manna Edificio 1
>>>  09010 Pula (Cagliari), Italy  |  09010 Pula (Cagliari), Italy
>>>
>>>  Email: scheinin at crs4.it
>>>
>>>  Phone: 070 9250 238  [+39 070 9250 238]
>>>  Fax:   070 9250 216 or 220  [+39 070 9250 216 or +39 070 9250 220]
>>>  Operator at reception: 070 9250 1  [+39 070 9250 1]
>>>  Mobile phone: 347 7990472  [+39 347 7990472]
>>>
>> 
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf