[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Wed Jul 13 09:31:48 PDT 2005

In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul 
2005 12:24:27 +0200):
>  1) Gerry Creager wrote "Hoowa!"
>     Since the results seem useful, I would like to add the following.
>     On dual-CPU boards with Athlon32 CPUs, the program "bolam" was
>     slow if both CPUs on the board were used, it was better to have
>     one MPICH process per compute node.  This problem did not appear
>     in another cluster that had Opteron dual-CPU boards (single-core),
>     that is, two processes for each node did not cause a slowdown.
>     This is an indication that "bolam" is at a threshold for memory
>     access being a bottleneck.
The original post by S.Gorelsky (re-sent by E.Leitl) was about the good
scalability of a 4-core (2 dual-core CPUs) Opteron 275 server on the
Gaussian 03 DFT/test397 test. I am now testing a similar Supermicro
server w/2*Opteron 275, but w/DDR333 instead of the DDR400 used by
S.Gorelsky. I used SuSE 9.0 w/2.4.21 kernel.

As I understand it, the original results of S.Gorelsky were probably
obtained with shared-memory parallelization! If I use G03 w/Linda (which
is the main parallelization tool for G03; the shared-memory parallel
model of G03 is available only for a more restricted subset of
quantum-chemical methods), then the results are much worse.

On 4 cores I obtained a speedup of only 2.95 for Linda vs 3.6 for
shared memory. The difference is, as I understand, simply because
of data exchanges through RAM in the case of Linda; in the shared-memory
model such memory traffic is absent.
FYI: the speedup obtained by S.Gorelsky for 4 CPUs is 3.4 (I hope that
I calculated it properly :-)).
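
To put the three speedups on a common scale, here is a minimal Python
sketch that converts them to parallel efficiency (speedup divided by
the number of cores). The function name and the labels are only
illustrative; the numbers are the ones quoted in this message.

```python
def parallel_efficiency(speedup, cores):
    """Return the speedup divided by the number of cores used."""
    return speedup / cores


# Speedups on 4 cores as quoted in this message.
speedups = {
    "G03 shared memory (DDR333)": 3.6,
    "G03 w/Linda (DDR333)": 2.95,
    "S.Gorelsky (DDR400)": 3.4,
}

for label, s in speedups.items():
    eff = parallel_efficiency(s, 4)
    print(f"{label}: speedup {s:.2f}, efficiency {eff:.2f}")
```

By this measure the Linda run uses roughly three quarters of the ideal
4-core performance, while both shared-memory runs stay at 0.85 or
above.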

I also obtained similar results for other quantum-chemical methods,
which show that using Linda/G03 may give poor scalability on
dual-core Opteron.

We also have a quantum-chemical application (under development by us)
which is bandwidth-limited when run in parallel, and using 1 CPU (1 MPI
process) per dual-Xeon node with Myrinet/MPICH is strongly preferred.
In the case of Opteron nodes (dual single-core CPUs) the situation is
better.

But now, for Opteron nodes with 4 cores/2 CPUs, forcing the use of
only 2 of the 4 cores, 1 on each chip, will require CPU affinity
support in Linux.
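
As an illustration of what such pinning could look like, here is a
minimal Python sketch. It assumes a kernel that exposes the
sched_setaffinity system call (reached here through Python's
os.sched_setaffinity, Linux only). The core-to-chip layout below is an
assumption, since the real numbering depends on the BIOS and kernel,
and the helper names are hypothetical.

```python
import os


def one_core_per_chip(core_to_chip):
    """Pick the lowest-numbered core on each chip.

    core_to_chip maps a logical CPU number to the chip (socket)
    it belongs to.
    """
    chosen = {}
    for core in sorted(core_to_chip):
        # Keep only the first core seen for each chip.
        chosen.setdefault(core_to_chip[core], core)
    return sorted(chosen.values())


# ASSUMPTION: cores 0,1 sit on chip 0 and cores 2,3 on chip 1.
# Check the real layout (e.g. in /proc/cpuinfo) before relying on this.
ASSUMED_LAYOUT = {0: 0, 1: 0, 2: 1, 3: 1}


def pin_process(rank, layout=ASSUMED_LAYOUT):
    """Pin the calling process to one core, alternating chips by rank."""
    cores = one_core_per_chip(layout)
    target = cores[rank % len(cores)]
    os.sched_setaffinity(0, {target})  # pid 0 means the calling process
    return target
```

With the assumed layout, one_core_per_chip(ASSUMED_LAYOUT) selects
cores 0 and 2, so two MPI processes on a node would land on different
chips. How each process learns its rank is left open here.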

Yours
Mikhail

>     A complication for this interpretation is that the Athlon32
>     nodes use Linux kernel 2.4.21.
>  2) Mikhail Kuzminsky asked "do you have "node interleave memory"
>     switched off?
>     Reading the BIOS:
>     Bank interleaving "Auto", there are two memory modules per CPU
>        so there should be bank interleaving.
>     Node interleaving "Disable"
>  3) In an email Guy Coates asked
>     > Did you need to use numa-tools to specify the CPU placement,
>     > or did the kernel "do the right thing" by itself?
>     The kernel did the right thing by itself.
>     I have a question: what are numa-tools?
>     On the computer I find
>     man -k numa
>        numa   (3)  - NUMA policy library
>        numactl(8)  - Control NUMA policy for processes or shared memory
>     rpm -qa | grep -i numa
>        numactl-0.6.4-1.13
>     Is numactl the "numa-tools"?  Is there another package to
>     consider installing?
>     I see that numactl has many "man" pages.
>
>Reference, previous message:
> >In all cases, 4 MPI processes on a machine with 4 cores (two
> >dual-core CPUs).
> >Meteorology program 1, "bolam"    CPU time, real time (in seconds)
> >      Linux kernel 2.6.9-11.ELsmp     122        128
> >      Linux kernel 2.6.12.2            64         77
> >
> >Meteorology program 2, "non-hydrostatic"
> >      Linux kernel 2.6.9-11.ELsmp     598        544
> >      Linux kernel 2.6.12.2           430        476
>
>
>-- 
>
>  Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
>  Center for Advanced Studies, Research, and Development in Sardinia
>
>  Postal Address:               |  Physical Address for FedEx, UPS, DHL:
>  ---------------               |  -------------------------------------
>  Alan Scheinine                |  Alan Scheinine
>  c/o CRS4                      |  c/o CRS4
>  C.P. n. 25                    |  Loc. Pixina Manna Edificio 1
>  09010 Pula (Cagliari), Italy  |  09010 Pula (Cagliari), Italy
>
>  Email: scheinin at crs4.it
>
>  Phone: 070 9250 238  [+39 070 9250 238]
>  Fax:   070 9250 216 or 220  [+39 070 9250 216 or +39 070 9250 220]
>  Operator at reception: 070 9250 1  [+39 070 9250 1]
>  Mobile phone: 347 7990472  [+39 347 7990472]
>