[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]
kus at free.net
Wed Jul 13 09:31:48 PDT 2005
In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul
2005 12:24:27 +0200):
> 1) Gerry Creager wrote "Hoowa!"
> Since the results seem useful, I would like to add the
> On dual-CPU boards with Athlon32 CPUs, the program "bolam" was
> both CPUs on the board were used, it was better to have one
> per compute node. This problem did not appear in another cluster that had
> Opteron dual-CPU boards (single-core), that is, two processes for each node
> did not cause a slowdown. This is an indication that "bolam" is
> threshold for memory access being a bottleneck.
The original post by S. Gorelsky (re-sent by E. Leitl) was about the good
scalability of a 4-core, dual-CPU Opteron 275 server on the Gaussian 03
DFT/test397 test. I am currently testing a similar Supermicro server
with 2 x Opteron 275, but with DDR333 instead of the DDR400 used by
S. Gorelsky. I used SuSE 9.0 with a 2.4.21 kernel.
As I understand it, S. Gorelsky's original results were obtained with
shared-memory parallelization. If I use G03 with Linda (which is the
main parallelization tool for G03; the shared-memory model of G03 is
available only for a more restricted subset of quantum-chemical
methods), the results are much worse. On 4 cores I obtained a speedup
of only 2.95 with Linda vs 3.6 with shared memory. The difference, as
I understand it, is simply due to the data exchanges through RAM in
the Linda case; in the shared-memory model this memory traffic is
absent.
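To put the two speedups quoted above in the same terms: on 4 cores,
2.95 corresponds to a parallel efficiency of about 74% and 3.6 to
about 90%. A minimal sketch of that arithmetic follows. The Karp-Flatt
estimate of the "experimentally determined serial fraction" is my own
addition for illustration and was not part of the measurements; the
function names are hypothetical.

```python
def efficiency(speedup, cores):
    """Parallel efficiency: measured speedup divided by the number of cores."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Karp-Flatt metric: the apparent serial fraction implied by a speedup.

    e = (1/S - 1/p) / (1 - 1/p), with S the speedup on p cores.
    A larger value means more time lost to serial work or overhead.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Speedups on 4 cores taken from the text above.
for label, s in (("Linda", 2.95), ("shared memory", 3.6)):
    print(f"{label:14s} speedup {s:.2f}  "
          f"efficiency {efficiency(s, 4):.4f}  "
          f"apparent serial fraction {karp_flatt(s, 4):.4f}")
```

The apparent serial fraction lumps together true serial code and any
communication or memory-traffic overhead, so it only says that Linda
loses more, not where the loss comes from.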
FYI: speedup by S.Gorelsky for 4 CPUs is 3.4 (hope that I calculated
I also obtained similar results for other quantum-chemical methods,
which show that using Linda/G03 may give poor scalability for
We also have a quantum-chemical application (under development by us) that
is bandwidth-limited when run in parallel, and using 1 CPU (1 MPI
process) per dual-Xeon node with Myrinet/MPICH is strongly preferred.
In the case of (dual single core CPUs)-Opteron nodes the situation is
But now, with 4 cores on 2 CPUs per Opteron node, forcing the use of
only 2 of the 4 cores, one on each chip, will require
CPU affinity support in Linux.
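As an illustration of what "one core per chip" could look like where
affinity support is available, here is a rough sketch in Python. It
assumes a Linux system that exposes
/sys/devices/system/cpu/cpuN/topology/physical_package_id and a Python
version that provides os.sched_setaffinity; neither applies to the
2.4.21 setup described above. The function names and the example CPU
layout are hypothetical, and real CPU numbering differs from board to
board.

```python
import os


def one_core_per_package(package_of_cpu):
    """Return the lowest-numbered CPU id from each physical package (chip)."""
    first = {}
    for cpu in sorted(package_of_cpu):
        # Keep only the first CPU seen for each package id.
        first.setdefault(package_of_cpu[cpu], cpu)
    return [first[pkg] for pkg in sorted(first)]


def read_topology(sysfs="/sys/devices/system/cpu"):
    """Map CPU id -> physical package id using Linux sysfs (Linux only)."""
    topo = {}
    for name in os.listdir(sysfs):
        if name.startswith("cpu") and name[3:].isdigit():
            path = os.path.join(sysfs, name, "topology", "physical_package_id")
            try:
                with open(path) as f:
                    topo[int(name[3:])] = int(f.read())
            except OSError:
                continue  # offline CPU or no topology information
    return topo


def pin_local_rank(local_rank, package_of_cpu):
    """Pin the calling process to one core, a different chip per local rank."""
    cores = one_core_per_package(package_of_cpu)
    cpu = cores[local_rank % len(cores)]
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return cpu


# Assumed layout for a node with two dual-core chips:
# CPUs 0 and 1 on chip 0, CPUs 2 and 3 on chip 1.
example = {0: 0, 1: 0, 2: 1, 3: 1}
print(one_core_per_package(example))
```

In an MPI job each process would call something like
pin_local_rank(rank_on_this_node, read_topology()) at startup; how the
per-node rank is obtained depends on the MPI implementation and is not
shown here.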
> A complication
> interpretation is that the Athlon32 nodes use Linux kernel
> 2) Mikhail Kuzminsky asked "do you have "node interleave memory"
> Reading the BIOS:
> Bank interleaving "Auto", there are two memory modules per CPU
> should be bank interleaving.
> Node interleaving "Disable"
> 3) In an email Guy Coates asked
> > Did you need to use numa-tools to specify the CPU placement, or did the
> > kernel "do the right thing" by itself?
> The kernel did the right thing by itself.
> I have a question: what are numa-tools?
> On the computer I find
> man -k numa
> numa (3) - NUMA policy library
> numactl(8) - Control NUMA policy for processes or shared
> rpm -qa | grep -i numa
> Is numactl the "numa-tools"? Is there another package to
> I see that numactl has many "man" pages.
>Reference, previous message:
> >In all cases, 4 MPI processes on a machine with 4 cores (two
> >Meteorology program 1, "bolam"      CPU time, real time (in seconds)
> >   Linux kernel 2.6.9-11.ELsmp         122       128
> >   Linux kernel 184.108.40.206          64        77
> >Meteorology program 2, "non-hydrostatic"
> >   Linux kernel 2.6.9-11.ELsmp         598       544
> >   Linux kernel 220.127.116.11         430       476