[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]
landman at scalableinformatics.com
Wed Jul 13 09:56:09 PDT 2005
If you use numactl, you should have control over processor affinity
for a particular process. I am not sure how this ties in with MPI, though,
so some work may be needed there.
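
As a rough sketch of what that work might look like: each MPI process could be started through a small wrapper that prefixes the real command with numactl, binding the process and its memory to one NUMA node. The helper name numactl_command, the local_rank argument (how a launcher exposes the per-host rank varies by MPI implementation), and the two-node layout are assumptions for illustration. --cpunodebind and --membind are the option names in current numactl releases; older versions may spell them differently.

```python
def numactl_command(local_rank, program_args, num_nodes=2):
    """Build an argv list that runs program_args bound to one NUMA node.

    local_rank is the rank of the process within its host (hypothetical
    input; obtain it however your MPI launcher exposes it).
    num_nodes is the number of NUMA nodes per host (assumed to be 2 here,
    one per Opteron socket).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    node = local_rank % num_nodes  # spread local ranks round-robin over nodes
    return [
        "numactl",
        f"--cpunodebind={node}",  # run only on the CPUs of this node
        f"--membind={node}",      # allocate memory only from this node
        "--",                     # end of numactl options
    ] + list(program_args)


if __name__ == "__main__":
    # Print the command line for each of 4 local ranks.
    # "./bolam" is a placeholder for the real executable.
    for rank in range(4):
        print(" ".join(numactl_command(rank, ["./bolam"])))
```

The function only builds the command line; it does not run numactl, so it can be checked on a machine without NUMA support.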
Mikhail Kuzminsky wrote:
> In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul
> 2005 12:24:27 +0200):
>> 1) Gerry Creager wrote "Hoowa!"
>> Since the results seem useful, I would like to add the following.
>> On dual-CPU boards with Athlon32 CPUs, the program "bolam" was slow if
>> both CPUs on the board were used; it was better to have one MPICH process
>> per compute node. This problem did not appear in another cluster that had
>> Opteron dual-CPU boards (single-core); that is, two processes per node
>> did not cause a slowdown. This indicates that "bolam" is at the
>> threshold where memory access becomes a bottleneck.
> The original post by S.Gorelsky (re-sent by E.Leitl) was about the good
> scalability of a 4-core, dual-CPU Opteron 275 server on the Gaussian 03
> DFT/test397 test. I am currently testing a similar Supermicro server
> w/2*Opteron 275, but w/DDR333 instead of the DDR400 used by S.Gorelsky.
> I used SuSE 9.0 w/2.4.21 kernel.
> As I understand it, the original results of S.Gorelsky were probably
> obtained with shared-memory parallelization! If I use G03 w/Linda (which
> is the main parallelization tool for G03; the shared-memory
> model of G03 is available only for a more restricted subset
> of quantum-chemical methods), then the results are much worse.
> On 4 cores I obtained a speedup of only 2.95 for Linda vs 3.6 for
> shared memory. The difference is, as I understand it, simply because
> of data exchanges through RAM in the case of Linda; in the shared-memory
> model such memory traffic is absent.
> FYI: the speedup obtained by S.Gorelsky for 4 CPUs is 3.4 (I hope that
> I calculated it properly :-)).
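
For reference, the speedup figures quoted above can be turned into parallel efficiency (speedup divided by the number of cores). This is a minimal sketch: the function names are illustrative, and only the three speedups reported in this thread are used as inputs.

```python
def speedup(t_serial, t_parallel):
    """Speedup = serial wall time / parallel wall time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_serial / t_parallel


def efficiency(speedup_value, n_cores):
    """Parallel efficiency = speedup / number of cores used."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup_value / n_cores


if __name__ == "__main__":
    # Speedups on 4 cores as reported in the thread.
    reported = [("G03 w/Linda", 2.95),
                ("G03 shared memory", 3.6),
                ("S.Gorelsky", 3.4)]
    for label, s in reported:
        print(f"{label}: speedup {s:.2f}, efficiency {efficiency(s, 4):.0%}")
```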
> I also obtained similar results for other quantum-chemical methods, which
> show that using Linda/G03 may give poor scalability on
> dual-core Opteron.
> We also have a quantum-chemical application (under development by us) that
> is bandwidth-limited under parallelization, and using 1 CPU (1 MPI
> process) per dual-Xeon node for Myrinet/MPICH is strongly preferred. In
> the case of Opteron nodes with dual single-core CPUs the situation is better.
> But now, for Opteron nodes with 4 cores/2 CPUs, to force the use of
> only 2 cores (of 4), 1 on each chip, we'll need
> CPU affinity support in Linux.
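
One way to picture the placement described above, as a sketch only: given a map from core number to physical chip, pick the first core on each chip and restrict a process to one of them. The core_to_chip layout below is an assumed example (the real numbering depends on the BIOS and kernel), the function names are illustrative, and os.sched_setaffinity is Python's Linux-only wrapper for the scheduler affinity call, so it needs a kernel that supports it.

```python
import os


def one_core_per_chip(core_to_chip):
    """Return the lowest-numbered core on each chip, as a sorted list."""
    chosen = {}
    for core, chip in sorted(core_to_chip.items()):
        chosen.setdefault(chip, core)  # keep the first core seen per chip
    return sorted(chosen.values())


def pin_to_core(core):
    """Restrict the calling process to a single core (Linux only)."""
    os.sched_setaffinity(0, {core})  # pid 0 means the current process


if __name__ == "__main__":
    # Assumed layout: cores 0 and 1 on chip 0, cores 2 and 3 on chip 1.
    layout = {0: 0, 1: 0, 2: 1, 3: 1}
    print(one_core_per_chip(layout))
```

Each MPI process would then call pin_to_core with a different entry of the returned list, so that the two processes on a node land on different chips.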
>> A complication for this
>> interpretation is that the Athlon32 nodes use Linux kernel 2.4.21.
>> 2) Mikhail Kuzminsky asked "do you have 'node interleave memory'
>> switched off?"
>> Reading the BIOS:
>> Bank interleaving "Auto"; there are two memory modules per CPU, so
>> there should be bank interleaving.
>> Node interleaving "Disable"
>> 3) In an email Guy Coates asked
>> > Did you need to use numa-tools to specify the CPU placement, or did the
>> > kernel "do the right thing" by itself?
>> The kernel did the right thing by itself.
>> I have a question: what are numa-tools?
>> On the computer I find
>> man -k numa
>> numa (3) - NUMA policy library
>> numactl(8) - Control NUMA policy for processes or shared memory
>> rpm -qa | grep -i numa
>> Is numactl the "numa-tools"? Is there another package to consider?
>> I see that numactl has many "man" pages.
>> Reference, previous message:
>> >In all cases, 4 MPI processes on a machine with 4 cores (two dual-core CPUs).
>> >Meteorology program 1, "bolam"          CPU time   real time (in seconds)
>> >   Linux kernel 2.6.9-11.ELsmp            122        128
>> >   Linux kernel 188.8.131.52              64         77
>> >Meteorology program 2, "non-hydrostatic"
>> >   Linux kernel 2.6.9-11.ELsmp            598        544
>> >   Linux kernel 184.108.40.206            430        476
>> Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
>> Center for Advanced Studies, Research, and Development in Sardinia