[Beowulf] [gorelsky@stanford.edu: CCL:dual-core Opteron 275performance]
Mikhail Kuzminsky kus at free.net
Wed Jul 13 09:31:48 PDT 2005
In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul 2005 12:24:27 +0200):
> 1) Gerry Creager wrote "Hoowa!"
> Since the results seem useful, I would like to add the following.
> On dual-CPU boards with Athlon32 CPUs, the program "bolam" was slow if
> both CPUs on the board were used, it was better to have one MPICH process
> per compute node. This problem did not appear in another cluster that had
> Opteron dual-CPU boards (single-core), that is, two processes for each node
> did not cause a slowdown. This is an indication that "bolam" is at a
> threshold for memory access being a bottleneck.

The original post by S.Gorelsky (re-sent by E.Leitl) was about the good
scalability of a 4-core, dual-CPU Opteron 275 server on the Gaussian 03
DFT/test397 test. I'm now testing a similar Supermicro server w/2*Opteron 275,
but w/DDR333 instead of the DDR400 used by S.Gorelsky. I used SuSE 9.0
w/2.4.21 kernel.

As I understand it, the original results of S.Gorelsky were probably obtained
with shared-memory parallelization! If I use G03 w/Linda (which is the main
parallelization tool for G03; parallelization in the shared-memory model of
G03 is available only for a more restricted subset of quantum-chemical
methods), then the results are much worse. On 4 cores I obtained a speedup of
only 2.95 for Linda vs 3.6 for shared memory. The difference is, as I
understand, simply because of data exchanges through RAM in the case of
Linda; in the shared-memory model such memory traffic is absent. FYI: the
speedup obtained by S.Gorelsky for 4 CPUs is 3.4 (hope that I calculated
properly :-)).

I also obtained similar results for other quantum-chemical methods, which
show that using Linda/G03 may give bad scalability on dual-core Opterons.

We also have a quantum-chemical application (which we are developing
ourselves) that is bandwidth-limited under parallelization, and using
1 CPU (1 MPI process) per dual-Xeon node for Myrinet/MPICH is strongly
preferred.
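
As a rough illustration of the speedup figures quoted above (2.95 for Linda, 3.6 for shared memory and 3.4 for S.Gorelsky's run, all on 4 cores), the following Python sketch converts each speedup into a parallel efficiency and a Karp-Flatt experimentally determined serial fraction. It is not part of the original measurements: the function names and labels are invented for this example, and only the three speedup values and the core count are taken from the message.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: the serial/overhead fraction implied by a speedup.

    0.0 means perfect scaling; larger values mean more time lost to
    serial work, communication, or memory contention.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Speedups on 4 cores as quoted in the message (labels are illustrative).
CORES = 4
SPEEDUPS = {
    "G03 w/Linda": 2.95,
    "G03 shared memory": 3.6,
    "S.Gorelsky (as recalculated in the message)": 3.4,
}

if __name__ == "__main__":
    for label, s in SPEEDUPS.items():
        eff = parallel_efficiency(s, CORES)
        frac = karp_flatt(s, CORES)
        print(f"{label}: efficiency {eff:.1%}, serial fraction {frac:.3f}")
```

Read this way, the Linda run reaches roughly 74% efficiency against 90% for shared memory, which matches the message's point that the extra data exchange through RAM costs a substantial part of the fourth core.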
In the case of Opteron nodes with two single-core CPUs the situation is
better. But now, with 4 cores/2 CPUs per Opteron node, to force the use of
only 2 cores (out of 4), one on each chip, we'll need CPU affinity support
in Linux.

Yours
Mikhail

> A complication for this interpretation is that the Athlon32 nodes use
> Linux kernel 2.4.21.
>
> 2) Mikhail Kuzminsky asked "do you have "node interleave memory" switched off?
> Reading the BIOS:
> Bank interleaving "Auto", there are two memory modules per CPU so there
> should be bank interleaving.
> Node interleaving "Disable"
>
> 3) In an email Guy Coates asked
> > Did you need to use numa-tools to specify the CPU placement, or did the
> > kernel "do the right thing" by itself?
> The kernel did the right thing by itself.
> I have a question: what are numa-tools?
> On the computer I find
> man -k numa
> numa (3) - NUMA policy library
> numactl(8) - Control NUMA policy for processes or shared memory
> rpm -qa | grep -i numa
> numactl-0.6.4-1.13
> Is numactl the "numa-tools"? Is there another package to consider installing?
> I see that numactl has many "man" pages.
>
> Reference, previous message:
>
> > In all cases, 4 MPI processes on a machine with 4 cores (two dual-core CPUs).
> >
> > Meteorology program 1, "bolam"     CPU time, real time (in seconds)
> >   Linux kernel 2.6.9-11.ELsmp        122        128
> >   Linux kernel 2.6.12.2               64         77
> >
> > Meteorology program 2, "non-hydrostatic"
> >   Linux kernel 2.6.9-11.ELsmp        598        544
> >   Linux kernel 2.6.12.2              430        476
>
> --
>
> Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
> Center for Advanced Studies, Research, and Development in Sardinia
>
> Postal Address:               | Physical Address for FedEx, UPS, DHL:
> ---------------               | -------------------------------------
> Alan Scheinine                | Alan Scheinine
> c/o CRS4                      | c/o CRS4
> C.P. n. 25                    | Loc. Pixina Manna Edificio 1
> 09010 Pula (Cagliari), Italy  | 09010 Pula (Cagliari), Italy
>
> Email: scheinin at crs4.it
>
> Phone: 070 9250 238 [+39 070 9250 238]
> Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220]
> Operator at reception: 070 9250 1 [+39 070 9250 1]
> Mobile phone: 347 7990472 [+39 347 7990472]
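
On the point made in the reply about running only one process per chip on a 2-socket, 4-core node: the Python sketch below shows one way this might be done on a current Linux system. It assumes a kernel that exposes `physical_package_id` under sysfs and a Python build that provides `os.sched_setaffinity`; neither would apply to the 2.4.21 kernel mentioned in the message. The helper names (`read_packages`, `one_core_per_package`, `pin_rank`) and the idea of passing in an MPI rank are hypothetical and chosen only for this example.

```python
import os
from pathlib import Path


def read_packages(sysfs: str = "/sys/devices/system/cpu") -> dict:
    """Map logical CPU number -> physical package (socket) id via sysfs.

    Returns an empty dict if the topology files are not available.
    """
    mapping = {}
    root = Path(sysfs)
    if not root.is_dir():
        return mapping
    for entry in root.glob("cpu[0-9]*"):
        pkg_file = entry / "topology" / "physical_package_id"
        if pkg_file.is_file():
            mapping[int(entry.name[3:])] = int(pkg_file.read_text())
    return mapping


def one_core_per_package(package_of: dict) -> list:
    """Pick the lowest-numbered logical CPU on each physical package."""
    chosen = {}
    for cpu in sorted(package_of):
        chosen.setdefault(package_of[cpu], cpu)
    return sorted(chosen.values())


def pin_rank(rank: int, cores: list) -> int:
    """Pin the calling process to one of `cores`, chosen round-robin by rank.

    `rank` is assumed to come from the MPI launcher (hypothetical here).
    The pinning call is skipped on platforms without sched_setaffinity.
    """
    cpu = cores[rank % len(cores)]
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return cpu


if __name__ == "__main__":
    cores = one_core_per_package(read_packages())
    print("one core per chip:", cores)
```

With two MPI processes per node, rank 0 and rank 1 would each call `pin_rank` with the list returned by `one_core_per_package`, so that each process runs on a different chip and the two cores of one chip do not compete for the same memory controller.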
