[Beowulf] Strange Opteron 2350 performance: Gaussian-03
libo at buaa.edu.cn
Sat Jun 28 09:37:12 PDT 2008
Sorry, I don't have the same applications as you.
Did you compile them with gcc? If so, -O3 can do some optimization.
I think -march=k8 is enough.
Also make sure the CPU is running at its default frequency. Sometimes PowerNow! is active by default.
And BTW, what's your platform? Linux? Which release? x86_64?
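
For the frequency check mentioned above, here is a minimal Python sketch, assuming a Linux x86 system whose /proc/cpuinfo reports a "cpu MHz" line per logical CPU. The function names, the 2000 MHz nominal value and the 5% tolerance are illustrative assumptions, not something taken from the benchmark setup.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value found for each logical CPU, in file order."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def throttled_cpus(cpuinfo_text, nominal_mhz, tolerance=0.05):
    """Return indices of CPUs running more than `tolerance` below nominal_mhz."""
    floor = nominal_mhz * (1.0 - tolerance)
    return [i for i, mhz in enumerate(parse_cpu_mhz(cpuinfo_text)) if mhz < floor]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not mounted
    # 2000.0 is an assumed nominal clock (2 GHz) for illustration.
    print("CPUs below nominal:", throttled_cpus(text, 2000.0))
```

An empty list suggests no core was scaled down at the moment of the check. The reading is only a snapshot, so it is best taken while the job is actually running.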
----- Original Message -----
From: "Mikhail Kuzminsky" <kus at free.net>
To: "Li, Bo" <libo at buaa.edu.cn>
Cc: <beowulf at beowulf.org>
Sent: Sunday, June 29, 2008 12:23 AM
Subject: Re: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
> In message from "Li, Bo" <libo at buaa.edu.cn> (Sun, 29 Jun 2008 00:07:07
>>I am afraid there must be something wrong with your experiment.
>>How did you get the performance? Were your DFT codes running in
>>parallel? Any optimization involved?
> I was afraid of the same, but the results were reproduced twice.
> As I wrote in my message:
> - these were ONE-CORE runs (one CPU for Opteron 246)
> - the optimization was performed for the OLD Opteron 246 (because
> Gaussian, Inc. does not offer binaries optimized specially for
> DFT test397 (like any other DFT job) parallelizes well, and on Opteron
> 246 it gives a 1.9 times speedup on 2 CPUs. But I didn't run a 2-core
> parallel job on Opteron 2350: I was troubled by the results obtained
> for 1 core.
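
To put the quoted 1.9 times speedup on 2 CPUs in context, here is a small Python sketch that converts it into a parallel efficiency and into the serial fraction implied by Amdahl's law. The choice of Amdahl's law as the model and the function names are assumptions made for illustration; the message itself only reports the speedup.

```python
def parallel_efficiency(speedup, n_cpus):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / n_cpus


def amdahl_serial_fraction(speedup, n_cpus):
    """Serial fraction s implied by Amdahl's law.

    S = 1 / (s + (1 - s) / N)  =>  s = (N / S - 1) / (N - 1)
    """
    if n_cpus < 2:
        raise ValueError("need at least 2 CPUs to infer a serial fraction")
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)


if __name__ == "__main__":
    s = 1.9  # speedup on 2 CPUs reported for Opteron 246
    print("efficiency:", round(parallel_efficiency(s, 2), 3))
    print("implied serial fraction:", round(amdahl_serial_fraction(s, 2), 4))
```

Under this model the job is about 95% efficient on 2 CPUs, which corresponds to a serial fraction of roughly 5%.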
>>In most of my tests, K8L or K10 beats the old Opteron at the same
>>frequency by about 20%.
> Sorry, do you have this result on Gaussian-03, and for DFT in particular? Did
> you compile it on K10 using target=barcelona (i.e. optimized for
> barcelona)?
>>----- Original Message -----
>>From: "Mikhail Kuzminsky" <kus at free.net>
>>To: <beowulf at beowulf.org>
>>Sent: Saturday, June 28, 2008 11:48 PM
>>Subject: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
>>> I'm running a set of quad-core Opteron 2350 benchmarks, in
>>> using Gaussian-03 (binary version from Gaussian, Inc., i.e.
>>> by an older (than current) pgf77 version, for the Opteron target).
>>> In particular, I compare *one core* of Opteron 2350 with Opteron 246,
>>> which has the same 2 GHz frequency and the same amount of cache per core
>>> (512K L2 + 0.25*2 MB L3 for Opteron 2350 equals the 1 MB L2 of Opteron
>>> 246). Opteron 246 has even faster DDR2-667 RAM.
>>> The Gaussian-03 performance in some cases is close for both
>>> (remember that the compiler didn't know about Barcelona!), but for the
>>> very popular DFT method the Opteron 2350 core looks slow: one job
>>> gives 33% worse performance (than Opteron 246).
>>> But on the standard Gaussian-03 test397.com DFT/B3LYP test, *one* (1)
>>> Opteron 2350 core ran 15667 sec. (both startstop and cpu) vs 8709
>>> on (one) Opteron 246!!
>>> There is no powersaved daemon, so the frequency of Opteron 2350 is
>>> fixed at 2 GHz. I reproduced this result twice on Opteron 2350, in
>>> particular once with forced good numactl behaviour. I'm
>>> reproducing it on Opteron 246 again :-) but I have indirect
>>> confirmation of these timings (based on 2-cpus Opteron 246 parallel
>>> Yes, AFAIK the DFT method is cache-friendly, and the slower L3 cache in
>>> Opteron 2350 may give worse performance. But by a factor of 1.8??
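
The factor of 1.8 and the per-core cache comparison follow directly from the figures quoted in the message. The short Python sketch below only redoes that arithmetic; the variable names are illustrative, and it assumes the L3 is shared evenly by the four cores, as the 0.25 factor in the message implies.

```python
# Wall-clock times for test397 quoted in the message, in seconds.
t_opteron_2350 = 15667.0  # one core
t_opteron_246 = 8709.0    # one CPU

slowdown = t_opteron_2350 / t_opteron_246
extra_time_percent = 100.0 * (t_opteron_2350 - t_opteron_246) / t_opteron_246

# Cache per core in KiB: private L2 plus an even quarter share of the 2 MB L3.
cache_per_core_2350_kib = 512 + 0.25 * 2 * 1024
cache_per_core_246_kib = 1 * 1024  # 1 MB L2

if __name__ == "__main__":
    print("slowdown factor:", round(slowdown, 2))
    print("extra run time (%):", round(extra_time_percent, 1))
    print("cache per core (KiB):", cache_per_core_2350_kib, "vs", cache_per_core_246_kib)
```

The total cache per core matches, but on Opteron 2350 half of it sits in the shared L3 rather than in L2.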
>>> Any comments are welcome.
>>> Mikhail Kuzminsky
>>> Computer Assistance to Chemical Research Center
>>> Zelinsky Institute of Organic Chemistry