[Beowulf] Re: Athlon64 / Opteron test

Tim Moore moor007 at bellsouth.net
Thu May 13 14:51:57 PDT 2004

Hello -

We have been running CFD codes on Opterons for approximately one year.  We
have been experimenting with Itaniums since Thanksgiving.

We have 3 clusters:

[1] Production Cluster: 16 CPU Opteron (8 nodes)/Myrinet/SLES 8 for AMD64.

[2] Opteron Testbed: 4 CPU (2 nodes)/Dual Port Myrinet/SLES 8 for AMD64

[3] Itanium 2 Testbed: 4 CPU (2 nodes)/Dual Port Myrinet/SLES 8 for IA64

I have not tested RH ES3 for AMD64, so I will not comment on it.
However, SLES 8 was the first OS for AMD64 and gives excellent performance.

Q1: I have no response.

Q2: I think it is ready for production mode.  I have a production cluster
running 1 CFD code, 1 Structural Dynamics Code, 2 Particle Codes, and 1
Solid Dynamics Code.

The CFD and Particle Codes run in 64 bit over Myrinet.

The Structural Dynamics Code and Solid Dynamics Code run in 32 bit over
Myrinet. The Structural Dynamics Code runs in 32 bit because its
development platform is still RH7.3. I have a dual Athlon machine running
7.3 for compilation that links against the GM libraries for AMD64/Myrinet,
and it works great. The Solid Dynamics Code is quite large, and I am unsure
whether the Portland Group compilers or the code itself is what keeps it in
32 bit. I do not think the code developers know either. Anyway, it compiles
and runs in 32 bit.
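Much of the 32-bit vs 64-bit split comes down to the C data model. On 64-bit Linux (AMD64 and IA-64 alike), `long` and pointers are 8 bytes while `int` stays at 4 (LP64). In a 32-bit build all three are 4 bytes (ILP32). The sketch below uses Python's standard `ctypes` and `struct` modules to classify the data model of the interpreter build it runs under. The helper name `classify_data_model` is hypothetical. Code that keeps an address in a 4-byte integer is a common reason a program only builds in 32 bit. That is a general porting hazard, not a diagnosis of the codes above.

```python
import ctypes
import struct


def classify_data_model(int_size, long_size, ptr_size):
    """Name the C data model from the sizes (in bytes) of int, long and void*."""
    if int_size == 4 and long_size == 4 and ptr_size == 4:
        return "ILP32"   # typical 32-bit Linux build
    if int_size == 4 and long_size == 8 and ptr_size == 8:
        return "LP64"    # 64-bit Linux on AMD64 or IA-64
    if int_size == 4 and long_size == 4 and ptr_size == 8:
        return "LLP64"   # 64-bit Windows
    return "unknown"


# Sizes of the C types for the Python build running this script.
sizes = (
    ctypes.sizeof(ctypes.c_int),
    ctypes.sizeof(ctypes.c_long),
    struct.calcsize("P"),  # size of a native pointer
)
print("int/long/pointer sizes:", sizes, "->", classify_data_model(*sizes))
```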

The issue, at least for me, is the compilers. I have to use the Portland
Group compilers because some of the codes use Cray pointers. GCC works fine
for codes that do not require them.

Q3: Maturity *maybe* goes to Intel. However, I have had great difficulty
compiling some of the codes in 64 bit mode. (The Itanium is either 64 bit or
nothing, unless you are willing to run 32 bit code under emulation, which
costs performance.) The main issue with Intel is the compilers. We
benchmarked the CFD code compiled with gcc and with the Intel compilers and
found that the Intel compilers generate code 3X faster than gcc does. On the
other hand, the same CFD code compiled with gcc ran on the Opterons only 8%
slower than the Intel-compiled I2 version.

To summarize my response to Q3:

We build our own Opteron nodes and purchase our I2 nodes from Promicro. We
build a 1U/2U dual Opteron node with Myrinet for less than 1/3 the cost of an
I2/1.4 GHz/Myrinet node. In other words, you can spend 3 times the money and
get an 8% boost in performance.
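The same trade-off can be put in throughput-per-dollar terms. This is an illustrative sketch only. It assumes "8% slower" means 8% more wall-clock time, and it takes the cost ratio as exactly 3:1. Because the Opteron node costs less than 1/3 as much, the real advantage is at least what it prints.

```python
# Relative costs: an I2 node at (at least) 3x the price of an Opteron node.
ITANIUM_COST = 3.0
OPTERON_COST = 1.0
# Relative runtimes for the CFD code: Opteron taken as 8% more wall-clock time.
ITANIUM_RUNTIME = 1.00
OPTERON_RUNTIME = 1.08


def perf_per_cost(runtime, cost):
    """Throughput (runs per unit time) bought per unit of money."""
    return (1.0 / runtime) / cost


ratio = (perf_per_cost(OPTERON_RUNTIME, OPTERON_COST)
         / perf_per_cost(ITANIUM_RUNTIME, ITANIUM_COST))
# prints: Opteron delivers about 2.78x the throughput per dollar
print(f"Opteron delivers about {ratio:.2f}x the throughput per dollar")
```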

I am putting together a paper that documents workshop results and you may
have a draft if you would like.  We will be posting the results on our
website soon.

I hope that I have helped some.

Tim Moore
Chief Scientist, TCG

> Hi,
> We're studying the possibility of installing a 64 bit test node.
> The main purpose is to get an impression of the performance
> increase we would obtain in our particular scenario, computational fluid
> dynamics.
> For the test we have already settled on the OS, Red Hat
> Enterprise 3, but we are a bit confused about the choice of hardware:
> Athlon64
> Opteron
> As far as we know, the Opteron has two main differences:
> - A wider memory interface (128 bit versus 64 bit)
> - A larger L2 cache (1 MB)
> Before doing any test, the questions are:
> 1)
>  What is the theoretical maximum performance gain from a full 64 bit
> architecture over 32 bit, taking into account that our
> computations are done in double precision floating point using really
> big matrices?
> 2)
> Is 64 bit Linux ready for production today?
> If so, what problems might we have to deal with in order to
> compile and run our code in a full 64 bit environment?
> 3)
> Which is the more mature solution: AMD Opteron or Intel Itanium?
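For codes that stream through big double-precision matrices, the memory interface width mentioned in the question matters directly. Theoretical peak bandwidth is the bus width in bytes times the transfer rate. The sketch below assumes both chips run DDR400 memory (400 million transfers per second). That speed is an illustrative assumption, not a figure from this thread. Sustained bandwidth is lower than peak in practice.

```python
DDR400_TRANSFERS_PER_SEC = 400e6  # assumed memory speed for both chips
BYTES_PER_DOUBLE = 8


def peak_bandwidth_bytes(bus_width_bits, transfers_per_sec):
    """Theoretical peak memory bandwidth in bytes per second."""
    return bus_width_bits // 8 * transfers_per_sec


for name, width in (("Athlon64, 64-bit interface", 64),
                    ("Opteron, 128-bit interface", 128)):
    bw = peak_bandwidth_bytes(width, DDR400_TRANSFERS_PER_SEC)
    # 64-bit: 3.2 GB/s (400 million doubles/s); 128-bit: 6.4 GB/s (800 million)
    print(f"{name}: {bw / 1e9:.1f} GB/s peak, "
          f"{bw / BYTES_PER_DOUBLE / 1e6:.0f} million doubles/s")
```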
