[Beowulf] Athlon64 / Opteron test
Robert G. Brown
rgb at phy.duke.edu
Thu May 13 17:06:04 PDT 2004
On Thu, 13 May 2004, Daniel Fernandez wrote:
> We're studying the possibility of installing a 64-bit test node.
> The main purpose is to get an idea of the performance increase we
> would obtain in our particular scenario, computational fluid dynamics.
> For the test we have no doubt about the OS, Red Hat Enterprise
> Linux 3, but we are a bit confused about which hardware to choose:
> As far as we know, the Opteron differs in two main ways:
> - A wider memory interface (128-bit versus 64-bit)
> - A larger L2 cache (1 MB)
> Before doing any tests, our questions are:
> What is the theoretical maximum performance gain of a full 64-bit
> architecture over 32-bit, given that our computations are done in
> double-precision floating point on really big matrices?
I don't know what "theoretical maximum gain" means -- something from
multiplying out clocks, pipeline form factors, and datapath widths, and
assuming a perfect state? That never happens; it's a more or less useless number.
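For concreteness, here is the kind of multiplication a "theoretical maximum" comes from, applied to the 128-bit vs. 64-bit memory interfaces in the question: bytes per transfer times transfers per second. The DDR400 rate (400 million transfers per second) and the helper name `peak_bandwidth_gbs` are assumptions for illustration; substitute whatever memory the board actually uses. The result is only a ceiling that real code does not reach.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak memory bandwidth in GB/s (10**9 bytes/s): a ceiling, not a measurement."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


if __name__ == "__main__":
    # Assumed DDR400 memory on both interfaces (hypothetical example values).
    wide = peak_bandwidth_gbs(128, 400)   # 128-bit interface
    narrow = peak_bandwidth_gbs(64, 400)  # 64-bit interface
    print(f"128-bit: {wide:.1f} GB/s, 64-bit: {narrow:.1f} GB/s, "
          f"ratio {wide / narrow:.1f}x")
```

This prints 6.4 GB/s vs. 3.2 GB/s, a 2x ratio on paper; measured numbers from stream and similar tools typically come in below such peaks, and how much that matters depends on the application.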
I do have a moderate amount of faith in what process-specific
microbenchmarks like stream and lmbench tell one (not as much as one
might think, but some useful data points). I also have a LOT of faith
in using your own application as a benchmark on a loaner machine (or
machines). So much so that this is one of the points of focus of my
next couple of Cluster World columns -- the importance of understanding
your application and the deep significance of "YMMV" when asking other
people what to buy.
If stream or stream-like microbenchmarks would be of any use to you, I
have Opterons but not Athlon64s handy. Maybe somebody else can run the
other for you. Remember, though, that even these results may vary
significantly with compiler, with link libraries, with WHICH Opteron
motherboard you get and how memory is loaded onto the motherboard. It
is very, very difficult to get portable numbers without a prototyping
machine that is "like" the architecture you are considering in very
specific form (ideally, identical).
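As a rough illustration of what a stream-like measurement does, the sketch below times only a copy kernel. It is not McCalpin's STREAM (which is C/Fortran and runs copy, scale, add and triad on arrays of doubles); it borrows STREAM's conventions of counting the bytes read plus the bytes written and keeping the best of several trials. The function name and the 64 MiB default, chosen to be much larger than a 1 MB L2 cache, are assumptions. The copy is a bytearray slice assignment, which CPython performs as a bulk copy in C rather than a Python-level loop.

```python
import time


def copy_bandwidth_mbs(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Rough STREAM-style copy bandwidth in MB/s (10**6 bytes per second).

    Each copy reads n_bytes and writes n_bytes, so 2 * n_bytes count as
    moved; as in STREAM, the best (shortest) of `repeats` trials is used.
    """
    src = bytearray(b"\x01") * n_bytes   # source buffer filled with 0x01
    dst = bytearray(n_bytes)             # destination buffer of equal size
    dst[:] = src                         # warm-up pass touches every page of both buffers
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                     # equal-length slice assignment: bulk copy in C
        best = min(best, time.perf_counter() - start)
    if best <= 0.0:
        raise ValueError("timer too coarse for this size; increase n_bytes")
    return 2 * n_bytes / best / 1e6


if __name__ == "__main__":
    print(f"copy: {copy_bandwidth_mbs():.0f} MB/s")
```

The same caveats as above apply: the number will move with the interpreter build, the C library's memory copy routine, the motherboard, and how its memory slots are populated.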
> Is the 64-bit solution on Linux ready for production nowadays?
> If so, what problems might we have to deal with in order to
> compile and run our code in a full 64-bit environment?
I'm using dual Opterons (Penguin Altus 1000E's) running Fedora right now
without any major problems. Your main problem is likely to be
rebuilding things that, for some reason, aren't already in Fedora Core
and built for the Opteron. So far I haven't encountered any major
problems compiling gcc-based code linked to the GSL and other
useful/standard libraries. YMMV, but a lot of people have these boxes
now, and together they form a powerful force pushing for complete, clean
distros. So even if you find something that doesn't work, it likely WILL
work, and at worst, if it is in your key work pathway, you might have to
help make it work in the best of open source tradition.
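One thing worth checking when moving 32-bit C code to a full 64-bit environment is whether the code assumes `int`, `long` and pointers are all the same width. On 32-bit i386 Linux all three are 4 bytes (ILP32); on x86-64 Linux gcc uses the LP64 model, where `int` stays 4 bytes but `long` and pointers become 8, so code that stores a pointer in an `int` or hard-codes `sizeof(long) == 4` can break. The sketch below (the function name is hypothetical) asks the running Python interpreter, via `ctypes`, which model it was built for; in C the same check is just printing `sizeof(int)`, `sizeof(long)` and `sizeof(void *)` under `gcc -m32` and `gcc -m64`.

```python
import ctypes


def c_data_model() -> str:
    """Name the C data model this Python interpreter was built for."""
    sizes = (
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    )
    models = {
        (4, 4, 4): "ILP32",  # 32-bit i386 Linux
        (4, 8, 8): "LP64",   # 64-bit Linux on Opteron/Athlon64 (x86-64)
        (4, 4, 8): "LLP64",  # 64-bit Windows, for comparison
    }
    return models.get(sizes, "other (int, long, pointer = %d, %d, %d bytes)" % sizes)


if __name__ == "__main__":
    print(c_data_model())
```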
We're likely to buy nothing else for cluster nodes in the medium run.
So far they are by far the cost-benefit winner.
> Which is the most mature solution: AMD Opteron or Intel Itanium?
Did you mean mature or moribund;-)?
I'm only half kidding. Itanium is dead as a doornail as technology goes
-- overpriced, underperforming, incompatible. Intel is migrating to a
(more or less) Opteron-compatible 64-bit processor as fast as they can
get there, as Major Software Companies (MSC) have announced that they
aren't going to do major ports to new chips with new machine languages
and compilers anymore if they can possibly avoid it. If Intel dropped
the price of an Itanium to slightly LESS than that of an Opteron, I
think they'd still have trouble maintaining a market, because Opterons
are relatively easy to port to and will in principle run i386 code
(badly, of course) natively. Sometimes. I haven't had a lot of luck with
it, of course, because you can't mix i386 code with 64-bit shared
libraries and we installed a 64-bit version of the OS from the start, but
in theory it will work.
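Because Linux will not load a 32-bit (ELFCLASS32) shared library into a 64-bit process or vice versa, it helps when a mixed install misbehaves to check which class each binary or library actually is. The `file` command reports this; the sketch below reads the same fields straight from the ELF header: the word size at byte 4 (EI_CLASS: 1 = 32-bit, 2 = 64-bit), the byte order at byte 5 (EI_DATA: 1 = little-endian), and the 2-byte `e_machine` field at offset 18 (3 = i386, 50 = IA-64, 62 = x86-64). The function names are hypothetical.

```python
import struct
import sys

# Field values from the ELF specification.
ELFCLASS = {1: "32-bit", 2: "64-bit"}
E_MACHINE = {3: "i386", 50: "IA-64", 62: "x86-64"}


def elf_arch(header: bytes) -> str:
    """Describe an ELF file from its first 20 bytes, e.g. '64-bit x86-64'."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_size = ELFCLASS.get(header[4], "unknown-class")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return f"{word_size} {E_MACHINE.get(machine, f'machine {machine}')}"


def elf_arch_of(path: str) -> str:
    """Read the ELF header of the file at `path` and describe it."""
    with open(path, "rb") as f:
        return elf_arch(f.read(20))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, elf_arch_of(path))
```

On a Red Hat or Fedora x86-64 install, 64-bit libraries live under `/lib64` and `/usr/lib64` and 32-bit ones under `/lib` and `/usr/lib`, so passing a program and the libraries it links against shows quickly whether the classes match.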
The good news is that Opterons are surprisingly fast for MY applications
for their relatively pokey CPU clocks, and some benchmarks show that
they can be really quite fast indeed for memory intensive applications
relative to the clock of, e.g., an Athlon or P4. They also run much
cooler than regular Athlons (again, for my application): ballpark, a loaded
dual-CPU Opteron draws about 185 watts vs. 230 watts or so for a loaded
dual-CPU Athlon running more or less the same code.
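Worked through, the 185 W vs. 230 W ballpark figures above amount to about a 20% reduction per dual-CPU node. The annual energy figure in the sketch assumes the node runs at full load around the clock for a year (8760 hours), which is an assumption rather than part of the measurement, and it ignores cooling; the function name is hypothetical.

```python
def power_savings(opteron_watts, athlon_watts, hours=24 * 365):
    """Return (watts saved, fraction saved, kWh saved over `hours`) per node."""
    saved_watts = athlon_watts - opteron_watts
    return saved_watts, saved_watts / athlon_watts, saved_watts * hours / 1000.0


if __name__ == "__main__":
    watts, frac, kwh = power_savings(185, 230)
    print(f"{watts} W less ({frac:.0%}), about {kwh:.0f} kWh per node-year")
```

Run as a script, this prints `45 W less (20%), about 394 kWh per node-year`; multiply by the node count for a cluster-level estimate.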
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu