[Beowulf] AMD performance (was 500GB systems)
sdm900 at gmail.com
Sat Jan 12 16:25:30 PST 2013
Until the Phis came along, we were purchasing 1RU, 4-socket nodes with
6276s and 256GB RAM. On all our codes, we found the throughput to be
greater than any equivalent-density Sandy Bridge system (usually 2 x dual
socket in 1RU) at about 10-15% less energy and about 1/3 the price for the
actual CPU (saving a couple of thousand dollars per 1RU).
The other interesting aspect is that their performance is MUCH better
than the Sandy Bridges when you over-allocate the cores (i.e. run >n CPU
threads on n cores). We found the Sandy Bridges' performance completely
tanked when we did this... the AMDs maintained the same performance (as
what you get with n threads).
Consequently, we have about 5 racks of these systems (120 nodes).
Of course, we are now purchasing Phis. First 2 racks meant to turn up
On Fri, Jan 11, 2013 at 1:03 PM, Bill Broadley <bill at cse.ucdavis.edu> wrote:
> Over the last few months I've been hearing quite a few negative comments
> about AMD. Seems like most of them are extrapolating from desktop
> Keep in mind that it's quite a stretch going from a desktop (single
> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
> memory channels).
> Also keep in mind that compilers and kernels can make quite a
> difference. The vector units have changed significantly (a factor of 2)
> and the scheduler needs tweaks to account for the various latencies and
> NUMA related values. Using old kernels/compilers may well significantly
> impact AMD and/or Intel.
> I've found the bandwidth and latency mostly controlled by the socket and
> specifically the number of memory channels. 2, 3, and 4 channel per
> socket systems have very similar bandwidth and latency for AMD and Intel
> When taking a pragmatic approach to best price performance I find AMD
> competitive. Normally I figure out how much ram per CPU is needed, disk
> needs, then figure out which Intel chip has the best system price/system
> perf on the relevant applications. Then do similar for AMD. Then buy
> whichever is better. Often the result is a 15% improvement in one
> direction or another (HIGHLY application dependent).
> Of course sometimes a user asks for the "better" system for running a
> wide variety of floating point codes. In such cases I often use CPU2006
> FP rate.
> In a recent comparison I compared (both perf numbers from HP systems)
> * AMD 6344, 64GB ram, SpecFPRateBase=333 $2,915, $8.75 per spec
> * Intel E5-2620, 64GB ram, SpecFPRateBase=322 $2,990, $9.22 per spec
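The price/performance arithmetic in the quoted comparison can be reproduced with a few lines of Python. This is only a sketch: the prices and SPECfp rate scores are the ones quoted above, while the dictionary layout and the function name `dollars_per_spec` are made up for illustration. Recomputing from the listed price and score gives about $9.29 per point for the Intel system, slightly above the quoted $9.22, so one of the quoted figures may have been rounded or mistyped; the ranking is the same either way.

```python
# Prices and SPECfp rate (base) scores as quoted for the two HP systems.
# The data layout and names here are illustrative assumptions.
systems = {
    "AMD 6344": {"price_usd": 2915, "specfp_rate_base": 333},
    "Intel E5-2620": {"price_usd": 2990, "specfp_rate_base": 322},
}


def dollars_per_spec(price_usd, score):
    """System price divided by benchmark score (lower is better)."""
    return price_usd / score


ratios = {
    name: dollars_per_spec(s["price_usd"], s["specfp_rate_base"])
    for name, s in systems.items()
}

# Pick the system with the lowest cost per benchmark point.
best = min(ratios, key=ratios.get)

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${ratio:.2f} per SPECfp rate point")
print("best price/performance:", best)
```

The same loop works with scores from an actual end-user application in place of SPECfp rate, which is the approach the quoted text prefers whenever possible.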
> Whenever possible I try to use actual applications justifying the
> purchase of a cluster.
> When using actual end user applications it's about a 50/50 chance that
> AMD or Intel will win.
> I figured I'd add a few comments:
> * Latency for a quad socket AMD is around 64ns to a random piece
> of memory (not 600ns as recently mentioned).
> * AMD quad sockets with 512GB ram start around $9k ($USA)
> * With OpenMP, pthreads, MPI or other parallel friendly code a quad
> socket AMD can look up a random cache line approximately every 2.25ns.
> (64 threads banging on 16 memory channels at once).
> * I've seen no problems with the AMD memory system, in general
> the 2k pin/4 memory bus AMD sockets seem to perform similarly
> to Intel.
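As a rough sanity check, the latency and throughput figures quoted above can be related with Little's law (requests in flight = latency x completion rate). The sketch below uses only the quoted 64ns latency and 2.25ns interval, and it assumes a 64-byte cache line, which the quoted text does not state. All variable names are illustrative.

```python
# Quoted figures for the quad-socket AMD system.
latency_ns = 64.0    # time to fetch one random piece of memory
interval_ns = 2.25   # one random cache line completes this often (64 threads)
threads = 64
memory_channels = 16

# Assumption, not from the quoted text: 64-byte cache lines.
cache_line_bytes = 64

# Little's law: average number of requests in flight across the machine.
in_flight = latency_ns / interval_ns

# Random-access throughput; bytes per nanosecond equals GB/s (10^9 bytes/s).
lookups_per_sec = 1e9 / interval_ns
random_gb_per_s = cache_line_bytes / interval_ns

print(f"lookups per second:       {lookups_per_sec:.3g}")
print(f"average requests in flight: {in_flight:.1f} (of {threads} threads)")
print(f"random-access bandwidth:  {random_gb_per_s:.1f} GB/s")
print(f"per memory channel:       {random_gb_per_s / memory_channels:.2f} GB/s")
```

Under these assumptions roughly 28 of the 64 threads have a request outstanding at any instant, which suggests the 2.25ns figure is limited by the memory system and not purely by per-thread latency.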
> An example of AMD's bandwidth scaling on a quad socket with 64 cores:
> I don't have a similar Intel, but I do have a dual socket e5:
Dr Stuart Midgley
sdm900 at sdm900.com