[Beowulf] AMD performance (was 500GB systems)

Sat Jan 12 16:25:30 PST 2013

Until the Phis came along, we were purchasing 1RU, 4-socket nodes with
6276s and 256GB RAM.  On all our codes, we found the throughput to be
greater than any equivalent-density Sandy Bridge system (usually 2 x dual
socket in 1RU), at about 10-15% less energy and about 1/3 the price for the
actual CPUs (saving a couple of thousand $$ per 1RU).

The other interesting aspect is that their performance is MUCH better
than the Sandy Bridges when you over-allocate the cores (i.e. run >n CPU
threads on n cores).  We found the Sandy Bridges' performance completely
tanked when we did this... the AMDs maintained the same performance (as
what you get with n threads).

Consequently, we have about 5 racks of these systems (120 nodes).

Of course, we are now purchasing Phis.  The first 2 racks are meant to
turn up this week.

On Fri, Jan 11, 2013 at 1:03 PM, Bill Broadley <bill at cse.ucdavis.edu> wrote:

>
> Over the last few months I've been hearing quite a few negative comments
> about AMD.  Seems like most of them are extrapolating from desktop
> performance.
>
> Keep in mind that it's quite a stretch going from a desktop (single
> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
> memory channels).
>
> Also keep in mind that compilers and kernels can make quite a
> difference.  The vector units have changed significantly (a factor of 2)
> and the scheduler needs tweaks to account for the various latencies and
> NUMA related values.  Using old kernels/compilers may well significantly
> impact AMD and/or Intel.
>
> I've found the bandwidth and latency mostly controlled by the socket and
> specifically the number of memory channels.  2, 3, and 4 channel per
> socket systems have very similar bandwidth and latency for AMD and Intel
> systems.
>
> When taking a pragmatic approach to best price performance I find AMD
> competitive.  Normally I figure out how much ram per CPU is needed, disk
> needs, then figure out which Intel chip has the best system price/system
> perf on the relevant applications.  Then do similar for AMD.  Then buy
> whichever is better.  Often the result is a 15% improvement in one
> direction or another (HIGHLY application dependent).
>
> Of course sometimes a user asks for the "better" system for running a
> wide variety of floating point codes.  In such cases I often use CPU2006
> FP rate.
>
> In a recent comparison I compared (both perf numbers from HP systems)
> * AMD 6344,      64GB ram, SpecFPRateBase=333 $2,915, $8.75 per spec
> * Intel E5-2620, 64GB ram, SpecFPRateBase=322 $2,990, $9.22 per spec
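
As a rough check of the dollars-per-SPEC figures quoted above, here is a minimal Python sketch that divides each system price by its SPECfp rate base score and picks the lower one. The prices and scores are the ones quoted; the variable and function names are just for illustration. Note that straight division gives roughly $9.29 per point for the Intel entry rather than the quoted $9.22, so one of the quoted inputs may have been rounded or mistyped; the ranking is unchanged either way.

```python
# Figures quoted in the message (both from HP systems, 64GB RAM each).
systems = {
    "AMD 6344": {"price_usd": 2915, "specfp_rate_base": 333},
    "Intel E5-2620": {"price_usd": 2990, "specfp_rate_base": 322},
}


def dollars_per_spec(price_usd, specfp_rate_base):
    """System price divided by SPECfp rate base score (lower is better)."""
    return price_usd / specfp_rate_base


# Cost per SPECfp rate base point for each candidate system.
cost = {name: dollars_per_spec(**s) for name, s in systems.items()}

# The system with the lowest cost per point.
best = min(cost, key=cost.get)

for name, value in cost.items():
    print(f"{name}: ${value:.2f} per SPECfp rate base point")
print(f"Lower cost per point: {best}")
```

The same comparison only makes sense when both systems are configured to the same RAM and disk requirements, as described in the quoted procedure.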
>
> Whenever possible I try to use actual applications justifying the
> purchase of a cluster.
>
> When using actual end user applications it's about a 50/50 chance that
> AMD or Intel will win.
>
> I figured I'd add a few comments:
> * Latency for a quad socket AMD is around 64ns to a random piece
>   of memory (not 600ns as recently mentioned).
> * AMD quad sockets with 512GB ram start around $9k ($USA)
> * With OpenMP, pthreads, MPI or other parallel friendly code a quad
>   socket AMD can look up a random cache line approximately every 2.25ns
>   (64 threads banging on 16 memory channels at once).
> * I've seen no problems with the AMD memory system; in general
>   the 2k pin/4 memory bus AMD sockets seem to perform similarly
>   to Intel.
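
To relate the 64ns latency and the 2.25ns-per-lookup figures quoted above, here is a back-of-the-envelope Python sketch using Little's law (requests in flight = latency x throughput). The 64-byte cache line size is an assumption on my part, as is the simplification that latency stays at 64ns under load; treat the results as rough estimates, not measurements.

```python
latency_ns = 64.0        # quoted random-access latency, quad-socket AMD
interval_ns = 2.25       # quoted time between completed cache-line lookups
threads = 64             # quoted thread count
memory_channels = 16     # quoted memory channel count
cache_line_bytes = 64    # ASSUMPTION: 64-byte cache lines

# Little's law: average number of lookups in flight at any moment.
in_flight = latency_ns / interval_ns

# In-flight lookups per memory channel.
in_flight_per_channel = in_flight / memory_channels

# Best case if all 64 threads overlapped one miss each with no contention.
ideal_interval_ns = latency_ns / threads

# Implied random-access bandwidth, counting one cache line per lookup.
lookups_per_sec = 1e9 / interval_ns
random_bandwidth_gb_s = lookups_per_sec * cache_line_bytes / 1e9

print(f"lookups in flight:       {in_flight:.1f}")
print(f"in flight per channel:   {in_flight_per_channel:.2f}")
print(f"ideal interval (ns):     {ideal_interval_ns:.2f}")
print(f"random bandwidth (GB/s): {random_bandwidth_gb_s:.1f}")
```

Under these assumptions the quoted numbers imply roughly 28 lookups in flight across the machine, somewhat below the 64 a perfectly overlapped run would sustain.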
>
> An example of AMD's bandwidth scaling on a quad socket with 64 cores:
>   http://cse.ucdavis.edu/bill/pstream/bm3-all.png
>
> I don't have a similar Intel, but I do have a dual socket e5:
>   http://cse.ucdavis.edu/bill/pstream/e5-2609.png
>

-- 
Dr Stuart Midgley
sdm900 at sdm900.com