<div dir="ltr">Until the Phi's came along, we were purchasing 1RU, 4-socket nodes with 6276's and 256GB RAM.  On all our codes, we found the throughput to be greater than any equivalent-density Sandy Bridge system (usually 2 x dual socket in 1RU), at about 10-15% less energy and about 1/3 the price for the actual CPUs (a saving of a couple of thousand dollars per 1RU).<div>


<br></div><div>The other interesting aspect is that their performance is MUCH better than the Sandy Bridges when you over-allocate the cores (i.e. run &gt;n CPU threads on n cores).  We found the Sandy Bridges' performance completely tanked when we did this... the AMD's maintained the same performance (as you get with n threads).<br>


<div><br></div><div style>Consequently, we have about 5 racks of these systems (120 nodes).</div></div><div style><br></div><div style>Of course, we are now purchasing Phi's.  First 2 racks meant to turn up this week.</div>


<div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jan 11, 2013 at 1:03 PM, Bill Broadley <span dir="ltr"><<a href="mailto:bill@cse.ucdavis.edu" target="_blank">bill@cse.ucdavis.edu</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Over the last few months I've been hearing quite a few negative comments<br>

about AMD.  Seems like most of them are extrapolating from desktop<br>

performance.<br>

<br>

Keep in mind that it's quite a stretch going from a desktop (single<br>

socket, 2 memory channels) to a server (dual socket, 4x the cores, 8<br>

memory channels).<br>

<br>

Also keep in mind that compilers and kernels can make quite a<br>

difference.  The vector units have changed significantly (a factor of 2)<br>

and the scheduler needs tweaks to account for the various latencies and<br>

NUMA related values.  Using old kernels/compilers may well significantly<br>

impact AMD and/or Intel.<br>

<br>

I've found the bandwidth and latency mostly controlled by the socket and<br>

specifically the number of memory channels.  2, 3, and 4 channel per<br>

socket systems have very similar bandwidth and latency for AMD and Intel<br>

systems.<br>

<br>

When taking a pragmatic approach to best price performance I find AMD<br>

competitive.  Normally I figure out how much ram per CPU is needed, disk<br>

needs, then figure out which Intel chip has the best system price/system<br>

perf on the relevant applications.  Then do similar for AMD.  Then buy<br>

whichever is better.  Often the result is a 15% improvement in one<br>

direction or another (HIGHLY application dependent).<br>

<br>

Of course sometimes a user asks for the "better" system for running a<br>

wide variety of floating point codes.  In such cases I often use CPU2006<br>

FP rate.<br>

<br>

In a recent comparison I compared (both perf numbers from HP systems)<br>

* AMD 6344,      64GB ram, SpecFPRateBase=333 $2,915, $8.75 per spec<br>

* Intel E5-2620, 64GB ram, SpecFPRateBase=322 $2,990, $9.22 per spec<br>

<br>
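As a rough sketch, the dollars-per-SPEC-point figures can be recomputed from the listed<br>
system prices and SpecFPRateBase scores. The function and variable names below are<br>
illustrative only. Computed this way, the Intel system comes out near $9.29 rather than<br>
the $9.22 quoted, so one of the listed inputs may differ slightly from what was used.<br>
<pre>
```python
# Price/performance sketch using only the prices and scores quoted above.
# The helper name and the dictionary layout are hypothetical, for illustration.

def dollars_per_spec_point(system_price, spec_fp_rate_base):
    """Return system price divided by the SPECfp rate base score."""
    return system_price / spec_fp_rate_base


# (system price in USD, SpecFPRateBase) as listed in the comparison
systems = {
    "AMD 6344": (2915, 333),
    "Intel E5-2620": (2990, 322),
}

cost_per_point = {
    name: dollars_per_spec_point(price, score)
    for name, (price, score) in systems.items()
}

for name, cost in cost_per_point.items():
    print(f"{name}: ${cost:.2f} per SPECfp rate point")
```
</pre>
<br>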

Whenever possible I try to use actual applications justifying the<br>

purchase of a cluster.<br>

<br>

When using actual end user applications it's about a 50/50 chance that<br>

AMD or Intel will win.<br>

<br>

I figured I'd add a few comments:<br>

* Latency for a quad socket AMD is around 64ns to a random piece<br>

  of memory (not 600ns as recently mentioned).<br>

* AMD quad sockets with 512GB ram start around $9k ($USA)<br>

* With OpenMP, pthreads, MPI or other parallel friendly code a quad<br>

  socket AMD can look up a random cache line approximately every 2.25ns.<br>

  (64 threads banging on 16 memory channels at once).<br>

* I've seen no problems with the AMD memory system, in general<br>

  the 2k pin/4 memory bus AMD sockets seem to perform similarly<br>

  to Intel.<br>

<br>
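To put the 2.25ns figure in context, here is a back-of-the-envelope sketch. It assumes<br>
64-byte cache lines and that each of the 64 threads has one lookup in flight at a time;<br>
both are assumptions for illustration, not measurements. Under those assumptions the<br>
numbers above imply roughly 28 GB/s of random cache-line traffic, with each thread<br>
completing a lookup about every 144ns.<br>
<pre>
```python
# Back-of-the-envelope arithmetic from the figures quoted above.
# Assumptions (not from the original post): 64-byte cache lines, and one
# outstanding random lookup per thread.

CACHE_LINE_BYTES = 64   # assumed cache line size
THREADS = 64            # threads quoted for the quad socket system
NS_PER_LOOKUP = 2.25    # quoted system-wide time per random cache line


def aggregate_bandwidth_gb_per_s(line_bytes, ns_per_lookup):
    """Bytes per nanosecond is numerically equal to GB/s (10**9 bytes)."""
    return line_bytes / ns_per_lookup


def per_thread_ns_per_lookup(threads, ns_per_lookup):
    """If the system finishes one lookup every ns_per_lookup across all
    threads, each thread finishes one every threads * ns_per_lookup."""
    return threads * ns_per_lookup


random_bw = aggregate_bandwidth_gb_per_s(CACHE_LINE_BYTES, NS_PER_LOOKUP)
per_thread_ns = per_thread_ns_per_lookup(THREADS, NS_PER_LOOKUP)

print(f"random cache-line traffic: {random_bw:.1f} GB/s")
print(f"per-thread time per lookup: {per_thread_ns:.0f} ns")
```
</pre>
<br>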

An example of AMD's bandwidth scaling on a quad socket with 64 cores:<br>

  <a href="http://cse.ucdavis.edu/bill/pstream/bm3-all.png" target="_blank">http://cse.ucdavis.edu/bill/pstream/bm3-all.png</a><br>

<br>

I don't have a similar Intel, but I do have a dual socket e5:<br>

  <a href="http://cse.ucdavis.edu/bill/pstream/e5-2609.png" target="_blank">http://cse.ucdavis.edu/bill/pstream/e5-2609.png</a><br>

<br>

<br>


_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>Dr Stuart Midgley<br><a href="mailto:sdm900@sdm900.com" target="_blank">sdm900@sdm900.com</a>

</div>