[Beowulf] AMD performance (was 500GB systems)
reuti at staff.uni-marburg.de
Fri Jan 11 05:59:26 PST 2013
On 11.01.2013 at 14:22, Vincent Diepeveen wrote:
> On Jan 11, 2013, at 6:03 AM, Bill Broadley wrote:
>> Over the last few months I've been hearing quite a few negative
>> about AMD. Seems like most of them are extrapolating from desktop
>> Keep in mind that it's quite a stretch going from a desktop (single
>> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
>> memory channels).
> Bill, a 2-socket system doesn't deliver 512GB of RAM.
Maybe I'm getting this wrong, but I was checking these machines recently:
IBM's x3550 M4 goes up to 768 GB with 2 CPUs: http://public.dhe.ibm.com/common/ssi/ecm/en/xsd03131usen/XSD03131USEN.PDF
IBM's x3950 X5 goes up to 3 TB with their MAX-5 extension using 4 CPUs, so I assume 1.5 TB with 2 CPUs could work too: http://public.dhe.ibm.com/common/ssi/ecm/en/xsd03054usen/XSD03054USEN.PDF
> Your comparison in the 2-socket domain doesn't make sense for someone who
> needs 512GB of RAM;
> the performance of 4-socket systems is totally different from 2-socket ones.
>> I figured I'd add a few comments:
>> * Latency for a quad socket AMD is around 64ns to a random piece
>> of memory (not 600ns as recently mentioned).
> I wrote a test program for this in 2003.
> You obviously have no idea what TLB-thrashing accesses look like in
> the hundreds-of-gigabytes range.
> There are no cheap systems on the planet where you can fetch a few
> random bytes in 64 ns
> from a random spot in 500GB of RAM, from a memory line you
> haven't opened before and
> which is certainly not in the cache yet. You will be looking at
> 400+ ns latencies in the best case.
> You won't get it faster on any affordable platform (of
> course 512GB of SRAM would be faster,
> but let's not get into theoretical discussions here, since you can't
> afford 512GB of SRAM).
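For anyone who wants to measure this kind of number themselves, a minimal pointer-chasing sketch in C follows. It is not Vincent's 2003 test program; it is a generic technique, and the helper names (`make_random_cycle`, `chase`, `ns_per_load`) are hypothetical. It assumes a POSIX system (for `clock_gettime`) and that you size the array far beyond the last-level cache. The permutation is built with Sattolo's algorithm so the walk is one long random cycle, and every load depends on the previous one, so the time per hop approximates the latency of a single random access, TLB misses included.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64: tiny PRNG so the permutation is reproducible and not
   limited by RAND_MAX. Not cryptographic; adequate for a benchmark. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical helper: build next[] as ONE random cycle through all n
   slots (Sattolo's algorithm), so a walk from any slot visits every slot
   before repeating and the prefetcher sees no pattern.
   Returns NULL if n < 2 or on allocation failure; caller frees. */
static size_t *make_random_cycle(size_t n, uint64_t seed) {
    if (n < 2) return NULL;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL; /* must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {
        /* j in [0, i-1]; the slight modulo bias does not matter here */
        size_t j = (size_t)(xorshift64(&state) % i);
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Follow the chain for `steps` loads. Each address comes from the
   previous load, so the loads cannot overlap: time per step is roughly
   the latency of one dependent random access. */
static size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t p = start;
    for (size_t s = 0; s < steps; s++) p = next[p];
    return p;
}

/* Average nanoseconds per dependent load over `steps` hops from slot 0. */
static double ns_per_load(const size_t *next, size_t steps) {
    assert(steps > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile size_t sink = chase(next, 0, steps); /* keep the result live */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

From your own `main`, something like `make_random_cycle((size_t)1 << 30, 1)` (8 GiB of `size_t` on a 64-bit system) followed by `ns_per_load(next, 100000000)` gives a single-threaded, dependent-load latency. It will not reproduce either poster's figures exactly; the result depends on page size, NUMA placement and how many other cores are busy.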
>> * AMD quad sockets with 512GB RAM start around $9k (USD)
> You can easily build one with new components from eBay for $2k. Then
> add the price of 512GB of RAM to that.
> Bought new from a shop, the AMD stuff is dirt cheap as well, since a single core
> of the new Bulldozer line isn't fast, of course;
> fully assembled, ready-to-run offers are around the $6k
> mark, excluding the 512GB of RAM of course.
> Yet it has better latency to a 512 GB block of RAM than Intel's
> 4-socket systems.
> And that latency will still be many hundreds of nanoseconds, of course.
>> * With OpenMP, pthreads, MPI or other parallel-friendly code, a
>> quad-socket AMD can look up a random cache line approximately every 2.25ns.
>> (64 threads banging on 16 memory channels at once).
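The 2.25ns figure is an aggregate throughput, not a per-access latency. As a rough check via Little's law, assuming each of the 64 threads keeps exactly one miss in flight and that a cache line is 64 bytes (both assumptions, not stated in the thread):

```latex
\begin{align*}
\text{throughput} = \frac{\text{misses in flight}}{\text{latency}}
  \;&\Rightarrow\;
  \text{latency} \approx 64 \times 2.25\,\text{ns} = 144\,\text{ns} \\
\text{bandwidth} &\approx \frac{64\,\text{B}}{2.25\,\text{ns}} \approx 28.4\,\text{GB/s}
\end{align*}
```

Under those assumptions each individual access under full load takes a bit more than twice the 64ns unloaded figure; if each thread keeps more misses in flight, the implied per-access latency is proportionally higher. So the low unloaded latency and a much higher loaded latency are not contradictory.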
> You still don't get the picture of TLB-thrashing software, huh?
> It reads from a random memory location every time. Only at the end of
> the calculation does the search space converge a bit,
> but it's still random.
> A measurement I have from a somewhat older 8-socket Intel box here is
> 700 ns for similar TLB-thrashing behaviour.
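To put numbers on the TLB side, assume x86-64 with 4-level paging and a 512 GiB working set. The number of pages that random accesses can land on is:

```latex
\begin{align*}
\frac{512\,\text{GiB}}{4\,\text{KiB}} &= \frac{2^{39}}{2^{12}} = 2^{27} = 134{,}217{,}728 \text{ pages} \\
\frac{512\,\text{GiB}}{2\,\text{MiB}} &= \frac{2^{39}}{2^{21}} = 2^{18} = 262{,}144 \text{ pages} \\
\frac{512\,\text{GiB}}{1\,\text{GiB}} &= \frac{2^{39}}{2^{30}} = 2^{9} = 512 \text{ pages}
\end{align*}
```

A TLB holds on the order of a thousand entries (the exact count varies by CPU and is an assumption here), so with 4 KiB or even 2 MiB pages nearly every random access misses the TLB. Each miss triggers a page walk of up to four page-table levels, and those page-table reads can themselves miss the cache, which is the extra cost on top of the DRAM access that "TLB thrashing" refers to.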
>> * I've seen no problems with the AMD memory system; in general
>> the 2k-pin, 4-memory-bus AMD sockets seem to perform similarly
>> to Intel.
> For random accesses on a single socket or 2 sockets there are huge
> differences (all cores busy).
> Intel single socket: around 90 ns for my benchmark; Bulldozer
> single socket: around 150-170 ns (8 cores busy).
> You really have no idea what 'random' reads are.
>> An example of AMD's bandwidth scaling on a quad socket with 64 cores:
>> I don't have a similar Intel, but I do have a dual socket e5:
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> To change your subscription (digest mode or unsubscribe) visit