[Beowulf] Theoretical peak performance of DGX A100
carlos.bederian at unc.edu.ar
Thu Jun 3 12:55:36 UTC 2021
The Top500 has been listing wrong Rpeak values for most clusters for many
years now, so I wouldn't dwell on it...
Take a Skylake-based cluster like Frontera. Its listed Rpeak is 38,745.9
TFLOPS = 8008 nodes * 56 cores * 32 ops/cycle * 2.7GHz.
But 2.7GHz is the regular (non-AVX) base frequency, and to do 32 ops/cycle
you need to use AVX-512. All-core AVX-512 frequencies for a Xeon 8280 are
1.8GHz base and 2.4GHz turbo, so the real peak is roughly 11-33% below the
listed Rpeak.
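
To make the arithmetic concrete, here is a rough sketch that recomputes
Frontera's peak at the three clocks mentioned above. The node count, core
count, ops/cycle and frequencies are the ones quoted in this message; the
function and variable names are just illustrative.

```python
# Frontera figures as quoted in this message; names are illustrative only.
NODES = 8008
CORES_PER_NODE = 56
FLOPS_PER_CYCLE = 32  # FP64 ops per core per cycle, which requires AVX-512


def rpeak_tflops(clock_ghz):
    """Theoretical peak in TFLOPS at the given all-core clock (GHz)."""
    gflops = NODES * CORES_PER_NODE * FLOPS_PER_CYCLE * clock_ghz
    return gflops / 1000.0


listed = rpeak_tflops(2.7)  # the clock behind the Top500 figure

for label, ghz in [("nominal base", 2.7),
                   ("AVX-512 turbo", 2.4),
                   ("AVX-512 base", 1.8)]:
    peak = rpeak_tflops(ghz)
    shortfall = 100.0 * (1.0 - peak / listed)
    print(f"{label:14s} {ghz:.1f} GHz {peak:9.1f} TFLOPS "
          f"({shortfall:.1f}% below listed)")
```

The 2.7GHz row reproduces the listed 38,745.9 TFLOPS; the other two rows
bracket what the machine could do if every core ran AVX-512 code flat out.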
On Thu, Jun 3, 2021 at 9:22 AM harsh_google lastname <
harshscience777 at gmail.com> wrote:
> But that would bring the theoretical performance to 160 TFLOPS per box,
> which also doesn't match!
> On Thu, Jun 3, 2021, 5:50 PM Carlos Bederián <carlos.bederian at unc.edu.ar>
> wrote:
>> A100 does 19.5 FP64 TFLOPS using tensor cores.
>> On Thu, Jun 3, 2021 at 9:08 AM harsh_google lastname <
>> harshscience777 at gmail.com> wrote:
>>> I am calculating the theoretical peak (FP64) performance of the Nvidia
>>> DGX A100 system.
>>> Now, the A100 datasheet lists FP64 performance as 9.7 TFLOPS.
>>> Two AMD 7742 CPUs will give 128 cores x 2.25 GHz base clock x 16 FP64
>>> ops / cycle = 4.6 TFLOPS.
>>> With 8 GPUs per box, this gives a total of 8 x 9.7 + 4.6 = 82.2 TFLOPS
>>> per DGX-A100.
>>> Here is my problem. For any system with DGX A100 on top500.org, the numbers
>>> just don't add up. For example, Selene has 560 DGX boxes, but its theoretical
>>> peak is listed as 79.2 PFLOPS, whereas I expect it should be 46 PFLOPS (i.e.
>>> 82.2 TFLOPS x 560). The same is true for any other DGX-based system listed
>>> on top500. What am I missing here?
>>> Harsh Hemani
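
For reference, the per-box and Selene arithmetic quoted in this thread can be
sketched as follows. The per-GPU and CPU figures are the ones given above;
the count of 8 GPUs per DGX A100 is not spelled out in the thread and is
assumed here, and all names are illustrative.

```python
# Figures quoted in the thread; 8 GPUs per DGX A100 is an assumption here.
GPUS_PER_BOX = 8
A100_FP64_TFLOPS = 9.7          # datasheet FP64 without tensor cores
A100_FP64_TENSOR_TFLOPS = 19.5  # FP64 using tensor cores
CPU_CORES = 128                 # two AMD 7742 CPUs
CPU_GHZ = 2.25                  # base clock
CPU_FLOPS_PER_CYCLE = 16        # FP64 ops per core per cycle
SELENE_BOXES = 560
SELENE_LISTED_PFLOPS = 79.2

cpu_tflops = CPU_CORES * CPU_GHZ * CPU_FLOPS_PER_CYCLE / 1000.0


def box_tflops(per_gpu_tflops):
    """Peak FP64 TFLOPS of one box: all GPUs plus both CPUs."""
    return GPUS_PER_BOX * per_gpu_tflops + cpu_tflops


for label, per_gpu in [("plain FP64", A100_FP64_TFLOPS),
                       ("tensor-core FP64", A100_FP64_TENSOR_TFLOPS)]:
    box = box_tflops(per_gpu)
    system_pflops = box * SELENE_BOXES / 1000.0
    print(f"{label:17s} {box:6.1f} TFLOPS/box "
          f"{system_pflops:5.1f} PFLOPS for {SELENE_BOXES} boxes "
          f"(listed: {SELENE_LISTED_PFLOPS})")
```

Neither variant lands on the listed 79.2 PFLOPS, which is the mismatch
being discussed.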