[Beowulf] [External] Re: HPCG benchmark, again

Massimiliano Fatica mfatica at gmail.com
Tue Mar 22 00:42:06 UTC 2022


No, HPCG is all memory bandwidth.
You can see in this old presentation that GPUs with basically no double
precision perform on par with others that have 10x the double-precision
performance.

http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf

There were more examples during recent HPCG BOFs (I can't find the PDFs
online, but I can send them to you if you want).
For example, if you look at the specs of a K80 (2x GK210, 1.4 TF DP, 384-bit
memory bus at 5 GHz) and an M40 (GM200, 0.2 TF DP, 384-bit memory bus at
6 GHz), you may think that the K80 will be much faster. It is exactly the
opposite, and the results scale perfectly with memory bandwidth.

*1 x K80 (2 GK210 GPUs), ECC enabled, clk=875*
2x1x1 process grid
256x256x256 local domain
SpMV = 49.1 GF ( 309.1 GB/s Effective) 24.5 GF_per ( 154.6 GB/s Effective)
SymGS = 62.2 GF ( 480.2 GB/s Effective) 31.1 GF_per ( 240.1 GB/s Effective)
total = 58.7 GF ( 445.3 GB/s Effective) 29.4 GF_per ( 222.7 GB/s Effective)
final = 55.1 GF ( 417.5 GB/s Effective) 27.5 GF_per ( 208.8 GB/s Effective)

*2 x M40 (2 GM200 GPUs), ECC enabled, clk=1114*
2x1x1 process grid
256x256x256 local domain
SpMV = 69.4 GF ( 437.2 GB/s Effective) 34.7 GF_per ( 218.6 GB/s Effective)
SymGS = 83.7 GF ( 645.7 GB/s Effective) 41.8 GF_per ( 322.8 GB/s Effective)
total = 79.6 GF ( 603.7 GB/s Effective) 39.8 GF_per ( 301.9 GB/s Effective)
final = 74.2 GF ( 562.7 GB/s Effective) 37.1 GF_per ( 281.4 GB/s Effective)
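
The scaling is easy to check from the numbers above: divide the effective
bandwidth by the GF rate and you get the bytes moved per flop, which comes
out essentially identical on both parts. A minimal Python sketch, using only
the figures printed above:

# implied memory traffic per flop = effective GB/s divided by GF/s
results = [("K80 SpMV", 49.1, 309.1), ("M40 SpMV", 69.4, 437.2),
           ("K80 final", 55.1, 417.5), ("M40 final", 74.2, 562.7)]
for name, gf, gbs in results:
    print(name, round(gbs / gf, 2), "bytes/flop")
# SpMV ~6.3 bytes/flop and final ~7.6 bytes/flop on both GPUs:
# the GF numbers are just the memory bandwidth in disguise.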

Regarding Linpack: on CPU systems the trailing matrix update is slow, so you
can hide all the network traffic behind it with look-ahead if you have a
decent network (most CPU-only systems on the list are not real HPC systems,
just OEMs stuffing the list with cloud systems that have very poor networks).
On accelerated systems (for example GPU), the network becomes really critical.
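
To illustrate what I mean by look-ahead, here is a toy Python sketch
(made-up function names, nothing like the real HPL code): kick off the next
panel broadcast, then do the trailing update; if the update takes long
enough, the communication is completely hidden.

import threading, time

def broadcast_next_panel():
    time.sleep(0.1)      # stand-in for the network traffic

def update_trailing_matrix():
    time.sleep(1.0)      # stand-in for the slow trailing update

# look-ahead: start the communication, then compute; by the time the
# update is done, the panel has long since arrived.
t = threading.Thread(target=broadcast_next_panel)
t.start()
update_trailing_matrix()
t.join()

On an accelerated system the update finishes much sooner, so there is little
left to hide behind and the network shows up directly in the result.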

Now, memory bw is the real limitation in most HPC workloads, so if I had to
select a system, I would care more about memory bw than HPL.
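
If you want a quick sanity check on vendor offers, you can estimate the HPCG
"final" number from a measured STREAM figure using the ~7.6 bytes/flop ratio
from the runs above (a hypothetical helper; real results depend on the
implementation and problem size):

def estimate_hpcg_gflops(stream_gb_per_s, bytes_per_flop=7.6):
    # HPCG final GF/s is roughly sustained memory traffic / bytes per flop
    return stream_gb_per_s / bytes_per_flop

print(estimate_hpcg_gflops(400))   # ~53 GF/s for ~400 GB/s of bandwidth,
                                   # in the ballpark of the K80 run above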

M


On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf <
beowulf at beowulf.org> wrote:

> M,
>
> Isn't it more accurate to say that HPCG measures the whole system more
> realistically, and memory bandwidth happens to be the "rate-limiting step"
> in just about all architectures? Even with LINPACK, which should be
> CPU-bound, the Top500 list shows that HPL results are affected by the
> network. For example, there's this article which is a bit old, but I think
> still applies (doing the same analysis on the current top500 list is on my
> to-do list, actually):
>
>
> https://www.nextplatform.com/2015/07/20/ethernet-will-have-to-work-harder-to-win-hpc/
>
> On 3/18/22 8:34 PM, Massimiliano Fatica wrote:
>
> HPCG measures memory bandwidth; the FLOPS capability of the chip is
> completely irrelevant.
> Pretty much all the vendor implementations reach very similar efficiency
> if you compare them to the available memory bandwidth.
> There is some effect of the network at scale, but you need a really
> large system to see it come into play.
>
> M
>
> On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins <bdobbins at gmail.com> wrote:
>
>>
>> Hi Jorg,
>>
>>   We (NCAR - weather/climate applications) tend to find that HPCG more
>> closely tracks the performance we see from hardware than Linpack, so it
>> is definitely of interest and watched, but our procurements tend to use
>> actual code that vendors run as part of the process, so we don't 'just' use
>> published HPCG numbers.  Still, I'd say it's very much a useful number.
>>
>>   As one example, while I haven't seen HPCG numbers for the MI250X
>> accelerators, Prof. Matsuoka of RIKEN tweeted back in November that he
>> anticipated it would score around 0.4% of peak on HPCG, vs. 2% on the NVIDIA
>> A100 (while the A64FX they use hits an impressive 3%):
>> https://twitter.com/ProfMatsuoka/status/1458159517590384640
>>
>>   Why is that relevant?  Well, *on paper*, the MI250X has ~96 TF FP64 w/
>> Matrix operations, vs. 19.5 TF on the A100.  So, ~5x in theory, but Prof.
>> Matsuoka anticipated a ~5x lower HPCG efficiency, *erasing* that
>> advantage.  Now, surely *someone* has HPCG numbers on the MI250X, but
>> I've not yet seen any.  I would love to know what they are.  But absent that
>> information, I tend to bet Matsuoka isn't far off the mark.
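>>
>>   A quick sanity check, just multiplying the figures above (the
>> percentages are Matsuoka's estimates, not measurements):
>>
>> peak_fp64_tf = {"MI250X": 96.0, "A100": 19.5}
>> hpcg_fraction = {"MI250X": 0.004, "A100": 0.02}
>> for gpu, peak in peak_fp64_tf.items():
>>     print(gpu, round(peak * hpcg_fraction[gpu], 2), "TF expected on HPCG")
>> # MI250X ~0.38 TF vs. A100 ~0.39 TF -- the on-paper 5x advantage is gone.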
>>
>>   Ultimately, it may help to know more about what kind of applications
>> you run - for memory-bound CFD-like codes, HPCG tends to be pretty
>> representative.
>>
>>   Maybe it's time to update the saying that 'numbers never lie' to
>> something more accurate - 'numbers never lie, but they also rarely tell the
>> whole story'.
>>
>>   Cheers,
>>   - Brian
>>
>>
>> On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen <
>> sassy-work at sassy.formativ.net> wrote:
>>
>>> Dear all,
>>>
>>> Further to the emails back in 2020 around the HPCG benchmark test: as we
>>> are in the process of getting a new cluster, I was wondering whether
>>> somebody else has in the meantime used that test to benchmark the
>>> performance of their cluster.
>>> From what I can see, the latest HPCG version is 3.1 from August 2019. I
>>> have also noticed that their website has a link to download a version
>>> which supports the latest A100 GPUs from NVIDIA.
>>> https://www.hpcg-benchmark.org/software/view.html?id=280
>>>
>>> What I was wondering is: has anybody else apart from Prentice tried that
>>> test and is it somehow useful, or does it just give you another set of
>>> numbers?
>>>
>>> Our new cluster will not be in the same league as the supercomputers, but
>>> we would like to have at least some kind of handle so we can compare the
>>> various offers from vendors. My hunch is that the benchmark will somehow
>>> (strongly?) depend on how it is tuned. As my former colleague used to say:
>>> I am looking for some war stories (not very apt to say these days!).
>>>
>>> Either way, I hope you are all well given the strange new world we are
>>> living in right now.
>>>
>>> All the best from a spring-like, dark London
>>>
>>> Jörg
>>>
>>>
>>>