[Beowulf] HPCG benchmark, again

Benson Muite benson_muite at emailplus.org
Sat Mar 19 07:18:10 UTC 2022


For memory bandwidth, single-node tests such as Likwid are helpful 
https://github.com/RRZE-HPC/likwid
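
As a rough illustration of what such a single-node test measures, here is a Python sketch that times a large buffer copy and reports an effective bandwidth. It is not a substitute for likwid-bench or STREAM. It assumes that same-length `bytearray` slice assignment is carried out as a plain memory copy, uses a single thread, and counts both the bytes read and the bytes written.

```python
import time


def copy_bandwidth_gbs(n_bytes: int = 128 * 1024 * 1024, repeats: int = 5) -> float:
    """Best observed copy bandwidth in GB/s, counting bytes read plus bytes written.

    n_bytes should be well above the last-level cache size, otherwise the
    result reflects cache rather than main-memory bandwidth.
    """
    # Fill with non-zero bytes so the pages are really backed by memory
    # (untouched zero pages could otherwise be served from cache).
    src = bytearray(b"\x5a") * n_bytes
    dst = bytearray(n_bytes)   # destination buffer of the same size
    dst[:] = src               # warm-up copy so both buffers are touched once
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src           # same-length slice assignment copies n_bytes
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-9)     # guard against a zero reading on a coarse timer
    return 2 * n_bytes / best / 1e9  # each copy reads and writes n_bytes


if __name__ == "__main__":
    print(f"copy bandwidth ~ {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the best of several repeats, rather than the mean, follows the usual convention of reporting the least-disturbed run.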

MPI communication benchmarks are a good complement to this.
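
Ping-pong results from such benchmarks are often summarized with the simple latency-bandwidth (Hockney) model, t(n) = alpha + n / beta. A least-squares fit over (message size, one-way time) pairs recovers both parameters. The sketch below assumes you already have such pairs from a ping-pong run; the data in the usage lines is synthetic, not a measurement.

```python
from typing import Sequence, Tuple


def fit_hockney(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Fit t = alpha + n / beta by ordinary least squares.

    sizes: message sizes in bytes; times: one-way times in seconds.
    Returns (alpha in seconds, beta in bytes per second).
    """
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx              # seconds per byte, i.e. 1 / beta
    alpha = mean_y - slope * mean_x
    return alpha, 1.0 / slope


# Synthetic data: 2 us latency, 10 GB/s asymptotic bandwidth.
sizes = [2 ** k for k in range(10, 24)]
times = [2e-6 + s / 10e9 for s in sizes]
alpha, beta = fit_hockney(sizes, times)
print(f"latency ~ {alpha * 1e6:.2f} us, bandwidth ~ {beta / 1e9:.2f} GB/s")
```

Real ping-pong curves usually change slope at protocol switch points (eager vs. rendezvous), so fitting small and large messages separately is often more informative than one global fit.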

Full applications do more than the above, but these are easier starting 
points that require less domain-specific application knowledge for 
general performance measurement.

On 3/19/22 3:58 AM, Richard Walsh wrote:
> 
> J,
> 
> Trying to add a bit to the preceding useful answers …
> 
> In my experience running these codes on very large systems for 
> acceptances, to get optimal (HPCG or HPL) performance on GPUs (MI200 or 
> A100) you need to obtain the optimized versions from the vendors, which 
> include scripts with ENV variable tunings specific to their versions 
> and optimal affinity settings to manage the non-simple relationship 
> between the NICs, the GPUs, and CPUs … you have to iterate through the 
> settings to find optimal settings for your system.
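
Iterating through the settings can be scripted. A minimal sketch, assuming a hypothetical `run_benchmark(env)` callable that launches one run with the given environment variables and returns the reported rate; the variable names in the grid are placeholders, not actual vendor tunables. In practice `run_benchmark` might wrap `subprocess.run(cmd, env={**os.environ, **env})` and parse the rate from the output, but that wrapper depends on the vendor build and is left out.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Env = Dict[str, str]


def sweep(grid: Dict[str, List[str]],
          run_benchmark: Callable[[Env], float]) -> Tuple[Env, float, List[Tuple[Env, float]]]:
    """Run every combination in `grid`; return (best env, best score, all results)."""
    keys = sorted(grid)
    results: List[Tuple[Env, float]] = []
    for values in itertools.product(*(grid[k] for k in keys)):
        env = dict(zip(keys, values))          # one candidate setting
        results.append((env, run_benchmark(env)))
    best_env, best_score = max(results, key=lambda r: r[1])
    return best_env, best_score, results


# Placeholder tunables and a fake runner, for illustration only.
grid = {"HYPOTHETICAL_RANKS_PER_NODE": ["4", "8"],
        "HYPOTHETICAL_NIC_BINDING": ["nearest", "roundrobin"]}


def fake_run(env: Env) -> float:
    score = float(env["HYPOTHETICAL_RANKS_PER_NODE"])
    return score + (1.0 if env["HYPOTHETICAL_NIC_BINDING"] == "nearest" else 0.0)


best_env, best_score, _ = sweep(grid, fake_run)
print(best_env, best_score)
```

Keeping every result, not just the best one, makes it easier to see how sensitive the score is to each setting.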
> 
> If you set out to do this on your own, the chances of getting values 
> similar to those posted on the TOP500 website are vanishingly small …
> 
> As already noted, buyers of large HPC systems almost always require 
> large scale runs of both HPCG (to demonstrate peak bandwidth) and HPL 
> (to demonstrate peak processor) performance.
> 
> Cheers!
> 
> rbw
> 
>> On Mar 18, 2022, at 7:35 PM, Massimiliano Fatica <mfatica at gmail.com> 
>> wrote:
>>
>> 
>> HPCG measures memory bandwidth, the FLOPS capability of the chip is 
>> completely irrelevant.
>> Pretty much all the vendor implementations reach very similar 
>> efficiency if you compare them to the available memory bandwidth.
>> There is some effect of the network at scale, but you need to have a 
>> really large system to see it in play.
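
To make the bandwidth argument concrete: if HPCG time is dominated by sparse matrix-vector work, an upper bound on its rate is arithmetic intensity times memory bandwidth, and the chip's peak FLOPS never enters the formula. The sketch below assumes CSR-style storage with 8-byte values and 4-byte column indices and counts only matrix traffic (2 flops per nonzero over 12 bytes, about 0.17 flop/byte). Real HPCG also streams vectors and runs a symmetric Gauss-Seidel smoother, so treat the result as a rough ceiling. The bandwidth and measured rate in the usage lines are placeholders, not data for any real system.

```python
def spmv_intensity(value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Flops per byte of matrix traffic for a CSR-like SpMV (1 multiply + 1 add per nonzero)."""
    return 2.0 / (value_bytes + index_bytes)


def bandwidth_bound_gflops(bandwidth_gbs: float, intensity: float) -> float:
    """Upper bound on GFLOP/s when every matrix byte must come from memory."""
    return bandwidth_gbs * intensity


def bandwidth_efficiency(measured_gflops: float, bandwidth_gbs: float,
                         intensity: float) -> float:
    """Fraction of the bandwidth-bound rate actually achieved."""
    return measured_gflops / bandwidth_bound_gflops(bandwidth_gbs, intensity)


# Placeholder numbers: 1000 GB/s of memory bandwidth, 100 GFLOP/s measured.
ai = spmv_intensity()                      # 2 / 12 flop/byte
bound = bandwidth_bound_gflops(1000.0, ai)
eff = bandwidth_efficiency(100.0, 1000.0, ai)
print(f"bound ~ {bound:.0f} GFLOP/s, efficiency {eff:.0%}")
```

Comparing implementations by this efficiency, rather than by fraction of peak FLOPS, is one way to express the point that vendor codes land close together once normalized by memory bandwidth.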
>>
>> M
>>
>> On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins <bdobbins at gmail.com> 
>> wrote:
>>
>>
>>     Hi Jorg,
>>
>>       We (NCAR - weather/climate applications) tend to find that HPCG
>>     more closely tracks the performance we see from hardware than
>>     Linpack, so it definitely is of interest and watched, but our
>>     procurements tend to use actual code that vendors run as part of
>>     the process, so we don't 'just' use published HPCG numbers.
>>     Still, I'd say it's very much a useful number.
>>
>>       As one example, while I haven't seen HPCG numbers for the MI250X
>>     accelerators, Prof. Matsuoka of RIKEN tweeted back in November that
>>     he anticipated it would score around 0.4% of peak on HPCG, vs 2% on
>>     the NVIDIA A100 (while the A64FX they use hits an impressive 3%):
>>     https://twitter.com/ProfMatsuoka/status/1458159517590384640
>>
>>       Why is that relevant?  Well, /on paper/, the MI250X has ~96 TF
>>     FP64 w/ Matrix operations, vs 19.5 TF on the A100.  So, ~5x in
>>     theory, but Prof. Matsuoka anticipated a ~5x differential in HPCG
>>     efficiency in the A100's favor, /erasing/ that advantage.  Now,
>>     surely /someone/ has HPCG numbers on the MI250X, but I've not yet
>>     seen any.  Would love to know what they are.  But absent that
>>     information I tend to bet Matsuoka isn't far off the mark.
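
Multiplying the quoted peak figures by the anticipated HPCG fractions shows why the paper advantage disappears. The percentages are the anticipated values quoted above, not measurements, and the peak figures are the ones given in the message.

```python
def hpcg_tflops(peak_tflops: float, fraction_of_peak: float) -> float:
    """Effective HPCG rate from a peak FP64 rate and an HPCG fraction of peak."""
    return peak_tflops * fraction_of_peak


peak_ratio = 96.0 / 19.5            # paper advantage of the MI250X, ~4.9x
mi250x = hpcg_tflops(96.0, 0.004)   # ~96 TF peak, 0.4% anticipated on HPCG
a100 = hpcg_tflops(19.5, 0.02)      # 19.5 TF peak, 2% anticipated on HPCG
print(f"peak ratio {peak_ratio:.1f}x; HPCG: MI250X ~ {mi250x:.3f} TF, "
      f"A100 ~ {a100:.3f} TF, ratio {mi250x / a100:.2f}")
```

Both products come out at roughly 0.4 TF per device, so a ~5x peak advantage combined with a ~5x efficiency deficit leaves the two roughly level.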
>>
>>       Ultimately, it may help to know more about what kind of
>>     applications you run - for memory-bound CFD-like codes, HPCG tends
>>     to be pretty representative.
>>
>>       Maybe it's time to update the saying that 'numbers never lie' to
>>     something more accurate - 'numbers never lie, but they also rarely
>>     tell the whole story'.
>>
>>       Cheers,
>>       - Brian
>>
>>
>>     On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
>>     <sassy-work at sassy.formativ.net> wrote:
>>
>>         Dear all,
>>
>>         Further to the emails back in 2020 about the HPCG benchmark
>>         test: as we are in the process of getting a new cluster, I was
>>         wondering if anybody else has used that test in the meantime to
>>         benchmark the performance of their cluster.
>>         From what I can see, the latest HPCG version is 3.1 from
>>         August 2019. I have also noticed that their website has a link
>>         to download a version which supports the latest A100 GPUs from
>>         NVIDIA.
>>         https://www.hpcg-benchmark.org/software/view.html?id=280
>>
>>         What I was wondering is: has anybody else apart from Prentice
>>         tried that test and is it somehow useful, or does it just give
>>         you another set of numbers?
>>
>>         Our new cluster will not be in the same league as the
>>         supercomputers, but we would like to have at least some kind of
>>         handle so we can compare the various offers from vendors. My
>>         hunch is the benchmark will somehow (strongly?) depend on how it
>>         is tuned. As my former colleague used to say: I am looking for
>>         some war stories (not very apt to say these days!).
>>
>>         Either way, I hope you are all well given the strange new world
>>         we are living in right now.
>>
>>         All the best from a spring-like dark London
>>
>>         Jörg
>>
>>
>>
>>         _______________________________________________
>>         Beowulf mailing list, Beowulf at beowulf.org
>>         <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>>         To change your subscription (digest mode or unsubscribe) visit
>>         https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>         <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>>


