[Beowulf] likwid vs stream (after HPCG discussion)

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Mon Mar 21 17:39:04 UTC 2022

Dear all,

reading through this made me realise how difficult things are these days.
If you have an HPC cluster for just a few applications, you will probably find
it easier to build the software and replace the hardware, as you really could
go down to the assembler level and have highly optimised code and hardware
for exactly these few jobs.
If, as we do, you have to support anything from a single-core job to, say, a
few hundred cores, MPI and threaded jobs, memory-bandwidth-intensive to heavy
IO ones, all of that goes out the window. I reckon it will be very difficult
to then find the one 'right' benchmark for you, as your applications vary so
much. So the trick is probably to find the sweet spot for your cluster, which
might be a different setup from that of other sites.

As always, thanks for sharing your thoughts. 

All the best from a sunny and mild London


On Monday, 21 March 2022, 09:46:31 GMT, Mikhail Kuzminsky wrote:
> In message from Scott Atchley <e.scott.atchley at gmail.com> (Sun, 20 Mar
> 2022 14:52:10 -0400):
> > On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky <kus at free.net> wrote:
> >> If so, it turns out that for the HPC user, stream gives a more
> >> important estimate - the application is translated by the compiler
> >> (they do not write in assembler - except for modules from mathematical
> >> libraries), and stream will give a real estimate of what will be
> >> received in the application.
> >
> > When vendors advertise STREAM results, they compile the application with
> > non-temporal loads and stores. This means that all memory accesses bypass
> > the processor's caches. If your application of interest does a random walk
> > through memory and there is neither temporal nor spatial locality, then
> > using non-temporal loads and stores makes sense and STREAM irrelevant.
> STREAM was not originally designed for random access to memory. In that
> case, memory latencies are important, and it makes more sense to get a
> bandwidth estimate in the mega-sweep
> (https://github.com/UK-MAC/mega-stream).
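
To put rough numbers on the non-temporal store point quoted above, here is a
minimal sketch in Python. It only does the byte accounting and does not
measure anything. The per-iteration word counts (2 for copy and scale, 3 for
add and triad) are the ones STREAM itself uses when it reports bandwidth. The
write-allocate model (one extra read of the destination per iteration when
ordinary stores are used) is a simplifying assumption, and the function names,
array length and timing are hypothetical.

```python
# Words moved per loop iteration, as counted by the STREAM benchmark itself.
STREAM_WORDS = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def counted_bytes(kernel, n, word_bytes=8):
    """Bytes STREAM credits to one pass of `kernel` over arrays of n elements."""
    return STREAM_WORDS[kernel] * n * word_bytes


def actual_bytes(kernel, n, word_bytes=8, write_allocate=True):
    """Bytes crossing the memory bus under a simple write-allocate model.

    Assumption: with ordinary stores the destination cache line is read
    before it is overwritten, which adds one extra word per iteration.
    Non-temporal (streaming) stores avoid that extra read.
    """
    extra = 1 if write_allocate else 0
    return (STREAM_WORDS[kernel] + extra) * n * word_bytes


def reported_bandwidth_gbs(kernel, n, seconds, word_bytes=8):
    """Bandwidth in GB/s (1e9 bytes/s) as STREAM would report it."""
    return counted_bytes(kernel, n, word_bytes) / seconds / 1e9


if __name__ == "__main__":
    n = 80_000_000  # hypothetical array length (elements per array)
    t = 0.02        # hypothetical time for one triad pass, in seconds
    bw = reported_bandwidth_gbs("triad", n, t)
    ratio = counted_bytes("triad", n) / actual_bytes("triad", n)
    print(f"reported triad bandwidth: {bw:.1f} GB/s")
    print(f"counted / actual traffic with write-allocate: {ratio:.2f}")
```

Under this model a triad built with ordinary stores moves four words per
iteration while STREAM credits three, so the reported figure is about 75% of
the real memory traffic. That is one reason a vendor-published STREAM number
and a locally compiled one can differ on the same hardware.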
