[Beowulf] AMD and AVX512

Mon Jun 21 17:11:37 UTC 2021

Dear all

> System architecture has been a problem ... making a processing unit
> 10-100x as fast as its support components means you have to code with
> that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily
> generate optimal code for the system ( but I swear ... -O3 ... it says
> it on the package!)

From a computational chemist's perspective I agree. In an ideal world, you want
to get the right hardware for the program you want to use. Some codes run
entirely in memory, others use disc space for offloading files.
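
To put rough numbers on the quoted point about a processing unit being much
faster than its support components, here is a minimal roofline-style sketch in
Python. The peak rate, the memory bandwidth and the 30 flop/byte figure are
made-up placeholders, not measurements of any real system, and the function
name is hypothetical.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline estimate: performance is capped either by the compute peak
    or by how fast memory can feed the cores, whichever is lower."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical node (assumed values): 2000 GFLOP/s peak, 200 GB/s memory.
PEAK, BW = 2000.0, 200.0

# A streaming update such as a[i] = b[i] + s*c[i] in double precision does
# 2 flops for every 24 bytes moved (three 8-byte values).
triad = attainable_gflops(PEAK, BW, 2 / 24)

# A well-blocked dense matrix multiply reuses data from cache; 30 flop/byte
# is an assumed figure for illustration only.
dgemm = attainable_gflops(PEAK, BW, 30.0)

print(f"streaming kernel: {triad:.1f} GFLOP/s ({100 * triad / PEAK:.1f}% of peak)")
print(f"blocked matmul:   {dgemm:.1f} GFLOP/s ({100 * dgemm / PEAK:.1f}% of peak)")
```

Under these assumptions the streaming kernel is limited by memory to a small
fraction of peak no matter which compiler flags are used, while the blocked
kernel can reach the compute limit.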

This is, in my humble opinion, also the big problem CPUs are facing. They are
built to tackle all possible scenarios, from simple integer to floating point,
from in-memory to disc I/O. In some respects it would have been better to stick
with a separate math unit which could then be selected according to the
workload you want to run on that server. I guess this is where GPUs are
trying to fit in, or maybe ARM.

I also agree with the compiler "problem". If you push some compilers too hard,
the code runs very fast but the results are simply wrong. Again, in an ideal
world we would have a compiler for the given hardware that is also tuned to
the job you want to run.
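
As a small illustration of one way this can happen: floating-point addition is
not associative, so an optimizer that is allowed to reorder a sum (for example
GCC's -ffast-math enables this kind of reassociation) can change the answer.
The sketch below only mimics the reordering by hand in Python with made-up
values; it is not taken from any particular chemistry code, and other
mechanisms can also produce wrong results.

```python
import math

# Made-up values chosen so that the rounding effect is easy to see.
values = [1.0e16, 1.0, -1.0e16]

# Evaluated strictly left to right, as the source is written:
# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost.
in_order = (values[0] + values[1]) + values[2]

# A mathematically equivalent reordering that an optimizer may choose
# when it is allowed to treat floating-point addition as associative.
reordered = (values[0] + values[2]) + values[1]

# Correctly rounded reference sum.
exact = math.fsum(values)

print("as written:", in_order)
print("reordered: ", reordered)
print("reference: ", exact)
```

Here the reordered sum happens to be the accurate one; with other data the
reordering is what loses the digits, which is why the effect is hard to spot.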

The question here is not whether that is possible, but rather: how much does
it cost? From what I understand, some big server farms are actually not using
commodity HPC stuff but are designing what they need themselves.

Maybe the whole climate problem will finally push HPC towards more bespoke
systems where the components are fit for the job in question, say weather
modeling for example, simply because that would be more energy efficient and
faster.
Before somebody comes along with: but but but it costs! Think about how much
money is being spent simply to kill people, or on other wasteful projects like
Brexit etc.

My 2 shillings for what it is worth! :D

Jörg

On Monday, 21 June 2021, 14:46:30 BST, Joe Landman wrote:
> On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> > I have followed this thinking "square peg, round hole."
> > You have got it again, Joe. Compilers are your problem.
> 
> Erp ... did I mess up again?
> 
> System architecture has been a problem ... making a processing unit
> 10-100x as fast as its support components means you have to code with
> that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily
> generate optimal code for the system ( but I swear ... -O3 ... it says
> it on the package!)
> 
> Way back at Scalable, our secret sauce was largely increasing IO
> bandwidth and lowering IO latency while coupling computing more tightly
> to this massive IO/network pipe set, combined with intelligence in the
> kernel on how to better use the resources.  It was simply a better
> architecture.  We used the same CPUs.  We simply exploited the design
> better.
> 
> The end result was that codes with off-CPU work (storage, networking,
> etc.) could push our systems far harder than our competitors' systems.
> And you didn't have to use a different ISA to get these benefits.  No
> recompilation needed, though we did show the folks who were interested,
> how to get even better performance.
> 
> Architecture matters, as does implementation of that architecture. 
> There are costs to every decision within an architecture.  For AVX512,
> along comes lots of other baggage associated with downclocking, etc. 
> You have to do a cost-benefit analysis on whether or not it is worth
> paying for that baggage, with the benefits you get from doing so.  Some
> folks have made that decision towards AVX512, and have been enjoying the
> benefits of doing so (e.g. willing to pay the costs).  For the general
> audience, these costs represent a (significant) hurdle one must overcome.
> 
> Here's where awesome compiler support would help.  FWIW, gcc isn't that
> great a compiler.  It's not performance minded for HPC. It's a reasonable
> general purpose standards compliant (for some subset of standards)
> compilation system.  LLVM is IMO a better compiler system, and its
> clang/flang are developing nicely, albeit still not really HPC focused. 
> Then you have variants built on that.  Like the Cray compiler, Nvidia
> compiler and AMD compiler. These are HPC focused, and actually do quite
> well with some codes (though the AMD version lags the Cray and Nvidia
> compilers). You've got the Intel compiler, which would be a good general
> compiler if it wasn't more of a marketing vehicle for Intel processors
> and their features (hey you got an AMD chip?  you will take the slowest
> code path even if you support the features needed for the high
> performance code path).
> 
> Maybe, someday, we'll get a great HPC compiler for C/Fortran.
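
On the AVX512 cost-benefit point in the quoted message, a back-of-envelope
sketch in Python may help. It assumes a simple Amdahl-style model in which a
fraction of the runtime gains from wider vectors while the whole run pays a
reduced clock. The 2x width gain and the 0.85 clock ratio are illustrative
assumptions, not measured values for any CPU, and both function names are
hypothetical.

```python
def avx512_speedup(vector_fraction, width_gain=2.0, clock_ratio=0.85):
    """Estimated speedup over the baseline build.

    vector_fraction: share of baseline runtime that benefits from wider vectors
    width_gain:      assumed speedup of that share at equal clock
    clock_ratio:     assumed downclocked frequency divided by baseline frequency
    """
    scaled_time = (1.0 - vector_fraction) + vector_fraction / width_gain
    return clock_ratio / scaled_time


def break_even_fraction(width_gain=2.0, clock_ratio=0.85):
    """Vectorizable share at which the gain exactly offsets the downclock."""
    return (1.0 - clock_ratio) / (1.0 - 1.0 / width_gain)


for f in (0.1, 0.3, 0.6, 0.9):
    print(f"{f:.0%} vectorizable -> {avx512_speedup(f):.2f}x")
print(f"break-even at about {break_even_fraction():.0%} vectorizable")
```

In this model, a code with less than the break-even share of vectorizable work
ends up slower, which is the baggage being weighed in the quoted paragraph.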