[Beowulf] bring back 2012?

Joe Landman landman at scalableinformatics.com
Wed Aug 17 09:05:12 PDT 2016

On 08/17/2016 11:50 AM, Kilian Cavalotti wrote:
> On Wed, Aug 17, 2016 at 7:10 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>> When Intel first started marketing the Xeon Phi, they emphasized that you
>> wouldn't need to rewrite your code to use the Xeon Phi. This was a marketing
>> move to differentiate the Xeon Phi from the NVIDIA CUDA processors. That
>> may have been a true statement, but it didn't mention anything about
>> performance of that existing code, and was, frankly, very misleading. The
>> truth is, if you don't rewrite your code, you're not going to see much
>> (relatively speaking) of a performance improvement, and when you do rewrite
>> your code to optimize it for the Xeon Phi, you'll also see amazing speed ups
>> on regular Xeon processors.

Well, this is generally true of every "new" architecture.  You don't 
*need* to rewrite your code.  It will run.  Just not as well as if you 
took the time to optimize it carefully.

>> I've seen several presentations where speed-ups of 5x, 10x, etc., were
>> achieved on regular Xeons just through optimizing the code to be more
>> thread- and vector-friendly. Some improvements were so significant, they
>> make you ask if the Xeon Phi was even needed. [...]

Been setting up a Phi environment for a customer on one of our machines. 
It's non-trivial to get it to a workable/realistic state.  This, plus the 
effort required to get good performance out of this system (12 GB of RAM 
is not large for many codes), means that this is more of an experimental 
platform for them.

>> If you pay attention to Intel's marketing and the industry news from the
>> past couple of years, you will have noticed that Intel has been promoting
>> "code modernization" efforts, saying all codes need to be modernized to
>> take advantage of newer processors. While that is certainly true, "code
>> modernization" is just a euphemism for "rewrite your code". This is Intel
>> backpedaling on their earlier statements that you don't need to rewrite your
>> code to take advantage of a Xeon Phi, without actually admitting it.
> Can't agree more, very well described.

Generally, algorithm shifts, better coding, and better code/memory layout 
will buy you quite a bit.  You will lose quite a bit if you go crazy 
with pointer-chasing code, deep object factories, and other performance 
(anti)patterns.

About a decade ago, I hacked on HMMer and rewrote one of its core 
expensive routines in a simpler manner, for an immediate 2x improvement 
in 30 lines of C or so.  It did require rethinking a few aspects of 
the code, but generally, keeping performance in mind when coding 
requires thinking about how the calculation will flow, and what will 
impede it.

For massively parallel compute engines like Phi/KNL, memory bandwidth 
(keeping the cores fed) will be a problem.  So you code assuming that 
bandwidth is the resource you need to manage most carefully.

And at a higher level, industry/technology changes over time, which 
renders code that runs quickly on one platform slow on some future 
platform, as the platform design tradeoffs are different.

I hate to say "back in the day", but on the first major computing project 
I worked on in the (mumble)1980s(mumble), RAM was the expensive resource 
and disk and CPU were relatively "free".  So while working on our large 
simulation code (Fortran, of course), we spilled our matrix to disk, as 
we didn't have enough RAM to keep all of it in memory.  It made our 
matrix multiply ... interesting ... but it worked.  These days, spilling 
to disk (or swap) is a really bad idea ... just build an MPI version 
that can keep the whole thing in RAM with enough nodes, or 
partition/shard your data so that you can do parallel IO to/from a big, 
fast parallel storage platform.

It's all the same issue (optimizing to use as little of the expensive 
resource as you can), but the expensive resource keeps changing. 
The trick Intel and the other folks are trying to pull off is to make 
initial use of the platform as easy as possible, so your code "just 
works".  But if you really want to take advantage of the platform, you 
need to understand its expensive and "free" aspects, in order to 
exploit the free and manage the expensive.

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
