Parallelization check list

Robert G. Brown rgb at phy.duke.edu
Thu Jun 29 12:09:39 PDT 2000


On Thu, 29 Jun 2000, Gallip William C PHDN wrote:

> We are starting to put together a check-list of things to consider when
> taking a serial code and determining its parallelizability, either for a
> distributed memory and/or SMP environment.  Has anyone already gone through
> a similar exercise and if so could you share the results?  Any help would be
> greatly appreciated.
> 
> Bill Gallip
> Fleet Advanced Supercomputing Technology Center
> NAVSEA Dam Neck
> Virginia Beach, VA

Not exactly a checklist, but look at the talks on

   http://www.phy.duke.edu/brahma

The one presented at last year's Extreme Linux Tutorial at Linux Expo is
largely a case study of this very thing.  The other talks (and the draft
book) explain the underlying math, I hope fairly clearly.  I'm working
on a paper that will describe the application of various measurement
tools to the process, to make it at least approximately quantitative
instead of qualitative.

However, be aware that your checklist will at best be an approximate
answer or guideline, not a deterministic protocol.  That is because the
parallelizability depends on the algorithm, and the algorithms used in the
serial code may NOT be efficiently parallelizable.  Parallelizability
also depends (deeply) on the design of the parallel environment you're
talking about, which can vary considerably, the size of the problem, and
more.  The only way to fully answer the question for at least some
problems is to do some deep, fully informed research.  There is a
reference in the draft book manuscript to a URL for an online book on
parallel algorithms and program design and there are a number of
well-known books that one can buy via a bookseller as well.

The best checklist would likely be one that could be used to select
projects that are definitely parallelizable (and possibly match them
with appropriate parallel architectures) but NOT reject ones as
unparallelizable except at the functional level (the task itself simply
has no usefully parallelizable component and fails the basic Amdahl's
Law type test before you even start).  Instead, refer marginal rejects to
a deeper learning and research process.  Sometimes a complete rewrite of
a program with totally different algorithms will make it run very
efficiently at certain (desirable) scales on certain parallel
architectures that you can sometimes afford.  It's like that...;-).
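The "basic Amdahl's Law type test" mentioned above can be sketched in a
few lines of Python.  This is an illustrative sketch of my own, not code
from the talks or the draft book; the threshold and function names are
assumptions.

```python
# Amdahl's Law screening sketch (illustrative; names and the 2x
# threshold are my own assumptions, not from the original post).

def amdahl_speedup(serial_fraction, nodes):
    """Best-case speedup on `nodes` processors when `serial_fraction`
    of the runtime cannot be parallelized at all."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

def passes_amdahl_test(serial_fraction, nodes, min_speedup=2.0):
    """Reject only tasks whose BEST-CASE speedup is clearly not useful;
    everything else goes to the deeper research process."""
    return amdahl_speedup(serial_fraction, nodes) >= min_speedup

# A task that is 95% parallelizable has a useful bound on 16 nodes...
print(amdahl_speedup(0.05, 16))      # roughly 9.1

# ...but a task that is 90% serial fails before you even start.
print(passes_amdahl_test(0.90, 16))  # False
```

Note that this test can only reject at the functional level, as the
paragraph above says: passing it does not guarantee an efficient
parallelization, since communication costs are not modeled at all.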

Also remain aware (as the talk indicates) that scaling is everything
when determining parallelization.  Applying a given program to a "small"
task (e.g. a small lattice) may yield little parallel speedup or even a
speedup below one (it can easily run more slowly!). Applying the SAME PROGRAM
to a "large" task (e.g. a big lattice) may yield a big speedup.  You
have to understand speedup scaling relations with respect to at least
two dimensions (number of nodes and "size" of problem) in interaction
with your hardware layout and its fundamental topology, rates, latencies
and bandwidths (which can vary considerably) to determine whether or not
a program could be parallelized.  It's not generally a binary decision,
and the decision space is generally a lot bigger than even two
dimensional and very complex.
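A toy model can make the lattice example above concrete.  The model
below is my own assumption (constants and the surface/volume
communication term are illustrative, not measured on any real cluster):
computation scales with lattice volume and divides across nodes, while
boundary communication scales with the face area each node must
exchange, so the SAME PROGRAM slows down on a small lattice but speeds
up on a large one.

```python
# Toy two-dimensional scaling model (illustrative assumption, not from
# the original post): time on p nodes for a 3-d lattice of side L.

def parallel_time(L, p, t_comp=1e-7, t_comm=1e-5):
    # Computation scales with lattice volume L^3, split across p nodes.
    compute = t_comp * L**3 / p
    # Boundary exchange scales with the face area L^2 each node owns
    # (p^(2/3) domains per face in a 3-d decomposition); no cost on 1 node.
    communicate = t_comm * L**2 / p**(2 / 3) if p > 1 else 0.0
    return compute + communicate

def speedup(L, p):
    return parallel_time(L, 1) / parallel_time(L, p)

# Small lattice: communication dominates, 8 nodes run SLOWER than one.
print(speedup(10, 8))    # less than 1

# Large lattice, same program: the volume/surface ratio wins.
print(speedup(200, 8))   # greater than 1
```

The crossover point depends on the constants, which is exactly the
point of the paragraph above: the hardware's rates, latencies and
bandwidths set t_comp and t_comm, and the decision changes with both L
and p.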

Still, good luck.

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu

More information about the Beowulf mailing list