[Beowulf] Stroustrup regarding multicore

Tue Aug 26 13:33:11 PDT 2008

On Tue, 26 Aug 2008, Perry E. Metzger wrote:

>> Perhaps, but don't most C programmers allocate such an array as a single
>> vector and then repack the indices?
>
> I've never seen anyone allocate "as a single vector and repack the
> indices", though I'm sure that a counterexample exists in someone's
> code out there somewhere. In any case, one has no need to do such a
> thing.

I find that hard to believe, in both cases.  I and plenty of others do
it all the time, and of course there is need.  Just (perhaps) not in
your code.  Although maybe I shouldn't have said "most C programmers" --
I meant "in the context of C programmers who are writing fortran-like
numerical code for e.g. physics".  In systems code or text processing
code I could easily believe that it is quite rare.

In physics it is not.

As a single example, if one is writing physics code summing matrices
with angular momentum indices, the natural range is l = 0,1,2,3...; m =
-l,-l+1,-l+2...0,...l-2,l-1,l.  The array is triangular and the indices
don't start at 0.  I have plenty of code in force that sums expressions
involving the Gaunt numbers, which tabulate with three PAIRS of angular
momentum indices that each run over this sort of range, often
asymmetrically (since the angular momentum indices must triangle and
because there are symmetry relations, one only needs to allocate enough
space so l1_max + l2_max = l3_max).  Clebsch-Gordan coefficients, 6j
symbols, and 9j symbols have similar indexing issues (as bad or worse).

One CAN, of course, solve this problem several ways in C.

    a) Allocate a vector large enough to exactly hold the whole thing and
use complicated pointer arithmetic to compute the displacements of each
element (incorporated into a macro or whatever).
    b) Allocate a matrix[lmax][2*lmax+1] and waste roughly half the space
(or worse, for e.g. Gaunt numbers) and use a bit LESS complicated
pointer arithmetic to deal with the m displacement.  This is the Fortran
way, as a general rule, although to save space in the old days I used to
use method a) in Fortran as well.  UGLY code, very difficult to debug.
Nowadays space on an angular momentum index scale is largely a "who
cares" sort of thing, but it was not always so and my first efforts in C
were on machines with 640K of user space total (with much less than that
available for actual data) under DOS, where I really truly cared.
    c) malloc a vector large enough to exactly hold the whole thing,
allocate a **matrix and malloc a set of addresses of length lmax+1 into
it, then fill it with the shifted displacements of the m-indexed
subvectors, per l.  A tiny bit slower, perhaps (although not usually a
whole lot slower) but totally easy to read.

In the first case one accesses any element with a truly horrible
expression (I once upon a time or three have coded it, because I solved
sets of coupled ODEs with angular momentum indices for years and ODE
solvers invariably want vector arguments, not **pointers).  In the
second case one access a somewhat ugly expression and wastes a lot of
space.  In the third case one can pass the ODE solver the vector it so
passionately wants but write the derivs routine in a perfectly literal
translation of the algebraic formula, summing l and m over their natural
ranges in all nested loops.

Then there is the general problem of dynamic allocation of arrays, which
I thought was nearly always done according to either c) a loop of
mallocs, but malloc is expensive and there IS no guarantee that
successive mallocs in a program will return contiguous blocks so if you
want to ensure that a dynamic matrix is a single block of continuous
memory internally you have to allocate it all at once and put the
appropriate addresses in a suitable **pointer.

And finally there is the convolution of this problem with structs,
especially dynamically allocated structs with unusual index ranges.

Sure, these tricks aren't needed in all C programs -- plenty of them can
do fine with just:

   double x[10],y[10][10],z[10]

and C programmers quickly grow used to messing with the fact that maybe
half of all loops they might want to program "naturally" run from 1 to N
and not 0 to N-1.  But a lot of numerical C programmers simply write a
matrix allocate subprogram (not unlike the one(s) in Numerical Recipes)
that lets them dynamically allocate *x,**y,***z and run the loop indices
from whatever to whatever for rectilinear allocation, or (in my case)
write a somewhat more complex allocator for non-rectilinear allocation.

If one has to do a lot of algebra to convert an complicated algebraic
expression from its natural form in the literature to a displaced-by-one
(or displaced by l+1) form, it's a lot simpler and more reliable to do
this than it is to write an expression in code that CANNOT be compared
to the expression being encoded in the literature without a half-page of
algebra rederiving its C-indexed form.

> (This is not to say that when one calls malloc, if you're calling
> malloc to allocate an array, that you don't pass it a single size_t
> indicating what you're looking for, but that's a different issue.)

How?  Am I missing something?  That's just what I said.  A common
practice (nearly universal in dynamically allocated arrays that you want
to be in contiguous memory) is to malloc a single size_t.  SOME
programmers then access the array using a macro or displacement
arithmetic, but one can just transform it into a directly addressable
array, and in the process free oneself from the indices-start-at-0 and
things-must-be-rectilinear rules.

     rgb

>
> Perry
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977