[Beowulf] Stroustrup regarding multicore

Robert G. Brown rgb at phy.duke.edu
Tue Aug 26 15:48:04 PDT 2008


On Tue, 26 Aug 2008, Perry E. Metzger wrote:

>
> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>> I think that matrix allocation per se is contiguous because of how and
>> where it occurs at the compiler level.  Immediate matrices allocated
>> like a[10][10] are just translated into displacements relative to the
>> stack pointer, are they not?
>
> It depends. You're not guaranteed per se that they'll be allocated on
> the stack, though in practice they are.

Unless they are really large, I imagine.  I'm not sure what the stack
limits are (I don't code to ever stress them) but I'm sure there are
some.
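
For the curious, on POSIX systems the current limits can be queried
with getrlimit() (the shell's ulimit -s reports the same soft limit, in
kilobytes); a minimal sketch:

   #include <stdio.h>
   #include <sys/resource.h>

   int main(void)
   {
     struct rlimit rl;
     if (getrlimit(RLIMIT_STACK, &rl) == 0)
       /* rlim_cur is the soft limit; RLIM_INFINITY means unlimited */
       printf("stack limits: soft %llu, hard %llu bytes\n",
              (unsigned long long) rl.rlim_cur,
              (unsigned long long) rl.rlim_max);
     return 0;
   }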

>> Global/external matrices at the a[10][10]
>> are prespecified in the data segment of the program,
>
> Depending on the particular OS. Certainly that's the usual model.

What OS's do it some other way?  I'm just curious.  Is this a kernel
feature or a compiler feature?

>> The only place where it might not be true is when one dynamically
>> allocates memory for a **matrix.  There, if you loop through malloc
>> calls, there is no guarantee that they return contiguous blocks
>
> A vector of pointers to vectors is not a matrix. :)
>
> It acts a bit like one, but that's guaranteed NOT to be contiguous
> allocation because malloc will at the very least add the (invisible to
> the programmer) malloc headers etc.

No, a vector of pointers to addresses within a single vector.  To make
a 3x3 matrix **m indexed from -1 to 1 whose actual data live in the
vector *v:

   a) v = (whatever *) malloc(9*sizeof(whatever));
   b) m = (whatever **) malloc(3*sizeof(whatever *)) + 1;
   c) for(i=0;i<3;i++) m[i-1] = &v[i*3 + 1];

(or thereabouts).  m[-1][-1] now dereferences to v[0].  m[0][-1] points
to v[3].  m[1][-1] points to v[6].  m[-1][0] derefs to v[1].  Etc., if
I did the seat-of-the-pants displacement arithmetic correctly.
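
For concreteness, here is the same trick as a complete, compilable
sketch, with double standing in for "whatever" (my choice, purely
illustrative):

   #include <stdio.h>
   #include <stdlib.h>

   int main(void)
   {
     double *v, **m;
     int i, j;

     v = (double *) malloc(9*sizeof(double));        /* a) contiguous data */
     m = (double **) malloc(3*sizeof(double *)) + 1; /* b) shifted row pointers */
     for(i = 0; i < 3; i++) m[i-1] = &v[i*3 + 1];    /* c) rows point into v */

     /* natural -1..1 addressing on both indices */
     for(i = -1; i <= 1; i++)
       for(j = -1; j <= 1; j++)
         m[i][j] = 10.0*i + j;

     printf("m[-1][-1] = %g, v[0] = %g\n", m[-1][-1], v[0]);

     free(v);
     free(m - 1);  /* undo the +1 offset before freeing */
     return 0;
   }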

The point is that, per step a), v is a contiguous block of memory.  If
one puts v, m, and the index limits in a struct, one can make an entire
"matrix" object out of them that actually knows its own limits and
permits limits checking with semi-portable routines, while keeping the
data itself in a single contiguous block.  One can now call e.g. an ODE
solver that requires a vector argument v but write the derivs routine
using m and completely natural addressing.
And if one wishes, obviously one can do more complicated things to
compute the displacements of the addresses packed into m to do e.g.
triangular arrays.
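
Here is one hedged sketch of such a struct-wrapped matrix; the names
(matrix_t, mat_alloc, mat_at) are mine rather than anything standard,
and error checking of the mallocs is elided:

   #include <stdlib.h>

   typedef struct {
     double  *v;              /* contiguous data: what the ODE solver sees */
     double **m;              /* shifted row pointers: m[rlo..rhi][clo..chi] */
     int rlo, rhi, clo, chi;  /* the object knows its own index limits */
   } matrix_t;

   matrix_t *mat_alloc(int rlo, int rhi, int clo, int chi)
   {
     int nr = rhi - rlo + 1, nc = chi - clo + 1, i;
     matrix_t *a = (matrix_t *) malloc(sizeof(*a));
     a->v = (double *) malloc((size_t) nr*nc*sizeof(double));
     a->m = (double **) malloc(nr*sizeof(double *)) - rlo;
     for(i = 0; i < nr; i++) a->m[rlo + i] = a->v + i*nc - clo;
     a->rlo = rlo; a->rhi = rhi; a->clo = clo; a->chi = chi;
     return a;
   }

   double *mat_at(matrix_t *a, int i, int j)  /* semi-portable limits check */
   {
     if (i < a->rlo || i > a->rhi || j < a->clo || j > a->chi) return NULL;
     return &a->m[i][j];
   }

   void mat_free(matrix_t *a)
   {
     free(a->v);
     free(a->m + a->rlo);  /* undo the row-pointer shift */
     free(a);
   }

One hands a->v to the solver that wants a flat vector and uses a->m
inside the derivs routine, exactly as described above.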

Is this a pain to set up?  One is basically doing by hand what the
compiler does for you for rectilinear, 0-indexed arrays, so yes.
However, it is often WORTH the pain: to save space, to produce usable
code, and to be able to create an array in C that can be passed to a
fortran routine, although I think macros are likely more useful for
that last purpose.  In fact, macros are another way to do some (but not
all) of the same things; a sketch follows.
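
As a sketch of the macro route (IDX is my own illustrative name, not
any standard facility):

   #include <stdio.h>
   #include <stdlib.h>

   /* one flat block; the index arithmetic, including the non-0 lower
      bounds ilo/jlo, lives in the macro */
   #define IDX(a, nc, ilo, jlo, i, j)  ((a)[((i)-(ilo))*(nc) + ((j)-(jlo))])

   int main(void)
   {
     double *v = (double *) malloc(9*sizeof(double)); /* 3x3, indices -1..1 */
     IDX(v, 3, -1, -1, 0, 1) = 42.0;  /* same element as m[0][1] above: v[5] */
     printf("%g\n", v[5]);
     free(v);
     return 0;
   }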

Oh, and free(v), free(m-1) (undoing the +1 offset) is all it takes to
free the space.  In contrast to the more usual NR-style malloc loop,
which has to free each row block in a loop (each with, as you note, its
own header:-) and then free the ** pointer.
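
For contrast, a sketch of that loop-of-mallocs style (roughly the shape
of the NR allocators, though not their actual code):

   #include <stdlib.h>

   double **alloc_rows(int n)  /* n row blocks plus one pointer block */
   {
     int i;
     double **m = (double **) malloc(n*sizeof(double *));
     for(i = 0; i < n; i++)
       m[i] = (double *) malloc(n*sizeof(double)); /* rows need not be contiguous */
     return m;
   }

   void free_rows(double **m, int n)
   {
     int i;
     for(i = 0; i < n; i++) free(m[i]); /* one free per row block */
     free(m);                           /* then the pointer block itself */
   }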

I use/have used both, but generally prefer the vector repacking approach
as it is more obviously efficient.  And yeah, there's little point to
this if all you're doing is allocating a "boring" rectilinear array that
starts at 0, or even 1.  I still often do it for 1 because I can't stand
creating shadow im1 = i-1 variables in loops from 1 to N that contain
lots of complicated algebraic stuff.  As I said, angular momentum table
displacements can get pretty ugly, especially when one gets to 3 pairs
of indices.

    rgb

[My largest piece of fortran, lifetime, was writing a program to check a
physicist's algebraic reduction of a multichannel nuclear scattering
cross-section from its "bare" form -- summing over umpty different
channel angular momenta -- into a "cooked" form that used all the
algebraic reductions of angular momentum tensors into more compact
irreducible forms, which was a sum over a much smaller set of input and
output channel spins of a product of a few 9j symbols, a CG coefficient,
and some phases.  Box-full-of-cards, so to speak.  The "hard way" sums
were over many, many Clebsch-Gordan (3j) and 6j coefficients, all of
which I had to evaluate with my own code, this being pre-library days at
least if you were poor.

The later stuff with Gaunt numbers was from structure factors in crystal
band structure computations, where Gaunt numbers (triple integrals of
Ylm's) are themselves products/sums of 3j symbols, so I reused some of
the code, and eventually ported it -- twice -- into C.  Actually, it was
a pretty big fortran program too once upon a time -- lots of bessel
functions and integration and coupled ODE solution.]


-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


