[Beowulf] Java vs C++ for interfacing to parallel library

Robert G. Brown rgb at phy.duke.edu
Mon Aug 21 20:54:40 PDT 2006


On Mon, 21 Aug 2006, Joe Landman wrote:

> I took a simple GSL program I used to introduce students to GSL, that
> was a modified example from one of the GSL example files.  Basically a
> little Hooke's law bit to use as input to an LU solver.  Really short
> GSL program.

Joe,

Since you clearly have time on your hands and a mission (a GOOD thing,
mind you:-) you might try SWIG (link on the previous post).  It looked
to be relatively easy to use, and MIGHT manage heavy lifting.  If you
really want to try to hook the GSL into perl, that is (an interesting
proposition, actually, but I think a pretty major project and an
associated long-term maintenance problem unless you want to port it and
forget it).

> What I discovered is that it doesn't take much to make this not work.
> :(  Specifically passing arrays and vectors back and forth between
> C/Perl is hard (IMO).

Well, remember that in the GSL arrays and vectors and so on are
themselves often, nay usually, structs and not just flat memory blocks
packed into standard **...pointer sets.  I actually worked through them
all once upon a time when trying to develop an extension of the GSL
"matrix" type called "tensor" -- IIRC a matrix in the GSL is a block
(itself a structured type) with additional metadata, and is something of
an opaque data type where you have to (or rather, given that it is C,
are "supposed to") access components via get/set functions from the
OUTside.  It wasn't terribly extensible to a generalized **..***tensor
type of even moderate dimension, so I redid a lot of the basic pieces to
flatten them out and make them a bit less objectoid and end up with a
gsl-ish tensor type up to the 8th or 9th rank.  Alas, the change would
have broken too many functions (I guess) and was not warmly embraced, so
the GSL still doesn't have a proper tensor beyond second rank.
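
To make the "block plus metadata plus get/set" idea concrete, here is a
minimal sketch in plain C.  The names Block, Matrix, matrix_alloc,
matrix_get, matrix_set and matrix_free are hypothetical stand-ins
invented for this illustration; they are NOT the GSL's own definitions
(the real accessors are gsl_matrix_get() and gsl_matrix_set(), and the
real struct layout should be read from the GSL headers).

```c
#include <stdlib.h>

/* Hypothetical stand-in types, not the GSL's: a flat block of doubles
   plus the metadata needed to index it as a two-dimensional matrix. */
typedef struct {
  size_t size;     /* number of doubles in the block */
  double *data;    /* the flat storage itself */
} Block;

typedef struct {
  size_t rows;
  size_t cols;
  size_t stride;   /* doubles per stored row (equal to cols here) */
  Block *block;    /* the storage this matrix indexes */
} Matrix;

/* Allocate a zero-filled rows x cols matrix, or return NULL on failure. */
Matrix *matrix_alloc(size_t rows, size_t cols)
{
  Matrix *m = malloc(sizeof *m);
  if (m == NULL) return NULL;
  m->block = malloc(sizeof *m->block);
  if (m->block == NULL) { free(m); return NULL; }
  m->block->data = calloc(rows * cols, sizeof(double));
  if (m->block->data == NULL) { free(m->block); free(m); return NULL; }
  m->block->size = rows * cols;
  m->rows = rows;
  m->cols = cols;
  m->stride = cols;
  return m;
}

/* Callers are "supposed to" go through these instead of touching data. */
double matrix_get(const Matrix *m, size_t i, size_t j)
{
  return m->block->data[i * m->stride + j];
}

void matrix_set(Matrix *m, size_t i, size_t j, double x)
{
  m->block->data[i * m->stride + j] = x;
}

void matrix_free(Matrix *m)
{
  if (m != NULL) { free(m->block->data); free(m->block); free(m); }
}
```

A wrapper generator only has to expose the four functions if the caller
treats Matrix as opaque; the trouble starts when the other language
wants to see the doubles themselves.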

The real question about SWIG and friends is where one can draw the line.
Any decent library has an API and its own data structs and macros and
prototypes and so on.  Most of these types are available to the end user
only via #include files -- the library itself is simply relocatable code
designed to pull those objects out of the right places when routines
within it are called.  An encapsulation program has to be ALMOST as
smart as a compiler/linker, then -- it has to be able to parse out
arbitrary data types from C source and map them not-too-badly into perl
compatible memory constructs.  This is NOT necessarily easy, since I
might well have something like:

  typedef struct {
     int xdim;
     int ydim;
     int zdim;
     double ***tensor;
  } MyTensor;

and might want to encapsulate:

  MyTensor *newtensor(int xdim, int ydim, int zdim)
  {
    int i,j;
    MyTensor *tensor;

    tensor = (MyTensor *)malloc(sizeof(MyTensor));

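    /* record all three dimensions so callers can bound their loops */
    tensor->xdim = xdim;
    tensor->ydim = ydim;
    tensor->zdim = zdim;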
    tensor->tensor = (double ***)malloc(xdim*sizeof(double **));
    for(i=0;i<xdim;i++){
      tensor->tensor[i] = (double **)malloc(ydim*sizeof(double *));
      for(j=0;j<ydim;j++){
        tensor->tensor[i][j] = (double *)malloc(zdim*sizeof(double));
      }
    }
    return(tensor);
  }

(with no error checking or initialization yet, of course).  This is an
obviously useful construct, although I would generally allocate all the
actual memory for the tensor in a single block and do displacement
arithmetic to pack the pointers into tensor->tensor[i][j] instead of the
last malloc to ensure a contiguous block and retain the ability to pass
the entire array as a (void *) pointer to a block of data.
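
A sketch of that single-block variant might look like the following.
The function names newtensor_contiguous and freetensor_contiguous are
made up for this example, and it goes one step further than the text
above by also packing the second-level pointer table into one
allocation; treat it as one possible layout under those assumptions,
not the only one.

```c
#include <stdlib.h>

typedef struct {
  int xdim;
  int ydim;
  int zdim;
  double ***tensor;
} MyTensor;

/* Hypothetical variant of newtensor(): every double lives in ONE calloc'd
   block, and the pointer tables only index into it.  Requires all three
   dimensions to be at least 1. */
MyTensor *newtensor_contiguous(int xdim, int ydim, int zdim)
{
  size_t i, j, nx, ny, nz;
  double **rows;   /* xdim*ydim pointers, one per z-vector */
  double *data;    /* xdim*ydim*zdim doubles, contiguous and zeroed */
  MyTensor *t;

  if (xdim < 1 || ydim < 1 || zdim < 1) return NULL;
  nx = (size_t)xdim; ny = (size_t)ydim; nz = (size_t)zdim;

  t = malloc(sizeof *t);
  if (t == NULL) return NULL;
  t->tensor = malloc(nx * sizeof *t->tensor);
  rows = malloc(nx * ny * sizeof *rows);
  data = calloc(nx * ny * nz, sizeof *data);
  if (t->tensor == NULL || rows == NULL || data == NULL) {
    free(data); free(rows); free(t->tensor); free(t);
    return NULL;
  }

  t->xdim = xdim; t->ydim = ydim; t->zdim = zdim;
  for (i = 0; i < nx; i++) {
    t->tensor[i] = rows + i * ny;                  /* slice of pointer table */
    for (j = 0; j < ny; j++)
      t->tensor[i][j] = data + (i * ny + j) * nz;  /* displacement into data */
  }
  return t;
}

void freetensor_contiguous(MyTensor *t)
{
  if (t == NULL) return;
  free(t->tensor[0][0]);  /* the data block */
  free(t->tensor[0]);     /* the second-level pointer table */
  free(t->tensor);        /* the first-level pointer table */
  free(t);
}
```

With this layout t->tensor[0][0] is the address of the whole data block,
so it can be handed to anything that expects a flat (void *) buffer of
xdim*ydim*zdim doubles, while t->tensor[i][j][k] still works as before.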

What is SWIG/perl supposed to do with this?  What would perl do if I
allocated a VECTOR of MyTensor **tensors so that dereferencing a
particular element would look like tensor[i]->tensor[j][k][l] with loops
that ran from 0 to tensor[i]->xdim etc.?  If this isn't bad enough, what
would perl do with a linked list (another common enough construct),
especially a SPECIALIZED linked list whose members contained entire data
structs?

Could be it would do exactly the right thing, but even guessing what
that thing might be requires a bit of knowledge about perl's internals.
A really good encapsulation has to be as smart as most compilers, OR the
encapsulating programmer has to be even smarter.  This doesn't mean that
it cannot be done and that some people aren't that smart.  It's just
that one has to REALLY want to do it to make it worthwhile.

>
> Since I don't do this on a regular basis, this isn't so bad.  Also there
> is this PDL thing
> (http://search.cpan.org/~csoe/PDL-2.4.3/Basic/Pod/Impatient.pod) which
> doesn't look so bad, but it still doesn't solve the issues I want solved.
>
>>> Only when you have some ... odd ... structures or objects passing back
>>> and forth which require a bit more work.
>>
>> What's an odd structure?
>
> As I have discovered ... odd structures are arrays ... and anything more
> complex :(
>
> [...]
>
>>> Python has similar facilities.  Generally speaking the dynamic
>>> languages (Perl, Python, Ruby) are pretty easy to wrap around things
>>> and link with other stuff, as long as the API/data structures are pretty
>>> clean.
>>
>> Ay, that's the rub...;-) That and what you consider "pretty easy"...:-)
>
> Less time than I spent on this so far !

<A_C_love_story>

O-ohh say, can you C?

One of the (many) things that I truly love about C is that one retains
precise control over how memory is used in a well-written C program.
Nothing opaque about it, really.  I can close my eyes and visualize just
how the block of memory representing MyTensor above looks, for example.
I know just what the offsets are (in absolute void * terms) from the
starting address of a MyTensor object to any of its contents, and CAN
arrange for a MyTensor object and all its contents to definitely be
allocated as a single contiguous block of memory to avoid
performance-sapping fragmentation without "hoping" that the compiler and
kernel will provide it for me.
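
The offsets need not be taken on faith, either.  As a small
illustration, the standard offsetof macro from <stddef.h> reports them
directly; print_mytensor_layout is a hypothetical helper name, and the
numbers it prints depend on the platform's int and pointer sizes and on
any padding the compiler inserts for alignment.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
  int xdim;
  int ydim;
  int zdim;
  double ***tensor;
} MyTensor;

/* Print the byte offset of each member from the start of a MyTensor.
   The values are platform dependent, so none are hard-coded here. */
void print_mytensor_layout(void)
{
  printf("xdim   at offset %zu\n", offsetof(MyTensor, xdim));
  printf("ydim   at offset %zu\n", offsetof(MyTensor, ydim));
  printf("zdim   at offset %zu\n", offsetof(MyTensor, zdim));
  printf("tensor at offset %zu\n", offsetof(MyTensor, tensor));
  printf("sizeof(MyTensor) = %zu\n", sizeof(MyTensor));
}
```

Calling print_mytensor_layout() from main() shows exactly the picture
described above for whatever machine it is compiled on.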

With that degree of control I can then do very specific things with loop
blocking relative to cache size or I can create data objects that are
like nothing any fortran programmer ever dreamed of but that perfectly
describe the actual optimal data objects of the task at hand.  Using
void types I can move anonymous blocks of memory around as I need to and
defeat the well-meant but sometimes stultifying "rules" associated with
manipulating ordinary typed variables.
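
As one small example of loop blocking, here is a sketch of a tiled
matrix transpose over flat row-major arrays.  The function name
transpose_blocked and the tile-size parameter bs are assumptions made
for this illustration; the right tile size depends on the cache being
targeted and has to be found by measurement.

```c
#include <stddef.h>

/* Transpose the n x n row-major matrix src into dst, working on
   bs x bs tiles so that each tile of src and dst can stay in cache
   while it is being touched.  src and dst must not overlap. */
void transpose_blocked(const double *src, double *dst, size_t n, size_t bs)
{
  size_t ib, jb, i, j, imax, jmax;

  if (bs == 0) bs = 1;                 /* guard against a zero step */
  for (ib = 0; ib < n; ib += bs) {
    imax = (ib + bs < n) ? ib + bs : n;      /* clip the last row tile */
    for (jb = 0; jb < n; jb += bs) {
      jmax = (jb + bs < n) ? jb + bs : n;    /* clip the last column tile */
      for (i = ib; i < imax; i++)
        for (j = jb; j < jmax; j++)
          dst[j * n + i] = src[i * n + j];
    }
  }
}
```

The result is the same as the naive two-loop transpose; only the order
in which memory is visited changes, which is the whole point.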

Finally, I can (given all of this information and control) CHOOSE to
treat the resulting object as an opaque/protected data type and only
create, destroy, or access contents of the object via provided
functions, I can CHOOSE to create structs with members that are structs,
I can CHOOSE to closely emulate C++ programming style or even Fortran
programming style (if I want to treat the C compiler as if it had
recently had a lobotomy:-).  This isn't intended to disrespect those
other compilers -- they both have virtues that make them desirable
to at least certain kinds of programmers for certain kinds of programs.
They just don't generally give you the same degree of control, at least
if you use them the way God intended and don't defeat their strong
typing and error checking.

With assembler I would have -- slightly -- more precise control.  But
not much.  Not much.

Yet C is undeniably a structured language with a full complement of
upper-level amenities.  I've coded in C almost exclusively (as far as
compilers go) since roughly the mid-80's -- about 20 years now -- and I
STILL am learning new, sometimes amazing things about the language or
tricks one can use to write ever more elegant and maintainable code.  C
is scary powerful (and scary dangerous in the sense that you can perhaps
get yourself in more incomprehensible trouble with C than in many
languages if you program with anything less than iron discipline).

Perhaps the danger excites me.  Perhaps mastering the tool in spite of
the danger does -- I rarely indeed find myself in that kind of
difficulty because instead I chose the discipline.

To me one of the great miracles of life is that somebody could actually
write a few nontrivial programs in C and not also fall in love with it.
Oh, it still isn't a "universal" language -- there is no doubt that
certain tasks are better suited for perl (or python, if you are a lover of
the snake:-) or PHP or even bash.  For compiled general-purpose code,
though, it reigns supreme, and I see little advantage in Fortran EXCEPT
perhaps better compiler optimization as a payoff for working within its
restricted traditional data types.

So for me, porting C routines to other environments is ALMOST pointless.
Oh, I can see where octave (for example) is useful in precisely the same
sense that mopeds are useful -- an adequate way to get around for
somebody who hasn't far to go and doesn't want to learn to drive, say,
an F16.  Good even for F16 pilots to go to the store, if the store is
close by and doesn't have a big landing strip.  The metaphor gets
strained, of course, rather quickly.  But the point still stands.  C
rocks.

<\A_C_love_story>

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




