[Beowulf] Java vs C++ for interfacing to parallel library

Mon Aug 21 21:59:04 PDT 2006

On Mon, 21 Aug 2006, Ed Hill wrote:

> I want to be able to write codes that can make use of the vast existing
> C and C++ libraries for, say, I/O or computational geometry or "systems"
> type programming while simultaneously using existing Fortran routines
> for building and integrating big systems of equations derived from PDEs,
> ODEs, etc.  And, if at all possible, I'd like to do it without having to
> manually flatten and then re-assemble every single bit of information to
> some sort of least common denominator pointer-to-array-of syntax when I
> pass between the two.
>
> But maybe that's too much to hope for...

Probably.  Remember, the whole MANTRA of the object oriented programmers
has always been that the data description of the problem is paramount,
not the code per se.  To be able to write truly efficient libraries
without tuning them to the way a given compiler manages its memory is a
fond dream, but is it really possible?  Even within ONE language,
performance can vary with things like just where and how a data object
is allocated, and there may be many ways to remove the pelt of a feline.

To use one of my favorite examples (from real life), ODE solvers almost
invariably want to be given vectors -- a vector of derivatives, a
vector of initial values and results (and maybe a Jacobian or other
stuff in a matrix form, maybe various other parameters).  Yet in many
PROBLEMS, the actual ODEs are themselves laid out in an array, or even a
high dimensional tensor!  Even with a single language, one is often
stuck doing horrible things dereferencing components of a
multidimensional matrix out of the underlying vector memory block,
presuming that one took care to allocate the actual memory for the
matrix AS a contiguous block.  And the matrix has to be rectilinear at
that or things get even worse, and even the rectilinear array often has
to be referenced with offsets -- inserting Y_{lm}'s into an array for
example (with those pesky negative indices).

The result is often extremely ugly code, since even within a SINGLE
language one is often forced to write code that is some sort of mangling
of a pointer/vector to and from an array form, all because the basic tool
(ODE solver) requires a vector, period, but the data itself is not in
vector form.

Within C this has at least one extremely elegant solution.  One can
build a struct or array with precisely the shape of the actual data,
non-rectilinear, arbitrary initial offset, with ELEMENTS that are even
themselves structs.  One can allocate the entire data object as a single
contiguous block of memory (a vector) and do the oddball addressing
arithmetic ONCE, packing pointers to the right offsets in the vector
into a suitable ***pointer.  One can then freely pass an ODE solver the
address of the vector, but write the derivative evaluator so that it
uses the packed pointer and hence dereferences the array for the purposes
of doing algebra-based arithmetic in COMPLETELY NATURAL notation.
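
A minimal sketch of that idea in C, for the Y_lm case with 0 <= l <= lmax
and -l <= m <= l.  The names ylm_t, ylm_alloc and ylm_free are made up for
illustration and come from no library.  A two-index object needs only a
double pointer; more indices would add pointer levels.  It assumes rows
are stored in order of l, so row l starts at offset l*l in the vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container (names are illustrative, not from any library). */
typedef struct {
    int lmax;     /* largest l stored */
    size_t n;     /* total element count, (lmax+1)^2 */
    double *vec;  /* the single contiguous block a solver would be given */
    double **y;   /* row table: y[l][m] for 0 <= l <= lmax, -l <= m <= l */
} ylm_t;

/* Allocate the block and do the offset arithmetic once.
   Returns 0 on success, -1 if memory could not be allocated. */
int ylm_alloc(ylm_t *a, int lmax)
{
    assert(a != NULL && lmax >= 0);
    a->lmax = lmax;
    a->n = (size_t)(lmax + 1) * (size_t)(lmax + 1);
    a->vec = calloc(a->n, sizeof *a->vec);
    a->y = malloc((size_t)(lmax + 1) * sizeof *a->y);
    if (a->vec == NULL || a->y == NULL) {
        free(a->vec);
        free(a->y);
        a->vec = NULL;
        a->y = NULL;
        return -1;
    }
    /* Row l holds 2l+1 entries starting at offset l*l.  Point y[l] at the
       middle of the row (m = 0) so that negative m indexes backwards. */
    for (int l = 0; l <= lmax; l++)
        a->y[l] = a->vec + (size_t)l * (size_t)l + (size_t)l;
    return 0;
}

/* Release both the row table and the data block. */
void ylm_free(ylm_t *a)
{
    free(a->y);
    free(a->vec);
    a->y = NULL;
    a->vec = NULL;
}
```

After ylm_alloc(&a, lmax), the algebra can be written as a.y[l][m] while
a.vec (with a.n elements) is what gets handed to the vector-hungry solver.
If the solver passes the evaluator its own work vectors, a row table has
to be built over those in the same way.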

In at least the fortran I learned in my youth, this has no solution but
the ugly one.  Matrices start at index 1, period, and are rectilinear --
the hell with what your data actually does.  Can't check bounds or
optimize if you break the rule, so instead you have to write ugly and
illegible code when your data (as not infrequently is the case) does not
oblige.  Seriously illegible. Anybody who has looked at the massive
block of legacy fortran knows that this kind of thing IS almost routine
-- fortran programmers have gotten really good at visually interpreting
MYVECTOR(I*(JMAX-1) + J) or the like (sometimes in three or four
dimensions, sometimes with IMIN and JMIN offsets to permit negative I,J
indices) over and over in their code.  Then the bounds are perfectly good,
but only at the expense of code readability, long term maintainability,
elegance of the implementation, and so much more.
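
For comparison, a sketch of the hand-rolled index arithmetic described
above, written in C rather than fortran.  The bounds IMIN, IMAX, JMIN,
JMAX and the helper flat_index are hypothetical, and the layout is C-style
row-major with a zero-based vector; fortran itself stores arrays
column-major, so real legacy code would order the terms differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical logical bounds, chosen only for illustration. */
enum { IMIN = -2, IMAX = 2, JMIN = -3, JMAX = 3 };
enum { NI = IMAX - IMIN + 1, NJ = JMAX - JMIN + 1 };

/* Offset of logical element (i, j) in a flat vector of NI*NJ entries.
   The IMIN/JMIN shifts are what permit negative i and j. */
size_t flat_index(int i, int j)
{
    assert(i >= IMIN && i <= IMAX);
    assert(j >= JMIN && j <= JMAX);
    return (size_t)(i - IMIN) * NJ + (size_t)(j - JMIN);
}
```

Every access then reads myvector[flat_index(i, j)], or the arithmetic
gets written out inline at each use, which is where the readability goes.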

Within C++ you CAN implement the C solution, but only if you've learned
to use pointers, and in C++ pointer use is anathema (from what I
understand from what my students tell me was covered in classes that
taught C++ back before java was invented:-) at least if you didn't learn
C first.  In C pointers take a long time to learn to use correctly; in
C++ courses they often openly discourage you from using them at all for
anything but trivial purposes, e.g. argument passing in subroutines and
the like.  I don't know how a C++ programmer would solve this problem
and pack a Y_lm -indexed object efficiently so that it could both be
passed to a vector-hungry ODE solver and still be addressable as y[l][m]
for m \in [-l,l], l \in [0,lmax].  Without pointers, that is, the Sufi way.
Allocating a recursive set of "objects" each of which is a vector row?
But then how does one guarantee that the data itself is in a vector?

Each language can do it, but each language is likely to do even this
simple task with a ubiquitously necessary tool differently (and may even
be able to do it more than one way within the languages themselves).

Top this off with the often minor but nevertheless annoying real
differences between data conventions and subroutine/stack/heap
manipulation -- e.g. slightly different conventions for the way one
specifies and terminates a string.  Pass by reference vs pass by value.
A language where protection/inheritance mean something vs two where they
don't and there is no way to enforce an opaque data type (and often no
INCENTIVE to enforce an opaque data type).  Two languages where structs
are a commonplace thing (in one THE commonplace thing) vs one where
most legacy code, at least, comes from an era where there was no such
thing as a struct, whatever there may be now.
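
Two of those convention gaps can be sketched from the C side.  This
assumes the traditional fortran picture: CHARACTER data is fixed-length and
blank-padded with no terminating NUL, and arguments are handed over by
reference.  The function names are hypothetical, and the real linkage
details (symbol naming, any hidden string-length argument) vary by compiler
and are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a blank-padded, fixed-length buffer (no NUL) into a C string.
   dst must have room for len + 1 bytes.  Returns the trimmed length. */
size_t fstr_to_cstr(char *dst, const char *src, size_t len)
{
    while (len > 0 && src[len - 1] == ' ')
        len--;                       /* drop trailing blanks */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Copy a C string into a fixed-length buffer of len bytes, padding with
   blanks and silently truncating if the string is too long. */
void cstr_to_fstr(char *dst, size_t len, const char *src)
{
    size_t n = strlen(src);
    if (n > len)
        n = len;
    memcpy(dst, src, n);
    memset(dst + n, ' ', len - n);
}

/* By-reference style: even the scalars n and a arrive as pointers. */
void scale_by_ref(double *x, const int *n, const double *a)
{
    for (int i = 0; i < *n; i++)
        x[i] *= *a;
}
```

None of this is hard, but it is exactly the kind of glue that has to be
written and maintained at every boundary between the languages.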

And then there is I/O.  System calls.  Linker options.  Preprocessors.
Initialization.  Ouch.

C comes close to being common ground between these three at least, IF
you program in really ugly, restricted C and accept that the C++ port
will both reflect the need to maintain a fortran version (and will still
be ugly) and will obviously not use any of the C++ isms -- just use the
C++ as a hopefully adequate superset of C and forget the rest.  At
least, it is probably less work to wrap a C library (of any sort) into
either a fortran or C++ port than it is to port either fortran or C++
libraries to where they'll be all happy in anything else -- C, C++, or
fortran.

I say this on the basis of having ported a fair chunk of fortran to C in
my day, and having found that you really can't -- you can maybe preserve
the algorithms but otherwise you just have to start over.  C++ to C is
almost as bad, because for all their similarities the resulting code is
just -- different.  Just understanding and porting I/O statements (which
make up quite a lot of many programs) is a PITA, and of course C++
just doesn't use pointers where many C programmers would be using them
all the time, as it would defeat the strong type checking and so on that
are features of the language.  I have done less work with C++, as I just
didn't like it when I tried it and there is a relatively small code
base to steal from (although I have snitched e.g. a multidimensional
quadrature routine and backported it -- painfully -- to C).

So yeah, I think it is too much to hope for, for anything but very
simple stuff.  C and C++ can and do share at least some code base and
libraries, but either one shares little with Fortran, which is really a
pretty alien language in comparison although they've steadily moved the
compiler standard closer and closer to C as fortran coders demand some
of the amenities available in C.

Perhaps in another decade they'll actually meet somewhere, have a drink
together, and in a night of illicit love spawn a new language called
Cortran, or perhaps fortraC, that groks both printf and hollerith, that
has a binary exponentiation operator (the one feature of Fortran that I
miss, actually:-), that can both do strongly typed arrays -- with
arbitrary index offsets -- but can also manage structs and pointers.
Talk about inheritance -- or the need for those compilers to use
protection, no matter how close they may get...;-)

   rgb

>
> Ed
>
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu