[Beowulf] Java vs C++ for interfacing to parallel library
Robert G. Brown rgb at phy.duke.edu
Sun Aug 20 11:25:05 PDT 2006
On Sun, 20 Aug 2006, Joe Landman wrote:

> Jonathan:
>
> Jonathan Ennis-King wrote:
>> Does anyone have experience writing parallel Java code (using MPI) with
>> calls to C libraries which also use MPI? Is this possible/sensible? Is
>> there a big performance hit relative to doing the same in C++?
>
> Unless all of the important optimizable calculation is done in libraries
> that you are stitching together with Java glue, the compiled languages
> are likely to be quite a bit faster.
>
> There is a sizeable abstraction penalty associated with OO languages.
> Many of the design patterns that they encourage (object factories,
> inheritance chains, etc) are anathema to high performance.

Hear, hear!

>> I'm considering writing some parallel code to do fluid flow in porous
>> media, the heart of which is solving systems of sparse linear equations.
>> There are some good libraries in C which provide the parallel solver
>> (e.g. PETSC), but I'm trying to resolve which language to use for my
>> code. The choice is between C++ and Java, and although I'm favouring
>> Java at present, I'm not sure about its performance in this context.
>
> Hmmm. For this, C or Fortran may be far more appropriate. Depends upon
> what it is you want to do with the code. High performance using MPI
> depends upon many factors. If there is one particular part of the code
> that is better served by an OO based language, then I might suggest
> designing/implementing all the speed sensitive bits in a language which
> lets you achieve high performance, and then interfacing them to your OO
> language so that the OO system isn't being used for the critical time
> sensitive portions.

<disclaimer>Parts of the stuff below are editorial comment and religious belief and can be ignored or sniffed at by those of differing belief.</disclaimer>

Remember well the observation that you can write object oriented code in a procedural language (and ditto, you can write procedural code in an OO language).
Matching the language to the kind of code -- or more likely, the personal taste of the coder -- simply makes development a bit simpler and more natural. Ultimately, OO vs procedural code is a matter of style as much as anything else.

I write "real" code exclusively in C. I'm in the process of rewriting a random number testing program (dieharder), which was originally (first pass) quite procedural in its design, into a library-based tool. In the second pass, as I came to fully understand the data objects better in practice and could start to see how the code could be simplified and compressed, I began to introduce a set of "lazy" shared objects for certain parts of the code. In the third (current) pass I'm splitting off all of the actual testing code, as opposed to the startup/results/presentation UI code, into a library.

Since most of the tests share a very similar implementation structure and certain control variables in common, I can now see precisely how to make the code very object oriented with a set of "test objects" (structs and similarly structured test implementations that read from them and fill them in) and a single set of "shell" code for calling a standard test. This reduces writing a UI to nothing but simple, repetitive boilerplate for calling the actual tests and displaying the returned results -- one can focus on the human side of the UI and stop worrying about the tests, and one can relatively easily and scalably add more tests or RNGs to test.

Since the code is still both lazy OO and C, I can freely intersperse the use of pointers, can choose to treat variables (including all structs/objects) as "opaque" or not as makes sense in the code, and keep the code as efficient as C can make it, which is to say damn near as efficient as assembler.
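As a rough illustration of the "test object" plus "shell" idea described above, here is a minimal sketch in C. It is not dieharder's actual code: the struct layout, the toy generator, and every name (rng_test, toy_rng, mean_test, lowbit_test, run_test) are hypothetical, and the two "tests" compute simple summary numbers rather than real statistical tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "test object": control variables shared by every test,
 * a slot for the result, and a pointer to the code that fills it in. */
typedef struct rng_test {
    const char  *name;       /* label a UI could display */
    unsigned int samples;    /* control variable common to all tests */
    double       statistic;  /* result, written by run() */
    void (*run)(struct rng_test *self, unsigned int (*rng)(void));
} rng_test;

/* Toy 15-bit generator (a small LCG), for illustration only. */
static unsigned int lcg_state = 12345u;
unsigned int toy_rng(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (lcg_state >> 16) & 0x7fffu;
}

/* Test 1: sample mean of the outputs scaled into [0, 1). */
void mean_test(rng_test *t, unsigned int (*rng)(void))
{
    double sum = 0.0;
    for (unsigned int i = 0; i < t->samples; i++)
        sum += rng() / 32768.0;
    t->statistic = sum / t->samples;
}

/* Test 2: fraction of outputs whose lowest bit is set. */
void lowbit_test(rng_test *t, unsigned int (*rng)(void))
{
    unsigned int ones = 0;
    for (unsigned int i = 0; i < t->samples; i++)
        ones += rng() & 1u;
    t->statistic = (double)ones / t->samples;
}

/* The single "shell": runs any test object the same way. */
void run_test(rng_test *t, unsigned int (*rng)(void))
{
    assert(t != NULL && t->run != NULL && t->samples > 0);
    t->run(t, rng);
}
```

A caller would fill in a table of rng_test entries, loop over it calling run_test(&tests[i], toy_rng), and print each name and statistic. Adding a test then means adding one function and one table entry, while the struct members stay directly accessible wherever that is convenient.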
The "objectness" of the encapsulated tests just permits me to write a relatively clean API to the library (without too many test-specific global/shared variables or the even greater hassle of passing variable length argument lists through layers of encapsulating subroutines), so that when I'm done, adding a UI or GUI or implementing the tests natively inside e.g. R or octave or whatever will be fairly straightforward.

The point being that one CAN write non-lazy OO code in C or even in Fortran -- that's more a question of program design and an understanding of the basic data objects that a program requires, although it certainly helps if the language permits the definition of a struct of one sort or another. One has the choice in C, though, of writing fully OO, lazy (mixed) OO or fully procedural code when and where that is appropriate for either ease of coding or program efficiency.

I suppose that choice exists to some extent for at least some non-fascist OO environments (e.g. C++ as a sort-of superset of C) but I think that the only people who even know how to do so are those who have learned to code in a non-OO language first -- people who learn C++ as their primary language tend to be pretty clueless about pointers or the performance advantages of NOT using protection and inheritance in your structs but just letting everything access them directly. C provides few safety nets but rather permits you to do pretty much anything you like, at your own risk, in code that is ultimately transparent.

Now, I personally believe that all nontrivial programs go through stages like the three described above no matter what language they are written in. This is one of the reasons that Wirth's Pascal had its day and that it passed -- whether one starts at the top or at the bottom or both, one is likely to encounter mismatches that require rethinking all or part of the memory hierarchy one begins with in any difficult project.
In that SECOND pass and beyond, both strict-topdown and strict-bottomup languages tend to require MORE work to fix than one that is less hierarchically prestructured. Perhaps there are OO ubercoders that can just "see" what the data objects appropriate to a complex application are from the beginning and can start off with the right top level, mid, AND bottom level objects all perfectly enmeshed and integrated, but I have yet to meet one.

One of the great (IMO) illusions promoted by OO fanatics is that by using an OO language (per se) to write the code in the first place one can somehow shorten this process and home in on the correct hierarchy of data structures (objects or not) that optimally support the application's efficient implementation from top to bottom. This is not my experience, but hey, the world is a big place and there may be people who just think that way and for them it may be true.

For code like the specific stuff you want to implement above, which has efficient libraries written in C, my guess is that you would do best using C -- this is pretty much a no-brainer. It is highly probable that in C you have the best access to example programs using the library, UIs, human support in the form of others who use the libraries in their C code, and more. Even communicating with the author/maintainers of the library is bound to be simplest if you are implementing in C.

Second best would almost certainly be C++, as C++ can (I believe) call C libraries fairly transparently or with a minimal C++ encapsulation of the C prototypes and data structures.

OTOH Fortran and C tend to have somewhat different subroutine call mechanisms, so binding a C library into Fortran code or VV tends to be a PITA -- for example, C always passes subroutine arguments by value, Fortran by reference. In addition, C and Fortran use slightly different conventions for other simple stuff, e.g. terminating a string.
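To make the by-value versus by-reference mismatch and the string difference concrete, here is a small sketch in C of the kind of wrapper this usually involves. Several things are assumptions rather than facts from this thread: the routine names (scale_sum, fstr_to_cstr) are hypothetical, the trailing underscore on scale_sum_ is a common but compiler-dependent naming convention, and the idea that a Fortran character argument arrives as a blank-padded buffer plus a separate length is likewise typical rather than guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ordinary C routine: the scalars n and alpha arrive by value. */
double scale_sum(const double *x, int n, double alpha)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return alpha * s;
}

/* Wrapper with a Fortran-style interface: every argument is passed by
 * reference, so the scalars must be dereferenced before the real call.
 * The trailing underscore is an assumed, compiler-dependent convention. */
double scale_sum_(const double *x, const int *n, const double *alpha)
{
    return scale_sum(x, *n, *alpha);
}

/* Fortran character data is fixed-length and blank-padded, with no
 * terminating NUL. This copies it into a NUL-terminated C string,
 * dropping the trailing blanks, and returns the resulting length. */
size_t fstr_to_cstr(const char *fstr, size_t flen, char *out, size_t outsize)
{
    assert(fstr != NULL && out != NULL && outsize > 0);
    while (flen > 0 && fstr[flen - 1] == ' ')
        flen--;                      /* strip the blank padding */
    if (flen >= outsize)
        flen = outsize - 1;          /* truncate rather than overflow */
    memcpy(out, fstr, flen);
    out[flen] = '\0';
    return flen;
}
```

None of this is difficult, but every library entry point needs a wrapper of this sort, and the exact naming and string-length rules have to be checked against the compilers actually in use.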
Some of the issues associated with the port are mentioned here:

  http://star-www.rl.ac.uk/star/dvi/sun209.htx/node4.html

as well as elsewhere on the web. Basically, calling C libraries in Fortran code is possible but requires some work and code encapsulation (and vice versa for calling Fortran routines from inside C code, IIRC -- fortran/C compiler folks can check me on this:-).

Java, octave, matlab, python, perl etc. are MUCH WORSE in this regard. All require NONTRIVIAL encapsulation of the library into the interactive environment. I have never done an actual encapsulation into any of them, but I'll wager that it is really quite difficult because each of them has its very own internal data types that are REALLY opaque objects that bear little overt resemblance to the simple "all data objects can be viewed as a projection onto a block of memory with either typed or pointer driven offset arithmetic" view of data in C or for that matter C++ or Fortran (with slightly different projective views in both cases).

These languages typically permit you to allocate memory by just using a named variable. This is marvelously convenient for an interactive environment -- it is marvelously expensive in terms of program efficiency because the underlying environment has to manage allocating the memory transparently and extensibly (most of the languages permit you to allocate whole vectors or matrices of variables by just referencing them), tracking instances of the memory in code, and freeing the memory when it is no longer referenced or being used. They do this conservatively, so that they tend to keep things if there is ANY CHANCE of their ever being referenced, making them typically memory hogs almost as bad as a C program would be if every memory reference in the program was to static global memory -- no memory allocation or freeing at all, beyond whatever goes on stack/heap in the course of subroutine calls or internal function execution.
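The gap between an environment's opaque values and C's flat view of memory can be sketched in C itself. Everything here is hypothetical: the boxed type is only a stand-in for the tagged values an interpreter might use internally, lib_dot stands in for a C library routine that wants contiguous doubles, and none of it is the real API of Java (JNI), perl, python, octave or matlab.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interpreter value: a type tag plus a payload. */
typedef enum { BOX_INT, BOX_DOUBLE } box_tag;
typedef struct {
    box_tag tag;
    union { long i; double d; } as;
} boxed;

/* Stand-in for a C library routine: expects flat arrays of doubles. */
double lib_dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Glue, part 1: unpack each boxed element into a freshly allocated
 * flat buffer. Returns NULL if the allocation fails. */
double *unbox_vector(const boxed *v, size_t n)
{
    double *flat = malloc(n * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        flat[i] = (v[i].tag == BOX_INT) ? (double)v[i].as.i : v[i].as.d;
    return flat;
}

/* Glue, part 2: the wrapper the environment would actually call.
 * Returns 0 on success and -1 if a buffer could not be allocated. */
int boxed_dot(const boxed *a, const boxed *b, size_t n, double *result)
{
    assert(a != NULL && b != NULL && result != NULL && n > 0);
    double *fa = unbox_vector(a, n);
    double *fb = unbox_vector(b, n);
    int ok = (fa != NULL && fb != NULL);
    if (ok)
        *result = lib_dot(fa, fb, n);
    free(fa);
    free(fb);
    return ok ? 0 : -1;
}
```

Even in this toy version the wrapper copies and converts every element on every call. A real binding also has to cope with the environment's own memory manager, which is where most of the difficulty is likely to lie.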
Complicated hashes or advanced list structures are used to keep the execution itself moderately efficient (but highly INefficient compared to a decent compiler with flat memory layouts).

The point being that you have to interface these opaque and not obviously documented data types to the C library calls. This is surely possible -- it is how all those perl libraries, matlab toolboxes, and java interfaces come about. It will probably require that you learn WAY more about how the language itself is implemented at the source level than you are likely to want to know, and it is probably not going to be terribly easy...

   rgb

>> Jonathan Ennis-King
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
