[Beowulf] Using Autoparallel compilers or Multi-Threaded libraries with MPI

Tom Elken tom.elken at qlogic.com
Fri Dec 14 08:41:15 PST 2007

> -----Original Message-----
> From: Eray Ozkural [mailto:examachine at gmail.com] 
> Sent: Friday, December 14, 2007 2:11 AM
> To: Tom Elken
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] Using Autoparallel compilers or 
> Multi-Threaded libraries with MPI
> On Dec 12, 2007 7:35 PM, Tom Elken <tom.elken at qlogic.com> wrote:
> > Results of the VERY non-scientific survey:
> >
> > # reporting use of Autoparallel features with MPI:          0
> >
> > # reporting use of multi-threaded math libraries with MPI:  1
> >
> Well, then, is there really such a thing that extracts threads from
> those horrible C codes and generates MPI code?

I have heard of software tools that try to do some of that, but they did
not achieve much commercial success.
That is not what I meant, though.

I guess I was relying on readers remembering my original post on this
subject.  Since that post was way back in November, that was a
dangerous assumption.  Thankfully we have an archive:

'Autoparallel features with MPI' came from this in the original post:
"I was wondering how many people use either auto-parallel compiler
features, or multi-threaded math libraries (Goto, MKL, ACML, etc.) to
provide some thread-level parallelism on a cluster where you primarily
use MPI to achieve your parallel execution.*"

So I meant that the source code is parallelized using MPI.  Then in an
effort to create something like a hybrid MPI/OpenMP program, but without
having to add the OpenMP directives, you use the automatic
parallelization feature of common compilers:
-parallel  in the Intel compiler
-apo       in the PathScale compiler
-Mconcur   in the PGI compiler,  etc.
to find loops which can profitably be parallelized using threads.
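
As a rough illustration of the kind of loop such a flag is aimed at, here
is a sketch in C.  The function name scale_add and its arguments are made
up for this example; whether a given compiler actually threads the loop
depends on its own dependence and cost analysis.

```c
#include <stddef.h>

/* Hypothetical per-rank kernel (not from any real code): y[i] = a*x[i] + y[i].
 * Each iteration touches only element i, and `restrict` promises the compiler
 * that x and y do not overlap, so there are no loop-carried dependences.
 * That makes this loop a plausible candidate for automatic threading. */
void scale_add(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

In a hybrid run, each MPI rank would call a routine like this on its own
slice of the data, and the threads created by the compiler would split the
loop iterations within that rank.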

Here was the example I mentioned in the original post:
"For example, if an autoparallelizing compiler could find effective
4-way thread-level parallelism in an MPI code and you were running on a
cluster of 8 nodes each with two quad-core CPUs, 64 cores total, you
might choose to run with 16 MPI threads and set your NUM_THREADS
variable to 4, to run with all 64 cores of the cluster executing work
with reasonable efficiency. "
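
The arithmetic in that example can be spelled out in a few lines of C.
This is only a sketch: the struct and function names are invented here,
and it assumes one MPI rank per quad-core CPU with one thread per core,
which is how I read the numbers in the quote.

```c
/* Hypothetical description of a hybrid MPI + threads layout. */
struct hybrid_layout {
    int mpi_ranks;        /* MPI processes across the whole cluster */
    int threads_per_rank; /* threads started inside each MPI process */
    int total_cores;      /* cores kept busy: ranks * threads */
};

/* Assumption: one rank per CPU socket, one thread per core of that socket. */
struct hybrid_layout plan_layout(int nodes, int sockets_per_node,
                                 int cores_per_socket)
{
    struct hybrid_layout h;
    h.mpi_ranks = nodes * sockets_per_node;
    h.threads_per_rank = cores_per_socket;
    h.total_cores = h.mpi_ranks * h.threads_per_rank;
    return h;
}
```

Calling plan_layout(8, 2, 4) gives 16 ranks with 4 threads each, covering
all 64 cores, which matches the figures in the quoted example.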

So no one responded that they have done this, let alone found it to be
faster than running with MPI ranks only (no threads).


> Not that I believe it is impossible
> (since I work for a company that does a similar thing) but I would
> like to know which autoparallel MPI code the posters had in mind.
> Is there a market for that kind of a compiler?
> Best,
> -- 
> Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent 
> University, Ankara
