[Beowulf] Using Autoparallel compilers or Multi-Threaded
tom.elken at qlogic.com
Fri Dec 14 08:41:15 PST 2007
> -----Original Message-----
> From: Eray Ozkural [mailto:examachine at gmail.com]
> Sent: Friday, December 14, 2007 2:11 AM
> To: Tom Elken
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] Using Autoparallel compilers or
> Multi-Threaded libraries with MPI
> On Dec 12, 2007 7:35 PM, Tom Elken <tom.elken at qlogic.com> wrote:
> > Results of the VERY non-scientific survey:
> > # reporting use of Autoparallel features with MPI: 0
> > # reporting use of multi-threaded math libraries with MPI: 1
> Well, then, is there really such a thing that extracts
> threads from those
> horrible C codes and generates MPI code?
I have heard of software tools that try to do some of that, but they did
not achieve much commercial success.
But that is not what I meant.
I guess I was relying on readers remembering my original post on this
subject. Since that post was way back in November, that was a
dangerous assumption. Thankfully we have an archive:
'Autoparallel features with MPI' came from this in the original post:
"I was wondering how many people use either auto-parallel compiler
features, or multi-threaded math libraries (Goto, MKL, ACML, etc.) to
provide some thread-level parallelism on a cluster where you primarily
use MPI to achieve your parallel execution.*"
So I meant that the source code is parallelized using MPI. Then in an
effort to create something like a hybrid MPI/OpenMP program, but without
having to add the OpenMP directives, you use the automatic
parallelization feature of common compilers:
-parallel in the Intel compiler
-apo in the PathScale compiler
-Mconcur in the PGI compiler, etc.
to find loops which can profitably be parallelized using threads.
Here was the example I mentioned in the original post:
"For example, if an autoparallelizing compiler could find effective
4-way thread-level parallelism in an MPI code and you were running on a
cluster of 8 nodes each with two quad-core CPUs, 64 cores total, you
might choose to run with 16 MPI ranks and set your NUM_THREADS
variable to 4, to run with all 64 cores of the cluster executing work
with reasonable efficiency. "
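To make the arithmetic in that example concrete, here is a minimal
sketch in Python. The function name hybrid_layout and its parameters are
made up for illustration; they are not part of any MPI or compiler
tool. It only works out how many MPI ranks (processes) to launch for a
given number of threads per rank, assuming every core runs exactly one
thread.

```python
def hybrid_layout(nodes, sockets_per_node, cores_per_socket, threads_per_rank):
    """Work out an MPI-rank / thread split that uses every core once.

    All names here are hypothetical; this is just the arithmetic from
    the example, not the interface of any real launcher.
    """
    cores_per_node = sockets_per_node * cores_per_socket
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads_per_rank must divide the cores on a node")
    ranks_per_node = cores_per_node // threads_per_rank
    return {
        "total_cores": nodes * cores_per_node,
        "ranks_per_node": ranks_per_node,
        "total_ranks": nodes * ranks_per_node,
        "threads_per_rank": threads_per_rank,
    }


# The cluster from the example: 8 nodes, two quad-core CPUs each,
# with 4-way thread-level parallelism inside each MPI rank.
layout = hybrid_layout(nodes=8, sockets_per_node=2,
                       cores_per_socket=4, threads_per_rank=4)
print(layout)
```

With those inputs this gives 2 ranks per node and 16 ranks in total,
each running 4 threads, which accounts for all 64 cores.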
So no one responded that they had done this, let alone found it to be
faster than running purely with MPI ranks (no threads).
> Not that I believe
> it is impossible
> (since I work for a company that does a similar thing) but I
> would like to know
> which autoparallel MPI code the posters had in mind. Is there
> a market for
> that kind of a compiler?
> Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent
> University, Ankara