[Beowulf] Using Autoparallel compilers or Multi-Threaded libraries with MPI
toon.knapen at gmail.com
Fri Nov 30 07:59:22 PST 2007
IMHO the hybrid approach (MPI+threads) is interesting when every
MPI process has lots of local data.
If you have a cluster of quad-core nodes, you can either run one process per
node with each process using 4 threads, or put one MPI process on each core. The
latter is simpler because it only requires MPI parallelism, but if the code
is memory-bound and every MPI process holds much of the same data, it is
better to share this common data among all the cores of a node and thus
use threads intra-node.
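
To put rough numbers on that trade-off, here is a minimal sketch of the memory arithmetic. The function name and the sizes (2 GB of common data, 0.5 GB of working data per core) are made up for illustration; the only assumption is that every process keeps its own copy of the common data, while threads inside one process share a single copy.

```python
def node_memory_gb(cores, shared_gb, private_gb, threads_per_process):
    """Estimate the memory used on one node (hypothetical model).

    Each process holds one copy of the common data; every core needs
    its own private working data in either layout.
    """
    if cores % threads_per_process != 0:
        raise ValueError("threads_per_process must divide cores")
    processes = cores // threads_per_process
    return processes * shared_gb + cores * private_gb


# Hypothetical quad-core node: 2 GB common to every rank, 0.5 GB per core.
pure_mpi = node_memory_gb(cores=4, shared_gb=2.0, private_gb=0.5,
                          threads_per_process=1)
hybrid = node_memory_gb(cores=4, shared_gb=2.0, private_gb=0.5,
                        threads_per_process=4)

print(f"one MPI process per core: {pure_mpi} GB")  # 4 copies of the common data
print(f"one process, 4 threads:   {hybrid} GB")    # 1 copy of the common data
```

With these made-up sizes the per-core layout needs 10 GB per node and the threaded layout 4 GB. The gap grows with the size of the common data and with the number of cores per node, which is when the threaded layout starts to pay off.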
On 11/30/07, Mark Hahn <hahn at mcmaster.ca> wrote:
> > Many problems decompose well in large chunks that are well done
> > with MPI on different nodes, and tight loops that are best done
> > locally with shared memory processors.
> I think the argument against this approach is more based on practice
> than principles. hybrid parallelism certainly is possible, and in
> the most abstract sense makes sense.
> however, it means a lot of extra work, and it's not clear how much
> if you had an existing code or library which very efficiently used threads
> to handle some big chunk of data, it might be quite simple to add some MPI
> to handle big chunks in aggregate. but I imagine that would most likely
> happen if you're already data-parallel, which also means embarrassingly
> parallel. for instance, apply this convolution to this giant image -
> it would work, but it's also not very interesting (ie, you ask "then what
> happens to the image? and how much time did we spend getting it
> and collecting the results?")
> for more realistic cases, something like an AMR code, I suspect the code
> would wind up being structured as a thread dedicated to inter data
> interfacing with a thread task queue to deal with the irregularity of
> computations. that's a reasonably complicated piece of code I guess,
> and before undertaking it, you need to ask whether a simpler model of
> just one-mpi-worker-per-processor would get you similar speed but with
> less effort. consider, as well, that if you go to a work-queue for
> bundles of work to threads, you're already doing a kind of message
> if we've learned anything about programming, I think it's that simpler
> mental models are to be desired. not that MPI is ideal, of course!
> just that it's simpler than MPI+OpenMP.
> -mark hahn
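
For what it's worth, the structure Mark describes for the AMR-style case (one thread dedicated to moving data, feeding a task queue that worker threads drain) might look roughly like the sketch below. It is only an illustration under assumptions: all names are hypothetical, the "communication" thread reads from a local list where a real code would receive data from other nodes through MPI, and CPython threads will not give a real speedup on CPU-bound work, so this shows the shape of the code rather than its performance.

```python
import queue
import threading

NUM_WORKERS = 4   # e.g. one worker thread per core of a quad-core node
STOP = None       # sentinel telling a worker to exit


def comm_thread(tasks, work_items):
    """Stand-in for the thread that would exchange data with other nodes."""
    for item in work_items:
        tasks.put(item)
    for _ in range(NUM_WORKERS):
        tasks.put(STOP)  # one sentinel per worker


def worker(tasks, results):
    """Pull bundles of work off the queue until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        # Irregular cost: the amount of work depends on the item's size.
        results.put(sum(i * i for i in range(item)))


def run(work_items):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    feeder = threading.Thread(target=comm_thread, args=(tasks, work_items))
    feeder.start()
    feeder.join()
    for w in workers:
        w.join()
    # All threads have finished, so draining the queue here is safe.
    total = 0
    while not results.empty():
        total += results.get()
    return total


if __name__ == "__main__":
    print(run([10, 2000, 3, 500, 1, 40]))
```

Even this toy version needs a sentinel protocol, two queues, and careful join ordering, which gives some feel for the extra work being weighed against plain one-MPI-worker-per-processor.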