[Beowulf] programming multicore clusters
Toon Knapen toon.knapen at fft.be
Fri Jun 15 06:46:19 PDT 2007
Mark Hahn wrote:
>>> Most MPI and OpenMP implementations lock processes to cores for this
>>> very reason.
>>
>> AFAICT this is not always the case. E.g. on systems with glibc, this
>> functionality (set_process_affinity and such) is only available
>> starting from libc-2.3.4.
>
> jan 2005 ;)

But I'm sure most MPI implementations are still available on Linux distributions that do not have libc-2.3.4 (or higher).

>> Mixing OpenMP and MPI in one and the same algorithm does indeed not
>> generally provide a big advantage.
>
> I'm curious why this would be. do you have examples or analysis?

Maybe my statement was not careful enough in its wording. Basically, I've never seen an implementation of an algorithm that contains a mix of OpenMP and MPI and benefits from this mix.

>> scales. E.g. you can obtain a big boost when running an MPI-code where
>> each process performs local dgemm's for instance by using an OpenMP'd
>> dgemm implementation. This is an example where running mixed-mode
>> makes a lot of sense.
>
> if you take this approach, you'd do blocking to divide the node's work
> among threads, no? or would performance require that a thread's block
> fit in its private cache? if threads indeed do blocking, then the
> difference between hybrid and straight MPI approaches would mainly be
> down to time spent rearranging the matrices to set up for dgemm.
> or would the threaded part of the hybrid approach not do blocking?

Indeed, every thread will work on its own block, so the OpenMP and MPI approaches are alike. It is therefore interesting to compare e.g. the scalability of GotoBLAS (using OpenMP) to that of BLACS (using MPI). I have papers somewhere which show great scalability of GotoBLAS up to 8 threads.

toon
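To illustrate the "every thread works on its block" point, here is a minimal sketch of a row-blocked matrix multiply in Python. It is not how GotoBLAS or any OpenMP'd dgemm is actually implemented; the names `row_blocks`, `multiply_block` and `blocked_matmul` are hypothetical, and the pure-Python threads only show the decomposition (they will not give a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous (start, stop) blocks."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers get one additional row each.
        stop = start + base + (1 if w < extra else 0)
        if stop > start:
            blocks.append((start, stop))
        start = stop
    return blocks


def multiply_block(a, b, start, stop):
    """Compute rows start..stop-1 of the product a*b (matrices as lists of lists)."""
    n_inner, n_cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(start, stop)
    ]


def blocked_matmul(a, b, n_workers=4):
    """Multiply a*b by handing each worker thread its own block of rows of a."""
    blocks = row_blocks(len(a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves block order, so the row blocks concatenate correctly.
        parts = pool.map(lambda blk: multiply_block(a, b, *blk), blocks)
        return [row for part in parts for row in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(blocked_matmul(a, b, n_workers=2))
```

Each worker writes only its own rows of the result, so no synchronization is needed beyond joining the threads. A pure-MPI version would use the same partitioning but would have to distribute the blocks of `a` (and all of `b`) to the ranks first, which is the rearrangement cost Mark mentions.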
