[Beowulf] programming multicore clusters


Toon Knapen toon.knapen at fft.be
Sat Jun 16 05:36:17 PDT 2007


Greg Lindahl wrote:
> On Fri, Jun 15, 2007 at 01:49:49PM +0200, Toon Knapen wrote:
> 
>> AFAICT this is not always the case. E.g. on systems with glibc, this 
>> functionality (set_process_affinity and such) is only available starting 
>> from libc-2.3.4.
> 
> Nearly every statement about Linux is untrue at some point in the
> past.


Indeed, this is true of every system that is still under development.
But as I replied to Mark Hahn, many deployed Linux distributions still
ship libc-2.3.3 or older. I guess your products (I had a quick look but
could not find the information directly) also still support Linux
distributions with libc-2.3.3 or older.
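
To make the version concern concrete, here is a minimal sketch of a runtime
check, taking the libc-2.3.4 threshold from the discussion above as given.
The helper names (parse_version, glibc_has_affinity,
running_glibc_has_affinity) are hypothetical and only serve the
illustration; platform.libc_ver() is the standard Python call for querying
the C library, and it may report an empty version on non-glibc systems.

```python
import platform
import re

# Threshold quoted in this thread; treated here as an assumption, not verified.
MIN_GLIBC = (2, 3, 4)


def parse_version(text):
    """Turn a dotted version string such as '2.3.4' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def glibc_has_affinity(version_text):
    """Return True if the given glibc version is at least MIN_GLIBC."""
    return parse_version(version_text) >= MIN_GLIBC


def running_glibc_has_affinity():
    """Apply the check to the C library this interpreter is linked against."""
    lib, version = platform.libc_ver()
    return lib == "glibc" and glibc_has_affinity(version)
```

A code base that must still run on older distributions could call
running_glibc_has_affinity() once at startup and skip any pinning of
processes when it returns False.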


> 
>> E.g. you can obtain a big boost when running an 
>> MPI-code where each process performs local dgemm's for instance by using 
>> an OpenMP'd dgemm implementation. This is an example where running 
>> mixed-mode makes a lot of sense.
> 
> First off, I see people using *threaded* DGEMM, not OpenMP. 

I did not differentiate between the two in my previous mail because to 
me it is an implementation detail. Both come down to using multiple threads.


> Second,
> I've never seen anyone show an actual benefit -- can you name an
> example? i.e. "for N=foo, I get a 13% speedup on..."


We have benchmarked our code with several BLAS implementations, and so
far GotoBLAS has come out as the clear winner. Next we tested GotoBLAS
with 1, 2 and 4 threads and, depending on the linear solver (one of
which is http://graal.ens-lyon.fr/MUMPS/), we saw a speedup of between
30% and 70% when using 2 or 4 threads.
The scalability of GotoBLAS itself with respect to the number of threads
is actually much better than that. But of course, when it is integrated
in a solver, the speedup depends strongly on the size of the matrices
being passed to BLAS: the larger, the better.
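
One way to see why the solver-level gain stays below what BLAS alone can
deliver is Amdahl's law. The sketch below is only an illustration of that
standard formula, not a model of our measurements: blas_fraction (the share
of run time spent inside BLAS) and blas_speedup (how much faster the
threaded BLAS runs) are hypothetical inputs.

```python
def solver_speedup(blas_fraction, blas_speedup):
    """Amdahl's law: overall speedup when only the BLAS share gets faster.

    blas_fraction: share of single-threaded run time spent in BLAS (0..1).
    blas_speedup:  factor by which the threaded BLAS calls run faster (> 0).
    """
    if not 0.0 <= blas_fraction <= 1.0:
        raise ValueError("blas_fraction must lie between 0 and 1")
    if blas_speedup <= 0.0:
        raise ValueError("blas_speedup must be positive")
    # Time outside BLAS is unchanged; time inside BLAS shrinks by blas_speedup.
    return 1.0 / ((1.0 - blas_fraction) + blas_fraction / blas_speedup)
```

For instance, a hypothetical solver that spends half its time in BLAS gains
about 33% from a BLAS that runs twice as fast. Larger matrices tend to raise
both inputs, which is consistent with "the larger, the better".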

toon



