[Beowulf] programming multicore clusters
Greg Lindahl lindahl at pbm.com
Mon Jun 18 17:00:50 PDT 2007
> Indeed, this is true for every system that is still in development.
> But as I responded to Mark Hahn, there are still many linux
> distributions deployed that have libc-2.3.3 or older. I guess your
> products (I had a quick look but could not find the info directly) are
> also still supporting linux distributions with libc-2.3.3 or older.

My memory is that older versions of x86_64 libc have a different set
of affinity functions (different # of args). PathScale supported both.

> >First off, I see people using *threaded* DGEMM, not OpenMP.
>
> I did not differentiate between these two in my previous mail because to
> me it's an implementation issue. Both come down to using multiple threads.

It's extremely inconvenient to express an efficient DGEMM in OpenMP,
just like it's pretty inconvenient to express an efficient serial
DGEMM. So you won't find anyone using an OpenMP DGEMM. You can call
everything in the universe an implementation issue if you like.

> We have benchmarked our code with using multiple BLAS implementations
> and so far GotoBLAS came out as a clear winner. Next we tested GotoBLAS
> using 1,2 and 4 threads and depending on the linear solver (of which one
> is http://graal.ens-lyon.fr/MUMPS/) we had a speedup of between 30% and
> 70% when using 2 or 4 threads.

Sorry, did you compare against a pure MPI implementation? For example
the HPL code can run either way, so it's easy to compare. But if
you're comparing a serial code to a threaded code, it's no surprise
that the threaded code can be faster, especially solving a problem
which is not memory intensive. In fact I'd expect an even bigger win
than 1.7X, perhaps you aren't using Opterons ;-)

-- greg
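
To put the quoted 30% to 70% gain in context, here is a rough sketch of the arithmetic, assuming "a speedup of 70%" means the threaded run is 1.7X faster than the serial run. The thread does not say which gain goes with which thread count, so the loop prints every combination. The function name parallel_efficiency is made up for illustration and is not from any of the codes discussed.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup / workers


if __name__ == "__main__":
    # Assumption: a gain of N percent over serial means a speedup of 1 + N/100.
    for gain_percent in (30, 70):
        speedup = 1 + gain_percent / 100
        for threads in (2, 4):
            eff = parallel_efficiency(speedup, threads)
            print(f"{speedup:.1f}X on {threads} threads: {eff:.1%} of linear scaling")
```

Even the best case here (1.7X on 2 threads) is short of linear scaling, which is why a comparison against a pure MPI run on the same cores would be more informative than a comparison against the serial code.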
