Performance Variations using MPI/Myrico


Steffen Persvold sp at scali.no
Fri Apr 27 07:42:55 PDT 2001


Patrick Geoffray wrote:
> > 3) Any ideas on what could cause this much variation?
> 
> I have some ideas, but nothing I would bet on. Mainly cache thrashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed as a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)

Hmm, the NAS application runs in userspace, and since this inner loop (the FFT code) runs without any communication with
other nodes, why would an SSE-patched kernel improve its memcpy performance? I would expect the memcpy calls in
the FFT code to be either inlined by the compiler or made as calls to libc's memcpy. That shouldn't involve any
system (kernel) time at all, right?
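
One way to check this would be to compare the user and system CPU time a process accumulates around a pure copy loop. The following is only a rough sketch, not the NAS FFT code: the helper names `cpu_times` and `time_copies`, the buffer size, and the repeat count are all made up for illustration, and it assumes a Unix-like system where Python's `resource` module is available. If the copy really stays in userspace, the system delta should stay small compared to the user delta.

```python
import resource


def cpu_times():
    """Return (user, system) CPU seconds consumed so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime, usage.ru_stime


def time_copies(size_bytes=8 * 1024 * 1024, repeats=50):
    """Copy a buffer repeatedly; return (user_delta, system_delta, dst).

    size_bytes and repeats are arbitrary illustration values.
    """
    src = bytearray(b"\xa5" * size_bytes)
    dst = bytearray(size_bytes)
    # Copy once before timing so that first-touch page faults
    # (which do cost kernel time) are not counted in the measurement.
    dst[:] = src
    user0, sys0 = cpu_times()
    for _ in range(repeats):
        # Same-length slice assignment: a plain in-process byte copy,
        # with no system call requested by this code.
        dst[:] = src
    user1, sys1 = cpu_times()
    return user1 - user0, sys1 - sys0, dst


if __name__ == "__main__":
    user, system, _ = time_copies()
    print(f"user {user:.3f}s  system {system:.3f}s")
```

The numbers will vary from machine to machine, and timer interrupts or page faults can still add a little system time, so this only gives a coarse indication of where the copy time is spent.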


Regards,
-- 
 Steffen Persvold                        Systems Engineer
 Email  : mailto:sp at scali.com            Scali AS (http://www.scali.com)
 Norway : Tlf  : (+47) 2262 8950         Olaf Helsets vei 6
          Fax  : (+47) 2262 8951         N-0621 Oslo, Norway

 USA    : Tlf  : (+1) 713 706 0544       10500 Richmond Avenue, Suite 190
                                         Houston, Texas 77042, USA



