Performance Variations using MPI/Myrico
Steffen Persvold sp at scali.no
Fri Apr 27 07:42:55 PDT 2001
Patrick Geoffray wrote:
>
> > 3) Any ideas on what could cause this much variation?
>
> I have some ideas, but nothing I would bet on. Mainly cache trashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed as a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)

Hmm, the NAS application runs in userspace, and since this inner loop (the FFT code) runs without any communication with other nodes, why would an SSE-patched kernel improve its memcpy performance? I would believe that the memcpy calls in the FFT code were either inlined by the compiler or made to libc's memcpy. It shouldn't involve any system (kernel) time at all, right?

Regards,
--
Steffen Persvold, Systems Engineer
Email  : mailto:sp at scali.com
Scali AS (http://www.scali.com)

Norway : Tlf : (+47) 2262 8950
         Fax : (+47) 2262 8951
         Olaf Helsets vei 6, N-0621 Oslo, Norway

USA    : Tlf : (+1) 713 706 0544
         10500 Richmond Avenue, Suite 190, Houston, Texas 77042, USA
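
The question of whether a plain user-space copy accrues any kernel time can be checked directly. What follows is only a minimal sketch, assuming a Unix system where Python's `resource` module is available; `copy_cpu_times` is a hypothetical helper written for this illustration, not part of the NAS benchmark or of any MPI library. It touches both buffers before timing, so that page faults from first use are not counted, and then reports the user and system CPU time consumed by repeated copies.

```python
import resource


def copy_cpu_times(size_bytes=8 * 1024 * 1024, repeats=50):
    """Return (user_seconds, system_seconds) spent on repeated buffer copies."""
    # Filling src touches every page, so those page faults happen here.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    # Warm-up copy: any remaining page faults on dst happen before timing.
    dst[:] = src

    before = resource.getrusage(resource.RUSAGE_SELF)
    for _ in range(repeats):
        # Same-length slice assignment copies the bytes into dst in place.
        dst[:] = src
    after = resource.getrusage(resource.RUSAGE_SELF)

    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user, system


if __name__ == "__main__":
    user, system = copy_cpu_times()
    print(f"user: {user:.3f} s  system: {system:.3f} s")
```

If the copy really stays in user space, the system figure should remain small next to the user figure. The exact numbers depend on the machine, the kernel, and the timer resolution, so they are best read as an indication only.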
More information about the Beowulf mailing list
