Performance Variations using MPI/Myrico
Steffen Persvold sp at scali.no
Fri Apr 27 09:02:38 PDT 2001
Patrick Geoffray wrote:
>
> Steffen Persvold wrote:
> >
> > Hmm, the NAS application runs in userspace and since this inner loop
> > (FFT code) runs without any communication with other nodes, why would an
> > SSE-patched kernel improve its memcpy performance? I would believe that
> > the memcpy calls in the FFT code were either inlined by the compiler, or
> > that a call to libc's memcpy was made. It shouldn't involve any
> > system (kernel) time at all, right??
>
> Hi Steffen,
>
> Yes, the NAS FT code does not use the "memcpy()" system call. The copy
> step of the FFT is explicit (loop of assignments) and the PGI compiler
> is smart enough to use SSE prefetching to optimize this part of the code
> if SSE is available. But without a specific patch, the Linux kernel does
> not enable the SSE support (basically the kernel has to save the FP and
> the SSE registers during context switching), so the SSE optimization for
> PIII from PGI is useless. Now I am wondering if compiling with
> -Mvect=sse or -Mvect=prefetch with pgf90 WITHOUT the SSE support enabled
> in the kernel is not the source of this instability.

Actually, running SSE code (involving any SSE "mov" instructions) on a kernel which doesn't save the SSE registers between context switches would result in a segmentation fault. I have learned this the hard way: the original RH6.2 kernel (2.2.14-5.0) had PIII support and therefore saved the SSE registers, but when RH released a kernel update because they experienced data loss during context switches (RHBA-2000:013-01), I upgraded to 2.2.14-6.0.1. This kernel, however, did not have SSE support enabled, and my hand-coded SSE routines suddenly caused a segmentation fault. There are, however, some SSE instructions that don't require the registers to be saved on a context switch (e.g. "sfence" and "prefetchnta").

>
> Anyway, 50 % of variation for a pure computation piece of code seems too
> large to be explained by the SSE support.
> SSE on PIII is single precision only, so it does not help to get more
> Flops. Maybe there is something else in the patch that they applied;
> I will look at it.
>

I agree.

Regards,
--
Steffen Persvold
Systems Engineer
Email : mailto:sp at scali.com
Scali AS (http://www.scali.com)
Norway : Tel : (+47) 2262 8950, Fax : (+47) 2262 8951
Olaf Helsets vei 6, N-0621 Oslo, Norway
USA : Tel : (+1) 713 706 0544
10500 Richmond Avenue, Suite 190, Houston, Texas 77042, USA
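A minimal sketch, not from the original thread, of how a program could check at run time whether the running kernel lets it execute SSE instructions that touch the XMM registers, before dispatching to hand-coded SSE routines like the ones described above. It assumes an x86 Linux system and GCC-style inline assembly; the function name `sse_usable()` is hypothetical. On a kernel that has not enabled SSE state saving, such an instruction traps; Linux normally reports this as SIGILL (illegal instruction), and the sketch also catches SIGSEGV, the symptom reported above.

```c
/* Hypothetical run-time probe (assumes x86 Linux, GCC-style inline asm):
   can this process execute an SSE instruction that uses an XMM register? */
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Signal handler: jump back out of the trapping instruction. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if an XMM-register SSE instruction executed, 0 if it trapped
   (or if this is not an x86 CPU at all). */
int sse_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    struct sigaction sa, old_ill, old_segv;
    volatile int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);

    /* Temporarily catch both signals a faulting SSE instruction may raise. */
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* savemask = 1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0) {
        /* xorps writes %xmm0; it traps if the OS has not enabled SSE. */
        __asm__ volatile ("xorps %%xmm0, %%xmm0" ::: "xmm0");
        ok = 1;
    } else {
        ok = 0;
    }

    /* Put the previous handlers back. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return ok;
#else
    return 0;
#endif
}
```

A program would call `sse_usable()` once at startup and fall back to a scalar code path when it returns 0. An instruction such as "prefetchnta" or "sfence" would not make a useful probe here, since, as noted above, those do not depend on the kernel saving the SSE registers.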
More information about the Beowulf mailing list
