[Beowulf] fftw2, mpi, from 32 bit to 64 and fortran

Gus Correa gus at ldeo.columbia.edu
Wed Aug 6 19:53:15 PDT 2008


Hi Ricardo, David, Mark, and list

If, as Ricardo says, he omitted the 5th parameter ("use_work") from the call
to rfftwnd_f77_mpi, which has 6 parameters, wouldn't the pointers start
mismatching at the 5th parameter, instead of at the 2nd parameter ("n_fields")?
I.e. "use_work" would take the value of "FFTW_NORMAL_ORDER",
the 6th parameter ("ioutput_order", which should have received
FFTW_NORMAL_ORDER) would get a random value (OS permitting),
but the first 4 parameters would be correct, right?
In any case, this differs little from what David said:
the point of failure is different, but the nature is the same.

However, it is interesting that at runtime the program segfaults in 64 bits
but doesn't fail in 32 bits, although there it most likely computes wrong results.
Ricardo, did you ever QC the 32-bit output before you inserted the missing
"use_work"?
If you were very lucky, the random value left at the address where
FFTW_NORMAL_ORDER was expected matched your needs, and the result may be correct!   :)

Anyway, the program seems to behave differently,
with the OS superego being more compliant (in a nasty sense) in 32 bits
than in 64 bits.
Does the OS paradoxically give the stack less room in 64 bits,
leading to the segfault?
Or does it give the same room, but the segfault is more likely
because the pointers are bigger?
Or does the segfault happen somewhere else, not on the stack?
Where?
Why in 64 bits?
Why not in 32 bits?

Yes, as David noted, I too have made, and continue to make, these mistakes,
particularly in Fortran programs, where no argument checking is enforced.
And the nastiest ones are those that don't segfault,
but come back to haunt you when somebody looks at the output,
if you aren't careful enough to look at it before anybody else does.

Cheers,
Gus Correa

Compiling is precise (and necessary),
running is imprecise!

... another one from your alter ego P'ssoa ... :)

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Lombard, David N wrote:

>On Tue, Aug 05, 2008 at 02:57:42AM -0700, Ricardo Reis wrote:
>
>>On Mon, 4 Aug 2008, Mark Kosmowski wrote:
>>
>>
>>>So, why did the 32-bit test case work?  Shouldn't the same problem
>>>crash both systems if it is a code issue?
>>>
>
>Not necessarily given the error described below.
>
>
>>I asked the same question myself... The function interface is:
>>
>>   call rfftwnd_f77_mpi(plan_c2r, &
>>        1, local_data, work, use_work, FFTW_NORMAL_ORDER)
>>
>>where use_work is an integer, value 1 if you use the work temporary
>>array, 0 otherwise. This was the variable I wasn't passing.
>>
>...
>
>>The wrapper function for this is (from rfftw_f77_mpi.c):
>>
>>void F77_FUNC_(rfftwnd_f77_mpi,RFFTWND_F77_MPI)
>>(rfftwnd_mpi_plan *p, int *n_fields, fftw_real *local_data,
>>  fftw_real *work, int *use_work, int *ioutput_order)
>>
>
>
>> .... So it must be a pointer issue revealed by the 64-bit build, no? When I
>>wasn't doing it "properly", the value of *ioutput_order wasn't set.
>>
>
>The value of the first element of local_data was used for the n_fields scalar.
>
>The work array was being laid down starting at the location of the use_work scalar.
>
>The FFTW_NORMAL_ORDER value was being interpreted as use_work scalar.
>
>Finally, ioutput_order scalar was some random value.
>
>So, a lot was going wrong there.  It's just one of life's little, um, pleasures
>that it looked like it was working for your 32-bit test case.  Don't worry, you'll
>likely do this again, as likely *every* one of us on this list has, too.
>
>BTW, Fortran passes by reference; that's why all args are pointers.
>
>