[Beowulf] OpenMP on AMD dual core processors
bill at cse.ucdavis.edu
Fri Nov 21 04:36:43 PST 2008
Fortran isn't one of my better languages, but I did manage to tweak your code
into something that I believe works the same and is OpenMP friendly.
I put a copy at:
When I used the PathScale compiler on your code it said:
"told.f", line 27: Warning: Referenced scalar variable OLD_V is SHARED by default
"told.f", line 29: Warning: Referenced scalar variable DV is SHARED by default
"told.f", line 31: Warning: Referenced scalar variable CONVERGED is SHARED by
I rewrote your code to get rid of those warnings. I didn't know some of the
constants you mentioned, such as dy and Ly, so I just wrote my own
initialization. I skipped the boundary conditions by just restricting the
start and end of the loops.
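For anyone wondering what the "SHARED by default" warnings are about: a scalar
that is assigned inside a parallel loop but shared between threads is a data
race. Here is a minimal sketch of the usual fix, in C rather than Fortran.
The function max_change and its arguments are hypothetical; old_v and dv only
echo the names in the warnings, and I am assuming the original loop assigned
them on every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute change between two arrays.
 * old_v and dv are declared at function scope, so OpenMP would share
 * them between threads unless they are listed in a private clause. */
double max_change(const double *before, const double *after, size_t n)
{
    double old_v, dv;      /* scratch scalars, written on every iteration */
    double worst = 0.0;    /* combined across threads by the reduction */

    assert(n > 0);
    #pragma omp parallel for private(old_v, dv) reduction(max:worst)
    for (long i = 0; i < (long)n; i++) {
        old_v = before[i];
        dv = after[i] - old_v;
        if (dv < 0.0) dv = -dv;          /* absolute value without libm */
        if (dv > worst) worst = dv;
    }
    return worst;
}
```

Declaring the scalars inside the loop body would have the same effect, since
anything declared inside the parallel region is private automatically.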
Your code seemed to be interpolating between the current iteration (i-1 and
j-1) and the last iteration (i+1 and j+1). Not sure if that was intentional
or not. In any case I just processed the array v into v2, then if it didn't
converge I processed the v2 array back into v. To make each loop iteration
independent I made converge a 1D array which stores the sum of that row's
error. Then after each array was processed I walked the 1D array to see if we
had converged. I exit when all pixels are below the convergence value.
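The scheme described above can be sketched roughly as follows. This is C, not
the Fortran I actually wrote, and several details are assumptions: the
four-neighbour average as the update rule, the row-major layout, the names
sweep, converged and solve, and testing each row's summed error against the
tolerance.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep: read src, write dst, interior points only, so the boundary
 * values are never touched. row_err[i] gets the summed absolute change of
 * row i. Each row writes only its own slots, so rows are independent. */
void sweep(const double *src, double *dst, double *row_err, int rows, int cols)
{
    assert(rows >= 3 && cols >= 3);
    #pragma omp parallel for
    for (int i = 1; i < rows - 1; i++) {
        double err = 0.0;                 /* declared in the loop: private */
        for (int j = 1; j < cols - 1; j++) {
            size_t k = (size_t)i * cols + j;
            /* assumed update rule: average of the four neighbours */
            double v = 0.25 * (src[k - cols] + src[k + cols]
                             + src[k - 1] + src[k + 1]);
            double d = v - src[k];
            err += d < 0.0 ? -d : d;
            dst[k] = v;
        }
        row_err[i] = err;
    }
}

/* Serial walk over the per-row errors of the interior rows. */
int converged(const double *row_err, int rows, double tol)
{
    for (int i = 1; i < rows - 1; i++)
        if (row_err[i] >= tol)
            return 0;
    return 1;
}

/* Ping-pong v -> v2 -> v until converged or max_iter sweeps are done.
 * Both arrays must start with the same boundary values. Returns the number
 * of sweeps; the newest result is in v2 if that count is odd, else in v. */
int solve(double *v, double *v2, double *row_err,
          int rows, int cols, double tol, int max_iter)
{
    int iter = 0;
    while (iter < max_iter) {
        sweep(v, v2, row_err, rows, cols);
        iter++;
        if (converged(row_err, rows, tol) || iter >= max_iter)
            break;
        sweep(v2, v, row_err, rows, cols);
        iter++;
        if (converged(row_err, rows, tol))
            break;
    }
    return iter;
}
```

Because every sweep reads one array and writes the other, no thread ever
reads a value that another thread is updating, and the only serial part left
is the short walk over row_err.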
It scales rather well on a dual-socket Barcelona (AMD quad-core). My version
iterates a 1000x1000 array with a range of values from 0-200 over 1214
iterations to within a convergence of 0.02.
CPUs   Time    Scaling
2      27.75   1.96 faster
4      14.14   3.85 faster
8       7.75   7.03 faster
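To put those numbers in perspective, parallel efficiency is just the speedup
divided by the CPU count, and time multiplied by speedup gives the single-CPU
time the table implies. A small sketch in C (both helper names are
hypothetical):

```c
#include <assert.h>

/* Fraction of ideal linear scaling achieved: 1.0 would be perfect. */
double parallel_efficiency(double speedup, int cpus)
{
    assert(cpus > 0);
    return speedup / cpus;
}

/* Single-CPU time implied by one table row, in the time column's units. */
double implied_serial_time(double time, double speedup)
{
    return time * speedup;
}
```

For the rows above this works out to roughly 98%, 96% and 88% efficiency at
2, 4 and 8 CPUs, and all three rows imply a single-CPU time of about 54.4.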
Hopefully my code is doing what you intended.
Alas, with gfortran (4.3.1 or 4.3.2), I get a segmentation fault as soon as I
run. Same if I compile with -g and run it under the debugger. I'm probably
doing something stupid.