<div class="gmail_quote">After the help last week on openmp, I got inspired and bought a dual-quad opteron machine for the department to show 8-way scaling for my students ("Hey, its much cheaper than something new in the optics lab", my dept. chair laughed).<br>
<br>I've been working on said machine over the past few days and found something really weird in an OpenMP example program I descrobed to the list.<br><br>The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) mainboard, w/ 8GB ram.  Initially, I installed the i386 version of Scientific Linux 5.2, but then realized that only half of the RAM was usable, and re-installed SL5.2 x86_64 this morning.<br>

<br>The example program is appended to the end of this email.  Again, it is a 2-d finite-difference solution to the laplace equation (the context being to "predict" lightning strikes based on the potential between the ground and some clouds overhead).<br>

<br>The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I set the number of threads to 8, something weird happens, and instead of taking the normal 241 iterations to converge, the program converges after 1 step.  This reeks of numerical instability to me, but my programming experience in x86_64 is very limited.  <br>

<br>I'm using gfortran, with the simple compile string, <br>     gfortran clouds_example_OpenMP.f90 -m64 -fopenmp<br><br>Any insight into what obvious mistake I'm making would be wonderful!  <br><br>The stability of the output seems erratic to me.  Sometimes when OMP_NUM_THREADS=7 the result converges normally after 241 iterations and at other times, the result converges after 1 iteration (indicating some sort of problem with hardware???)<br>

<br>This didn't occur yesterday when the machine was running SL5.2, i386.<br><br><br><br><br>Simulation Output:<br><br>[nmoore@aykroyd clouds]$ OMP_NUM_THREADS=1<br>[nmoore@aykroyd clouds]$ export OMP_NUM_THREADS<br>
[nmoore@aykroyd clouds]$ ./a.out <br>
 Hello World from thread           0<br> There are           1 threads<br>...<br>convergence criteria is \Delta V <   0.250000003725290     <br>  iterations necessary,          241<br>  initialization time,            0<br>

  simulation time,           57<br><br><br><br><br>[nmoore@aykroyd clouds]$ OMP_NUM_THREADS=2<br>[nmoore@aykroyd clouds]$ export OMP_NUM_THREADS<br>[nmoore@aykroyd clouds]$ ./a.out <br> Hello World from thread           0<br>

 Hello World from thread           1<br> There are           2 threads<br> ...<br>convergence criteria is \Delta V <   0.250000003725290     <br>  iterations necessary,          241<br>  initialization time,            0<br>

  simulation time,           28<br><br><br><br><br>[nmoore@aykroyd clouds]$ OMP_NUM_THREADS=4<br>[nmoore@aykroyd clouds]$ export OMP_NUM_THREADS<br>[nmoore@aykroyd clouds]$ ./a.out <br> Hello World from thread           3<br>

 Hello World from thread           1<br> Hello World from thread           0<br> Hello World from thread           2<br> There are           4 threads<br> ...<br>convergence criteria is \Delta V <   0.250000003725290     <br>

  iterations necessary,          241<br>  initialization time,            0<br>  simulation time,           14<br><br><br><br><br>[nmoore@aykroyd clouds]$ OMP_NUM_THREADS=8<br>[nmoore@aykroyd clouds]$ export OMP_NUM_THREADS<br>

[nmoore@aykroyd clouds]$ ./a.out <br> Hello World from thread           2<br> ...<br>
convergence criteria is \Delta V <   0.250000003725290     <br>  iterations necessary,            1<br>  initialization time,            0<br>  simulation time,            0<br><br>Code listing:<br><br>nmoore@aykroyd clouds]$ cat clouds_example_OpenMP.f90 <br>

!<br>!<br>use omp_lib<br>!<br>IMPLICIT NONE<br>integer,parameter::Nx=2000<br>integer,parameter::Ny=2000<br>real*8 v(Nx,Ny), dv(Nx,Ny)<br>integer boundary(Nx,Ny)<br>integer i,j,converged,i1,i2<br>integer t0,t1,t2<br>real*8 convergence_v, v_cloud, v_ground, max_dv<br>

real*8 bump_P,old_v<br>real*8 Lx,Ly,dx,dy,v_y<br>!<br>real*8 lightning_rod_center, lightning_rod_height<br>!<br>real*8 house_center, house_height, house_width <br>integer num_iterations<br>! <br>integer:: id, nthreads<br>

!$omp parallel private(id)<br>id = omp_get_thread_num()<br>write (*,*) 'Hello World from thread', id<br>!$omp barrier<br>if ( id == 0 ) then<br>        nthreads = omp_get_num_threads()<br>        write (*,*) 'There are', nthreads, 'threads'<br>

end if<br>!$omp end parallel<br><br>! initial time<br>t0 = secnds(0.0)<br><br>dx =0.1d0 ! differential lengths, m<br>dy =0.1d0<br>Lx = Nx*dx ! system sizes, m<br>Ly = Ny*dy<br><br>print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly<br>

print *,"\tNx = ",Nx,"\n\tNy = ",Ny<br>print *,"\tdx = ",dx,"\n\tdy = ",dy<br><br>v_cloud = -10000.d0     ! volts<br>v_ground = 0.d0<br><br>! initialize the the boundary conditions<br>

!<br>! first, set the solution function (v), to look like a <br>! parallel-plate capacitor<br>!<br>!       note that there is one large parallel section and several <br>!       parallel do's inside that region<br>!$OMP PARALLEL<br>

!<br>!$OMP DO<br>!$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v)<br>!$OMP& PRIVATE(i,j)<br>do j=1,Ny<br>        do i=1,Nx<br>                boundary(i,j)=0<br>                v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)<br>

        end do<br>end do<br>!$OMP END DO<br>!<br>!$OMP DO<br>!$OMP& SHARED(Nx,Ny,boundary) <br>!$OMP& PRIVATE(i)<br>do i=1,Nx<br>        boundary(i,1)=1         ! we need to ensure that the edges of <br>        boundary(i,Ny)=1        ! the domain are held as boundary<br>

end do<br>!$OMP END DO<br>!<br>!$OMP DO <br>!$OMP& SHARED(boundary,Nx) <br>!$OMP& PRIVATE(j)<br>do j=1,Ny<br>        boundary(1,j)=1<br>        boundary(Nx,j)=1<br>end do<br>!$OMP END DO<br>!$OMP END PARALLEL<br>
<br>
<br>! set up an interesting feature on the lower boundary  <br>!       do this in parallel with SECTIONS directive<br>!<br>!$OMP PARALLEL<br>!$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height)<br>!$OMP& PRIVATE(lightning_rod_center,house_center,house_height,house_width))<br>

!$OMP SECTIONS<br><br>!$OMP  SECTION <br>! Lightning_rod<br>lightning_rod_center = Lx*0.6d0<br>lightning_rod_height = 5.0d0<br>call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)<br>

<br>!$OMP  SECTION <br>lightning_rod_center = Lx*0.5d0<br>call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)<br><br>!$OMP  SECTION <br>lightning_rod_center = Lx*0.7d0<br>
call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)<br>
<br>!$OMP  SECTION <br>! house<br>house_center = 0.4d0*Lx<br>house_height = 5.0d0 <br>house_width = 5.0d0<br>call create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)<br><br>!$OMP END SECTIONS <br>

!$OMP END PARALLEL<br><br>! initialization done<br>t1 = secnds(0.0)<br><br><br><br><br><br>! main solution iteration<br>!<br>! repeat the recursion relation until the maximum change <br>! from update to update is less than a convergence epsilon,<br>

convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny)<br>print *,"\nconvergence criteria is \Delta V < ",convergence_v<br>num_iterations = 0<br><br>!<br>! iteration implemented with a goto  or a do-while<br>

converged=0<br>do while( converged .eq. 0)<br><br>        converged = 1<br>        num_iterations = num_iterations + 1<br>        !$OMP PARALLEL <br>        !$OMP DO <br>        !$OMP& PRIVATE(i,j)<br>        !$OMP& SHARED(Ny,Nx,dv,v,boundary))<br>

        do j=2,(Ny-1)<br>                do i=2,(Nx-1)<br>                        dv(i,j) = 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j)<br>                        dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j)))<br>

                end do<br>        end do <br>        !$OMP END DO<br><br>        max_dv = 0.d0<br>        !$OMP DO <br>        !$OMP& PRIVATE(i,j) <br>        !$OMP& SHARED(NX,NY,dv,v))<br>        !$OMP& REDUCTION(MAX:max_dv)<br>

        do j=2,(Ny-1)<br>                do i=2,(Nx-1)<br>                        v(i,j) = v(i,j) + dv(i,j)<br>                        if(dv(i,j) .gt. max_dv) then<br>                                max_dv = dv(i,j)<br>                        endif<br>

                end do<br>        end do<br>        !$OMP END DO<br>        !$OMP END PARALLEL<br><br>        if(max_dv .ge. convergence_v) then<br>                converged = 0<br>        endif<br><br>end do<br><br><br>
<br>
<br><br><br><br>! simulation finished<br>t2 = secnds(0.0)<br><br>print *," iterations necessary, ",num_iterations<br>print *," initialization time, ",t1-t0<br>print *," simulation time, ",t2-t1<br>

<br><br>open(unit=10,file="v_output.dat")<br>write(10,*) "# x\ty\tv(x,y)"<br>do j=1,Ny<br>        !do i=1,Nx<br>        ! skipping the full array print to save execution time<br>        ! the printed data file is normally sent to gnuplot w/ splot<br>

        i=10<br>                write (10,*) i*dx,j*dy,v(i,j)<br>        !enddo<br>        write (10,*)" "<br>end do<br>close(10)<br><br><br>stop<br>end<br><br><br><br><br>subroutine create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)<br>

IMPLICIT NONE<br>integer Nx, Ny,j,boundary(Nx,Ny)<br>integer j_limit<br>integer index_lightning_rod_center<br>real*8 v(Nx,Ny)<br>real*8 lightning_rod_center,lightning_rod_height<br>real*8 dx, dy, v_ground<br><br>index_lightning_rod_center = lightning_rod_center/dx<br>

j_limit = lightning_rod_height/dy<br>do j=1,j_limit<br>        v(index_lightning_rod_center,j) = v_ground<br>        boundary(index_lightning_rod_center,j) = 1<br>end do<br><br>print *,"Created a lightning rod of height ",lightning_rod_height<br>

print *,"\ty_index ",j_limit<br>print *,"\tx-position ",lightning_rod_center<br>print *,"\tx_index ",index_lightning_rod_center<br><br><br>end subroutine<br><br><br><br><br><br><br><br>subroutine create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)<br>

IMPLICIT NONE<br>integer Nx, Ny, boundary(Nx,Ny)<br>real*8 v(Nx,Ny)<br>real*8 v_ground, dx, dy<br>integer i,j,i_limit,j_limit, index_house_center<br>real*8 house_center,house_height,house_width<br><br>index_house_center = house_center/dx<br>

i_limit = 0.5d0*house_width/dx <br>j_limit = house_height/dy<br>do j=1,j_limit<br>        do i=(index_house_center-i_limit),(index_house_center+i_limit)<br>                v(i,j) = v_ground<br>                boundary(i,j) = 1<br>

        end do<br>end do<br><br>print *,"Created a house of height ",house_height<br>print *,"\ty_index ",j_limit<br>print *,"\twidth ",house_width<br>print *,"\thouse bounds: ",index_house_center-i_limit,index_house_center+i_limit<br>

<br>end subroutine<br><font color="#888888"><br>-- <br>- - - - - - -   - - - - - - -   - - - - - - - <br>Nathan Moore<br>Assistant Professor, Physics<br>Winona State University<br>AIM: nmoorewsu <br>- - - - - - -   - - - - - - -   - - - - - - -<br>


</font></div><br><br clear="all"><br>-- <br>- - - - - - -   - - - - - - -   - - - - - - - <br>Nathan Moore<br>Assistant Professor, Physics<br>Winona State University<br>AIM: nmoorewsu <br>- - - - - - -   - - - - - - -   - - - - - - -<br>