[Beowulf] OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Nathan Moore ntmoore at gmail.comWed Nov 26 11:34:30 PST 2008
- Previous message: [Beowulf] Re: OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64
- Next message: [Beowulf] cli alternative to cluster top?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
on the web, http://course1.winona.edu/nmoore/clouds_example_OpenMP_f90.html On Wed, Nov 26, 2008 at 1:22 PM, Bill Broadley <bill at cse.ucdavis.edu> wrote: > I'd be happy to take a look, but email formatting of f90 sometimes seems to > cause issues. Could you send it as an attachment or put it on a webpage? > > Nathan Moore wrote: > > After the help last week on openmp, I got inspired and bought a dual-quad > > opteron machine for the department to show 8-way scaling for my students > > ("Hey, its much cheaper than something new in the optics lab", my dept. > > chair laughed). > > > > I've been working on said machine over the past few days and found > something > > really weird in an OpenMP example program I descrobed to the list. > > > > The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) > mainboard, > > w/ 8GB ram. Initially, I installed the i386 version of Scientific Linux > > 5.2, but then realized that only half of the RAM was usable, and > > re-installed SL5.2 x86_64 this morning. > > > > The example program is appended to the end of this email. Again, it is a > > 2-d finite-difference solution to the laplace equation (the context being > to > > "predict" lightning strikes based on the potential between the ground and > > some clouds overhead). > > > > The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I > set > > the number of threads to 8, something weird happens, and instead of > taking > > the normal 241 iterations to converge, the program converges after 1 > step. > > This reeks of numerical instability to me, but my programming experience > in > > x86_64 is very limited. > > > > I'm using gfortran, with the simple compile string, > > gfortran clouds_example_OpenMP.f90 -m64 -fopenmp > > > > Any insight into what obvious mistake I'm making would be wonderful! > > > > The stability of the output seems erratic to me. Sometimes when > > OMP_NUM_THREADS=7 the result converges normally after 241 iterations and > at > > other times, the result converges after 1 iteration (indicating some sort > of > > problem with hardware???) > > > > This didn't occur yesterday when the machine was running SL5.2, i386. > > > > > > > > > > Simulation Output: > > > > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=1 > > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS > > [nmoore at aykroyd clouds]$ ./a.out > > Hello World from thread 0 > > There are 1 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 57 > > > > > > > > > > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=2 > > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS > > [nmoore at aykroyd clouds]$ ./a.out > > Hello World from thread 0 > > Hello World from thread 1 > > There are 2 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 28 > > > > > > > > > > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=4 > > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS > > [nmoore at aykroyd clouds]$ ./a.out > > Hello World from thread 3 > > Hello World from thread 1 > > Hello World from thread 0 > > Hello World from thread 2 > > There are 4 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 14 > > > > > > > > > > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=8 > > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS > > [nmoore at aykroyd clouds]$ ./a.out > > Hello World from thread 2 > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 1 > > initialization time, 0 > > simulation time, 0 > > > > Code listing: > > > > nmoore at aykroyd clouds]$ cat clouds_example_OpenMP.f90 > > ! > > ! > > use omp_lib > > ! > > IMPLICIT NONE > > integer,parameter::Nx=2000 > > integer,parameter::Ny=2000 > > real*8 v(Nx,Ny), dv(Nx,Ny) > > integer boundary(Nx,Ny) > > integer i,j,converged,i1,i2 > > integer t0,t1,t2 > > real*8 convergence_v, v_cloud, v_ground, max_dv > > real*8 bump_P,old_v > > real*8 Lx,Ly,dx,dy,v_y > > ! > > real*8 lightning_rod_center, lightning_rod_height > > ! > > real*8 house_center, house_height, house_width > > integer num_iterations > > ! > > integer:: id, nthreads > > !$omp parallel private(id) > > id = omp_get_thread_num() > > write (*,*) 'Hello World from thread', id > > !$omp barrier > > if ( id == 0 ) then > > nthreads = omp_get_num_threads() > > write (*,*) 'There are', nthreads, 'threads' > > end if > > !$omp end parallel > > > > ! initial time > > t0 = secnds(0.0) > > > > dx =0.1d0 ! differential lengths, m > > dy =0.1d0 > > Lx = Nx*dx ! system sizes, m > > Ly = Ny*dy > > > > print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly > > print *,"\tNx = ",Nx,"\n\tNy = ",Ny > > print *,"\tdx = ",dx,"\n\tdy = ",dy > > > > v_cloud = -10000.d0 ! volts > > v_ground = 0.d0 > > > > ! initialize the the boundary conditions > > ! > > ! first, set the solution function (v), to look like a > > ! parallel-plate capacitor > > ! > > ! note that there is one large parallel section and several > > ! parallel do's inside that region > > !$OMP PARALLEL > > ! > > !$OMP DO > > !$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v) > > !$OMP& PRIVATE(i,j) > > do j=1,Ny > > do i=1,Nx > > boundary(i,j)=0 > > v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly) > > end do > > end do > > !$OMP END DO > > ! > > !$OMP DO > > !$OMP& SHARED(Nx,Ny,boundary) > > !$OMP& PRIVATE(i) > > do i=1,Nx > > boundary(i,1)=1 ! we need to ensure that the edges of > > boundary(i,Ny)=1 ! the domain are held as boundary > > end do > > !$OMP END DO > > ! > > !$OMP DO > > !$OMP& SHARED(boundary,Nx) > > !$OMP& PRIVATE(j) > > do j=1,Ny > > boundary(1,j)=1 > > boundary(Nx,j)=1 > > end do > > !$OMP END DO > > !$OMP END PARALLEL > > > > > > ! set up an interesting feature on the lower boundary > > ! do this in parallel with SECTIONS directive > > ! > > !$OMP PARALLEL > > !$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height) > > !$OMP& > PRIVATE(lightning_rod_center,house_center,house_height,house_width)) > > !$OMP SECTIONS > > > > !$OMP SECTION > > ! Lightning_rod > > lightning_rod_center = Lx*0.6d0 > > lightning_rod_height = 5.0d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > lightning_rod_center = Lx*0.5d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > lightning_rod_center = Lx*0.7d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > ! house > > house_center = 0.4d0*Lx > > house_height = 5.0d0 > > house_width = 5.0d0 > > call > > > create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP END SECTIONS > > !$OMP END PARALLEL > > > > ! initialization done > > t1 = secnds(0.0) > > > > > > > > > > > > ! main solution iteration > > ! > > ! repeat the recursion relation until the maximum change > > ! from update to update is less than a convergence epsilon, > > convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny) > > print *,"\nconvergence criteria is \Delta V < ",convergence_v > > num_iterations = 0 > > > > ! > > ! iteration implemented with a goto or a do-while > > converged=0 > > do while( converged .eq. 0) > > > > converged = 1 > > num_iterations = num_iterations + 1 > > !$OMP PARALLEL > > !$OMP DO > > !$OMP& PRIVATE(i,j) > > !$OMP& SHARED(Ny,Nx,dv,v,boundary)) > > do j=2,(Ny-1) > > do i=2,(Nx-1) > > dv(i,j) = > > 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j) > > dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j))) > > end do > > end do > > !$OMP END DO > > > > max_dv = 0.d0 > > !$OMP DO > > !$OMP& PRIVATE(i,j) > > !$OMP& SHARED(NX,NY,dv,v)) > > !$OMP& REDUCTION(MAX:max_dv) > > do j=2,(Ny-1) > > do i=2,(Nx-1) > > v(i,j) = v(i,j) + dv(i,j) > > if(dv(i,j) .gt. max_dv) then > > max_dv = dv(i,j) > > endif > > end do > > end do > > !$OMP END DO > > !$OMP END PARALLEL > > > > if(max_dv .ge. convergence_v) then > > converged = 0 > > endif > > > > end do > > > > > > > > > > > > > > > > ! simulation finished > > t2 = secnds(0.0) > > > > print *," iterations necessary, ",num_iterations > > print *," initialization time, ",t1-t0 > > print *," simulation time, ",t2-t1 > > > > > > open(unit=10,file="v_output.dat") > > write(10,*) "# x\ty\tv(x,y)" > > do j=1,Ny > > !do i=1,Nx > > ! skipping the full array print to save execution time > > ! the printed data file is normally sent to gnuplot w/ splot > > i=10 > > write (10,*) i*dx,j*dy,v(i,j) > > !enddo > > write (10,*)" " > > end do > > close(10) > > > > > > stop > > end > > > > > > > > > > subroutine > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > IMPLICIT NONE > > integer Nx, Ny,j,boundary(Nx,Ny) > > integer j_limit > > integer index_lightning_rod_center > > real*8 v(Nx,Ny) > > real*8 lightning_rod_center,lightning_rod_height > > real*8 dx, dy, v_ground > > > > index_lightning_rod_center = lightning_rod_center/dx > > j_limit = lightning_rod_height/dy > > do j=1,j_limit > > v(index_lightning_rod_center,j) = v_ground > > boundary(index_lightning_rod_center,j) = 1 > > end do > > > > print *,"Created a lightning rod of height ",lightning_rod_height > > print *,"\ty_index ",j_limit > > print *,"\tx-position ",lightning_rod_center > > print *,"\tx_index ",index_lightning_rod_center > > > > > > end subroutine > > > > > > > > > > > > > > > > subroutine > > > create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) > > IMPLICIT NONE > > integer Nx, Ny, boundary(Nx,Ny) > > real*8 v(Nx,Ny) > > real*8 v_ground, dx, dy > > integer i,j,i_limit,j_limit, index_house_center > > real*8 house_center,house_height,house_width > > > > index_house_center = house_center/dx > > i_limit = 0.5d0*house_width/dx > > j_limit = house_height/dy > > do j=1,j_limit > > do i=(index_house_center-i_limit),(index_house_center+i_limit) > > v(i,j) = v_ground > > boundary(i,j) = 1 > > end do > > end do > > > > print *,"Created a house of height ",house_height > > print *,"\ty_index ",j_limit > > print *,"\twidth ",house_width > > print *,"\thouse bounds: > > ",index_house_center-i_limit,index_house_center+i_limit > > > > end subroutine > > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - - -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081126/40611227/attachment.html
- Previous message: [Beowulf] Re: OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64
- Next message: [Beowulf] cli alternative to cluster top?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
