Archives


- Beowulf
- Beowulf Announce
- Scyld-users
- Beowulf on Debian

[Beowulf] p4_error: interrupt SIGSEGV: 11 Killed by signal 2.

Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.

Search

cjoung at tpg.com.au cjoung at tpg.com.au
Thu Oct 21 23:02:17 PDT 2004


Firstly,
Thank you to Mark H, Anthony S and in particular Henry G for your helpful
comments on my 
MPI/scaLAPACK problem. After reading your comments, I removed MPICH2, installed 
mpich1.2.6 and this did indeed fix the "Null Comm Pointer" error!

> Date: Tue, 19 Oct 2004 14:21:31 -0700 
> From: "Gabb, Henry" <henry.gabb at intel.com>
> Subject: Re: [Beowulf] MPI & ScaLAPACK: 
>              error in MPI_Comm_size: Invalid communicator
> The "Null Comm pointer" error that you're seeing is almost always due to
> a header mismatch. I don't think Intel Cluster MKL 7.0 supports MPICH2 yet. 
>
> > Error message:
> > aborting job: Fatal error in MPI_Comm_size: Invalid communicator, error stack:
> > MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
> > MPI_Comm_size(66): Null Comm pointer
**********************************************************

Current Problem:
I have another problem now, also related to MPI/scaLAPACK.
I tried to compile and run the example1.f program - It uses the
scalapack PDGESV subroutine to solve the equation [A]x=b
(from the netlib/scalapack website - I did NOT modify it)
(see end of email for copy of example1.f)

It seems to compile ok, but on execution it gives this 
error message:
****************************************
[tony at carmine clint]$ mpif77 -o ex1 example1.f 
                             -L/opt/intel/mkl70cluster/lib/32 
                             -lmkl_scalapack 
                             -lmkl_blacsF77init 
                             -lmkl_blacs 
                             -lmkl_blacsF77init 
                             -lmkl_lapack 
                             -lmkl_ia32 
                             -lguide 
                             -lpthread 
                             -static-libcxa
[tony at carmine clint]$ mpirun -np 6 ./ex1 
p0_24505:  p4_error: interrupt SIGSEGV: 11 
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
[tony at carmine clint]$
****************************************
(there are also comments about "broken pipes", but
I didn't include that part above)

..as far as I can discover on the web, SIGSEGV 
represents a "segmentation fault", beyond this,
I don't know what to do for a fix.

Some people have suggested fixes such as:

* increasing memory sizes: i.e. type in:
> >export P4_GLOBMEMSIZE=536870912
> >
> >(=512MB). 

..but this didn't do anything for me.
Generally, I have not seen any other solutions which 
have had a positive response.
They say, a problem like this is normally attributed
to a bug in the source code, but seeing as the 
netlib/scalapack developers give this source code out 
as the beginners basic 'hello world' program, I doubt 
this program would carry a bug in it!
I would have to guess that there's something ELSE wrong.
(In any case, I tried another program out of a book that
does basically the same thing - same problem)

I was hoping a reader in this forum has seen this problem
before, and knows of a solution.

Any suggestions, even speculative, would be most appreciated,
(Please use simple language - I am quite a novice at all of this...)

with many thanks,

Clint Joung

Postdoctoral Research Associate
Department of Chemical Engineering
University of Sydney, NSW 2006
Australia

ps: My system details and the source code 'example1.f':

OS: Linux Redhat Fedora Core 2
FC: Intel Fortran Compiler V8.0 (ifort) 
(I've also tried building MPI libraries using GNU g77 - same problem)
CC: GNU gcc
(for some reason, MPI libraries don't 'make' properly using intel icpc)
Scalapack et al: Intel Cluster Maths Kernel Library for Linux v7.0
MPI: mpich-1.2.6 (I've also tried mpich1.2.5.2 - same problem)

The example1.f program. It runs ok as far as the actual call to
PDGESV, then it falls over.....
**EXAMPLE1.F***************************************
      PROGRAM EXAMPLE1
*
*     Example Program solving Ax=b via ScaLAPACK routine PDGESV
*
*     .. Parameters ..
      INTEGER            DLEN_, IA, JA, IB, JB, M, N, MB, NB, RSRC,
     $                   CSRC, MXLLDA, MXLLDB, NRHS, NBRHS, NOUT,
     $                   MXLOCR, MXLOCC, MXRHSC
      PARAMETER          ( DLEN_ = 9, IA = 1, JA = 1, IB = 1, JB = 1,
     $                   M = 9, N = 9, MB = 2, NB = 2, RSRC = 0,
     $                   CSRC = 0, MXLLDA = 5, MXLLDB = 5, NRHS = 1,
     $                   NBRHS = 1, NOUT = 6, MXLOCR = 5, MXLOCC = 4,
     $                   MXRHSC = 1 )
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
*     ..
*     .. Local Scalars ..
      INTEGER            ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW
      DOUBLE PRECISION   ANORM, BNORM, EPS, RESID, XNORM
*     ..
*     .. Local Arrays ..
      INTEGER            DESCA( DLEN_ ), DESCB( DLEN_ ),
     $                   IPIV( MXLOCR+NB )
      DOUBLE PRECISION   A( MXLLDA, MXLOCC ), A0( MXLLDA, MXLOCC ),
     $                   B( MXLLDB, MXRHSC ), B0( MXLLDB, MXRHSC ),
     $                   WORK( MXLOCR )
*     ..
*     .. External Functions ..
      DOUBLE PRECISION   PDLAMCH, PDLANGE
      EXTERNAL           PDLAMCH, PDLANGE
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_EXIT, BLACS_GRIDEXIT, BLACS_GRIDINFO,
     $                   DESCINIT, MATINIT, PDGEMM, PDGESV, PDLACPY,
     $                   SL_INIT
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          DBLE
*     ..
*     .. Data statements ..
      DATA               NPROW / 2 / , NPCOL / 3 /
*     ..
*     .. Executable Statements ..
*
*     INITIALIZE THE PROCESS GRID
*
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     If I'm not in the process grid, go to the end of the program
*
      IF( MYROW.EQ.-1 )
     $   GO TO 10
*
*     DISTRIBUTE THE MATRIX ON THE PROCESS GRID
*     Initialize the array descriptors for the matrices A and B
*
      CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT, MXLLDA,
     $               INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT,
     $               MXLLDB, INFO )
*
*     Generate matrices A and B and distribute to the process grid
*
      CALL MATINIT( A, DESCA, B, DESCB )
*
*     Make a copy of A and B for checking purposes
*
      CALL PDLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
      CALL PDLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1, DESCB )
*
*     CALL THE SCALAPACK ROUTINE
*     Solve the linear system A * X = B
*
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         WRITE( NOUT, FMT = 9999 )
         WRITE( NOUT, FMT = 9998 )M, N, NB
         WRITE( NOUT, FMT = 9997 )NPROW*NPCOL, NPROW, NPCOL
         WRITE( NOUT, FMT = 9996 )INFO
      END IF
*
*     Compute residual ||A * X  - B|| / ( ||X|| * ||A|| * eps * N )
*
      EPS = PDLAMCH( ICTXT, 'Epsilon' )
      ANORM = PDLANGE( 'I', N, N, A, 1, 1, DESCA, WORK )
      BNORM = PDLANGE( 'I', N, NRHS, B, 1, 1, DESCB, WORK )
      CALL PDGEMM( 'N', 'N', N, NRHS, N, ONE, A0, 1, 1, DESCA, B, 1, 1,
     $             DESCB, -ONE, B0, 1, 1, DESCB )
      XNORM = PDLANGE( 'I', N, NRHS, B0, 1, 1, DESCB, WORK )
      RESID = XNORM / ( ANORM*BNORM*EPS*DBLE( N ) )
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         IF( RESID.LT.10.0D+0 ) THEN
            WRITE( NOUT, FMT = 9995 )
            WRITE( NOUT, FMT = 9993 )RESID
         ELSE
            WRITE( NOUT, FMT = 9994 )
            WRITE( NOUT, FMT = 9993 )RESID
         END IF
      END IF
*
*     RELEASE THE PROCESS GRID
*     Free the BLACS context
*
      CALL BLACS_GRIDEXIT( ICTXT )
   10 CONTINUE
*
*     Exit the BLACS
*
      CALL BLACS_EXIT( 0 )
*
 9999 FORMAT( / 'ScaLAPACK Example Program #1 -- May 1, 1997' )
 9998 FORMAT( / 'Solving Ax=b where A is a ', I3, ' by ', I3,
     $      ' matrix with a block size of ', I3 )
 9997 FORMAT( 'Running on ', I3, ' processes, where the process grid',
     $      ' is ', I3, ' by ', I3 )
 9996 FORMAT( / 'INFO code returned by PDGESV = ', I3 )
 9995 FORMAT( /
     $   'According to the normalized residual the solution is correct.'
     $       )
 9994 FORMAT( /
     $ 'According to the normalized residual the solution is incorrect.'
     $       )
 9993 FORMAT( / '||A*x - b|| / ( ||x||*||A||*eps*N ) = ', 1P, E16.8 )
      STOP
      END
      SUBROUTINE MATINIT( AA, DESCA, B, DESCB )
*
*     MATINIT generates and distributes matrices A and B (depicted in
*     Figures 2.5 and 2.6) to a 2 x 3 process grid
*
*     .. Array Arguments ..
      INTEGER            DESCA( * ), DESCB( * )
      DOUBLE PRECISION   AA( * ), B( * )
*     ..
*     .. Parameters ..
      INTEGER            CTXT_, LLD_
      PARAMETER          ( CTXT_ = 2, LLD_ = 9 )
*     ..
*     .. Local Scalars ..
      INTEGER            ICTXT, MXLLDA, MYCOL, MYROW, NPCOL, NPROW
      DOUBLE PRECISION   A, C, K, L, P, S
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDINFO
*     ..
*     .. Executable Statements ..
*
      ICTXT = DESCA( CTXT_ )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
      S = 19.0D0
      C = 3.0D0
      A = 1.0D0
      L = 12.0D0
      P = 16.0D0
      K = 11.0D0
*
      MXLLDA = DESCA( LLD_ )
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         AA( 1 ) = S
         AA( 2 ) = -S
         AA( 3 ) = -S
         AA( 4 ) = -S
         AA( 5 ) = -S
         AA( 1+MXLLDA ) = C
         AA( 2+MXLLDA ) = C
         AA( 3+MXLLDA ) = -C
         AA( 4+MXLLDA ) = -C
         AA( 5+MXLLDA ) = -C
         AA( 1+2*MXLLDA ) = A
         AA( 2+2*MXLLDA ) = A
         AA( 3+2*MXLLDA ) = A
         AA( 4+2*MXLLDA ) = A
         AA( 5+2*MXLLDA ) = -A
         AA( 1+3*MXLLDA ) = C
         AA( 2+3*MXLLDA ) = C
         AA( 3+3*MXLLDA ) = C
         AA( 4+3*MXLLDA ) = C
         AA( 5+3*MXLLDA ) = -C
         B( 1 ) = 0.0D0
         B( 2 ) = 0.0D0
         B( 3 ) = 0.0D0
         B( 4 ) = 0.0D0
         B( 5 ) = 0.0D0
      ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.1 ) THEN
         AA( 1 ) = A
         AA( 2 ) = A
         AA( 3 ) = -A
         AA( 4 ) = -A
         AA( 5 ) = -A
         AA( 1+MXLLDA ) = L
         AA( 2+MXLLDA ) = L
         AA( 3+MXLLDA ) = -L
         AA( 4+MXLLDA ) = -L
         AA( 5+MXLLDA ) = -L
         AA( 1+2*MXLLDA ) = K
         AA( 2+2*MXLLDA ) = K
         AA( 3+2*MXLLDA ) = K
         AA( 4+2*MXLLDA ) = K
         AA( 5+2*MXLLDA ) = K
      ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.2 ) THEN
         AA( 1 ) = A
         AA( 2 ) = A
         AA( 3 ) = A
         AA( 4 ) = -A
         AA( 5 ) = -A
         AA( 1+MXLLDA ) = P
         AA( 2+MXLLDA ) = P
         AA( 3+MXLLDA ) = P
         AA( 4+MXLLDA ) = P
         AA( 5+MXLLDA ) = -P
      ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
         AA( 1 ) = -S
         AA( 2 ) = -S
         AA( 3 ) = -S
         AA( 4 ) = -S
         AA( 1+MXLLDA ) = -C
         AA( 2+MXLLDA ) = -C
         AA( 3+MXLLDA ) = -C
         AA( 4+MXLLDA ) = C
         AA( 1+2*MXLLDA ) = A
         AA( 2+2*MXLLDA ) = A
         AA( 3+2*MXLLDA ) = A
         AA( 4+2*MXLLDA ) = -A
         AA( 1+3*MXLLDA ) = C
         AA( 2+3*MXLLDA ) = C
         AA( 3+3*MXLLDA ) = C
         AA( 4+3*MXLLDA ) = C
         B( 1 ) = 1.0D0
         B( 2 ) = 0.0D0
         B( 3 ) = 0.0D0
         B( 4 ) = 0.0D0
      ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.1 ) THEN
         AA( 1 ) = A
         AA( 2 ) = -A
         AA( 3 ) = -A
         AA( 4 ) = -A
         AA( 1+MXLLDA ) = L
         AA( 2+MXLLDA ) = L
         AA( 3+MXLLDA ) = -L
         AA( 4+MXLLDA ) = -L
         AA( 1+2*MXLLDA ) = K
         AA( 2+2*MXLLDA ) = K
         AA( 3+2*MXLLDA ) = K
         AA( 4+2*MXLLDA ) = K
      ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.2 ) THEN
         AA( 1 ) = A
         AA( 2 ) = A
         AA( 3 ) = -A
         AA( 4 ) = -A
         AA( 1+MXLLDA ) = P
         AA( 2+MXLLDA ) = P
         AA( 3+MXLLDA ) = -P
         AA( 4+MXLLDA ) = -P
      END IF
      RETURN
      END
      SUBROUTINE SL_INIT( ICTXT, NPROW, NPCOL )
*
*     .. Scalar Arguments ..
      INTEGER            ICTXT, NPCOL, NPROW
*     ..
*
*  Purpose
*  =======
*
*  SL_INIT initializes an NPROW x NPCOL process grid using a row-major
*  ordering  of  the  processes. This routine retrieves a default system
*  context  which  will  include all available processes. In addition it
*  spawns the processes if needed.
*
*  Arguments
*  =========
*
*  ICTXT   (global output) INTEGER
*          ICTXT specifies the BLACS context handle identifying the
*          created process grid.  The context itself is global.
*
*  NPROW   (global input) INTEGER
*          NPROW specifies the number of process rows in the grid
*          to be created.
*
*  NPCOL   (global input) INTEGER
*          NPCOL specifies the number of process columns in the grid
*          to be created.
*
*  =====================================================================
*
*     .. Local Scalars ..
      INTEGER            IAM, NPROCS
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GET, BLACS_GRIDINIT, BLACS_PINFO,
     $                   BLACS_SETUP
*     ..
*     .. Executable Statements ..
*
*     Get starting information
*
      CALL BLACS_PINFO( IAM, NPROCS )
*
*     If machine needs additional set up, do it now
*
      IF( NPROCS.LT.1 ) THEN
         IF( IAM.EQ.0 )
     $      NPROCS = NPROW*NPCOL
         CALL BLACS_SETUP( IAM, NPROCS )
      END IF
*
*     Define process grid
*
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
*
      RETURN
*
*     End of SL_INIT
*
      END
***************************************************





More information about the Beowulf mailing list