[Beowulf] HPL as a learning experience

Carsten Aulbert carsten.aulbert at aei.mpg.de
Tue Mar 16 08:27:30 PDT 2010


Hi all,

I wanted to run High Performance Linpack (HPL), mostly for fun (and of course 
to learn more about it and to stress-test a couple of machines). However, so 
far I've had very mixed results.

I downloaded the 2.0 version released in September 2008 and managed to 
compile it with mpich 1.2.7 on Debian Lenny. The resulting xhpl binary is 
dynamically linked like this:

        linux-vdso.so.1 =>  (0x00007fffca372000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007fb47bca8000)
        librt.so.1 => /lib/librt.so.1 (0x00007fb47ba9f000)
        libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0x00007fb47b7c4000)
        libm.so.6 => /lib/libm.so.6 (0x00007fb47b541000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fb47b32a000)
        libc.so.6 => /lib/libc.so.6 (0x00007fb47afd7000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb47bec4000)

Then I wanted to run a couple of tests on a single quad-CPU node (with 12 GB 
of physical RAM). I used

http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html

to generate input files for a single-core and a dual-core test, [1] and [2].
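
For reference, the rule of thumb that calculator seems to implement: size the 
matrix so its N x N doubles fill a chosen fraction of physical RAM, then 
round N down to a multiple of NB. A minimal sketch in C (the 80% fraction is 
my assumption; my N = 14592 is deliberately much smaller, about 1.7 GB):

#include <math.h>
#include <stdio.h>

/* Rule-of-thumb HPL problem size: the matrix (N*N doubles) should
   fill a chosen fraction of physical RAM, and N should be a
   multiple of the block size NB. Compile with -lm. */
int main(void)
{
    double mem_bytes = 12.0 * 1024 * 1024 * 1024; /* 12 GB node */
    double fraction  = 0.8;  /* assumed: leave ~20% for OS/MPI */
    long   nb        = 128;  /* block size from HPL.dat */

    long n = (long)sqrt(fraction * mem_bytes / 8.0); /* 8 B/double */
    n -= n % nb;             /* round down to a multiple of NB */

    printf("N = %ld (matrix ~%.1f GB)\n",
           n, (double)n * n * 8.0 / (1024.0 * 1024 * 1024));
    return 0;
}

For 12 GB and NB = 128 this suggests N = 35840; I stayed far below that for 
these first runs.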

Starting the single-core run poses no problem:
/usr/bin/mpirun.mpich -np 1  -machinefile machines /nfs/xhpl

where machines is simply a file containing the hostname of this node four 
times, one per line. So far, so good.
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       14592   128     1     1             407.94          5.078e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0087653 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0209927 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0045327 ...... PASSED
============================================================================
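
As a quick cross-check of that number: HPL reports Gflops as 
(2/3*N^3 + 3/2*N^2)/time, and plugging in the values above reproduces the 
5.078:

#include <stdio.h>

/* Cross-check HPL's reported Gflops: HPL divides the LU operation
   count (2/3)*N^3 + (3/2)*N^2 by the wall-clock time. */
int main(void)
{
    double n = 14592.0, t = 407.94;           /* from the run above */
    double flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n;
    printf("%.3f Gflops\n", flops / t / 1e9);  /* prints 5.078 */
    return 0;
}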

When starting the dual-core run, I receive the following error message after 
a couple of seconds (right after the RSS of each process reaches its VIRT 
value in top):

/usr/bin/mpirun.mpich -np 2  -machinefile machines /nfs/xhpl
p0_20535:  p4_error: interrupt SIGSEGV: 11
rm_l_1_20540: (1.804688) net_send: could not write to fd=5, errno = 32
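
The matrix itself should be nowhere near exhausting memory here - a rough 
estimate of the per-process footprint (ignoring HPL workspace and MPI 
buffers):

#include <stdio.h>

/* Rough per-process footprint of the HPL matrix for the dual-core
   run: the N*N doubles are block-cyclically spread over the P*Q
   process grid. Ignores HPL workspace and MPI buffers. */
int main(void)
{
    double n = 14592.0, p = 1.0, q = 2.0;     /* from HPL.dat [2] */
    double total = n * n * 8.0;                /* bytes */
    printf("matrix: %.2f GB total, %.2f GB per process\n",
           total / (1 << 30), total / (p * q) / (1 << 30));
    return 0;
}

That is only about 0.8 GB per process on a 12 GB node, so plain memory 
exhaustion from the matrix alone seems unlikely.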

SIGSEGV with p4_error indicates a segfault within HPL itself - that's as far 
as I've got with Google, but right now I have no idea how to proceed. I 
somehow doubt that this venerable program is so buggy that I'd hit a crash on 
my very first day ;)
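
One thing I still want to rule out is the MPI layer itself: a trivial 
two-process MPI program, launched with the same mpirun.mpich and machinefile, 
should tell HPL and mpich apart. A minimal sketch (plain MPI-1 calls, so it 
should build with the mpicc from mpich 1.2.7):

#include <mpi.h>
#include <stdio.h>

/* Trivial MPI sanity check: each rank reports itself.
   Build: mpicc -o mpi_hello mpi_hello.c
   Run:   mpirun.mpich -np 2 -machinefile machines ./mpi_hello */
int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

If this also dies with a p4_error, then HPL is off the hook and the problem 
is in my MPI setup.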

Any ideas where I might be going wrong?

Cheers

Carsten

[1]
single core test
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
14592         Ns
1            # of NBs
128           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

[2]
dual core setup
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
14592         Ns
1            # of NBs
128           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB


