[Beowulf] hang-up of HPC Challenge


Mikhail Kuzminsky kus at free.net
Wed Aug 20 10:52:51 PDT 2008


In message from Greg Lindahl <lindahl at pbm.com> (Tue, 19 Aug 2008 19:39:38 -0700):
>On Wed, Aug 20, 2008 at 03:45:43AM +0400, Mikhail Kuzminsky wrote:
>> For some localization of possible problem reason, I ran pure HPL test
>> instead of HPCC. HPL performs direct output to screen instead of writing
>> to the file.
>>
>> Using MPICH w/np=8 I obtained normal HPL result for N=35000 - including
>> 3 "PASSED" strings for ||Ax-b|| calculations. BUT ! Linux hang-ups
>> immediately after output of this strings.
>
>Well, what did your configuration file tell HPL to do? Does it have
>another test, perhaps a bigger one, or is it supposed to exit? We
>aren't mind-readers.

Sorry, I should have given the details. I have now done two HPL runs with the same N=10000:

(1st) - a "single" HPL run, i.e. one N=10000, one blocksize value, and a single value for every other HPL.dat parameter.

(2nd) - a "multiple" HPL run with the same (single) N=10000 and blocksize=100, but with sets of values for PFACT etc. (see the output below).

The 1st run finished successfully; the 2nd led to a Linux hang-up.

Yours
Mikhail 

"single" HPL run :
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :     100
PMAP   : Row-major process mapping
P      :       2
Q      :       4
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 16 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
    1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
    2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
    3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       10000   100     2     4              23.32          2.859e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0767386 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0181586 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0040588 ...... PASSED
============================================================================

Finished      1 tests with the following results:
               1 tests completed and passed residual checks,
               0 tests completed and failed residual checks,
               0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
[1]+  Done                    mpirun -np 8 xhpl
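
As a quick sanity check on the rate reported above, the Gflops figure can be reproduced from N and the time, assuming the usual LU operation count of 2/3*N^3 + 3/2*N^2 floating-point operations. The function name hpl_gflops is made up for this sketch, and the printed time is rounded to two decimals, so agreement is only expected to about three significant digits.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Rate implied by the standard LU operation count (an assumption here)."""
    flops = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return flops / seconds / 1.0e9


# N and Time are taken from the WR11C2R4 line of the "single" run.
print(f"{hpl_gflops(10000, 23.32):.2f} Gflops")
```

The value comes out close to the 2.859e+01 printed by HPL, so the rate itself looks normal for this run.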

"multiple" HPL run:
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :     100
PMAP   : Row-major process mapping
P      :       2
Q      :       4
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 16 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
    1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
    2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
    3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L2       10000   100     2     4              23.02          2.897e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0051885 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L4       10000   100     2     4              22.97          2.903e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0832258 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0196937 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0044019 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2C2       10000   100     2     4              22.95          2.905e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0051885 ...... PASSED

... and here Linux hangs ...
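
For reference, the T/V labels above appear to follow the pattern W, R, DEPTH, BCAST code, RFACT initial, NDIV, PFACT initial, NBMIN, and the three printed labels suggest that NBMIN is varied fastest and PFACT next. Under that reading (inferred only from the output above, not taken from the HPL sources), a short Python sketch lists the combinations the "multiple" run was asked to perform. The position of RFACT and NDIV in the loop order, and the helper name variants, are assumptions.

```python
from itertools import product

# Values copied from the header of the "multiple" run.
DEPTH = [0]
BCAST = [0]                # code shown in the T/V column for "1ring"
RFACT = ["L", "C", "R"]    # Left, Crout, Right
NDIV = [2]
PFACT = ["L", "C", "R"]    # Left, Crout, Right
NBMIN = [2, 4]


def variants():
    # product() varies the last iterable fastest, so NBMIN is the
    # innermost loop and PFACT the next one out (assumed ordering).
    for d, b, rf, nd, pf, nbm in product(DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN):
        yield f"WR{d}{b}{rf}{nd}{pf}{nbm}"


names = list(variants())
print(len(names), "tests requested")
print("first four:", names[:4])
```

With the parameter sets above this gives 18 combinations, and the first three labels match the ones printed before the hang. If the assumed order is right, the test that never reported would be WR00L2C4.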


>
>-- greg
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit 
>http://www.beowulf.org/mailman/listinfo/beowulf



