AMATA cluster performance

gaiz gaiz at se-ed.net
Sat Jul 13 21:16:51 PDT 2002


Dear All, 
I benchmarked the AMATA cluster using HPL linked with ATLAS. The hardware configuration is as
follows:
1. Athlon MP 2000+ dual processor 1 node (2 processor) 1GB RAM 
2. Athlon MP 1800+ dual processor 5 nodes (10 processors) 1GB RAM 
3. Classic Athlon 1000 MHz 6 nodes (6 processors) 512 MB RAM
4. Athlon Thunderbird 1000 MHz 8 nodes (8 processors) 512 MB RAM
All nodes are connected by a Fast Ethernet switch.
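As a quick cross-check of the node list, the tally below counts nodes, processors and RAM. It is only a sketch: the tuple layout is my own, and treating "1GB" as 1024 MB is an assumption. The 26 processors it finds match the P = 2 by Q = 13 process grid used in the HPL run.

```python
# Node groups from the hardware list: (description, nodes, CPUs per node, RAM per node in MB).
# Assumption: "1GB" is taken as 1024 MB.
NODE_GROUPS = [
    ("Athlon MP 2000+ (dual)", 1, 2, 1024),
    ("Athlon MP 1800+ (dual)", 5, 2, 1024),
    ("Classic Athlon 1000 MHz", 6, 1, 512),
    ("Athlon Thunderbird 1000 MHz", 8, 1, 512),
]


def cluster_totals(groups):
    """Return (total nodes, total processors, total RAM in MB)."""
    nodes = sum(count for _, count, _, _ in groups)
    cpus = sum(count * per_node for _, count, per_node, _ in groups)
    ram_mb = sum(count * ram for _, count, _, ram in groups)
    return nodes, cpus, ram_mb


nodes, cpus, ram_mb = cluster_totals(NODE_GROUPS)
print(f"{nodes} nodes, {cpus} processors, {ram_mb} MB RAM")
print("matches P x Q = 2 x 13:", cpus == 2 * 13)
```

Note that every processor ends up with about 512 MB: the dual nodes split 1 GB between two CPUs.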
Tuning list:
1. All unnecessary services were stopped. The remaining services are xinetd, NFS and
the kernel daemons.
2. The problem size is 36400, a multiple of 26 (the total number of processors).
3. The other HPL parameters were chosen from test runs.
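The problem-size choice can be checked with a little arithmetic: HPL stores a dense N x N matrix of 8-byte doubles, so its footprint grows as 8N^2. The sketch below confirms that 36400 is a multiple of 26 and compares the matrix with the cluster's total RAM (13312 MB, assuming 1 GB = 1024 MB). The helper `largest_n` and its 80% memory fraction are hypothetical, a rule-of-thumb knob for illustration only, not an HPL parameter.

```python
import math

N = 36400             # problem size used in the post
NPROCS = 26           # P * Q = 2 * 13
TOTAL_RAM_MB = 13312  # sum of the node list, assuming 1 GB = 1024 MB


def matrix_mb(n):
    """Size of a dense n x n matrix of 8-byte doubles, in MB (2**20 bytes)."""
    return n * n * 8 / 2**20


def largest_n(step, ram_mb, fraction):
    """Hypothetical helper: largest multiple of `step` whose matrix fits in
    `fraction` of `ram_mb`. `fraction` is an assumed tuning knob."""
    budget_doubles = int(fraction * ram_mb * 2**20) // 8
    n = math.isqrt(budget_doubles)  # largest n with n*n <= budget
    return n - n % step             # round down to a multiple of step


print(N % NPROCS == 0, N // NPROCS)
for n in (36400, 39000):
    print(n, f"{matrix_mb(n):.0f} MB, {matrix_mb(n) / TOTAL_RAM_MB:.0%} of total RAM")
print("largest multiple of 26 at 80% of RAM:", largest_n(NPROCS, TOTAL_RAM_MB, 0.8))
```

For the two sizes in the post this gives roughly 76% and 87% of the cluster's total RAM. The total alone understates the pressure, since each process only sees its own node's memory.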
The best measured performance is 16.81 Gflops at problem size 36400.
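HPL's Gflops column is its operation count for the solve, 2/3 N^3 + 3/2 N^2, divided by the wall time. The sketch below recomputes the rate from the N and Time in the output file; the per-processor figure (dividing by the 26 processors) is my addition and is not something HPL prints.

```python
def hpl_gflops(n, seconds):
    """HPL-style rate: (2/3 n^3 + 3/2 n^2) floating-point operations per second, in Gflops."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


rate = hpl_gflops(36400, 1912.70)  # N and Time from HPL.out.5
print(f"{rate:.2f} Gflops total, {rate / 26:.3f} Gflops per processor")
```

It reproduces the 16.81 Gflops figure, or roughly 0.65 Gflops per processor averaged over the mix of faster and slower CPUs.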

I also tried N = 39000, but the nodes ran into memory thrashing.
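The thrashing is easier to see per process than per cluster. Assuming one MPI process per CPU, every process has about 512 MB (two processes in a 1 GB dual node, one in a 512 MB single node). HPL spreads the matrix block-cyclically over the 2 x 13 grid with NB = 90, so each process's share can be estimated with the same counting as ScaLAPACK's NUMROC routine. This sketch ignores the extra right-hand-side column, HPL's panel workspace, MPI buffers and the OS, so real usage is somewhat higher.

```python
def numroc(n, nb, iproc, nprocs, isrcproc=0):
    """Rows (or columns) of an n-long dimension, split into nb-sized blocks,
    owned by process iproc under a block-cyclic layout (NUMROC-style counting)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    num = (nblocks // nprocs) * nb   # full rounds of blocks
    extra = nblocks % nprocs         # leftover full blocks
    if mydist < extra:
        num += nb
    elif mydist == extra:
        num += n % nb                # the final partial block
    return num


def max_local_mb(n, nb=90, p=2, q=13):
    """Largest per-process share of the n x n matrix, in MB of 8-byte doubles."""
    rows = max(numroc(n, nb, i, p) for i in range(p))
    cols = max(numroc(n, nb, j, q) for j in range(q))
    return rows * cols * 8 / 2**20


for n in (36400, 39000):
    print(n, f"{max_local_mb(n):.0f} MB of ~512 MB per process")
```

The largest share grows from about 400 MB at N = 36400 to about 456 MB at N = 39000, which leaves little of a 512 MB node for everything else and is consistent with the thrashing reported above.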


This is HPL.out.5, the output file from HPL:
============================================================================
HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   36400 
NB     :      90 
P      :       2 
Q      :      13 
PFACT  :   Crout 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 60)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W11C2C4        36400    90     2    13            1912.70          1.681e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0295859 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0115124 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0019131 ...... PASSED
============================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
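The T/V code packs the algorithm choices into one string. Read against the parameter list in the output, W11C2C4 lines up as wall time, DEPTH 1, BCAST 1ringM (encoded as 1), RFACT Crout, NDIV 2, PFACT Crout, NBMIN 4. The parser below is a sketch that assumes that field order and one-character codes; the regular expressions are my own, tailored to this output format, and are not part of HPL.

```python
import re

# Result and residual lines copied from HPL.out.5.
OUTPUT = """\
W11C2C4        36400    90     2    13            1912.70          1.681e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0295859 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0115124 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0019131 ...... PASSED
"""

# One result row: T/V, N, NB, P, Q, Time, Gflops.
RESULT_RE = re.compile(
    r"^([WC]\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.eE+-]+)$", re.M
)
# One residual check: scaled residual value and verdict.
RESIDUAL_RE = re.compile(r"=\s+([\d.eE+-]+)\s+\.+\s+(PASSED|FAILED)")

FACT = {"L": "Left", "C": "Crout", "R": "Right"}
BCAST = ["1ring", "1ringM", "2ring", "2ringM", "Long", "LongM"]


def decode_variant(tv):
    """Decode a T/V code, assuming the field order
    time, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN (NBMIN takes the rest)."""
    return {
        "time": "wall" if tv[0] == "W" else "cpu",
        "depth": int(tv[1]),
        "bcast": BCAST[int(tv[2])],
        "rfact": FACT[tv[3]],
        "ndiv": int(tv[4]),
        "pfact": FACT[tv[5]],
        "nbmin": int(tv[6:]),
    }


tv, n, nb, p, q, secs, gflops = RESULT_RE.search(OUTPUT).groups()
residuals = [(float(v), verdict) for v, verdict in RESIDUAL_RE.findall(OUTPUT)]

print(decode_variant(tv))
print(f"N={n} NB={nb} grid={p}x{q} time={float(secs)} s rate={float(gflops)} Gflops")
print("all residuals < 16.0:", all(v < 16.0 and s == "PASSED" for v, s in residuals))
```

The 16.0 threshold is the one HPL states in the output header for the scaled residual checks.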