[Beowulf] Need guidelines for NASA's NAS Parallel Benchmarks

Sangamesh B forum.san at gmail.com
Sat Jul 12 04:56:22 PDT 2008


Dear all,

  This is the first time I am benchmarking a system: a quad-processor,
quad-core Intel machine running 64-bit RHEL 5.

After unpacking the NAS NPB package, NPB3.3.tar.gz, I got the following
directories:

Changes.log  NPB3.3-HPF.README  NPB3.3-JAV.README  NPB3.3-MPI  NPB3.3-OMP
NPB3.3-SER  README

I need to run both the MPI and OpenMP versions of the benchmarks.
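
For the OpenMP version, my understanding is that the procedure is simply the
following, after setting the Fortran compiler and OpenMP flag in
config/make.def (the make target and the bt.S.x binary name are my guesses
from the bin/ directory layout, so please correct me if they are wrong):

[root at test NPB3.3-OMP]# make BT CLASS=S
[root at test NPB3.3-OMP]# export OMP_NUM_THREADS=4
[root at test NPB3.3-OMP]# ./bin/bt.S.x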

I ran some initial benchmarks from NPB3.3-MPI on a test machine. NPB3.3-MPI provides:

BENCHMARK NAME [9]     CLASS [7]     TYPE [4]

   BT                     S            FULL
   CG                     W            SIMPLE
   DT                     A            FORTRAN
   EP                     B            EPIO
   FT                     C
   IS                     D
   LU                     E
   MG
   SP


So the total number of benchmark combinations would be 9 * 7 * 4 = 252.

Do I need to run all 252 of them?
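
If so, rather than typing out 252 separate make commands, I was hoping to use
the suite build, something like the following (assuming I have read
config/suite.def.template correctly; the entries shown are only an
illustration):

[root at test NPB3.3-MPI]# cp config/suite.def.template config/suite.def
[root at test NPB3.3-MPI]# cat config/suite.def
# benchmark  class  nprocs
bt  S  4
bt  W  4
cg  S  4
cg  W  4
[root at test NPB3.3-MPI]# make suite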

A sample benchmark:

[root at test NPB3.3-MPI]# make BT NPROCS=4 CLASS=S SUBTYPE=full VERSION=VEC

Since this benchmark was run on a test machine (a dual-processor, dual-core
AMD64 Opteron box), I used MPICH2 and the GNU compilers.
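
In config/make.def I set essentially the following (these also show up in the
compile options printed at the end of the successful run further below):

MPIF77     = /opt/libs/mpi/mpich2/1.0.6p1/bin/mpif77
FLINK      = $(MPIF77)
FFLAGS     = -O
FLINKFLAGS = -O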

To run the benchmark, I used the sample input data file given with NPB:

[root at test btbin]# mpdtrace -l
test_33638 (10.1.1.1)
[root at test btbin]# mpiexec -np 4 ./bt.S.4.mpi_io_full ./inputbt.data


 NAS Parallel Benchmarks 3.3 -- BT Benchmark

 Reading from input file inputbt.data
 collbuf_nodes  0
 collbuf_size   1000000
 Size:   64x  64x  64
 Iterations:  200    dt:   0.0008000
 Number of active processes:     4

 BTIO -- FULL MPI-IO write interval:   5

 0 1 32 32 32
  Problem size too big for compiled array sizes
 1 1 32 32 32
  Problem size too big for compiled array sizes
 2 1 32 32 32
  Problem size too big for compiled array sizes
 3 1 32 32 32
  Problem size too big for compiled array sizes
[2] 48 at [0x00000000006c1088], mpid_vc.c[62]
[0] 48 at [0x00000000006be4b8], mpid_vc.c[62]
[1] 48 at [0x00000000006bfdf8], mpid_vc.c[62]
[3] 48 at [0x00000000006c1088], mpid_vc.c[62]
[root at test btbin]#

It looks like the run was not successful. I am guessing the 64x64x64 problem
size in inputbt.data is too big for the class S array sizes the binary was
compiled with, but what exactly is wrong? The input file contains:

[root at test btbin]# cat inputbt.data
200       number of time steps
0.0008d0  dt for class A = 0.0008d0. class B = 0.0003d0  class C = 0.0001d0
64 64 64
5 0        write interval (optional read interval) for BTIO
0 1000000  number of nodes in collective buffering and buffer size for BTIO
[root at test btbin]#
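
If my reading of the error is right, I suppose I either have to rebuild BT for
the class that matches this input file, e.g.

[root at test NPB3.3-MPI]# make BT NPROCS=4 CLASS=A SUBTYPE=full VERSION=VEC

or shrink the values in inputbt.data down to what the class S binary was
compiled for.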

As this is the first time I am doing these benchmarks, I have no idea how to
prepare a new input file.

What parameters should be changed? How will these values affect the benchmark
results?
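
For instance, if I understand the file format, would an inputbt.data matched
to the class S binary just copy the compiled defaults reported by the dry run
further below (60 steps, dt 0.01, size 12x12x12), like this?

60         number of time steps
0.01d0     dt for class S
12 12 12
5 0        write interval (optional read interval) for BTIO
0 1000000  number of nodes in collective buffering and buffer size for BTIO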

Is it OK if I just run

[root at test btbin]# mpiexec -np 4 ./bt.S.4.mpi_io_full

without using any input file?  The output of the above dry run is:

[root at test btbin]# mpiexec -np 4 ./bt.S.4.mpi_io_full


 NAS Parallel Benchmarks 3.3 -- BT Benchmark

 No input file inputbt.data. Using compiled defaults
 Size:   12x  12x  12
 Iterations:   60    dt:   0.0100000
 Number of active processes:     4

 BTIO -- FULL MPI-IO write interval:   5

 Time step    1
 Writing data set, time step 5
 Writing data set, time step 10
 Writing data set, time step 15
 Time step   20
 Writing data set, time step 20
 Writing data set, time step 25
 Writing data set, time step 30
 Writing data set, time step 35
 Time step   40
 Writing data set, time step 40
 Writing data set, time step 45
 Writing data set, time step 50
 Writing data set, time step 55
 Time step   60
 Writing data set, time step 60
 Reading data set  1
 Reading data set  2
 Reading data set  3
 Reading data set  4
 Reading data set  5
 Reading data set  6
 Reading data set  7
 Reading data set  8
 Reading data set  9
 Reading data set  10
 Reading data set  11
 Reading data set  12
 Verification being performed for class S
 accuracy setting for epsilon =  0.1000000000000E-07
 Comparison of RMS-norms of residual
           1 0.1703428370954E+00 0.1703428370954E+00 0.6680519237820E-14
           2 0.1297525207005E-01 0.1297525207003E-01 0.9351949888112E-12
           3 0.3252792698950E-01 0.3252792698949E-01 0.4859455174690E-12
           4 0.2643642127515E-01 0.2643642127517E-01 0.7155062549945E-12
           5 0.1921178413174E+00 0.1921178413174E+00 0.9101712010679E-14
 Comparison of RMS-norms of solution error
           1 0.1149036328945E+02 0.1149036328945E+02 0.4854294277047E-13
           2 0.9156788904727E+00 0.9156788904727E+00 0.4195107810359E-13
           3 0.2857899428614E+01 0.2857899428614E+01 0.9649723729104E-13
           4 0.2598273346734E+01 0.2598273346734E+01 0.1391264769245E-12
           5 0.2652795397547E+02 0.2652795397547E+02 0.3629324024933E-13
 Verification Successful

 BTIO -- statistics:
   I/O timing in seconds   :           0.02
   I/O timing percentage   :          16.06
   Total data written (MB) :           0.83
   I/O data rate  (MB/sec) :          49.85
[0] 712 at [0x00000000006d7b98], dataloop.c[505]
[0] 296 at [0x00000000006d79c8], dataloop.c[324]
[0] 288 at [0x00000000006d7398], dataloop.c[324]
[0] 648 at [0x00000000006d7698], dataloop.c[324]
[0] 296 at [0x00000000006d71c8], dataloop.c[324]
[0] 288 at [0x00000000006d6ff8], dataloop.c[324]
[0] 56 at [0x00000000006bf458], mpid_datatype_contents.c[62]
[0] 72 at [0x00000000006bee68], mpid_datatype_contents.c[62]
[0] 72 at [0x00000000006bf538], mpid_datatype_contents.c[62]
[0] 864 at [0x00000000006bea58], dataloop.c[505]
[0] 368 at [0x00000000006d6b18], dataloop.c[324]
[0] 368 at [0x00000000006d68f8], dataloop.c[324]
[0] 648 at [0x00000000006d65c8], dataloop.c[324]
[0] 368 at [0x00000000006d63a8], dataloop.c[324]
[0] 368 at [0x00000000006bf238], dataloop.c[324]
[0] 56 at [0x00000000006be778], mpid_datatype_contents.c[62]
[0] 80 at [0x00000000006be958], mpid_datatype_contents.c[62]
[0] 80 at [0x00000000006be858], mpid_datatype_contents.c[62]
[0] 72 at [0x00000000006be688], dataloop.c[324]
[0] 72 at [0x000000000[1] 720 at [0x00000000006d7b98], dataloop.c[505]
[1] 296 at [0x00000000006d79c8], dataloop.c[324]
[1] 296 at [0x00000000006d7398], dataloop.c[324]
[1] 648 at [0x00000000006d7698], dataloop.c[324]
[1] 296 at [0x00000000006d71c8], dataloop.c[324]
[1] 296 at [0x00000000006d6ff8], dataloop.c[324]
[1] 56 at [0x00000000006c0d98], mpid_datatype_contents.c[62]
[1] 72 at [0x00000000006c07a8], mpid_datatype_contents.c[62]
[1] 72 at [0x00000000006c0e78], mpid_datatype_contents.c[62]
[1] 864 at [0x00000000006c0398], dataloop.c[505]
[1] 368 at [0x00000000006d6b18], dataloop.c[324]
[1] 368 at [0x00000000006d68f8], dataloop.c[324]
[1] 648 at [0x00000000006d65c8], dataloop.c[324]
[1] 368 at [0x00000000006d63a8], dataloop.c[324]
[1] 368 at [0x00000000006c0b78], dataloop.c[324]
[1] 56 at [0x00000000006c00b8], mpid_datatype_contents.c[62]
[1] 80 at [0x00000000006c0298], mpid_datatype_contents.c[62]
[1] 80 at [0x00000000006c0198], mpid_datatype_contents.c[62]
[1] 72 at [0x00000000006bffc8], dataloop.c[324]
[1] 72 at [0x000000000[2] 720 at [0x00000000006d7b28], dataloop.c[505]
[2] 296 at [0x00000000006d7958], dataloop.c[324]
[2] 296 at [0x00000000006d6c98], dataloop.c[324]
[2] 648 at [0x00000000006d7628], dataloop.c[324]
[2] 296 at [0x00000000006d7158], dataloop.c[324]
[2] 296 at [0x00000000006d6f88], dataloop.c[324]
[2] 56 at [0x00000000006d5b98], mpid_datatype_contents.c[62]
[2] 72 at [0x00000000006d5d68], mpid_datatype_contents.c[62]
[2] 72 at [0x00000000006d5c78], mpid_datatype_contents.c[62]
[2] 864 at [0x00000000006d5788], dataloop.c[505]
[2] 368 at [0x00000000006d68f8], dataloop.c[324]
[2] 368 at [0x00000000006d66d8], dataloop.c[324]
[2] 648 at [0x00000000006d63a8], dataloop.c[324]
[2] 368 at [0x00000000006d6188], dataloop.c[324]
[2] 368 at [0x00000000006d5f68], dataloop.c[324]
[2] 56 at [0x00000000006c1348], mpid_datatype_contents.c[62]
[2] 80 at [0x00000000006d5688], mpid_datatype_contents.c[62]
[2] 80 at [0x00000000006c1428], mpid_datatype_contents.c[62]
[2] 72 at [0x00000000006c1258], dataloop.c[324]
[2] 72 at [0x00000000006be598], dataloop.c[324]
[0] 32 at [0x00000000006be318], mpid_datatype_contents.c[62]
[0] 48 at [0x00000000006be4b8], mpid_vc.c[62]
06bfed8], dataloop.c[324]
[1] 32 at [0x00000000006bfd28], mpid_datatype_contents.c[62]
[1] 48 at [0x00000000006bfdf8], mpid_vc.c[62]
[3] 720 at [0x00000000006d7b28], dataloop.c[505]
[3] 296 at [0x00000000006d7958], dataloop.c[324]
[3] 296 at [0x00000000006d6c98], dataloop.c[324]
[3] 648 at [0x00000000006d7628], dataloop.c[324]
[3] 296 at [0x00000000006d7158], dataloop.c[324]
[3] 296 at [0x00000000006d6f88], dataloop.c[324]
[3] 56 at [0x00000000006d5b98], mpid_datatype_contents.c[62]
[3] 72 at [0x00000000006d5d68], mpid_datatype_contents.c[62]
[3] 72 at [0x00000000006d5c78], mpid_datatype_contents.c[62]
[3] 864 at [0x00000000006d5788], dataloop.c[505]
[3] 368 at [0x00000000006d68f8], dataloop.c[324]
[3] 368 at [0x00000000006d66d8], dataloop.c[324]
[3] 648 at [0x00000000006d63a8], dataloop.c[324]
[3] 368 at [0x00000000006d6188], dataloop.c[324]
[3] 368 at [0x00000000006d5f68], dataloop.c[324]
[3] 56 at [0x00000000006c1348], mpid_datatype_contents.c[62]
[3] 80 at [0x00000000006d5688], mpid_datatype_contents.c[6206c1168],
dataloop.c[324]
[2] 32 at [0x00000000006c0fb8], mpid_datatype_contents.c[62]
[2] 48 at [0x00000000006c1088], mpid_vc.c[62]
]
[3] 80 at [0x00000000006c1428], mpid_datatype_contents.c[62]
[3] 72 at [0x00000000006c1258], dataloop.c[324]
[3] 72 at [0x00000000006c1168], dataloop.c[324]
[3] 32 at [0x00000000006c0fb8], mpid_datatype_contents.c[62]
[3] 48 at [0x00000000006c1088], mpid_vc.c[62]


 BT Benchmark Completed.
 Class           =                        S
 Size            =             12x  12x  12
 Iterations      =                       60
 Time in seconds =                     0.10
 Total processes =                        4
 Compiled procs  =                        4
 Mop/s total     =                  2204.23
 Mop/s/process   =                   551.06
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                      3.3
 Compile date    =              12 Jul 2008

 Compile options:
    MPIF77       = /opt/libs/mpi/mpich2/1.0.6p1/bin/mpif77
    FLINK        = $(MPIF77)
    FMPI_LIB     = (none)
    FMPI_INC     = (none)
    FFLAGS       = -O
    FLINKFLAGS   = -O
    RAND         = (none)


 Please send the results of this run to:

 NPB Development Team
 Internet: npb at nas.nasa.gov

 If email is not available, send this to:

 MS T27A-1
 NASA Ames Research Center
 Moffett Field, CA  94035-1000

 Fax: 650-604-3957


[root at test btbin]#

Does anyone on this list have experience with the NAS Parallel Benchmarks? If
so, please give me some guidelines on how to do these benchmarks properly.
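
In case it helps anyone advise me, my current plan is to drive the MPI runs
with a small wrapper like the one below (only a sketch; the bin/ directory and
the executable names are whatever make produced here, and the results/
directory is made up):

#!/bin/sh
# Run every 4-process NPB-MPI binary found in bin/ and keep one log per run.
cd NPB3.3-MPI/bin || exit 1
mkdir -p ../results
for exe in *.4 *.4.*; do
    # Names look like bt.S.4 or bt.S.4.mpi_io_full; skip unmatched globs.
    [ -x "$exe" ] || continue
    mpiexec -np 4 ./"$exe" > ../results/"$exe".log 2>&1
done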

I need to produce the benchmark results within three days. Can this be done?

Thanks in advance,
Sangamesh