src/examples/BENCHMARK
Tsunami benchmark
On fitzroy (NIWA’s IBM/AIX Power6 supercomputer), follow the installation instructions, then edit src/examples/tsunami.c and replace /home/popinet/terrain/etopo2 with, e.g., /hpcf/data/popinet/terrain/etopo2. Then do
cd src/examples
qcc -O2 -g -pg tsunami.c -o tsunami ../kdt/kdt.o -lm
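If the terrain path has to be changed on several checkouts, the edit can be scripted. The function below is a hypothetical helper (it is not part of Basilisk) and assumes the old path appears literally in tsunami.c:

```bash
# Hypothetical helper: replace every occurrence of one literal path with
# another in a source file. '|' is used as the sed delimiter so that the '/'
# in the paths needs no escaping; the paths must therefore not contain '|'
# (or '&', which sed treats specially in the replacement).
set_terrain_path () {
    local file="$1" old="$2" new="$3"
    # Write to a temporary file and move it back, because 'sed -i' is not
    # portable across sed implementations.
    sed "s|$old|$new|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Run it from src/examples before compiling, e.g.
set_terrain_path tsunami.c /home/popinet/terrain/etopo2 /hpcf/data/popinet/terrain/etopo2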
Create a LoadLeveler job script, e.g. tsunami.sh, containing
#!/bin/bash
#@ job_name = basilisk
#@ class = General
#@ job_type = serial
#@ output = tsunami.out
#@ error = tsunami.log
#@ account_no = HAFS1301
#@ wall_clock_limit = 45:00
#@ environment = COPY_ALL
#@ queue
./tsunami
Then submit it with
llsubmit tsunami.sh
Once the run has completed, check the last line of the output:
tail -n1 tsunami.out
# Quadtree, 2992 steps, 2095.75 CPU, 2260 real, 6.85e+04 points.step/s, 28 var
Note that the computational speed (in points.step/s) is computed using the real runtime, not the CPU time. For reference, the results on my system popinet-new (Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz) are
tail -n1 tsunami.out
# Quadtree, 2992 steps, 1190.36 CPU, 1216 real, 1.28e+05 points.step/s, 28 var
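To check the note above on a given summary line, the hypothetical awk-based helper below extracts the fields and also prints the average number of grid points per step implied by the speed. It assumes that points.step/s is the total number of points advanced over all steps divided by the real runtime:

```bash
# Hypothetical helper: read a summary line of the form
#   # Quadtree, 2992 steps, 2095.75 CPU, 2260 real, 6.85e+04 points.step/s, 28 var
# on stdin and print steps, CPU time, real time and speed, together with the
# average number of points per step, speed * real / steps (assumed meaning).
summary () {
    awk '{
        steps = $3; cpu = $5; real = $7; speed = $9
        printf "steps=%d cpu=%g real=%g speed=%g avg_points=%.0f\n",
               steps, cpu, real, speed, speed * real / steps
    }'
}
```

Usage: tail -n1 tsunami.out | summary. Applied to the two summary lines above, it gives avg_points=51741 (fitzroy) and avg_points=52021 (popinet-new). Since the speed is printed with only three significant digits, both runs are consistent with the same problem size of about 52,000 points, and the ratio of real times (2260/1216, about 1.86) reflects the difference in single-core speed.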
Profiling graphs
Profiling graphs can be generated with gprof2dot from the gmon.out file written when the -pg build is run:
gprof tsunami gmon.out | gprof2dot.py | dot -Tpng -o tsunami.dot.png
Parallel run
We first turn off the generation of movies, as they are expensive. To do this, edit tsunami.c and add
To use OpenMP on fitzroy, compile with
qcc -O2 -qsmp=omp tsunami.c -o tsunami ../kdt/kdt.o -lm
and submit the job with a script such as
#!/bin/bash
#@ job_name = basilisk
#@ class = General
#@ job_type = parallel
#@ node = 1
#@ tasks_per_node = 4
#@ task_affinity = core(1)
#@ output = tsunami-4.out
#@ error = tsunami-4.log
#@ account_no = HAFS1301
#@ wall_clock_limit = 30:00
#@ environment = COPY_ALL
#@ queue
OMP_NUM_THREADS=4 ./tsunami
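To repeat the run on other core counts (the fitzroy results below use 1, 4 and 8 cores), the job script can be generated rather than edited by hand. The hypothetical helper below copies the keywords of the script above and varies only the core count and file names; that setting tasks_per_node to the thread count is the right way to reserve cores for an OpenMP job on your LoadLeveler installation is an assumption carried over from that script:

```bash
# Hypothetical helper: write tsunami-N.sh, a LoadLeveler job script that runs
# the OpenMP build of tsunami with N threads on a single node. All keywords
# are copied from the 4-core script; only N and the file names change.
write_job () {
    local n="$1"
    cat > "tsunami-$n.sh" <<EOF
#!/bin/bash
#@ job_name = basilisk
#@ class = General
#@ job_type = parallel
#@ node = 1
#@ tasks_per_node = $n
#@ task_affinity = core(1)
#@ output = tsunami-$n.out
#@ error = tsunami-$n.log
#@ account_no = HAFS1301
#@ wall_clock_limit = 30:00
#@ environment = COPY_ALL
#@ queue
OMP_NUM_THREADS=$n ./tsunami
EOF
}

# Generate one script per core count.
for n in 1 4 8; do
    write_job "$n"
done
```

Each script is then submitted with llsubmit, e.g.
llsubmit tsunami-8.sh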
The results on fitzroy are:
1 core
# Quadtree, 2718 steps, 838.585 CPU, 921.3 real, 1.52e+05 points.step/s, 28 var
4 cores
# Quadtree, 2718 steps, 1030.61 CPU, 642.4 real, 2.17e+05 points.step/s, 28 var
8 cores
# Quadtree, 2718 steps, 1224.36 CPU, 536.5 real, 2.6e+05 points.step/s, 28 var
The results on popinet-new are:
1 core
# Quadtree, 2719 steps, 643.21 CPU, 647.8 real, 2.17e+05 points.step/s, 28 var
4 cores
# Quadtree, 2719 steps, 1398.57 CPU, 388.8 real, 3.62e+05 points.step/s, 28 var
The results on mesu (UPMC’s cluster) are:
1 core
# Quadtree, 2810 steps, 451.53 CPU, 451.9 real, 3.78e+05 points.step/s, 28 var
2 cores
# Quadtree, 2810 steps, 512.59 CPU, 315.4 real, 5.42e+05 points.step/s, 28 var
4 cores
# Quadtree, 2810 steps, 567.32 CPU, 229.8 real, 7.44e+05 points.step/s, 28 var
8 cores
# Quadtree, 2810 steps, 546.32 CPU, 147.1 real, 1.16e+06 points.step/s, 28 var
16 cores
# Quadtree, 2810 steps, 2236.47 CPU, 292.1 real, 5.85e+05 points.step/s, 28 var
32 cores
# Quadtree, 2810 steps, 5255.56 CPU, 361.4 real, 4.73e+05 points.step/s, 28 var
64 cores
# Quadtree, 2810 steps, 13266.5 CPU, 410.9 real, 4.16e+05 points.step/s, 28 var
The results on heyward.dalembert.upmc.fr are:
1 core
# Quadtree, 2810 steps, 898.34 CPU, 898.9 real, 1.89e+05 points.step/s, 28 var
8 cores
# Quadtree, 2810 steps, 1692.4 CPU, 393.6 real, 4.31e+05 points.step/s, 28 var
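The scaling in these results can be summarised by the speedup S = T_ref/T_p and the parallel efficiency E = S p_ref/p, where T_p is the real (wall-clock) time on p cores and the reference run uses p_ref cores (p_ref = 1 here, so E = S/p). The hypothetical awk helper below computes both from "cores real" pairs, taking the first line as the reference run, and is shown on the mesu timings:

```bash
# Hypothetical helper: read "cores real_time" pairs on stdin. The first line
# is the reference run (p_ref cores, time T_ref); for every line, print the
# speedup S = T_ref / T_p and the efficiency E = S * p_ref / p in percent.
scaling () {
    awk 'NR == 1 { p0 = $1; t0 = $2 }
         { s = t0 / $2
           printf "p = %2d: speedup %5.2f, efficiency %3.0f%%\n", $1, s, 100 * s * p0 / $1 }'
}

# Real (wall-clock) times on mesu, copied from the results above.
scaling <<EOF
1 451.9
2 315.4
4 229.8
8 147.1
16 292.1
32 361.4
64 410.9
EOF
```

On mesu this gives a maximum speedup of about 3.07 on 8 cores (efficiency 38%), after which the real time increases again. Applied to the fitzroy timings (1 921.3, 4 642.4, 8 536.5), the same helper gives speedups of 1.43 and 1.72 on 4 and 8 cores.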