src/examples/BENCHMARK

    Tsunami benchmark

    On fitzroy (NIWA’s IBM/AIX Power6 supercomputer), follow the installation instructions, then edit src/examples/tsunami.c and replace /home/popinet/terrain/etopo2 with e.g. /hpcf/data/popinet/terrain/etopo2, then do

    cd src/examples
    qcc -O2 -g -pg tsunami.c -o tsunami ../kdt/kdt.o -lm

    Create e.g. tsunami.sh as

    #!/bin/bash
    
    #@ job_name         = basilisk
    #@ class            = General
    #@ job_type         = serial
    #@ output           = tsunami.out
    #@ error            = tsunami.log
    #@ account_no       = HAFS1301
    #@ wall_clock_limit = 45:00
    #@ environment      = COPY_ALL
    #@ queue
    
    ./tsunami

    then do

    llsubmit tsunami.sh

    After completion of the run do

    tail -n1 tsunami.out
    # Quadtree, 2992 steps, 2095.75 CPU, 2260 real, 6.85e+04 points.step/s, 28 var

    Note that the computational speed (in points.step/s) is computed using the real runtime (not the CPU time). For reference, the results on my system popinet-new (Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz) are

    tail -n1 tsunami.out
    # Quadtree, 2992 steps, 1190.36 CPU, 1216 real, 1.28e+05 points.step/s, 28 var

    Profiling graphs

    They can be generated using gprof2dot

    gprof tsunami gmon.out | gprof2dot.py | dot -Tpng -o tsunami.dot.png

    Parallel run

    We will first turn off the generation of movies as they are expensive. To do this edit tsunami.c and add

    #if 0
    event movies (t++) {
    ...
    #endif
    event do_adapt (i++) adapt();
    ...

    To use OpenMP on fitzroy do

    qcc -O2 -qsmp=omp tsunami.c -o tsunami ../kdt/kdt.o -lm

    and submit the job as

    #!/bin/bash
    
    #@ job_name         = basilisk
    #@ class            = General
    #@ job_type         = parallel
    #@ node             = 1
    #@ tasks_per_node   = 4
    #@ task_affinity    = core(1)
    #@ output           = tsunami-4.out
    #@ error            = tsunami-4.log
    #@ account_no       = HAFS1301
    #@ wall_clock_limit = 30:00
    #@ environment      = COPY_ALL
    #@ queue
    
    OMP_NUM_THREADS=4 ./tsunami

    The results on fitzroy are

    1 core
    # Quadtree, 2718 steps, 838.585 CPU, 921.3 real, 1.52e+05 points.step/s, 28 var
    4 cores
    # Quadtree, 2718 steps, 1030.61 CPU, 642.4 real, 2.17e+05 points.step/s, 28 var
    8 cores
    # Quadtree, 2718 steps, 1224.36 CPU, 536.5 real, 2.6e+05 points.step/s, 28 var

    The results on popinet-new are

    1 core
    # Quadtree, 2719 steps, 643.21 CPU, 647.8 real, 2.17e+05 points.step/s, 28 var
    4 cores
    # Quadtree, 2719 steps, 1398.57 CPU, 388.8 real, 3.62e+05 points.step/s, 28 var

    The results on mesu (UPMC’s cluster) are

    1 core
    # Quadtree, 2810 steps, 451.53 CPU, 451.9 real, 3.78e+05 points.step/s, 28 var
    2 cores
    # Quadtree, 2810 steps, 512.59 CPU, 315.4 real, 5.42e+05 points.step/s, 28 var
    4 cores
    # Quadtree, 2810 steps, 567.32 CPU, 229.8 real, 7.44e+05 points.step/s, 28 var
    8 cores
    # Quadtree, 2810 steps, 546.32 CPU, 147.1 real, 1.16e+06 points.step/s, 28 var
    16 cores
    # Quadtree, 2810 steps, 2236.47 CPU, 292.1 real, 5.85e+05 points.step/s, 28 var
    32 cores
    # Quadtree, 2810 steps, 5255.56 CPU, 361.4 real, 4.73e+05 points.step/s, 28 var
    64 cores
    # Quadtree, 2810 steps, 13266.5 CPU, 410.9 real, 4.16e+05 points.step/s, 28 var

    The results on heyward.dalembert.upmc.fr are

    1 core
    # Quadtree, 2810 steps, 898.34 CPU, 898.9 real, 1.89e+05 points.step/s, 28 var
    8 cores
    # Quadtree, 2810 steps, 1692.4 CPU, 393.6 real, 4.31e+05 points.step/s, 28 var