sandbox/prouvost/miscellaneous/awk.c

    int main() {}

    I share here some useful examples of what awk can do when post-processing data. The same can be done in other languages such as Python, but awk is fun, often does the job in a single line, and its commands are easy to embed in gnuplot plots. It is also very efficient when used in bash scripts. Note that I don't claim to write the most efficient awk programs.

    Elements of AWK

    Awk is a program that reads an input file line by line and performs operations based on the data in each line.

    Extract lines containing a particular value

    Let's assume you ran a parametric study or a series of experiments and recorded all the results in a single file. For example, the first column holds the radius of a bubble and the second the pressure in the bubble at a given time. You can extract the results for a particular radius (let's say R=1) by writing

    /*
    awk '$1==1' data_file
    */

    where $1 refers to the value in the first column of the current line of the data file.

    set term pngcairo enhanced size 500,500
    set output 'ex1.png'
    
    p "./../data1" w p pt 7 ps 2 lc rgb 'blue' t 'original data',\
    "< awk '$1==1' ./../data1" w p pt 7 lc rgb 'red' t 'data R=1'
    Example extract lines containing a particular value (script)
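
    As a minimal sketch of the selection above, assuming a small made-up data set (the values are hypothetical, not taken from data1), the radius can also be passed from the shell with -v. Writing R+0 forces a numeric comparison, so a field written as 1.0 matches as well:

    ```shell
    # Hypothetical sample: column 1 = radius R, column 2 = pressure p
    data=$(printf '0.5 2.0\n1 1.5\n2 1.1\n1.0 1.4\n')

    # A pattern without an action prints every line for which it is true
    selected=$(printf '%s\n' "$data" | awk -v R=1 '$1 == R+0')
    printf '%s\n' "$selected"
    ```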


    Extract max value

    Let's assume you want to extract the maximum pressure in the previously described experiment, together with the corresponding bubble radius. Then,

    /*
    awk '{ if ($2>tmp_p) {tmp_p=$2;tmp_R=$1} } END{print tmp_R, tmp_p}' data_file
     */

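    will do the job. Note that awk variables do not need to be declared: an uninitialized variable evaluates to 0 in a numeric context, so this one-liner only finds the maximum when at least one pressure is strictly positive.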

    set term pngcairo enhanced size 500,500
    set output 'ex2.png'
    
    p "./../data1" w p pt 7 ps 2 lc rgb 'blue' t 'original data',\
    "< awk '{ if ($2>tmp_p) {tmp_p=$2;tmp_R=$1} } END{print tmp_R, tmp_p}' ./../data1" w p pt 7 lc rgb 'red' t 'max pressure'
    Example extract maximum pressure and corresponding radius (script)
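
    Because the running maximum starts at 0, the one-liner above misses the maximum when all pressures are negative. A possible variant, sketched here on hypothetical data, seeds the maximum from the first line instead:

    ```shell
    # Hypothetical sample where every pressure (column 2) is negative
    data=$(printf '1 -3.0\n2 -1.5\n3 -2.0\n')

    # NR==1 forces the first line to initialize tmp_p and tmp_R
    max_line=$(printf '%s\n' "$data" |
      awk 'NR==1 || $2>tmp_p {tmp_p=$2; tmp_R=$1} END{print tmp_R, tmp_p}')
    printf '%s\n' "$max_line"
    ```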


    Compute moving average

    Let's assume you recorded a noisy pressure signal over time and want to filter it with a moving average over 100 points. The first column holds the time and the second the pressure. You can use

    /*
    awk -v n=100 '{sum -= mem[NR%n]; mem[NR%n]=$2; sum += $2} (NR>(n-1)){print $1,sum/n}' data_file
     */
    set term pngcairo enhanced size 500,500
    set output 'ex3.png'
    
    p "./../data2" u 2:5 w p lc rgb 'blue' t 'original data',\
    "< awk -v n=100 '{sum -= mem[NR%n]; mem[NR%n]=$5; sum += $5} (NR>(n-1)){print $2,sum/n}' ./../data2" w l lw 2 lc rgb 'red' t 'moving average'
    Example moving average (script)
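
    To see what the moving-average command does, here is a sketch with a window of n=3 on a hypothetical five-point signal. The array mem acts as a circular buffer holding the last n values, and each average is written at the time of the last point of its window:

    ```shell
    # Hypothetical sample: column 1 = time, column 2 = pressure
    data=$(printf '1 3\n2 6\n3 9\n4 12\n5 15\n')

    # sum always holds the sum of the last n pressures seen so far
    smoothed=$(printf '%s\n' "$data" |
      awk -v n=3 '{sum -= mem[NR%n]; mem[NR%n]=$2; sum += $2} (NR>(n-1)){print $1,sum/n}')
    printf '%s\n' "$smoothed"
    ```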


    Integration

    Last example for now: let's assume you want to compute the impulse \displaystyle I = \int_t (p-p_0) dt with p_0=1, and you have only recorded the time in the first column and the pressure in the second. You can use the trapezoidal rule

    /*
    awk -v p0=1 'BEGIN{s=0} { if (NR>1) {s+=(($2+tmpy)/2-p0)*($1-tmpt)} ; print $1,s ; tmpy=$2 ; tmpt=$1 }' data_file
     */
    set term pngcairo enhanced size 500,500
    set output 'ex4.png'
    
    set ytics nomirror tc "blue"
    set y2tics nomirror tc "red"
    set ylabel "p" tc "blue"
    set y2label "I = \int (p-1) dt" tc "red"
    
    p "./../data3" u 1:3 w p lc rgb 'blue' axis x1y1 not,\
    "< awk 'BEGIN{s=0}{ if ($1!=tmpt) {s+=(($3+tmpy)/2-1)*($1-tmpt)} ; print $1,s ; tmpy=$3 ; tmpt=$1 }' ./../data3" w p lc rgb 'red' axis x1y2 not
    Example integrate (script)
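
    A quick check of the trapezoidal rule, assuming hypothetical data that do not start at t=0 and p_0=1 passed as the variable p0 (a name introduced here for illustration). The NR>1 test skips the first line, for which no previous point exists:

    ```shell
    # Hypothetical sample: column 1 = time, column 2 = pressure
    impulse=$(printf '2 1\n3 3\n4 3\n5 1\n' |
      awk -v p0=1 'NR>1 { s += (($2+tmpy)/2 - p0)*($1-tmpt) }
                   { print $1, s+0; tmpy=$2; tmpt=$1 }')
    printf '%s\n' "$impulse"
    ```

    The three trapezoids contribute 1, 2 and 1, so the running integral ends at 4.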


    Structure/options I often use

    The -v option passes variables defined outside the awk command into it. The BEGIN block runs once before any input is read, and the END block runs once after all lines have been read. A pattern such as (NR>1) restricts its action to lines whose number is strictly greater than 1.

    /*
    awk -v var=42 -v var2=3 'BEGIN{...} (NR>1){...} END{...}' filename
    */
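
    The skeleton above can be filled in as follows. This is only a sketch on hypothetical data with a header line; the names threshold and pmin are introduced here for illustration. It counts the pressures above a threshold set in the shell and prints their mean:

    ```shell
    threshold=2   # hypothetical value computed earlier in the shell
    summary=$(printf 'R p\n1 1.5\n2 3.5\n3 2.5\n' |
      awk -v pmin="$threshold" '
        BEGIN { count = 0; sum = 0 }                  # before any input is read
        NR > 1 && $2 > pmin+0 { count++; sum += $2 }  # skip the header line
        END { if (count) printf "%d %.2f\n", count, sum/count }')
    printf '%s\n' "$summary"
    ```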

    Note: awk arrays are associative, so an index can be anything, including a non-integer such as a[0.1] (e.g. awk '{a[0.1]=1 ; print a[0.1]}' data1 works…)
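
    Since indices are stored as strings, arrays are handy for counting or grouping by the content of a column. A small sketch on hypothetical data, where the first column is used directly as the index (a field written 0.50 would be a different key from 0.5):

    ```shell
    # Count how many lines share the same value in column 1
    counts=$(printf '0.5 a\n1 b\n0.5 c\nfoo d\n' |
      awk '{ n[$1]++ } END { print n["0.5"], n["1"], n["foo"] }')
    printf '%s\n' "$counts"
    ```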

    User-defined Functions

    awk does not provide all the “standard” mathematical functions as built-ins. The POSIX numeric built-ins are atan2, cos, sin, exp, log, sqrt, int, rand and srand. The “missing” functions (such as arcsin, arctanh, …) and constants (\pi) can be built from these.

    Example:

    /*
    
    awk 'function asin(x) { return atan2(x, sqrt(1.-x*x)) } BEGIN {pi = atan2(0, -1) ; print pi,asin(1)} '
    
    */

    arcsin : function asin(x) { return atan2(x, sqrt(1.-x*x)) }

    arctanh : function atanh(x) {return log((1.+x)/(1.-x))/2}

    \pi : pi = atan2(0, -1)

    Note: beware of the domain of definition and of problematic values with these definitions (for example, x close to 1 for arctanh…)
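
    One possible way to handle the domain issue, sketched here as an assumption rather than a recommended convention, is to test the argument and return a marker string (the hypothetical "undef") outside the domain:

    ```shell
    checked=$(awk '
      # Guarded variants: return the string "undef" outside the domain
      function asin(x)  { return (x < -1 || x > 1)   ? "undef" : atan2(x, sqrt(1 - x*x)) }
      function atanh(x) { return (x <= -1 || x >= 1) ? "undef" : log((1 + x)/(1 - x))/2 }
      BEGIN { printf "%.4f %.4f %s %s\n", asin(0.5), atanh(0.5), asin(2), atanh(1) }')
    printf '%s\n' "$checked"
    ```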

    Note: an awk program does not have to fit on one line; it can be stored in a script file and run with awk -f.
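
    A minimal sketch of the script-file approach, assuming a temporary file created with mktemp (in practice the program would live in a file of your choice, for example a hypothetical max.awk):

    ```shell
    script=$(mktemp)   # hypothetical temporary file holding the awk program
    # Write a two-rule program: keep the line with the largest column 2
    printf '%s\n' \
      'NR == 1 || $2 > best { best = $2; line = $0 }' \
      'END { print line }' > "$script"

    best_line=$(printf '1 1.5\n2 3.5\n3 2.5\n' | awk -f "$script")
    rm -f "$script"
    printf '%s\n' "$best_line"
    ```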

    Miscellaneous

    awk as a calculator in bash:

    /*
    a=$(awk "BEGIN {toto = 1+1 ; print toto}")
    */
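
    Bash arithmetic only handles integers, which is why awk is convenient for floating-point calculations. A sketch with a hypothetical shell variable radius, passed with -v so that the awk program can stay in single quotes:

    ```shell
    radius=2   # hypothetical shell variable
    area=$(awk -v r="$radius" 'BEGIN { pi = atan2(0, -1); printf "%.3f\n", pi*r*r }')
    printf '%s\n' "$area"
    ```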

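    It may be useful to reorganise a file before plotting it with linespoints in gnuplot: the command below keeps the lines with $1>=13 and inserts a blank line after every third one, which makes gnuplot break the curve there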

    /*
    awk '($1>=13){print ; c++ ;  if (!(c%3)) print ""} '  data_file
    */
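
    The effect of this command can be checked on a small hypothetical file: the line with 12 in the first column is dropped and the six remaining lines are split into two blocks of three.

    ```shell
    # Hypothetical sample with seven lines, six of which satisfy $1>=13
    blocks=$(printf '12 0\n13 1\n14 2\n15 3\n16 4\n17 5\n18 6\n' |
      awk '($1>=13){print; c++; if (!(c%3)) print ""}')
    printf '%s\n' "$blocks"
    ```

    The trailing blank line is removed here by the shell command substitution, so only the separator between the two blocks remains.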