/** # COVID-19 Data From the [European Center for Disease Prevention and Control](https://www.ecdc.europa.eu). ~~~gnuplot Number of cases. The grey curve $f(x)$ is a fit of a [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function). set term svg font ',11' size 800,600 # get the data if it's not already there # note that ssconvert is required and can be installed with # sudo apt install gnumeric ! test -f covid-`date +%Y-%m-%d`.xslx || \ (wget https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-`date +%Y-%m-%d`.xlsx \ -O covid-`date +%Y-%m-%d`.xslx && \ ssconvert covid-`date +%Y-%m-%d`.xslx covid.csv) || \ rm -f covid-`date +%Y-%m-%d`.xslx set key above set xlabel 'Days (after the hundredth case)' set ylabel 'Number of cases' set grid # choose which countries to display countries="United_States_of_America Italy Spain France Germany United_Kingdom China Japan Brazil Sweden Argentina New_Zealand Australia India" replaceu(x)=system("echo ".x."| sed 's/_/ /g'") # Try a sigmoid function f(x)=a*exp((x-c)/b)/(exp((x-c)/b) + 1.) a = 50000. b = 5. c = 20. sum = 0. fit [0:]f(x) '< awk -f covid.awk < covid.csv' index 'Italy' \ using ($1):(sum=sum+$3) via a,b,c set label 1 at 17,50000 "double every 2 days" rotate by 57 set label 2 at 30,80000" double every 3 days" rotate by 45 set label 3 at 30,21000" double every 4 days" rotate by 37 tend = 300 set logscale y plot [0:tend][100:1e7]for [country in countries] sum=0, \ '< awk -f covid.awk < covid.csv' index country \ using 1:(sum=sum+$3) w lp t replaceu(country), \ 100.*exp(log(2)/2.*x) not w l linec -1, \ 100.*exp(log(2)/3.*x) not w l linec -1, \ 100.*exp(log(2)/4.*x) not w l linec -1, \ f(x) lc rgb 'grey' unset label 1 unset label 2 unset label 3 ~~~ ~~~gnuplot Relative number of cases set ylabel 'Relative number of cases (%)' plot [0:tend][1e-2:] for [country in countries] sum=0, \ '< awk -f covid.awk < covid.csv' index country \ using 1:((sum=sum+$3)/$5*100.) w lp t replaceu(country) ~~~ ~~~gnuplot Number of deaths g(x)=a1*exp((x-c1)/b1)/(exp((x-c1)/b1) + 1.) a1 = 50000. b1 = 5. c1 = 20. sum = 0. fit [0:]g(x) '< awk -f covid.awk < covid.csv' index 'Italy' \ using 1:(sum=sum+$4) via a1,b1,c1 set ylabel 'Number of deaths' plot [0:tend][1:3e5] for [country in countries] sum=0, \ '< awk -f covid.awk < covid.csv' index country \ using 1:(sum=sum+$4) w lp t replaceu(country), \ g(x) lc rgb 'grey' ~~~ ~~~gnuplot Relative number of deaths set ylabel 'Relative number of deaths (%)' plot [0:tend][1e-4:] for [country in countries] sum=0, \ '< awk -f covid.awk < covid.csv' index country \ using 1:((sum=sum+$4)/$5*100.) w lp t replaceu(country) ~~~ ~~~gnuplot Mortality (%) unset logscale set ylabel 'Mortality (%)' plot [0:tend][0:20]for [country in countries] sum=0, sum1=0, \ '< awk -f covid.awk < covid.csv' index country \ using 1:(100.*(sum1=sum1+$4)/(sum=sum+$3)) w lp t replaceu(country), \ 100.*g(x)/f(x) lc rgb 'grey' ~~~ ~~~gnuplot New cases per day set ylabel 'New cases / day' df(x)=(a*exp((x-c)/b))/(b*(exp((x-c)/b)+1))-(a*exp((2*(x-c))/b))/(b*(exp((x-c)/b)+1)**2) set logscale y plot [0:tend][10:1e5]for [country in countries] '< awk -f covid.awk < covid.csv' \ index country using 1:3 w p t replaceu(country), \ df(x) lc rgb 'grey' ~~~ ~~~gnuplot Relative number of new cases per day set ylabel 'Relative # new cases / day (%)' plot [0:tend][1e-4:]for [country in countries] '< awk -f covid.awk < covid.csv' \ index country using 1:($3/$5*100.) w p t replaceu(country) ~~~ ~~~gnuplot Deaths per day set ylabel 'Deaths / day' dg(x)=(a1*exp((x-c1)/b1))/(b1*(exp((x-c1)/b1)+1))-(a1*exp((2*(x-c1))/b1))/(b1*(exp((x-c1)/b1)+1)**2) plot [0:tend][1:]for [country in countries] '< awk -f covid.awk < covid.csv' \ index country using 1:4 w p t replaceu(country), \ dg(x) lc rgb 'grey' ~~~ ~~~gnuplot Relative # deaths per day set ylabel 'Relative # deaths / day (%)' plot [0:tend][1e-6:]for [country in countries] '< awk -f covid.awk < covid.csv' \ index country using 1:($4/$5*100.) w p t replaceu(country) ~~~ ~~~gnuplot New cases against total nr. of cases set ylabel 'New cases / day' set xlabel 'Number of Cases' set logscale xy plot [10:1e7][1:100000] \ for [country in countries] sum = 0, '< awk -f covid.awk < covid.csv' \ index country using (sum=sum+$3):3 w p t replaceu(country), \ x/5 ~~~ If we assume an exponential growth $N(t)\propto e^{t/b}$, the ratio of the total number of cases to the number of new cases per day is $$ N(t)/\frac{dN(t)}{dt} = b $$ which gives roughly $b=5$ days, according to the graph below. This means an average "doubling time" of $\log(2)\times 5\approx 3.5$ days. ~~~gnuplot Total number of cases / New cases set ylabel 'Total number of cases / New cases' set xlabel 'Number of Cases' unset logscale set logscale xy plot [10:1e7][:] \ for [country in countries] sum = 0, sum1 = 0, '< awk -f covid.awk < covid.csv' \ index country using (sum=sum+$3):((sum1=sum1+$3)/$3) w p t replaceu(country), 5 ~~~ ## Rescaling with a sigmoid The [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) or [logistic](https://en.wikipedia.org/wiki/Logistic_function) function used to fit all data is $$ f(t) = \frac{N}{1 + e^{- (t - t_d)/t_g}} $$ where $N$ is the maximum, $t_g$ is the "growth timescale" and $t_d$ is (half) the total duration. Note that the choice of this function can be justified based on the same arguments used by [Verhulst](https://en.wikipedia.org/wiki/Pierre_Fran%C3%A7ois_Verhulst) who first introduced it as a model of population growth (in particular to counter the arguments advanced by [Malthus](https://en.wikipedia.org/wiki/Thomas_Robert_Malthus)): i.e. the initial exponential growth is eventually bounded by a "crowding effect". In the case of epidemics, this can be interpreted as the increasing difficulty of the pathogen to spread as the density of already infected individuals increases. Note that this function $f(t)$ is solution of the following differential equation : $$ \frac{d f(t)}{dt}= \frac{f(t)}{t_g} (1 - \frac{f(t)}{N}) $$ The fit for the total number of cases for all countries is convincing, although beware of the ["von Neuman fitting effect"](https://en.wikiquote.org/wiki/John_von_Neumann): *With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.* ~~~gnuplot Rescaled number of cases. The grey curve is the [sigmoid function](). nc = 0; do for [c in countries] { nc = nc + 1; } array country[nc] array N[nc] array Ne[nc] array tg[nc] array td[nc] array tde[nc] i = 1; do for [c in countries] { country[i] = c; i = i + 1; } # Try a sigmoid function f(x,a,b,c)=a*exp((x-c)/b)/(exp((x-c)/b) + 1.) do for [i=1:nc] { a = 500000. b = 4.5 c = 100. sum = 0. set fit errorvariables set fit prescale print country[i] fit [0:]f(x,a,b,c) '< awk -f covid.awk < covid.csv' index country[i] \ using ($1):(sum=sum+$3) via a,b,c N[i] = a Ne[i] = a_err/a tg[i] = b td[i] = c tde[i] = c_err/c } set xlabel '(t - t_d)/t_g' set ylabel 'Number of cases / N' set grid unset logscale plot [-5:10][*:*] \ for [i=1:nc] sum=0, \ '< awk -f covid.awk < covid.csv' index country[i] \ using (($1-td[i])/tg[i]):((sum=sum+$3)/N[i]) w p t replaceu(country[i]), \ f(x,1.,1.,0.) lc rgb 'grey' t 'sigmoid' ~~~ And also for the number of deaths (with different fitting parameters). ~~~gnuplot Rescaled number of deaths array N1[nc] array N1e[nc] array t1g[nc] array t1d[nc] do for [i=1:nc] { a = 100000. b = 5. c = 80. sum = 0. print country[i] fit [0:]f(x,a,b,c) '< awk -f covid.awk < covid.csv' index country[i] \ using 1:(sum=sum+$4) via a,b,c N1[i] = a N1e[i] = a_err/a t1g[i] = b t1d[i] = c } set xlabel '(t - t1_d)/t1_g' set ylabel 'Number of deaths / N1' plot [-5:10][*:*] \ for [i=1:nc] sum=0, \ '< awk -f covid.awk < covid.csv' index country[i] \ using (($1-t1d[i])/t1g[i]):((sum=sum+$4)/N1[i]) w p t replaceu(country[i]), \ f(x,1.,1.,0.) lc rgb 'grey' t 'sigmoid' ~~~ Things are more noisy for the slopes. ~~~gnuplot Rescaled new cases per day phi(x)=1./sqrt(2.*pi)*exp(-x*x/2.) Phi(x)=(1.+erf(x/sqrt(2.)))/2. set xlabel '(t - t_d)/t_g' set ylabel 'New cases per day x t_g / N_{max}' df(x,a,b,c)=(a*exp((x-c)/b))/(b*(exp((x-c)/b)+1))-(a*exp((2*(x-c))/b))/(b*(exp((x-c)/b)+1)**2) set logscale y plot [-6:10][1e-3:1] \ for [i=1:nc] \ '< awk -f covid.awk < covid.csv' index country[i] \ using (($1-td[i])/tg[i]):($3*tg[i]/N[i]) w p t replaceu(country[i]), \ df(x,1.,1.,0.) lc rgb 'grey' t 'derivative of sigmoid' ~~~ ~~~gnuplot Rescaled deaths per day set xlabel '(t - t1_d)/t1_g' set ylabel 'Deaths per day x t1_g / N1_{max}' plot [-6:10][1e-3:1] \ for [i=1:nc] \ '< awk -f covid.awk < covid.csv' index country[i] \ using (($1-t1d[i])/t1g[i]):($4*t1g[i]/N1[i]) w p t replaceu(country[i]), \ df(x,1.,1.,0.) lc rgb 'grey' t 'derivative of sigmoid' ~~~ The evolutions for each country can be summarised by these three parameters, as represented below. Only countries for which the fitting error on $N$ or $N1$ is smaller than 20% are represented. ~~~gnuplot Fitting parameters for cases. The percentages give the relative fitting error on $N$. set xlabel 't_d x 2 (days)' set ylabel 't_g x log(2) (days)' # select only countries for which the fitting error is less than 20% select(i,j) = Ne[i] < 0.2 ? j : 1e100 unset key set xrange [35:120] set yrange [2:10] unset logscale scale=1000. plot \ "< echo '40 4.5 100000'" u 1:2:(sqrt($3/scale)) pt 7 ps variable lc 12, \ "< echo '40 4.5'" u 1:2:("100,000") w labels, \ td using (select($1,$2*2)):(tg[$1]*log(2)):(sqrt(N[$1]/scale)):1 \ pt 7 ps variable lc variable, \ td using (select($1,$2*2)):(tg[$1]*log(2)):(sprintf("%s\n%.1f\%", replaceu(country[$1]), Ne[$1]*100.)) w labels ~~~ ~~~gnuplot Fitting parameters for deaths. The percentages give the relative fitting error on $N$. set xlabel 't1_d x 2 (days)' set ylabel 't1_g x log(2) (days)' # select only countries for which the fitting error is less than 20% select1(i,j) = N1e[i] < 0.2 ? j : 1e100 scale=100. plot \ "< echo '40 4.5 10000'" u 1:2:(sqrt($3/scale)) pt 7 ps variable lc 12, \ "< echo '40 4.5'" u 1:2:("10,000") w labels, \ t1d using (select1($1,$2*2)):(t1g[$1]*log(2)):(sqrt(N1[$1]/scale)):1 \ pt 7 ps variable lc variable, \ t1d using (select1($1,$2*2)):(t1g[$1]*log(2)):(sprintf("%s\n%.1f\%", replaceu(country[$1]), N1e[$1]*100.)) w labels ~~~ A dummy program. Everything is done by gnuplot. */ int main() { fprintf (stderr, "Please ignore this error. It is just to force updates.\n"); exit (1); // always fails (forces rerun) }