Time-Discrete Parameter Identification Algorithms for Two Deterministic Epidemiological Models applied to the Spread of COVID-19


 The pandemic outbreak of COVID-19 threatens people worldwide. Politicians around the globe have to balance different interests. Accordingly, they make decisions which heavily impact our daily life or even curtail fundamental rights. They consequently base their judgements on advices from scientists. For these reasons, many epidemiological models have appeared and have been famous since the beginning of the twentieth century. However, scientists depend on rich data to evaluate them and their assumptions. As this pandemic outbreak has lead to large data collections by the John Hopkins University since its beginning, we take advantage of this situation and compare two deterministic epidemiological models heavily used - the Susceptible-Infectious-Recovered model (SIR model) and the Susceptible-Infectious-Recovered-Dead model (SIRD model). In contrast to other works which concentrate on future predictions, we aim to investigate the validity of assumptions with respect to available data. For that purpose, instead of using these models forward in time, we propose modified time-discrete versions of the continuous models with relaxing all transfer rates to be time-varying and develop algorithms to predict these coefficients based on the principle of “divide-and-conquer” for the inverse problems. Since Asia, Europe and North America are especially affected by this pandemic, we choose countries from these continents to illustrate our results. In the main part of this article, we escepially discuss our results for Hubei (China). As supplementary material, we illustrate our findings for the aforementioned choice of countries. We finally draw some conclusions for future applications of these models and data acquisition.


Introduction
Since its outbreak in Wuhan (China) in December 2019, the worldwide spread of COVID-19 has enormously impacted our daily life and governments around the globe are actually facing tremendous different challenges. Due to complexity of the current situation and extensive consequences of governments' decisions, politicians heavily rely on scientific advice. Thus, John Hopkins University collects epidemiological data from many countries during the last months 1 . Additionally, many biological and medical studies regarding different aspects of this new corona-virus have been rapidly appeared in scientific journals [2][3][4][5][6][7][8] . For example, Lauer et al. published one study of estimation of the incubation time for COVID-19 9 .
Additionally, even more theoretical scientists like mathematicians have recently gained further interests in epidemiology. One of mathematical epidemiology's most groundbreaking work introduced the now well-known SIR model by Kermack and McKermack in 1927 10 . Kermack and McKendrick assumed a fixed population size and decomposed this population in three different homogeneous groups of people, namely susceptible people, infectious people and recovered people. They excluded births, deaths and deaths by disease from their model. Due to its success and simplicity, their works were reprinted in 1991 [11][12][13] . In upcoming decades, epidemiologists and mathematicians have proposed many variants and extensions of this basic model by, for example, adding age or spatial structures [14][15][16][17][18] .
After the outbreak of COVID-19, many scientists are recently publishing articles with emphasis on epidemic forecasts which strongly relate to mathematical modelling. Many approaches, mainly focusing on stochastic arguments, with respect to predicting forecasts of the total number of infected people have been appeared during the last weeks [19][20][21][22][23][24] or in the past 25,26 . Recently, neural networks have been applied to forecasting 27 .
Since these questions definitely deserve their current attention, current data acquisition by John Hopkins University allows us to certainly take a closer look on the parameter identification problem in such epidemiological models, so-called inverse problems 28 . Here, one aims to estimate different parameters appearing in our epidemiological models based on observed data. Different deterministic approaches by optimisation techniques 29,30 like least-squares methods 31,32 , variational imbedding 33 and Gaussian fitting 34 or stochastic models like time-series 35 or parameter identification by Bayesian methods 36 have been successfully used in inverse epidemiological problems. Evaluating the calculated constant or time-varying transfer rates on acquired data, we have reasonable foundations to make model assumptions plausible.
In contrast to many of the aforementioned works, our main goal is the proposal of parameter identification algorithms for SIR and SIRD models which avoid optimisation techniques for the full problem and use principal techniques from applied linear algebra and applied statistics on a decoupled time-discrete problem formulation for both the SIR and the SIRD model. We start with continuous formulations of the time-continuous SIR model and the time-continuous SIRD model. Based on forward finite differences, we propose implicit time-discrete models for both model approaches. However, instead of using constant transfer coefficients as in the classical models, we allow all transfer rates in these models to be time-varying. With this proposed modification, we obtain implicit time-discrete models with time-varying transfer rates we want to identify. Given observed data for susceptible people, infectious people, recovered and dead people, we can now identify our time-varying parameters by simply solving one small linear system at each time step instead of application of optimisation techniques with respect to the full SIR or SIRD problem. Hence, we are able to avoid high-dimensional, possible non-convex problems with many local minima. If we further want the earlier parameters constant, we additionally propose an estimation based on arithmetic means, medians and standard deviations for uncertainty estimation which can readily be applied to the data.
Due to current data acquisition of John Hopkins University, we test our developed parameter identification algorithms on data of the worldwide spread of COVID-19 and evaluate their performances. Finally, we discuss some future research possibilities.

Methods
In this section, we describe our continuous and time-discrete forward SIR and SIRD models. The ideas of these models heavily base on the pioneering works by Kermack and McKendrick [10][11][12][13] . Afterwards, we propose two algorithms for solving each inverse problem to compute time-varying transfer coefficients in these models.

SIR Model
We begin with describing the SIR model. Before stating equations, we give some information on assumptions of the model. After its formulation, we propose a time-discrete version of the model. In contrast to traditional works with constant transfer coefficients, we introduce time-varying coefficient for our parameter identification algorithm which we finally portray in this subsection. From a mathematical viewpoint, we can directly solve this inverse problem without application of regression algorithms. Instead, we apply optimisation techniques on smaller subproblems.

Assumptions
In its classical formulation [16][17][18] , the following assumptions should be fulfilled for application of the SIR model: • One neglects the population's age-structure, inhomogeneities with respect to spatial dependence and group behaviour.
One assumes that the population consists of homogeneous pools like in simulations of different biological system 54 or chemical reactions 55 ; • In addition, one divides the population into the three homogeneous pools of susceptible people (S), infectious people (I) and recovered people (R); • Since the considered time period is small, one ignores natural births and natural and infection-related deaths within the population. Thus, the population size N is constant in time and it holds for all times t in the respective time interval; • Except from the transfer coefficient from susceptible people to infectious people which might depend on government's rulings in this classical formulation, all other transfer rate between these pools are constant.
For further details on assumptions, we refer the interested reader to two papers by Brauer 56,57 . We illustrate a sketch of this model in Figure 1. Let further [t 1 ,t M ] be the respective time interval with start time t 1 and end time t M . Later, since our constant time data step amounts to one day, we consider a sequence t j M j=1 of time points regarding our time-discrete model versions.

2/14
Susceptible People (S) Figure 1. Sketch of the SIR model. We divide the people into three homogeneous pools of susceptible people (P), infectious people (I) and recovered people (R). The arrows portray flows between these pools with respective transfer rates.

Continuous Formulation
As an abbreviation for the time derivative of a function f , we simply write The classical continuous formulation of the SIR model reads . This means that solutions could be strongly differentiable in a classical or weakly differentiable in a Lebesgue-measurable manner 58 .

Time-Discrete Formulation
For abbreviation, we shortly write f j := f (t j ) for all j ∈ {1, . . . , M}. We apply finite differences such that the approximation . . , M − 1} since ∆t = 1 with respect to our data and for ease of notation.
In addition, we apply an implicit scheme for forward modelling in contrast to Dehning et al. in their stochastic SIR model 19 .
To state our modified time-discrete version of the SIR model (2), we relax our conditions and allow time-dependence of the transfer coefficient β . Consequently, application of finite differences leads to as our time-discrete version of (2) for all j ∈ {1, . . . , M − 1} with initial conditions S 1 > 0, I 1 > 0 and R 1 ≥ 0 as stated before.

Time-Discrete Inverse Parameter Identification SIR Problem
Why do we proceed with (3) although the transfer coefficient β is generally assumed to be constant in time? If β satisfied this condition, we must solve a regression problem due to for all j ∈ {1, . . . , M − 1} to obtain β and possibly use all data points. Depending on the cost function defined for this high-dimensional, probably non-convex minimization problem, this normally yields an optimisation problem with many local extrema if we take also add the other subproblems to our investigation. Since we want to avoid this, we use our transformed system (3).

Preprocessing of Population Data
Our input data consists of cumulative infected people for M time points. Hence, we necessarily must process these data first to develop an algorithm for our time-discrete inverse parameter identification SIR problem (6). Our procedure reads for all j ∈ {1, . . . , M}.

Possible Parameter Reduction by Simple Statistical Analysis
Since often the transfer coefficient β is assumed to be constant in time, we further need to process our time-varying transfer coefficient sequence β j M j=2 . For this purpose, we simply calculate the arithmetic mean and respectively the median β . With (8), we obtain the standard deviation

Optimisation Procedure for Time-Varying Parametric Coefficients
Since social behaviour even changes doe to severe epidemics before established rulings by governments, Chowell et al. 60 , Fisman et al. 61 , Althaus 62 and Brauer 63 suggested application of a time-varying contact rate (exponential decaying) with positive constants δ and ε which adapts to given data. For that purpose, we need to identify a proper cost function. Since we do not want to discuss optimisation techniques in details (we refer interested readers to the book of Nocedal and Wright 64 ), we simply use Nelder-Mead-algorithms 65 as implemented in GNU Octave's function FMINSEARCH 59 due to its simplicity and its ability to capture local extrema of non-differentiable functions. However, one still has to deal with problems of ending in local extrema 64 , but this is out of the scope of this work. Now, our scoring function for minimization of the desired parameters δ and ε, given data points (t j , α j ) for all j ∈ {2, . . . , M}, reads In case of Hubei(China), we additionally consider Gaussian recovering rates with real constants a, b and c. We proceed similarly to the case of exponentially decaying contact rates and we want to shortly note that an additional uncertainly quantification analysis by the Fisher-Information matrix (i.e. Hessian matrix) becomes possible by applying a two times continuously differentiable function like the l 2 -norm or mollified versions of the above used l 1 -norm. However, we relinquish it here.

SIRD Model
We start with a description of the SIRD model. Before giving its mathematical formulation, we report typical assumptions of the model. Afterwards, we present a time-discrete version of the model. Again, we use time-varying coefficients for our parameter identification algorithm in contrast to constant ones.

Assumptions
All assumptions of the SIR model apply. Additionally, we extend the pool of recovered people (R) by an additional homogeneous pool of dead people (D) and for ease of notion, we still count dead people (D) for total population size N because we estimate D N.

Susceptible People (S) Infectious People (I) Recovered People (R)
Dead People (D) γ · I (t) Figure 2. Sketch of the SIRD model. We divide the people into three homogeneous pools of susceptible people (P), infectious people (I), recovered people (R) and dead people (D). The arrows portray flows between these pools with respective transfer rates.
A sketch of this model is given in Figure 2.

Continuous Formulation
We have the classical continuous formulation

Time-Discrete Formulation
Equivalently to the case of the modified time-discrete SIR model, we generalize the constant transfer rates and allow them to be time-varying. Accordingly, this yields for our modified time-discrete SIRD model for all j ∈ {1, . . . , M − 1} with initial conditions S 1 > 0, I 1 > 0, R 1 ≥ 0 and D 1 ≥ 0.

Time-Discrete Inverse Parameter Identification SIRD Problem
We proceed similarly to the time-discrete inverse parameter identification SIR problem. At first, we notice for all j ∈ {1, . . . , M − 1}. We reduce (14) by (15) and get for all j ∈ {1, . . . , M − 1}. The validity of the same argument as in the time-discrete inverse parameter identification SIR problem holds.

Preprocessing of Population Data
Similarly to the case of the time-discrete inverse parameter identification SIR problem, we need to process the same given for all j ∈ {1, . . . , M}.

SIRD Parameter Identification Algorithm
We summarize our procedure in Table 2 and use GNU Octave to implement this algorithm 59 .

Parameter Reduction by Simple Statistical Analysis
We proceed as in the corresponding description of the time-discrete parameter identification SIR model.

Optimisation Procedure for Time-Varying Parametric Coefficients
Again, we apply the same procedure as for the time-discrete parameter identification SIR model.

Results
Here, we consider our results for Hubei (China) because the worldwide spread of COVID-19 started there. Hence, the longest timed data are available in this case 1 . In this case, the population size reads N = 59170000 as stated in our additional information. We consider a time period ranging from 22 January, 2020 (t = 1) to 30 April, 2020 (t = 100).
In Figure 3, we plot cumulative cases of reported infected people, cumulative cases of reported recovered people and cumulative cases of reported dead people. Apparently, there are two jumps around day 35 and day 85 of the COVID-19 epidemic which possibly occur due to changes in Chinese reports. Especially, the last jump values are excluded for calculation of means and medians. For details, we refer to our code availability statement.
In all different subplots of Figure 4, we depict our results of Hubei (China) for our proposed time-discrete parameter identification SIR algorithm. We relinquish integrating the standard deviations of arithmetic mean and median in our plots for our estimates of β . In all subplots of Figure 5, we illustrate our results of Hubei (China) for our time-discrete parameter identification SIRD algorithm. Especially, the last jump values are excluded for calculation of means and medians. For details, we refer to our code availability statement.

Discussion
In media, we normally see non-decreasing curves which report cumulative cases of infected, dead or recovered people. However, we note that we need those values at a specific time in our aforementioned SIR and SIRD models. Hence, the grouped plots of Figures 4 and 5 differ in general from the ones we notice in Figure 3 or in media.
Regarding all transfer rates, we note that they should be non-negative for all times. However, due to changes in Chinese reports, we observe statistical outliers around day 85. Thus, we exclude those data from calculate which react sensitively on such phenomena like arithmetic means.
If we closely look at the contact rates in Figures 4 and 5, we are able to detect an exponentially decaying behaviour of contact rates in time. Hence, in accordance to descriptions of Brauer 63 and references therein 60-62 , we can probably conclude that people change their social behaviour due to life-threatening diseases. This general trend does not only occur in data from Hubei (China), but is also observed in the analysed data in our supplementary material. However, we want to note that the contact rate heavily depends on social behaviour and if care eases up, this will generally lead to rising contact rates and to exponential growth of infected people again.
Let us shortly remark on the recovery and death rates. Surprisingly, we only notice an approximately Gaussian behaviour in recovery rates in Hubei (China) compared to all other countries in our supplementary material. Hence, we recommend to stay with arithmetic means and median for these transfer rates regarding their biological meaning 14,63 .
For all data, we have to keep in mind that these are reported cases and nobody really knows real numbers with respect to infected, dead and recovered people due to COVID-19. Hence, we state that application of such models and of our parameter identification algorithms build on profound data which need to be provided.
Due to this short time period of observed data, it is reasonable to apply SIR or SIRD models. Furthermore, as already pointed out in our discussion, time-decreasing contact rates can also be assumed at the beginning of such epidemics in general. In addition to these observations, we stress that our proposed algorithms are not only applicable to the actual COVID-19 epidemic, but to all epidemics which fulfill the assumptions of SIR and SIRD models. Therefore, it is a reasonable contribution to report about.
As future research projects, we might consider different adaption like statistical 66 or stochastic approaches by basis functions 67 . Pollicott et al. applied a deterministic continuous approach through multiple differentiation and interpolations by splines and trigonometric functions for the time-varying contact rate function α (t) 68 .
Since it is out of the scope of this work, it might be an interesting point to integrate time-delay in our parameter identification approach without applying a full optimization process as given by Chen et al 69 . In addition to that, one could evaluate extensions to fractional differential operators in time [70][71][72] . As a concluding remark, one can apply PDE-ODE-models to epidemiology 73 to introduce spatial or even further structures.