Inverse Problem for Identiﬁcation of Infectivity and Recovery Rates in SIR Epidemic Models as Functions of Time Illustrated with Corona Virus Dynamics up to July 09, 2020

This work deals with the inverse problem in epidemiology based on a SIR model with time-dependent infectivity and recovery rates, allowing for a bet-ter prediction of the long term evolution of a pandemic. The method is used for investigating the COVID-19 spread by ﬁrst solving an inverse problem for estimating the infectivity and recovery rates from real data. Then, the estimated rates are used to compute the evolution of the disease. The time-depended parameters are estimated for the World and several countries (The United States of America, Canada, Italy, France, Germany, Sweden, Russia, Bulgaria, Korea, New Zealand) and used for investigating the COVID-19 spread in these countries.


Introduction
Mathematical modeling and forecasting the spread of epidemic diseases has a long history, see [3], [5], [10], and [29]. The earliest published paper on mathematical modeling of spread of disease was carried out in 1766 by Daniel Bernoulli. Trained as a physician, Bernoulli created a mathematical model to defend the practice of inoculating against smallpox [14]. Infectious diseases include measles, malaria, varicella, HIV, Ebola, and SARS. A systematic review of the risk of death associated with Middle East respiratory syndrome (MERS) as well as risk factors for associated complications is given in [26]. There is no vaccination for some of the infectious diseases, only preventive practices -see [15]. Kermack and McKendrick [16] introduced their model in 1921. Their theory is a mathematical hypothesis proposed to explain the rapid rise and fall in the number of infected people with a contagious illness in a closed population over time. It is the origin of SIR (Susceptibles-Infected-Recovered) type models. This formalism is the basis of all current modeling of the dynamics and evolution of infectious diseases, see [4], [27], [33], [35], and [36]. The models are useful in understanding the basic principles of the system, even in choosing a proper policy of infectious disease control. COVID-19 attracted attention in recent publications [11], [18], [28], [31], [32], and [40].
The dynamical systems for epidemics are highly nonlinear -they include diverse characteristics, from population specifics to the immune system of an individual. Every new epidemics must be studied and mathematical models constructed to answer important questions on handling the disease. The parameters in the SIR model, the rate of transmission a and the rate of removal b, depend on the evolution of the epidemic disease over time, see [1].
Complex and realistic mathematical models are used to assist policy decision making, recent relevant works include [12]. This work aims to create a method that can accurately identify the time dependent parameters of the SIR system using real data and then use the computed parameter values to predict the spread of the epidemics. If some conditions change, then the projection based on the "old" data will no longer be accurate and will require adjustment.
The parameter estimation is referred to as the inverse modeling problem. It means adjusting the parameters of a mathematical model to reproduce measured data. Model deficiencies are usually due to inaccurate parameters and must always be addressed. The inverse problem is crucial for calibrating the model and for controlling the model parameters. Approaches involving inverse problems can be successfully applied to a variety of important processes, including the spread of infectious diseases, allowing epidemiologists and public health specialists to make predictions on epidemics [5].
The COVID-19 coronavirus appeared in late 2019 and quickly spread across many countries. According to [13] and [38], by July 9 2020, there were more than 12 million confirmed cases of infected people, with more than 550,000 reported deaths globally. Governments closed the so-called non-essential businesses and services for weeks in order to slow down the growth of infections -especially among vulnerable populations -and thus, save lives. Numerous mathematical models are developed to forecast the future of the COVID-19 epidemic spread worldwide. They are used to assist governments in making decisions to cope with the virus and its consequences. As of July 2020, there are no vaccines for this highly contagious disease; nor efficient antiviral drugs.
The present work extends the method proposed in [23] for finding optimum values for the infectivity and recovery rates. In [23], these values are assumed to be constant over the whole interval because we applied the method to a short outbreak of influenza instead of considering the long term evolution of a pandemic as with the case of COVID-19. Here, we assume the infectivity and recovery rates are functions of time, namely a = a(t) and b = b(t).

The direct problem
The SIR model is undoubtedly the most famous mathematical model for the spread of an infectious disease. A constant population of size N is divided into three classes: susceptible S, infectives I and removed R. Removed individuals R are no longer susceptible nor infectious for whatever reason; for example, they have recovered from the disease and are now immune, or they have been vaccinated, or perhaps they have died from the disease. Since the isolated infectious people are still capable of infecting individuals from class S (see for example the big number of infected medical personnel), we count these people as members of the class I. The diagram for the SIR model is given on There exist other models considering additional events and features, such as birth and death rates (non-constant population), vaccine effect, reinfection, latent period, and so on. In this work, we use the SIR model, because the considered time period is relatively short.
We assume that the population under study is well mixed so that every person has equal probability of coming into contact with every other person. This major assumption does not hold in many situations [27], e.g. sexually transmitted diseases and COVID-19 social distancing.
Let a∆t be the probability that a random infectious individual infects a random susceptible individual during the time period ∆t. Then with S(t) susceptible and I(t) infectious population, the expected number of newly infectious individuals in the total population N during time ∆t is aSI∆t, i.e., the rate of change of the class S(t) is −aSI, where a > 0.
We also assume that infectious individuals leave the I(t) class with rate b and they move directly into the R(t) class. There are reports for recovered from COVID-19 individuals who are re-infected; hence, we added a dashed arrow from the class R to the class I. Since the number of these cases at the present moment is very limited, we cannot estimate the rate c and we do not include this option into the system.
Since the differential equations (1) and (2) do not depend on R, it is convenient to split the system into two parts -equations (1) and (2) as one system, and equation (3) by itself. If the coefficients a and b are given, we can derive initial conditions from the given data and solve the problem numerically by any of the well known methods for initial value problems (IVPs). We refer the reader to [2] for more information on methods for solving IVPs numerically.
The main question is how to find the coefficients a and b for a given infection, e.g. COVID-19. The problem for the estimation of the constants a and b is an inverse problem solved in [23]. A similar approach for identifying coefficients in an Euler-Bernoulli equation from over-posed data is used in [22] and [25].

The inverse problem
We assume that (from the available data) the values of S and I are known at two time moments, initial time moment T I and final time moment T F , and If the coefficients a and b are known, the solution of equations (1) and (2) can be determined using the initial conditions (4). In general, the terminal conditions (5) are not satisfied exactly. Therefore, if the coefficients a and b are given, the problem is overdetermined.
Let us assume that the coefficients a and b are constant and unknown. In this case, the general solution of the system (1), (2) depends on four constantstwo constants from the integration and the unknown coefficients a and b. The number of conditions/equations in (4) and (5) is also four; hence, the problem for identifying the coefficients along with the functions S and I is well-posed, provided that an exact solution exists (or can be obtained). This type of problems belongs to the class of the so-called inverse problems.
Still, the problem for obtaining (S, I) and (a, b) from equations (1) and (2) under the conditions (4) and (5) could also be ill-posed. For arbitrary values of S I , S F , I I , I F , there may be no solution (S, I), (a, b) satisfying the equations (1), (2) and all of the conditions in (4), (5). For this reason, we assume that the problem is posed correctly after Tikhonov [37], i.e., it is known a-priori that a solution of the problem exists. In other words, the the boundary data have "physical meaning" and, therefore, a solution exists.
We are now ready to construct an algorithm for approximating the solution (S, I), (a, b) of the inverse problem (1), (2), (4), (5). For estimating the coefficients a and b of the SIR model, we use an approach that is very similar to the variational method used in [23]. This method is a generalization of the Least Squares Method. More details about the method for transforming the inverse problem into a correct direct problem can be found in [8], [21], [22], [25].

Coefficients a and b depend on time
The original SIR model assumes that the coefficients a and b are constants. In real live situations, e.g. the COVID-19 pandemic, the coefficients vary in time, i.e., a = a(t) and b = b(t). The coefficients are affected by governments' restrictions, societal experiences, treatment, among others.
Let D be a data set of function values S(t k ), I(t k ) at some time moments t 1 , t 2 , . . . . We assume there exists an algorithm for solving the inverse problem (1), (2), given the conditions (4), (5) and the data set D. Here, we consider a and b to be piece-wise constant (step) functions of time The constants a k and b k can be estimated by solving the inverse problem (1), (2), (4), (5) using S I = S(t k−1 ), S F = S(t k ), I I = I(t k−1 ), and I F = (t k ), for k = 2, 3, . . .

The reproduction rate and the herd immunity
Epidemics grow if the derivative dI/dt > 0 and decrease if dI/dt < 0. An important parameter, which characterizes infectious diseases, is the reproduction rate (also called reproduction number or ratio) of the infection From equation (2) and (7), it follows that dI/dt = aSI − bI = ρb − bI = (ρ − 1)bI.
Consequently, the epidemic grows if ρ(t) > 1 and decreases if ρ(t) < 1. The reproduction rate ρ(t) represents the expected number of secondary infections produced by a single primary infected person. The reproduction rate in Hubei Province (China) by using case-report data from January 11 to February 6, 2020 was estimated in [17] as 4.1192 (with 95% confidence interval 4.0473-4.1912). The reproduction rate ρ(t) also relates to the fraction of the population that gets sick. Another important characteristics of an infectious disease is the so-called herd immunity -the minimum fraction of the population,f , that is required to have immunity in order to prevent an epidemic [30]. The stability analysis of the SIR model shows thatf = 1 − 1/ρ. The society can acquire herd immunity in two ways. The "natural" way is to let the epidemic spread until the required fraction of the population,f , obtains immunity. The most common "non-natural" way is vaccination. Since vaccines are not presently available for COVID-19, some countries, e.g. Sweden, tried the "natural" way. As of this moment, their approach is questionable, according to the results for their pandemic spread, see Section 4.7. Many countries have been practicing social distancing and other measures to slow down the transmission rate a of the epidemic.

Solving the inverse problem for the time dependent infectivity and recovery rates
A method for solving the inverse problem in the case of constant coefficients a and b is given in [23], along with a discussion of some theoretical aspects. Here, we present the numerical algorithm for estimating the coefficients if they depend on time, i.e., a = a(t) and b = b(t). The inverse problem for the time dependent infectivity and recovery rates follows the idea of the method used in [25]. The original problem is replaced by a minimization problem. For the correct embedded problem, a difference scheme and a numerical algorithm can be constructed.

Minimization problem using data values at two time moments
First, we start with the sub-problem in case of using data values at two time moments. Since we are concerned with the numerical solution of the system (1)-(3), we seek approximations of the functions S(t), I(t), and R(t) at the discrete set {t 0 , t 1 , . . . , t n } of points in the interval [T I , T F ], where T I is the initial time moment, T F is the final time moment, and n is an integer greater than 1. The mesh of equidistant points is shown in Fig. 2.
We define the step size as τ = T F − T I n − 1 . The nodes are the equidistant points t k = T I + kτ , k = 0, 1, . . . , n. Now we are ready to discretize equations (1) and (2) This discretization secures the second order of approximation O(τ 2 ). Since the problem (8)-(9) is non-linear, we use an iterative procedure, assuming thatS k and I k are given from the previous iteration. Consider the function where k and δ k are the residuals of equations (8) and (9), respectively.
Since the function Φ is a homogeneous quadratic function of k and δ k , its absolute minimum is zero. On the other hand, the function Φ attains its minimum if and only if k = 0 and δ k = 0 for all k = 1, 2, . . . , n − 1. Hence, there exists one-to-one correspondence between the solution of the system of equations (8), (9), (4), (5) and the problem for the minimization of the function Φ under the conditions (4) and (5).

Equations for S and I
The necessary conditions for minimization of the function Φ with respect to its arguments S k and I k are Conditions (11) yield the following difference equations for k = 1, 2, . . . , n − 1. Adding the initial and terminal conditions (4) and (5), we obtain a well-posed linear system of 2(n + 1) equations for the unknown sets of values (S 0 , S 1 , S 2 , . . . , S n ) and (I 0 , I 1 , I 2 , . . . , I n ).

Equations for a and b
We rewrite the function Φ in the form where The necessary conditions for minimization of the function Φ with respect to a and b are as follow The solution of the system (21), (22) is 3.2 Minimization problem using the entire dataset Let us assume that the number of infectious individuals at some time moments ν 1 , ν 2 ,. . . , ν m , is known and given by for l = 1, 2, . . . , m, while the values of the coefficients a and b are unknown. Suppose that for every 1 ≤ l ≤ m, there exists index k l , such that ν l = t k l , i.e., the set of time moments {ν 1 , ν 2 , . . . , ν m }, is a subset of the set of mesh nodes {t 0 , t 1 , t 2 , . . . , t n }. To simplify the calculations, let us introduce the notations χ k such as: if there exists k ∈ {0, 1, 2, . . . , n} such that t k = ν l , 1 ≤ l ≤ m, then χ k = σ l and µ k > 0, otherwise χ k = 0 and µ k = 0, where µ k is the weight of equation (25) where k and δ k are the residuals of the equations (8) and (9), respectively.

Equations for S and I
The necessary conditions for minimizations of the function Φ with respect to S k and I k are given by equations (11). For the case under consideration, we obtain for k = 1, 2, . . . , n − 1. Adding the initial and terminal conditions (4), (5), we obtain a well-posed linear system with 2(n + 1) equations for the unknown set of values (S 0 , S 1 , S 2 , . . . , S n ) and (I 0 , I 1 , I 2 , . . . , I n ).

Equations for a and b
The equations for a and b in this case are the same as the equations (23) and (24), derived in Subsection 3.1.2.

Algorithm for solving the inverse problem
We use the following approach for solving the system (12), (13), (4), (5): i) With given initial values forŜ,Î, a, b, the system (12), (13), (4), (5) is solved for the functions S and I. ii) The deviation of the values of S and I fromS andĪ are computed as: If they are smaller than a given tolerance ε 0 , then the algorithm proceeds to iii); otherwise,Ŝ andÎ are replaced by S and I, respectively, and the calculations return to i); iii) The coefficients a and b are computed from (23) and (24). If the difference between the new and old values of a and b is less than ε 0 then the calculations terminate; otherwise, the iterations continue, go back to i).
3.4 Parameters a, b, and ρ as functions of time Fig. 3 The mesh and the subsets of fixed length of P days for identifying a(t) and b(t).
The data set (25) contains the number of infected individuals for every day for a period of m days. We divide the set into subsets of fixed length of P days (say P = 7, or 10, or 14 days), as shown in Fig. 3. Next, we apply the algorithm presented in subsection 3.3 for every sub-interval [k − P + 1, k], for k = P, P + 1, . . . , m. Consequently, we obtain a k and b k as functions of time. For every sub-interval we calculate the value of the reproduction rate

Data mining and projections
Models can help us understand what would be the worst case scenario and plan proper actions to achieve the best possible outcome under the given constraints. They cannot accurately predict what will happen. The forecasts of the epidemic spread are based on the available COVID-19 data as of June 21, 2020, at [38] and [39]. Our experiments indicate that the optimal value of ε 0 is 10 −8 . The parameters a, b, and ρ as functions of time are computed with τ = 1/40 and µ = τ . They are estimated with calculations performed over 7, 10, and/or 14 day intervals. The dynamic of the COVID-19 disease is projected based on the values of the parameters taken at the last day of the period used for the transmission are recovery rates estimation.
This section presents our results for the COVID-19 pandemic. In particular, the following characteristics are important for understanding the infectious disease dynamics: -Estimated up-to-date transmission rate a(t); -Estimated up-to-date removal rate b(t); -Estimated up-to-date reproduction rate ρ(t); -Projected COVID-19 infectious cases I(t) using the up-to-date estimated rates, taken at the last time period, for which data is available. The functions I(t), S(t), and R(t) satisfy the direct problem, if the coefficients are known. After estimating a and b, we find the numerical solution of the direct problem (1)- (3) with proper initial conditions in order to project the future behavior of the epidemic. In all calculations, we use second order Runge-Kutta method, a type of predictor-corrector method. The first step is: Next, the corrector step improves the prediction obtained in the first step: Finally, for k = 1, 2, . . . . We use τ = 0.01 for solving the direct problem numerically by the Runge-Kutta method.
It must be noted that these forecasts are valid as far as the rates remain as estimated; however, since COVID-19 is a highly contagious epidemic, for which very little is known, the rates of transmission, removal, and reproduction may suddenly change due to measures and/or events affecting the spread. Over the last several months, people learned how to protect themselves from being infected; consequently, this helped keep the transmission and reproduction rates low. Fig. 4(a)-(c) presents the obtained values of the parameters a, b, and ρ as functions of time, calculated over 7, 10, and 14 day intervals. The values of a, b, and ρ for the last day are given in Table 1. Forecasts, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 4(d). The reproduction rate ρ is still greater than one and the number of infected increasing. Table 1 The values Table 2. The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 5(d). The the reproduction rate ρ was ove 2 in the beginning of July but it slightly decreasing lately. Table 2 The  Table 3. The projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 6(d). Table 3 The  Table 4. The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 7(d). The data shows that the epidemic is slowing down significantly and it may disappear in one year. Table 4 The  Table 5. Three projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 8(d). The reproduction rate has been less than one since the middle of May, but, the end of May it is again greater than one and the epidemic spread is increasing.   Table 6. The projected values, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 9(d). The reproduction rate has been less than one since the middle of April, leading to decrease of the epidemic spread. For the last week the reproduction rate tends to increase. Table 6 The values of a, b, and ρ for the last day under consideration (Germany)  Table 7. Three forecasts, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 10(d). The authors could not find detailed data for Sweden after May 30. Table 7 The  Table 8. Three projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 11(d).

The World
Russia flattened the curve of the reproduction rate ρ, with values being around one lately. Table 8 The  Table 9. Three projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 12(d). Table 9 The  Table 10. The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 13(d). Table 10 The  Table 11. The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 14(d).   Table 12. The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 15(d). The reproduction rate became greater than one during the last three weeks. Table 12 The  There are similarities between the epidemic spread for some countries. Fig. 17 presents the obtained values of the reproduction rate ρ as function of time, calculated over 7 day intervals, for four European countries: France; Germany; Italy; and Bulgaria. The obtained values of the reproduction rate ρ as function of time, also calculated over 7 day intervals, for the world, the United States of America, Russia, and Brazil are shown in Fig. 18.

Conclusion
The inverse problem for estimating the time-dependent transmission and removal rates in the SIR epidemic model is derived and solved. The minimization problem uses the entire dataset with data available on July 9, 2020 for estimating the non-constant rates. The obtained numerical results demonstrate that the transmission and removal rates and the unknown functions are accurately estimated. The numerically computed rates are used for forecasting the COVID-19 pandemic for the world and a number of countries. The results of this research give insight of the pandemic in parts of the world and could help in determining policy. The SIR model is a good choice for the short period of time of this epidemic; however, it possesses known limitations in case of a long term infectious disease. In future, we plan to use other models. Depending on future developments of the disease, we may consider models addressing non-constant population, latency, reinfection, and vaccine.