A hybrid SIR model applied to the “Covid-19” pandemic

Modeling has become a tool capable of guiding public policies, especially in the area of health. Specifically, modeling in epidemiology makes it possible to follow the evolution of infections and to understand the behavior of viruses. Unfortunately, the traditional SIR models and the statistical prediction models commonly used suffer from a lack of accurate information and from the unavailability of large amounts of data; they also do not take into account the interactions within the population. This paper proposes a hybrid SIR model which takes into consideration the spatio-temporal dynamics of individuals. The model is based on discrete stochastic diffusion equations. To build the equation system, the 2D diffusion equations are coupled to a probability law for human displacement through a discretization made by the finite volume method for complex geometries. Beyond the health consequences it causes, COVID-19 (coronavirus) is used as a test case to validate the proposed model. We used the case of a developed country before confinement to fit the chosen displacement pattern and to analyze the sensitivity of the model parameters, taking into account the accuracy of the statistics provided.


Introduction
World news is dominated by the coronavirus pandemic, which is causing considerable damage and putting a lot of pressure on the health systems of various countries. Since the outbreak of the coronavirus pandemic in December 2019 in China, which has since spread to countries around the world, researchers have been working in synergy to predict the spread of the disease and to explain certain phenomena using data. Mathematical modelling has gained more attention and awareness in epidemiology and the medical sciences (Anderson, February, 1999), (Levin, Grenfell, Hastings, & Perelson, 1997). One class of these models is the dynamical epidemic model called the Susceptible-Infected-Removed (SIR) model (Ng, Turinici, & Danchin, September, 2003). The SIR model, like most epidemic models, is based on dividing the host population into a small number of compartments, each containing individuals that are identical in terms of their status with respect to the disease in question (Earn, 2008).
Within the framework of prediction models related to the spread of Covid-19, some studies focus on estimating the basic reproduction number $R_0$ from the available official statistics (Dur-e-Ahmad & Imran, April, 2020) (Ye, et al., February, 2020), while others focus on the variation over time of the coefficients of the SIR model (the infection rate and the removal rate) estimated from statistical data, adding detail to the existing prediction models (Zhong, et al., March, 2020). Despite these developments, the complexity of the epidemic has made it difficult for decision makers to take timely action because of the non-homogeneous configuration of the population, the movement of the population and, above all, the lack of accurate information and the unavailability of large amounts of data.
In this paper we propose a more complete approach to epidemic prediction which takes into account the spatial and temporal displacement of individuals. Under certain hypotheses, this consideration leads to a hybrid SIR model. The motivation comes from the fact that forecasts based on statistics are not always representative, owing to the complexity of human interactions and geographical conditions. It is therefore difficult to say with certainty that the evolution of a pandemic like this one can be captured by statistical data, given that the statistics are based on tested cases only. In this configuration, and given the complexity of human mobility, it is necessary to analyze the effect of individuals' interactions on how closely the approach matches reality. To capture the dynamics of individuals, this work relies on existing studies of human mobility patterns (BROCKMANN, Dirk, HUFNAGEL , & GEISEL , 2006), (Bachir, Danya, Gauthier, El Yacoubi, & Khodabandelou, 2017).
As an application, a simulation of the evolution of the Covid-19 epidemic is carried out for a developed country and discussed against the actual data from www.kaggle.com. For the proposed approach, a discrete model based on the 2D diffusion equation is used. For the numerical resolution, a discretization is made using the finite volume method for complex geometries. GIS software was subsequently used to extract the spatial data used as inputs to the adaptive model. Data retrieved as datasets are processed using feature engineering algorithms and techniques found in (Igual & Segui, 2017) and implemented in a Python environment to solve numerically the systems of equations obtained. The simulation results are compared with the trend of the available data in order to assess the hypotheses on the input parameters.
The rest of this paper is structured as follows. Section 2 presents the literature review, where the basic theories are described. Section 3 presents the methodology: the manner in which the model is built, with the finite volume discretization for complex geometries and the diffusion matrix structure, to obtain a hybrid SIR model. Section 4 describes the application of the model and presents some results and discussion. Section 5 closes with a conclusion and perspectives.

Background and literature review
The basic idea rests on the reaction-dispersion equation, in which the movement of individuals is described by a dispersion operator. This approach was previously used to build a model for the spatial spread of diseases involving hosts in random displacement during certain stages of the progression of the disease (Wu, 2008). It led to a diffusion model based on the conservation law and Fick's law. That model was applied to two diseases, namely the spread of rabies in continental Europe during the period 1945-1985 and the rate of spread of West Nile virus in North America.
In the work of (BROCKMANN, Dirk, HUFNAGEL , & GEISEL , 2006) it is noted that in population dynamical systems diffusive dispersal is quite frequently combined with a reaction kinetic scheme which accounts for local interactions between various types of reacting agents. Sometimes groups of interacting individuals of a single species are classified according to some criterion. For instance, in the context of epidemic modeling, a population is often classified according to infective status. In an approximation which neglects the intrinsic fluctuations of the reaction kinetics, one obtains for these systems reaction-diffusion equations, the most prominent example of which is the Fisher equation for the concentration u(x, t) of a certain class of individuals, a species, etc.:
$$\partial_t u = \lambda u (1 - u) + D\, \partial_x^2 u \qquad (1)$$
A paradigmatic system which naturally yields a description in terms of (1) and which has been used to describe the geographic spread of infectious diseases is the SIS model, in which a local population of N individuals segregates into the two classes of susceptible S, who may catch a disease, and infected I, who transmit it. The transmission is quantified by the rate α and the recovery by the rate β (Anderson, May, & Anderson, Infectious Diseases of Humans: Dynamics and Control, 1992). The reaction scheme could not be simpler:
$$S + I \xrightarrow{\alpha} 2I, \qquad I \xrightarrow{\beta} S \qquad (2)$$
In the limit of large population size N, the dynamics can be approximated by the set of differential equations
$$\frac{dS}{dt} = -\alpha \frac{IS}{N} + \beta I, \qquad \frac{dI}{dt} = \alpha \frac{IS}{N} - \beta I \qquad (3)$$
Assuming that the number of individuals does not change (i.e. I(t) + S(t) = N) and that disease transmission is more frequent than recovery (α > β), one obtains for the rescaled relative number of infected, $u(t) = \alpha I(t) / (N(\alpha - \beta))$, a single ordinary differential equation (ODE) describing logistic growth:
$$\frac{du}{dt} = \lambda u (1 - u) \qquad (4)$$
where λ = α − β. If, additionally, reactants are free to move diffusively, one obtains (1) for the dynamics of the relative number of infected u(x, t) as a function of position and time.
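The reduction of the SIS dynamics to logistic growth can be checked numerically. The sketch below integrates the SIS equation for I(t) with an explicit Euler scheme and compares the rescaled quantity u = αI/(N(α − β)) with the closed-form logistic solution for λ = α − β. The parameter values, the step size and the function names are illustrative assumptions and are not taken from the cited works.

```python
import math


def simulate_sis(alpha, beta, n_pop, i0, dt, steps):
    """Explicit Euler integration of dI/dt = alpha*I*S/N - beta*I with S = N - I."""
    i = float(i0)
    for _ in range(steps):
        s = n_pop - i
        i += dt * (alpha * i * s / n_pop - beta * i)
    return i


def logistic_u(u0, lam, t):
    """Closed-form solution of du/dt = lam*u*(1 - u) with u(0) = u0."""
    growth = math.exp(lam * t)
    return u0 * growth / (1.0 + u0 * (growth - 1.0))


# Illustrative (assumed) values with alpha > beta.
alpha, beta, n_pop, i0 = 0.5, 0.2, 10_000.0, 10.0
lam = alpha - beta
dt, steps = 0.001, 20_000  # integrate up to t = 20

i_end = simulate_sis(alpha, beta, n_pop, i0, dt, steps)
u_num = alpha * i_end / (n_pop * lam)      # rescaled numerical result
u0 = alpha * i0 / (n_pop * lam)            # rescaled initial condition
u_exact = logistic_u(u0, lam, dt * steps)  # logistic prediction

print(f"numerical u = {u_num:.4f}, logistic u = {u_exact:.4f}")
```

The two values agree up to the discretization error of the Euler scheme, which shrinks as dt is reduced.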
The diffusion part of the basic equation is related to human displacement, which is random and complex, so its formulation requires additional effort. Relevant studies on this topic can be found in the literature across several disciplines. In physics, random walk processes with a power-law single-step distribution are known as Lévy flights (Metzler & Klafter, 2000), (Shlesinger, Zaslavsky, & Frisch, 1995); Lévy flights are qualitatively different from ordinary random walks. Other studies in telecommunications used subscribers' telephone data to estimate human mobility in developed countries (Bachir, Danya, Gauthier, El Yacoubi, & Khodabandelou, 2017), (Bachir, et al., 2019), (Bachir, et al., 2018). According to these authors, the pervasive usage and the high penetration rates of mobile phones have made mobile network data the largest mobility data source ever. Many other theories are used to explain human displacement in terms of the trajectory distance, also called jump length, which corresponds to the distance traveled during a trip. Brockmann et al. (BROCKMANN, Dirk, HUFNAGEL , & GEISEL , 2006) state that the jump length Δr follows a power-law distribution:
$$P(\Delta r) \sim \Delta r^{-(1+\beta)} \qquad (5)$$
where β < 2. This finding reveals that people usually make short trips and fewer long-distance travels. More recently, the jump length was described as following a truncated power-law distribution (GONZALEZ, C, A HIDALGO , & BARABASI , 2008), $P(\Delta r) = (\Delta r + \Delta r_0)^{-\beta} \exp(-\Delta r / \kappa)$, with β = 1.75±0.15, Δr0 = 1.5 km and κ a cut-off value depending on the dataset. Jump length statistics are given in Tab. 3.3. In our study, the median and the average distance are respectively 10.2 km and 129.5 km. The minimum value is 4 m while the 99th percentile is around 45 km. For the truncated power-law distribution, our cut-off value is κ = 1000 km.
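To illustrate what a power-law jump length implies in practice, the following minimal sketch draws jump lengths from P(Δr) ∼ Δr^−(1+β) by inverse-transform sampling and evaluates the truncated power-law form with the parameter values quoted above. The lower bound r_min, the exponent 0.6 used for sampling, the sample size and the function names are assumptions made only for this example; the truncated density is left unnormalized.

```python
import math
import random


def sample_jump_length(beta, r_min, rng):
    """Inverse-transform sample from P(dr) ~ dr**-(1 + beta) for dr >= r_min."""
    u = rng.random()  # uniform in [0, 1), so 1 - u lies in (0, 1]
    return r_min * (1.0 - u) ** (-1.0 / beta)


def truncated_power_law(dr, beta, dr0, kappa):
    """Unnormalized truncated power law (dr + dr0)**-beta * exp(-dr / kappa)."""
    return (dr + dr0) ** (-beta) * math.exp(-dr / kappa)


rng = random.Random(0)     # fixed seed so that the run is repeatable
beta_pl, r_min = 0.6, 1.0  # assumed illustrative values (r_min in km)

samples = sorted(sample_jump_length(beta_pl, r_min, rng) for _ in range(20_000))
median = samples[len(samples) // 2]
theory_median = r_min * 2.0 ** (1.0 / beta_pl)  # median of the pure power law
short_share = sum(s < 10.0 * r_min for s in samples) / len(samples)

print(f"sample median = {median:.2f} km, theoretical median = {theory_median:.2f} km")
print(f"share of trips shorter than 10 km = {short_share:.2f}")

# Truncated form with beta = 1.75, dr0 = 1.5 km and a 1000 km cut-off.
for dr in (1.0, 10.0, 100.0, 1000.0):
    print(dr, truncated_power_law(dr, beta=1.75, dr0=1.5, kappa=1000.0))
```

Most sampled trips are short while a few are very long, which is the qualitative behavior described in the text.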

Methodology
Having gone through the existing models, the question arises whether this approach will have a similar impact in reality, for an arbitrary geometry. Since the aim of our work is to provide an adaptive component for prediction models of mobile entities, in order to reduce as much as possible the error margins related to movement, it seems important to verify this approach in a real geographical context. The basic equations of the model are therefore discretized using the finite volume method for complex geometries.

Discretization of the model in a real context
The sample regions are approximated by polygons as shown in Figure 1 and Figure 2.
The numerical scheme involves the following quantities:
• the average speed of diffusion,
• the matrix of random contributions between sites,
• the time step Δt of the numerical scheme.
Similarly, for the rest of the sites (P, N, K) we obtain equations of the same form. Boundary conditions are implicitly considered because the quantities above are zero everywhere except on the boundaries between our experimental environments.
For a general formulation for any region H, the numerical scheme is written in the same way; in it, the term carrying the index +1 represents the number of mobile entities in H at time t + dt. For the construction of the probability matrix, we assumed that all displacements follow the probability law of (BROCKMANN, Dirk, HUFNAGEL , & GEISEL , 2006), in which the trajectory distance, also called jump length, corresponds to the distance traveled during a trip. The jump length Δr follows a power-law distribution $P(\Delta r) \sim \Delta r^{-(1+\beta)}$ where β < 2. This finding reveals that people usually make short trips and fewer long-distance travels. In this case we take as Δr the average distance that connects the centers of two regions i and j.
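One possible way to turn the power law into a probability matrix between regions is sketched below. It assumes that the weight of a displacement from region i to region j is proportional to d_ij^−(1+β), where d_ij is the distance between the region centers, and that each row is normalized to sum to one. The normalization, the zero diagonal, the three-region distance table and the exponent are assumptions of this example, not the construction used in the paper.

```python
def displacement_matrix(distances, beta):
    """Row-normalized weights w_ij ~ d_ij**-(1 + beta) for i != j, zero on the diagonal."""
    n = len(distances)
    matrix = []
    for i in range(n):
        weights = [
            distances[i][j] ** (-(1.0 + beta)) if i != j else 0.0
            for j in range(n)
        ]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Hypothetical centroid distances in km between three regions.
distances_km = [
    [0.0, 100.0, 400.0],
    [100.0, 0.0, 300.0],
    [400.0, 300.0, 0.0],
]

p_move = displacement_matrix(distances_km, beta=0.6)  # beta is an assumed value
for row in p_move:
    print([round(p, 3) for p in row])
```

With these distances, a traveller leaving the first region is far more likely to reach the region 100 km away than the one 400 km away, which reflects the predominance of short trips.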

Hybrid SIR model formulation
In the SIR model, there are three compartments:
• Susceptible: individuals who might become infected if exposed, assuming that they have no immunity to the infectious agent,
• Infectious: infected individuals who can transmit the infection to the susceptible individuals they come into contact with,
• Removed: individuals who are immune to the infection and consequently do not affect the transmission dynamics in any way when they come into contact with others.
Given that coronavirus has a very high mortality rate, removal occurs through isolation from the rest of the population, through immunization against infection, through recovery from the disease with full immunity against reinfection, or through death caused by the disease. We add to the traditional SIR model a fourth state (death state) that takes into consideration the number of deaths due to the illness. The representation of the states is given by the following diagram.
In the resulting system, D is the diffusion matrix; ∆ is the Laplace operator; t is time (in hours); β is the infection rate, i.e., the ratio infected by one infective per unit time; α is the removal rate, i.e., the ratio removed by cure; and γ is the death rate. By substitution of (16) in the system we obtain the discrete scheme, in which each quantity at step +1 is expressed through the time step ∆t and the quantities at the current step. The boundary conditions are implicitly considered because the quantities above are zero everywhere except on the boundaries between regions.
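The structure of such a model can be illustrated with a simplified sketch: one explicit Euler step that first applies local SIR dynamics with a death state in each region and then exchanges living individuals between regions according to a displacement probability matrix. This is an assumed simplification and not the finite volume scheme of the paper; the mass-action infection term, the `mobility` rate, the two-region setup and all numerical values are hypothetical.

```python
def hybrid_sir_step(state, p_move, beta, alpha, gamma, mobility, dt):
    """One explicit Euler step of a region-coupled SIR model with a death state.

    state: list of dicts with keys "S", "I", "R", "M", one dict per region.
    p_move[i][j]: assumed probability that an individual leaving i goes to j
    (each row must sum to one so that the total population is conserved).
    """
    n = len(state)
    new_state = []
    # Local epidemic dynamics in each region.
    for reg in state:
        alive = reg["S"] + reg["I"] + reg["R"]
        infections = beta * reg["S"] * reg["I"] / alive if alive > 0 else 0.0
        new_state.append({
            "S": reg["S"] - dt * infections,
            "I": reg["I"] + dt * (infections - (alpha + gamma) * reg["I"]),
            "R": reg["R"] + dt * alpha * reg["I"],
            "M": reg["M"] + dt * gamma * reg["I"],
        })
    # Exchange of living individuals between regions (diffusion-like term).
    for key in ("S", "I", "R"):
        outflow = [mobility * dt * state[i][key] for i in range(n)]
        for i in range(n):
            inflow = sum(outflow[j] * p_move[j][i] for j in range(n))
            new_state[i][key] += inflow - outflow[i]
    return new_state


def total_population(state):
    """Sum of all compartments, deaths included, over all regions."""
    return sum(sum(reg.values()) for reg in state)


# Two hypothetical regions; only the first one has infected individuals at t = 0.
state = [
    {"S": 999_990.0, "I": 10.0, "R": 0.0, "M": 0.0},
    {"S": 500_000.0, "I": 0.0, "R": 0.0, "M": 0.0},
]
p_move = [[0.0, 1.0], [1.0, 0.0]]
initial_total = total_population(state)

for _ in range(100):  # 100 steps of dt = 0.1 time units
    state = hybrid_sir_step(state, p_move, beta=0.3, alpha=0.1, gamma=0.01,
                            mobility=0.01, dt=0.1)

print(state[1]["I"], total_population(state) - initial_total)
```

In this toy run the second region starts without any infected individual and acquires cases only through the exchange term, while the total of the four compartments stays constant up to rounding.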

Data and parameter values
The simulation focuses on Italy before confinement, from March 1 to 6, 2020. The choice of Italy is motivated by two major reasons: first, the probability law used was established in the context of human interaction in developed countries; second, official data related to the pandemic are available and accurate. The Covid-19 pandemic data used in this study come from the www.kaggle.com website, from which we extracted the dataset (Table 1). The map of Italy is extracted from Google Earth, and the demographic data are collected from Wikipedia. We use the ArcGIS software to compute the distance between the centroids of the regions and to approximate the perpendicular border length of adjacent regions (Fig 4). The population is assumed to be homogeneous, with everyone likely to contract the disease. A short time interval is chosen for the simulation (March 1 to 6) because the rates of the SIR model vary over time. If a long simulation period were planned, it would be necessary to take these variations into account by defining the SIR model parameters as functions of time. The infection rate (β), removal rate (α) and death rate (γ) are obtained empirically from the available official statistics, as in (Zhong, et al., March, 2020). The notation is as follows:
N = total human population in a given region
I = total infected given by the tested cases
Ir = supposed total infected
M = total deaths
Rc = recovered by cure
α = recovery rate by cure
β = infection rate
γ = death rate
The adjustment of the coefficients α, β and γ is mainly based on the number of deaths, which is generally invariant: unlike the number of infected, which is counted from the tested cases, it does not depend on the sample. The other data, namely the numbers of infected and recovered, do not seem representative because in general the tests are performed on a sample made up of suspected cases with a high body temperature, and this sample is small compared to the real size of the population.
R(t): population of recovered humans
M(t): population of deceased humans
N(t) = S(t) + I(t) + R(t) - M(t)

Simulation results
In the numerical resolution of the equations obtained from the hybrid model, 4 discrete equations are solved per region, making a total of 80 equations for the 20 regions. The following figures present the results for some selected regions. The graphs on the left show the simulation of the epidemic, assuming that the entire population is susceptible, and the graphs on the right represent the statistical data collected during the same period.

Discussions
In all these figures we note a discrepancy between the data and their estimates. This is explained by the fact that, in the simulation, the entire population of a region is susceptible, whereas the data come from tests carried out on suspected individuals, who represent less than 10% of the population in each region. Moreover, according to the WHO, 80% of infected people do not show any symptoms (WHO, 2020).
From Fig 5 (Right), between March 1 and 6, 2020, out of 13,556 people tested there are 2,612 positive cases, around 135 deaths and 469 recovered. With these data, the mortality rate in the "Lombardia" region is about 5%. We find this rate very high in light of the WHO statement. Likewise, the recovery rate is around 18%, whereas according to the WHO report more than 80% of cases could be expected to recover. This motivates our approach, which assumes that the entire population is tested and that at least 80% of the cases are asymptomatic and heal on their own while being contagious. This explains the proximity of the black curve representing the total number of infected to the blue curve representing the number of recovered in the simulations. The only parameter that we keep identical to the data is the number of deaths. Furthermore, with the spatiotemporal dynamics explicitly taken into account in the equations, the simulation results are more realistic with respect to the size of the population and its interactions. Given that "Lombardia" is the epicenter of the pandemic, the model shows why the regions neighboring "Lombardia" are more highly infected than the others.
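The rates quoted for "Lombardia" follow directly from the reported counts when they are taken relative to the positive cases. The short computation below reproduces them; the variable names are chosen for this example only, and the share of positive tests is added for context.

```python
# Counts reported for "Lombardia" between March 1 and 6, 2020.
tested, positive, deaths, recovered = 13_556, 2_612, 135, 469

mortality_rate = deaths / positive     # deaths among positive cases
recovery_rate = recovered / positive   # recoveries among positive cases
positivity_rate = positive / tested    # positive cases among people tested

print(f"mortality: {mortality_rate:.1%}")
print(f"recovery: {recovery_rate:.1%}")
print(f"positive tests: {positivity_rate:.1%}")
```

Both rates are computed on tested positive cases only, which is why they are sensitive to how the tested sample is selected.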
Another point that emerges from this study is that not only are the dynamics between regions difficult to control, but the on-the-spot decisions of decision-makers also distort the linear character of predictions for such a pandemic. To better help decision-makers, prediction models should be made dynamic and contextual, adapting to situations as they arise. The model proposed in this article goes in this direction because it is possible to change the patterns of population displacement, to modify the coefficients α, β and γ of the dynamics of the epidemic during its evolution, and to take decisions according to a given simulation frequency. Likewise, the proposed model allows us to work on an effective population, obtained after the balance of inflows and outflows produced by the diffusion.
The study of the epidemic in the region of "Sardegna", which is an island, gives a concrete limit of the proposed approach, because this region has no neighbors with which it shares a land border. This specific case also shows the limit of the traditional SIR model when the study of an epidemic begins with no cases of infection. The infection evolves in "Sardegna" because of the cases imported by the spatiotemporal dynamics of the mobile entities. An improvement of the proposed model would be to find a way of taking into account interregional movements of people other than displacement on land.

Conclusion
This study presents a hybrid SIR model built by introducing the dynamics of individuals through a probability law pattern. The pattern of displacement is inspired by the studies carried out by (BROCKMANN, Dirk, HUFNAGEL , & GEISEL , 2006) and (Bachir, Danya, Gauthier, El Yacoubi, & Khodabandelou, 2017), which are based on models of the movement of individuals in urban areas in developed countries. In this way, one of the displacement laws has been combined with the diffusion equations to model the interactions of individuals in developed countries. To build the proposed hybrid SIR model, the probabilistic diffusion equations obtained are superimposed on the traditional SIR model. To validate the proposed approach, a simulation is made for the spread of the novel coronavirus. Given the displacement pattern hypothesis for developed countries, the simulation is made for the case of Italy, which represents for us an interesting study sample. Data from the 20 regions of Italy were available, which facilitated the implementation of the spatiotemporal dynamics between regions. To bring out the impact of the dynamics of individuals in the simulation, the regions of Italy are divided into three zones, namely the epicenter region and its surrounding regions, the regions far from the epicenter, and isolated regions without a land border with other regions. It emerges from this division, according to the proposed model and supported by the data, that the regions close to the epicenter are among those which register the highest infection rates compared to those which are distant. Also, the infection rate is very low or even zero for isolated regions. This approach offers decision-makers a global view that considers the whole population as susceptible, whereas the official data collected from the tested cases are partial and can distort decision-making.
The results of the simulation are satisfactory despite certain limits. As a perspective for improving this model, it is necessary to add the time-varying property of the propagation coefficients of the epidemic. It would also be important to find a displacement pattern which integrates different types of displacement, in order to take into account the exchanges of individuals among regions which do not share land borders.