Transmission Dynamics of the Global COVID-19 Epidemic: Analytical Modeling and Future Prediction

Background: The outbreak and epidemic of COVID-19 have created worldwide impact and attracted global attention. Considerable effort has been devoted to the study of the transmission dynamics of COVID-19. However, there is a lack of simple and straightforward expressions for the growth curves of important indicators, such as the cumulative number of confirmed cases and the cumulative number of dead cases.
Methods: We adopt two methods. The first method is based on regression analysis. We fit the available data to a curve by the method of least squares. The best curve is obtained by solving a multivariable minimization problem. The second method is based on differential equations. We establish an analytical model of transmission dynamics based on the susceptible-exposed-infectious-recovered-dead (SEIRD) process using a linear system of ordinary differential equations, which characterizes the daily change in each compartment. The size of each compartment (i.e., the number of people in each stage of the SEIRD process) is readily available from the solution to these differential equations.
Results: Both methods are applied to the worldwide COVID-19 epidemic data as a case study. Furthermore, predictions of the cumulative number of confirmed cases and the cumulative number of dead cases in April 2020 using our models and methods are also provided. From a global perspective, based on the current situation and unless powerful and effective social and medical measures are taken, the cumulative number of confirmed cases by the end of April 2020 is predicted to be 23.333 million by regression analysis and 36.068 million by the analytical model, and the cumulative number of dead cases is predicted to be 1.148 million and 2.528 million, respectively.
Conclusions: In this paper, we make some progress towards analytical expressions of the daily growth of the cumulative number of confirmed cases and the cumulative number of dead cases, the two most important daily reported figures.


Motivation
The outbreak and epidemic of COVID-19 have created worldwide impact and attracted global attention [6,13]. The total number of coronavirus cases and the total number of coronavirus deaths in the world are growing exponentially [2,4,5]. This pandemic has had a significant social, economic, and political impact.
Considerable effort has been devoted to the study of the transmission dynamics of COVID-19 [9,12]. Li et al. reported that the mean incubation period was 5.2 days, the mean serial interval was 7.5 days, the basic reproductive number $R_0$ was 2.2, and the epidemic doubled in size every 7.4 days [15]. Liu [...] outbreak at the early stage on the Diamond Princess cruise ship [21]. Zhao et al. pointed out that the early outbreak data largely follow exponential growth, and estimated that the mean $R_0$ for 2019-nCoV ranges from 2.24 to 3.58 [22]. Zhou et al. found that the median duration of viral RNA shedding from oropharyngeal specimens was 20 days (range of 8-37 days) [23].
The existing studies cited above have focused on individual quantities such as the basic reproductive number, the mean serial interval, the mean incubation period, and the mean recovery time. However, based on these quantities alone, it is still not possible to predict how the number of infected people, the number of recovered individuals, and the number of dead cases change daily. There is a lack of simple and straightforward expressions for the growth curves of important indicators, such as the cumulative number of confirmed cases and the cumulative number of dead cases.

Contributions
In this paper, we make some progress towards analytical expressions of the daily growth of the cumulative number of confirmed cases and the cumulative number of dead cases, the two most important daily reported figures. We adopt two methods.
The first method is based on regression analysis. We fit the available data into a curve by the method of least squares. The best curve is obtained by solving a multivariable minimization problem. The details are presented in Section 2.
The second method is based on differential equations. We establish an analytical model of transmission dynamics based on the susceptible-exposed-infectious-recovered-dead (SEIRD) process using a linear system of ordinary differential equations, which characterizes the daily change in each compartment. The size of each compartment (i.e., the number of people in each stage of the SEIRD process) is readily available from the solution to these differential equations. The details are presented in Section 3.
Both methods are applied to the COVID-19 epidemic data in the world as a case study. Furthermore, predictions of the cumulative number of confirmed cases and the cumulative number of dead cases in April 2020 using our models and methods are also provided. The details are presented in Section 4.

Regression Analysis
In this section, we develop a regression analysis method.

The Method
Assume that a group of $n$ available data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are to be fit to a function $y = f(a_1, a_2, \ldots, a_k, x)$, where $a_1, a_2, \ldots, a_k$ are parameters of $f$ to be determined. The method of least squares [1] is used to find $a_1, a_2, \ldots, a_k$. The sum of squared residuals is

$$E(a_1, a_2, \ldots, a_k) = \sum_{i=1}^{n} \bigl(y_i - f(a_1, a_2, \ldots, a_k, x_i)\bigr)^2,$$

where $E$ is viewed as a function of $a_1, a_2, \ldots, a_k$. To minimize $E(a_1, a_2, \ldots, a_k)$, we need to find $a_1, a_2, \ldots, a_k$ such that $\nabla E(a_1, a_2, \ldots, a_k) = 0$, i.e., $\partial E/\partial a_j = 0$ for all $1 \le j \le k$. This is a multivariable minimization problem.
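For the transmission dynamics of coronavirus, we consider an exponential function of the form $f(x) = ab^x + c$, where $a$ and $b$ are parameters and $c$ is a given constant, such that $y_i \approx ab^{x_i} + c$ for all $1 \le i \le n$. To use the method of least squares to find $a$ and $b$, the sum of squared residuals is

$$E(a, b) = \sum_{i=1}^{n} \bigl(ab^{x_i} + c - y_i\bigr)^2,$$

where $E$ is viewed as a function of $a$ and $b$. To minimize $E(a, b)$, we need

$$\frac{\partial E}{\partial a} = 2\sum_{i=1}^{n} \bigl(ab^{x_i} + c - y_i\bigr) b^{x_i} = 0, \qquad \frac{\partial E}{\partial b} = 2\sum_{i=1}^{n} \bigl(ab^{x_i} + c - y_i\bigr) a x_i b^{x_i - 1} = 0.$$

To solve the above equations, we notice that the first equation implies that $a = s_1/s_2$ and the second equation implies that $a = s_3/s_4$, where

$$s_1 = \sum_{i=1}^{n} (y_i - c) b^{x_i}, \quad s_2 = \sum_{i=1}^{n} b^{2x_i}, \quad s_3 = \sum_{i=1}^{n} x_i (y_i - c) b^{x_i - 1}, \quad s_4 = \sum_{i=1}^{n} x_i b^{2x_i - 1}.$$

To find $b$, we need $s_1/s_2 = s_3/s_4$. Notice that $F(b) = s_1 s_4 - s_2 s_3$ is an increasing function of $b$. Hence, $b$ can be found as the root of $F(b) = 0$ by using the standard bisection method ([8], p. 22), and then $a = s_1/s_2$.

The quality of the above regression analysis can be evaluated by the adjusted relative error (ARE), defined for $y_i$ as

$$\mathrm{ARE}(y_i) = \frac{|f(x_i) - y_i|}{y_i} \cdot \frac{y_i}{\max_{1 \le j \le n} y_j}.$$

Notice that $|f(x_i) - y_i|/y_i$ is the relative error for $y_i$. Since the magnitudes of $y_1, y_2, \ldots, y_n$ can differ dramatically, simply taking the maximum relative error is not appropriate, because a large relative error for a very small $y_i$ is not significant. Therefore, the relative error for $y_i$ is adjusted by a factor of $y_i / \max_{1 \le j \le n} y_j$.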
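The fitting procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the search bracket [1, 2] for b, the tolerance, and the synthetic data are all assumptions made for the example. The sums s1 to s4 are written out from the conditions ∂E/∂a = 0 and ∂E/∂b = 0 for f(x) = a b^x + c.

```python
def exponential_sums(xs, ys, b, c):
    """Sums from the normal equations of the model y ~ a * b**x + c."""
    s1 = sum((y - c) * b**x for x, y in zip(xs, ys))
    s2 = sum(b**(2 * x) for x in xs)
    s3 = sum(x * (y - c) * b**(x - 1) for x, y in zip(xs, ys))
    s4 = sum(x * b**(2 * x - 1) for x in xs)
    return s1, s2, s3, s4


def fit_exponential(xs, ys, c, lo=1.0, hi=2.0, tol=1e-12):
    """Find (a, b) by bisection on F(b) = s1*s4 - s2*s3.

    The bracket [lo, hi] is an assumption: F must be negative at lo
    and positive at hi for the bisection to be valid.
    """
    def F(b):
        s1, s2, s3, s4 = exponential_sums(xs, ys, b, c)
        return s1 * s4 - s2 * s3

    if F(lo) > 0 or F(hi) < 0:
        raise ValueError("F does not go from negative to positive on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    b = 0.5 * (lo + hi)
    s1, s2, _, _ = exponential_sums(xs, ys, b, c)
    return s1 / s2, b  # a = s1 / s2 from the first normal equation


def adjusted_relative_errors(xs, ys, a, b, c):
    """ARE for each point: (|f(x_i) - y_i| / y_i) * (y_i / max_j y_j).

    For nonzero y_i this simplifies to |f(x_i) - y_i| / max_j y_j.
    """
    y_max = max(ys)
    return [abs(a * b**x + c - y) / y_max for x, y in zip(xs, ys)]


# Synthetic, noise-free data (hypothetical): y = 3 * 1.2**x + 10.
xs = list(range(1, 11))
ys = [3.0 * 1.2**x + 10.0 for x in xs]
a, b = fit_exponential(xs, ys, c=10.0)
worst_are = max(adjusted_relative_errors(xs, ys, a, b, 10.0))
print(f"a = {a:.6f}, b = {b:.6f}, max ARE = {worst_are:.2e}")
```

On noise-free data the fit should recover the generating parameters almost exactly; on real data the bracket for b may need to be adjusted so that F changes sign inside it.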

A Case Study
In this section, we apply our method to a case study.
Let $C_{\mathrm{World}}(t)$ be the cumulative number of confirmed cases in the world by the $t$th day (i.e., $x_t = t$ and $y_t = C_{\mathrm{World}}(t)$). Based on the data published by Worldometer [5] during March 1-31, 2020 (with March 1, 2020 as the first day), we get $C_{\mathrm{World}}(t) = 16729.4550655 \times 1.1340597^{t} + 70000$.
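To make the fitted curve concrete, the short sketch below evaluates it for a few days in March 2020 and derives the daily growth factor and the doubling time of its exponential term. The constants are the fitted values quoted above; the names `c_world`, `A_FIT`, `B_FIT`, and `C_FIT` are chosen only for this illustration.

```python
import math

# Fitted values for the worldwide confirmed-case curve quoted above.
A_FIT = 16729.4550655
B_FIT = 1.1340597
C_FIT = 70000.0


def c_world(t):
    """Fitted cumulative number of confirmed cases on day t (March 1, 2020 is day 1)."""
    return A_FIT * B_FIT**t + C_FIT


# The exponential term grows by a factor of B_FIT per day, so it doubles
# every ln(2) / ln(B_FIT) days. This applies to C(t) - 70000, not to C(t) itself.
daily_growth = B_FIT - 1.0
doubling_days = math.log(2.0) / math.log(B_FIT)

for t in (1, 16, 31):
    print(f"day {t:2d}: fitted cumulative confirmed cases = {c_world(t):,.0f}")
print(f"daily growth of exponential term: {daily_growth:.1%}")
print(f"doubling time of exponential term: {doubling_days:.2f} days")
```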
Let $D_{\mathrm{World}}(t)$ be the cumulative number of dead cases in the world by the $t$th day (i.e., $x_t = t$ and $y_t = D_{\mathrm{World}}(t)$).

An Analytical Model
In this section, we establish an analytical model of transmission dynamics.

The SEIRD Process
In this section, we describe the SEIRD process (see Figure 1).
• Susceptible individuals have no immunity to the disease. They may be exposed to the disease and move into the "Exposed" compartment through contact with an exposed or infectious person.
• Exposed individuals have been exposed to the disease but have not shown signs of illness, and thus behave like healthy persons; however, they can transmit the disease to others, and will move into the "Infectious" compartment.
• Infectious individuals show clear symptoms of sickness and have been tested and confirmed as patients. They are typically quarantined or hospitalized, but can still transmit the disease to family members, medical staff, and other people, and eventually move into the "Recovered" or the "Dead" compartment.
• Recovered individuals can no longer become infected, typically because they have immunity from a prior exposure. This assumption is often appropriate if immunity is long-lasting or the disease is being modeled over a relatively short time period.
• Dead individuals remain in the "Dead" compartment forever.

A System of Differential Equations
In this section, we establish an analytical model of transmission dynamics using a system of ordinary differential equations.

Figure 1: The SEIRD process (compartments: Susceptible, Exposed, Infectious, Recovered, Dead).

Let $S(t)$, $E(t)$, $I(t)$, $V(t)$, $D(t)$ be respectively the numbers of individuals in the susceptible, exposed, infectious, recovered, and dead compartments at time $t$ (measured in days).
The daily reported cumulative number of confirmed cases is $C(t) = I(t) + V(t) + D(t)$. The daily reported cumulative number of dead cases is $D(t)$.
Let $N$, $R_E$, $R_I$, $T_E$, $T_I$, $\alpha$ be positive constants defined below.
• $N$ is the size of (i.e., the number of people in) the population.
• $R_E$ is the reproductive number of an exposed individual.
• $R_I$ is the reproductive number of an infectious individual.
• $T_E$ is the average number of days in the "Exposed" compartment.
• $T_I$ is the average number of days in the "Infectious" compartment.
• $\alpha$ is the fraction of infectious individuals who eventually die.
Notice that the basic reproductive number is actually $R_0 = R_E + R_I$.
• $dS(t)/dt$: $R_E/T_E$ is the number of newly exposed people contacted by an exposed individual every day, and $(R_E/T_E)E(t)$ is the total number of such people on the $t$th day. Similarly, $R_I/T_I$ is the number of newly exposed people contacted by an infectious individual every day, and $(R_I/T_I)I(t)$ is the total number of such people on the $t$th day. Therefore, the total number of newly exposed people (who leave the "Susceptible" compartment) is $(R_E/T_E)E(t) + (R_I/T_I)I(t)$ on the $t$th day, so that $dS(t)/dt = -\bigl((R_E/T_E)E(t) + (R_I/T_I)I(t)\bigr)$.
• $dE(t)/dt$: The number of newly exposed people (who move into the "Exposed" compartment) is $(R_E/T_E)E(t) + (R_I/T_I)I(t)$ on the $t$th day. On the other hand, the number of exposed people who move into the "Infectious" compartment is $E(t)/T_E$ on the $t$th day. Therefore, the change in the "Exposed" compartment is $dE(t)/dt = ((R_E - 1)/T_E)E(t) + (R_I/T_I)I(t)$ on the $t$th day.
• $dI(t)/dt$: The number of exposed people who move into the "Infectious" compartment is $E(t)/T_E$ on the $t$th day. On the other hand, the number of infectious people who leave the "Infectious" compartment is $I(t)/T_I$ on the $t$th day. Therefore, the change in the "Infectious" compartment is $dI(t)/dt = E(t)/T_E - I(t)/T_I$ on the $t$th day.
• $dV(t)/dt$: The number of infectious people who move into the "Recovered" compartment is $dV(t)/dt = (1 - \alpha)(I(t)/T_I)$ on the $t$th day.
• $dD(t)/dt$: The number of infectious people who move into the "Dead" compartment is $dD(t)/dt = \alpha(I(t)/T_I)$ on the $t$th day.
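The five rates of change listed above can be integrated numerically to see how the compartments evolve. The sketch below uses a classical fourth-order Runge-Kutta step in plain Python; it is an illustration only, and the parameter values and initial compartment sizes are hypothetical placeholders rather than the values fitted in this paper. The function names `seird_rhs`, `rk4_step`, and `simulate` are likewise assumptions.

```python
def seird_rhs(state, R_E, R_I, T_E, T_I, alpha):
    """Right-hand side of the linear SEIRD system described in the text."""
    S, E, I, V, D = state
    new_exposed = (R_E / T_E) * E + (R_I / T_I) * I  # people leaving "Susceptible"
    dS = -new_exposed
    dE = new_exposed - E / T_E          # equals ((R_E - 1)/T_E) E + (R_I/T_I) I
    dI = E / T_E - I / T_I
    dV = (1.0 - alpha) * I / T_I
    dD = alpha * I / T_I
    return (dS, dE, dI, dV, dD)


def rk4_step(state, h, params):
    """Advance the state by one step of size h (in days) using classical RK4."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = seird_rhs(state, **params)
    k2 = seird_rhs(shifted(k1, 0.5 * h), **params)
    k3 = seird_rhs(shifted(k2, 0.5 * h), **params)
    k4 = seird_rhs(shifted(k3, h), **params)
    return tuple(
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, days, params, steps_per_day=10):
    """Return the state at the end of each day, starting with day 0."""
    history = [state]
    h = 1.0 / steps_per_day
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, params)
        history.append(state)
    return history


# Hypothetical parameters and initial state (S, E, I, V, D), for illustration only.
params = dict(R_E=1.5, R_I=1.0, T_E=5.0, T_I=20.0, alpha=0.05)
history = simulate((1_000_000.0, 100.0, 10.0, 0.0, 0.0), days=30, params=params)
S, E, I, V, D = history[-1]
print(f"day 30: confirmed C = I + V + D = {I + V + D:,.1f}, dead D = {D:,.1f}")
```

Because the five derivatives sum to zero, the total S + E + I + V + D stays constant, which gives a quick sanity check on any implementation. Note that the system as written has no saturation term in S, so it describes the growth phase only.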

An Analytical Solution
In this section, we provide an analytical solution to the linear system of ordinary differential equations.
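Notice that $dE(t)/dt$ and $dI(t)/dt$ form an autonomous linear system of differential equations:

$$\frac{dE(t)}{dt} = \frac{R_E - 1}{T_E} E(t) + \frac{R_I}{T_I} I(t), \qquad \frac{dI(t)}{dt} = \frac{1}{T_E} E(t) - \frac{1}{T_I} I(t).$$

It is well known that such differential equations admit analytical solutions [3].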
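One standard way to write such a solution is through the eigenvalues of the 2×2 coefficient matrix of the (E, I) subsystem. The sketch below follows that textbook route and is not necessarily the form derived by the authors; the function name `ei_closed_form` and the numerical parameter values are assumptions for illustration. The discriminant equals (a11 − a22)² + 4·a12·a21, which is positive for positive parameters, so the two eigenvalues are real and distinct.

```python
import math


def ei_closed_form(t, E0, I0, R_E, R_I, T_E, T_I):
    """Closed-form E(t), I(t) for dE/dt = a11*E + a12*I, dI/dt = a21*E + a22*I."""
    a11 = (R_E - 1.0) / T_E
    a12 = R_I / T_I
    a21 = 1.0 / T_E
    a22 = -1.0 / T_I

    # Eigenvalues from the characteristic polynomial lam^2 - tr*lam + det = 0.
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    root = math.sqrt(tr * tr - 4.0 * det)  # positive because a12 * a21 > 0
    lam1 = 0.5 * (tr + root)
    lam2 = 0.5 * (tr - root)

    # Eigenvector for lam is (lam - a22, a21); match the initial condition.
    c1 = (E0 - (lam2 - a22) * I0 / a21) / (lam1 - lam2)
    c2 = I0 / a21 - c1

    e1 = math.exp(lam1 * t)
    e2 = math.exp(lam2 * t)
    E = c1 * (lam1 - a22) * e1 + c2 * (lam2 - a22) * e2
    I = a21 * (c1 * e1 + c2 * e2)
    return E, I


# Hypothetical parameters and initial values, for illustration only.
example = dict(R_E=1.5, R_I=1.0, T_E=5.0, T_I=20.0)
for day in (0, 10, 20, 30):
    E, I = ei_closed_form(float(day), 100.0, 10.0, **example)
    print(f"day {day:2d}: E = {E:,.1f}, I = {I:,.1f}")
```

The larger eigenvalue governs the long-run exponential growth rate of both E(t) and I(t). Since dV/dt and dD/dt depend only on I(t), V(t) and D(t) follow by integrating the closed form for I(t).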

A Case Study
In this section, we apply our method to a case study.
The parameters in our model are set as follows. The incubation period is the time between catching the virus and beginning to have symptoms of the disease. Most estimates of the incubation period for COVID-19 range from 1 to 14 days, most commonly around five days. We set $T_E = 5$ [15,16]. Using available preliminary data, the median time from onset to clinical recovery is approximately 2 weeks for mild cases and 3-6 weeks for severe or critical cases. We set $T_I = 20$ [17,23].
For the same data in Section 2.2 for the world, we get

Future Prediction
Using our analytical results in Sections 2 and 3, we display predictions of C World (t) and

We have made some progress towards analytical expressions of the daily growth of the cumulative number of confirmed cases and the cumulative number of dead cases, the two most important daily reported figures. Our analytical methods and results have been tested using the worldwide COVID-19 epidemic data and were found to be effective and accurate. We have also predicted the cumulative number of confirmed cases and the cumulative number of dead cases in April 2020 using our models and methods.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication: Not applicable.
Availability of data and material: Yes.