The primary differences from our D-SEIQ model and SEIR model are 1) replacing recovered individuals R with quarantined individuals Q, and 2) introducing dynamics to the calculation of the effective reproduction number Rt that is dependent on time.
Some previous work employed the traditional SEIR model, which assumes that the exposed individuals (who have been infected but no symptoms yet) are not infectious. However, it is reported that COVID-19 might be transmissible for exposed individuals . Besides, due to lack of specialized treatment, the infectious period should not be interpreted as time between infective (I) and recovered (R) but time between infection (I) and quarantined (Q). Therefore, we proposed to replace the recovered individuals R with the quarantined individuals Q and the model became the SEIQ model. The quarantined individuals Q indicated the confirmed cases who were centrally quarantined. The epidemic spreading model for the SEIQ model is shown in Figure 1.
The transmission dynamics are governed by the following system of equations: (see Equation 1 in the Supplementary Files)
where N = S(t) + E(t) + I(t) + Q(t) is the total population, which is assumed consistent.
Like the SEIR model, parameter β indicates the infectious rate with β = Rt/TE where Rt is the dynamic effective reproduction number and TE is the average duration of incubation; parameter σ indicates the incubation rate with σ = 1/TE. However, in our model, parameter γ indicates the quarantine rate with γ = 1/TI (where TI is average duration of an infectious individual to be detected and quarantined). The parameter TI may vary across different regions and the difference reflects the timeliness of patient detection and admission.
The basic reproduction number R0 is the most important parameter to determine the intrinsic transmissibility of COVID-19, and it is defined as the average number of infections one infectious agent can generate over the course of the infectious period without any interventions. R0 was assumed to be a constant or arbitrarily modified at specific points for forecasting in previous work [12, 13]. However, in real-world scenarios, with the development of epidemic, more and more interventions are often taken to control the spread, which gradually reduce R0. In this work, the basic reproduction number R0 is generalized to a dynamic value Rt, which is defined as the average number of secondary infectious cases generated by an infectious at time t. After the worldwide outbreak of COVID-19, many governments took considerable measures to contain the spread of the virus. In our preliminary analysis and similar to previous work , the infectious rate β was shown to decrease exponentially with time. As parameter TE is constant, the effective reproduction number Rt should follow similar pattern as decreasing exponentially with time. Thus, we introduced time-dependent dynamics to the calculation of Rt for better simulation of the real-world transmission, (see Equation 2 in the Supplementary Files)
where R∞ is the final reproduction number at the end of the pandemic and θ is the decrease ratio of the reproduction number, which is associated with the corresponding interventions. When t = 0, Rt=R0, and it gradually reduces to R∞. The epidemic is considered to be under control with Rt < 1, and the reasonable range of R∞ was referred to some previous analysis of coronavirus .
Parameter constraints and optimization
The simulation and prediction of the D-SEIQ model requires determination of the parameters R0, R∞，TE, TI, θ. Although we incorporated machine learning to help us to fit the reported data, the parameter range needs to be set carefully and to conform to epidemiological rationality. For instance, Wu et al. applied an adjusted SEIR model to estimate R0 (R0= 2·68) in major cities of China by analyzing the number of cases exported from Wuhan internationally . Some work concluded that the daily reproduction number varied between 2 and 8. Therefore, we set the reasonable range for parameter R0 to be [2, 7]. Likewise, after reviewing the previous work on the analysis of COVID-19 [2, 11], we summarized the ranges for parameters in our model as Table 1. And, we set TE > TI as additional constraints. Therefore, the parameter optimization process is as follows:
- Initialize the number of confirmed cases Q at time t = 0 according to the official report.
- Initialize the parameters R0, R∞, TE, TI, θ
- Calculate the time-dependent effective reproduction number Rt
- Solve ordinary differential equations in Equation (1) to determine E(t), I(t), Q(t)
- Set loss function as the sum of mean squared errors of daily and cumulative confirmed numbers, and then estimate the parameters R0, TE, TI, R∞, θ based on grid search with dynamically adapted search steps to obtain the best D-SEIQ model at time t.
We obtained the updated data of the cumulative confirmed cases from the National Health Commission (NHC) of the People’s Republic of China. The newly confirmed cases were also collected on a daily basis. Considering that medical resources and interventions might vary in different regions, we fitted our model on the data from three different regions: 1) China excluding Hubei, 2) Hubei excluding Wuhan, and 3) Wuhan.
Moreover, we adjusted the number of newly confirmed cases in Wuhan between 12 February and 14 February, due to the inclusion of clinically confirmed cases without coronavirus test. The clinically confirmed cases between 12 February and 14 February were assumed to be suspicious cases in last 7 days. Specifically, we redistributed the clinically confirmed cases according to the distribution of suspected cases over the past 7 days.
Forecasting long-term trends of confirmed case numbers
Because the China’s NHC publicly reported case numbers starting from 20 January, we set this date as the starting point of our training data. As of 10 March, the daily in- creased case numbers declined to single digits across most areas in China, we set this date as the ending point of our model.
We updated our models dynamically from the 7th day following the starting point (i.e., 27 January). In this article, we presented the prediction of our models at the time points of 1st to 5th week, namely 27 January, 4 February, 11 February, 18 February, and 25 February.
For example, the model for the first week (as of 27 January) used the data from 20 January to 26 January for model construction and forecasted the daily increased and cumulative case numbers from 27 January to 10 March.