Modelling and estimating the transmissibility of COVID-19: Transmission dynamics in Shaanxi Province of China as a case study

To control and contain the outbreaks of emerging infectious diseases such as COVID-19, it is important to know how easy and fast they transmit among people. To explore the essential information of the novel infectious agents, people always confront an inverse problem: using (partially) observed numbers of infected people by time and region to dig up the underlying characteristics of unknown infectious agents. Epidemics armed with advanced statistical inference and mathematical theory has been developed to help reconstruct transmission dynamics processes and to estimate key features of infectious diseases. In this study we use COVID-19 outbreak in Shaanxi province as an example to illustrate how the infectious disease dynamics method can be used to help build the transmission process and to estimate the transmissibility of COVID-19. Three transmission dynamics models were proposed for this. By separating continuous importation from local transmission and treating imported cases as the source rather than results of local transmission, the basic reproduction number of COVID-19 in Shaanxi province was estimated in the range from 0.46 to 0.61, well below the critical value of 1.0. This indicates that COVID-19 cannot self-sustain in Shaanxi province and reflects the timely and strong control measures taken in Shaanxi province.


Introduction
The emerging coronavirus infectious disease, COVID-19, has been circulated worldwide since January 2020. To control its spread, it is crucial to accurately estimate its important epidemiological characteristics such as transmissibility and to predict its further spread. In epidemiology, mathematical equations have been used to describe the transmission dynamic processes and to predict how infectious diseases spread within a population. It has a long history and goes back to Ronald Ross (1911). The problems people face in understanding infectious disease outbreaks are always inverse: we observe the numbers of infected people by time and region, but we don't know the underlying characteristics of the infectious disease, especially for novel emerging infectious diseases such as COVID-19. With the progress of advanced statistical inference methodology and mathematical theory, epidemics has become a well-developed field which targets quantitatively describing infectious disease dynamics (Anderson and May, 1991;Heesterbeek and Diekmann, 2000;Keeling and Rohani, 2007). Based on the observed data and inference methods and modelling techniques, epidemics estimate the key parameters of epidemic processes and provide policy makers timely and accurate information of infectious disease and their spread within the population (e.g., Wu J et al 2020).
The transmissibility of an infectious agent describes how easy and fast the infectious disease can spread within a population. It is usually measured by the basic reproduction number (denoted it as R0), which is defined as the average number of secondary infections generated by an infectious person introduced into a completely susceptible population (Anderson and May, 1991). Although many methods of estimating R0 have been developed (Heffernan et al., 2005;Vynnycky and White, 2010), the difficulty in measuring R0 of COVID-19 lies in the fact that it is a novel coronavirus. The knowledge of the well-known coronaviruses such as SARS and MERS have been borrowed for understanding the early transmission dynamics of COVID-19 (e.g., Wu J et al., 2020). Nevertheless, epidemiological characteristics of COVID-19 appear quite different from those of both SARS and MERS (Biggerstaff et al., 2020). The basic knowledge of COVID-19 epidemiological features should be obtained from the epidemic data during outbreaks. During the initial outbreaks of COVID-19 in China from January to March 2020, the national and provincial governments and public health authorities have collected lots of data about the outbreaks and individual cases' information. These data undoubtedly provide a good chance for us to understand the transmissibility of COVID-19 and the impact of control measures taken on stopping the spread. The reliable and accurate estimates depend on our understanding of how SARS-CoV-2 is transmitted in the population and the appropriate inference methods to calibrate the transmission models. We noticed that although a few modelling studies have been published in China Wu W et al 2020), this has not been correctly addressed. In this study we propose three epidemic models for the transmission dynamics of SAR-CoV-2: renewal equation model, SEDAR and SEEDAR models (Figure 1), and use advanced inference methods to estimate R0 by applying the three models to observed data in the outbreak in Shaanxi province of China. We found that having separated continuous importation from local transmission, R0 of COVID-19 in the Shaanxi outbreak is well below the critical value 1.0; among the three models, the simplest model (renewal equation) gives the best model fitting and prediction.

Renewal equation model:
Bayesian inference suggests the basic reproduction number (R0) during the COVID-19 outbreak in Shaanxi province has a median 0.61 and 95% confidence interval (CI) from 0.54 to 0.68 (Table 1). The serial interval (SI), the lag in onset dates of symptoms between an infector and its infectee, is estimated to have a mean 4.6 days and standard deviation 11.7 days, which is comparable with the direct observation of SI from the outbreak ( Figure S3). Model projection into next month (right panel of Figure 2) indicates that the outbreak will die out within one week (i.e., the end of February 2020) and is unlikely to generate any further local cases under the current restriction measures. This prediction is consistent with what was happened in Shaanxi province.
The time-varying reproduction number Rt shown in Figure 3 demonstrates how the transmissibility changed along the course of the outbreak. Assuming SI_mean=3.9 days and SI_sd=5.1 days which are directly estimated from the observed SIs, Rt increased to about 3.0 within the first week and then reduced to low values but occasionally exceeded the critical value 1.0. Its overall average is 0.60 which is nearly identical to the median of posterior of R0 in the above model fitting. If assuming SI_mean=4.6 and SI_sd=11.7 (i.e., the estimate of SI from the above model fitting), we found the similar changing patterns of Rt and nearly the same overall mean Rt. The change of Rt reflects the enhancement of intervention measures with the outbreak in Shaanxi province.

SEDAR model:
The estimates of model parameters are shown in Table 1. The calibration of the SEDAR model under the situation with equal incubation and infectious periods for both symptomatic and asymptomatic infection shows that: R0 is estimated at 0.59 with 95%CI from 0.51 to 0.71. The incubation period is 1.8 days (95%CI: 1.6, 2.9), and the infectious period is 3.8 days (95%CI: 3.5, 5.3). The SEDAR model fitting to the observed data of local cases and its prediction over one month ahead is shown in the middle panel of Figure 2. Sensitivity analyses show that nearly the same estimates of model parameters are obtained when considering different values of L2 and D2 and . This reflects the fact of the outbreak in Shaanxi province that the asymptomatic infections occupied a very small proportion of all infections (1.1%)  and therefore had small effects on model performance.

SEEDAR model:
The estimates of parameters of SEEDAR model are also shown in Table 1.
The MCMC samples of model parameters under the situation (i.e., L2=L1, D2=D1 and =0.5) are shown in Figure S5. The R0 is estimated at 0.46 with 95%CI from 0.30 to 0.79. The incubation period of symptomatic infection is 5.0 days (95%CI: 1.3, 9.7) which is consistent with the observed values (mean= 4.4 days and sd=5.5 days), and the infectious period of symptomatic infections is 4.8 days (95%CI: 1.6, 14.1), which is longer than the delay from onset date of symptoms to hospitalization: mean= 2.6 days and sd=3.7 days ( Figure S4). The duration of pre-symptomatic transmission (L3) is estimated at 1.5 day (95%CI: 1.0 to 4.4days), which suggests the fraction of transmission from strictly pre-symptomatic infections was 1.5/(1.5+4.8) =24% (c.f. Ferretti et al., 2020). The SEEDAR model well fits to the observed data and predicts that the outbreak will die out within about half month (the right panel of Figure 2).

Discussion
In this study, three dynamics models were proposed to model the transmission dynamics of COVID-19 epidemic within Shaanxi province from the early January to late February 2020. By distinguishing imported and local cases in their contribution to local transmission dynamics, we show the basic reproduction number R0 of COVID-19 in the Shaanxi outbreak was well below the critical value 1.0. This indicates that SAR-CoV-2 cannot self-sustain under the current control measures within Shaanxi and will stop once the importation of COVID-19 cases is halted. Our model predictions reflect the real situation in Shaanxi province since the late of February 2020.
The estimate of R0 from renewal equation and SEDAR models are close to each other, and its estimate from SEEDAR model is lower; nevertheless, their 95%CIs are closely overlapped. Overall, the estimate of R0 is in the range from 0.46 to 0.61. The model fittings to the local cases shown in Figure 2 indicates that the renewal equation model provides the best fit to the observations and SEDAR model is the worst. This is further confirmed by the values of DIC in Table 1 (Burnham and Anderson 2002): 126.9, 175.1 and 160.5 for Renewal equation, SEDAR and SEEDAR models, respectively. Furthermore, the better performance of the SEEDAR model than the SEDAR model confirms the existence of pre-symptomatic transmission (Ferretti et al., 2020).
It is worth emphasizing that although the renewal equation model is the simplest in model structure, it gives the best model fit. Given distribution of SI of COVID-19 which is specified by its mean and standard deviation, it is straightforward to obtain the estimate of the R0 . In this study we perform a joint estimation of R0 and SI, and the results agree well with two compartmental models in estimate of R0 and the empirical knowledge of SI (Biggerstaff et al., 2020). However, it should bear in mind that the successful performance of the joint estimation of R0 and SI in this study may be conditional on the very low proportion (i.e., 1.1%) of asymptomatic infections (c.f. Griffin et al., 2011). Another thing worth pointing out is the continuous importation along the course of an outbreak within one region except the epicentre Wuhan city, China. This spectacular feature, unlike infectious disease pandemics before (such as 2003 SARS pandemic and 2009 influenza pandemic), may reflect the rapid and huge movements of modern human beings. If assuming the earliest importation as the only index case(s), the transmission dynamics cannot be appropriately investigated and might be misled Wu W et al., 2020).
In modelling the transmission dynamics of COVID-19 over the whole mainland China, we found that R0 was estimated at 2.23 before 8 th February 2020 and then it dropped to 0.04 . To check whether there was any potential breaking point in transmissibility of COVID-19 within Shaanxi, we calculate the instantaneous reproduction number Rt (Cori et al., 2013). The result shown in Figure 3 indicates that no clear pattern emerged that supported a potential breaking point in transmissibility although Rt exceeded 1.0 on five days (8, 20, 31 January and 2, 10 February 2020). This may reflect the quick and strong control measures taken within Shaanxi province and the COVID-19 was under effective control soon since it was announced in public on 20 January 2020 that COVID-19 can be transmitted among people.
In conclusion, it is important to separate continuous importation from the local transmission when modelling the local transmission dynamics of COVID-19. Modern inference methodology and mathematical theory can help reveal the unobserved transmission dynamic process and hence provide valuable information for us to understand and control the spread of COVID-19. The renewal equation model, albeit being simple in model structure and of only three parameters, provides the good model fitting and therefore is a practical candidate for analysing transmission dynamics and estimating the transmissibility.

Data
The outbreak data of COVID-19 in Shaanxi province, China were collected from Shaanxi provincial government website from 23 January to 20 February 2020. There are 245 cases reported during the period, among which 113 are imported from outside of Shaanxi and 132 are locally transmitted within Shaanxi province (left panel of Figure S1). The dates of symptom onset were recorded for only 210 cases. The delay from symptom onset to reporting from these 210 cases is estimated to have a mean of 7.54 days and a standard deviation of 4.12 days. The other 35 cases whose dates of symptom onset were missed are imputed from their reporting dates and the distribution of delay from symptom onset to reporting. The timeline of dates of symptom onset of 245 cases is shown in the right panel of Figure S1.
To provide the direct dates for modelling local transmission dynamics, we construct modified dates of symptom onset for imported cases. If the date of symptom onset of one imported case was earlier than the date of entry into Shaanxi province, then its modified date of symptom onset is its entry date; otherwise, the modified date of symptom onset is its date of symptom onset. There are 19 cases whose dates of symptom onset were earlier than their entry dates. With these arrangements, the timeline of modified dates of symptom onset is shown in Figure S2.
From 101 pairs of infector-infectee observed during the outbreak, the SI is estimated to have a mean of 3.88 days and standard deviation of 5.08 days ( Figure S3). From 144 cases that had dates of exposure and symptom onset, the incubation period is estimated to have a mean of 4.51 days and standard deviation of 5.40 days and the delay from onset of symptoms to hospitalization has a mean of 2.63 days and a standard deviation of 3.56 days ( Figure S4).

Renewal equation model
It is assumed that, once infected, individuals have an infectivity profile given by a probability distribution ws, dependent on time since infection of the case, s, but independent of calendar time, t. The distribution ws typically depends on individual biological factors such as pathogen shedding or symptom severity. For simplicity, the distribution ws is approximated by the distribution of serial interval (SI). In the original renewal equation model, Fraser (2007) considers a situation where the only importation is index case(s) at the beginning of the outbreak and other cases are generated by local transmission. During the spread of COVID-19 in 2020, the outbreak within a region (except Wuhan the epicentre) took place with continuous importation. Let ct be the number of cases whose symptoms onset at day t, its expected value is approximated by (1) Here It-s is the number of imported cases that have the onset date of symptoms on day t-s and ws represents the probability mass function of the SI of length s days, which can be obtained by = ( ) − ( − 1), with G(.) representing the cumulative distribution function of the gamma distribution. The gamma distribution is characterised by its mean SImean and standard deviation SIsd, both of which are to be estimated jointly with R0 from the collected data during an outbreak (Griffin et al., 2011). Because only 19 cases among 113 imported cases had symptom onset before entering Shaanxi province, the assumption that all cases started their infectivity duration within Shaanxi province, which is implicitly required in equation (1), should be approximately satisfied.
Equation (1) can be rearranged to give an expression for instantaneous reproduction number Rt: That is, Rt, can be estimated by the ratio of the number of new infections produced at time step t, ct, to the total infectiousness of infected individuals at time t, given by ∑ ( − + − ) min ( −1, _ ) =1 , the sum of infection incidence including both imported and locally generated up to time step t-1 or the maximum of SI (whichever is the smallest), weighted by the infectivity function ws. Rt is the average number of secondary cases that each infected individual would infect if the conditions remained as they were at time t (Cori et al., 2013). Rt can be used to monitor the change of transmissibility along the course of outbreak.

SEDAR transmission model
We assume the transmission dynamics of SARS-CoV-2 virus is described by an SEDAR compartmental model as shown in Figure 1A. That is, a susceptible person (S) contracts SARS-CoV-2 virus from infectious people and then enters the latent class (E); a fraction () of these exposed after an average latent period (L1) progress to become diseased (I) and the other fraction (1-) remain asymptomatic but become infectious after an average latent period (L2). The diseased infections will be detected and admitted to hospital and isolated from the community after an average period D1 and the asymptomatic cases recover after an average infectious period D2. The model can be described by the following set of differential equations: Here N is the total size of the population under investigation and is assumed to be constant during the outbreak. The definitions of model parameters are given in Table 1. Importantly, the model includes imported cases (i.e., Imported(t) in equation for I) from outside the population as reported by public health authorities (c.f., Bai et al., 2020). This is to treat imported cases as the source rather than the results of local transmission from the region under investigation.
The steady-state solution of the equation system (2) can be easily obtained. The expression for S* (the size of the population susceptible to infection at equilibrium) is * = From this, we can obtain the expression of basic reproduction number (3) (Vynnycky and White 2010). In the special situation where L1=L2 and D1=D2, expression (3) reduces to (3a) Ferretti et al. (2020) shows that 30% to 50% of all transmission are pre-symptomatic transmission. To take the pre-symptomatic transmission into account, we modify the above SEDAR model by including a secondary exposure compartment (see Figure 1B). For simplicity, this new compartment is assumed to be asymptomatic but of the same infectivity as the symptomatic infections. The corresponding equations are modified as

SEEDAR transmission model
Compared with the SEDAR model, a new parameter L3, the duration of the late incubation period on which the infected person can pass the virus on, is introduced and is to be estimated (See Table 1).
Similarly, the basic reproduction number R0 for SEEDAR model can be obtained by deriving the expression of equilibrium number of susceptible people, and it is given by

Inference method
Inference is carried out within the Bayesian framework (Bettencourt and Ribeiro 2008;Cauchemez et al., 2004),  . Given values of parameters  for renewal equation model (1), simulating the time series of infections, denoted as (t), t= tstart,…, tend, is straightforward. Here tstart and tend represent the start day and end day of the outbreak data collected, respectively. For each set of parameter values of SEDAR and SEEDAR models, the Runge-Kutta fourth order method is used to solve the model equations and to obtain model predicted time series of infections.
In the inference of model parameters, directly observed dataset of confirmed/ hospitalized/ reported cases are used as illustrated in the following. The likelihood function for the observed times series of local cases x(t), t= tstart,…, tend, is given as Here with η being the dispersion parameter of the negative binomial distribution.
The parameters are estimated using Markov Chain Monte Carlo methods with Gibbs sampling and non-informative flat prior. The boundaries of uniformly distributed priors are set upon from the literature (Biggerstaff et al., 2020) and the data collected from Shaanxi province ( Figures S3 and S4). We give the details of the MCM sampling method below.
To propose new values for parameters, we use normal random walk. Suppose the current value of the j th parameter of  is j (t-1) , the new proposal is j * = j (t-1) + jz Here z is a standard normal variable and j is the step size of the j th parameter. The normal proposal density is given by That is, j * follows N(j (t-1) , j 2 ) (normal distribution with mean= j (t-1) , and standard deviation =j). The proposal is accepted as the next step of the Markov chain with probability  = min(A,1), where Here (.) denote the prior density, L(j * |y) the likelihood of parameter j * given data y. For a truncated normal walk on the range (a,b), the proposal density is given by Where (.) is the cumulative distribution function of standard normal. The expression for A is consequently modified as Sample a uniformly distributed random number (r) between 0 and 1, To generate nearly independent samples of model parameters, the Markov chain samples are to be thinned every 400 th observations. To respond to the acceptance rate, the following adaptive procedure is applied: if the acceptance ratio over 400*200 iterations is less than 12%, then decrease the jump step to 80% of its current size (i.e., j =0.8j); if it exceeds 40%, then j =1.2j. Otherwise the jump step j remains unchanged. To allow the MCMC process to fully converge, a burn-in period of 400,000 iterations is chosen and the estimates of model parameters are obtained from the further 400,000 iterations.
To compare the performance of the above three models (Burnham and Anderson, 2002), the deviance information criterion (DIC), which combines goodness of fit and model complexity (Spiegelhalter et al., 2002), is used. It measures fit via the deviance Dev() =-2logL(|Data) and complexity by estimate of the 'effective number of parameters' pD= mean(Dev())-Dev(mean()) (i.e., posterior mean deviance minus deviance evaluated at the posterior mean of the parameters). The DIC is calculated as DIC = Dev(mean()) + 2pD = mean(Dev()) + pD.
The model that has the smallest DIC is the best.

Compliance and ethics
The authors declare no Conflict of Interest.