Revealing the Transmission Dynamics of COVID-19: A Bayesian Framework for $R_t$ Estimation

In epidemiological modelling, the instantaneous reproduction number, $R_t$, is important for understanding the transmission dynamics of infectious diseases. Current $R_t$ estimates often suffer from problems such as lagging, averaging and uncertainty, which diminish their usefulness. To address these problems, we propose a new method in the framework of sequential Bayesian inference where a Data Assimilation approach is taken for $R_t$ estimation, resulting in the state-of-the-art 'DARt' system for $R_t$ estimation. With DARt, the problem of time misalignment caused by lagging observations is tackled by incorporating observation delays into the joint inference of infections and $R_t$; the drawback of averaging is mitigated by instantaneous updating upon new observations and a model selection mechanism capturing abrupt changes caused by interventions; the uncertainty is quantified and reduced by employing Bayesian smoothing. We validate the performance of DARt through simulations and demonstrate its power in revealing the transmission dynamics of COVID-19.

[...] factor, as shown in Figure 1(B). In this way, all the relevant observations are fully exploited to reduce the uncertainty of $R_t$ estimation.

In summary, compared with sliding-window (i.e., averaging inference) approaches, the sequential Bayesian updating mechanism of DARt features instantaneous $R_t$ estimation and smooths the uncertainty by utilising all available observations.

Validation through simulation
Due to the lack of ground-truth $R_t$ in real-world epidemics, we present a set of experiments based on synthetic data to establish the face validity of the DARt system. Figure 2 illustrates the design of the simulation experiments, where a synthesised $R_t$ is adopted as the ground truth to validate the estimated $\hat{R}_t$. We also estimated $R_t$ using the state-of-the-art $R_t$ estimation package EpiEstim [25] to compare the effectiveness in overcoming the three aforementioned issues (i.e., lagging, averaging and uncertainty).

Figure 2: [...] The ground-truth $R_t$ sequence is synthesised using a piecewise Gaussian random walk split by two abrupt change points, simulating the drop of the mean $R_t$ from 3.2 to 1.6 and then to 0.8 under two intensive interventions. The sequence of incident infections $j_t$ is simulated based on a renewal process parameterised by the synthesised $R_t$ and starting with 5 infections on the first day. The observation process includes applying a convolution kernel representing the probabilistic observation delay to obtain the expectation of the observation $\bar{C}_t$, and adding Gaussian noise representing the reporting error to obtain the noisy 'real' observation $C_t$. The inputs (in grey) to the DARt system are the distributions of generation time, the observation kernel and the simulated noisy observation $C_t$. The system outputs are the estimated $\hat{R}_t$, the estimated $\hat{j}_t$ and the change indicator $\hat{M}_t$. These outputs are compared with the synthesised $R_t$, $j_t$ and the time of abrupt changes. Also, the observation function is applied to the estimated $\hat{j}_t$ to compute the recovered observation $\hat{C}_t$ with uncertainty, which is compared to the 'real' observation. This provides an indirect way to validate the correctness of the inference results. The results of these comparisons are shown in Figure 3.

Data synthesis. We first generated a synthesised $R_t$ curve from a piecewise Gaussian random walk mimicking the scenario of two successive interventions (Figure 3 (A)).
To approximate the early stage of exponential growth, the simulation started with $R_0 = 3.2$, which reflects the basic reproduction number of COVID-19, and followed a Gaussian random walk $R_{t+1} \sim \mathcal{N}(R_t, (0.05)^2)$. At $t = 20$, we set $R_{20} = 1.6$, indicating the mitigation outcome of soft interventions. After the soft interventions, the epidemic is still uncontrolled, with the evolution of $R_t$ resuming the Gaussian random walk above. At $t = 30$, $R_t$ experienced another abrupt decrease to a value under 1, where we set $R_{30} = 0.8$ to indicate the suppression effects of intensive interventions (e.g., lockdown). Afterwards, the epidemic is under control, and the evolution of $R_t$ follows the same random walk as above.

With the simulated $R_t$ curve, we followed the renewal process using the generation time distribution reported by Ferretti et al. [3] (i.e., the Weibull distribution with shape and scale equal to 2.826 and 5.665 days, respectively) to simulate the infection curve $j_t$, starting with 5 infections at $t = 0$. Then we generated the lagging observation curve of onset cases $\bar{C}_t$ using the incubation time distribution [3] (i.e., the lognormal distribution with log mean and standard [...]

To compare with the state-of-the-art EpiEstim package [25], we input the infection curve and the generation time distribution to EpiEstim to estimate $R_t$ with a 7-day sliding window, following the recommended practice. Note that two common approaches to preparing the infection curve input for EpiEstim were taken: 1) 'plug-and-play' use [1], taking the observations $C_t$ as the infections without adjusting for the observation delay; 2) the two-step strategy [29], which shifts $C_t$ backwards in time by the median observation delay (5 days in the simulation). We implemented both practices and compared them with DARt.

All validation results are shown in Figure 3.

[...] be alleviated by temporal shifting in EpiEstim, its inferred $R_t$ cannot match the synthesised $R_t$ curve well, failing to reflect the two abrupt changes in a timely fashion.
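The data synthesis described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the renewal process is taken to be deterministic, and the lognormal spread and the reporting-noise standard deviation are illustrative placeholders because their values are not given in the text above. Only the Weibull generation-time parameters, the 5-day median delay, the random-walk step of 0.05, the 5 seed infections and the $R_t$ resets at $t = 20$ and $t = 30$ come from the description.

```python
import numpy as np
from scipy.stats import lognorm, weibull_min


def daily_pmf(dist, n_days):
    """Discretise a continuous delay distribution into daily bins (0,1], (1,2], ..."""
    pmf = np.diff(dist.cdf(np.arange(n_days + 1)))
    return pmf / pmf.sum()  # renormalise after truncating at n_days


def synthesise_rt(n_days, rng, r0=3.2, sigma=0.05, resets=((20, 1.6), (30, 0.8))):
    """Piecewise Gaussian random walk with abrupt resets on the given days."""
    reset_at = dict(resets)
    rt = np.empty(n_days)
    rt[0] = r0
    for t in range(1, n_days):
        if t in reset_at:
            rt[t] = reset_at[t]  # abrupt change caused by an intervention
        else:
            # Clipping at zero is an assumption of this sketch.
            rt[t] = max(rt[t - 1] + rng.normal(0.0, sigma), 0.0)
    return rt


def simulate_infections(rt, gen_pmf, seed=5.0):
    """Deterministic renewal process: j_t = R_t * sum_k w_k * j_{t-k}."""
    j = np.zeros(len(rt))
    j[0] = seed
    for t in range(1, len(rt)):
        k = min(t, len(gen_pmf))
        # gen_pmf[i] is treated as the weight of a generation time of i + 1 days.
        j[t] = rt[t] * np.dot(gen_pmf[:k], j[t - 1::-1][:k])
    return j


def observe(infections, delay_pmf, noise_sd, rng):
    """Convolve infections with the delay kernel, then add Gaussian noise."""
    # delay_pmf[d] is treated as the probability of a delay of d days.
    expected = np.convolve(infections, delay_pmf)[: len(infections)]
    noise = rng.normal(0.0, noise_sd, size=expected.shape)
    return expected, np.maximum(expected + noise, 0.0)


rng = np.random.default_rng(0)
rt = synthesise_rt(60, rng)
gen_pmf = daily_pmf(weibull_min(c=2.826, scale=5.665), 30)
# Hypothetical delay kernel: lognormal with a 5-day median; s=0.4 is illustrative.
delay_pmf = daily_pmf(lognorm(s=0.4, scale=5.0), 30)
infections = simulate_infections(rt, gen_pmf)
expected_obs, noisy_obs = observe(infections, delay_pmf, noise_sd=5.0, rng=rng)
```

Under these assumptions, `noisy_obs` plays the role of the 'real' observation $C_t$ fed to the estimator, while `rt` and `infections` serve as the ground truth against which the estimates are compared.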

Applicability to real-world data
We applied DARt to estimate $R_t$ in four different regions during the emerging pandemic. Each [...]

We find that the daily reported cases in Sweden show a periodic drop every 7 days that is likely caused by misreporting. The results show that the influence of such periodic fluctuations has been smoothed out by DARt to provide a consistent $R_t$ curve.

Comparing DARt performance with different types of observations. Figure 5 shows the results of $R_t$ estimation in Hong Kong using onsets and confirmed cases (Figure 5 (A)) as observations, respectively. We chose Hong Kong for illustration since both onset and reported confirmed cases are publicly accessible. The CIs of the results inferred from the two types of observations largely overlap (Figure 5 (B)). For the emerging outbreak that started in mid-November, as the record of onset cases is not yet completely documented by the government (especially for the most recent week) and the tails of the onset and confirmed curves in [...]

In this paper, we have proposed DARt by adopting a Bayesian inference scheme for estimating $R_t$. Our work provides a state-of-the-art $R_t$ estimation tool supporting a wide range of observations. In the system, epidemic states can be updated using newly observed data, following a data assimilation process in the framework of sequential Bayesian belief updating. For the model inference, a particle filtering/smoothing method is used to approximate [...] an analytical approximation to the posterior probability and can be regarded as limiting [...] mean value of $R_t$ but also its estimation uncertainty is important for advising governments on policymaking, an analytical approximation to help properly quantify uncertainty is desirable.

Fourthly, change detection is approximated by the change indicator $M_t$. The change indicator is included as part of the latent state and inferred during particle filtering. This work opens an avenue to explore variational Bayesian inference for switching state models [36]. Crucially, variational procedures enable us to assess model evidence (a.k.a. marginal likelihood) and hence allow automatic model selection. Examples of Variational Bayes and model comparison to optimise the parameters and structure of epidemic models can be found in previous studies [37]. These variational procedures can be effectively applied to change detection.

Finally, the method outlined in the paper can, in principle, be applied to generative epidemic models that include more latent states underwriting the renewal process; for example, contact rates, transmission strengths, et cetera. We envisage that such models would be considered from observational, spatial-temporal and social perspectives. From the observational aspect, multiple epidemic curves are generally available (e.g., daily onsets, deaths or confirmed cases). This allows different kinds of data to inform the model parameters (and structure). This sort of modelling may call for a generative model that explicitly includes the latent states generating the data at hand (e.g., hospital admissions). Dynamic causal models [37] are potential candidates here because they extend conventional (SEIR) models to include spatial location, mobility, hospitalisation, et cetera. From the spatial-temporal perspective, one could construct a homogeneously mixed spatial-temporal model with connected regions that share the same model structure but have distinct model parameters [38]. Mobility information could then be used to inform inter-regional spread, when suitably parameterised. From the socio-behavioural aspect, one could build a comprehensive model by including epidemic-relevant behavioural factors, especially human mobility trends. This usually entails modelling differential contact rates between subpopulations (or populations in specific locations) as a function of other latent states: for example, modelling social distancing as a non-linear function of the prevalence of infection [39]. Mobility is reflected in the use of public transportation, people's average walking distance, people's attitude towards the disease (cautious or passive), and people's lifestyle (e.g., working from home, take-away). All of these metrics are, to a greater or lesser extent, available as empirical constraints on suitably structured generative models. By considering different modelling factors, the estimation results of $R_t$ should be more accurate and furnish more precise credible intervals.
In conclusion, our work provides a practical scheme for accurate and robust $R_t$ estimation. It opens a new avenue to study epidemic dynamics within the Bayesian framework. We provide an open-source $R_t$ estimation package as well as an associated Web service that may facilitate research in computational epidemiology and practical use in policy development and impact assessment.

Sequential Bayesian Inference
In Figure 8, we illustrate the Bayesian inference scheme of DARt, described as follows.

State transition model

In our model, the indirectly observable variables $R_t$ and $j_t$ are included in the latent state. The state transition function for $R_t$ is commonly assumed to follow a Gaussian random walk or to be constant within a sliding window, as implemented in EpiEstim. This kind of simplification is not capable of capturing an abrupt change in $R_t$ under stringent intervention measures. To address this problem, we introduce an auxiliary binary latent variable $M_t$ to characterise and switch between two distinct evolution patterns of $R_t$: smooth transition (Mode I, $M_t = 0$) and abrupt change (Mode II, $M_t = 1$):

$$R_t \sim \begin{cases} \mathcal{N}(R_{t-1}, \sigma^2), & M_t = 0 \\ U[0, R_{t-1} + \Delta], & M_t = 1 \end{cases}$$

where $\mathcal{N}(R_{t-1}, \sigma^2)$ is a Gaussian distribution with mean $R_{t-1}$ and variance $\sigma^2$, describing the random walk with the randomness controlled by $\sigma$. $U[0, R_{t-1} + \Delta]$ is a uniform distribution between 0 and $R_{t-1} + \Delta$, allowing a sharp decrease while limiting the amount of increase. This is because we assume that $R_t$ can decrease substantially when an intervention is introduced, but is unlikely to increase dramatically, as the characteristics of the disease would not change instantly.

The transition of the change indicator $M_t$ is modelled as a discrete Markovian process with fixed transition probabilities:

$$p(M_t \mid M_{t-1}) = \begin{cases} \alpha, & M_t = 0 \\ 1 - \alpha, & M_t = 1 \end{cases}$$

where $\alpha$ is a value close to and lower than 1. The above function means that the value of $M_t$ is independent of $M_{t-1}$, while the probability of Mode II (i.e., $M_t = 1$) is quite small. This is because frequent abrupt changes in $R_t$ are unlikely.

For the incident infection $j_t$, the state transition can be modelled based on Equation (1) as [...]

We formulate the inference of the latent state $X_t = \langle R_{t^*}, M_{t^*}, J_{t^*} \rangle$ with the observations $C_t$ within a data assimilation framework.
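The switching transition for $R_t$ and its change indicator, as described in the state transition model above, can be sketched as a single propagation step over a set of particles. This is a hedged illustration rather than the DARt implementation: the function name `propagate_particles` and the default values of `alpha`, `sigma` and `delta` are hypothetical, and clipping $R_t$ at zero in Mode I is an assumption of the sketch.

```python
import numpy as np


def propagate_particles(r_prev, rng, alpha=0.95, sigma=0.05, delta=0.5):
    """Sample (M_t, R_t) for each particle given its previous value R_{t-1}."""
    r_prev = np.asarray(r_prev, dtype=float)
    # M_t = 1 (abrupt change) with probability 1 - alpha, independent of M_{t-1}.
    m_t = (rng.random(r_prev.shape) > alpha).astype(int)
    # Mode I: Gaussian random walk around R_{t-1}.
    smooth = rng.normal(r_prev, sigma)
    # Mode II: uniform on [0, R_{t-1} + delta], allowing a sharp decrease
    # while bounding any increase.
    abrupt = rng.uniform(0.0, r_prev + delta)
    r_t = np.where(m_t == 1, abrupt, smooth)
    # Clipping at zero is an assumption of this sketch.
    return m_t, np.maximum(r_t, 0.0)


rng = np.random.default_rng(1)
r_prev = np.full(20000, 2.0)
m_t, r_t = propagate_particles(r_prev, rng)
```

In a particle filter, a step like this would serve as the proposal from the transition model, with each particle subsequently weighted by the likelihood of the new observation.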
A sequential Bayesian approach (also called 'filtering') is adopted to infer the time-varying latent state, which updates the posterior estimate using the latest observations following the Bayes rule.

Let us denote the observation history between time 1 and $t$ as $C_{1:t} = [C_1, C_2, \ldots, C_t]$. Given the previous estimate $p(X_{t-1} \mid C_{1:t-1})$ and the new observation $C_t$, we would like to update the estimate of $X_t$, i.e., $p(X_t \mid C_{1:t})$, following the Bayes rule with the assumption that $C_t$ is conditionally independent of $C_{1:t-1}$ given $X_t$:

$$p(X_t \mid C_{1:t}) = \frac{p(C_t \mid X_t)\, p(X_t \mid C_{1:t-1})}{p(C_t \mid C_{1:t-1})}$$

where $p(X_t \mid C_{1:t-1})$ is the prior and $p(C_t \mid X_t)$ is the likelihood. The prior can be written in the marginalised format:

$$p(X_t \mid C_{1:t-1}) = \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid C_{1:t-1})\, dX_{t-1}$$

where $X_t$ is assumed to be conditionally independent of $C_{1:t-1}$ given $X_{t-1}$, and the transition [...]

In DARt, we adopt backward smoothing to infer the latent state at a certain time, given all observations relevant to the state. Based on the filtering results $p(X_t \mid C_{1:t})$ for $t \in \{1, \ldots, T\}$, we can obtain the smoothing results $p(X_t \mid C_{1:T})$, where $T$ is the total number of observations. To assimilate the information from subsequent observations, we use a standard backward pass [...]

[...] sequential Bayesian update module with two phases (forward filtering and backward smoothing). The latent state that can be observed in $C_t$ is defined as $X_t = \langle R_{t^*}, M_{t^*}, J_{t^*} \rangle$, where $R_{t^*}$ is the instantaneous reproduction number, $M_{t^*}$ is a binary state variable indicating different evolution patterns of $R_{t^*}$, $J_{t^*} = [j_{t^* - T_H + 1}, j_{t^* - T_H + 2}, \ldots, j_{t^*}]$ is a vectorised form of the infection numbers $j_t$, $t^*$ indicates that the most recent infection that can be observed at time $t$ is from time $t^*$ due to the observation delay, and $T_H$ is the length of the vector $J_{t^*}$ such that $C_t$ is only relevant to $J_{t^*}$ and $j_{t^* + 1}$ only depends on $J_{t^*}$ via the renewal process.

For Wuhan, we adopted the daily number of onset patients from the retrospective study [1] (from the middle of December to early March).
For the UK data, we downloaded the daily reported cases (cases by date reported) from the official UK Government website for data and insights on Coronavirus (COVID-19) [32] (from early March to the end of November 2020). Data for UK cities were also downloaded from the same resource [32] (from early August to the end of November). For the Sweden data, we downloaded the daily number of confirmed cases from the European Centre for Disease Prevention and Control [33] (from early March to the end of November 2020). For Hong Kong, we downloaded the case reports from the government website [31] (from early July to the end of November 2020), including descriptive details of individual confirmed cases of COVID-19 infection in Hong Kong. For asymptomatic patients whose onset date is unknown, we used the reported date as the onset date; records whose onset date is unclear were removed. Only local cases and their related cases are considered, while imported cases and their related cases are excluded.

Code availability

We are releasing DARt as open-source software for epidemic research and for intervention policy design and monitoring. The source code of our method and our web service are publicly available online (https://github.com/Kerr93/DARt).