Forecasting COVID-19 onset risk and evaluating spatiotemporal variations of the lockdown effect in China

It is important to forecast the risk of COVID-19 symptom onset and thereby evaluate how effectively the city lockdown measure could reduce this risk. This study is the first comprehensive, high-resolution investigation of spatiotemporal heterogeneities in the effect of the Wuhan lockdown on the risk of COVID-19 symptom onset in all 347 Chinese cities. An extended Weight Kernel Density Estimation model was developed to predict the COVID-19 onset risk under two scenarios (i.e., with and without the Wuhan lockdown). Compared with the scenario without lockdown, the Wuhan lockdown generally delayed the arrival of the COVID-19 onset risk peak by 1-2 days and lowered risk peak values in all cities. The decrease in onset risk attributed to the lockdown was more than 8% in over 40% of Chinese cities, and up to 21.3% in some cities. The lockdown was most effective in areas with medium risk before lockdown.


Introduction
Emerging infectious diseases, such as coronavirus disease 2019 (COVID-19), have become global challenges for the public health sector 1,2. This new virus is highly contagious and can be transmitted through respiratory droplets or physical contact 3,4. As of 8th April 2020, 1,353,361 persons worldwide had been identified as being infected by COVID-19 5. Confirmed cases have appeared in 213 countries or regions 5, and the number is still increasing rapidly (Fig. 1). With the fast development of modern transportation, people are able to travel at an ever greater scale and speed, which could make pathogens spread more easily and hence greatly aggravate the transmission of diseases. To reduce the spread of COVID-19, China imposed a lockdown from 23rd January 2020 in Wuhan (the regional epicentre of the COVID-19 pandemic at that time), the capital of Hubei Province, with a population of over 10 million. The Wuhan lockdown prevented anyone from entering or leaving Wuhan by any means of transportation from that moment. It also limited local movements of residents to critical activities only. As a result, from around 10th February 2020, just 18 days after the implementation of the lockdown, a significant drop in daily new COVID-19 infections was witnessed in all of China's provinces except Hubei. Such early decreases in new infections enabled the relatively limited healthcare resources to be saved for the COVID-19 and non-COVID-19 patients in the greatest need across China, which further enabled resources to be reallocated to COVID-19 patients in Wuhan and sped up the recovery of the whole country from the epidemic. As of 8th April 2020, the end of the Wuhan lockdown, 77,838 (about 93%) confirmed cases across China had recovered 6.
Lockdown increasingly seems a necessary measure to curb the escalating pandemic in many other countries, so it is urgent to provide scientific evidence on the exact effect of the lockdown measure and on how effectively it could bring COVID-19 under control 7. Some early efforts have been made in this direction; however, the findings are mixed. For example, Yang et al. (2020) showed that, if Wuhan had been locked down five days later, the cumulative number of COVID-19 infections during 23rd January-24th April 2020 would have tripled in China 8. Wu et al. (2020) predicted that, even with the lockdown implemented in Wuhan, many other major cities in China would still undergo local outbreaks with exponentially growing infections, as seen in Wuhan 9.
A recent study estimated that the Wuhan lockdown was associated with a delay of 2.91 days (95% CI: 2.54-3.29) in the appearance of reported COVID-19 cases in other cities 10. Nevertheless, all existing studies have used mathematical models of infectious disease transmission that rely on theoretical epidemiological parameters for making predictions (e.g., Susceptible-Exposed-Infectious-Recovered [SEIR] models). In such instances, the spatial relationships among cities (i.e., the First Law of Geography that everything is related to everything else, but near things are more related than distant things 11) have been significantly downplayed. For new infectious diseases (e.g., COVID-19), with limited prior knowledge of their features and limited comparability with previous diseases, there has been much uncertainty in setting the theoretical parameters and assumptions of mathematical prediction models [12][13][14][15], which could have led to highly mixed and uncertain conclusions among existing studies. Moreover, all existing models, by using daily confirmed COVID-19 cases, are based on and predictive of only the date of reporting, which is usually later than the date of COVID-19 symptom onset.
It has been found that COVID-19 patients are most infectious during the first week after symptom onset 16,17. Therefore, by relying on findings based purely on COVID-19 reporting dates, the best time for preventing new COVID-19 infections may be missed.
Infectious diseases are spread by the spatial movements of dynamic hosts and/or vectors 18. Adopting appropriate spatial, data-driven models under the Fourth Paradigm has therefore been considered a minimum requirement for exploring the progression of the risk of new infectious diseases such as COVID-19, and for evaluating the effects of epidemic containment measures (e.g., the Wuhan lockdown) 19. The Weight Kernel Density Estimation (WKDE) model is one such model: it conducts retrospective analyses to infer the dates of infection for confirmed cases on the basis of their locational information, and predicts the place-specific risk of infection caused by spatial movements of infected people 20. Such spatial, data-driven models integrate the First Law of Geography and reduce reliance on theoretical assumptions and parameters, thereby making more robust, place-specific predictions than mathematical prediction models in the context of new infectious diseases. The WKDE model, however, does not consider how modern factors, such as travel vehicles that have decreased travel impedance among places, change traditional spatial relationships. For more accurate prediction, it is necessary to use dynamic mobility data to capture those modern factors, and to improve existing spatial models so that they can incorporate such big data.
In this study, an extended WKDE model has been developed, on the basis of the original WKDE model, to forecast the risk of COVID-19 symptom onset and evaluate the effects of the Wuhan lockdown on the risk of COVID-19 onset at a high spatiotemporal resolution across China. This is a first-ever comprehensive, high-resolution investigation of spatiotemporal heterogeneities in the effects of the Wuhan lockdown on the risk of COVID-19 onset in all 347 Chinese cities. The extended WKDE incorporates inter-city human mobility data for the calibration of traditional spatial relationships among cities in a high-speed era, to predict the risk of COVID-19 onset and the spatiotemporal patterns of the onset risk under two scenarios: with and without the Wuhan lockdown. The analysis was based on a spatiotemporal dataset of 40,486 confirmed COVID-19 cases in China during the period from 31st December 2019 to 2nd March 2020 (the 40th day after the Wuhan lockdown), among which 1,189 cases had available reported dates of symptom onset 21; dates of symptom onset for the remaining 39,297 cases without such information were speculated by an established statistical method 22. The daily Wuhan out-migration to all other cities and the associated percentage of Wuhan migrants to every other city were calculated to indicate inter-city human movements in the extended WKDE model.

Province; the onset risk in 96.8% of the areas in China was lower than 0.2 (Fig. 2a). The influences of human mobility on the onset risk were not apparent until approximately one week before the Wuhan lockdown, in which cities receiving a large number of passengers from Wuhan started to present higher onset risk, such as Chongqing (west), Beijing (north), and Guangzhou (south) (Fig. 2c). Approximately one week after the Wuhan lockdown, the onset risk reached the peak, before decreasing steadily over the whole area outside Hubei Province (Fig. 2f). The risks of symptom onset after two (Fig. 2g) and three weeks of Wuhan lockdown (Fig. 2h), in about 64.3% and 81.6% of cities outside Hubei Province, dropped to below 0.6, respectively. By the end of the fifth week following the Wuhan lockdown, the areas with the risk of onset higher than 0.8 shrank to only Wuhan and surrounding cities in Hubei (Fig. 2k). The accuracy of the predicted risk of symptom onset was evaluated on a daily basis, by calculating the percentage of confirmed cases reported in areas in which the predicted risk of symptom onset was higher than 0.8. A percentage higher than 70% is considered acceptable 20.

Fig. 2. Predicted risk of COVID-19 symptom onset across all Chinese cities before the date of Wuhan lockdown (a-d) and under two scenarios, with (e-j) and without (k-m) Wuhan lockdown measure, after
The extended WKDE model achieved a prediction accuracy of over 70% for the onset risk in the first week after the base day, higher than that of the original WKDE model (Fig. 3). The prediction accuracy in the second week after the base day was, as expected, lower because prediction errors accumulate over time. The better performance of the extended WKDE model was attributed to the integration of human mobility into the WKDE model.

Compared with the onset risk under the lockdown scenario (Fig. 2e-g), the onset risk under the non-lockdown scenario on the same date was significantly higher (Fig. 2k-m). Around the time that the onset risk reached its peak (Fig. 2m), the areas with an onset risk higher than 0.8 would have expanded to include all Chinese cities except those in the north and west, which had low inter-city population flows from Wuhan. The Wuhan lockdown has, indeed, contributed to a decreased risk of COVID-19 onset in all 347 cities of China (Fig. 4a).

A daily comparison of the predicted onset risk under the lockdown and non-lockdown scenarios reflected the contributions of the Wuhan lockdown in three aspects: a consistently lower daily onset risk after the lockdown, the delayed arrival of peaks of the daily onset risk by 1-2 days, and lower peak risks. These three contributions could be observed not only in megacities, such as Shanghai, Beijing, and Shenzhen, but also in medium-size cities, such as Xiangtan (Hunan Province), Hanzhong (Shaanxi), and Zhangzhou (Fujian) (Fig. 4b). After the Wuhan lockdown had prevented more cases from being imported into other cities, the majority of cities outside Hubei were able to control the local spread of the disease, as shown by the drop in onset risk to a markedly lower level after two weeks (Fig. 4b).
About three weeks after the Wuhan lockdown, decreases in the risk of onset relative to the corresponding peak values ranged from 21.1% to 78.9% among 285 cities outside Hubei with a peak risk of above 0.4, from 15.7% to 62.4% among 27 cities with a peak risk between 0.2 and 0.4, and from 0.2% to 5.1% in 18 cities with a peak risk of below 0.2 (Fig. 4a). (Fig. 4a). The human mobility from Wuhan to those areas was much less intense than to most other cities in China (Fig. 5). In the six selected cities, which were deemed representative of all Chinese cities in terms of geography and economic development, it was also observed that the higher the existing risk of onset by the lockdown date, the smaller the percentage of onset risk reduction attributed to the Wuhan lockdown (Fig. 4b).
In the whole country, areas with a high percentage (>8%) of reduction of onset risk attributed to the Wuhan lockdown (Fig. 4b) largely coincided with areas with medium risk of onset (0.2-0.6) before the lockdown (Fig. 2d).

Discussion
This study has, for the first time, provided data-driven evidence on the effects of the Wuhan lockdown measures on the risk of COVID-19 symptom onset in all 347 Chinese cities at a high spatiotemporal resolution, i.e., demonstrating the changing patterns of the risk of COVID-19 symptom onset at city level on a daily basis. Specifically, the Wuhan lockdown lowered the daily onset risk peak in all other cities and delayed its arrival by 1-2 days, and it also decreased the risk of symptom onset in those cities after 23rd January 2020 compared with the scenario without the lockdown. It also reduced the risk of infection from cases imported from Wuhan into other cities within one week after the lockdown was imposed.
This situation varied across cities: the larger the volume of passengers a city received from Wuhan, the larger the avoided risk of symptom onset. Furthermore, the Wuhan lockdown was found to be most effective in reducing the onset risk in areas with medium risks prior to lockdown.

Nevertheless, despite the limitations listed above, this study has presented an unprecedented spatiotemporal prediction, thanks to the detailed publicly available information gathered from anonymous cases. Transparent reporting of travel and contact history of such a large number of anonymous infected cases has been realized in China, thus opening a new avenue in the era of big data under the Fourth Paradigm, for more advanced models to refine results from mathematical prediction models. It also enables and encourages multiple stakeholders outside the public health sector to be involved in collaborative control and prevention efforts to contribute to the curbing of increasingly more frequent and complex epidemics in the future 23,24. The high spatiotemporal resolution evidence from this study would be necessary for lockdown decision-making not only in all countries undergoing the first wave of infections, but in African and Latin American countries that will inevitably be hit by the next wave of infections 25.

Extended WKDE model for predicting the risk of COVID-19 onset. The model makes the prediction through the following three steps:

Step 1: Retrospective inference on historical existence likelihood of COVID-19 infection at each location, in which each confirmed case had a period of stay. The aim of this step is to speculate the infection date of each confirmed case, on the basis of the date of symptom onset, for estimating the risk that the case has transmitted pathogens to others from the infection date.
The incubation period of each confirmed case, from infection to symptom onset, is modelled with a Weibull distribution 30, where p_incubation(Δt) denotes the probability that the incubation period of each confirmed case equals Δt days; λ and k denote the mean and standard deviation of the incubation period, which, in this study, were assumed to be 5.2 and 2.8 days, respectively 31,32; and e denotes the natural exponential.
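The incubation model can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that 5.2 and 2.8 days are the mean and standard deviation of the incubation period, as stated in the text, and converts them to the shape and scale of the standard two-parameter Weibull distribution by matching moments. The half-day binning used to obtain whole-day probabilities, the 21-day cap, and all function names are assumptions made for illustration.

```python
import math


def weibull_params_from_moments(mean, sd):
    """Find Weibull shape k and scale lam whose mean and SD match the inputs."""
    target_cv2 = (sd / mean) ** 2

    def cv2(k):
        # Squared coefficient of variation of a Weibull with shape k.
        g1 = math.gamma(1 + 1 / k)
        g2 = math.gamma(1 + 2 / k)
        return g2 / g1 ** 2 - 1

    lo, hi = 0.2, 20.0  # cv2 decreases monotonically in k on this bracket
    for _ in range(100):  # plain bisection
        mid = (lo + hi) / 2
        if cv2(mid) > target_cv2:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    lam = mean / math.gamma(1 + 1 / k)
    return k, lam


def weibull_cdf(x, k, lam):
    """Cumulative distribution function of the Weibull distribution."""
    return 1 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0


def incubation_pmf(max_days, k, lam):
    """P(incubation lasts d whole days), d = 0..max_days, renormalised to 1.

    Assumption: day d collects the probability mass in [d - 0.5, d + 0.5).
    """
    raw = [
        weibull_cdf(d + 0.5, k, lam) - weibull_cdf(max(d - 0.5, 0.0), k, lam)
        for d in range(max_days + 1)
    ]
    total = sum(raw)
    return [p / total for p in raw]


k, lam = weibull_params_from_moments(5.2, 2.8)
pmf = incubation_pmf(21, k, lam)  # 21-day cap is an illustrative choice
```

Under these assumptions the fitted shape is close to 2, so the distribution is right-skewed with most of its mass in the first ten days.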
All days in this study period (8th December 2019 to 27th February 2020) are denoted in order as t_1, t_2, …, t_82. The probability that each confirmed case was infected on a certain day, and thus became infectious, is derived from the incubation period distribution, where P_infection(L, t_i) denotes the probability that one confirmed case at location L was infected on day t_i; t_L denotes the day of symptom onset for the confirmed case at location L; n(t_L) is the number of onset cases at location L on day t_L; and p_incubation(t_L - t_i) denotes the probability that the incubation period of the confirmed case is equal to (t_L - t_i) days.
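One plausible reading of this back-calculation can be sketched in Python as follows. It is an illustrative assumption, not the paper's formula: each onset case is spread backwards in time according to the incubation probabilities, and the contributions are weighted by the number of onset cases per day, so the result is the probability that a randomly chosen confirmed case at the location was infected on each day. The function name, the toy incubation probabilities, and the toy onset counts are all hypothetical.

```python
def infection_day_probabilities(onset_counts, incubation_pmf):
    """Back-calculate infection-day probabilities for one location.

    onset_counts: {onset_day_index: number of cases with onset that day}
    incubation_pmf: list where index d holds P(incubation == d days)
    Returns {day_index: probability that a randomly chosen confirmed case
    at this location was infected on that day} (an assumed weighting).
    """
    total_cases = sum(onset_counts.values())
    probs = {}
    for onset_day, n_cases in onset_counts.items():
        for delay, p in enumerate(incubation_pmf):
            day = onset_day - delay  # infection precedes onset by `delay` days
            probs[day] = probs.get(day, 0.0) + n_cases * p / total_cases
    return probs


# Toy inputs, purely illustrative (not taken from the study data).
toy_pmf = [0.0, 0.2, 0.5, 0.3]  # incubation of 0, 1, 2, 3 days
onsets = {10: 2, 12: 1}         # two onsets on day 10, one on day 12
result = infection_day_probabilities(onsets, toy_pmf)
```

Because the incubation probabilities sum to one, the returned probabilities also sum to one across days.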
Step 2: Spatial extrapolation for inferring historical existence likelihood of COVID-19 infection in the whole study area. Let L_1, L_2, …, L_n be the unique locations of all confirmed cases used for the prediction. The risk of infection at each random location is estimated from these locations, where P_infection(S, t_i) denotes the probability that any infected person visited a random location S. P_onset(S, t_z) denotes the likelihood that at least one person infected by a confirmed case at location S develops clinical symptoms on day t_z; t_i denotes the date of infection for that person, so i < z always holds. Note that P_onset(S, t_z) values represent point estimates over the continuous space.
Such a risk measure may not be intuitive for decision making, so it can be standardized to the range between 0 and 1 by dividing it by the maximum predicted risk among all locations on day t_z. The standardized predicted risk is then the risk of symptom onset relative to the highest risk of symptom onset in the study area, which can serve as an intuitive indicator for epidemic control and prevention work on the ground. Moreover, point estimates of the risk can be averaged flexibly over any areal unit (e.g., city, residential community), and hence could overcome the modifiable areal unit problem (MAUP) during epidemic response 33.
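The standardization and areal averaging described here can be sketched in a few lines of Python. The function names, the toy risk values, and the unit labels are hypothetical and serve only to illustrate the two operations.

```python
from collections import defaultdict


def standardise(risks):
    """Divide every point estimate by the day's maximum so values lie in [0, 1]."""
    peak = max(risks)
    if peak <= 0:
        return [0.0 for _ in risks]
    return [r / peak for r in risks]


def mean_by_unit(risks, units):
    """Average point estimates over an areal unit (e.g. city or community)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for risk, unit in zip(risks, units):
        sums[unit] += risk
        counts[unit] += 1
    return {unit: sums[unit] / counts[unit] for unit in sums}


# Toy point estimates for one day (illustrative values only).
raw_risk = [0.02, 0.05, 0.10, 0.04]
unit_of_point = ["city_A", "city_A", "city_B", "city_B"]

relative_risk = standardise(raw_risk)
city_risk = mean_by_unit(relative_risk, unit_of_point)
```

The same point estimates can be re-aggregated to a different areal unit simply by passing another list of unit labels.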
Assessment of prediction models. The accuracy of the predicted risk of symptom onset has been evaluated on a daily basis, by calculating the percentage of confirmed cases reported in areas in which the predicted risk of symptom onset was higher than 0.8. A percentage higher than 70% is considered acceptable 20. The predicted results were also compared to results from the original WKDE model 20 for all cities. Since prediction errors accumulate the further into the future a prediction extends, the risk predicted on the basis of data up to "the day before" is usually the most accurate. As a consequence, all risks of symptom onset mentioned in this study have been predicted based on confirmed COVID-19 cases with onset dates no later than the previous day, except the risk on 27th February 2020, which was predicted based on the data on or before 20th February 2020 (the last date of symptom onset speculated among confirmed cases).
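The accuracy measure described above can be expressed as a short Python sketch. The 0.8 risk threshold and the 70% acceptability level come from the text; the function name and the toy risk values are hypothetical.

```python
def prediction_accuracy(case_risks, threshold=0.8):
    """Percentage of confirmed cases located where predicted risk > threshold.

    case_risks: predicted relative onset risk at each confirmed case's location.
    """
    if not case_risks:
        raise ValueError("at least one confirmed case is required")
    hits = sum(1 for risk in case_risks if risk > threshold)
    return 100.0 * hits / len(case_risks)


# Toy predicted risks at the locations of five confirmed cases (illustrative).
toy_case_risks = [0.95, 0.85, 0.81, 0.60, 0.90]
accuracy = prediction_accuracy(toy_case_risks)
acceptable = accuracy > 70.0  # acceptability level stated in the text
```

In this toy example four of the five cases fall in areas with a predicted risk above 0.8, so the prediction for that day would count as acceptable.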

Data availability
The datasets generated and analysed during this study are available from the corresponding author on reasonable request.