This study demonstrated that the Baidu Search Index could be used for early warning and prediction of the COVID-19 epidemic, with different keywords being informative in different periods. The Baidu Search Index of “Influenza” and “Pneumonia” could be used for the early warning of COVID-19 in Wuhan in December 2019. After the National Health Commission sent the rapid response team to Wuhan but before COVID-19 became notifiable in China, the continuously high search volume of “SARS”, “Pneumonia” and “Coronavirus” in Wuhan and the increased attention to these keywords in Hubei (excluding Wuhan) and China (excluding Hubei) indicated that these three keywords could be used to predict the severity of COVID-19 in Wuhan and its further spread in China. After COVID-19 came under close monitoring in China in phase 3, the Baidu Search Index of “COVID-19”, “SARS”, “Pneumonia”, “Coronavirus” and “Mask” could be used to predict epidemic trends with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively.
With the growing popularity of the internet, more and more people search for health-related knowledge and information when they become sick. In this study, we found that the search volume of “Influenza” and “Pneumonia” in Wuhan in phase 1, and the search volume of “SARS”, “Pneumonia” and “Coronavirus” in all study areas in phase 2, increased with the number of new onset cases. The search volume was highly correlated with, and distributed consistently with, the number of cases by symptom onset date. This implied that the public paid great attention to this emerging infectious disease and sought help from the internet as soon as they developed symptoms. This was further supported by the fact that the search volume in Wuhan, the most severely affected area, was significantly higher than in Hubei (excluding Wuhan) and China (excluding Hubei). As there is always a time lag between onset and diagnosis, Baidu search data could play an important role in the early warning of COVID-19.
Previous studies on the early warning of COVID-19 focused only on China as a whole and on the period after COVID-19 became notifiable, and their findings were inconsistent. One study reported that internet search data had a lead time of 10–14 days in predicting COVID-19 [11], while other studies showed lead times of 18 days [19] and 0–4 days [10]. However, the epidemic situation differed across areas, and public concern was closely related to the severity of the epidemic, population density, economic level, etc. [10]. In this study, we analyzed the data for Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei) separately according to the epidemic situation and weighted the search behavior by population. We found that the Baidu Search Index of “COVID-19”, “SARS”, “Pneumonia”, “Coronavirus” and “Mask” increased ahead of the number of reported cases in phase 3, with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. The long lead time in Wuhan may be due to the severe shortage of medical resources: many cases could not receive timely diagnosis and treatment and had to self-quarantine at home [13]. The lead time in Wuhan was consistent with the average interval of 12 days from symptom onset to laboratory confirmation [20]. In Hubei (excluding Wuhan) and China (excluding Hubei), where medical resources were relatively sufficient, the lead times were shorter and consistent with the 3–7 day interval from onset to diagnosis [21].
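As an illustration of how a lead time of this kind can be estimated, the sketch below shifts a search-index series against a reported-case series and picks the lag with the highest Pearson correlation. It is a minimal sketch under stated assumptions, not necessarily the procedure used in this study: the function names `pearson` and `best_lead_time`, the choice of Pearson correlation, and the toy series are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead_time(search, cases, max_lag):
    """Return (lag, r): the lag in days at which search[t] best correlates with cases[t + lag]."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        shifted_cases = cases[lag:]              # cases observed `lag` days later
        n = min(len(search), len(shifted_cases))
        if n < 3:                                # too few overlapping days to correlate
            break
        r = pearson(search[:n], shifted_cases[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Toy data (hypothetical): reported cases echo the search index 5 days later.
search_index = [10, 12, 20, 45, 80, 60, 35, 20, 15, 12, 10, 10, 10, 10, 10]
delay = 5
reported_cases = [0] * delay + [2 * v for v in search_index[:-delay]]

lag, r = best_lead_time(search_index, reported_cases, max_lag=10)
print(f"estimated lead time: {lag} days (r = {r:.3f})")
```

In practice the two series would be the population-weighted daily search index and the daily reported cases for one study area, and `max_lag` would be chosen to cover the plausible range of onset-to-diagnosis delays.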
To better control the wide spread of COVID-19, the Chinese government launched the Public Health Events Level-I Emergency Response on January 23–25, 2020 [2]. A series of comprehensive intervention measures, including the lockdown of Wuhan [21, 22], travel restrictions [21], case isolation and contact tracing [21, 23], etc., were implemented in China. The epidemic was well controlled by the end of February. A study in Jilin Province used the Susceptible-Exposed-Infectious-Asymptomatic-Recovered/Removed model to evaluate the effectiveness of local interventions implemented on February 1, 2020 and found that the incidence of cases was reduced by 99.99% [24]. Shengjie Lai et al. [25] built a travel network-based stochastic susceptible-exposed-infectious-removed model to simulate the spread of COVID-19 in mainland China and found that, without non-pharmaceutical interventions, the number of COVID-19 cases as of February 29 would have increased 51-fold in Wuhan, 92-fold in other cities of Hubei, and 125-fold in other provinces. Our study showed that, without the comprehensive interventions, the number of cases from 21 January to 9 February would have been 2.84 and 5.81 times the actual number in Wuhan and Hubei (excluding Wuhan), respectively, while it decreased slightly in China (excluding Hubei). The difference from these estimates may be because we focused only on the first 14 days after the implementation of control measures. In addition, the number of cases used in the prediction for Wuhan was underestimated because of the serious shortage of medical resources: before February 18, diagnosis was delayed for many cases in Wuhan [10]. The difference between Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei) may be because the lockdown was implemented more than 40 days after the onset of the first case in Wuhan, by which time the disease had already spread widely within the city. Nevertheless, the lockdown was still timely enough to prevent the large population movement out of Wuhan before Chinese New Year [25] and to prevent further spread to Hubei (excluding Wuhan) and China (excluding Hubei).
This study has several limitations. First, we simulated the epidemic curve of COVID-19 before January 20, 2020 by using software to capture the daily onset data from publications. The exact daily number of cases by onset date used in the analysis may therefore deviate from the real situation; however, this would not change the epidemic trend visualized in the time series analysis. Second, we focused only on Baidu, the largest search engine in China; other platforms such as Weibo, 360, Yahoo and Sogou should also be considered. Lastly, we included only the Baidu Search Index in the regression model; population movement and the impact of policy should also be considered.