People with a history of travel to, or exposure in, high-risk areas with COVID-19 patients are required to quarantine to control the spread of the pandemic. Because the characteristics of the new coronavirus and the effective treatments for it remained uncertain, people usually compared COVID-19 with SARS, which broke out in China in 2003 with a mortality rate of 11% [19, 22]. Owing to the policy of separate isolation precautions and the fear of an unknown virus, people with an exposure history are likely to conceal their own and their family's high-risk behaviors, which undermines the government's early attempts to control suspected cases of COVID-19 [23]. Internet search engine data could be used to predict the potential number of affected persons, and real-time Baidu Index data could help monitor the development of the epidemic and inform the corresponding government policies.
China had achieved preliminary success in controlling the COVID-19 pandemic by April 22, 2020. A correlation analysis between Chinese public searches for COVID-19-related symptoms and the actual number of confirmed cases helps to explore the relationship between Internet search values and the COVID-19 pandemic and provides novel insights for controlling the epidemic.
The current research shows that the related DBIV peaked earlier than the DGCC, and that dynamic changes in DBIV also preceded those in DGCC. During the growth period, higher search values were associated with higher numbers of cumulative confirmed cases, which suggests that the searchers could be potentially infected individuals. In addition, DGCC and DBIV were positively correlated throughout the observation period (even in the decline period), which implies that DBIV declined as DGCC decreased. However, while DGCC was declining, the number of cumulative cases continued to increase, which could explain the negative correlation between cumulative cases and DBIV during the decline period. The public's search behavior for health-related information can reflect potential physical and psychological problems [7, 24]. The decline in searches for COVID-19-related symptoms indicates that the public might have been more relaxed in the decline period than in the growth period.
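The sign change described above follows from arithmetic alone, and a small sketch may make it concrete. The numbers below are invented for illustration and are not data from this study; the sketch also assumes a Pearson coefficient, which may differ from the coefficient used in the analysis. When both the daily case count and the search index fall, their correlation is positive, while the running total of cases still rises and therefore correlates negatively with the search index.

```python
from itertools import accumulate
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical decline-period series (illustrative values only).
daily_cases = [50, 40, 30, 20, 10]       # stands in for DGCC
search_index = [500, 420, 300, 210, 90]  # stands in for DBIV

# Cumulative cases keep growing even though daily cases are falling.
cumulative_cases = list(accumulate(daily_cases))

r_daily = pearson(search_index, daily_cases)            # positive
r_cumulative = pearson(search_index, cumulative_cases)  # negative

print(f"search vs daily cases:      {r_daily:+.3f}")
print(f"search vs cumulative cases: {r_cumulative:+.3f}")
```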
Baidu's time plots for COVID-19-related symptoms and the number of confirmed cases show that dynamic changes in the former appeared earlier than those in the latter. Among the 34 provinces/regions in China, most areas in this research showed statistical correlations between DBIV (except for sputum production) and DGCC, whereas Hong Kong, Macao, Taiwan, and Tibet did not. One possible reason is that Baidu is not the primary search tool in these places [4]. Additionally, there was only one confirmed case in Tibet, which was insufficient for statistical analysis. There was also no correlation between DGCC and the DBIV of cough in Shanghai, which might be due to the incompleteness of the search keywords related to cough. Of interest, no correlation was observed between the DBIV of sputum production and DGCC. A reasonable explanation is that sputum production is more common in elderly people with chronic respiratory diseases, and such searches might be related to seasonal influenza, which occurs every year from late autumn to early spring [25]. Based on our research, an increase in the DBIV of COVID-19-related symptoms could be treated as an abnormal signal that warrants early action by government departments.
An increased number of relevant searches indicates more potentially infected individuals. Around 97.5% of people with an identifiable exposure history develop symptoms within 11.5 days, and 1% have a longer incubation period of more than 14 days [26]. We found that the maximum of DBIV's growth rate occurred, on average, 20 days ahead of DGCC in most areas except Heilongjiang. On May 10, 2020, the Heilongjiang government reported that the pandemic had relapsed; thus, the apex of DBIV there appeared later than in other provinces [27]. Rather than following the traditional diagnosis and treatment process, most potential patients are inclined to search the Internet for help first, which suggests that publicly reported figures may overrepresent severe cases of COVID-19 [7, 28, 29]. These potentially infected individuals were likely to use search engines (usually Baidu in China) to look for related information, so the Baidu Index could reflect their approximate number. In theory, people with mild infection may appear to have a longer incubation period because of the lag of several days before they are confirmed [30]. A soaring DBIV of COVID-19-related symptoms in a certain area might be a precursor of a future outbreak. The STCI analysis shows that the peak DBIV of COVID-19-related symptoms appeared 19-22 days earlier than the peak DGCC. However, the time-lag correlation analysis yielded a shorter lag than the STCI analysis. Because the STCI analysis compared only the interval between the peak DBIV of COVID-19-related symptoms and the peak DGCC, it did not take the other data points into account. Therefore, time-lag correlation analysis may be better suited to exploring the lag patterns between DBIV and DGCC. We found that the optimal time lags of the DBIV of fever, cough, fatigue, sputum production, and shortness of breath were 0, 4, 2, 3, and 1 day(s), respectively.
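The idea behind a time-lag correlation analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes a Pearson coefficient, equal-length daily series, and non-negative integer lags, and the function names `lagged_correlations` and `best_lag` as well as the sample series are hypothetical. For each candidate lag, the search series on day t is paired with the case series on day t + lag, and the lag with the highest correlation is taken as optimal.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def lagged_correlations(search, cases, max_lag):
    """Correlate search[t] with cases[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = search[:len(search) - lag]  # drop the last `lag` search days
        y = cases[lag:]                 # drop the first `lag` case days
        result[lag] = pearson(x, y)
    return result


def best_lag(search, cases, max_lag):
    """Return the lag with the highest defined correlation, or None."""
    corr = lagged_correlations(search, cases, max_lag)
    defined = {lag: r for lag, r in corr.items() if r is not None}
    if not defined:
        return None
    return max(defined, key=defined.get)


# Hypothetical series: the case curve repeats the search curve 3 days later.
search = [1, 3, 7, 12, 9, 5, 2, 1, 1, 1, 1, 1]
cases = [0, 0, 0] + search[:-3]

print(best_lag(search, cases, max_lag=6))  # 3, by construction of the data
```

Unlike a peak-to-peak comparison, every overlapping pair of days contributes to the estimate, which is why the two approaches can give different lags on the same data.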
According to Cuilian et al., the peak of Internet searches about COVID-19 appeared 10-14 days earlier than the peak of reported daily growth cases in China [31], and 10 days earlier in America [32]. People who searched for the terms "新冠" (the abbreviated Chinese term for the novel coronavirus) or "冠状病毒" ("coronavirus"), the keywords in Cuilian's study, were more likely to be in the incubation period, whereas those who searched for COVID-19-related symptoms were likely to be infected and to have already passed through the incubation period. Moreover, there was no time lag for "fever"; this may be attributable to the body temperature reporting mechanism adopted by both the Chinese government and local institutions. This reporting system required people with fever to be isolated and quarantined immediately to prevent the potential further spread of COVID-19 [33, 34]. People with fever would therefore be isolated and subsequently confirmed without delay. As a result, no time lag was observed.
Limitations
Several limitations should be acknowledged. Firstly, we used only Baidu's data in our research; other platforms, such as Weibo and Twitter, were not included. Secondly, some keywords related to the symptoms of COVID-19 were not included in the current study, and the keywords used here cannot guarantee consistent and efficient long-term prediction in the future. Future studies are therefore encouraged to add or remove keywords for COVID-19-related symptoms to confirm that the time-lag patterns between DBIV and DGCC persist. Thirdly, detailed information about individual searchers remains unclear, so it is impossible to identify specific potentially infected individuals. In addition, several issues with the predictability of disease incidence trends from search engine data have been documented. To reduce the risk of failing to predict an epidemic from Internet search engine data, a random forest regression model is suggested for future studies to support our observations [35].