The current study observed that Internet big data could be used to warn of outbreaks of epidemic diseases. In this research, we analyzed the correlation between search behavior for COVID-19-related keywords and the number of confirmed cases, based on Internet big data. We found that the search volume of several COVID-19-related keywords correlates strongly with the number of confirmed cases. Moreover, STCI research predicts the onset of epidemic peaks earlier than previous big data monitoring (usually a week in advance), with a lead time longer than the incubation period for epidemic diseases.
China, having achieved notable success in controlling the COVID-19 pandemic, was one of the few countries to have resumed production across society. The search behavior of Chinese citizens during the epidemic helps to analyze the correlation between the clinical symptoms of affected people and search values. People with a travel history in highly regulated areas and people exposed to confirmed patients are required to quarantine. Without a clear understanding of the characteristics and effective treatments of the new coronavirus, people usually compare COVID-19 with SARS, which broke out in China in 2003 with a mortality rate of 11% [19, 22]. Because of the separate isolation policy, people tend to conceal their own and their family's high-risk behaviors. In the case of the so-called Zhengzhou "poison king" [23], a man who intentionally concealed his exposure history caused hundreds of people to be isolated. A deep fear of an unknown virus undermined the government's early attempts to control COVID-19. Using the Baidu Index, we can estimate the potential number of affected people. Moreover, real-time Baidu Index data are particularly crucial for monitoring epidemic development and formulating corresponding government policies.
Our research suggests that the dynamic change of DGCC lags behind the public Baidu index values for COVID-19-related symptoms. When search volumes increased, the cumulative confirmed cases increased as well, which indicates that the searchers could be potential carriers of the virus. Although the number of confirmed cases was still increasing, public attention dropped significantly, as shown by the declining Baidu Index value during DP. These findings show that the related daily Baidu index values peaked earlier than the DGCC and also began to decline earlier in the later period. Based on this result, we hypothesize that the related DBIV can be used as an indicator of epidemic development. Public search behavior can reflect potential physical and psychological problems [6, 24]. The decline in search values also indicates that public attention to COVID-19 was weaker in the later stage of the pandemic than in the earlier stage. We can use the Baidu index to monitor both the epidemic situation and public attention to COVID-19.
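The lag between search values and confirmed cases described above can be illustrated with a lagged correlation. The following is a minimal sketch, not the procedure used in this study: the function names `pearson` and `best_lead`, the choice of Pearson correlation, and the synthetic series are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)

def best_lead(search, cases, max_lag):
    """Return (lag, r) for the lag in days at which search best matches later cases."""
    best = None
    for lag in range(max_lag + 1):
        # Pair search[t] with cases[t + lag].
        x = search[:len(search) - lag]
        y = cases[lag:]
        r = pearson(x, y)
        if r is not None and (best is None or r > best[1]):
            best = (lag, r)
    return best

# Synthetic data (assumption): cases repeat the search curve three days later.
search = [1, 3, 7, 12, 9, 5, 3, 2, 1, 1, 1, 1]
cases = [0, 0, 0, 1, 3, 7, 12, 9, 5, 3, 2, 1]
lag, r = best_lead(search, cases, max_lag=6)
print(lag, round(r, 3))
```

On real data the maximum would be less sharp, and the estimated lag should be read together with the size of the correlation at neighboring lags.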
Overall, the DBIV of five keywords was positively correlated with DGCC during the outbreak. From the dynamic fluctuations, we can identify coordinated changes in DBIV and DGCC, with the former keeping ahead of the latter. Across the 34 provinces/regions, most areas showed statistically significant correlations of DBIV with DGCC (except for sputum production), but Hong Kong, Macao, Taiwan, and Tibet did not. This is probably because Baidu is not the primary search tool in non-mainland areas such as Hong Kong, Macao, and Taiwan [4]. Tibet had few cumulative confirmed cases (only one), which left insufficient data to calculate the correlation using SPSS 23.0. However, there was also no correlation between DGCC and DBIV for cough in Shanghai, probably because the set of search words related to this keyword was incomplete. Based on our research, an increase in the related DBIV relative to a preceding period can be treated as an abnormal signal that warrants advance action by government departments. Sputum production is more common in the elderly with chronic respiratory diseases and tends to be strongly associated with seasonal influenza, which occurs every year from late autumn to early spring [25].
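One way to make the idea of an abnormal signal relative to a preceding period concrete is a rolling baseline. The sketch below is only an illustration under stated assumptions: the function name `abnormal_days`, the 7-day window, the threshold of two standard deviations, and the sample values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev

def abnormal_days(dbiv, window=7, k=2.0):
    """Indices of days whose value exceeds mean + k * stdev of the preceding window."""
    flagged = []
    for t in range(window, len(dbiv)):
        baseline = dbiv[t - window:t]  # the `window` days before day t
        threshold = mean(baseline) + k * stdev(baseline)
        if dbiv[t] > threshold:
            flagged.append(t)
    return flagged

# Hypothetical daily index values: a stable week followed by a sharp rise.
dbiv = [100, 102, 98, 101, 99, 103, 100, 101, 180, 260]
flagged = abnormal_days(dbiv)
print(flagged)
```

A short window reacts quickly but is sensitive to weekday effects; a longer window or a seasonal baseline may be more appropriate for keywords such as sputum production that follow seasonal influenza.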
The growth rate of the Baidu Index represents the increase in searchers compared with the previous day. An increased number of relevant searchers indicates more potentially exposed persons. Around 97.5% of people with an identifiable exposure history develop symptoms within 11.5 days, and about 2% develop symptoms after more than 14 days (99th percentile) [26]. We found that the maximum growth rate of DBIV occurred on average 20 days earlier than that of DGCC in most areas, with the exception of Heilongjiang. The abnormality in Heilongjiang may suggest insufficient preparation for the pandemic. People used the Internet to search for symptoms rather than going to the hospital, which points to a difference from publicly reported figures, in which severe cases are overrepresented [6, 27, 28]. Because the government implemented isolation measures during the epidemic, the standard medical treatment process became slower and more complicated [29]. Moreover, many community hospitals cannot prescribe medicine for fever patients, so potential patients with minor symptoms are missed. These people are likely to use search engines (usually Baidu) for related information, so the Baidu index provides a novel way to reflect the number of these potentially infected people. Theoretically, these mildly affected potential patients may have a longer apparent incubation period, owing to a lag of several days in case confirmation [30]. The longer the search-to-confirmation interval, the more time relevant departments have to prepare adequately. The results mean that big data on public search behavior can, to some extent, detect the COVID-19 pandemic situation in advance, highlighting the importance of including search engine data in follow-up prevention and control. The key message is that search values for the clinical characteristics of COVID-19 in the Baidu Index can be used to monitor the development of the epidemic. The results would be more convincing if data from all mainstream search engines were included.
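The comparison of growth-rate maxima can be sketched as follows. This is a minimal illustration, assuming that the growth rate is the relative day-over-day change; the function names `growth_rate` and `peak_day` and the two sample series are hypothetical, and the exact definition used in this study may differ.

```python
def growth_rate(series):
    """Relative day-over-day growth; None where the previous value is zero."""
    rates = []
    for prev, cur in zip(series, series[1:]):
        rates.append((cur - prev) / prev if prev else None)
    return rates

def peak_day(series):
    """Index of the day in `series` with the largest growth rate, or None."""
    rates = growth_rate(series)
    # rates[i] describes the change from day i to day i + 1.
    valid = [(r, i + 1) for i, r in enumerate(rates) if r is not None]
    return max(valid, key=lambda p: p[0])[1] if valid else None

# Hypothetical series: search values surge before confirmed cases do.
dbiv = [100, 110, 220, 300, 330, 340, 345, 346]
dgcc = [0, 1, 1, 2, 3, 9, 14, 18]
lead_days = peak_day(dgcc) - peak_day(dbiv)
print(lead_days)
```

A positive `lead_days` means the search series reached its fastest growth first. Relative growth is unstable when counts are very small, so early days with only a handful of cases may need to be excluded or smoothed.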