A systematic review of the correlation between web-based query and outbreak of emerging infectious diseases and meta-analysis of influenza-like illnesses

Background: Emerging infectious diseases (EIDs) are widespread and ever-changing threats to public health. Web-based queries using information gathered from social media can enhance global syndromic surveillance by tracing EID activity. This systematic review aimed to investigate the correlation between web-based queries and outbreaks of EIDs. Methods: Nine electronic databases were systematically searched, with the search updated in August 2018: PubMed, Virtual Health Library (VHL), WHO Global Health Library (GHL), Scopus, ISI, Google Scholar, POPLINE, System for Information on Grey Literature in Europe (SIGLE), and the New York Academy of Medicine Grey Literature Report (NYAM). The protocol was registered in advance with PROSPERO (CRD42016038104). From a total of five included articles, 47 datasets were reviewed. The correlation was assessed with Spearman and Pearson tests using either Google Trends data or the number of tweets. Results: Meta-analysis of influenza-like illness data revealed a significant correlation for both the Spearman and Pearson tests (0.784 (0.743-0.820) and 0.964 (0.918-0.985), respectively). Conclusions: Web-based surveillance systems could serve as a good method for predicting EID events.


Background:
Emerging infectious diseases (EIDs) have been increasing over the past 20 years and threaten public health worldwide, contributing to over 17 million deaths every year 1. During outbreaks of emerging infectious diseases, prediction using search query data provides a robust and sensitive means of rapidly detecting the distribution of diseases and other health conditions over time, forecasting disease outbreaks in different geographical areas, and controlling outbreaks 2. This approach has proven valuable in recent epidemics, such as influenza epidemics 3. Traditional surveillance systems rely on both virological and clinical data; national and regional data are then published weekly, frequently with a 1-2 week reporting lag 3. In developing countries, such surveillance is costly, and many of these countries lack the public health infrastructure to detect outbreaks at their earliest stages. By contrast, the internet offers freely available web-based sources of information, enabling faster detection at low cost. About 80% of American internet users, or about 113 million adults, are estimated to search online for health information about specific diseases or medical conditions 4, and millions of people worldwide search online for health-related information each day, making web-based queries a valuable source of information on recent health trends 5. This raises the question of how precisely these queries can detect and estimate the global burden of EIDs.
Therefore, this study aimed to investigate the correlation between web-based queries and outbreaks of EIDs. Otherwise, the random-effects model was used 6. All P values were two-sided, and values less than 0.05 were considered statistically significant.
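Pooling correlation coefficients across datasets is commonly done by transforming each coefficient with Fisher's z (z = atanh r), combining the z values with inverse-variance weights, and back-transforming the pooled value to the r scale; under a random-effects model, the DerSimonian-Laird estimator adds a between-study variance (tau squared) to each dataset's variance before weighting. The Python sketch below assumes that approach; the exact estimator and software used in this review are not stated here. The function name `pool_correlations_random_effects` and the example inputs are hypothetical and are not data from the included studies. The sampling variance 1/(n - 3) is also an assumption: it is an approximation that applies to Pearson coefficients from bivariate normal data.

```python
import math


def pool_correlations_random_effects(rs, ns, z_crit=1.959964):
    """Hypothetical helper: pool correlations with a DerSimonian-Laird random-effects model.

    rs: per-dataset correlation coefficients, each with |r| < 1
    ns: per-dataset numbers of paired observations, each > 3
    z_crit: standard normal quantile for the CI (1.959964 gives a 95% CI)
    Returns (pooled_r, ci_low, ci_high, tau2, two_sided_p).
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two datasets with matching r and n")
    if any(abs(r) >= 1 for r in rs) or any(n <= 3 for n in ns):
        raise ValueError("each r must satisfy |r| < 1 and each n must exceed 3")

    # Fisher's z-transformation; assumed sampling variance 1/(n - 3)
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]

    # Fixed-effect (inverse-variance) weights and pooled estimate
    ws = [1.0 / v for v in vs]
    sum_w = sum(ws)
    z_fixed = sum(w * z for w, z in zip(ws, zs)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(w * (z - z_fixed) ** 2 for w, z in zip(ws, zs))
    c = sum_w - sum(w * w for w in ws) / sum_w
    tau2 = max(0.0, (q - (len(zs) - 1)) / c)

    # Random-effects weights add tau^2 to each within-dataset variance
    ws_re = [1.0 / (v + tau2) for v in vs]
    sum_w_re = sum(ws_re)
    z_re = sum(w * z for w, z in zip(ws_re, zs)) / sum_w_re
    se_re = math.sqrt(1.0 / sum_w_re)

    # Two-sided p-value from the standard normal: p = erfc(|Z| / sqrt(2))
    p = math.erfc(abs(z_re / se_re) / math.sqrt(2.0))

    # Back-transform the pooled estimate and CI limits to the r scale
    return (
        math.tanh(z_re),
        math.tanh(z_re - z_crit * se_re),
        math.tanh(z_re + z_crit * se_re),
        tau2,
        p,
    )


# Hypothetical example inputs (not data from the included studies)
pooled, lo, hi, tau2, p = pool_correlations_random_effects(
    rs=[0.72, 0.85, 0.78, 0.80], ns=[52, 104, 40, 60]
)
print(f"pooled r = {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f}), tau^2 = {tau2:.4f}, p = {p:.2g}")
```

Spearman coefficients are sometimes assigned a slightly larger variance (one commonly cited approximation is 1.06/(n - 3)); the sketch treats both test types alike for simplicity. The two-sided p-value tests whether the pooled z differs from zero, in line with the two-sided convention used for all P values.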

Results
A total of 5674 records were found across the nine databases; after duplicates were removed, 4478 remained for title and abstract screening.
During title and abstract screening, 4005 studies were excluded, and 273 studies were eligible for full-text assessment. After 268 articles were excluded, five articles were eligible for review (Figure 1 and Table 1). The

Discussion
We investigated how well web queries submitted to social media mirror the results of other surveillance systems for emerging infectious diseases (EIDs). The main advantage of web-based queries is that they can help track and predict ongoing pandemics of the most common infectious diseases worldwide, thereby supporting better planning for prophylaxis and prevention 11. In addition, when these data are combined with other sources such as air traffic data, the queries can substantially improve prediction of the spread of certain infectious diseases 12.
Through a meta-analysis of influenza/influenza-like illness data, we found a significant This can be explained by the temporal resolution of the data, where monthly data were used 6. From a different perspective, other internet-based surveillance, as in the study by Milinovich et al., has been used not only to track and predict influenza prevalence but also to manage other infectious diseases such as dengue fever. Using a wide range of specific search terms, 17 infectious diseases (26.6%) were found to be significantly correlated 6. The authors also recommended that search terms showing highly significant correlations be retained for reuse, as they could enable a quicker response in the management of future emerging diseases.
However, social media and search engines such as Twitter and Google have some limitations. A significant one is that demographic data such as patients' age, sex, and racial characteristics cannot be collected from tweets, which could make it difficult for the public health sector to respond 9. Another concern is that Twitter is used mainly by particular population groups, for example, people living in metropolitan areas, which may introduce unavoidable bias in data retrieval because the data cannot represent the characteristics of the whole population 7. Twitter-based surveillance also requires substantial human resources to classify tweets, which introduces the potential for human error 10. Hence, further research is needed on how such factors alter predictive results, and on developing tools that filter out these factors to improve the predictive capacity of web-based queries.

Conclusions
In conclusion, web-based surveillance systems could serve as a good method for predicting outbreaks of emerging infectious diseases.

List of abbreviations
EIDs: Emerging infectious diseases
ILI: Influenza-like illness

Declarations
Ethics approval and consent to participate:

Consent for publication:
Not applicable

Availability of data and material:
All data generated or analysed during this study are included in this published article.

Competing interests:
The authors declare that they have no competing interests.

Acknowledgements:
Not applicable.

Supplementary materials
Summary of evidence (item 24): Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers).

Limitations (item 25): Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias).

Conclusions (item 26): Provide a general interpretation of the results in the context of other evidence, and implications for future research.

Funding (item 27): Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review.