Internet Search Effort on Covid-19 and the Underlying Public Interventions and Epidemiological Status

This study aimed to quantify the relative importance of indices of personal freedom, economy, and epidemiology for public interest in Covid-19, as expressed by internet searches on the topic. The relationship between the effective reproduction number Rt, news media coverage, and web search effort was also quantified. One year of online search data on the Covid-19 topic in Greece was analyzed against indices of social distancing, financial measures, and epidemiological variables using machine learning. The temporal autocorrelation of web search effort was quantified, and control charts of web search, Rt, new cases, and new deaths were employed. The trained model exhibited a fit of R² = 91% between actual and predicted web search effort. The top five variables for predicting web search effort were new deaths, the opening of international borders to non-Greek nationals, new cases, testing policy, and restrictions on internal movement. Web search had negligible temporal autocorrelation between weeks. Web search peaked during the same weeks as Rt, although new deaths and new cases were not peaking on those dates. The extent to which online searches may reflect the actual epidemiological situation is discussed.


Introduction
Disease spread is a complex phenomenon that requires an interdisciplinary approach spanning medicine, statistics, and the social sciences (Christakos et al. 2006). Covid-19 spread globally within a very short time frame and was characterized as a pandemic by the World Health Organisation (WHO 2020). Quantifying human psychology, or the motivation to be informed about facts, is admittedly a complex and potentially polarized issue (Kreps and Kriner 2020). Health-related issues are particularly difficult, as people trying to protect their own or other people's lives may need to compromise their daily habits, personal and social freedom, or financial interests (Clinton et al. 2021). Public interest in the spread of a disease is dynamic: different factors may dominate depending on the nature of the disease or on the economic status or health system of the country in which people reside, and these factors exhibit strong interaction effects (Green et al. 2020). In addition, the intervention measures employed to control the disease are dynamically evaluated by the public (Bento et al. 2020).
Translating Covid-19 web search interest into quantifiable characteristics in a consistent and reproducible way could help explain human preferences as well as divergences between intervention measures (Bento et al. 2020, Zitting et al. 2021). In addition, it can provide a way to understand mass psychology during such events (Zitting et al. 2021). To date there is no known antiviral treatment, and vaccination against Covid-19 was implemented only relatively recently (Molina et al. 2020). Therefore, the available defenses against the virus were the immune system and health status of each individual, social distancing measures, and testing (Mack et al. 2007, Guo et al. 2020, Utsunomiya et al. 2020). Thus, relatively few options could be employed to slow the spread of the disease (Haug et al. 2020). Substantial analysis has been conducted on the efficacy of different intervention measures for disease control (Haug et al. 2020). In addition, the impacts of social distancing and movement restrictions have been reported from a social and psychological perspective. However, it is hard to quantify public risk perception and what triggers the highest public attention around Covid-19 (Cori et al. 2020, Dryhurst et al. 2020). Public attention could be triggered by fear of getting sick or fear of death (Menzies and Menzies 2020, Pradhan et al. 2020). However, public interest could also be driven by fear of poverty, unemployment, or financial insecurity, or, conversely, by financial aid or debt relief (Giordano 2020, Mamun and Ullah 2020, Esobi et al. 2021). Additional causes of public interest are disruptions of individual and social freedom of movement and gathering, of going out and entertainment, of national and international travel, and school closures (Clinton et al. 2021). There is clearly a trade-off between freedom, safety, democracy, and the economy (Landman and Splendore 2020, Simon 2020).
Quantifying the public interest associated with intervention measures alongside the corresponding epidemiological parameters, and ranking the relative contribution of each factor to web search interest in Covid-19, may help in understanding elements of human nature.
In this study, online search effort was used as a proxy of societal interest in Covid-19. The search topic includes searches in all major languages (covered by Google Translate) conducted through a Greek IP address or internet provider. Data are normalized by Google Trends to represent search interest relative to the highest point for the selected region (Greece) and time period, on a scale of 0 to 100. For example, a value of 33% means that the topic was one third as popular as at its peak. The temporal resolution of the data is one week.
Epidemiological data
New confirmed Covid-19 cases per day (New cases) in Greece were used as a proxy of disease spread (scale variable). Similarly, new confirmed Covid-19 deaths per day (New deaths; scale variable) were used as a proxy of mortality. New cases and deaths are the epidemiological variables most commonly reported in the media. Data on new Covid-19 cases and deaths per day were retrieved from the database 'Our world in data' (Roser et al. 2020), publicly available at: (Our_World_in_Data 2021b). The original data derive from the European Centre for Disease Prevention and Control (ECDC) and from Johns Hopkins University (Our_World_in_Data 2021b). The original temporal resolution of the data was one day. As the temporal resolution of the web search effort data is one week, new cases and new deaths were averaged per week to match the dates of the web search effort data, and weekly average values were used throughout the analysis.
Public intervention data
All data were recorded as daily time series of ordinal factor variables with different factor levels (unless otherwise explicitly stated) and were averaged per week to match the resolution and dates of the weekly web search effort data. These data are publicly available from the database 'Our world in data' (Roser et al. 2020) at: (Our_World_in_Data 2021a). A detailed description of the variables and their restriction levels is provided in the supplementary material.
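The weekly averaging applied to the daily epidemiological and intervention series can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes, hypothetically, that each week is labeled by the Sunday on which it starts, that daily values are supplied as a mapping from date to value, and that a partial week at either end of a series is averaged over the days available. The function names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def week_start(day: date) -> date:
    """Return the Sunday on or before `day` (assumed weekly label)."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_average(daily: dict) -> dict:
    """Average daily values within each Sunday-to-Saturday week."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[week_start(day)].append(value)
    # Sort by week label so the output is in chronological order.
    return {week: mean(values) for week, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical daily values for 1-14 March 2020 (1 March 2020 was a Sunday).
    daily_cases = {date(2020, 3, 1) + timedelta(days=i): float(i + 1) for i in range(14)}
    print(weekly_average(daily_cases))  # weekly means 4.0 and 11.0
```

The same function applies unchanged to ordinal intervention levels, which after averaging become fractional weekly scores.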

Containment and closure policies (C)
The policies examined were: school closure, workplace closure, cancelation of public events, gathering restrictions, stay at home requirements, internal movement restrictions, and international travel restrictions.

Economic interventions (E)
The economic interventions examined were income support, debt relief, and fiscal support measures (scale variable).

Health system policies (H)
The health system policies investigated included: testing policy, contact tracing, facial coverings, and vaccination policy.

Covid-19 situation
In an effort to compare the actual epidemiological situation in Greece against web search effort, the daily effective reproduction number Rt was used. The basic reproduction number R0 is the average number of secondary infections produced when one infected individual is introduced into a fully susceptible host population (Anderson and May 1992); the effective reproduction number Rt is its time-varying counterpart, i.e. the average number of people who will contract a contagious disease from one person with that disease at a given time (Pandit 2020). Rt is defined as:

$$R_t = R_0(t)\,\frac{S}{N}$$

where S is the number of susceptible individuals in a total population of N individuals, and R0(t) is the average number of individuals infected by a single infected individual when everyone else is susceptible (Arroyo-Marioli et al. 2021). In practice, Rt > 1 indicates a growing epidemic and Rt < 1 a diminishing one, with larger values indicating faster spread. Rt was calculated per day as part of the 'Our world in data' epidemiological dataset (Dong et al. 2020, Roser et al. 2020), publicly available at: (Our_World_in_Data 2021b). Daily values of Rt were averaged per week to match the temporal resolution and dates of the web search effort data.
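The definition of Rt can be illustrated with a short function. This is only a sketch of the formula: as far as we are aware, the Rt series in 'Our world in data' is estimated from reported case counts (Arroyo-Marioli et al. 2021) rather than computed directly from S and N, and the function and argument names below are hypothetical.

```python
def effective_reproduction_number(r0: float, susceptible: float, population: float) -> float:
    """Rt = R0(t) * S / N for a population of N individuals, S of them susceptible."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= susceptible <= population:
        raise ValueError("susceptible must lie between 0 and population")
    return r0 * susceptible / population


def spread_status(rt: float) -> str:
    """Classify the epidemic trajectory using the Rt = 1 threshold."""
    if rt > 1:
        return "growing"
    if rt < 1:
        return "diminishing"
    return "stable"


if __name__ == "__main__":
    # Hypothetical example: R0 = 2.5 with 60% of the population still susceptible.
    rt = effective_reproduction_number(2.5, 600, 1000)
    print(rt, spread_status(rt))  # 1.5 growing
```

The example shows why Rt can fall below R0 as the susceptible fraction shrinks, even when transmissibility itself is unchanged.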

News coverage data
Greek news media coverage for the same dates as the web search effort data was retrieved from the Media Cloud database (mediacloud.org). News media coverage was estimated by dividing the number of articles mentioning the Covid-19 topic nationally or locally in Greece by the total number of news media articles published. This resulted in the proportion (%) of Covid-19 news media articles per day. This was done in order to examine the correlation between web search effort and news coverage of the topic, and to quantify the extent to which web search is related to news (Bento et al. 2020, Lampos et al. 2021).

Analysis
Artificial Neural Networks (ANN)
ANN were used to quantify the complex relationship between web search effort (dependent variable) and the 16 explanatory variables. ANN are machine learning algorithms that solve complex problems by imitating, in a simplified manner, the animal brain, and they can handle correlated independent variables (Rojas 1996, Hasson et al. 2020). Perceptron-type neural networks consist of artificial neurons, or nodes, which are information-processing units arranged in layers and interconnected by synaptic weights (connections) (Rojas 1996, Hasson et al. 2020). Neurons filter and transmit information, and the network is trained in a supervised fashion to build a predictive model of the data (Rojas 1996, Hasson et al. 2020).
The Multilayer Perceptron (MLP) module was used to build the ANN and test its accuracy (Salgado et al. 2020). The MLP was trained with a back-propagation learning algorithm, which uses gradient descent to update the weights so as to minimize the error function (Salgado et al. 2020). The data were randomly assigned to training (60%) and testing (40%) subsets. The training dataset is used to build the ANN model (Rojas 1996), and the testing dataset is used to estimate errors and validate the model. Before training, all covariates were normalized using the formula (x−min)/(max−min), which returns values between 0 and 1.
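The preprocessing described above (min-max scaling of each covariate and a random 60/40 split into training and testing subsets) can be sketched in plain Python. This is an illustration under stated assumptions rather than the software module actually used: the fixed random seed, the rounding of the split point, and the handling of constant covariates (mapped to 0) are choices made here for the example.

```python
import random


def min_max_normalize(values):
    """Rescale a covariate to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant covariate carries no information; map it to 0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(n_rows, train_frac=0.6, seed=0):
    """Randomly assign row indices to training and testing subsets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = round(train_frac * n_rows)
    return sorted(indices[:cut]), sorted(indices[cut:])


if __name__ == "__main__":
    print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
    train, test = train_test_split(50)
    print(len(train), len(test))  # 30 20
```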
For the hidden layer, the hyperbolic tangent was used as the activation function (Zamanlooy and Mirhassani 2013). This activation function takes real numbers as arguments and returns, for each hidden neuron j, a real value Oj between -1 and 1. For the output layer, the identity function was used as the activation function. Gradient descent optimization with the batch algorithm was used. The batch algorithm uses all records in the training dataset to update the synaptic weights between neurons (IBM 2016). The scaled conjugate gradient method was used for the batch training of the ANN (Marwala 2010). In each iteration, the synaptic weights are updated only after all training records have been passed through the network, and the algorithm seeks the error minimum by reducing the total error made in the previous iteration (Møller 1993, IBM 2016).
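To make the network structure concrete, the forward pass of a single-hidden-layer perceptron with hyperbolic tangent hidden units and an identity output unit is sketched below. The weights are hypothetical placeholders, and the sketch does not reproduce the scaled conjugate gradient training, which would adjust these weights to reduce the error over the whole training batch. The error function shown is a plain sum of squared errors; the exact scaling used by the original software may differ.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Predict one output from covariates x (already scaled to [0, 1]).

    w_hidden: one weight row per hidden neuron; b_hidden: one bias per hidden neuron.
    w_out: one weight per hidden neuron; b_out: output bias.
    """
    # Hidden layer: weighted sum passed through tanh, bounded in (-1, 1).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: identity activation, i.e. a plain weighted sum.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def sum_of_squared_errors(predictions, targets):
    """Batch error: every training record contributes before weights change."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


if __name__ == "__main__":
    # Hypothetical network: 2 covariates, 2 hidden neurons, 1 output.
    out = mlp_forward([1.0, 0.0], [[0.5, -0.5], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
    print(out)  # equals tanh(0.5)
```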
Four parameters (initial lambda, initial sigma, interval center, and interval offset) determine the way the scaled conjugate gradient algorithm builds the model. Lambda is a scaling parameter that regulates the indefiniteness of the Hessian matrix and is increased when the Hessian is not positive definite (Møller 1993). Sigma controls the size of the weight change used to estimate the Hessian through the first-order derivatives of the error function (Rojas 1996). The interval center a0 and interval offset a define the interval [a0 − a, a0 + a] in which the simulated annealing algorithm randomly generates weight vectors, helping the optimization escape local minima of the error function (IBM 2016). Initial lambda was set to 0.0000005 and initial sigma to 0.00005. The interval center was defined as 0 and the interval offset was set to 0.5, so that weights were drawn from [-0.5, 0.5] (IBM 2016).

Temporal autocorrelation
The correlation of web search effort over time was calculated, as this indicates whether a week with high search effort is likely to be followed by another high-effort week, whether a low-effort week is followed by another low-effort week, or whether the search effort of a week is not indicative of the search effort of the following weeks (Moustakas and Evans 2017). To do so, the temporal autocorrelation function of weekly web search effort was calculated, using one week as the unit time lag (Reynolds and Madden 1988).
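The autocorrelation at weekly lag k can be computed with the usual sample estimator r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)². The sketch below implements that estimator together with the approximate 95% bounds ±1.96/√n commonly drawn on autocorrelation plots for a series without autocorrelation; whether the original analysis used exactly these bounds is an assumption, and the function names are illustrative.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n, z=1.96):
    """Approximate 95% bound for r_k under the no-autocorrelation hypothesis."""
    return z / math.sqrt(n)


def significant_lags(series, max_lag):
    """Lags k >= 1 whose autocorrelation falls outside the bounds."""
    bound = significance_bound(len(series))
    r = autocorrelation(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]
```

An empty result from `significant_lags` corresponds to the pattern of a series whose weekly values are not indicative of the following weeks.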

Comparing web search and the Covid-19 situation
Individuals charts (I-charts), also termed Shewhart charts (Montgomery 2020), were computed. An I-chart is a type of control chart used to monitor the process mean when individual measurements are taken at regular intervals (Montgomery 2020). Each point on the chart represents the value of an individual observation (Montgomery 2020). The center line is the process mean (the mean of the individual observations). The control limits (upper and lower confidence intervals) are set k sigma above and below the center line, with k = 3. The process sigma is the standard deviation of the individual observations. I-charts display individual data points and monitor the mean and shifts in the process when data points are collected at regular time intervals (Montgomery 2020). I-charts may help identify common and assignable causes of variation in the process, if any (Rigdon et al. 1994). The green line on each chart represents the mean, while the red lines show the upper and lower control limits. An in-control process shows only random variation within the control limits (Montgomery 2020). An out-of-control process has unusual variation, which may be due to the presence of special causes (Montgomery 2020). I-charts of web search effort, Rt, new cases, and new deaths were calculated.
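The control limits of an I-chart can be sketched as follows. Following the text, the default estimates the process sigma as the standard deviation of the individual observations. The optional moving-range estimate (average absolute difference between consecutive observations divided by d2 = 1.128) is included only because it is a common alternative in control-chart software, not because the original analysis is known to have used it. The function name and return layout are hypothetical.

```python
from statistics import mean, stdev

D2 = 1.128  # bias-correction constant for moving ranges of two consecutive points


def i_chart(values, k=3.0, sigma_method="sd"):
    """Return (center, lower limit, upper limit, out-of-control indices)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    if sigma_method == "sd":
        # As described in the text: SD of the individual observations.
        sigma = stdev(values)
    elif sigma_method == "moving_range":
        # Common alternative: average moving range divided by d2.
        moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
        sigma = mean(moving_ranges) / D2
    else:
        raise ValueError("sigma_method must be 'sd' or 'moving_range'")
    lower, upper = center - k * sigma, center + k * sigma
    out = [i for i, v in enumerate(values) if v < lower or v > upper]
    return center, lower, upper, out


if __name__ == "__main__":
    weekly = [10] * 9 + [40]  # hypothetical series with one spike
    print(i_chart(weekly)[3])                               # []
    print(i_chart(weekly, sigma_method="moving_range")[3])  # [9]
```

In this hypothetical series the spike stays inside the limits when sigma is the overall standard deviation (the spike itself inflates sigma) but is flagged with the moving-range estimate, which illustrates why the choice of sigma matters when interpreting points outside the limits.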
Linear mixed effects models (LME; Pinheiro and Bates 2000) were used to analyze the relationship between web search (dependent variable) and news media coverage and Rt as fixed effects (independent variables). The model structure also included a random effect of time, in terms of week number, to account for temporal autocorrelation in the data. This analysis was conducted to quantify and compare the effect of news on web searches alongside the actual Covid-19 situation as expressed by Rt.

Results
The trained ANN model with the 16 epidemiological and intervention variables exhibited a relative error of 0.058 (sum of squares error = 0.978) in the training dataset and a relative error of 0.237 (sum of squares error = 1.130) in the test dataset. All error computations are based on the testing sample. The predictive accuracy of the trained ANN was R² = 91% (p << 0.001) between actual weekly values of Covid-19 web search effort in Greece and the ANN model outputs (Fig. 1a). Model residuals exhibited a minor negative trend against the predicted values (Fig. 1b), but this deviation could not be differentiated from zero (confidence intervals were always crossing the zero line; R² = 5%, p < 0.01).
Results regarding variable importance indicate that web search effort on Covid-19 in Greece was primarily explained by new deaths per week (relative importance of 100%), followed by international travel restrictions applicable to non-Greeks (95%), while new infections ranked third with a relative importance of 82% (Fig. 2). Testing policy ranked fourth and restrictions on internal movement fifth, with relative importances of 77% and 76%, respectively (Fig. 2). School closing exhibited a relative importance of 71%, facial covering requirements 67%, stay-at-home requirements 66%, and restrictions on social gatherings 59% (Fig. 2). Workplace closures had a relative importance of 56%, cancelling public events 56%, fiscal measures 55%, and debt/contract relief 54% (Fig. 2). Finally, vaccination policy had a relative importance of 34%, income support 30%, and contact tracing 12% (Fig. 2).
The temporal autocorrelation of web search effort showed no significant time lags, indicating that the search effort of one week is not significantly correlated with the search effort of the next or previous weeks (Fig. 3).
The I-chart of web search effort fell below the lower control limit during the first week of the analysis (week 1, week ending 1 March 2020) and during week 14 (31 May 2020), exceeded the upper control limit for two consecutive weeks, weeks 24 and 25 (2 August 2020 to the week starting 16 August 2020), and exceeded the upper control limit again during weeks 34 to 36 (18 October 2020 to 1 November 2020; Fig. 4a).
The I-chart of Rt showed values above the upper control limit during the first two weeks of available data (weeks 3 and 4; Fig. 4b; Rt requires some prior data before it can be calculated and thus cannot be computed during the first week in which a case was found). Values of Rt were below the lower control limit during weeks 8 to 13 (19 April 2020 to the week starting 24 May 2020), above the upper control limit during weeks 23 to 25 (2 August 2020 to the week starting 16 August 2020), and above the upper control limit during weeks 35 to 37 (25 October 2020 to 8 November 2020 inclusive; Fig. 4b). Values of Rt were below the lower control limit during weeks 41 to 45 (6 December 2020 to 3 January 2021; Fig. 4b).
The I-chart values of new cases were below the lower control limit until week 24 (9 August 2020), within the control limits until week 35 (25 October 2020), above the upper control limit until week 43 (20 December 2020), within the control limits during weeks 44 to 49 (27 December 2020 to 31 January 2021), and above the upper control limit for the remaining two weeks of data (Fig. 4c).
The I-chart values of new deaths were below the lower control limit until week 33 (11 October 2020; week 31 was marginally within the limits; Fig. 4d), within the control limits until week 37 (8 November 2020), above the upper control limit until week 47 (17 January 2021), and within the control limits for the remaining four weeks of data (Fig. 4d).
The LME with web search as the dependent variable, news media coverage and Rt as fixed effects, and time as a random effect had a model fit of AIC = 389.01, BIC = 398.57, and logLik = -189.50. The fixed-effect structure had an intercept of -61.4 (standard error = 18.48), an Rt coefficient of 56.23 (standard error = 8.62), and a news media coefficient of 3.22 (standard error = 0.72) (Fig. 5a), with all fixed effects having p << 0.001. Web search consistently increased with Rt across news media coverage values (Fig. 5b).

Discussion
The results indicated that new deaths, reflecting fear of, or interest in, death, had the highest explanatory power for predicting internet search effort, implying that the first concern is to stay alive: as deaths increased, interest in the topic increased. It is also interesting to note that web search effort was not temporally correlated between weeks; thus, there is no evidence that individuals searched the topic out of habit or because of the situation during the past few weeks, but rather that they dynamically tuned in to the current weekly situation.
During the first half of August 2020, both Rt and web search effort peaked, even though neither new deaths nor new cases peaked soon after. Among all examined countries, Greece topped the list of tourist-imported Covid-19 cases in the UK following the 2020 summer season (Aggarwal et al. 2021). It therefore seems plausible that infected visitors were never recorded and returned to their home countries. It was, however, widely debated whether opening international borders to tourism with low testing frequencies, or without a negative PCR Covid-19 test for incoming individuals, was a safe practice (Pavli et al. 2020, Rocklöv et al. 2020, Sharun et al. 2020). Thus, interest in the opening of international borders may have an interaction effect with interest in testing policy on total web search effort. The temporal coincidence of the web search effort peak with the Rt peak is interesting, given that Rt is not regularly reported in the news, media, or non-scientific websites, which more often report new cases, deaths, or intervention measures (Liu et al. 2020). Web search effort peaks also coincided with the Rt peaks in October, from 27 October 2020, the date when new cases exceeded the threshold of 1,000 per day. Thus, the findings reported here confirm both that web search effort can be used to predict peaks in Covid-19 and that peaks of online searches precede the reported confirmed cases and deaths (Lampos et al. 2021). The idea that crowd wisdom might reflect reality better than expert opinion or any single individual has been considered provocative in the past (Galton 1907, Prelec et al. 2017). Web search has a cultural, geographical, social, and temporal diversity and a sample size that are hard to ignore (Surowiecki 2005, Sunstein 2006).
In Greece, web search effort peaked during the same periods as Rt, indicating that web search may reflect reality better than the recorded situation, at least when testing data are scarce. Web search increased considerably more with higher values of Rt than with high news media coverage.

Figure 3. Temporal autocorrelation function of web search effort across weekly time lags. Values close to 1 or -1 indicate strong (positive or negative) correlation, while values close to zero indicate no correlation. The horizontal red lines indicate the 95% confidence bounds, while the vertical blue lines show the actual correlation value for each time lag. Correlation values within the 95% confidence bounds are not significant and cannot be differentiated from random correlations.

Figure 5. Results from a linear mixed effects model (LME) between web search (dependent variable) and the independent fixed effects of news media coverage and Rt. Time (week) was also included as a random effect. a. LME estimates of news coverage and Rt on web search effort. Red circles indicate the mean estimated effect, while whiskers indicate a 95% confidence interval. b. LME estimates of the effects of Rt across news media coverage values. Higher news media coverage results in higher web search, but web search consistently increases across Rt values.