Web search data
Data mining (Moustakas and Katsanevakis 2018) of Google Trends (https://trends.google.com/trends/) was performed in order to relate web search data to the epidemiological status at the time of search; see e.g. (Effenberger et al. 2020, Szmuda et al. 2020). Google Trends is a public repository of information on the online search patterns of individuals who use Google as their search engine. Data regarding search effort on the topic 'Coronavirus disease 2019' (hereafter 'Covid-19') were retrieved from 26 February 2020 to 14 February 2021. Throughout the analysis, week 1 (first week) is the week 24 Feb 2020 - 1 March 2020, while week 52 (last week) is the week 8 Feb 2021 - 14 Feb 2021. The Covid-19 search topic includes searches in all major languages (those covered by Google Translate) conducted through a Greek IP address or internet provider. Data are normalized by Google Trends so as to represent search interest relative to the highest search term in Greece for that date. For example, a value of 33% means that the term is one third as popular as the most popular searched topic on that date. The temporal resolution of the data is one week.
Epidemiological data
New confirmed Covid-19 cases (New cases) per day in Greece were used as a proxy of disease spread (scale variable). Similarly, new confirmed Covid-19 deaths (New deaths) per day (scale variable) were used as a proxy of mortality. New cases and deaths are the epidemiological variables most commonly reported in the media. The original temporal resolution of the data was one day. As the temporal resolution of the web search effort data is one week, new cases and new deaths were averaged per week to match the dates of the web search effort data, and the weekly average values were employed throughout the analysis. Data regarding new Covid-19 cases and deaths per day were retrieved from the database 'Our world in data' (Roser et al. 2020), publicly available at: (Our_World_in_Data 2021b). The original data derive from the European Centre for Disease Prevention and Control (ECDC) and from Johns Hopkins University (Our_World_in_Data 2021b).
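The weekly averaging step can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes weeks run Monday to Sunday starting on 24 Feb 2020 (week 1 in the text), and the function names and the daily values are hypothetical placeholders, not real data.

```python
from collections import defaultdict
from datetime import date, timedelta

# First day of week 1, as stated in the text (24 Feb 2020, a Monday).
WEEK1_START = date(2020, 2, 24)


def week_number(day):
    """Return the 1-based study week that contains `day`."""
    return (day - WEEK1_START).days // 7 + 1


def weekly_mean(daily):
    """Average daily values (dict: date -> value) within each study week."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[week_number(day)].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}


# Hypothetical daily counts (NOT real data): 14 consecutive days from week 1.
daily_cases = {WEEK1_START + timedelta(days=i): float(i) for i in range(14)}
weekly_cases = weekly_mean(daily_cases)
print(weekly_cases)
```

The same helper would apply unchanged to daily deaths or any other daily series that has to be aligned with the weekly search data.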
Public intervention data
All data were recorded at a daily temporal resolution as time series of ordinal factor variables with different factor levels (unless otherwise explicitly stated) and were averaged per week to match the resolution and dates of the weekly web search effort data. These are publicly available from the database 'Our world in data' (Roser et al. 2020) at: (Our_World_in_Data 2021a). A detailed description of the variables with the different levels of restrictions is provided in the supplementary material.
Containment and closure policies (C)
The policies examined were: school closure, workplace closure, cancelation of public events, gathering restrictions, stay at home requirements, internal movement restrictions, and international travel restrictions.
Economic interventions (E)
The economic interventions examined included: income support, debt relief, and fiscal support measures (scale variable).
Health system policies (H)
The health system policies investigated included: testing policy, contact tracing, facial coverings, and vaccination policy.
Covid-19 situation
In an effort to compare the actual epidemiological situation in Greece against the web search effort, the daily value of the effective reproduction number Rt was used. The Rt is the average number of secondary infections produced when one infected individual is introduced into a susceptible host population (Anderson and May 1992). It indicates the average number of people who will contract a contagious disease from one person with that disease (Pandit 2020). The Rt is defined as:
$$R_t = R_0^{(t)} \, \frac{S}{N}$$
where $S$ is the number of susceptible individuals in a total population of $N$ individuals, while $R_0^{(t)}$ is the average number of individuals infected by a single infected individual when everyone else is susceptible (Arroyo-Marioli et al. 2021). In practice, Rt > 1 indicates a growing disease spread while Rt < 1 indicates a diminishing one, with larger values indicating higher spread. Rt was calculated per day as part of the 'Our world in data' (Dong et al. 2020, Roser et al. 2020) epidemiological dataset, publicly available at: (Our_World_in_Data 2021b). Daily values of Rt were averaged per week to match the temporal resolution and dates of the web search effort data.
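As a worked illustration of this definition, the sketch below evaluates Rt as the basic reproduction number scaled by the susceptible fraction S/N, and classifies the result using the Rt > 1 / Rt < 1 rule. The function names and the numerical inputs are hypothetical and chosen only for the example; the study itself used precomputed Rt values from the dataset.

```python
def effective_reproduction_number(r0, susceptible, population):
    """Rt = R0 * S / N, following the definition in the text."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= susceptible <= population:
        raise ValueError("susceptible must lie between 0 and population")
    return r0 * susceptible / population


def spread_status(rt):
    """Interpret Rt: above 1 the spread grows, below 1 it diminishes."""
    if rt > 1:
        return "growing"
    if rt < 1:
        return "diminishing"
    return "stable"


# Hypothetical inputs (NOT estimates for Greece): R0 = 2.5, S = 300, N = 1000.
rt = effective_reproduction_number(2.5, 300, 1000)
print(rt, spread_status(rt))
```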
News coverage data
Greek news media coverage over time, for the same dates as web search effort, was retrieved from the Media Cloud database (mediacloud.org). News media coverage is estimated by counting the number of articles mentioning the Covid-19 topic nationally or locally in Greece and dividing by the total number of news media articles published. This resulted in the proportion (%) of Covid-19 news media articles per day. This was done in order to examine the correlation between web search effort and news coverage of the topic and to quantify the extent to which web search is related to news (Bento et al. 2020, Lampos et al. 2021).
Analysis
Artificial Neural Networks (ANN)
ANN were used to quantify the complex relationship between web search effort (dependent variable) and the 16 explanatory variables. ANN are machine learning algorithms that can solve complex problems by imitating the animal brain in a simplified manner and can handle correlated independent variables (Rojas 1996, Hasson et al. 2020). Perceptron-type neural networks consist of artificial neurons or nodes, which are information processing units arranged in layers and interconnected by synaptic weights (connections) (Rojas 1996, Hasson et al. 2020). Neurons can filter and transmit information in a supervised fashion in order to build a predictive model that clarifies data stored in memory (Rojas 1996, Hasson et al. 2020).
The Multilayer Perceptron (MLP) module was used to build the ANN and test its accuracy (Salgado et al. 2020). The MLP was trained with a back-propagation learning algorithm, which uses gradient descent to update the weights so as to minimize the error function (Salgado et al. 2020). The data were randomly assigned to a training subset (60%) and a testing subset (40%). The training dataset is used to build the ANN model (Rojas 1996). The testing dataset is used to find errors and validate the model. Before training, all covariates were normalized using the formula (x−min)/(max−min), which returns values between 0 and 1.
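The normalization and partitioning steps can be sketched as below. This is an illustrative approximation, not the software's procedure: the function names and seed are hypothetical, the split here is an exact 60/40 cut of shuffled indices (the actual software may assign records probabilistically), and the handling of constant columns is an assumption.

```python
import random


def min_max_scale(values):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant covariate is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(n_records, train_fraction=0.6, seed=0):
    """Randomly assign record indices to training and testing subsets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


scaled = min_max_scale([2.0, 4.0, 6.0])
train_idx, test_idx = train_test_split(50)
print(scaled, len(train_idx), len(test_idx))
```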
For the hidden layer, the hyperbolic tangent was used as the activation function (Zamanlooy and Mirhassani 2013). The activation function Oj of the jth neuron takes real numbers as arguments and returns real values between -1 and 1. For the output layer, the identity function was used as the activation function. Gradient descent optimization with the batch algorithm was used. The batch algorithm uses all records in the training dataset to update the synaptic weights between neurons (IBM 2016). The scaled conjugate gradient method was used for the batch training of the ANN (Marwala 2010). The synaptic weights are updated at each iteration, after all training records have been passed. The algorithm seeks the error minimum by minimizing the total error made in the previous iteration (Møller 1993, IBM 2016).
Four parameters - initial lambda, initial sigma, interval center, and interval offset - determine the way the scaled conjugate gradient algorithm builds the model. Lambda controls whether the Hessian matrix is negative definite (Møller 1993). Sigma controls the size of the weight change used to estimate the Hessian through the first-order derivatives of the error function (Rojas 1996). The interval center a0 and the interval offset a force the simulated annealing algorithm to generate random weights that iteratively minimize the error function (IBM 2016). Initial lambda was set to 0.0000005 and initial sigma to 0.00005. Interval center was defined as 0 and interval offset was set to ±0.5 (IBM 2016).
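The network structure described above (hyperbolic tangent hidden layer, identity output, weights generated within the interval defined by center 0 and offset 0.5) can be sketched as a single forward pass. This is only an illustration under stated assumptions: the number of hidden units (4), the uniform sampling of weights, the zero biases, and all function names are hypothetical, and no training (scaled conjugate gradient) is performed here.

```python
import math
import random


def init_weights(n_rows, n_cols, center=0.0, offset=0.5, rng=None):
    """Draw weights in [center - offset, center + offset] (uniform sampling assumed)."""
    if rng is None:
        rng = random.Random(0)
    return [[rng.uniform(center - offset, center + offset) for _ in range(n_cols)]
            for _ in range(n_rows)]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, identity output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out  # identity activation
    return hidden, output


rng = random.Random(42)
n_inputs, n_hidden = 16, 4  # 16 covariates as in the text; 4 hidden units is an assumption
w_hidden = init_weights(n_hidden, n_inputs, rng=rng)
b_hidden = [0.0] * n_hidden
w_out = init_weights(1, n_hidden, rng=rng)[0]
x = [rng.random() for _ in range(n_inputs)]  # stand-in for min-max scaled covariates
hidden, y_hat = mlp_forward(x, w_hidden, b_hidden, w_out, 0.0)
print(hidden, y_hat)
```

Because the output activation is the identity, the prediction is an unbounded linear combination of the bounded hidden activations, which suits a continuous dependent variable such as search effort.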
Variable importance in ANN
Variable importance was quantified using the outputs of the trained ANN model, in order to evaluate the effect of each input variable on the web search effort, by means of the variance-based method (de Sá 2019, Ju et al. 2019). The input variables are ranked according to the sensitivity formula defined as:
$$S_i = \frac{V\left(E(Y \mid X_i)\right)}{V(Y)}$$
where V(Y) is the unconditional output variance, the expectation E is the integral over Y|Xi, and the variance operator V implies a further integral over Xi. Variable importance is then computed as the normalized sensitivity. Si is the appropriate measure of sensitivity to rank the variables in order of importance for any combination of interactions and non-orthogonality among variables (Ju et al. 2019). The normalized variable importance values, V(I), of the ANN sum to 1 (Ju et al. 2019).
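A rough numerical sketch of a first-order variance-based index, Si = V(E(Y|Xi)) / V(Y), is given below. It is not the procedure used by the ANN software: the binning estimator, the number of bins, the function name, and the toy response (a linear function standing in for the trained network's predictions) are all assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance


def first_order_sensitivity(x_i, y, n_bins=10):
    """Estimate V(E[Y|X_i]) / V(Y) by averaging Y within equal-count bins of X_i."""
    if len(y) < n_bins:
        raise ValueError("need at least one record per bin")
    order = sorted(range(len(y)), key=lambda k: x_i[k])
    bin_size = len(y) // n_bins
    grand_mean = fmean(y)
    between = 0.0
    for b in range(n_bins):
        start = b * bin_size
        end = len(y) if b == n_bins - 1 else start + bin_size
        members = [y[k] for k in order[start:end]]
        # Weighted squared deviation of the bin mean from the overall mean.
        between += len(members) * (fmean(members) - grand_mean) ** 2
    return (between / len(y)) / pvariance(y)


# Toy stand-in for model predictions: Y depends strongly on x1 and weakly on x2.
rng = random.Random(1)
x1 = [rng.random() for _ in range(2000)]
x2 = [rng.random() for _ in range(2000)]
y = [3.0 * a + 0.5 * b for a, b in zip(x1, x2)]

raw = [first_order_sensitivity(x1, y), first_order_sensitivity(x2, y)]
importance = [s / sum(raw) for s in raw]  # normalized so the values sum to 1
print(raw, importance)
```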
Temporal autocorrelation
The correlation of web search effort in time was calculated, as this can indicate whether a week of high search effort is likely to be followed by another week of high search effort, whether a week of low search effort is likely to be followed by another week of low search effort, or whether the search effort of a week is not indicative of the search effort of the following weeks (Moustakas and Evans 2017). To do so, the temporal autocorrelation function of weekly web search effort was calculated, with one week as the unit time lag (Reynolds and Madden 1988).
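A minimal sketch of a sample autocorrelation function at weekly lags is shown below. It uses the common estimator that divides the lagged sum of cross-products by the total sum of squares; the function name and the toy series are hypothetical, and the exact estimator used in the study may differ.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (one lag = one week here)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
            for lag in range(max_lag + 1)]


# Toy series (NOT real data): a steady trend versus a week-to-week alternation.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorrelation(trend, 2))
print(autocorrelation(alternating, 2))
```

A positive value at lag 1 corresponds to high-effort weeks tending to follow high-effort weeks, while a value near zero means one week carries little information about the next.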
Comparing web search and the Covid-19 situation
Individuals charts (I-charts), a type of Shewhart control chart (Montgomery 2020), were computed. An I-chart is used to monitor the process mean when individual observations are measured at regular intervals from a process (Montgomery 2020). Each point on the chart represents the value of an individual observation (Montgomery 2020). The center line is the process mean (the mean of the individual observations). The control limits (upper and lower) are set at a multiple k of the process sigma (here k = 3, i.e. 3σ) above and below the center line. The process sigma is the standard deviation of the individual observations. I-charts display individual data points and monitor the mean and shifts in the process when the data points are collected at regular intervals of time (Montgomery 2020). I-charts may facilitate identifying the common and assignable causes in the process, if any (Rigdon et al. 1994). The green line on each chart represents the mean, while the red lines show the upper and lower control limits. An in-control process shows only random variation within the control limits (Montgomery 2020). An out-of-control process has unusual variation, which may be due to the presence of special causes (Montgomery 2020). I-charts of web search effort, Rt, new cases, and new deaths were calculated.
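The control limits can be sketched as follows, taking the description above literally: the center line is the mean of the observations and the limits lie three standard deviations either side. Note the assumptions: the sample standard deviation is used here, whereas statistical packages often estimate sigma for I-charts from the average moving range; the function names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def i_chart_limits(observations, k=3.0):
    """Return (lower limit, center line, upper limit) as mean -/+ k * sigma."""
    center = mean(observations)
    sigma = stdev(observations)  # assumption: sample standard deviation
    return center - k * sigma, center, center + k * sigma


def out_of_control(observations, k=3.0):
    """Indices of observations falling outside the control limits."""
    lcl, _, ucl = i_chart_limits(observations, k)
    return [i for i, v in enumerate(observations) if v < lcl or v > ucl]


# Toy weekly values (NOT real data): stable around 10 with one spike at the end.
weekly = [10.0, 11.0, 9.0] * 6 + [10.0, 30.0]
lcl, center, ucl = i_chart_limits(weekly)
print(lcl, center, ucl, out_of_control(weekly))
```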
Linear mixed effects models (LME; Pinheiro and Bates 2000) were used to analyze the relationship between web search effort (dependent variable) and news media coverage and Rt as fixed effects (independent variables). The model structure also included the random effect of time, in terms of week number, accounting for the temporal autocorrelation in the data. This analysis was conducted in order to quantify and compare the effect of news on web searches in conjunction with the actual Covid-19 situation as expressed by Rt.