The Rise of West Nile Virus in Europe: investigating the combined effects of climate, land use and economic changes

West Nile Virus (WNV) has recently emerged as a major public health concern in Europe, its recent expansion also coincided with some remarkable socio-economic and environmental changes, including an economic crisis and some of the warmest temperatures on record. Here we empirically investigate the drivers for this phenomenon at a European wide scale by constructing and analysing a unique spatial-temporal data-set, that includes data on climate, land-use, the economy, and government spending in key sectors such as health and the environment. Drivers and risk factors of WNV were identiﬁed by building a conceptual framework, and relationships were tested using a Generalized Additive Model (GAM), which could capture complex non-linear relationships and also account for the spatial and temporal auto-correlation. As expected, land use factors (higher percentage of wetlands and agricultural land) and climate factors (higher summer rainfall and higher summer temperatures) were positive predictors of WNV infections. However, we found that a mean winter temperature of between 2 ° C and 6 ° C was the best predictor of annual WNV infections, suggesting that successful overwintering of infected adult mosquitoes (likely Culex pipiens ) was key to the intensity of outbreaks for a given year. This result, taken in the context of recent winter warming due to climate change, demonstrates how the virus has become so prevalent in Europe. Furthermore, lower GDP growth, lower spending growth on environmental factors such as agriculture, forest, ﬁsheries, and waste water management were also strong predictors of WNV infections, suggesting that austerity and cuts to key sectors beneﬁted vector species and the virus during this crucial period. Waste water management spending may be a reliable causative factor, due to, for instance, the neglect of hazard prevention eﬀorts, such as spending on ﬂood defences, essential works like sanitation and upkeep and of infrastructure, that can lead to the creation of mosquito breeding habitats, e.g. potholes, blocked drain. This study should act as a cautionary tale; central governments looking to embark on austerity measures should exercise extreme caution given the long term trade-oﬀs and unintended consequences they can have on the environment and human health, especially when considering we are facing numerous threats brought about by global warming and other anthropogenic induced changes, that can beneﬁt emerging diseases i.e. global trade in wild animals, intensive agriculture animal rearing and land use conversion.

Furthermore, at the time of writing, government debt is soaring in Europe because of the Covid-19 crisis [17], and there is a danger that European countries will revert back to austerity measures such as those imposed during the previous economic crisis, which will weaken public health systems' ability to combat diseases. [14,18,19,20,21,22].
In this study, we focus on WNV, a single-stranded RNA Flavivirus closely related to other Flaviviridae pathogens such as dengue, Japanese encephalitis and yellow fever viruses [23]. While the largest proportion of people infected with WNV will not develop any symptoms, around 20% of them will get West Nile fever, a flu like illness, or severe West Nile disease. The most severe manifestations of the disease include acute aseptic meningitis or encephalitis, anterior myelitis, hepatosplenomegaly, hepatitis, pancreatitis, and myocarditis. On rare occasions, infection can lead to Guillain-Barré syndrome and other demyelinating neuropathies [23,24]. Although WNV is a zoonotic pathogen, infecting mammals, particularly humans and horses, the transmission cycle only involves mosquitoes and birds, as mammals are considered dead end hosts, i.e. they cannot transmit the disease back to mosquitoes [24]. In Europe, the main vectors are Culex pipiens, Culex modestus, and Coquillettidia richiardii, although Aedes species can also transmit the disease [25] (see figures S16-S18 in additional information for vector distribution maps). Vector abundance and diversity is modulated by physical environmental factors, such as temperature, rainfall, water resources [23,25,26]. As well as vectors, crucial for transmission of WNV is the presence of suitable hosts, such as terrestrial and wetland birds. These hosts can occupy a diverse range of areas; indeed, spring migration of birds from infected regions of Sub-Saharan Africa to temperate regions of Europe is considered to be one of the main drivers of the disease in Europe [25,27]. Although WNV infections are more common of late, sporadic West Nile virus outbreaks have occurred in humans and equines in southern and eastern European countries over the last century; the majority of which have occurred in wetland areas and densely inhabited urban areas [25]. In 2010, a major outbreak of West Nile WNV hit a number of countries in the European region: Greece, Hungary, Romania, and Turkey. Since then, outbreaks have occurred annually in multiple regions, including more northerly regions that had not previously reported cases (in Germany, Slovakia, Slovenia, and the Czech Republic) (see fig 1). This culminated in another major outbreak in 2018 which hit almost all susceptible regions, that is, areas in which local transmission occurred at least once over the study period. We should also bear in mind that the actual number of cases in Europe is likely to be much higher than reported, since most WNV infections in humans are asymptomatic, and we can therefore assume that only serious cases are reported. To explain the recent rise of WNV infections in Europe in more depth, we empirically investigate the combined impact of three sets of factors: (1) Climate-related factors, including temperature, rainfall and surface water; (2) Geographic factors which include continuous/discontinuous urban fabric, and regional coverage of wetlands and arable land; (3) Economic and socio-economic factors which capture the impact of the economic crisis and are proxied by GDP growth, spending on agriculture, forestry and fisheries, waste water management, and health (for a more detailed explanation of how all factors interact with WNV, see the conceptual framework).
Our analysis focuses on regions in the seven European countries hardest hit by WNV outbreaks -Austria, Bulgaria, Croatia, Greece, Hungary Italy, and Romania, our time series data set captures the time period before and after the economic crisis.

Results
To carry out our empirical analysis, we compiled a spatial temporal data-set that captures the main drivers and risk factors of WNV infections in Europe (see Table 1 and supplementary information -conceptual framework). Figure 1 characterizes the climate in the study regions. As we can observe, WNV infections occurred in regions with climates that can be described as 'Hot-summer Mediterranean,' 'Humid subtropical,' Temperature oceanic' and ' Warm-summer humid continental' or 'Temperate oceanic.' Figure 2 shows the WNV infection incidence rates over the study period. From 2007 to 2009, very few regions were affected by WNV, however, 2010 saw an outbreak that spread far and wide. Since 2010, the number of regions that have been affected by WNV increased. In 2018, a massive outbreak affected almost all of the regions in our study. The relationship between the incidence of WNV infections and the climate, land-use, and economic factors was modelled via a Generalized Additive Model (GAM), which also accounted for the spatial and temporal autocorrelation. Table 2 show the results of our statistical analysis. We built the statistical model in a stepwise fashion using the lowest Akaike Information Criterion (AIC) to help us assess the different specifications. We selected relevant variables in each specification according to their category, i.e. climate, land-use and economic. We included all variables in our final specification to ascertain the contribution of each driver, all else equal. Table 2 also summarises the relevant statistics (AIC, Deviance, Adjusted R squared and so on) to compare the different specifications. We find that our final model (Full model) has the best fit in terms of the AIC, followed by the economic model, the climate model and land use model, as shown in Table 2 (for model diagnostics see additional information figures  S8-15 and Table S1). Note that as we are not estimating a standard regression model, the figures reported should not be read as coefficients, but degrees of freedom of the smooth terms. Given that we cannot interpret the coefficients to infer the sign and magnitude of the relationship, we visualise it by plot. Figures (3)(4)(5) plot the partial effects-the relationship between a change in each of the covariates and a change in the fitted values in the full model. If we don't control for other factors, the climate variables are highly statistically significant. Mean summer temperature (°C) is associated with an increase in the incidence of human annual WNV infections; WNV infections are also associated with the amount of rain days per summer (Jun-Aug). Both variables have a linear relationship with the outcome variable. Mean winter temperature has a quadratic relationship with WNV. Number of days of rain in the winter is also positively correlated with annual WNV infections. The impact of the climate variables remains highly statistically significant in specification 4 (full model), with the exception of mean summer temperature, whose impact becomes substantially weaker. By plotting the partial effects, we can also see that the WNV cases do not occur in regions when average summer temperatures are below 22°C, while they steadily rise in temperatures above this point (figure 3, top left). In addition, winter temperatures below 2°C reduce WNV infections and so do temperatures above 6°C. We also examined how the extent of regional surface water in the summer affect the incidence of the infection. As the partial effects plotted in figure 4 (top left) show, the relationship is linear and negatively correlated with WNV infections. As expected, the percentage of wetlands and arable land in a region (Land-use model) are positively correlated with the incidence of WNV and highly significant. However, the percentage of discontinuous urban fabric, which represents built up areas (residential suburbs, villages) only becomes statistically significant in our final specification, which is positively associated with WNV infections. The economic indicators are also highly significant and have very strong negative relationships with WNV infections. These figures represent a 2007 baseline GDP / government spending growth for each country. Generally, regions in counties which tended to have lower annual GDP growth and lower central government spending on areas of the environment, such as agriculture, forest fisheries spending and waste water management, have much higher WNV infection rates. On the contrary, central government spending on health is positively correlated with WNV infections, a finding which we will expand on in the next section.

Discussion
A range of regional environmental risk factors associated with human WNV infections, along with a set of economic indicators capturing the impact of the economic crisis, was used to explain the rise in WNV outbreaks in Europe over the last 13 years (2007-2019). This allowed for the identification of changes, in both environmental and economic factors that possibly led to the recent rise in infections. This is the first study we know of that considers this phenomenon in such detail, at a European wide scale.
Over the past 70 years, the countries analysed in this study have been experiencing increasingly warmer temperatures throughout the year (see figs [6][7][8][9]. According to our initial analysis, the last decade has been the warmest. The results of our final model show that average summer temperatures above 22°C are positively associated with an increase in WNV infections. It is likely that rising temperatures have created more favourable conditions for the virus and this is why it is occurring in more northerly regions than in previous decades. Higher temperatures, especially in summer months, is a key driver of WNV transmission, since warmer temperatures are known to influence the hatching rate and development time of mosquitoes, and shorten the extrinsic incubation period (EIP) of WNV and related viruses [28, 29, 30, 31, 32, 33]. Our analysis of the mean winter temperature (Dec-Feb) reveals a quadratic relationship with WNV and that it is the strongest predictor of annual WNV infections. Temperatures below 2°C and above 6°C have a negative impact on WNV infections. Our results are consistent with findings by [34]. The authors looked at Culex pipiens overwintering potential in mosquitoes in sheds and houses. Temperatures in sheds ranged from 2.7-17.0°C, and in houses 8.8-23.7°C. Survival rates were much better in sheds that had cooler temperatures, and also more humidity, which is also consistent with our results. Diapausing Culex pipiens mosquitoes do not necessarily do better under warmer conditions and there is a temperature range in which they can successfully diapause. Although our results may be slightly less accurate than results derived through an experimental study (since we analysed infection data at a regional scale using mean monthly temperatures), as with the study by Koenraadt et al. [34], the results do adhere to the same quadratic relationship. WNV infections for the months following winter are higher, if the specific temperature range of between 2°C and to 6°C is met. This is likely because WNV can overwinter with the mosquitoes when conditions are right, and then can be transmitted early on in the year, and throughout the year. This result, taken in the context of recent winter warming due to climate change, demonstrates how the virus has become so prevalent in Europe. Given current trends, we can also expect to see regions that have previously been too cold for Culex pipiens to survive over winter become viable locations and cause further havoc in regions that are currently experiencing just a few annual cases. On the contrary,

3/23
regions which currently have optimal conditions for overwintering mosquitoes may become too warm. It is also important to note that it is not currently clear at what temperatures Culex modestus and Coquillettidia richiardii overwinter as adults, if at all.
The number of rain days per summer is also positively correlated with WNV infections. This result is consistent with the literature, i.e. a steady flow of aquatic resources for mosquitoes has a positive impact on their abundance, and therefore an increase in disease transmission; [35,36]. Rainfall patterns have also been shifting since the 1950s (see figs 6-9) although unlike climate, there is no a clear trend and results are more difficult to interpret. In general, Austria, Croatia and Italy are seeing less intense rainfall in the summer months, but higher in Autumn, whereas Bulgaria, Greece, Hungary and Romania are receiving more rainfall in summer months. Additional figures (S4-S7) also capture this information in a long term-time series. Our results also show that higher summer surface water extent for a given year in a given region, is negatively correlated with WNV. This was not an expected finding, since we would have expected higher levels of surface water would mean more breeding grounds for mosquitoes. However, the result is consistent with part of the literature as sometimes desiccation of water resources can bring mosquito and bird hosts closer together and can increase transmission potential [23,25]. These results may also suggest that regions with higher surface water extent for a given year may be experiencing flooding and fast water movement, which may wash away mosquito eggs and larvae, and also may inhibit contact between birds, mosquitoes and humans [23,25,37]. It is important to note that this variable probably does not capture the creation of short-term water resources created by rainfall (e.g. pools, puddles), which can be used as breeding habitat by mosquitoes. It rather captures long term and large water surface such as deltas, lakes and flood plains.
As for the land-use variables, as expected, regions with a larger proportion of arable land (particularly rice paddies and irrigated agriculture) are more likely to see outbreaks of WNV. This is consistent with the literature as such land-use patterns attract bird and mosquito species capable of transmitting WNV, that can then have a knock-on effect on human populations. The percentage of discontinuous urban fabric, which represents populated area of low to medium density such as residential suburbs, villages etc. [see 38] is positively associated with WNV, also consistent with the literature. However, this variable only becomes weakly statistically significant in our final specification. This could be interpreted as locations with such land-use patterns are particularly at risk when climate conditions permit and/or during periods of economic decline. According to our additional descriptive analysis (figures S2-S3), we are seeing increases in land use types associated with human WNV infections, which may have also contributed to the rise in cases over the past decade. In terms of economic factors influencing WNV, in general, regions in countries which were experiencing lower GDP growth, lower spending growth on environmental factors such as agriculture, forest, fisheries, and waste water management had a higher number of cases over the study period. Waste water management spending may be a reliable causative factor, due to, for instance, the neglect of hazard prevention efforts, such as spending on flood defences, essential works like sanitation and upkeep of infrastructure, that can lead to the creation of mosquito breeding habitats, e.g. potholes, blocked drains [23, 25, 39]. We find that government spending on the agriculture, forest + fisheries sectors, which represents investment or subsidies that can go into improving environmental care, landscapes and biodiversity in rural locations, as well as investments in agricultural technology that improve worker safety or mechanise agriculture, had negative effects on WNV infections. A drop in government spending on these three economic sectors can be considered as a proxy for austerity. Our results suggest that economic problems and spending reductions in key sectors have clearly had knock on effects. On the contrary, our statistical analysis shows that central government spending on health had a positive, albeit concave, impact on WNV infections, which is consistent with the literature, since better funded health systems tend to do better at disease diagnosis and detection, but clearly once those are improved, more health spending does not lead to more cases, hence the concavity [40, 41,42,43]. Additional figure S1 provides a multiple time-series of the economic situation in the 7 countries in terms of GDP and government spending.
Although not all the variables used in this study represent disease transmission mechanisms directly, understanding the relative impact of the environmental, climate and economic factors on disease outcomes can help risk assessors predict where diseases are likely to occur in the future, by identifying locations with vulnerabilities in public health systems and/or by identifying impoverished areas that tend to be susceptible to disease. One important factor we did not analyse in this study, which may have also contributed to the upsurge in infections over the past 10 years, is the prevalence of WNV in migratory birds. It may be the case that infection levels in birds in endemic regions are now much higher, which has increased the chance of wider outbreaks in Europe. It is important to note that, with any analysis dealing with regional data, results should be taken with caution because of issues of scale and uncertainty introduced by the aggregation procedure; therefore, our findings should be evaluated with results from more localised studies. Nevertheless, this study should act as a cautionary tale; central governments looking to embark on austerity measures should exercise extreme caution given the long term trade-offs and unintended consequences they can have 4/23 on the environment and human health, especially when considering we are facing multiple threats brought about by global warming and other anthropogenic induced changes that can benefit emerging diseases, i.e. global trade in wild animals, intensive agriculture / animal rearing and land use conversion. It is vital for governments to invest in environmental infrastructure and health services to prevent future epidemics, rather than helping to instigate them.

Data collection and methods
In this study, we compiled a spatial temporal data-set that captures the main drivers and risk factors of WNV infections in Europe. We compiled the data-set based on findings from a literature review (see supplementary information -conceptual framework).

Aggregation
All data were aggregated annually to produce the final yearly panel data-set and aggregated at the NUTS 3 country subdivision level [see 44], apart from economic data which was sourced at the country level (central government).

WNF Case Data
WNV case data were provided at request by the European Centre for Disease Prevention and Control (www.ecdc. europa.eu). Cases data are collected weekly by EU member states and affiliates. Data were aggregated at NUTS 3 country subdivisions [44]. Positive cases were confirmed using WNV-specific IgM in blood. All cases were aggregated yearly to create the annual panel data set.

Economic, Socio-Economic and Demographic Data
Economic data were extracted from the Eurostat database (https://ec.europa.eu/eurostat/data/database), which provides comparable statistics and indicators and is presented in yearly time series. To capture factors determining the economic crisis, austerity and cuts to public spending we selected country level "Gross domestic product", "Agriculture, forestry", fisheries spending","Waste water management spending" , and "Health spending". The "Agriculture, forestry, fisheries spending" variable captures spending in rural areas on factors that would help to improve the environment and agricultural development which would benefit agricultural workers and/or mechanise production [see 45]. In order to represent spending before and after the economic crisis, we created a baseline index for each variable set at 2007 levels, which represented negative or positive growth from the point just before the economic crisis hit Europe.
Population count data to predict the number of persons at risk in a region were sourced from the Socio-economic Data and Applications Center's Gridded Population of the World data set [46]. This data-set estimates population count for the years 2000, 2005, 2010, 2015, and 2020, consistent with national censuses and population registers. Data were extracted from areas where vector presence was predicted. R's Zoo package was used to replace values for missing years, by implementing a linear interpolation method that would predict trends between years. This way, increases or decreases in human population were controlled for in the final model.

Climate Data
Climate data were sourced from the E-OBS gridded data-set [see 47]. The data-set was created using a series of daily climate observations at meteorological stations throughout Europe and the Mediterranean. Files were processed in R with the Tidyverse, NetCDF and Raster packages to create regional seasonal variables: "mean temp winter (°C)", "mean temp spring (°C)", "mean temp summer (°C)", "winter precipitation (total mm per region),"spring precipitation (total mm per region","summer precipitation (total mm per region)". Although, to standardize the data across regions we created variables which counted rain occurrence per season i.e. "days of rain in winter","days of rain in spring" and "days of rain in summer". Winter was designated as December to March, Spring as March to June, and summer June to September. We did not include "mean temp spring (°C)" in our final data set as it was correlated with winter and summer temperature variables, we concluded that we would capture more variation using the winter and summer variables which were not highly correlated. use changes were controlled for in the final models. To incorporate some of the main land-use WNV risk factors into our model, we selected variables: "continuous urban fabric", "discontinuous urban fabric", "wetlands (fresh water)" and "arable land".

Surface Water
Regional surface water data was sourced using the JRC Monthly Water History, v1.2 data set (https://developers. google.com/earth-engine/datasets/catalog/JRC_GSW1_2_MonthlyHistory) via Google Earth Engine using a 30 metre pixel resolution. This data set contains maps of the location and temporal distribution of surface water from 1984 to 2019 and provides statistics on the extent and change of those water surfaces. Data were generated using scenes from Landsat 5, 7, and 8. Each pixel was individually classified into water / non-water using an expert system and the results were collated into a monthly history [see 49] . In order to standardize this data across regions, we grouped surface water observations by region and season. We then converted the surface water observations to Z-scores to give us the standard deviation of the water extent. This would help us determine if the water extent was average, below the mean (low), or above the mean (high).

General additive regression model to assess impact of independent variables on WNV case data at regional level
One of the main issues with our data-set is that it did not meet some basic assumptions for statistical inference, and specifically the data are not independent and identically distributed random variables (iid). More specifically, the data-set captured repeated measurements over the same regions, and observations were not independent because of spill over effects from neighbouring regions, therefore we needed to implement an appropriate statistical design to control for both temporal and spatial pseudo replication (lack of independence). We could deal with this in two ways, 1) either using a generalized linear mixed model (GLMM) approach, relaxing the assumption of independence and estimating the spatial/temporal correlation between residuals, or 2) model the spatial and temporal dependence in the systematic part of the model [50]. We opted to use a Generalized Additive Model (GAM) using R's Mgcv statistical package because of its versatility and ability to fit complex models that would converge even with low numbers of observations and could capture potential complex non-linear relationships. One of the advantages of GAMs is that we do not need to determine the functional form of the relationship beforehand. In general, such models transform the mean response to an additive form so that additive components are smooth functions (e.g., splines) of the covariates, in which functions themselves are expressed as basis-function expansions. The spatial auto-correlation in the GAM model was approximated by a Markov random field (MRF) smoother, defined by the geographic areas and their neighbourhood structure. We used R's Spdep package to create a queen neighbours list (adjacency matrix) based on regions with contiguous boundaries i.e. those sharing one or more boundary point. We used a full rank MRF, which represented roughly one coefficient for each area. The local Markov property assumes that a region is conditionally independent of all other regions unless regions share a boundary. This feature allowed us to model the correlation between geographical neighbours and smooth over contiguous spatial areas, summarizing the trend of the response variable as a function of the predictors [see 51, section 5.4.2]. In order to account for variation in the response variable over time, not attributed to the other explanatory variables in our model, we used a saturated time effect for years, where a separate effect per time point is estimated.
We first tried to fit our model using a Poisson distribution. However, the mean of our dependent variable (dengue cases by region and year) was lower than its variance -E(Y) <Var(Y), suggesting that the data are over-dispersed. We also tried to fit our models using the negative binomial, quasi-Poisson and Tweedie distribution, all particularly suited when the variance is much larger than the mean. After several tests, we concluded that the Tweedie distribution worked well with our data and allowed us to model the incident rate, although results were comparative across all distributions (note that WNV infection count data, offset by a log of population at risk was used for the neg bin and quasi-Poisson models). Analysis of model diagnostic tests did not reveal any major issues; in general residuals appeared to be randomly distributed (see additional information -figures S8-S15 and Table S1 for diagnostics). Tweedie distributions are defined as subfamily of (reproductive) exponential dispersion models (ED), with a special mean-variance relationship . A random variable Y is Tweedie distributed if: which we assume to be Tweedie distributed; Xit -is a vector of economic, demographic, environmental and climate variables. Y ear t is a function of the time intercept and Region i represents neighbourhood structure of region.

Climate modelling
In order to model long term climate trends and seasonal or within-year variation, we fit a GAM model using the following equation: where B 0 is the intercept, f seasonal and f trend are smooth functions for the seasonal and trend features. Month is represented by x 1 and x 2 is the series of years in the entire time period i.e. within-year and between year (see figures S4-S7) To model any variation in, or interaction between, the trend and seasonal features of the data. We also fit a slightly different GAM model to allow for an interaction between f seasonal and f trend using the following equation (see figures [6][7][8][9]: To create the models, we used either monthly mean regional temperatures (°C) or regional sum precipitation (mm) as mentioned in the previous section. Temperature models were fit using the Gaussian distribution and precipitation models fit using the Tweedie distribution.