Mapping the Vector Abundance of Endemic Mosquito-Borne Diseases in Southern Quebec

Background: Climate change is increasing the dispersion of mosquitoes and the spread of viruses of which some mosquitoes are the main vectors. This increases the risk of humans coming into contact with infected mosquitoes and developing diseases with sometimes fatal consequences. In Quebec, the surveillance and management of endemic mosquito-borne diseases, such as West Nile virus or Eastern equine encephalitis, could be improved by mapping the areas of risk supporting vector populations. However, there is currently no active tool tailored to Quebec that can predict annual mosquito population abundances. Methods: Our modelling approach is designed to meet this need. Four species of mosquitos were studied in this project for the period from 2003 to 2014 for the southern part of the province: Aedes vexans (VEX), Coquillettidia perturbans (CQP), Culex pipiens-restuans group (CPRg) and Ochlerotatus stimulans group (STMg) species. We used a mixed linear regression approach to model the abundances of each species or species groups as a function of meteorological and land cover variables. Results: The best models incorporate, for CPRg, the agricultural land, grassland and woodland classes and the average minimum temperature in September of the previous year; for STMg, the urban and woodland classes and the mean precipitation in June; for CQP, urban areas and the mean precipitation in January and August; and nally, for VEX, the agricultural land class and the mean precipitation in January, February and September. Conclusions: The models proved to be robust and precise over almost the entire study area, and the presence of signicant climate variables for each of the species or species groups makes it possible to consider their use in predicting long-term spatial variations, based on climate and landscape change, in the abundance of mosquitoes potentially harmful to public health in southern Quebec.

Not all of these mosquito species have the same preferences in terms of larval habitat and climate sensitivity. For example, each species of mosquito will be more or less sensitive to variations in meteorological factors as diverse as the number of degree-days, the photoperiod, or intra-or interannual temperatures, depending on its biological characteristics and stage of development [5,41,42,43,44,45,46]. It is, therefore, essential that the models developed are speciesspeci c [47] and combine spatial and temporal aspects [11,13,19,20]. Spatial variations will re ect environmental changes and major climate trends, and temporal variations will represent meteorological changes as well as landscape changes over time.
Our study, therefore, involved modelling the spatial distribution of CPR, STM, CQP and VEX mosquito populations as a function of land cover and meteorological variables in southern Quebec. More speci cally, the objective of this paper was to develop a modelling method for mapping mean annual mosquito abundances in southern Quebec with a spatial model that uses landcover and meteorological data. The purpose of these models was to create maps of mean annual mosquito abundance, for each of the four species, to guide campaigns to prevent and control the spread of the diseases borne by the selected species. The sub-objectives were to (1) determine the relevant meteorological variables and land cover classes that characterize an environment conducive to the development of each species based on known entomological characteristics; (2) develop, for each of the four species or groups of species, a spatial model for predicting mean annual abundances based on these meteorological and environmental variables; and (3), map the annual mosquito abundances predicted by the model.

Study area
The study area located in southern Quebec spans 102,850 km 2 (Fig. 1). It was determined based on available mosquito sampling data and follows the human ecumene of the greater Montreal, Quebec City and Gatineau areas. Fifty per cent of traps are located in urban areas, 20% in woodlands, 15% in vegetated nontreed areas, 8.5% in agricultural areas, and the remaining 6.5% in other areas. The study area is crossed from southwest to northeast by the St. Lawrence River.
Most of the areas south of the river are dense agricultural areas with some open woodlands and the odd wetland. North of the river, past a narrow agricultural strip, there is a forest area populated by mixed and coniferous forest whose density increases toward the north. According to the Köppen-Geiger classi cation [48], most of our study area has, in its southern part, a warm-summer humid continental climate (Dfb boreal climate), with the warmest month of the year (July) averaging below 22 °C, but more than four months averaging above 10 °C. The temperature in the coldest four months varies between -38 and 0 °C. The northern part of the study area has a subarctic climate (Dfc boreal climate). The average temperature in the warmest month is above 10 °C and below 22 °C, that in the coldest month is between -38 and 0 °C, and less than four months average above 10 °C. Precipitation is distributed evenly throughout the year. In these types of climates, the season of mosquito activity generally extends from May to September.

Mosquito data
There are approximately 80 species of mosquitoes in Canada [40,49]. To limit the number of mosquito species to be studied, we selected for our study four species that pose a potential public health risk: CPR, CQP, VEX and STM. Because of di culties distinguishing Culex pipiens from Culex restuans morphologically, these species are very commonly grouped under the CPR group (CPRg); Oc. stimulans and Oc. hexodontus are also grouped together for this reason. The mosquito catch data came partly from the provincial program under the Institut national de Santé publique du Québec (INSPQ) and Quebec's Ministère de la Santé et des Services sociaux (MSSS) for the surveillance of mosquito-borne diseases in Quebec, and partly from the Public Health Agency of Canada (PHAC) using the database of a private company involved in mosquito control (GDG Environnement Ltd.). Trapping was conducted with Center for Disease Control (CDC) CO 2 light traps. The dataset covers the years 2003 to 2016, but with varying intensity (signi cant variation in the number of catches by traps per year and in the traps location). To ensure that our data adequately re ect annual abundance, we selected traps that had operated for a minimum of 15 consecutive weeks during the mosquito season. A shorter period would not have been su cient to represent annual abundances, and a longer period would have been too restrictive in the number of traps selected. The years 2003The years , 2004The years , 2005The years , 2013 and 2014 were the only one who meet this criterion.

Environmental data
The rst category of environmental data used are the land cover class (LCC) layers derived from previously classi ed satellite (raster) data and vector data (Table 1). Eight LCCs were selected for our exercise based on our literature review: aquatic area (shallow and deep water), bare soil, urban/built-up, wetland, agricultural land, pasture/grassland, vegetated non-treed, and woodland. The raster layer processed by Agriculture and Agri-Food Canada (AAFC) [50] was used as a general base, and its classi cation was improved by the addition from multi-source data: The classi ed SPOT raster land cover of Canada south of the treeline provided by the Government of Canada (GoC) [51] and two other layers in vector format were used for this purpose, namely the Ducks Unlimited Canada (DUC) [52] wetland map and Natural Resources Canada's National Hydrographic Network (NHN) [53]. The different sources of LCC data were combined using the following decision rule: in non-urban areas, the information provided by the AAFC layer prevailed. In urban areas, the information provided by the SPOT image prevailed over the AAFC image. The LCC was classi ed as a wetland if and only if the data provided by AAFC, SPOT and DUC all indicated the presence of a wetland. The LCC was classi ed as an aquatic area based on the NHN information, which prevailed over all other data sources. Around each trap, 1-km buffers were created from which the presence percentages for each LCC were extracted. The analyses were performed in PCI Geomatica [54] and ArcGIS [55].  The second category of environmental data used is obtained from daily meteorological records at Environment Canada ground stations [56], interpolated with inverse distance weighting [14,57] and resampled for every 1 km over the entire study area. One image per day is produced for the years 2002 to 2014, inclusively. Four meteorological variables are calculated: minimum, mean and maximum daily temperatures and total daily precipitation. These variables are then averaged within a 1-km buffer around each trap to produce minimum, mean and maximum monthly and annual temperatures and mean monthly precipitation. To meet the requirements of a normal residual error structure and homoscedasticity assumed in a linear gaussian model, we tested several possible transformations of the abundance data, and ultimately chose Box-Cox transformations. In addition, the variables extracted from the LCCs and meteorological images were standardized (mean=0, sd=1) to make them comparable in a multiple model. The name of the trap, unique according to its location, was introduced as a random factor in the linear regression to take into account repeated measures at the same locations across years.
To select relevant variables, the predictive variables were rst screened by retaining only those variables having a signi cant effect on annual abundances (p_value of less than or equal to 0.05). Then, among those variables, only those with a pairwise correlation of less than 0.8 were retained, as well as the most signi cant variable when the pairwise correlation coe cient was larger than 0.8 [14]. The nal selection of the variables included in the models was based on three maximum variance in ation factor (VIF) thresholds: 3, 5 and 10. The three VIF thresholds used were taken from compatible studies in the literature [15,58,59,60].

Construction of the multivariate regressions
The selected model structure was a linear mixed-model (LMM) with the trap identi er as a random effect. Two approaches to building models were used: the backward stepwise approach and the forward stepwise approach [59]. The consistency between the two approaches was assessed. The variable selection process was carried out until all the explanatory variables in the models were signi cant, meaning they had a p_value of less than or equal to 0.05. Models without interactions and/or quadratic terms were designated as follow "LMMwithout". The number following the designation (3, 5, or 10) indicated the VIF threshold used. Next, the signi cance of the quadratic and interaction terms was tested for each of the variables retained in the multiple model. This approach was preferred to testing the quadratic and interaction effect together with univariate selection process because of the larger amount of predictive variables. Thus, when the relationship between a variable and the abundance of a species did not offer a strictly linear visual aspect, adding a quadratic or interaction term made it possible to better approximate the trend shown in the relationship. Quadratic and/or interaction terms were retained only if the original explanatory variable(s) and the term itself were signi cant. This second series of models was called the LMMwith model. We thus obtained six models that matched according to the backward stepwise and forward stepwise approaches, per species or group of species: three LMMwithout models for each VIF threshold and three LMMwith models for each of the three VIF thresholds.
Normality of residuals was checked using the Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera tests [60,61], and their distribution was evaluated graphically using P-P and Q-Q plots. Homoscedasticity of Student residuals was evaluated graphically, as was their frequency distribution (which had to lie between -2 and +2). Student residuals were mapped in order to identify speci c clusters or structures in the extreme values. Cook's distances were also calculated to identify outliers. The relevance of the six models per species or group of species was evaluated using Akaike's information criterion (AIC) [12,14,62]. The best or nal model per species or group of species was the one that had the lowest AIC score.
The expected values for 2014 were then calculated by the model and compared to the observed values, since these were the data chosen for validation by calculating raw and Student residuals. The same steps for residual analysis were carried out as described in the previous paragraph.
Most of the statistical operations were performed using the XLSTAT module [63] for Excel. To double-check the selection of variables, we redid the stepwise procedure in R version 3.4.2 [64].

Mapping
The Student residual values for the model calibration step and the external validation step (comparison with 2014 data) were spatially represented by categorizing the residuals as follows: "excellent" for an absolute value between 0 and 1; "very good" for a value between 1 and 2; "poor" for a value between 2 and 3; and "very poor" for a value between 3 and 4.5.
With the purpose of comparing predicted and observed hot spot zones (for 2014), annual abundances for each of the four mosquito species or species groups were predicted for the entire study area for 2014 only (even though this could have been done for each year) using the validated LMM nal models. As LMM models predicted Box-Cox transformed abundance of mosquito, we re-transformed the value to obtain true abundance. The abundance was classi ed inspired by Natural Breaks, in high, medium and low risk area, corresponding to high, medium and low mosquito abundance. The categories limits where different from one species to another. To map the models, the LCC variables were centered and reduced using the means and standard deviations of the training data. The LCC percentages and meteorological variables used in each model were calculated so that each pixel of the image received the average value of the surrounding pixels within a 1-km buffer.

Preparation of variables and univariate regression
Calculating the p_values for the simple regressions between the annual abundances and the predictive variables allowed us to use 24 variables for CPRg ( Following the multicollinearity test conducted with VIF thresholds set to 3, 5 and 10, there were no meteorological variables below the VIF threshold of 3 for any of the four species or species groups. To con rm whether adding meteorological variables improved the models, it was necessary to retain multiple models with thresholds equal to and greater than 3. Table 3 shows the selected variables by VIF threshold and by species. The meteorological variables associated with abundance were very different from one species to another. CQP and VEX appeared to be sensitive to winter (January precipitation), summer (July and August precipitation) and fall (September precipitation) conditions. CPRg abundances seemed to be affected more by spring (March) and summer (August) precipitation. As for STMg, all seasons seemed to be signi cant in terms of their mean precipitation.
For the LCCs, agricultural land appeared to have a predominant in uence, since it was positively associated with CQP, VEX and CPRg. The urban LCC was negatively correlated with CQP and STMg, and the grassland and woodland LCCs were negatively correlated with CPRg and positively correlated with STMg.
Lastly, for VEX only, water was negatively correlated.
Construction of the multiple model CPRg: AIC scores were quite similar for the LMMwithout3 (AIC=651), LMMwithout10 (AIC=652) and LMMwith10 (AIC=651) models. However, with the goal being to assess the long-term impact of climate change and since the LMMwithout3 model did not include meteorological variables, we set it aside. The LMMwith10 model was selected as the nal LMM model ( Table 4). The mean annual abundance of CPRg was, therefore, predicted by a linear model combining three land cover variables (agricultural land, grassland and woodland), a climate variable (mean temperature in September of the previous year) and an interaction term combining the climate variable and woodland. The agricultural land LCC had a positive effect on the mean annual abundance of CPRg, while the grassland and woodland LCCs and the mean temperature in September of the previous year had a negative effect. The interaction effect between the woodland LCC and temperature balance this negative effect by making it stronger with negative value of mean temperature value and almost null for positive mean temperature value (see Suppl. 1). By comparing prediction for 2014 to the observed data for 2014, the frequency distribution of Student residuals looked like a normal distribution. Ninety-ve point one per cent (95.1%) of the values were between [-2;+2]. The mean was positive (+0.602), meaning that the mean annual abundance values predicted by the model were generally higher than those actually observed.
CQP: No interaction or quadratic terms were signi cant in the models for CQP. LMMwithout10 had the lowest AIC score (AIC=530) ( Table 4). The mean annual abundance of CQP was, therefore, predicted by a combination of two meteorological variables (mean precipitation in January and August) that had a positive effect on abundance and the percentage of urban land surrounding the trap, which had a negative effect on abundance. For  STMg LMMwith10 had the lowest AIC score (AIC=1013) ( Table 4). The mean annual abundance of STMg was, therefore, predicted by a linear model combining two land cover variables (urban and woodland, the latter including a quadratic term) and a single meteorological variable (mean precipitation in June). The presence of urban land had a negative effect, as well as the mean precipitation in June. The effect of woodland has to be interpreted with caution VEX LMMwith10 had the lowest AIC score (AIC=245) ( Table 4). The mean annual abundance of VEX was, therefore, predicted by a linear model combining a land cover variable (agricultural land, in simple and quadratic form) and three climate variables (mean precipitation in January, February and September of the For the four species, the Q-Q and P-P plots of the raw residuals con rmed that their distributions were near normal. The Student residuals were well distributed and did not appear to have a speci c structure, thus supporting the normality assumption for raw residuals and the homoscedasticity assumption for Student residuals, as required to validate a linear model. Similarly, the Cook's distances con rmed the good results obtained with the Student residuals histograms. Distances were all less than 0.1 for all species or species groups except CPRg, which had a single value slightly greater than 0.15 (trap OKA 007_0). For all species, the calibration residual values were spread across the region without any particular clustering. The residuals from the external validation showed the same lack of spatial pattern, except for a cluster of negative values around the Marguerite-D'Youville wildlife refuge (45°385 N, 73°77 W): the observed values from traps were higher than those predicted (Fig. 2, 3, 4, 5), including for the CPRg model, which generally overestimated abundances.
Mapping of mean annual abundance for the entire study area The maps allow us to observe that the hot spots on the maps overlap with the highest mosquito abundances observed in 2014, in general. Additionally, outside of the calibrated area (where colors are lighter), the prediction are more di cult to interpret and probably less reliable in terms of mosquito abundance prediction.
CPRg: The predictive map for 2014 shows some homogeneity on the island of Montreal, with some highly localized hot spots, shown in red. Outside the island of Montreal, there is more variability, with some hot spots located in densely wooded areas. However, the regression coe cient for this variable is negative, meaning that the "September temperature × woodland" interaction term attenuates the simple "woodland" variable. Intermediate abundances are mainly associated with the presence of agricultural plots. The few cold spots can be explained by the presence of grasslands, which have a negative in uence in the regression model, or, further north in the study area, by drastically cooler temperatures in September, or by high densities of woodland.
CQP: High abundances of CQP are associated with less urbanized areas, which is consistent with the value of the coe cient associated with this variable in the nal model (Fig. 7). The more subtle variations in abundance can be explained by the two precipitation variables. The observational data for 2014 blend generally very well with the simulated data, which highlights the accuracy and precision of the model.
STMg: There is a lower abundance of this group of species in urbanized areas compared to woodlands where there's a higher abundance (Fig. 8). High risk areas are mostly associated with rapidly increasing abundance near woodlands.
VEX: The predicted abundances show a very rapid variation at the edge of agricultural areas (Fig. 9). For this variable, Temperature differences probably explain the more subtle additional variations.

Discussion
This paper led to the creation of maps of mean annual abundance for four mosquito species or species groups that are key to public health, for southern Quebec. Validation of the model with external data (year 2014) established the robustness of the model and effectiveness of the method used.
Predictive mapping for CPRg is based on the agricultural land, grassland and woodland LCCs, the mean minimum temperatures in September of the previous year, and the interaction between these temperatures and the woodland LCC. While this group of mosquitoes is known to be very prevalent in urban areas [26,65], other authors have demonstrated the impacts of other agricultural land, woodland or grassland cover variables on the occurrence of CPR [21,25]. These publications show that agricultural land cover has a positive impact and dense woodland cover has a negative impact, which is consistent with our results.
Many models that predict CPR abundance do not use LCCs as predictive variables, but only meteorological variables [11,44,66], so it is di cult to compare their results to our work. Moreover, some of these authors highlight the importance of adding environmental factors to improve the accuracy of their predictions [44].
Studies on VEX habitats indicate that this species develops mainly in temporary pools of water caused by recent ooding or precipitation [19,67]. These pools and other puddles may be located on agricultural land [31], woodland, grassland [25,32,33] or pastures [32,33]. This is partly consistent with our results showing that agricultural land has a positive in uence on VEX abundance in conjunction with winter and fall precipitation. Little statistical modelling work has been done on VEX in Canada. The only existing work is that of Ripoche [44], and it uses only meteorological variables as predictive variables. Therefore, our results cannot be compared to theirs.
Regarding STMg, we found that it is rather positively in uenced by the presence of wooded areas and negatively in uenced by the presence of urban areas.
This result is consistent with the scarce empirical information in the literature that describes this group as mosquito species that seek vernal pools, which seem more common in wooded areas than in urban areas [39,40,67], and whose larval habitat seems to be located in depressions in wooded areas in the case of Oc. stimulans [67], and in ooded herbaceous areas in the case of Oc. hexodontus [68]. However, the geographic range of the latter species appears to be limited to an area much farther north than our study area [69], suggesting that the STM species group in our study may in fact consist exclusively of the Oc. Stimulans species.
It is also di cult to compare our results for the species CQP to the literature, because work on this species is scarce. According to Crans [67], CQP eggs are deposited directly on water. According to Gardner [25], who captured adult CQP with CDC light traps like ours, this species was trapped primarily in prairie, forest, and agricultural sites, and, to a lesser extent, in residential sites. This information is consistent with our results, which show that the only LCC variable that in uences CQP abundance is urban land, with a negative relationship. This makes our work a valuable tool for providing information on the environmental conditions of choice for CQP in Quebec.
The abundance maps predicted by our models show great heterogeneity between urban and peri-urban areas in terms of predicted abundances for all species, even VEX and those in the CPR group, for which the urban LCC variable was not included in the model. This can be explained by the fact that the traps used to produce our data were all in urban areas; therefore, the effect of this type of cover was modelled quite sensitively with our data. Moreover, the satellite data available to characterize our LCCs were available only from 2011 onward. However, the landscape changed between 2003 and 2011 (period covering the mosquito trapping data we worked with), especially in terms of urban land, which has expanded signi cantly in recent years in our study area. This factor must have played in favour of the underestimation of the impact of urban areas on mosquito abundances and, therefore, on the precision of the model in discriminating the spatial heterogeneity of abundances between urban and non-urban areas. For similar work, we recommend, whenever possible, the use of multi-temporal data that have a resolution of 30 m or less and can provide a detailed classi cation of the region, as we have demonstrated the importance of using LCC variables as much as meteorological data in mosquito abundance modelling work.
In addition to successfully modelling mean annual abundances for four different species, we have undertaken work on species that are scarcely studied despite being potential vectors of several viruses transmissible to humans. Many studies focus on the CPR group [11,17,26,42,70], or on both CPR and VEX [44] because of their potential to transmit West Nile virus, which is the most widespread arbovirus in Canada. Our selection of species studied includes a fortiori potential vectors for other diseases that are rarer but as present in Canada as WNV: CQP for EEE and STMg for California serogroup viruses [5]. This paper shows that it is possible to work on other, sometimes ignored, species of mosquitoes that pose a public health risk, using a robust and easily reproducible approach.
The modelling could have taken other effects into account, such as larviciding [44,70]. Although these effects are controversial in other studies, particularly for the CPR group [71], the results we obtained in the vicinity of the Île Saint-Bernard wildlife refuge-whose purpose of preserving wildlife and plant life could be why larviciding is restricted there-suggest that adding a variable describing the presence of larvicide treatment to our models would make them more precise for this area, where it appears that they systematically underestimate the observed abundance values.
Given the nature of our meteorological data, and while our results are very encouraging, we recommend re ning the precision of the meteorological variables using microclimatic data, in order to more accurately capture intra-seasonal and spatial variations in abundance. In addition, this would make it possible to test new explanatory variables, such as wind [11,72,73] or relative humidity [74], all of which are meteorological variables in uencing the biology and dispersion of mosquitoes.

Conclusions
The method presented in this paper to map mean annual abundances for various mosquito species in southern Quebec is a rst step toward developing a practical tool that can be used in decision making to identify areas of risk of exposure to infected mosquitoes. Indeed, no method had yet been developed in this study area to be applicable on a large scale and on four distinct species. The simplicity of the method used and the use of relatively generic LCC and climate variables are encouraging. This type of model can also help to study the impact of climate and environmental change on the distribution of these mosquito species in time and space from a public health perspective. Indeed, temperature, precipitation and the presence of urban areas are very important variables in our models and are undergoing profound changes: we note growing urban consumption of land, in tandem with an increase in mean temperatures and a change in precipitation patterns [75]. These three major effects can, therefore, be expected to signi cantly alter the landscape of mosquito abundances in the coming years and, consequently, the risk of exposure to mosquito-borne diseases.
List Of Abbreviations Distribution of Student residuals for the LMMwith10 model for the Ochlerotatus stimulans group for the region of our study area with the majority of the entomological data including 2014 data (our validation dataset).

Figure 3
Distribution of Student residuals for the LMMwith10 model for the Culex pipiens-restuans group for the region of our study area where we found the majority of the entomological data including 2014 data (our validation dataset).

Figure 4
Distribution of Student residuals for the LMMwithout10 model for Coquillettidia perturbans for the region of our study area with the majority of the entomological data including 2014 data (our validation dataset).

Figure 5
Distribution of Student residuals for the LMMwith10 model for Aedes vexans for the region of our study area with the majority of the entomological data including 2014 data (our validation dataset).

Figure 6
Mapping of spatial variations in mean annual abundances for Culex pipiens-restuans group (CPRg) for 2014 for (A) our entire study area, and (B) representation of values of observed data for a sub-area of our study area centred on the island of Montreal, with the highest density of entomological data.

Figure 7
Mapping of spatial variations in mean annual abundances for Coquillettidia perturbans (CQP) for 2014 for(A) our entire study area, and (B) representation of values of observed data for a sub-area of our study area centred on the island of Montreal, with the highest density of entomological data.

Figure 8
Mapping of spatial variations in mean annual abundances for Ochlerotatus stimulans group (STMg) for 2014 for (A) our entire study area, and (B) representation of values of observed data for a sub-area of our study area centred on the island of Montreal, with the highest density of entomological data.