Preparation of variables and univariate regression
Calculating the p_values for the simple regressions between the annual abundances and the predictive variables allowed us to use 24 variables for CPRg (3 LCC and 21 meteorological), 39 for CQP (3 LCC and 36 meteorological), 38 for STMg (3 LCC and 35 meteorological) and 45 for VEX (2 LCC and 43 meteorological). By following the univariate selection process, these numbers dropped to 15 for CPRg, 22 for CQP, 20 for STMg and 25 for VEX. It was noted that all the LCC variables from univariate regressions were independent of each other.
Following the multicollinearity test conducted with VIF thresholds set to 3, 5 and 10, there were no meteorological variables below the VIF threshold of 3 for any of the four species or species groups. To confirm whether adding meteorological variables improved the models, it was necessary to retain multiple models with thresholds equal to and greater than 3. Table 3 shows the selected variables by VIF threshold and by species.
Table 3
Explanatory variables selected for modelling, following the multicollinearity test, with the direction of the association
CPRg | CQP | STMg | VEX |
+ Agri * | +Agri * | − Urban * | − Water * |
− Grassland * | − Urban ** | + Grassland * | + Agri * |
−Woodland * | + Woodland ** | +Woodland * | −Pfeb ** |
−TNsep_1y *** | +Pjan ** | −Pjan ** | −Psep_1y ** |
−Pmar *** | +Psep ** | +Pmar *** | +Pjan *** |
+Paug *** | +Pjul *** | −Pjun *** | +Pjul *** |
| +Paug *** | −Paug *** | +Psep *** |
| | −Psep *** | |
CPRg: Culex pipiens-restuans group; CQP: Coquillettidia perturbans; STMg: Ochlerotatus stimulans group; VEX: Aedes vexans; +: positive correlation; -: negative correlation; *: Variance inflation factor thresholds (VIF) ≤ 3, 5 and 10; **: VIF ≤ 5 and 10; ***: VIF ≤ 10 |
For CPRg, the VIF 3 and VIF 5 models were identical. In addition, only CPRg had a temperature variable in the VIF 10 model, namely the average minimum temperature in September of the previous year that was negatively correlated, in univariate analysis, with the annual abundances of CPRg. All other meteorological variables selected in the other models and for the other species were exclusively precipitation variables.
The meteorological variables associated with abundance were very different from one species to another. CQP and VEX appeared to be sensitive to winter (January precipitation), summer (July and August precipitation) and fall (September precipitation) conditions. CPRg abundances seemed to be affected more by spring (March) and summer (August) precipitation. As for STMg, all seasons seemed to be significant in terms of their mean precipitation.
For the LCCs, agricultural land appeared to have a predominant influence, since it was positively associated with CQP, VEX and CPRg. The urban LCC was negatively correlated with CQP and STMg, and the grassland and woodland LCCs were negatively correlated with CPRg and positively correlated with STMg. Lastly, for VEX only, water was negatively correlated.
Construction of the multiple model
CPRg: AIC scores were quite similar for the LMMwithout3 (AIC=651), LMMwithout10 (AIC=652) and LMMwith10 (AIC=651) models. However, with the goal being to assess the long-term impact of climate change and since the LMMwithout3 model did not include meteorological variables, we set it aside. The LMMwith10 model was selected as the final LMM model (Table 4). The mean annual abundance of CPRg was, therefore, predicted by a linear model combining three land cover variables (agricultural land, grassland and woodland), a climate variable (mean temperature in September of the previous year) and an interaction term combining the climate variable and woodland. The agricultural land LCC had a positive effect on the mean annual abundance of CPRg, while the grassland and woodland LCCs and the mean temperature in September of the previous year had a negative effect. The interaction effect between the woodland LCC and temperature balance this negative effect by making it stronger with negative value of mean temperature value and almost null for positive mean temperature value (see Suppl. 1). By comparing prediction for 2014 to the observed data for 2014, the frequency distribution of Student residuals looked like a normal distribution. Ninety-five point one per cent (95.1%) of the values were between [-2;+2]. The mean was positive (+0.602), meaning that the mean annual abundance values predicted by the model were generally higher than those actually observed.
CQP: No interaction or quadratic terms were significant in the models for CQP. LMMwithout10 had the lowest AIC score (AIC=530) (Table 4). The mean annual abundance of CQP was, therefore, predicted by a combination of two meteorological variables (mean precipitation in January and August) that had a positive effect on abundance and the percentage of urban land surrounding the trap, which had a negative effect on abundance. For CQP, the histogram of Student residuals showed a normal distribution with a mean of zero. Residual values were between -3.5 and +3, with 94.6% of them falling in the [-2;+2] range, thus the "excellent" class. By comparing the data simulated by the final model for 2014 to the observed data for 2014, the frequency distribution of Student residuals looked like a normal distribution. Ninety-six point six per cent of the values were between [-2;+2]. The mean was negative (-0.67), meaning that the mean annual abundance values predicted by the model were generally lower than those actually observed.
STMg LMMwith10 had the lowest AIC score (AIC=1013) (Table 4). The mean annual abundance of STMg was, therefore, predicted by a linear model combining two land cover variables (urban and woodland, the latter including a quadratic term) and a single meteorological variable (mean precipitation in June). The presence of urban land had a negative effect, as well as the mean precipitation in June. The effect of woodland has to be interpreted with caution combining the simple and the quadratic effect. It shows that abundance peaks at intermediate values of woodland cover, but that at too high or too low woodland densities, the environment is not favorable to STMg anymore (see Suppl. 1). For STMg, the histogram of Student residuals showed a normal distribution with a mean of zero. Residual values were between -3.5 and +3, with 95.7% of them falling in the [-2;+2] range. By comparing the data simulated by the final model for 2014 to the observed data for 2014, the frequency distribution of Student residuals looked like a normal distribution. One hundred per cent of the values were between [-2;+2]. The mean was negative (-0.778), meaning that the mean annual abundance values predicted by the model were generally lower than those actually observed.
VEX LMMwith10 had the lowest AIC score (AIC=245) (Table 4). The mean annual abundance of VEX was, therefore, predicted by a linear model combining a land cover variable (agricultural land, in simple and quadratic form) and three climate variables (mean precipitation in January, February and September of the current year, and the interaction between January and September precipitation). The agricultural land LCC, because combining a single and a quadratic term, had a positive effect on the abundance of VEX only in an intermediate agricultural LCC density (see Suppl.1). Below and above this intermediate density, the agricultural land LCC becomes less favorable to VEX. January and September precipitation had a positive effect on VEX abundance, that is more important for the lowest September mean annual precipitation, because of their interaction term. February precipitation had a negative effect. For VEX, the histogram of Student residuals showed a normal distribution with a mean of zero. Residual values were between -3 and +2.5, with 94% of them falling in the [-2;+2] range. By comparing the data simulated by the final model for 2014 to the observed data for 2014, the frequency distribution of Student residuals looked like a normal distribution. Ninety-six point one per cent (96.1%) of the values were between [-2;+2]. The mean was negative (-0.57), meaning that the mean annual abundance values predicted by the model were generally lower than those actually observed.
For the four species, the Q-Q and P-P plots of the raw residuals confirmed that their distributions were near normal. The Student residuals were well distributed and did not appear to have a specific structure, thus supporting the normality assumption for raw residuals and the homoscedasticity assumption for Student residuals, as required to validate a linear model. Similarly, the Cook's distances confirmed the good results obtained with the Student residuals histograms. Distances were all less than 0.1 for all species or species groups except CPRg, which had a single value slightly greater than 0.15 (trap OKA 007_0).
Table 4: Final models for the variance inflation factor threshold of 10
Source
|
Value
|
SE
|
DF
|
t
|
Pr > |t|
|
LL (95%)
|
UL (95%)
|
Culex pipiens-restuans group (CPRg)
|
Intercept
|
6.369
|
0.117
|
57
|
54.415
|
< 0.0001
|
6.139
|
6.600
|
Agri
|
0.658
|
0.121
|
57
|
5.455
|
< 0.0001
|
0.420
|
0.896
|
Grassland
|
-0.422
|
0.121
|
57
|
-3.473
|
0.001
|
-0.662
|
-0.182
|
Woodland
|
-0.256
|
0.128
|
57
|
-2.007
|
0.050
|
-0.507
|
-0.004
|
TNsep_1y
|
-0.218
|
0.103
|
57
|
-2.103
|
0.040
|
-0.422
|
-0.013
|
Woodland×TNsep_1y
|
0.239
|
0.102
|
57
|
2.342
|
0.023
|
0.038
|
0.441
|
Coquillettidia perturbans (CQP)
|
Intercept
|
3.700
|
0.091
|
59
|
40.693
|
< 0.0001
|
3.521
|
3.879
|
Urban
|
-0.535
|
0.090
|
59
|
-5.947
|
< 0.0001
|
-0.713
|
-0.358
|
Pjan
|
0.244
|
0.062
|
59
|
3.904
|
0.000
|
0.121
|
0.367
|
Paug
|
0.265
|
0.070
|
59
|
3.769
|
0.000
|
0.126
|
0.403
|
Ochlerotatus stimulans group (STMg)
|
Intercept
|
6.533
|
0.331
|
58
|
19.715
|
< 0.0001
|
5.879
|
7.187
|
Urban
|
-1.448
|
0.467
|
58
|
-3.103
|
0.003
|
-2.369
|
-0.527
|
Woodland
|
1.907
|
0.604
|
58
|
3.155
|
0.003
|
0.714
|
3.099
|
Pjun
|
-1.127
|
0.240
|
58
|
-4.693
|
< 0.0001
|
-1.601
|
-0.653
|
Woodland×Woodland
|
-1.146
|
0.507
|
58
|
-2.260
|
0.028
|
-2.147
|
-0.146
|
Aedes vexans (VEX)
|
Intercept
|
4.615
|
0.035
|
56
|
132.824
|
< 0.0001
|
4.546
|
4.683
|
Agri
|
0.256
|
0.071
|
56
|
3.584
|
0.001
|
0.115
|
0.396
|
Pfeb
|
-0.126
|
0.037
|
56
|
-3.440
|
0.001
|
-0.198
|
-0.054
|
Pjan
|
0.227
|
0.036
|
56
|
6.344
|
< 0.0001
|
0.156
|
0.297
|
Psep
|
0.126
|
0.032
|
56
|
3.902
|
0.000
|
0.062
|
0.190
|
Agri×Agri
|
-0.158
|
0.069
|
56
|
-2.294
|
0.026
|
-0.294
|
-0.022
|
Pjan×Psep
|
0.159
|
0.039
|
56
|
4.091
|
0.000
|
0.082
|
0.236
|
SE: Standard error; DF: Degree of freedom; LL: Lower limit; UL: Upper limit
Mapping
Mapping of residuals
For all species, the calibration residual values were spread across the region without any particular clustering. The residuals from the external validation showed the same lack of spatial pattern, except for a cluster of negative values around the Marguerite-D'Youville wildlife refuge (45°385 N, 73°77 W): the observed values from traps were higher than those predicted (Fig. 2, 3, 4, 5), including for the CPRg model, which generally overestimated abundances.
Mapping of mean annual abundance for the entire study area
The maps allow us to observe that the hot spots on the maps overlap with the highest mosquito abundances observed in 2014, in general. Additionally, outside of the calibrated area (where colors are lighter), the prediction are more difficult to interpret and probably less reliable in terms of mosquito abundance prediction.
CPRg: The predictive map for 2014 shows some homogeneity on the island of Montreal, with some highly localized hot spots, shown in red. Outside the island of Montreal, there is more variability, with some hot spots located in densely wooded areas. However, the regression coefficient for this variable is negative, meaning that the "September temperature × woodland" interaction term attenuates the simple "woodland" variable. Intermediate abundances are mainly associated with the presence of agricultural plots. The few cold spots can be explained by the presence of grasslands, which have a negative influence in the regression model, or, further north in the study area, by drastically cooler temperatures in September, or by high densities of woodland.
CQP: High abundances of CQP are associated with less urbanized areas, which is consistent with the value of the coefficient associated with this variable in the final model (Fig. 7). The more subtle variations in abundance can be explained by the two precipitation variables. The observational data for 2014 blend generally very well with the simulated data, which highlights the accuracy and precision of the model.
STMg: There is a lower abundance of this group of species in urbanized areas compared to woodlands where there’s a higher abundance (Fig. 8). High risk areas are mostly associated with rapidly increasing abundance near woodlands.
VEX: The predicted abundances show a very rapid variation at the edge of agricultural areas (Fig. 9). For this variable, Temperature differences probably explain the more subtle additional variations.