3.1 Variables description and selection
As shown in Table 1, the mean BI in Siming District between 2015 and 2022 was 9.27, with a standard deviation of 12.31. The highest BI recorded was 92, on 22 July 2016, and the lowest was 0. A correlation analysis was performed on a set of 78 variables (Additional file 1: Fig S1). It revealed a significant positive correlation among mean temperature, daily minimum temperature, and daily maximum temperature. Conversely, mean pressure was markedly negatively correlated with each of mean temperature, daily minimum temperature, and daily maximum temperature.
Using the decision tree method, 20 variables were identified, with high weights for unused containers, stagnant water, and cistern water jars, tanks, and basins. Using the random forest method, 18 variables were identified, again with high weights for unused containers, stagnant water, and cistern water jars, tanks, and basins (Fig 2). The correlations among the variables screened by the random forest, those screened by the decision tree, and the combined set from both methods are shown in Additional file 1: Fig S2. The results indicate no significant correlation among the variables, except between the barometric pressure and air temperature variables.
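The correlation screening described above can be illustrated with a minimal sketch. It assumes the Pearson coefficient was used, which the text does not state, and the readings below are hypothetical values chosen for illustration, not study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily readings (assumption: illustration only, not study data).
tem_mean = [12.0, 18.5, 24.0, 29.5, 27.0, 16.0]              # degrees Celsius
tem_high = [16.0, 22.0, 28.5, 34.0, 31.5, 20.0]              # degrees Celsius
pressure = [1010.2, 1003.1, 996.0, 988.4, 991.3, 1006.5]     # hPa

r_temp = pearson(tem_mean, tem_high)      # temperature pair: positive
r_pressure = pearson(tem_mean, pressure)  # pressure vs temperature: negative
```

With these made-up readings, the temperature pair is strongly positively correlated and pressure is strongly negatively correlated with temperature, the same pattern the analysis reports.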
Table 1. Basic Information of BI and Meteorological Data
| Variable | Mean | Std. dev. | Min | Max |
| BI | 9.27 | 12.31 | 0.00 | 92.00 |
| tem low (℃) | 21.05 | 5.08 | 6.20 | 28.20 |
| tem mean (℃) | 23.58 | 5.17 | 8.30 | 31.10 |
| tem high (℃) | 27.70 | 5.55 | 11.70 | 36.80 |
| sunshine hour (h) | 6.01 | 3.98 | 0.00 | 14.80 |
| precipitation (mm) | 2.34 | 8.16 | 0.00 | 97.50 |
| humidity (%) | 76.45 | 12.85 | 27.00 | 99.00 |
| pressure (hPa) | 996.47 | 6.05 | 980.80 | 1013.60 |
| wind (m/s) | 2.77 | 0.99 | 0.90 | 7.20 |
3.2 Analysis of meteorological and biotope factors influencing the BI
Because some meteorological conditions were missing from the variables screened by the random forest, some meteorological factors may have appeared insignificant. To consider a more comprehensive set of influencing factors, regressions were carried out on the union of the variables screened by the random forest and the decision tree. This was done to investigate the influence of the selected variables on the BI, with the results shown in Table 2.
Table 2. Analysis of the impact of meteorological and biotope factors on the Breteau Index
| Variable | OLS | Stepwise OLS | GAM | Variable | OLS | Stepwise OLS | GAM |
| containers | 15.760*** | 16.250*** | 3.881*** | sunshine_hour | -0.025 |  | -0.008 |
|  | -8.530 | -9.070 | -5.290 |  | (-0.230) |  | (-0.270) |
| tanks | 20.900*** | 21.170*** | 3.045** | precipitation3 | 0.053 |  | -0.013 |
|  | -8.740 | -9.170 | -3.040 |  | -1.000 |  | (-0.900) |
| bonsai | 6.685** | 6.703** | 2.238** | sunshine_hour2 | 0.251 | 0.251* | 0.043 |
|  | -3.12 | -3.2 | -3.11 |  | -1.760 | -2.540 | -1.120 |
| pressure7 (hPa) | -0.062 |  | -0.058 | tem mean 3 | 0.374 |  | 0.051 |
|  | (-0.430) |  | (-1.430) |  | -0.840 |  | -0.450 |
| precipitation7 | 0.103* | 0.128*** | 0.002 | pressure3 | 0.300 |  | -0.129* |
|  | -2.480 | -3.570 | -0.180 |  | -1.310 |  | (-2.120) |
| other | 3.653 |  | -0.904 | humidity1 | -0.020 |  | -0.021 |
|  | -0.430 |  | (-0.330) |  | (-0.380) |  | (-1.550) |
| wind3 | -0.378 | -0.495 | 0.301* | tem high3 | -0.083 |  | -0.034 |
|  | (-0.810) | (-1.320) | -2.350 |  | (-0.250) |  | (-0.390) |
| wind5 | 1.071* | 0.901* | -0.005 | precipitation1 | 0.020 |  | -0.013 |
|  | -2.200 | -2.200 | (-0.040) |  | -0.610 |  | (-1.570) |
| tires | 30.020** | 31.530*** | 4.669 | wind2 | -0.165 |  | 0.357** |
|  | -3.110 | -3.370 | -1.370 |  | (-0.320) |  | (-2.630) |
| wind | -0.004 |  | -0.002 | humidity5 | 0.023 |  | 0.006 |
|  | (-0.010) |  | (-0.020) |  | -0.570 |  | (-0.580) |
| sunshine_hour4 | -0.145 | -0.174 | 0.036 | tem high2 | 0.005 |  | -0.078 |
|  | (-1.210) | (-1.750) | -1.180 |  | -0.020 |  | (-1.340) |
| wind1 | 0.290 |  | 0.209 | pressure4 | -0.334 | -0.204* | 0.117* |
|  | -0.570 |  | -1.630 |  | (-1.550) | (-2.240) | -2.030 |
| wind6 | 0.265 |  | -0.017 | tem_low7 | 0.455 | 0.706*** | 0.183** |
|  | -0.550 |  | (-0.130) |  | -1.700 | -3.470 | -2.700 |
| wind7 | 0.506 | 0.600 | 0.249* | tem high7 | -0.353 | -0.393* | -0.077 |
|  | -1.060 | -1.640 | -1.990 |  | (-1.680) | (-2.120) | (-1.420) |
| humidity7 | 0.022 |  | -0.006 | humidity | 0.0342 |  | 0.0250* |
|  | -0.530 |  | (-0.520) |  | -0.680 |  | -2.040 |
| constant | 82.870 | 199.000* | 69.770 | AIC | 5278.4 | 5251.2 | 731.4 |
|  | -0.560 | -2.150 | -1.640 | BIC | 5420.0 | 5315.1 | 872.9 |
| N | 710 | 710 | 710 | R2 | 0.402 | 0.396 |  |
*P<0.05, **P<0.01, ***P<0.001
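The AIC and BIC values in Table 2 can be cross-checked against the number of fitted terms. The sketch below assumes the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, which the text does not state explicitly; subtracting one from the other eliminates the log-likelihood and leaves the parameter count k. The function name is hypothetical.

```python
from math import log

def implied_parameter_count(aic, bic, n_obs):
    """Solve AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL for k."""
    return (bic - aic) / (log(n_obs) - 2.0)

n_obs = 710  # N reported in Table 2

# AIC and BIC as reported in Table 2 for the two OLS fits.
k_ols = implied_parameter_count(5278.4, 5420.0, n_obs)
k_stepwise = implied_parameter_count(5251.2, 5315.1, n_obs)

print(round(k_ols), round(k_stepwise))  # prints: 31 14
```

Under these assumptions, the implied counts agree with the OLS column (30 regressors plus a constant) and with the stepwise column (13 retained regressors plus a constant).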
Several key biotope elements influence the BI: used tires with stagnant water; unused containers with stagnant water; cistern water jars, tanks, and basins with stagnant water; other water bodies with stagnant water; and bonsai aquatic plants with stagnant water. These are all habitats directly related to the breeding of Aedes aegypti larvae. The following meteorological factors also influence the BI: 20-20 hour precipitation at a lag of seven days; mean wind speed at lags of three, five, and seven days; sunshine hours at a lag of two days; mean pressure at lags of three and four days; mean humidity; maximum temperature at a lag of seven days; and mean temperature at a lag of seven days. In summary, the most significant impacts stem from used tires with stagnant water; unused containers with stagnant water; cistern water jars, tanks, and basins with stagnant water; mean temperature over seven days; and 20-20 hour precipitation at a lag of seven days.
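The lagged meteorological variables can be built from a daily series as in the following sketch. It assumes that a numeric suffix such as the 7 in precipitation7 denotes a lag in days; the helper name `lag` and the sample values are hypothetical.

```python
def lag(series, k):
    """Shift a daily series back by k days; the first k entries have no history."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return list(series)
    # Pad the start with None so the output keeps the input's length.
    return [None] * min(k, len(series)) + list(series[:-k])

# Hypothetical daily precipitation in mm (assumption: illustration only).
precipitation = [0.0, 12.5, 3.0, 0.0, 0.0, 40.2, 1.1, 0.0, 7.4, 0.0]

# Value observed seven days before each survey day.
precipitation7 = lag(precipitation, 7)
# precipitation7[7] holds the rainfall observed on day index 0.
```

Rows whose lagged value is None have no observation seven days earlier and would need to be dropped before fitting a regression.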
3.3 Three predictive models and evaluation
The study established binary classification models to predict the BI rank. The Deep Neural Network model was the most effective, with an area under the curve (AUC) of 0.96 (95% CI: 0.91-1.00). The random forest model had an AUC of 0.85 (95% CI: 0.77-0.93), the support vector machine model an AUC of 0.81 (95% CI: 0.72-0.90), and the logistic regression model an AUC of 0.82 (95% CI: 0.73-0.91). The decision tree model performed poorly, with an AUC of 0.58 (95% CI: 0.47-0.68) (Fig 3). At a significance level of 0.05, no discernible difference was observed between the Deep Neural Network model and the random forest model, whereas the support vector machine and logistic regression models each differed significantly from the decision tree model. Overall, the Deep Neural Network model achieved the best performance among all models.
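For readers who wish to reproduce this kind of comparison, the sketch below computes an AUC from predicted scores and attaches an approximate 95% confidence interval. The interval uses the Hanley-McNeil standard error, which is an assumption: the text does not say how the reported intervals were obtained. Checking whether two intervals overlap is only a rough heuristic and not a formal test of the difference between two AUCs. The scores and labels are hypothetical.

```python
from math import sqrt

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def hanley_mcneil_ci(a, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil standard error), clipped to [0, 1]."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1.0 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    se = sqrt(max(var, 0.0))
    return max(0.0, a - z * se), min(1.0, a + z * se)

def intervals_overlap(ci_a, ci_b):
    """True when two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical predicted scores and true binary BI ranks (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
a = auc(scores, labels)
low, high = hanley_mcneil_ci(a, n_pos=4, n_neg=4)

# Overlap check on intervals reported in the text.
dnn, rf, dt = (0.91, 1.00), (0.77, 0.93), (0.47, 0.68)
dnn_vs_rf = intervals_overlap(dnn, rf)
dnn_vs_dt = intervals_overlap(dnn, dt)
```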
The binary rank prediction models for the Breteau Index were all effective. The neural network and random forest models showed the best prediction performance, with precision rates of 0.865 and 0.853, respectively. The support vector machine model performed slightly worse, with a precision rate of 0.675. The BI multiclassification rank prediction models remain effective, with little variance among the methods. Among them, the neural network model outperforms the support vector machine model, with precision rates of 0.532 and 0.302, respectively. For numerical prediction of the BI, the study used a Deep Neural Network to construct a BI forecasting model (R2 = 0.6797), which is proficient and suitable for quantitative predictions of the BI (Fig 4).
Table 3. Comprehensive quality evaluation of the Breteau Index binary prediction model
| Method | Precision rate | Recall rate | F1 score |
| Random Forest | 0.853 | 0.850 | 0.807 |
| Decision Tree | 0.741 | 0.801 | 0.767 |
| Support Vector Machine | 0.721 | 0.849 | 0.780 |
| Logistic model | 0.772 | 0.817 | 0.778 |
| Deep Neural Network | 0.865 | 0.869 | 0.867 |
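The three metrics in Table 3 follow from the confusion counts of a binary classifier. The sketch below is a minimal illustration with hypothetical labels (1 for the higher BI rank, 0 for the lower one); the coding of the classes is an assumption, as the text does not define it here.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)

# Hypothetical true and predicted ranks (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)

# Harmonic mean of the Deep Neural Network precision and recall in Table 3.
dnn_f1 = round(f1_from(0.865, 0.869), 3)
```

For the Deep Neural Network row, the harmonic mean of the reported precision and recall reproduces the tabulated F1 score of 0.867. Rows where it does not, such as the random forest, would be consistent with scores averaged over classes, although the text does not say which averaging was applied.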
Table 4. Comprehensive quality evaluation of BI multiclassification prediction models
| Method | Precision rate | Recall rate | F1 score |
| Random Forest | 0.500 | 0.577 | 0.481 |
| Decision Tree | 0.513 | 0.498 | 0.505 |
| Support Vector Machine | 0.302 | 0.549 | 0.390 |
| Logistic model | 0.491 | 0.549 | 0.469 |
| Deep Neural Network | 0.532 | 0.535 | 0.529 |
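In the multiclass setting, a single precision, recall, and F1 value per model requires averaging over classes. The sketch below uses support-weighted averaging, which is an assumption: the text does not state the averaging scheme behind Table 4. The labels are hypothetical three-level BI ranks.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    pairs = list(zip(y_true, y_pred))
    out = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1, tp + fn)  # support = number of true cases
    return out

def weighted_average(metrics):
    """Average precision, recall, and F1 over classes, weighted by support."""
    total = sum(m[3] for m in metrics.values())
    return tuple(sum(m[i] * m[3] for m in metrics.values()) / total
                 for i in range(3))

# Hypothetical three-level BI ranks (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 0]
w_precision, w_recall, w_f1 = weighted_average(per_class_metrics(y_true, y_pred))
```

With class-averaged metrics, the averaged F1 is generally not the harmonic mean of the averaged precision and recall, which may explain rows in Table 4 where the three values do not satisfy that identity.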
3.4 Extrapolation tests from other cities in Fujian Province
In addition to Xiamen, we gathered pertinent data from various sources in other cities within Fujian Province to assess the generalizability of the aforementioned models. In the extrapolation outcomes for these cities, the Deep Neural Network model and the random forest model emerged as the most effective, with AUC values of 0.73 (95% CI: 0.70-1.00) and 0.78 (95% CI: 0.76-0.81), respectively; the difference between them was not statistically significant. The support vector machine model achieved an AUC of 0.73 (95% CI: 0.70-0.75), and the logistic model an AUC of 0.70 (95% CI: 0.67-0.73). The decision tree model displayed suboptimal results, with an AUC of 0.59 (95% CI: 0.56-0.62), as illustrated in Fig 5. Therefore, at a significance level of 0.05, the Deep Neural Network model and the random forest model significantly outperform the decision tree model and the logistic regression model. Furthermore, the random forest model is significantly superior to the support vector machine model, and every other model surpasses the decision tree model.
The binary extrapolation prediction models for the Breteau Index were all effective. The neural network and logistic models showed the best prediction performance, with precision rates of 0.866 and 0.855, respectively. The support vector machine model performed slightly worse, with a precision rate of 0.787 (Table 5). The BI multiclassification extrapolation prediction models also remain effective. The random forest and logistic models perform relatively better, with precision rates of 0.532 and 0.521, respectively. Surprisingly, the Deep Neural Network model does not perform best in this setting, potentially because of differences among datasets (Table 6). Moreover, the numerical prediction model (R2 = 0.26) deviates substantially from its performance on the original dataset and requires further refinement, as indicated in Fig 6.
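The R2 values quoted for the numerical prediction model can be understood through the usual coefficient of determination, sketched below. This assumes the standard definition, one minus the ratio of residual to total sum of squares; the observed and predicted values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted BI values (illustration only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
r2 = r_squared(observed, predicted)
```

Because the denominator depends on the variance of the observed values, the same model can score a lower R2 on data from other cities when its errors grow relative to the spread of the BI there.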
Table 5. Comprehensive quality evaluation of the Breteau Index binary extrapolation prediction model
| Method | Precision rate | Recall rate | F1 score |
| Random Forest | 0.837 | 0.886 | 0.840 |
| Decision Tree | 0.821 | 0.821 | 0.821 |
| Support Vector Machine | 0.787 | 0.887 | 0.834 |
| Logistic model | 0.855 | 0.857 | 0.856 |
| Deep Neural Network | 0.866 | 0.889 | 0.872 |
Table 6. Comprehensive quality evaluation of BI multiclassification extrapolation prediction models
| Method | Precision rate | Recall rate | F1 score |
| Random Forest | 0.532 | 0.519 | 0.525 |
| Decision Tree | 0.528 | 0.613 | 0.518 |
| Support Vector Machine | 0.369 | 0.608 | 0.459 |
| Logistic model | 0.521 | 0.600 | 0.545 |
| Deep Neural Network | 0.445 | 0.536 | 0.454 |