## Exploratory Factor Analysis (EFA)

Two latent factors were identified: Factor 1 (related to average mean relative humidity, average maximal relative humidity, and average maximal temperature) and Factor 2 (related to average minimal temperature). Together, the two factors accounted for 67% of the total variance (*χ*² = 18.56, df = 8, P-value = 0.017 at the *α* = 5% level of significance). This result provides sufficient evidence that the two factors can be used to explain malaria incidence in the study area.

To determine the number of factors to retain, we used the Guttman–Kaiser criterion (Guttman, 1954; Kaiser, 1960) and the Cattell scree plot (Cattell, 1966). Under the Guttman–Kaiser criterion, the number of factors equals the number of eigenvalues of the correlation matrix (Table 3) that are greater than unity. Using the correlation matrix, we computed the eigenvalues 2.80, 1.12, 0.29, 0.19, -0.04, -0.09, and -0.20; these values indicate that two factors influence malaria incidence. In the scree plot test, the number of eigenvalues that are especially large is taken to correspond to the number of factors in the analysis (Cattell, 1966).
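As a minimal sketch of the Guttman–Kaiser rule described above, the snippet below counts the eigenvalues greater than unity. It takes the eigenvalues exactly as reported in the text rather than recomputing them, and the helper name `kaiser_factor_count` is introduced here for illustration only.

```python
# Eigenvalues as reported in the text (not recomputed from the data).
eigenvalues = [2.80, 1.12, 0.29, 0.19, -0.04, -0.09, -0.20]


def kaiser_factor_count(values, threshold=1.0):
    """Count eigenvalues strictly greater than the threshold (Guttman-Kaiser rule)."""
    return sum(1 for v in values if v > threshold)


n_factors = kaiser_factor_count(eigenvalues)
print(n_factors)  # prints 2
```

Only 2.80 and 1.12 exceed unity, which matches the two factors retained in the analysis.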

Table 3

Correlation matrix of climatic variables and malaria incidence.

| Item | Mean | Std. Dev. | Average Precipitation | Average minimal temperature | Average maximal temperature | Average mean relative humidity | Average maximal relative humidity | Average Wind Speed | Average malaria incidence |
|---|---|---|---|---|---|---|---|---|---|
| Average Precipitation | 113.96 | 71.58 | 1.00 | | | | | | |
| Average minimal temperature | 21.76 | 1.40 | -0.01 | 1.00 | | | | | |
| Average maximal temperature | 33.53 | 2.25 | -0.14 | 0.16 | 1.00 | | | | |
| Average mean relative humidity | 65.43 | 13.03 | 0.12 | 0.32 | -0.76 | 1.00 | | | |
| Average maximal relative humidity | 83.38 | 11.72 | 0.06 | 0.39 | -0.67 | 0.96 | 1.00 | | |
| Average Wind Speed | 2.03 | 0.37 | -0.18 | 0.36 | -0.01 | 0.04 | 0.05 | 1.00 | |
| Average malaria incidence | 28.22 | 6.10 | -0.04 | -0.18 | -0.48 | 0.42 | 0.43 | -0.20 | 1.00 |

From the correlation matrix (Table 3), we obtained the scree plot shown in Fig. 4, which represents the relative proportion of variance accounted for by each component. In the scree plot, the eigenvalues of the first two components, both greater than unity, lie above the parallel indicator line, while those of the subsequent components, all below unity, lie beneath it. The scree plot thus confirmed that the number of latent factors is two.

Table 4 displays Pearson's cross-correlation between the ecological variables and malaria incidence at lags of 0 to 2 months. Lag0, Lag1, and Lag2 (i.e., 0, 1, and 2 months) in Table 4 indicate the lagged correlation between climate variables and the incidence of malaria in Northern Benin. At lags of 0 and 1 month, average precipitation, average mean relative humidity, and average maximum relative humidity are positively associated with the incidence of malaria, with correlations of (0.05, 0.423, 0.431) at lag 0 and (0.015, 0.554, 0.576) at lag 1. At a lag of 2 months, average mean relative humidity, average maximum relative humidity, and wind speed are positively correlated with the incidence of malaria (0.526, 0.524, and 0.038, respectively).

Table 4

Cross-correlation between climatic variables and malaria incidence

| Variables | Lag0 | Lag1 | Lag2 |
|---|---|---|---|
| Average Precipitation | 0.05 | 0.015 | -0.073 |
| Average minimal temperature | -0.183 | -0.007 | 0.156 |
| Average maximal temperature | -0.483 | -0.456 | -0.073 |
| Average mean relative humidity | 0.423 | 0.554 | 0.526 |
| Average maximal relative humidity | 0.431 | 0.576 | 0.554 |
| Average Wind Speed | -0.195 | -0.119 | 0.038 |

These results indicate that meteorological conditions at lags of 1 and 2 months favor the reproduction of mosquitoes and the completion of the extrinsic incubation period (EIP) required for them to transmit malaria to humans. In particular, at a lag of 2 months in this study area, malaria transmission is expected to be very high, and neighboring non-infected areas (districts) may become infected because of the positive association of wind speed with malaria incidence. At a lag of 1 month, precipitation and relative humidity are sufficient for mosquito breeding sites to develop and for mosquitoes to mature and infect humans. In addition, average minimum temperature, average maximum relative humidity, average mean relative humidity, and wind speed favor the mosquito development cycle and large-scale transmission of the disease.
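The lagged correlations in Table 4 can be illustrated with the sketch below, which pairs the climate value at month t - lag with the incidence at month t and computes Pearson's r. The two monthly series are hypothetical placeholders, not the study data, and the function names are assumptions introduced for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(climate, incidence, lag):
    """Correlate climate at month t - lag with incidence at month t."""
    if lag == 0:
        return pearson(climate, incidence)
    return pearson(climate[:-lag], incidence[lag:])


# Hypothetical monthly series (illustrative only, NOT the study data).
humidity = [40, 45, 60, 75, 85, 88, 80, 70, 55, 45, 42, 41]
incidence = [20, 21, 22, 27, 33, 38, 39, 36, 31, 25, 22, 21]

for lag in (0, 1, 2):
    r = lagged_correlation(humidity, incidence, lag)
    print(f"lag {lag}: r = {r:.3f}")
```

Each additional month of lag drops one observation from each series, so correlations at longer lags rest on slightly fewer data points.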

## Confirmatory Factor Analysis (CFA)

To confirm the number of factors to extract from the meteorological variables, confirmatory factor analysis (CFA) was employed. Seven observed indicators were analyzed. In the final model, four observed indicators were retained, loading on two latent factors. After the correction of the initial model (Fig. 2), the fit indices improved significantly compared with those of the initial model, and the standardized residuals were also smaller than in the previous models. To validate the CFA model, the recommended cutoff value for TLI and CFI is 0.90 and that for RMSEA is 0.06; CFI > 0.90 or RMSEA < 0.06 implies a good model (Browne, 1993). The fit indices, namely the Standardized Root Mean Square Residual (SRMR), the robust Comparative Fit Index (CFI), the robust Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA), were all within the acceptable range, indicating that the model fits the data very well (TLI = 1, CFI = 1, SRMR = 0.008, and RMSEA = 0). All the latent variables were significant, with loadings above 50%.
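The cutoff check described above can be sketched as follows. The TLI, CFI, and RMSEA cutoffs are the ones stated in the text; the SRMR cutoff of 0.08 is an assumption (a commonly used convention that the text does not state), and `check_fit` is a hypothetical helper name.

```python
def check_fit(indices, tli_min=0.90, cfi_min=0.90, rmsea_max=0.06, srmr_max=0.08):
    """Return, per index, whether the reported value meets its cutoff.

    srmr_max=0.08 is an assumed convention; it is not given in the text.
    """
    return {
        "TLI": indices["TLI"] > tli_min,
        "CFI": indices["CFI"] > cfi_min,
        "RMSEA": indices["RMSEA"] < rmsea_max,
        "SRMR": indices["SRMR"] < srmr_max,
    }


# Fit indices as reported for the final CFA model.
reported = {"TLI": 1.0, "CFI": 1.0, "RMSEA": 0.0, "SRMR": 0.008}
result = check_fit(reported)
print(result)
```

All four reported indices pass under these cutoffs.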

## Structural Equation Model (SEM)

The Henze-Zirkler test revealed the non-normality of the data set (P-value = 0). The means, standard deviations, and bivariate correlations for all variables included in the analysis are shown in the correlation matrix (Table 3). The correlation matrix revealed multicollinearity among the variables: the magnitude of the relationships shows that many of the predictors are highly correlated, especially the meteorological indicators. This is one reason for using the SEM analytical technique, given its superior handling of inter-correlated independent variables through the creation of latent constructs and direct and indirect pathways that circumvent the tendency to bias coefficient estimates (Bollen, 1989).

Figure 3 shows the model analyzed. Before examining the relationships displayed in the model, it is crucial to confirm that the fit indices fall within the recommended cutoff ranges. Inspection of the model presented in Fig. 3 gives the following fit statistics: chi-square = 348.113 (df = 10, not significant), RMSEA = 0.000, TLI = 1, CFI = 1, and SRMR = 0.008. These values show that the model fits the data well and can be used to determine the effects of ecological variables on the incidence of malaria.

## Effects of climatic variables on the incidence of malaria

The analysis of the model in Fig. 3 reveals that the direct effect of Factor 1 is 0.84 and that of Factor 2 is -0.58. The direct effects of average maximal temperature, average maximal relative humidity, average mean relative humidity, and average minimal temperature on the incidence of malaria are -0.86, 0.78, 0.71, and 1, respectively. The indirect effects of average mean relative humidity, average maximal temperature, and Factor 2 are 1.59, 1, and 0.39, respectively.

The total effects of average maximal relative humidity, average mean relative humidity, average maximal temperature, average minimal temperature, Factor 1 (F1), and Factor 2 (F2) are 0.78, 2.30, 0.14, 1, 0.84, and -0.19, respectively. Of the two factors identified by EFA, Factor 1 has the highest direct effect and the highest total effect on the incidence of malaria in the study area. We conclude that Factor 1 is the most influential hidden climatic factor in the incidence of malaria and can therefore be used to model and predict the incidence of malaria in the Northern part of Benin.
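The reported total effects are consistent with the usual decomposition, total effect = direct effect + indirect effect. The short sketch below reproduces that arithmetic from the values stated in this section; the dictionary keys are labels chosen here for illustration, and a variable with no reported indirect effect is treated as having an indirect effect of zero (an assumption).

```python
# Direct and indirect effects as reported in this section.
direct = {
    "avg_max_rel_humidity": 0.78,
    "avg_mean_rel_humidity": 0.71,
    "avg_max_temperature": -0.86,
    "avg_min_temperature": 1.0,
    "factor_1": 0.84,
    "factor_2": -0.58,
}
indirect = {
    "avg_mean_rel_humidity": 1.59,
    "avg_max_temperature": 1.0,
    "factor_2": 0.39,
}

# Total effect = direct + indirect (missing indirect effect assumed to be 0).
total = {name: round(d + indirect.get(name, 0.0), 2) for name, d in direct.items()}

for name, value in total.items():
    print(f"{name}: {value:.2f}")
```

For example, average mean relative humidity gives 0.71 + 1.59 = 2.30 and Factor 2 gives -0.58 + 0.39 = -0.19, matching the reported totals.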

## Intelligent Malaria Outbreak Warning Model

The next step is to identify the algorithm that predicts the incidence of malaria in the study area most accurately. In the previous section, Factor 1, indicated by average maximal relative humidity, average mean relative humidity, and average maximal temperature, was identified as the most influential hidden climatic factor in the incidence of malaria.

To develop the malaria outbreak warning model, we applied three different machine learning algorithms: Support Vector Machine (SVM), Linear Regression (LiR), and Negative Binomial Regression (NBiR). After training and testing the different algorithms (Fig. 5), we assessed the performance of each model in order to identify the algorithm that most accurately predicts the incidence of malaria in the Northern part of Benin (Table 5).

Table 5

Assessment of the Model Performance

| Item | MAE | MSE | RMSE | Predicted R² |
|---|---|---|---|---|
| LiR | 2.50 | 11.07 | 3.33 | 66% |
| SVM | 1.66 | 5.89 | 2.43 | 82% |
| NBiR | 22.11 | 504.84 | 47.22 | 66% |

Table 5 shows the MAE, MSE, RMSE, and predicted R² for the SVM, LiR, and NBiR models. SVM has the lowest MAE, MSE, and RMSE (1.66, 5.89, and 2.43) and the highest predicted R² (82%), so it outperforms the other two models. We conclude that the Support Vector Machine (SVM) has the best performance for predicting the incidence of malaria in Northern Benin.
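For reference, the error measures in Table 5 follow the standard definitions sketched below. This is a minimal illustration under assumptions: the observed and predicted values are made-up numbers rather than the study data, `regression_metrics` is a hypothetical helper, and R² is computed here as 1 - SS_res / SS_tot on the supplied values, which may differ from how the predicted R² in Table 5 was obtained.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": sqrt(mse),  # RMSE is the square root of MSE
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration only (NOT the study data).
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MAE, MSE, and RMSE and a higher R² indicate a better fit, which is the basis on which the three models are compared.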

## Prediction of Malaria incidence under scenarios RCP4.5 and RCP8.5

The RCA4 regional climate model used in this study was developed at the Swedish Meteorological and Hydrological Institute and has provided nearly 120 simulations in the Coordinated Regional Climate Downscaling Experiment (CORDEX). The model considers the physical, chemical, and biological processes by which ecosystems affect climate across various spatial and temporal scales. Projected rainfall, temperature, and relative humidity were retrieved from the CORDEX simulations of the Rossby Centre Regional Atmospheric model (RCA4). The CORDEX-Africa data used in this work were obtained from the Earth System Grid Federation server (https://esgf-data.dkrz.de/search/cordex-dkrz/). Using the Intelligent Malaria Outbreak Warning Model we built (the SVM model), we predicted the incidence of malaria in the Northern part of Benin from the downloaded CORDEX data under two representative concentration pathway (RCP) scenarios (RCP4.5 and RCP8.5), with RCA4 driven by HadGEM2, CSIRO, and MIROC (RCA4/HadGEM2, RCA4/CSIRO, and RCA4/MIROC), over the periods 2021–2030, 2031–2040, and 2041–2050.
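One way to summarize such projections by decade is sketched below: yearly predictions are grouped into the three study periods and averaged, then compared with a baseline. This is an illustration under assumptions, not the procedure used in the study; the yearly predictions and the baseline are placeholder numbers, and both function names are hypothetical.

```python
# The three projection periods named in the text.
PERIODS = [(2021, 2030), (2031, 2040), (2041, 2050)]


def period_means(pred_by_year, periods=PERIODS):
    """Average yearly predictions within each (start, end) period, inclusive."""
    means = {}
    for start, end in periods:
        values = [v for year, v in pred_by_year.items() if start <= year <= end]
        if values:  # skip periods with no predictions
            means[f"{start}-{end}"] = sum(values) / len(values)
    return means


def change_vs_baseline(means, baseline):
    """Difference between each period mean and a baseline incidence."""
    return {period: mean - baseline for period, mean in means.items()}


# Placeholder predictions (NOT model output): a slow linear rise from 28.0.
predicted = {year: 28.0 + 0.1 * (year - 2021) for year in range(2021, 2051)}
baseline = 28.0  # hypothetical baseline incidence

means = period_means(predicted)
for period, delta in change_vs_baseline(means, baseline).items():
    print(f"{period}: change = {delta:+.2f}")
```

A positive change indicates a projected increase relative to the baseline, and a negative change indicates a decrease.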

Under scenario RCP4.5, the regional climate models RCA4/HadGEM2, RCA4/CSIRO, and RCA4/MIROC are associated with an increase in malaria incidence over the periods 2021–2030, 2031–2040, and 2041–2050 (Fig. 6), except that RCA4/HadGEM2 is associated with a decrease in the incidence of malaria over the period 2041–2050.

Under scenario RCP8.5, the incidence of malaria is projected to decrease over the period 2021–2030 with RCA4/CSIRO and RCA4/MIROC (Fig. 6), whereas RCA4/HadGEM2 is associated with an increase in malaria incidence over the same period (Fig. 6).

Overall, the findings suggest that, owing to climate change, malaria incidence will increase over 2021–2050 under scenarios RCP4.5 and RCP8.5, except for the period 2021–2030, during which it will decrease under RCP8.5. In light of these findings, the Northern part of Benin is at high risk of malaria.