3.1. Effective factors collinearity test
In general, testing the parameters for multi-collinearity is crucial for producing a habitat suitability map74. Multi-collinearity denotes the existence of a linear relationship between parameters. In this study, the “tolerance (TOL)” and “variance inflation factor (VIF)” indices were employed to check for multi-collinearity; a TOL below 0.1 or a VIF above 5 (or 10, depending on the threshold adopted) suggests multi-collinearity between independent variables75,76. Table 1 displays the results of the multi-collinearity analysis performed on the 13 habitat suitability parameters employed in this study: elevation, slope, aspect, plan curvature, silt%, clay%, sand%, distance from rivers, distance from roads, EC, pH, mean annual rainfall, and mean annual temperature. All TOL values exceed 0.1 and all VIF values are below 5, so no multi-collinearity was detected. As a consequence, all three models could use these factors to create the final habitat suitability maps.
Table 1
Collinearity Test of Effective Factors
Factors | Tolerance | VIF |
Elevation (m) | 0.23 | 4.35 |
Distance from rivers (m) | 0.78 | 1.39 |
Distance from roads (m) | 0.37 | 2.68 |
% Sand | 0.25 | 4.97 |
% Silt | 0.28 | 4.96 |
Slope degree | 0.80 | 1.25 |
Mean annual temperature (°C) | 0.20 | 4.98 |
% Clay | 0.19 | 4.99 |
Plan curvature (100/m) | 0.93 | 1.07 |
EC (dS/m) | 0.72 | 1.38 |
pH | 0.80 | 1.24 |
Mean annual rainfall (mm) | 0.45 | 2.22 |
Aspect | 0.95 | 1.05 |
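The TOL and VIF indices can be illustrated with a minimal sketch. It assumes the standard definitions (TOL_j = 1 − R²_j, where R²_j comes from regressing factor j on all the other factors, and VIF_j = 1/TOL_j); the function name `tolerance_and_vif` and the synthetic data are hypothetical and are not the software or data used in this study.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (TOL, VIF) arrays for the columns of a 2-D predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress factor j on an intercept plus all remaining factors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        tol[j] = ss_res / ss_tot  # TOL = 1 - R^2
    return tol, 1.0 / tol

# Synthetic demonstration only: column c is nearly a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
tol, vif = tolerance_and_vif(np.column_stack([a, b, c]))

# Flag factors using the thresholds quoted in the text (TOL < 0.1 or VIF > 5).
flagged = (tol < 0.1) | (vif > 5)
```

In this synthetic case the two nearly collinear columns are flagged and the independent one is not.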
3.2. Algorithms For Determining Habitat Suitability
3.2.1. Aceria alhagi
Three models (RF, BRT, and SVM) were used to create the habitat suitability map of Aceria alhagi in Fars Province (Fig. 5). Figure 5A shows the output of the RF model, based on Aceria alhagi presence and non-presence locations and the 13 influencing factors. The output indicates that Aceria alhagi prefers the northern portion of Fars Province, which is characterized by a mean annual temperature of 3.82–7.70°C and an elevation of 1740–3491 m. In other words, the region with the best habitat for Aceria alhagi also has the coldest climate. The final output of each model was separated into four classes; for the RF model these were low (30.40%), moderate (30.09%), high (21.86%), and very high (17.06%) (Fig. 6A).
The BRT model was also applied to assess the habitat suitability of Aceria alhagi. Its final output is essentially identical to that of RF (Fig. 5B). According to the BRT model, the southern part of the province is less suitable for Aceria alhagi, with only a few isolated areas of ideal habitat in the southwest. The low, moderate, high, and very high suitability classes account for 35.63%, 30.02%, 21.28%, and 13.07%, respectively (Fig. 6B).
The third model used to assess the habitat suitability of Aceria alhagi was SVM. Figure 5C displays the SVM results in four classes. In this model, the low, moderate, high, and very high classes are distributed fairly evenly across the various areas of Fars Province. The classes account for 36.63% (low), 24.65% (moderate), 17.94% (high), and 20.78% (very high) (Fig. 6C).
3.2.2. Alhagi maurorum
The same three models (RF, BRT, and SVM) were also employed to produce the habitat suitability maps of Alhagi maurorum (Fig. 5). The RF results indicate that the southeast, southwest, northeast, and northwest parts of Fars Province are the species' preferred habitats (Fig. 5D). The final RF map was separated into four classes: low 25.19%, moderate 33.58%, high 28.70%, and very high 14.56% (Fig. 6D).
The BRT model was likewise used to assess the habitat suitability of Alhagi maurorum, and its results are presented in four classes (Fig. 5E). The areas of Fars Province that BRT identifies as highly suitable for Alhagi maurorum are essentially identical to those of the RF model, with the exception of a portion of the northern areas. The BRT classes account for 30.11% (low), 27.46% (moderate), 27.87% (high), and 14.56% (very high) (Fig. 6E).
The SVM results are presented in Fig. 5F. The areas suitable for Alhagi maurorum in Fars Province according to the SVM model are comparable to those of the BRT model, except that the northeastern areas do not exhibit high habitat suitability. This output was likewise split into four classes, with the low, moderate, high, and very high classes accounting for 24.05%, 27.90%, 26.43%, and 21.62%, respectively (Fig. 6F).
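The class shares reported above can be thought of as the percentage of map pixels falling into each suitability class. The sketch below shows one way such shares could be computed; the break values, the tiny demonstration raster, and the function name `class_shares` are all hypothetical, since this section does not state which classification scheme was used to split the model outputs.

```python
import numpy as np

CLASS_NAMES = ["low", "moderate", "high", "very high"]

def class_shares(suitability, breaks):
    """Percent of valid pixels per class; `breaks` holds 3 ascending thresholds."""
    values = np.asarray(suitability, dtype=float).ravel()
    values = values[~np.isnan(values)]      # drop no-data pixels
    idx = np.digitize(values, breaks)       # class index 0..3 for each pixel
    counts = np.bincount(idx, minlength=len(CLASS_NAMES))
    return dict(zip(CLASS_NAMES, 100.0 * counts / counts.sum()))

# Hypothetical 2 x 3 suitability raster with one no-data cell.
demo = np.array([[0.10, 0.30, 0.55],
                 [0.80, np.nan, 0.95]])

# Assumed equal-interval breaks; the study's actual breaks are not given here.
shares = class_shares(demo, breaks=[0.25, 0.50, 0.75])
```

A useful sanity check on any such classification is that the four shares sum to 100%.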
Researchers have recently turned to habitat suitability modeling as a reliable and practical method for managing the habitats of various pests, insects, and plant species77. One study compared the effectiveness of seven data analysis strategies for forecasting the spread of chinaberry (Melia azedarach L.), using three standard metrics for evaluating model accuracy. The RF model offered the highest accuracy in building a climate niche model because of its considerable robustness and stability. According to the RF forecasts, M. azedarach would benefit from future climate change by expanding its range, which tends to shift northward and westward78. A different investigation showed that RF and BRT performed better overall than decision trees, MaxLike, and Lasso79. Variable importance and complex variations in response to resolution play a key role in how well models work (REF). Wunderlich et al.79 strongly recommend RF and BRT, while noting that they may require bias correction techniques, and encourage researchers to routinely investigate a variety of algorithms, parameters, and frequencies.
3.3. General Discussion
In one comparison, all "machine learning and regression" models except Bioclim produced accurate predictions. "Random Forest (RF)" outperformed the other investigated models with 99% AUC and 93% TSS, followed in decreasing order by "Boosted Regression Trees (BRT)", "ensemble", "Generalized Additive Model (GAM)", "Support Vector Machine (SVM)", and "Generalized Linear Model (GLM)"76. Our findings also showed that the RF and BRT models are better able to simulate the distribution of Aceria alhagi and Alhagi maurorum. Additionally, for remote sensing-based SDMs of invasive species, machine learning techniques such as the RF and BRT algorithms are crucial. Similarly, other studies have shown that BRT, Maxent, MLP, RF, and SVM perform excellently, with RF being the best at predicting the distribution of Bombus formosellus80.
Sintayehu et al.81, who used a variety of algorithms including RF, BRT, SVM, and GLM, anticipate that Prosopis juliflora will spread to more regions in Ethiopia. They stressed that P. juliflora is expected to spread rapidly to numerous drylands in Ethiopia, including major areas in "Afar", "Oromia", "Southern", "Dire Dawa", "Somalia", "Amhara", "Tigray", and "Gambella". This will reduce agricultural output and pose a danger to the region's biodiversity. The invasive species' ongoing range expansion has already had a negative impact on ecosystem services, the economy, and biodiversity. Many pastoralists across the world, in particular, rely on natural resources and natural ecosystems for their livelihoods82. Coordinated and extensive action is needed, given the existing situation and the probable future increases in the range and abundance of invasive species worldwide. The findings of this study will also aid in the early detection and control of invasive species in potentially suitable habitats. Based on our research, we recommend cooperation among stakeholders, research institutes, and authorities on early detection and eradication efforts at the national level, in order to develop and apply comprehensive biological management of Alhagi maurorum by Aceria alhagi that would minimize its adverse effects by reducing camelthorn's size and seed production17. Although Alhagi maurorum is native to Iran, the research findings are extremely helpful for areas where this species is invasive. According to another study, the RF model outperformed other methods and is useful for mapping the proportional cover of species in agro-climatic settings like those of the Afar Region (previously designated Region 2, a regional state in northeastern Ethiopia and the home of the Afar people).
The GLM, the GBM-BRT, and the DNN performed poorly in terms of specificity, precision, kappa, and AUC, although the GBM and the SVM predicted outcomes only slightly less accurately83. However, the performance of MLTs may change if a substantially greater quantity of data (i.e., the response variable) is used, if training data are lacking, or if the research is carried out in a different agroecological environment (REF).
Mudereri et al. (2020)84 used RF, CART, SVM, BRT, GLM, and FDA to predict the likelihood of Striga (Striga asiatica) incidence in Zimbabwe from multi-source bioclimatic and remotely sensed data. They determined that RF, CART, SVM, and the wide range of communication processes yield the most accurate Striga incidence predictions in Zimbabwe. Additionally, several SVM kernels were used to generate GPMs with satisfactory performance; their performance, however, lags behind that of RF. To create a habitat suitability model, Pourghasemi and Rahmati (2018)85 used a variety of models, including the "generalized linear model (GLM)", "generalized additive model (GAM)", "classification and regression trees (CART)", "boosted regression trees (BRT)", "multivariate adaptive regression spline (MARS)", "random forests (RF)", "support vector machines (SVM)", "artificial neural networks (ANN)", "maximum entropy (Maxent)", "penalized maximum likelihood GLM (GLMNET)", "domain", and "radial basis function network (RBF)". Their distribution model identified basins as having the highest likelihood of harboring invasive Fallopia species. The Southern Slovak Basin and the Košice Basin have the greatest potential for the propagation of this species.
3.4. Choosing The Optimal Algorithm
As mentioned in the preceding section, three algorithms (RF, BRT, and SVM) were applied in this work to predict the habitat suitability of Aceria alhagi. The algorithms were assessed using ROC-AUC. For the Aceria alhagi habitat suitability map, RF was the most accurate (AUC = 0.890), followed by BRT (0.816) and SVM (0.790) (Fig. 7 and Table 2). In other words, the SVM model had good accuracy, whereas the RF and BRT models had very good accuracy.
Table 2
Evaluating algorithms and selecting the best algorithm for Aceria alhagi based on the AUC
Test Result Variable(s) | Area | Std. Error | Asymptotic Sig. | Asymptotic 95% CI Lower Bound | Asymptotic 95% CI Upper Bound |
BRT | 0.816 | 0.053 | 0.000 | 0.712 | 0.921 |
RF | 0.890 | 0.044 | 0.000 | 0.803 | 0.977 |
SVM | 0.790 | 0.059 | 0.000 | 0.674 | 0.906 |
A habitat suitability map of Alhagi maurorum was also created using the machine learning techniques, and the outcomes were quite similar. The ROC curve and area under the curve (AUC) results show that the RF, BRT, and SVM algorithms have AUC values of 0.894, 0.800, and 0.733, respectively (Fig. 7 and Table 3). As a result, the RF and BRT models had very good accuracy, while the SVM model had good accuracy. In general, a key step in the modeling process is the assessment of the estimated outcomes86. As a standard procedure, the ROC curve is used to evaluate the accuracy of diagnostic tests87. AUC values for the ROC technique range from 0.5 to 1.088. If the constructed model cannot predict the presence of a species any better than chance, the AUC equals 0.5; an AUC of 1, in contrast, indicates an ideal prediction65. When the habitat suitability models are trained, the AUC takes the species pixels into account89: with the species locations used in the training phase, it assesses how well the habitat suitability maps fit. To calculate accuracy in the validation stage, however, we employed species locations that were not used in the training stage90. In recent years, ROC-AUC has been widely employed to assess habitat suitability maps91.
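To make the two reference values concrete, the following minimal sketch computes the AUC through its rank interpretation: the probability that a randomly chosen presence location receives a higher suitability score than a randomly chosen non-presence location, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical; this is not the software used to produce Tables 2 and 3.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as P(score of a presence > score of an absence), ties counted as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]     # scores at presence locations
    neg = scores[~labels]    # scores at non-presence locations
    diff = pos[:, None] - neg[None, :]   # all presence/absence pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Toy validation set: 3 presence and 3 non-presence locations.
labels = [1, 1, 1, 0, 0, 0]

# Every presence outranks every absence: the ideal case.
perfect = roc_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])

# Identical scores everywhere: no better than chance.
uninformative = roc_auc(labels, [0.5] * 6)
```

The first call returns 1.0 and the second returns 0.5, matching the ideal and chance-level cases described above.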
Table 3
Evaluating algorithms and selecting the best algorithm for Alhagi maurorum based on the AUC
Test Result Variable(s) | Area | Std. Error | Asymptotic Sig. | Asymptotic 95% CI Lower Bound | Asymptotic 95% CI Upper Bound |
BRT | 0.800 | 0.043 | 0.000 | 0.716 | 0.884 |
RF | 0.894 | 0.031 | 0.000 | 0.834 | 0.955 |
SVM | 0.733 | 0.048 | 0.000 | 0.640 | 0.826 |
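The confidence bounds in Tables 2 and 3 can be related to the reported AUC and standard error. The sketch below assumes the bounds are normal-approximation intervals, AUC ± 1.96 × SE; that formula is an assumption rather than something stated in this section, and the helper name `wald_interval` is hypothetical. The tabulated values are copied from Tables 2 and 3.

```python
def wald_interval(auc, se, z=1.96):
    """Normal-approximation 95% interval: estimate +/- z * standard error."""
    return auc - z * se, auc + z * se

# (AUC, Std. Error, reported lower bound, reported upper bound)
table_2 = {  # Aceria alhagi
    "BRT": (0.816, 0.053, 0.712, 0.921),
    "RF":  (0.890, 0.044, 0.803, 0.977),
    "SVM": (0.790, 0.059, 0.674, 0.906),
}
table_3 = {  # Alhagi maurorum
    "BRT": (0.800, 0.043, 0.716, 0.884),
    "RF":  (0.894, 0.031, 0.834, 0.955),
    "SVM": (0.733, 0.048, 0.640, 0.826),
}

# Largest absolute difference between recomputed and reported bounds.
max_gap = 0.0
for table in (table_2, table_3):
    for auc, se, low, high in table.values():
        lo_calc, hi_calc = wald_interval(auc, se)
        max_gap = max(max_gap, abs(lo_calc - low), abs(hi_calc - high))
```

Under this assumption the recomputed bounds agree with the tabulated ones to within about 0.001, which is what rounding of the AUC and standard error to three decimals would produce. All lower bounds lie above 0.5, in line with the reported significance values.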
3.5. Importance Of Factors By PLS
The distributions of Alhagi maurorum and Aceria alhagi are threshold-dependent processes influenced by a wide array of parameters92. Therefore, to conduct a habitat suitability evaluation, it is necessary to determine which parameters are effective for Aceria alhagi and Alhagi maurorum, as well as their relative significance among the conditioning factors69. Applying the PLS approach after training data selection gave a better understanding of the impact that each influencing factor has on the overall evaluation of habitat suitability. Figures 8A and 8B show the 13 variables of the Aceria alhagi and Alhagi maurorum habitat suitability models in order of significance93,94. The findings show that distance from roads, slope, clay, and temperature, in that order, are the most important variables for Aceria alhagi, whereas plan curvature, aspect, rainfall, and elevation are the least important (Fig. 8A).
The PLS algorithm was also used to examine the parameters that were important in the Alhagi maurorum habitat suitability modeling process. The findings revealed that the three most important variables were distance from roads, slope, and EC. On the other hand, the suitability of the Alhagi maurorum habitat was not significantly affected by rainfall, silt, or aspect (Fig. 8B).
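One common way to rank predictors with PLS is the variable importance in projection (VIP) score. This section does not state which PLS importance measure was used, so the sketch below is an assumption: a single-response PLS fitted with the NIPALS algorithm, followed by VIP scores. The function name `pls1_vip` and the synthetic data are hypothetical and are not the study's factors or results.

```python
import numpy as np

def pls1_vip(X, y, n_components=2):
    """VIP scores from a single-response PLS (NIPALS) fit on standardized X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    yr = y - y.mean()                           # center the response
    p = Xr.shape[1]
    W = np.zeros((p, n_components))             # unit-norm weight vectors
    ss = np.zeros(n_components)                 # response variance explained
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                              # component scores
        tt = t @ t
        load = Xr.T @ t / tt                    # X loadings
        q = (yr @ t) / tt                       # y loading
        Xr = Xr - np.outer(t, load)             # deflate X
        yr = yr - q * t                         # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(p * (W ** 2 @ ss) / ss.sum())

# Synthetic demonstration: the response depends mostly on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * rng.normal(size=300)
vip = pls1_vip(X, y)
ranking = np.argsort(vip)[::-1]                 # most important first
```

Because each weight vector has unit length, the squared VIP scores average to 1, so a score above 1 is often read as above-average importance.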
In the current study, the abundance of Aceria alhagi and Alhagi maurorum was substantially greater close to roads. This is consistent with the findings of Delgado et al. (2017)95, who discovered early indications of relatively high Aceria alhagi abundance near roads and in the area of road underpasses. This outcome was connected to the vegetation around the road and the presence of ticks. Another study established that tick abundance was higher close to road edges than farther away96. Along remote road edges with little traffic, adult ticks were seen acting aggressively. Ticks may have a better chance of finding hosts if they spend a lot of time on roadside vegetation. Our findings also suggest that roads may contribute to an increase in tick development and transmission. Because roads act as a barrier to tick movement, Hornok et al. (2017)97 show that roads may influence differences in tick species composition and tick-borne pathogen frequency between their two sides. Slope is a significant factor in determining whether a given tick habitat is suitable. One study indicated that younger ticks are sparser on lower slopes, while older ticks are more numerous on higher slopes98.
One of the main factors contributing to the degradation of plant ecosystems is human disturbance. In this investigation, too, the abundance of Alhagi maurorum increased as the distance from roads decreased. According to Jahantigh and Pessarakl (2021)99, the distribution of Alhagi maurorum expanded as the distance from a road decreased. Furthermore, it is crucial to consider how slope affects the distribution of Alhagi maurorum. Water runoff and the spread of invasive plant seeds are both influenced by the slope of the land100. Accordingly, this study also found that Alhagi maurorum is more abundant on low slopes.
3.6. The Perspective Of HSMs And MLTs
Bijani et al. (2021)17 showed that Aceria alhagi can act as a potential biological control agent by preventing the growth and development of Alhagi maurorum. The main goal of this research was to find a way to extend this control of Alhagi maurorum so that it can be applied, even where the plant only threatens to appear, in areas where it is known to be an invasive species. Habitat suitability models (HSMs) and species distribution models (SDMs) are now often used in managing invasive plants101,102. Applying habitat suitability modeling to this biological control problem was therefore the novel approach taken here.
By identifying the environmental factors limiting a species' distribution, HSMs seek to define the "envelope" that best captures the species' geographic range boundaries103. They are created by relating the distributions of extant species to their current environments. By extrapolating these relationships to given environmental change scenarios, species' future geographical ranges are projected104. Measures of climate (such as temperature and rainfall), landscape structure (such as connectivity indices), vegetation heterogeneity (such as ecotone cover), resources (such as insect availability), soil characteristics (such as physical and chemical properties), topography (such as elevation, slope, and aspect), and biotic information are frequently used as variables for habitat suitability modeling of plants105.
Environmental variables can exert direct or indirect effects on species and are optimally chosen to reflect the three main types of influences on the species: (1) limiting factors, defined as factors controlling species’ eco-physiology (e.g., minimum winter temperature or high summer temperatures) or appearance (e.g., competition and facilitation); (2) disturbances, defined as all types of perturbations affecting environmental systems (e.g., fire frequency); and (3) resources, defined as all materials that can be assimilated by organisms (e.g., availability of seeds or insects). The environmental data related to these three main types of influence depict the environmental niche of the species103,106. The ecological niche is often multidimensional, and different aspects may be significant at different geographical scales. These scale-dependent interactions between niche traits and plant species distributions frequently produce hierarchical structures in the patterns of habitat use107.
SDMs are very important, yet the field of computer science has paid them very little attention. Although mapping habitat suitability using HSMs is our main objective, we also hope to do two things with this effort: first, provide computer scientists with the knowledge they need to understand the SDM literature and, second, create ML-based SDM algorithms that benefit the environment. These characteristics could be extremely useful in ecology and agriculture, with potential future uses in plant management and conservation. The method may be used, for instance, to model distribution changes brought on by climate change. Additionally, it represents a novel strategy relative to the many models mentioned in the literature.
Machine learning techniques have recently been developed specifically for SDMs108. Numerous studies attest to the remarkable accuracy of algorithmically generated habitat suitability maps109–111. From our perspective, the main issue with the majority of these comparisons is that they validate model performance (defined as the match between predicted and observed species distributions) only under current conditions, and most models are reasonably accurate when projecting distributions under present environmental conditions. However, highly diverse model structures may underlie what appear to be minor variations in estimates of present distributions, leading to unsettlingly divergent projections for novel conditions.
The overall conclusion is that habitat suitability maps can support the biological control of Alhagi maurorum in both its native and invasive ranges. The findings of this study add to those of Bijani et al. (2021)17, who found that Alhagi maurorum was controlled by Aceria alhagi. Now that we have created maps of habitat suitability, we can extend the reach of this biological control. By destroying the inflorescences and branches of Alhagi maurorum, Aceria alhagi can stop its growth. By taking into account the habitat suitability maps of both Alhagi maurorum and Aceria alhagi, the aim of controlling Alhagi maurorum can be accomplished considerably more successfully. Maps that show the favorable and vulnerable locations of the Alhagi maurorum habitat can be used to direct Aceria alhagi toward those locations. Since Aceria alhagi can control Alhagi maurorum, Alhagi maurorum is expected to be better controlled in regions with more Aceria alhagi habitat. The Aceria alhagi habitat suitability map also indicates that, by taking the crucial factors into account, the range of Aceria alhagi may be expanded in order to control Alhagi maurorum. For instance, in this study, roads, slope, clay, and temperature were the most significant factors; thus, Aceria alhagi may be established by taking these factors into account. Furthermore, this approach could be explored extensively in regions where Alhagi maurorum is regarded as an invasive species. In other words, mapping the habitats of Alhagi maurorum and Aceria alhagi identifies the regions that need to be managed.