Merow et al. (2013) identified several key decisions about input data and settings that critically influence models fitted by MaxEnt. The first is the choice of background sample, which can alter the features selected by MaxEnt and hence the variability of its predictions (Elith et al. 2010, 2011). In the present study, model performance varied with the number of background points (Table 2). Five thousand background points yielded the best performance, with the highest AUC (0.918) and low training and test omission (0 and 0.15, respectively).
Table 2
Table showing comparative model performance with a variable number of background points (at a regularization multiplier of 0.5).
Sl No. | Number of background points | AUC | Minimum training presence training omission | Minimum training presence test omission |
1 | 1000 | 0.915 | 0 | 0.15 |
2 | 2000 | 0.891 | 0 | 0.30 |
3 | 3000 | 0.906 | 0 | 0.20 |
4 | 4000 | 0.900 | 0 | 0.25 |
5 | 5000 | 0.918 | 0 | 0.15 |
6 | 6000 | 0.915 | 0 | 0.20 |
7 | 7000 | 0.911 | 0 | 0.25 |
8 | 8000 | 0.888 | 0 | 0.45 |
9 | 8540 (all points) | 0.901 | 0 | 0.25 |
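The omission columns in Table 2 can be read through a minimal sketch, assuming the usual definition of the minimum training presence threshold, namely the lowest predicted suitability among the training presence points. The function names and the suitability scores below are hypothetical and are not taken from this study.

```python
def mtp_threshold(train_scores):
    """Lowest predicted suitability among the training presence points."""
    return min(train_scores)


def omission_rate(scores, threshold):
    """Fraction of presence points predicted below the threshold."""
    omitted = sum(1 for s in scores if s < threshold)
    return omitted / len(scores)


# Hypothetical suitability scores (illustrative only, not study data).
train_scores = [0.42, 0.55, 0.61, 0.73, 0.88]
test_scores = [0.30, 0.47, 0.66, 0.91]

threshold = mtp_threshold(train_scores)
train_omission = omission_rate(train_scores, threshold)
test_omission = omission_rate(test_scores, threshold)
print(threshold, train_omission, test_omission)
```

Under this definition no training point can fall below the threshold, which is consistent with the training omission of 0 in every row of the table; only the test omission discriminates between settings.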
The second key decision in MaxEnt is model regularization. MaxEnt uses regularization to select the individual features that contribute most to model fit (Phillips et al. 2006). Regularization reduces overfitting by ensuring that the empirical constraints are not fit too precisely and by penalizing the model in proportion to the magnitude of the coefficients, thereby shrinking many coefficients toward zero and removing many features from the model (Tibshirani 1996). In the present study, the regularization multiplier was varied from 0.1 to 3.0, and model performance in terms of AUC, training omission and test omission was observed (Table 3). A regularization multiplier of 0.1 gave a closely fitted model with the highest AUC (0.951) but high test omission (0.3). AUC decreased as the regularization multiplier increased from 0.1 to 3.0. A regularization multiplier of 0.5 yielded a high AUC (0.935) with low training and test omission (0 and 0.05, respectively).
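The shrinkage effect described above can be illustrated with soft-thresholding, the textbook form of an L1 (lasso-type) penalty. This is a minimal sketch and not MaxEnt's actual optimizer; the coefficients and penalty values are hypothetical.

```python
def soft_threshold(coef, penalty):
    """Shrink a coefficient toward zero by `penalty`; zero it if it is smaller."""
    if coef > penalty:
        return coef - penalty
    if coef < -penalty:
        return coef + penalty
    return 0.0


def shrink_all(coefs, penalty):
    """Apply the same L1-style shrinkage to every coefficient."""
    return [soft_threshold(c, penalty) for c in coefs]


# Hypothetical unpenalized feature coefficients (illustrative only).
coefs = [2.0, -0.8, 0.3, -0.1]

weak = shrink_all(coefs, 0.25)    # light penalty
strong = shrink_all(coefs, 1.0)   # heavy penalty

n_weak = sum(1 for c in weak if c != 0.0)
n_strong = sum(1 for c in strong if c != 0.0)
print(n_weak, n_strong)
```

With the light penalty three of the four coefficients survive, whereas the heavy penalty leaves only one, mirroring how a larger penalty yields a smoother model with fewer features.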
Table 3
Table showing the effect of the variable regularization multiplier on model performance (number of background points = 5000, number of predictor variables = 5).
Sl No. | Regularization multiplier | AUC | Minimum training presence training omission | Minimum training presence test omission |
1 | 0.1 | 0.951 | 0 | 0.3 |
2 | 0.5 | 0.935 | 0 | 0.05 |
3 | 1.0 | 0.934 | 0 | 0.05 |
4 | 1.5 | 0.934 | 0 | 0.05 |
5 | 2.0 | 0.928 | 0 | 0.05 |
6 | 2.5 | 0.926 | 0 | 0.05 |
7 | 3.0 | 0.924 | 0 | 0.05 |
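One way to read Table 3 is to rank the settings by test omission first and AUC second. The sketch below applies that rule to the tabulated values; the ranking rule itself is an assumption made for illustration, since the text does not state a formal selection criterion.

```python
# Rows of Table 3: (regularization multiplier, AUC, test omission).
table3 = [
    (0.1, 0.951, 0.30),
    (0.5, 0.935, 0.05),
    (1.0, 0.934, 0.05),
    (1.5, 0.934, 0.05),
    (2.0, 0.928, 0.05),
    (2.5, 0.926, 0.05),
    (3.0, 0.924, 0.05),
]

# Assumed rule: lowest test omission first, then highest AUC.
ranked = sorted(table3, key=lambda row: (row[2], -row[1]))
best_multiplier, best_auc, best_omission = ranked[0]
print(best_multiplier, best_auc, best_omission)
```

Under this rule the multiplier of 0.5 ranks first, while 0.1 ranks last despite having the highest AUC, because its test omission is six times higher.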
The third key decision in MaxEnt is the selection of predictors and features for model building. There are two schools of thought on predictor selection. The first, rooted in machine learning, suggests including all reasonable predictors in the model and letting the algorithm decide which ones are important. The alternative recommends removing highly correlated predictors using correlation analysis, clustering algorithms, principal components analysis or some other dimension reduction method, to reduce the high correlation among the features created by MaxEnt (Merow et al. 2013). In the present study, pairwise correlation analysis (Table 1) identified 13 highly correlated variables (r > 0.75), which were eliminated from model building. The contributions of the 13 predictor variables retained in the model and their permutation importance are presented in Table 4. The predictor variables were then narrowed down empirically to five based on model performance, permutation importance and the jackknife test importance of each variable (Table 5).
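Pairwise correlation screening of the kind described above can be sketched as a greedy filter. The version below is an illustration under stated assumptions: it uses Pearson's r, compares its absolute value against 0.75, and keeps the first variable of each correlated pair, none of which is confirmed by the text. The variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(predictors, threshold=0.75):
    """Keep a predictor only if |r| with every already-kept one is <= threshold."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictor values at six locations (illustrative only).
predictors = {
    "temp_a": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
    "temp_b": [30.1, 31.0, 32.2, 32.9, 34.1, 35.0],  # tracks temp_a closely
    "precip": [700, 830, 620, 790, 650, 760],
}

kept = drop_correlated(predictors)
print(kept)
```

The result of a greedy filter depends on the order in which variables are visited, so in practice the order would be chosen on ecological grounds rather than left arbitrary.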
Table 4
Contribution (%) and permutation importance (%) of predictor variables in model building (regularization multiplier 0.5; background points 5000).
Code | Environmental Variable | Unit | Percent Contribution | Permutation importance |
Bio1 | Annual Mean Temperature | °C | 11.8 | 62.6 |
Bio3 | Isothermality | - | 2.0 | 0.2 |
Bio4 | Temperature Seasonality (Standard Deviation) | °C | 2.6 | 4.4 |
Bio7 | Temperature Annual Range | °C | 0.4 | 0.5 |
Bio12 | Annual Precipitation | mm | 9.6 | 3.1 |
Bio14 | Precipitation of Driest Month | mm | 0 | 0.2 |
Bio15 | Precipitation Seasonality | cv | 0.4 | 0 |
Bio18 | Precipitation of Warmest Quarter | mm | 2.4 | 3.8 |
Bio19 | Precipitation of Coldest Quarter | mm | 1.0 | 1.1 |
Ele | Elevation | m | 57.4 | 7.9 |
BioR | Biological Richness | | 1.3 | 3.2 |
Dist | Disturbance Index | | 10.8 | 12.9 |
Veg | Vegetation | | 0.3 | 0.2 |
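The two importance measures in Table 4 do not rank the variables identically. The short sketch below sorts the tabulated values by each measure; it is only a reading aid, as the final selection of predictors in the study also drew on model performance and the jackknife test.

```python
# Rows of Table 4: (code, percent contribution, permutation importance).
table4 = [
    ("Bio1", 11.8, 62.6), ("Bio3", 2.0, 0.2), ("Bio4", 2.6, 4.4),
    ("Bio7", 0.4, 0.5), ("Bio12", 9.6, 3.1), ("Bio14", 0.0, 0.2),
    ("Bio15", 0.4, 0.0), ("Bio18", 2.4, 3.8), ("Bio19", 1.0, 1.1),
    ("Ele", 57.4, 7.9), ("BioR", 1.3, 3.2), ("Dist", 10.8, 12.9),
    ("Veg", 0.3, 0.2),
]

# Top five variables by percent contribution.
by_contribution = sorted(table4, key=lambda r: r[1], reverse=True)
top_contribution = [code for code, _, _ in by_contribution[:5]]

# Top five variables by permutation importance.
by_permutation = sorted(table4, key=lambda r: r[2], reverse=True)
top_permutation = [code for code, _, _ in by_permutation[:5]]

print(top_contribution)
print(top_permutation)
```

Four variables (Bio1, Ele, Dist and Bio4) appear in both top-five lists; Bio12 enters only by percent contribution and Bio18 only by permutation importance.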
Table 5
Table showing comparative model performance with a variable number of predictor variables (regularization multiplier 0.5; number of background points 5000).
Sl No. | Number of variables | AUC | Minimum training presence training omission | Minimum training presence test omission |
1 | 13 | 0.918 | 0 | 0.15 |
2 | 7 | 0.931 | 0 | 0.10 |
3 | 6 | 0.935 | 0 | 0.05 |
4 | 5 | 0.935 | 0 | 0.05 |
5 | 4 | 0.930 | 0 | 0.05 |
The occurrence of S. alternifolium in the southern parts of the Eastern Ghats can be attributed to a range of environmental factors, but five contributed the most to model building: annual mean temperature, annual precipitation, elevation, disturbance index and temperature seasonality (Table 4). In the jackknife test (Fig. 1), the model achieved the highest gain when annual mean temperature was used in isolation; this variable therefore appears to carry the most useful information by itself. The annual mean temperature at species presence locations ranged from 23.6 to 26.6°C, with a mean of 25.2°C (Table 6). Although the frequency of species occurrence was highest in the 24–26°C range (Fig. 2), the predicted habitat suitability for S. alternifolium declined sharply beyond 24.0°C (Fig. 3). Likewise, annual precipitation decreased the gain the most when omitted from model building (Fig. 1) and therefore appears to carry the most information that is not present in the other variables. The annual precipitation at species presence locations ranged from 619 to 829 mm, with a mean of 739.8 mm (Table 6). Although the frequency of species occurrence was highest in the 800–900 mm annual precipitation range (Fig. 4), the predicted habitat suitability declined sharply beyond 800 mm (Fig. 5). Furthermore, the elevation at species presence locations ranged from 452 to 926 m above mean sea level, with a mean of 668.9 m (Table 6). Although the frequency of species occurrence was highest in the 600–700 m elevation range (Fig. 6), indicating the species' preference for mid elevations, the predicted habitat suitability was high in the 450–650 m range (Fig. 7). The disturbance index is another variable that contributed strongly to model building.
The disturbance index at the species presence locations varied from low (11–18) to high (25–28) (Table 6), with most locations falling under the high disturbance regime (Fig. 8). The predicted habitat suitability declined sharply beyond a disturbance index of 28 (Fig. 9). Interestingly, temperature seasonality, calculated as the standard deviation of monthly temperature averages, varied little across the sample locations (Table 6), indicating low temperature variability over the year. The predicted habitat suitability was high for areas with a temperature standard deviation between 3.0 and 3.4°C (Fig. 10).
Table 6
Summary statistics at the species occurrence points for the top five predictor variables used in model building.
Sl. No. | Predictor variable | Minimum | Maximum | Mean | Mean ± SD |
1 | Annual mean temperature (°C) | 23.6 | 26.6 | 25.2 | 25.2 ± 0.949 |
2 | Annual precipitation (mm) | 619 | 829 | 739.8 | 739.8 ± 75.167 |
3 | Elevation (m) | 452 | 926 | 668.9 | 668.9 ± 138.8 |
4 | Disturbance index | 12 | 28 | 22.5 | 22.5 ± 5.3 |
5 | Temperature seasonality (Standard deviation) (°C) | 2.9 | 3.2 | 3.0 | 3.0 ± 0.079 |
The potential habitat suitability map generated by the MaxEnt model for Syzygium alternifolium (Fig. 11) overlapped the forested Seshachalam, Veligonda and Lankamalla hill ranges of the Eastern Ghats. The model predicted 95% of the study area to be least suitable for S. alternifolium and approximately 5% (approximately 21.0 square km) to be moderately to highly suitable for the species. The habitats identified in the Seshachalam hills are potentially more suitable than those in the Veligonda and Lankamalla hill ranges. In the Seshachalam hills, the species is distributed in the Balapally and Chamala forest ranges and extends north-westwards into the Kodur, Rajampet, Sanipaya and Rayachoti forest ranges. Areas under the Balapally and Chamala forest ranges of the Tirupati forest division are potentially highly suitable for S. alternifolium and presumably hold the core population of the species. In contrast, the potential habitats identified in the Veligonda hills, comprising the Chitvel, Venkatagiri, Rapur, Badvel and Atmakur forest ranges, form a narrow strip along the hill tops of these ranges. Furthermore, the potentially suitable habitats identified in the Lankamalla hills include parts of the Sidhout, Badvel and Produttur forest ranges. The MaxEnt model also identified two novel habitat patches where the species has not been reported but habitat conditions are predicted to be suitable for S. alternifolium: the first comprises parts of the Rayachoti and Vempally forest ranges, and the second comprises parts of the Porumamilla, Rudravaram and Giddalur forest ranges.