Species distribution modeling and assessment of environmental drivers responsible for distribution and preferred niche of critically endangered and endemic ornamental freshwater fish species of the genus Sahyadria

India has different bioclimatic zones and supports diverse aquatic habitats rich in biodiversity. For effective conservation of an endangered species in its habitat, it is essential to know the distribution of the species across its environmental range, and species distribution models are efficient and innovative tools for this purpose. The present study used the MaxEnt modeling technique to develop probability distribution models of the Denison barbs of the genus Sahyadria (Sahyadria denisonii Day 1865 and Sahyadria chalakkudiensis (Menon et al., Rec Zool Surv India 97:61–63, 1999)) by analyzing their known occurrence records in relation to environmental variables that incorporate seasonal and temporal variability. The AUC of the models for both Sahyadria species indicated good fit. Both species were found to be sensitive to “solar radiation,” “temperature seasonality,” and “temperature annual range,” which were assessed as significant predictors. The sensitivity of both species to these environmental variables was found to correlate with their breeding and spawning seasons. “Precipitation” was determined to be one of the significant climatic envelopes influencing the distribution of the species through its association with river flow. The models placed S. denisonii in areas of higher precipitation than S. chalakkudiensis. The probability distribution model for both species indicates a lineage barrier at the Palghat Gap, supporting earlier studies. At the latitudinal scale, prediction of suitable ecological habitat provides detailed insight into the distribution of all genetic lineages of the genus Sahyadria. The findings of this study can assist in determining ecological niches for endangered species of other areas and may aid field surveys as well as the development of conservation plans.


Introduction
The loss of biodiversity caused by anthropogenic and climatic changes is a major vulnerability affecting the aquatic ecosystems of India (Dudgeon et al. 2006; Sarkar et al. 2021). The freshwater fish fauna of the Western Ghats in India is one of the richest and most unique in the tropical world. This region lists 288 freshwater fishes, of which 118 (41%) are endemic to the Western Ghats (Dahanukar et al. 2004; Dayal et al. 2014). Among the listed endemic species, the genus Sahyadria is represented by two species, Sahyadria denisonii (Day 1865) and Sahyadria chalakkudiensis (Menon et al. 1999), and both are commercially important ornamental freshwater species endemic to the Western Ghats. These two species look alike, but there are six evolutionary distinct lineages between these species (John et al. 2013b). The populations of both species of the genus Sahyadria are declining because of exploitation for export (Raghavan et al. 2013), restricted range, and deterioration of their habitats (Ali et al. 2015; Raghavan et al. 2018). Both species are now listed under the endangered category according to the IUCN Red List Criteria (Version 2020-21.). Focused research on their conservation biology and immediate attention are needed for effective conservation and restoration plans. Both species are benthopelagic and herbivorous; S. denisonii inhabits fast-flowing hill streams, and S. chalakkudiensis inhabits the upper reaches of rivers with overgrowing plants (Raghavan et al. 2013). The general distribution of S. denisonii covers Mundakayam, the Travancore hill ranges, Aralam Wildlife Sanctuary, and the Kannur district of Kerala. S. chalakkudiensis is distributed only in the river Chalakudi, Kerala (Ponniah and Gopalakrishnan 2000).
Responsible Editor: Philippe Garrigues
Global anthropogenic climate change is known to affect inland aquatic ecosystems and life history events such as distribution, habitat utilization, and breeding phenology, ultimately degrading existing natural fish populations (Sarkar et al. 2019). It has implications for both biodiversity conservation and fisheries exploitation. To develop proper conservation measures for a target species, spatial and temporal data on its distribution and habitat are needed for assessing and predicting its status. Limited research has been done at the global level on predicting inland fish distribution, and in India focused research in this direction has not been undertaken so far. Worldwide, several methods and models have been used to predict fish distribution from ecological and evolutionary perspectives.
Predicting fish distribution and changes in land use patterns is essential for developing plans for the conservation of biodiversity. Information on the geographic distribution of species has been documented using predictive models, which is important for a variety of applications in ecology and conservation (Graham et al. 2004). Species distribution models (SDMs) are efficient tools for predicting the geographic and environmental range of a species, typically incorporating seasonal and temporal variability in both. SDMs estimate the relationship between species recorded at sites and the environmental and/or spatial characteristics of those sites (Franklin 2010). SDMs are widely used for stream bioassessment, estimating changes in habitat suitability, and identifying conservation priorities. In the last two decades, many developments have been made in the field of SDM, and multiple methods are available (Austin 2007; Elith et al. 2006), many of which have been used to predict the distribution of fish by predicting suitable habitat (Buisson et al. 2008; McNyset 2005; Oakes et al. 2005). Moreover, a comparison of presence-only modeling techniques (Elith et al. 2006) highlights that some new methods have better predictive accuracy than the established methods, and among the new methods, Maximum Entropy Species Distribution Modeling (MaxEnt) (Elith et al. 2006; Phillips et al. 2004) performs better in optimizing predictive accuracy. The methods differ mainly in the kind of species data they use and in their accuracy, which is critical for guiding effective stream management decisions (Rose et al. 2016). A recent survey of the literature suggests that species distribution models could be improved by including more ecological theory, that is, by taking species-environment relationships into consideration (Austin 2007; Guisan et al. 2013). Thus, modeling the distribution of a species depends greatly on the data, the model parameters, and the uncertainties derived from these. Therefore, it is essential to adopt best practice in modeling species distribution by identifying the data limitations and understanding the elements and methods involved in the SDM.
In a fish assemblage, species differ in life history strategies, habitat requirements, and sensitivity to stressors (Maloney et al. 2006; Schleiger 2000; Schlosser 1982); thus, generalized conservation plans will not always be suitable for the species of interest. In this case, a species distribution model for the species of interest will serve the conservation purpose better. The spatial and temporal distribution of species diversity is a major aspect of ecological studies, and it is pertinent to climate change, habitat degradation or alteration, and aquatic diversity loss (McGill et al. 2015). In India, the conceptualization of methods and the development of models to study inland fish biology, habitat, and distribution have not been attempted except for food web models (Panikkar and Khan 2008; Khan et al. 2015) and model-based reproductive vulnerability assessment under changing environmental scenarios (Sarkar et al. 2019, 2021). In view of these facts, this study aims to develop models for predicting the distribution of two fish species of high conservation significance, Sahyadria denisonii and Sahyadria chalakkudiensis of the genus Sahyadria, using the MaxEnt modeling technique. The ultimate goal of the present study is to understand the spatial distribution pattern and ecological niche of each species and to assess the environmental factors affecting their distribution. Generating knowledge on the present distribution of the two endangered and endemic species and on the environmental factors shaping their essential habitat will be an important step toward conserving these species in their region.

Species occurrence records
The locality occurrence records of the species of the genus Sahyadria, with longitude and latitude for each occurrence site, were collected from publications of natural history museums, published literature, and reliable observational datasets (Tables 1 and 2). For a few records, only a map of the locality was available; these maps were georeferenced to assign geographical coordinates to the locality. Thereafter, the complete dataset of occurrence records was screened for errors in georeferencing and taxonomic status. To avoid sampling bias, we applied spatial filtering and used only one sample per locality, such that no two samples lie within a 5-km radius of each other (Kramer et al. 2013). Finally, the error-free presence-only occurrence datasets of these two species were used to develop the distribution models of the endemic and endangered species of the region (Hernandez et al. 2008).
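The 5-km spatial filtering described above can be sketched as a simple greedy thinning of coordinates. This is only an illustration under stated assumptions, not the procedure the authors ran: the function names `haversine_km` and `thin_records`, the mean Earth radius, the greedy first-come ordering, and the example coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_records(records, min_dist_km=5.0):
    """Greedily keep records so that no two kept records are closer than min_dist_km.

    records: iterable of (latitude, longitude) tuples in decimal degrees.
    The result depends on the input order (hypothetical design choice).
    """
    kept = []
    for lat, lon in records:
        if all(haversine_km(lat, lon, klat, klon) >= min_dist_km for klat, klon in kept):
            kept.append((lat, lon))
    return kept


# Hypothetical coordinates: the first two points are roughly 1.6 km apart.
points = [(10.30, 76.50), (10.31, 76.51), (11.90, 75.80)]
thinned = thin_records(points)
print(len(points), "records before thinning,", len(thinned), "after")
```

Because the thinning is greedy, a different input order can retain a different (but equally valid) subset; a production workflow might randomize the order or maximize the number of retained records.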

Environmental dataset
The environmental dataset comprises climate, hydrology, and topography datasets. All these datasets were downloaded and used in the study to develop the probability spatial distribution models of the species of the genus Sahyadria. For climate, we used WorldClim Version 2.0 (Fick and Hijmans 2017), a database of high spatial resolution global weather and climate data. This release is a set of global climate layers (gridded climate data) used for mapping and spatial modeling. The database holds monthly climate data for minimum, mean, and maximum temperature, precipitation, solar radiation, wind speed, and water vapor pressure. Moreover, there are 19 bioclimatic variables. Each climate data download is a "zip" file containing 12 GeoTiff (.tif) files, one for each month of the year (January is 1; December is 12). The 19 bioclimatic variables are averages for the years 1970-2000, and their download is a "zip" file containing 19 GeoTiff (.tif) files, one for each variable. In the present study, all 7 monthly climate variables (7 × 12 months = 84 layers) and the 19 bioclimatic variables (Table 3), with a spatial resolution of 30 arc-seconds (~1 km²), were used as environmental data. For hydrology, the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) database, which replaces GTOPO30, was used to derive data for the whole of India with a spatial resolution of 30 m for different hydrological and topographical variables. This product suite provides global coverage of all land areas. In the present study, only the mean dataset available in GMTED for India was used for deriving the values of the different hydrological parameters. The values of the hydrological parameters listed in Table 3 were derived by applying the A^T search algorithm developed by Ehlschlaeger (1989) using GRASS GIS (version 7.8.1) software. The A^T search algorithm, also known as least-cost search, is a method that identifies drainage flow directly from the original elevation data. From the prepared dataset of hydrological variables, the two topographical variables "landscape slope" and "aspect" were derived using the same GIS software. The basin and drainage maps were overlaid on Google Maps to identify river reaches. The river reaches were then isolated by digitizing and annotated.
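As a rough illustration of how slope and aspect can be derived from an elevation grid, the sketch below uses plain finite differences with NumPy. It is not the GRASS GIS routine used in the study. The function name `slope_aspect`, the grid orientation (row 0 at the northern edge, columns increasing eastward), and the aspect convention (compass bearing of the downslope direction, clockwise from north) are assumptions made for this example.

```python
import numpy as np


def slope_aspect(elev, cell_size):
    """Return (slope, aspect) in degrees for a 2-D elevation grid.

    Assumptions of this sketch: square cells of side cell_size (same unit as
    elevation), row 0 is the northern edge, columns increase eastward.
    Aspect is the compass bearing of the downslope direction and is not
    meaningful on flat cells (slope == 0).
    """
    # np.gradient returns the derivative along rows first, then along columns.
    dz_drow, dz_dcol = np.gradient(np.asarray(elev, dtype=float), cell_size)
    dz_dx = dz_dcol    # eastward derivative
    dz_dy = -dz_drow   # northward derivative (rows run north to south)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Downslope vector is the negative gradient; bearing = atan2(east, north).
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Hypothetical 30-m grid that rises 30 m per cell toward the east.
elev = np.tile(np.arange(5) * 30.0, (4, 1))
slope, aspect = slope_aspect(elev, cell_size=30.0)
print(slope[0, 0], aspect[0, 0])
```

GIS packages generally use a 3 × 3 neighbourhood formula and their own aspect convention, so values from this sketch should not be expected to match GRASS GIS output exactly.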

Species distribution modeling
Elevation can fluctuate significantly between two adjoining pixels, whereas climate does not; therefore, resampling the elevation data to a finer resolution can severely reduce the quality of the hydrological data, but resampling the climatic data to a finer resolution has little effect. Hence, we resampled the 30 arc-second climate data using Lanczos interpolation (Lanczos 1938) to a finer grid of 30-m spatial resolution to match the elevation data. All the GIS analysis was done using GRASS GIS (version 7.8.1) software.

Table 3 Environmental variables (climate, hydrology, and topography)
Wind speed (m s-1)
Water vapor pressure (kPa)
BIO1 = annual mean temperature
BIO2 = mean diurnal range (mean of monthly (max temp - min temp))
BIO3 = isothermality (BIO2/BIO7) (× 100)
BIO4 = temperature seasonality (standard deviation × 100)
BIO5 = max temperature of the warmest month
BIO6 = min temperature of the coldest month
BIO7 = temperature annual range (BIO5 - BIO6)
BIO8 = mean temperature of the wettest quarter
BIO9 = mean temperature of the driest quarter
BIO10 = mean temperature of the warmest quarter
BIO11 = mean temperature of the coldest quarter
BIO12 = annual precipitation
BIO13 = precipitation of the wettest month
BIO14 = precipitation of the driest month
BIO15 = precipitation seasonality (coefficient of variation)
BIO16 = precipitation of the wettest quarter
BIO17 = precipitation of the driest quarter
BIO18 = precipitation of the warmest quarter
BIO19 = precipitation of the coldest quarter
max_slope_length = maximum length of surface flow
Accumulation = number of cells that drain through each cell
log_accumulation = absolute logarithmic value of accumulation
tci = topographic index ln(a/tan(b)) (Quinn et al. 1991)
spi = stream power index a × tan(b) (Moore et al. 1991)
drainage = drainage direction (numbered from 1 to 8)
Basin
stream
length_slope = slope length (Weltz et al. 1987)
slope_steepness = slope steepness (McCool et al. 1987)
Landscape slope
Landscape aspect
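The two index formulas listed among the hydrological variables, tci = ln(a/tan(b)) and spi = a × tan(b), can be written out as a short sketch. Here `a` is taken to be the upslope contributing area per unit contour width and `b` the local slope angle; the function names, the use of degrees for the slope, and the small `eps` floor that avoids division by zero on flat cells are assumptions of this example, not details given in the study.

```python
import numpy as np


def topographic_index(a, slope_deg, eps=1e-6):
    """Topographic index ln(a / tan(b)).

    a: upslope contributing area per unit contour width (assumed definition).
    slope_deg: local slope angle b in degrees.
    eps: hypothetical floor on tan(b) so that flat cells stay finite.
    """
    tan_b = np.maximum(np.tan(np.radians(slope_deg)), eps)
    return np.log(np.asarray(a, dtype=float) / tan_b)


def stream_power_index(a, slope_deg):
    """Stream power index a * tan(b), with b given in degrees."""
    return np.asarray(a, dtype=float) * np.tan(np.radians(slope_deg))


# Hypothetical cells: the same contributing area on a steep and a gentle slope.
area = np.array([100.0, 100.0])
slope = np.array([45.0, 5.0])
print(topographic_index(area, slope))
print(stream_power_index(area, slope))
```

For a fixed contributing area, the gentler cell receives the higher topographic index and the lower stream power index, which is the qualitative behaviour the two formulas are meant to capture.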
The maximum entropy distribution method (MaxEnt) was used for modeling the species distribution because it performs best among many modeling methods (Elith et al. 2006) and because it is effective with small sample sizes (Benito et al. 2009; Hernandez et al. 2006; Papeş and Gaubert 2007; Pearson 2007; Wisz et al. 2008). MaxEnt is a maximum entropy-based machine learning programme that estimates the probability distribution from the species' occurrence data in relation to environmental constraints (Phillips et al. 2004). It requires only species presence data and environmental variable layers (continuous or categorical) of the study area. The freely available MaxEnt software version 3.1 was downloaded and used with a fourfold cross-validation method on the occurrence datasets of S. denisonii and S. chalakkudiensis (Tables 1 and 2) in relation to environmental variables comprising climate, hydrology, and topography (Table 3). The models were used both for distribution modeling of these species and to estimate the presence probability of each species, which varies from 0 to 1, where 0 is the lowest and 1 the highest probability (100%). A sample radius of 5 pixels (150 m) was used with a 25% random test percentage for each replicate run.
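The fourfold cross-validation mentioned above can be pictured with the following sketch, in which each replicate holds out one quarter of the occurrence records for testing. The MaxEnt software performs its own partitioning, so this is only an illustration; the function name `k_fold_indices`, the random seed, and the record count are hypothetical.

```python
import numpy as np


def k_fold_indices(n_records, k=4, seed=0):
    """Split record indices 0..n_records-1 into k (train, test) pairs.

    Each record appears in exactly one test fold, so with k=4 roughly 25% of
    the records are held out in every replicate.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    folds = np.array_split(order, k)
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical dataset of 20 occurrence records.
for run, (train_idx, test_idx) in enumerate(k_fold_indices(20), start=1):
    print(f"replicate {run}: {len(train_idx)} training, {len(test_idx)} test records")
```

Averaging a performance measure over the four replicates gives the kind of mean and standard deviation that are reported for replicate runs.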

Results
In this study, MaxEnt models predicted the potential niches and probability distribution maps of both S. denisonii and S. chalakkudiensis for the different river basins based on the known or reported occurrence records. The AUC (area under curve) values for the two species are presented in Fig. 1a and b. These figures show that the mean AUCs of the models of both species are close to 1, which indicates that the models are good classifiers. The analysis reveals that "solar radiation" and "temperature" are significant predictors of the distribution of the species of the genus Sahyadria. The permutation importance (%) of the major predictors of the distribution of both species is presented in Fig. 2. This figure shows that "temperature annual range" is the most significant predictor variable, followed by "solar radiation" in January and "temperature seasonality (standard deviation)," for the distribution of S. denisonii. Further, "solar radiation" in April, followed by "temperature annual range" and "temperature seasonality (standard deviation)," are the most significant predictors of the distribution of S. chalakkudiensis. Though "slope steepness" and "logarithmic of accumulation" have low significance in the final model, the AUCs decrease if these predictors are dropped. This indicates that these predictors have some significance in finalising the models of the species of the genus Sahyadria. The jackknife test of these environmental variables showed that "temperature annual range," "solar radiation," and "temperature seasonality (standard deviation)" are the three important predictors of the probable distribution of S. denisonii and S. chalakkudiensis. These variables showed higher gain than the other variables. To understand the extent over which the species are likely to be present in the rivers of different basins, the extent of occurrence of the species of the genus Sahyadria was calculated for each class using the defined probability classes at the threshold of ≥ 0.5 (Tables 4 and 5). Vector maps were produced by converting the average-value raster maps predicted for the two species (Figs. 3a and b), and these vector maps were analyzed at the threshold of ≥ 0.5 because at this threshold and above, the species is more likely to be present in the habitat (Figs. 4a and b). Further, to understand the distribution of the two species in response to water flow, "precipitation" was determined to be another significant climate envelope (Fig. 5).

Fig. 1 a and b Area under curve (AUC) between average sensitivity and specificity for the species of the genus Sahyadria. (b) Sahyadria chalakkudiensis: test AUC (4 replicate runs) = 0.9997, SD = 0.0002
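To make the AUC values concrete, the sketch below computes AUC as the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen background site, with ties counted as one half. This rank-based formulation is a standard way to compute AUC; it is not taken from the MaxEnt source code, and the function name `auc` and the example scores are hypothetical.

```python
import numpy as np


def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties count 0.5."""
    p = np.asarray(presence_scores, dtype=float)[:, None]
    b = np.asarray(background_scores, dtype=float)[None, :]
    wins = (p > b).sum() + 0.5 * (p == b).sum()
    return wins / (p.size * b.size)


# Hypothetical predicted probabilities at presence and background sites.
presence = [0.92, 0.81, 0.77, 0.64]
background = [0.05, 0.12, 0.30, 0.70, 0.08]
print("AUC:", auc(presence, background))

# Counting sites at or above a 0.5 threshold, as used for the vector maps.
print("presence sites >= 0.5:", int((np.asarray(presence) >= 0.5).sum()))
```

An AUC of 0.5 corresponds to a model no better than random ranking, and a value of 1 means every presence site outranks every background site, which is why mean values close to 1 are read as good classification.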

Discussion
A spatial prediction distribution model was developed for the two important fish species S. denisonii and S. chalakkudiensis in the Western Ghats ecosystem, and prioritized ecological niches were predicted in relation to the relative contribution of the bioclimatic features. Though a few models have been reported in inland fisheries (Panikkar and Khan 2008; Khan et al. 2015; Sarkar et al. 2019, 2021), this is the first attempt to predict the geographical distribution of important species in relation to an environmental dataset. The pattern of the distribution range under seasonality was noticed and depicted. A difference in distribution range between the two species was noticed, especially during the monsoon season from May to September. Further, as indicated in Fig. 5, the priority habitat of S. denisonii was modeled in areas of higher precipitation than that of S. chalakkudiensis. The results derived from the models show that the distribution of S. denisonii depends mainly on the "temperature annual range," "temperature seasonality," "solar radiation" in January, and "slope" of the river. These are the important predictors in defining the ecological niche of this species. Similarly, "solar radiation" in April, "temperature annual range," "temperature seasonality," and "logarithmic of the accumulation" are the important environmental factors in defining the ecological niche of S. chalakkudiensis. The jackknife test provides further information about these predictors, suggesting that "temperature" and "solar radiation" in January and December are the limiting factors for the distribution of S. denisonii because this species breeds in these months, with a peak gonadosomatic index (GSI) (Solomon et al. 2011). Similarly, "solar radiation" and "temperature" in January and February are critical for the distribution of S. chalakkudiensis, as they are associated with the breeding of this species during these months.
Thus, in totality, "temperature" and "solar radiation" were predicted to be the important variables responsible for the distribution of species of the genus Sahyadria. Among the environmental variables, these are also the significant variables defining the ecological niche of the species of this genus. In the case of S. denisonii, though "logarithmic of accumulation (log accumulation)" itself has little permutation importance (0.6723%), removing it from the list of predictors causes the AUC to drop significantly, suggesting that it has some relevance. The genus Sahyadria is distributed in 14 small rivers in the Western Ghats (John et al. 2013b), and S. denisonii is found across the entire latitudinal distribution of the genus. Our model at the latitudinal scale suggests that some other rivers have a suitable ecological niche for S. denisonii in small fragments, whereas the rivers Netravati, Kallada, and Ithikkara are entirely suitable. Hence, it would be interesting to know whether S. denisonii occurs there or whether its ecological niche is occupied by other species. The model also suggests that the river Periyar has higher occurrence probabilities for S. chalakkudiensis than the river Chalakkudi. The probability distribution model of the species of the genus Sahyadria indicates that the Palghat Gap is a barrier to distribution, as reported earlier (John et al. 2013a). This is consistent with the distinct lineages in this genus: 8 distinct genetic lineages have been reported so far (John et al. 2013b), and if they are grouped into two clades, one clade comprises the populations north of Palghat and the other those south of it. The populations in the two clades differ in size, body shape, and color. Concomitantly, temperature has been used as a predictor because adequate information on it is available, and the models identify it as an important predictor. The relationship of air temperature with water temperature is not always linear; therefore, it is important to have data on water temperature from the known range of distribution for more detailed mapping.

Fig. 2 Important predictor variables and their response curves for Sahyadria denisonii and Sahyadria chalakkudiensis with their permutation importance in % (average of 4 replicate runs). In each graph, the X-axis represents prediction probabilities between 0 (absent) and 1 (100% present). Sahyadria denisonii: bio_07 (63.0941%), srad_01 (14.7454%), bio_04 (13.7885%), slope_steepness (7.0731%); Sahyadria chalakkudiensis: srad_04 (39.644%), bio_07 (25.28%), bio_04 (20.2391%), log_accumulation (5.7099%)

Fig. 4 a Predicted distribution of Sahyadria denisonii in the Western Ghats derived from the predicted raster map at a threshold of ≥ 0.5. b Predicted distribution of Sahyadria chalakkudiensis in the Western Ghats derived from the predicted raster map at a threshold of ≥ 0.5

Conclusion
The spatial distribution model developed for the species of the genus Sahyadria showed good performance in a wide range of environmental situations. The present study identifies suitable ecological niches supporting the distribution of the species and the factors affecting that distribution. However, the model output for understanding the distribution of all the lineages can be enhanced by including more climato-environmental data, supported by microhabitat and fish assemblage patterns for all lineages, water temperature, and threats from the known range of distribution. The information generated is highly relevant to both space and time.
The probability distribution map presenting details on the ecological niche can help in land use management around its existing populations, discovering new populations, identifying top priority survey sites, as well as setting priorities to restore its natural habitat for more effective conservation. The methodology conceptualized and demonstrated here can be used to determine ecological niches for other threatened and endangered plant and animal species in other potential areas and may aid in the field surveys and setting up conservation and restoration efforts by the managers and stakeholders of natural resources.
Author contribution Ajey Kumar Pathak (AKP) conceptualized the idea and checked the data, maps, and models for authenticity and accuracy. AKP drafted and edited the manuscript. Pushpendra Verma downloaded the data and the MaxEnt software, configured the software under the Linux operating system, and analyzed the occurrence and environmental data using MaxEnt. He also carried out all the GIS analysis and produced the resultant maps using the QGIS software. Rajesh Dayal (RD) collected the occurrence data of the fish species used in the study and assisted in verifying the predicted distribution of the fish species using literature. Uttam Kumar Sarkar (UKS) edited the manuscript and suggested this journal for publication of this article.
Data availability All data generated or analyzed during this study are included in this published article.

Declarations
Ethics approval and consent to participate Not applicable as the study does not use any animal.
Consent for publication Not applicable as the manuscript does not contain the data from any individual person.

Competing interest
The authors declare no competing interests.