Physicochemical Habitat Traits Preferred by a Small Indigenous Fish (Chanda nama) in an Indian River, Discerned through Machine Learning

Physicochemical traits of a river influence the habitat of fish species in aquatic ecosystems. Fish show complex relationships with aquatic factors in rivers, and machine learning (ML) modeling is a useful tool for establishing relationships within complex systems. This study identified the preferred habitat indicators of Chanda nama (a small indigenous fish) in the Krishna River, peninsular India, using ML modeling. Data on Chanda nama distribution (presence/absence) and ten associated physical and chemical water parameters were collected at 22 sampling sites on the river during 2001-02. The ML models random forest (RF), artificial neural network (ANN), support vector machine (SVM) and k-nearest neighbors (KNN) were used to classify Chanda nama distribution in the river. Model efficiency was evaluated using classification accuracy (CCI), Cohen's kappa coefficient (κ), sensitivity, specificity and the receiver operating characteristic (ROC). Results showed that random forest was the best model for Chanda nama distribution (presence/absence) in the Krishna River, with a classification accuracy (CCI) of 0.82 (82%), κ of 0.55, sensitivity of 0.57, specificity of 0.76 and ROC of 0.72. The RF model identified three preferred physicochemical habitat traits for Chanda nama distribution in the Krishna River, India: altitude, temperature and depth. This study will help researchers and policymakers understand the physicochemical habitat traits that matter for the sustainable management of small indigenous fish (Chanda nama) in river systems.
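The evaluation metrics named above (CCI, Cohen's kappa, sensitivity, specificity) can all be computed from a binary confusion matrix. Below is a minimal, standard-library-only sketch. It assumes presence is coded 1 and absence 0. The function name `classification_metrics` is hypothetical and is not the authors' code, and the labels in the usage lines are toy values, not study data.

```python
import math

def classification_metrics(y_true, y_pred):
    """CCI (accuracy), Cohen's kappa, sensitivity and specificity for binary
    labels, assuming the coding 1 = presence, 0 = absence."""
    pairs = list(zip(y_true, y_pred))
    if not pairs or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(t not in (0, 1) or p not in (0, 1) for t, p in pairs):
        raise ValueError("labels must be coded 0 (absence) or 1 (presence)")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # presence predicted as presence
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # absence predicted as absence
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # absence predicted as presence
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # presence predicted as absence
    n = len(pairs)
    cci = (tp + tn) / n  # proportion of correctly classified instances
    # Expected chance agreement from the row and column marginals of the 2x2 table
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (cci - p_e) / (1 - p_e) if p_e < 1 else math.nan  # undefined if p_e == 1
    sensitivity = tp / (tp + fn) if tp + fn else math.nan  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else math.nan  # true negative rate
    return {"cci": cci, "kappa": kappa,
            "sensitivity": sensitivity, "specificity": specificity}

# Toy usage: 3 presences and 7 absences, with one error in each class.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 1])
print(metrics)
```

ROC is not covered here. It needs predicted probabilities or scores for the presence class rather than hard labels, so it has to be computed separately.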


Very few studies have used machine learning approaches to identify the habitat of Chanda nama (an SIF) for presence/absence prediction in Indian river systems. The purpose of this study is to develop a framework for predicting habitat indicators for the distribution of the SIF Chanda nama in the Krishna River, India, using ML modeling.

This manuscript is structured in two steps: (a) first, the ML classification models RF, SVM, ANN and KNN were fitted and compared for predicting Chanda nama, and the best model was selected; (b) second, using …

… is divided into three strata (upper, middle and lower). A total of twenty-two sampling stations are distributed across these three strata. Fourteen stations were located in the upper part of the river, at altitudes between 740 m and 515 m (slope 42 cm/km); five in the middle part, at altitudes between 494 m and 170 m (slope 113 cm/km); and three in the lower part, at altitudes between 19 m and 5 m (slope 11 cm/km). The stations were selected to capture the variation across the three strata, to provide representative samples, and to offer easy access for sampling fish and water quality.

The data comprise Chanda nama presence/absence and ten associated physical and chemical water parameters: temperature (°C), transparency (m), depth (m), pH, specific conductivity (μS/m), dissolved oxygen (ppm), total alkalinity (ppm), flow (cm/s), chloride (ppm) and altitude (m) (Table 1) …

… where α are positive real constants and b is a real constant. In the present study, the SVM with a radial basis function (RBF) kernel, the most frequently used variant, was applied. The kernel is given by equation (2), where σ is the width of the radial basis function, determined by a grid search using repeated cross-validation.
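The SVM decision function (with constants α and b) and the RBF kernel of equation (2) are not reproduced in this section. Below is a minimal sketch. It assumes the common forms f(x) = Σ_i α_i y_i K(x_i, x) + b, with the predicted class given by the sign of f, and K(x, z) = exp(-||x - z||² / (2σ²)). The function names are hypothetical, and the support vectors, α and b in the usage lines are hand-picked toy values rather than a fitted model.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel in the assumed form exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Decision value sum_i alpha_i * y_i * K(x_i, x) + b, with y_i in {+1, -1}."""
    return sum(alpha * y * rbf_kernel(sv, x, sigma)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + b

def svm_predict(x, support_vectors, labels, alphas, b, sigma):
    """Predicted class: +1 (e.g. presence) if the decision value is >= 0, else -1."""
    return 1 if svm_decision(x, support_vectors, labels, alphas, b, sigma) >= 0 else -1

# Toy usage with hand-picked (not fitted) parameters.
svs = [(0.0, 0.0), (2.0, 2.0)]
print(svm_predict((0.2, 0.1), svs, [1, -1], [1.0, 1.0], 0.0, 1.0))  # prints 1
```

In practice, α and b come from solving the SVM optimization problem. σ would be chosen by scoring a grid of candidate values with repeated cross-validation. Libraries parameterize the RBF width differently. For example, scikit-learn's `SVC` uses `gamma`, which multiplies the squared distance, so under the form assumed here γ = 1/(2σ²).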

The sigmoid neuron has a weight for each input and an overall bias. The neurons in the input layer receive input from the input cell, perform a transformation by assigning weights to the input, and transmit the outcome to the … assigned to the kth output node by the jth hidden unit, … the bias term of the jth hidden unit, and … the bias term of the kth output unit. These are the adjustable parameters, which were estimated during training by minimizing the loss function. Let us assume that NT training samples are available to train a neural network with K output …

… the conditional probability for class j as the fraction of points in N0 whose response value equals j:
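The ANN passage above describes sigmoid neurons whose weights and bias terms are the trainable parameters. The KNN passage estimates class probabilities from the neighbourhood N0. Below is a minimal sketch of both. It assumes a network with one hidden layer of sigmoid units and sigmoid output nodes. For KNN it assumes the standard estimate P(Y = j | X = x0) = (1/k) Σ_{i∈N0} I(y_i = j), where N0 holds the k training points nearest to x0 under Euclidean distance. All function and parameter names are hypothetical. Training (estimating the weights by minimizing the loss) is not shown.

```python
import math
from collections import Counter

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network with sigmoid units.
    w_hidden[j][i]: weight from input i to hidden unit j; b_hidden[j]: its bias.
    w_out[k][j]: weight from hidden unit j to output node k; b_out[k]: its bias."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]

def knn_class_probs(x0, X_train, y_train, k):
    """Estimate P(Y = j | X = x0) as the fraction of the k nearest training
    points (the neighbourhood N0) whose label equals j."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, x0)), y)
                   for x, y in zip(X_train, y_train))
    counts = Counter(y for _, y in dists[:k])  # labels of the points in N0
    return {label: counts[label] / k for label in set(y_train)}

# Toy usage (made-up values, not study data).
print(mlp_forward([0.0], [[1.0]], [0.0], [[0.0]], [0.0]))
print(knn_class_probs([0.5], [[0.0], [1.0], [2.0], [10.0], [11.0]], [1, 1, 0, 0, 0], 3))
```

Ties in distance are broken by label order here. A real implementation would choose a tie-breaking rule deliberately and usually standardize the predictors first, because Euclidean distance is sensitive to their scales.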

Here, RF identified three important habitat parameters, i.e., altitude, temperature and depth, for Chanda … (Table 4). In this river, Chanda nama preferred habitats with lower temperature and shallower depth.
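This section does not state which importance measure RF used. Random forest implementations commonly report a permutation-based mean decrease in accuracy, computed on out-of-bag samples, or a Gini-based mean decrease in impurity. Below is a generic permutation-importance sketch of the first idea. It assumes a hypothetical threshold model and made-up toy values standing in for a fitted RF and the study data. It does not reproduce Table 4.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows whose predicted label equals the observed label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled.
    model: any callable mapping a feature row (a list) to a predicted label."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the label
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical toy data and model (NOT the study's data): the model uses only
# feature 0 and ignores feature 1, so feature 1 should get zero importance.
toy_X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [7.0, 4.0], [8.0, 9.0], [9.0, 1.0]]
toy_y = [1, 1, 1, 0, 0, 0]

def toy_model(row):
    return 1 if row[0] < 5.0 else 0

baseline, importances = permutation_importance(toy_model, toy_X, toy_y)
print(baseline, importances)
```

Shuffling a column the model ignores leaves its predictions unchanged, so that feature's importance is exactly zero. Features the model relies on lose accuracy when shuffled, and they rank higher.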

Previous studies have also reported that fish distribution is governed by stream gradient, zones, altitude and …

Besides the habitat features preferred by individuals, the persistence of populations also depends on landscape-scale features related to immigration and emigration rates, broader regional abiotic constraints and habitat fitness