Study area
Panna Tiger Reserve (24°16ʹN to 24°42ʹN and 79°29ʹE to 80°16ʹE), covering an area of 1598.10 km² is situated in the state of Madhya Pradesh in central India. The Critical Tiger Habitat (CTH), or core of the reserve comprises Panna National Park and Gangau Wildlife Sanctuary, covering an area of 576.13 km2 (Madhya Pradesh Forest Department 2007). The buffer covers an area of about 1,021.97 km2 (Madhya Pradesh Forest Department 2012). The reserve lies in Vindhyan hills, its altitude ranging between 330 and 540 m a.s.l. (Chawdhry 1996; Rodgers et al. 2002). It has an average annual humidity of 86%, and temperature ranges from 5 to 45℃. Both monsoon (July-September) and winter (November-February) are short, thus, the climate is mostly hot and dry (Chawdhry 1996; Gopal et al. 2010). Ken a major tributary of the river Yamuna, cuts through the reserve, flowing from South to North. The major forest type is dry deciduous forest with teak (Tectona grandis) as the most dominant flora (Meher-Homji 1990). Apart from tigers, the major faunal community comprises of carnivores, viz. leopard (Panthera pardus), striped hyena (Hyaena hyaena), wild dog (Cuon alpinus), golden jackal (Canis aureus), Bengal fox (Vulpes bengalensis), jungle cat (Felis chaus), and sloth bear (Melursus ursinus); herbivores, viz. sambar (Cervus duvauceli), chital (Axis axis), nilgai (Boselaphus tragocamelus), chinkara (Gazella bennetti), chousingha or four-horned antelope (Tetraceros quadricornis) and wild pig (Sus scrofa); and primates, viz. hanuman or common langur (Semnopithecus entellus) and rhesus macaque (Macaca mullata) (Gopal et al. 2010). There are four villages within the national park area, however, there are seven villages in sanctuary area and 49 villages in the buffer of the reserve. Many of the communities living in these villages are dependent on the reserve for fuelwood, fodder and NTFPs (Malviya et al. 2022). The current study focussed on the CTH and two kilometres of natural buffer around it (Fig. 1).
The original tiger population in Panna was lost due to poaching and as a population recovery measure, tigers were reintroduced from neighbouring reserves in 2009 (Sarkar et al. 2016). Since then, the tiger population in Panna has increased exponentially from seven founders to more than 60 individuals in 2020 (Sharma and Jarande 2020).
Livestock kill quantification and kill site location
In India, forest department of the state compensates livestock depredation by wild carnivores. We collected livestock compensation records from Panna Tiger Reserve management to understand the intensity of HTC within the reserve. But the compensation data was not geo tagged, hence we also obtained the livestock kill data (that has GPS locations), which is collected by tiger monitoring teams in the reserve, for the period 2009–2016. Since in case of Panna, there are several feral cattle, and no distinction is made between feral and domestic cattle in the kill records, we matched kill and compensation data to obtain reliable locations for domestic livestock kill within the reserve. We thus matched 156 locations for the period 2011–2016. For predation risk probability modelling, we treated livestock kill data as presence and generated equal number of random absence points using ‘create random points’ tool in ArcGIS 10.4 (ESRI 2016a). We used livestock kill data for the period 2011–2015 for training the model and 2016 data for testing the model.
Measuring ecological variables
We considered a total of 27 variables for ecological driver modelling, which can be grouped under the following heads
I. Prey: We used distance sampling technique for tiger prey species population estimation (Buckland et al. 2001). We surveyed 41 line transects each up to 2 km in length, during winter 2012-13 and 2013-14. All the line transects were walked in a replicate of three. The prey species recorded were sambar, chital, wild pig, nilgai, chinkara, chousingha, hanuman langur, hare, peacock and livestock (cattle and buffalo). We combined the data for both the years and calculated all prey (wild and livestock) encounter rate, wild prey encounter rate, and livestock encounter rate, for each transect.
II. Vegetation cover: We quantified vegetation indices, viz. canopy cover and shrub abundance in 15m circular plots laid at every 400 m on the line transects during winter 2012-13 (Jhala et al. 2009). We laid a total of 234 circular plots. Within the plots, we made ocular estimations of canopy cover and scored shrub cover according to its abundance (0–4). The same team of two people carried out all the vegetation related estimations to avoid interobserver bias.
For further understanding the vegetation cover of the tiger reserve, we downloaded LANDSAT 8 (OLI/TIRS) scenes for the reserve from USGS website for April 2013 (LANDSAT SCENE ID = LC81440432013119LGN01; Download date = 20 April 2015) and November 2013 (LANDSAT SCENE ID = LC81440432013327LGN01; Download date = 15 December 2020). We calculated Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1974) using these scenes, employing Raster Calculator in ArcGIS 10.1 (ESRI, 2012) by the formula:
NDVI = (Near Infrared - Red)/ (Near Infrared + Red).
III. Water: Using the same scenes as used for calculating NDVI, we calculated Normalized Difference Water Index (NDWI) (McFeeters 1996) with Raster Calculator in ArcGIS 10.1 (ESRI 2012), by the formula:
NDWI = (Green-Near Infrared)/ (Green + Near Infrared).
Additionally, we obtained drainage and water source data from the forest department and created Euclidean distance raster using ‘Euclidean Distance’ tool in the Spatial Analyst toolbox in ArcGIS 10.4 (ESRI 2016a). We also created Euclidean distance rasters for Ken River and its tributaries, and water sources tagged perennial.
IV. Topography: We downloaded ASTER Global Digital Elevation Model (DEM) data from USGS Global Visualization Viewer website. We used ‘Slope’ tool in the Spatial Analyst toolbox to calculate slope from DEM layers in ArcGIS 10.1 (ESRI 2012). Additionally, we calculated topographic ruggedness index or terrain ruggedness index (TRI) that measures elevation difference between a cell and mean of its eight neighbouring cells (Riley et al. 1999) using raster calculator in ArcGIS 10.1 (ESRI 2012), by the formula (Cooley 2016):
TRI = SquareRoot (Abs((Square(“3x3max”)-Square (“3x3min”)))).
V. Land Use Land Cover and forest contiguity: We procured Land Use Land Cover (LULC) prepared by Forest Survey of India (FSI) for the entire country at 98m resolution for the year 2009 (FSI 2009). We studied landscape characteristics and patterns, particularly habitat connectivity, using multiple indices in program FRAGSTATS (ver. 4.2). We ran FRAGSTAT analysis using the FSI LULC and calculated three class-level metrics: Patch Density (PD), Large Patch Index (LPI) (percentage of total landscape area comprised by the largest patch), and Clumpiness Index (CLUMPY) (a measure of fragmentation). LPI is a simple measure of dominance. And CLUMPY that ranges from − 1 (patch type is maximally disaggregated) to 1 (patch type is maximally clumped) provides an index of fragmentation of the focal class that is not affected by changes in class area (McGarigal 2015).
VI. Disturbance: We also quantified anthropogenic disturbance indices in the circular plots (as discussed earlier under section II). In each plot, we counted all the lopped (only branches were cut) and cut (cut to stump) trees. We deployed camera traps (Cuddeback Attack pairs) in 109 locations in 2x2 km grids within the national park area, accounting for 7459 trap nights, in the winter of 2013-14. We then manually counted the number of tigers, livestock, humans, and vehicles captured in each camera trap and calculated encounter rates (total no. of captures/total trap nights).
We also obtained village and road location data from the forest department and calculated Euclidean distance rasters. Additionally, we downloaded human footprint data from Socio Economic Data and Application Centre (SEDAC) website for the year 2009 (Sanderson et al. 2002; Venter et al. 2018). We also downloaded population census data for the year 2011 from SEDAC (Balk et al. 2020).
Geostatistical modelling to create rasters: We interpolated canopy cover, prey, human, livestock, and vehicle encounter rates, and cutting and lopping intensity rates, to create rasters using the Geostatistical wizard in ArcGIS 10.4 (Cressie 2015; ESRI 2016a). For this purpose, we considered four interpolation tools: Inverse Distance Weighing, Simple Kriging (SK), Ordinary Kriging (OK), and Empirical Bayesian Kriging (EBK). We used statistical measures of correctness (mean prediction error, root-mean-square error, standardized root-mean-square error, average standard error) to compare the kriging algorithms. We selected the model that had the smallest root-mean-squared prediction error (RMSE), standardized mean nearest to zero, the average standard error nearest the root-mean-squared prediction error, and the standardized root-mean-squared prediction error nearest to 1 (ESRI 2016b) (Supplementary Table 1).
Scale and variable selection: To construct a multiscale model, we resampled each of the variables (except for human footprint (which was available at ~ 1 km resolution)) at five scales: 30m (the highest resolution available), 50m (mean drag distance for tiger kill (Karanth and Sunquist 2000)), 100m (midpoint between fine and coarse resolution), 350m (maximum kill drag distance (Karanth and Sunquist 2000)), 1200m (coarsest resolution used by us and at which most global environmental data is available). LULC (available at 98m) could only be resampled at coarse scale (100-1200m). Thus, in total, we had 118 variables. We then extracted all the variables for each of the presence/absence points using the Spatial Analyst toolbox of ArcGIS 10.4 (ESRI 2016a).
For scale and feature selection, firstly, we ran univariate logistic regressions, after performing Box-Tidwell procedure to test for logistic regression’s linearity assumption, i.e., logit transformation of the dependent variable and continuous independent variables have a linear relationship (Shin and Ying 1994). Secondly, we ran univariate GAMs to understand how much r square/deviance was explained by each of the predictors at each of the scales. We selected the scales at which the variables were best explaining the response (livestock kill presence/absence). Additionally, we employed Information Value and Weight of evidence for feature selection (Good and Osteyee 1974). We studied the results of univariate logistic regression, univariate GAM, and Information value to select the explanatory variables at appropriate scales. We then checked the data for multicollinearity and spatial autocorrelation. To check for multicollinearity among the selected variables, we ran appropriate tests of association (for continuous vs continuous and continuous vs ordinal variables, Kendall’s tau b; for categorical vs continuous variables, logistic regression; and for categorical vs categorical variables, Cramer’s V). Among the correlated variables, we included those variables in the model that better explained the response. For example, all prey (wild prey plus livestock) was highly correlated to wild prey encounter rate (r = 0.812). Between the two, we selected all prey encounter rate since it was explaining higher deviance of the dependent variable. Similarly, NDVI and NDWI were moderately correlated (r = 0.636); we selected NDVI since it was explaining higher deviance.
So, after variable selection, the global model consisted of the following 12 variables at these scales: All prey encounter rate (50m), slope (50m), elevation (1200m), slope deviation (100m), NDVI (100m), shrub abundance (50m), canopy cover (30m), distance to water (1200m), livestock encounter rate (ct) (50m), human encounter rate (1200m), distance to road (100m), distance to village (100m) (Fig. 2).
We checked spatial autocorrelation among these variables using Moran’s I statistic, and it was found that many variables were spatially autocorrelated (Getis 2007; Moran 1950).
Spatial GAM
The Box-Tidwell test indicated that many of the variables did not meet the linearity assumption of logistic regression. Therefore, we could not use linear models for our predictive modelling. And since our aim was not to merely get accurate predictions but to identify the drivers of HTC, we did not use machine learning algorithms. Therefore, as discussed within the introduction, we selected GAM to model livestock kill locations as a function of various tiger relevant ecological factors (Wood 2017). To account for the spatial autocorrelation in the data, we constructed spatial GAM model using geoGAM package in R ver. 3.6.3. (Nussbaum and Papritz 2017). It’s a procedure to build a parsimonious model based on gradient boosting, smoothing splines and a smooth spatial surface to account for the spatial structure. The GAM for spatial data or geoadditive model in its full generality is represented by
$$\text{g}\left({\mu }\left(\text{x}\left(\text{s}\right)\right)\right)= {\nu } + \text{f}\left(\text{x}\left(\text{s}\right)\right)=$$
$${\nu } +{\sum }_{u}{f}_{{j}_{u}}\left({x}_{{j}_{u}}\right(\text{s}\left)\right)+ {\sum }_{v}{f}_{{j}_{v}}\left({x}_{{j}_{v}}\right(\text{s}\left)\right) . {f}_{{k}_{v}}\left({x}_{{k}_{v}}\right(\text{s}\left)\right)$$
$$+{\sum }_{w}{f}_{{s}_{w}}\left(\text{s}\right). {f}_{{j}_{w}}\left({x}_{{j}_{w}}\right(\text{s}\left)\right)+ {f}_{s}\left(\text{s}\right)$$
Where, \({f}_{s}\left(\text{s}\right)\) is a smooth function of spatial coordinates, which accounts for residual autocorrelation (Nussbaum and Papritz 2017).
Since the response variable, in this case, is binary, Bernoulli distribution is assumed, and logit link used
$$\text{g}\left({\mu }\left(\text{x}\left(\text{s}\right)\right)\right)= \text{log}\left(\frac{{\mu }\left(\text{x}\left(\text{s}\right)\right)}{1-{\mu }\left(\text{x}\left(\text{s}\right)\right)}\right)$$
where,
$${\mu }\left(\text{x}\left(\text{s}\right)\right)= \text{P}\text{r}\text{o}\text{b}\left[\text{Y}\left(\text{s}\right)= 1\right| \text{x}\left(\text{s}\right)]= \frac{\text{e}\text{x}\text{p}({\nu } + \text{f}(\text{x}\left(\text{s}\right)\left)\right)}{1+\text{e}\text{x}\text{p}({\nu } + \text{f}(\text{x}\left(\text{s}\right)\left)\right)}$$
For building parsimonious model geoGAM automatically selects factors, covariates and spatial effects using componentwise gradient boosting, following which model is further reduced using cross validation (Nussbaum and Papritz 2017).
We ran the final model so selected on test data, to get model performance measures, area under the curve (AUC) and true skill statistic. We performed all the analyses using R Statistical Software (v3.6.3; R Core Team, 2020).
Risk map
We created a raster with 42m cell size (average calculated from mean kill drag distance for tiger as reported by literature (Karanth and Sunquist 2000; Miller et al. 2015)) and masked it to the reserve boundary. We then converted it into points and, for each of these points, extracted the values for all the explanatory variables. We then ran the final selected model on this data to predict predation risk probability for each point. We then converted the points back to raster. Finally, we assigned risk predictions as the value of the raster to create HTC risk map.