Multiscale Spatially Explicit Generalized Additive Modeling of Human-Tiger Conict Risk

Spatial modeling of human-carnivore conict has recently gained traction and predictive maps have become a great tool to understand the distribution of present and future conict risk. However, very few such studies consider scale and use appropriate spatial modeling tools. This study aimed at understanding the ecological predictors of human-tiger conict and predicting livestock predation risk by reintroduced tigers in Panna Tiger Reserve, Central India, by modelling livestock kill as a function of various tiger relevant ecological variables, at multiple scales employing spatially explicit statistical tools. Methods


Introduction
Spatial modelling of con ict or predictive risk modelling has become one of the major tools to understand human-carnivore con ict (HCC) ( However, most studies employ aspatial models to predict predation risk using spatial correlates (Soh et al. 2013;Miller et al. 2015). Since most ecological variables exhibit a certain degree of spatial autocorrelation, therefore, it is important to account for the spatial nature of the data (Gri th 1992; Legendre 1993), when modelling predation risk by carnivores. In absence of which, 'independence of data points', a common assumption across most statistical models, is violated, leading to unreliable model outcomes (Legendre 1993;Dale and Fortin 2002). As there is a dearth of studies examining HCC at multiple scales employing appropriate spatial statistical models; in this study we have evaluated the ecological determinants of human-tiger con ict (HTC) and predicted livestock predation risk by tigers in and around Panna Tiger Reserve, Central India. Livestock kill was modelled as a function of various tiger relevant ecological variables, viz. prey, cover, water, and anthropogenic disturbance, at multiple scales employing spatially explicit Generalized Additive Model (GAM) .

Study Area
Panna Tiger Reserve (24°16 N to 24°42 N and 79°29 E to 80°16 E) covering an area of 1598.10 km² is The original tiger population in Panna was lost due to poaching and as a population recovery measure tigers were reintroduced from neighbouring reserves in 2009 ). Since then, the tiger population in Panna has increased exponentially from seven founders to around 60 individuals in 2020.

Methodology
Multivariate multiscale predictive modelling GAM while retaining the interpretability of generalized linear models have the exibility of machine learning algorithms, because it does not assume a linear relationship between dependent and independent variables (Hastie and Tibshirani 1990;Larsen 2015). GAM is, as the name suggests, a generalization of the linear model, in which the linear function of the covariate is replaced with a smooth function (Hastie and Tibshirani 1990). Because of their semiparametric nature, GAMs are much more sensitive to unique data distribution than GLM, allowing for modelling of nonlinear relationships by deriving predictor functions during model building (Härdle and Turlach 1992;Larsen 2015). At the same time to avoid over tting, one can control the smoothness or 'wiggliness' of the predictor function (Larsen 2015). Despite their versatility, GAM has not been explored as much as linear or machine learning models to understand the relationship between livestock depredation by carnivores and the environmental factors 2018). Given that the data for this study does not t the assumptions of linear models, and the primary aim is to not merely get accurate prediction but identify the drivers of HTC, GAM was selected, to model livestock kill locations as a function of various tiger relevant habitat factors (after determining appropriate scale) (Wood 2017).

Livestock kill quanti cation and kill site location
In India, domestic livestock depredation by wild carnivores is compensated by the forest department of the state. These livestock compensation records were collected from tiger reserve management to understand the intensity of HTC. But the compensation data was not geo tagged, hence the livestock kill data (that has GPS locations) which is collected by tiger monitoring teams in the reserve was also obtained, for the period 2009-2016. Since in case of Panna, there are several feral livestock, and no distinction is made between feral and domestic livestock in the kill records; kill and compensation data were matched, to obtain reliable locations for domestic livestock kill. Thus, 156 locations were matched for the period 2011-2016. For the purpose of modelling, livestock kill data was treated as presence, while random absence data was generated in GIS. Livestock kill data for the period 2011-2015 was used for training the model and 2016 data was used for testing the model.

Measuring ecological variables
A total of 27 variables were considered for ecological driver modelling, which can be grouped under the following heads I. Prey

II. Vegetation cover
Vegetation indices, viz. canopy cover and shrub abundance were quanti ed in 15m circular plots laid at every 400 m on the line transects (Jhala et al. 2009). A total of 234 circular plots were laid. Within the plots, ocular estimation of canopy cover was made and shrub cover was scored according to its abundance (0-4).

III. Water
Normalized Difference Water Index (NDWI) (McFeeters 1996) was calculated with raster calculator in ArcGIS 10.1 (ESRI 2012) using the same scenes as used for calculating NDVI, by the formula: NDWI = (Green-Near Infrared)/ (Green+Near Infrared). Additionally, drainage and water source data were obtained from the forest department and Euclidean distance raster created using Euclidean Distance tool in spatial analyst toolbox in ArcMap 10.4 (ESRI 2016a). Euclidean distance rasters were also created for Ken River and its tributaries, and water sources tagged perennial.

III. Topography ASTER Global Digital Elevation Model (DEM) data was downloaded from USGS Global Visualization
Viewer website. Slope tool in spatial analyst was used to calculate slope from DEM layers in ArcGIS 10.1 (ESRI 2012). Additionally, topographic ruggedness index or terrain ruggedness index (TRI) that measures elevation difference between a cell and mean of its eight neighbouring cells (Riley et al. 1999) was calculated using raster calculator in ArcGIS 10.1 (ESRI 2012), by the formula (Cooley 2016): TRI = SquareRoot (Abs((Square("3x3max")-Square ("3x3min")))).
IV. LULC and forest contiguity LULC prepared by Forest Survey of India for the entire country at 98m resolution for the year 2009 was procured (FSI 2009). Landscape characteristics and patterns, particularly habitat connectivity, were studied using multiple indices in program FRAGSTATS (ver. 4.2). FRAGSTAT analyses was run using the FSI LULC, three class-level metrics were calculated: PD, LPI (percentage of total landscape area comprised by the largest patch), and CLUMPY (a measure of fragmentation). LPI is a simple measure of dominance. And CLUMPY that ranges from -1 (patch type is maximally disaggregated) to 1 (patch type is maximally clumped) provides an index of fragmentation of the focal class that is not affected by changes in class area (McGarigal 2015).

V. Disturbance
Anthropogenic disturbance indices were quanti ed in the circular plots (as discussed earlier under section II). In each plot, all the lopped (only branches were cut) and cut (cut to stump) trees were counted.
Camera traps were deployed in 109 locations in 2x2 km grids within national park, accounting for 7459 trap nights. The number of tigers, livestock, humans, and vehicles captured in each camera trap was then manually counted and encounter rates (total no. of captures/total trap nights) calculated.
Village and road location data were also obtained from the forest department and Euclidean distance raster, calculated. Additionally, human footprint data was downloaded from Socio Economic Data and Application Centre (SEDAC) website for the year 2009 (Sanderson et al. 2002;Venter et al. 2018). Population census data for 2011 was also downloaded from SEDAC (Balk et al., 2020).
Geostatistical modelling to create rasters: Canopy cover, prey, human, livestock, and vehicle encounter rate, cutting and lopping intensity rates, were interpolated to create rasters, using the Geostatistical wizard in ArcMap 10.4 (Cressie 2015; ESRI 2016a). Four interpolation tools were considered for the purpose: Inverse Distance Weighing, Simple Kriging (SK), Ordinary Kriging (OK), and Empirical Bayesian Kriging (EBK). Statistical measures of correctness (mean prediction error, root-mean-square error, standardized root-mean-square error, average standard error) from the cross-validation were used to compare the kriging algorithms. The best model was considered the one that has the smallest root-meansquared prediction error (RMSE), standardized mean nearest to zero, the average standard error nearest the root-mean-squared prediction error, and the standardized root-mean-squared prediction error nearest to 1 (ESRI 2016b) (Online Resource 1).
Scale and variable selection: To construct a multiscale model, each of these variables (except human footprint, LULC and FRAGSTATS variable) were resampled at ve scales: 30m (the highest resolution available), 50m (average kill drag distance for tiger, as indicated by literature (Karanth and Sunquist, 2000;Miller et al., 2015)), 100m (midpoint between ne and coarse resolution), 350m (maximum kill drag distance from literature), 1200m (coarsest resolution used by us and at which most global environmental data is available). Thus, in total there were 118 variables. All the variables were then extracted for each of the presence/absence points using the Spatial Analyst toolbox of ArcMap 10.4.
For scale and feature selection, rstly, univariate logistic regressions were run, after performing Box-Tidwell procedure to test for logistic regression linearity assumption (Shin and Ying 1994). Secondly, univariate GAMs were run to understand how much r square/deviance was explained by each of the predictors at each of the scales. The scale at which the variable was best explaining the response was selected. Additionally, Information Value and Weight of evidence were employed for feature selection (Good and Osteyee 1974). Results of univariate logistic regression, univariate GAM, and Information value were studied to select the explanatory variables at appropriate scales for constructing the predictive model. The data was then checked for multicollinearity and spatial autocorrelation. To check for multicollinearity among the selected variables, we ran appropriate tests of association (for continuous vs continuous and continuous vs ordinal variables, Kendall's tau b; for categorical vs continuous variables, logistic regression; and for categorical vs categorical variables, Cramer's V). Among the correlated variables, selection for inclusion in the model was made based on which variable was better explaining the response. All prey (wild prey plus livestock) was highly correlated to wild prey encounter rate (r=0.812). Since all prey encounter rate was explaining higher deviance of dependent variable, it was selected. Similarly, NDVI and NDWI were moderately correlated (r=0.636), since NDVI was explaining higher deviance, it was selected. So, after variable selection, the global model consisted of the following 12 variables at these scales: All prey encounter rate (50m), slope (50m), elevation (1200m), slope deviation (100m), NDVI (100m), shrub abundance (50m), canopy cover (30m), distance to water (1200m), livestock encounter rate (ct) (50m), human encounter rate (1200m), distance to road (100m), distance to village (100m) (Figure 2). Spatial autocorrelation was checked among these variables using Moran's I statistic and it was found that many variables were spatially autocorrelated (Getis 2007;Moran 1950).

Spatial GAM
To account for the spatial autocorrelation in the data, spatial GAM model was constructed using geoGAM package in R ver. 3.6.3. (Nussbaum and Papritz 2017). It's a procedure to build parsimonious model based on gradient boosting, smoothing splines and a smooth spatial surface to account for the spatial structure. The GAM for spatial data or geoaddtive model in its full generality is represented by Where, f s (s) is a smooth function of spatial coordinates, which accounts for residual autocorrelation (Nussbaum and Papritz 2017).
Since the response variables in this case is binary, Bernoulli distribution is assumed and logit link used where, For building parsimonious model geoGAM automatically selects factors, covariates and spatial effects using componentwise gradient boosting, following which model is further reduced using cross validation (Nussbaum and Papritz 2017).
The nal model so selected was run on test data, to get model performance measures, area under the curve (AUC) and true skill statistic.

Risk map
A raster with 42m cell size was created and masked to the reserve boundary. It was then converted into points and for each of these points, values for all the explanatory variables were extracted. The nal selected model was then run on this data to predict predation risk probability for each of the points. The points were then converted back to raster. Finally, risk predictions were assigned as the value of the raster, to create HTC risk map.

Summary of ecological variables
Prey: All prey encounter rate (including livestock) was 2.98 ± 1.90 n/km (Figure 2). Wild prey encounter rate was 2.60 ± 1.71 n/km, and livestock encounter rate was 0.38 ± 0.61 n/km. Prey density estimate (including livestock) was 52.80 ± 7.10 n/km 2 for both winters combined. Wild ungulate prey density was 27.78 ± 2.61 n/km 2 while livestock density was 39.80 ± 10.90 n/km 2 within the CTH.
Cover: Mean canopy cover was 0.33 and shrub abundance 1.81 ( Figure 2). The mean value of NDVI was 0.15 ± 0.03 during summer and 0.22 ± 0.06 during winter (Figure 2).
Water: NDWI during summer was -0.09±0.04, and during winter 0.01±0.07. Euclidean distance to water source including ponds and water falls but excluding river at 1200m resolution was 1378.21m±906.25 ( Figure 2). Mean distance to river (at 30m resolution) was 4339.78m±3655.48.
Topography and contiguity: Mean elevation and slope were 350.69±83.84 and 9.58±9.93, respectively ( Figure 2). The TRI value was 63.68±32.82. As for FRAGSTATS class-level metrics, PD was highest for open forest. LPI was highest for moderately dense forest. Surprisingly clumpiness was highest for nonforest class and lowest for scrub.

Ecological predictors of HTC
The variables selected to be included in the nal geoGAM model were, all prey encounter rate (50m), elevation (1200m), NDVI (100m), shrub abundance (50m), and human encounter rate (1200m). Among these smooth terms, all prey encounter rate and shrub abundance were signi cant at α=0.05 level, and elevation was signi cant at α=0.1 level ( Table 1). The effective degree of freedom (edf) is higher than 3 for most of the smooth terms, indicating that the wiggliness is high, and relationships, nonlinear (Table   1). Even more is revealed by examining the partial effect plots of smooth terms also called rug plots. A partial effect plot shows the effect of an explanatory variable on the response variable after accounting for the effects of all the other variables included in the model. Upon examination of the partial effect plot for all prey encounter rate, it is revealed that it has an inverse relationship with log odd of livestock kill i.e., the odds of livestock kill by tiger are higher when prey is low (Figure 3 a). In case of shrub abundance, a unique trend was observed, log odds of livestock kill increases with shrub abundance but only till it reaches a certain mark, after which increase in shrub abundance seems to reduce odds of livestock kill (Figure 3 b). In case of NDVI, human encounter rate and elevation, as also indicated by their chi square p values, do not seem to have a signi cant relationship with odds of livestock kill (Figure 3 c, d, e).
The deviance explained by the model was 44.4% and AUC of the model was 0.91. When run on test dataset the model accuracy was calculated to be 0.65 (Table 2) and AUC was found to be 0.70, indicating that the model had fair amount of prediction capability. Habitat factors that structure the carnivore use of an area are likely to dictate livestock kill by the carnivore and, thereby, HCC. Preferred habitat parameters for tiger have been identi ed mainly as high prey density, forest contiguity, thick understory, proximity to water, and low human disturbance (Miquelle et al. 1999;Karanth and Sunquist 2000;Sunarto et al. 2012). Among these, past studies have linked HTC with tree cover, elevation/altitude, slope, aspect, proximity to reserve forest, proximity to water, distance to village, distance to road, and density of livestock, settlements, and roads (Li et  Our spatial modelling revealed that the potential ecological drivers of livestock depredation by tigers in Panna Tiger Reserve, were prey, and shrub, at ne scale. The model suggests that when prey encounter is low at ne scale (50m), i.e., tiger encounters less prey, it is more likely to predate upon domestic livestock.

Discussion
Low availability of prey has been linked with livestock depredation by carnivores including tiger (Fritts et al. 2003 . Thus, in predator-occupied habitats where there is low availability of wild prey, if the optimal foraging theory (large prey, high in abundance, easy to catch) is applied (Emlen 1966;MacArthur and Pianka 1966;Werner and Hall 1974), the tiger kills what it can with least effort i.e., livestock.
The second explanatory variable is shrub abundance, which is also selected at the same scale (50m) as prey. Shrub abundance seems to have a unique relationship with livestock kill, resulting in increase of livestock kill up to a certain point after which increase in shrub abundance is actually decreasing the odds of livestock kill. Although it is di cult to explain such a complex relationship, but it can be examined in the light of predation technique of tigers. Tiger is an ambush predator, therefore in areas where there is very low cover, it might be very di cult to make a kill, but in areas where cover is high the chances of success may improve (Greene 1986;Murray et al. 1995;Karanth and Sunquist 2000;Sunquist 2010). Studies done on tiger and other carnivores have also found that livestock predation risk was higher in habitats with high shrub density, because it provides cover for these predators (Davie et al. 2014;Miller et al. 2015). However, if the cover is too dense it is also less likely for grazers like livestock to venture into such patches because they would be devoid of grasses. Thus, making the relationship curve between livestock kill and shrub cover, bell-shaped. Livestock kill by tiger is thus a culmination of predator choice and foraging tactics, and, prey vulnerability and defense mechanism.
Studying ecological drivers of HTC in Panna Tiger Reserve, therefore revealed that in predator-occupied habitat if prey availability is low at ne scale, domestic livestock availability is high and ambush cover is available, the odds of predator depredating livestock become high.

Predation risk map
The risk map produced using spatial modelling shows that the risk is higher in the south eastern part of the reserve encompassing Panna Range and parts of Gehrighat Range (Figure 4).

Summary and recommendation
Ecological drivers of human-carnivore con ict are complex and scale dependent, with the likelihood of con ict being high in large carnivore habitat that has low prey encounter and in ux of domestic cattle. In case of Panna, it is suggested that mitigation efforts should be focussed on the administrative units agged as high risk by our study. Furthermore, detailed study should be conducted to understand lower availability of wild prey and higher availability of livestock in certain parts of the reserve, based on which, where required and deemed feasible, prey augmentation should be considered.

Declarations
Funding: This study was funded by National Tiger Conservation Authority (NTCA), India and Madhya Pradesh Forest Department, India Competing interests: It is stated that there is no con ict of interest among authors, funding agency, or with any other party. The authors have no relevant nancial or non-nancial interests to disclose.   Figure 1 Map depicting the study site i.e., Core area of Panna Tiger Reserve and 2 km natural buffer around it Maps depicting interpolated layers of variables included in the global model for livestock depredation, at selected scales: a) All prey encounter rate (50m), b) shrub abundance (50m), c) canopy cover (30m), d) slope (50m), e) elevation (1200m), f) slope deviation (100m), g) NDVI (100m), h) distance to water hole (1200m), i) distance to road (100m), j) distance to village (100m), k) human encounter rate (1200m), l) livestock encounter rate (ct) (50m) shrub abundance (50m), c) elevation (1200m), d) NDVI (100m), e) human encounter rate (1200m). The dashed line represents 95% con dence interval and the lines on x axis represent the frequency of data