MODIS-FIRMS and ground-truthing-based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms

Wildfires in limited extent and intensity can be a boon for the forest ecosystem. However, recent episodes of wildfires of 2019 in Australia and Brazil are sad reminders of their heavy ecological and economical costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating it. In this study, 15 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods like Generalized Linear Model, Support Vector Machine, Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm in wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF has outperformed, followed by GBM in the prediction. Also, environmental features like average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.


Introduction
The forests of Sikkim are part of the Eastern Himalayan biodiversity hotspot. It is home to a variety of rare and endemic species of flora and fauna (Arrawatia and Tambe 2011;Paul et al. 2005). The pressure of climate change, deforestation, development and overgrazing is a growing challenge for the conservation of this fragile ecosystem. These pressures have a direct impact on the wildfire regime of Sikkim Himalaya. For instance, from 2004 to 2009 in Sikkim, the drought-like conditions and index of hotness showed an increasing trend, 1 3 while the number of rainy days was below average. These trends indicate the impact of climate change in Sikkim (Arrawatia and Tambe 2012;Sharma and Thapa 2021;Banerjee et al. 2020;Dong et al. 2017;Kumar 2012;Pradhan and Badola 2015). Moreover, the drier winter season converts the deadwood and forest litter into potential fuel for wildfires. Studies indicate that wildfires are most common in the low elevation Sal forests of Sikkim, followed by temperate sub-alpine and coniferous forests (Arrawatia and Tambe 2011;Sharma et al. 2014).
Lightning is the main cause of wildfires in the sparsely populated North Sikkim (Sharma et al. 2014). In contrast, the high incidents of wildfire in the Oak and Sal forests of East, South and West districts of Sikkim are mainly due to human activities. Slash and burn farming, forest fire to deter wild animals from entering settlements and bonfires in the forest areas are the intentional causes of wildfires in Sikkim. On the other hand, car sparks, power transformers in forest areas, use of the traditional torch called Ranku, throwing live cigarettes/bidis are the accidental causes of wildfires (Sharma et al. 2014). To date, very limited studies have been done to map the likelihood of wildfires in Sikkim Himalaya. Identifying the wildfire hotspots and factors contributing to it is the first step towards disaster mitigation. A Multi-Criteria Decision-Making Technique showed that all the districts of Sikkim except for North Sikkim district are at higher risk of wildfire. Furthermore, dense forests of Sikkim are prone to wildfires due to human activities and the aspect of the area. The model accuracy was 82.36% (Laha et al. 2020). MaxEnt machine learning-based prediction of wildfires in Sikkim Himalaya indicates that proximity to roads, the fraction of tree cover and meteorological conditions were the major determinants of wildfire events. The model accuracy was 95% (Banerjee 2021).
Wildfires can be considered a double-edged sword. Uncontrolled, large-scale and frequent wildfires can inflict colossal damage to a forest ecosystem. For instance, wildfire destroys the wildlife habitats (Pastro et al. 2011;Haque et al. 2021), causes atmospheric phosphorus deposition in the local water bodies (Vicars et al. 2010), mobilizes heavy metals (Campos et al. 2015), promotes leaching of soil nutrients (Murphy et al. 2004), interferes in the mobility of wildlife (Banks et al. 2011), promotes the loss of soil biota, increase the volatilization of soil nutrients, soil erosion, and causes decline in biodiversity and forest biomass (Chandra and Bhardwaj 2015;Parashar and Biswas 2003). Long-term impacts of ecosystem disturbances including wildfire events can substantially change the nutrient composition of the soil. This can have profound ecological and functional impacts (Bowd et al. 2019). Wildfires have a differential impact on the mortality of plants depending on the species and size of the vegetation (Trouvé et al. 2021). Perhaps, such a wildfire-induced selection process can have a long-term impact on the species composition and overall health of the forest ecosystem. The increasing trend of wildfires in the Himalayan forests induced by climate change is acting as a positive feedback loop as they are accountable for the large-scale emission of greenhouse gases (Sannigrahi et al. 2020). Smoke generated from the wildfires can be a major health hazard. This health impact can have a wide geographic cover depending on the spread of the wildfire, population distribution and quality of health services (Cascio 2018). Health hazards associated with exposure to wildfires have high economic impacts in the form of public health liability in terms of premature deaths and respiratory diseases (Fann et al. 2018). Furthermore, a likely association between wildfire and the psychosocial health of children, adolescents and family have been observed (Kulig et al. 2019).
In contrast, periodic and relatively smaller scale wildfires do benefit the forest ecosystem. The release of nutrients from the burnt biomass into the soil improves the fertility of the vegetation. This is reflected in the increase in the abundance of grazers, rodents and birds in the forest. Also, wildfires increase standing biomass over the years in an undisturbed forest ecosystem (Lowe et al. 1978). Fire and Landscape Ecology Assessment Tool (FLEAT), a modelling tool to assess whether wildfires benefit or harm an ecosystem, suggests that wildfire has great ecological benefits (Keane and Karau 2010). Some of these benefits are echoed as ecological services to mankind. For instance, wildfires help in pest control, enhances flowering, pollination and germination (Pausas and Keeley 2019). In some cases, mixed-severity wildfires can be beneficial to some species. For instance, the wildfire-induced opening of habitat patches has promoted foraging by the spotted owls (Strix occidentalis) that has led to a significant increase in their abundance in the USA (Lee 2018).
Wildfire is governed by a wide range of environmental features. Studies indicate that meteorological features like atmospheric temperature, wind, precipitation, humidity and lightning events are good predictors of wildfire. Topographic features like elevation, slope, aspect, topographic wetness index, topographic roughness index and plan curvature have been widely used in preparing Wildfire Likelihood Map (WLM). In-situ features like soil type, soil moisture, land surface temperature, land use, potential evapotranspiration and soil carbon content have also been used in wildfire prediction. Ecological features like vegetation type, normalized difference vegetation index (NDVI), tree cover fraction, standing biomass, fuel biomass and proximity to water bodies are some of the accepted ones in preparing a WLM. Wildlife prediction studies have also used proximity to roads, proximity to settlements and population density as anthropogenic factors for wildfire occurrences (Arpaci et al. 2014;Estes et al. 2017;Flannigan and Harrington 1988;Guo et al. 2016;Jaafari et al. 2018;Kim et al. 2015a, b;Ljubomir et al. 2019;Sachdeva et al. 2018;Sharma et al. 2014;Tien Bui et al. 2019;Yathish et al. 2019).
Effective allocation of resources is a crucial step towards appropriate wildfire mitigation (Gheshlaghi et al. 2020). This issue becomes even more pressing with the increasing trend of wildfires due to climate change (Flannigan et al. 2000;Gillett et al. 2004;Williams et al. 2019). In most cases, authorities are effective in controlling wildfire. However, a small fraction of wildfires does get accidentally overlooked by the authorities. These wildfires can inflict substantial damage to the forest ecosystem as well as the local economy unless prior knowledge of the likelihood of wildfires is available to the stakeholders (Taylor et al. 2013). WLM provides the spatial probability of the occurrence of wildfires over a study area. Such a likelihood map can be prepared by the criteria-based overlay of environmental feature maps that influence wildfires.
Expert opinion-based multicriteria decision analyses like Analytic Hierarchy Process (AHP) and Analytical Network Process (ANP) have been used to prepare WLP. These methods rely on constructing a hierarchal structure of the model criteria and alternatives. Pairwise comparison of the criteria at each hierarchal level and that of the alternatives yield the criteria weight and the importance of alternatives in the context of the model (Banerjee et al. 2020;Yathish et al. 2019;Ljubomir et al. 2019;Regodic et al. 2018;Gheshlaghi et al. 2020;Goleiji et al. 2017). Other decision-based methods like fuzzy logic and fuzzy AHP have also been widely applied in preparing WLPs (Garcia-Jimenez et al. 2017;Gheshlaghi et al. 2020). Hybrid methods involving analytical network process and fuzzy logic have been applied in WLMs with fair success (Gheshlaghi et al. 2020). However, expert opinion and fuzzy logic-based methods have an innate limitation of subjectivity in the decision process. Moreover, such methods cannot handle a relatively large number of criteria as well as logical conditions. This is primarily because the comparison of criteria and logical conditions inflate rapidly with the increase in the criteria set. Also, unlike machine learning, expert opinion-based decision methods do not learn from the dataset (Behrooz et al. 2018).
Machine learning is a subdiscipline of artificial intelligence, that can learn from the dataset available, provided the data are sufficient and representative of the population under consideration (Géron 2017;Mitchell 1997). A range of popular machine learning methods has been widely applied in wildfire likelihood mapping (Banerjee 2021). For instance, logistic regression, a relatively simple machine learning method, has been widely successful in predicting wildfires in several studies (Guo et al. 2016;. Decision tree-based methods like Random Forest (RF) and Gradient Boosting Method (GBM) have been equally successful in wildfire predictions. The success of both methods is due to their simple approach of iterative dichotomization of the feature space with a tuning criterion that minimizes the cost of false prediction of the target variable (Arpaci et al. 2014;Chirici et al. 2013;Guo et al. 2016;Kim et al. 2019;Leuenberger et al. 2018;Massada et al. 2013;Sachdeva et al. 2018;Tehrany et al. 2019;Xie and Peng 2019). Support Vector Machine (SVM) is another machine learning method that has fared well in wildfire prediction in several GIS-based studies. SVM performs prediction by attempting to maximize the margin between the clusters of elements of the target variable in the features space. Here, feature space represents the hypervolume of environmental features used for predicting wildfire while the target variable is a binary set of presence and absence of wildfire events (Al_Janabi et al. 2018;Tehrany et al. 2019;). Brain mimicking machine learning methods like multilayer perceptron, deep learning and convolutionary neural network have been applied in wildfire prediction with high success. Collectively these methods are called Artificial Neural Network (ANN). ANN uses various neural architectures that iteratively adjusts the weight of the nodes that represent neurons of the simple brain. The nodes explain a certain aspect of the model features. ANN performs predictions by adjusting the weights of the nodes in a way that minimizes the cost of false prediction (Al_Janabi et al. 2018;Satir et al. 2016;Tien Bui et al. 2018Xie and Peng 2019;Zhang et al. 2019). The popularity of machine learning is based on its ability to automate the process of prediction. Also, it progressively improves upon its learning through successive exposure to new datasets. Furthermore, it objectively identifies trends and patterns. These merits come at a cost of high computational time and the data-greedy nature of machine learning. Moreover, methods like ANN are difficult to interpret due to the inner complexity of their architecture.
In this study, an attempt has been made to prepare the WLM of Sikkim Himalaya using various machine learning methods. The overarching objectives of this study include the preparation of WLM of the study area based on a comparative analysis of the machine learning methods. Secondly, the identification of the environmental features influencing wildfires in the study area. This study is most likely the first attempt to prepare the WLM of Sikkim Himalaya using a comparative analysis of machine learning methods. It presents a high-resolution WLM of Sikkim Himalaya that can significantly facilitate in the identification of wildfire hotspots. Accordingly, a robust wildfire management system can be developed by the state as well as the civic authority towards efficient resource allocation, early warning systems and awareness programmes. The study shows that roadways are the most important determinant of wildfires in Sikkim followed by meteorological factors like wind speed and ambient temperature.

Study area
Sikkim is the second smallest state of India situated in the north-eastern hills of Himalaya. It is neighboured by Tibet in the north, Bhutan in the east, West Bengal in the south and Nepal in the west. Sikkim is richly endowed by nature in terms of rugged mountainous terrain and a wide variety of endemic flora and fauna. Also, Sikkim is home to a vibrant collection of indigenous cultures and tribal communities.
Sikkim extends from 27° 00′ 46′′ N to 28° 07′ 48′′ N in latitude and 88° 00′ 58′′ E to 88° 55′ 25′′ E in longitude. Altitude-wise Sikkim varies from 280 m in the south to 8586 m in the north. The north of Sikkim is covered by the Great Himalayan range soaring to the world's third-highest peak, Mt. Kangchenjunga. The two main rivers of Sikkim include the Teesta River and its tributary, the Rangeet (Shukla et al. 2018).
The climate of Sikkim in addition to having winter, summer, spring and fall seasons, has a monsoon season that lasts from June to September. The dry period of Sikkim lasts from December to March. It is characterized by cold, dry and windy conditions. Much of the wildfires in Sikkim occur during this dry period. Overall, Sikkim has a subtropical climate in the south and a tundra climate in the north.
Different types of vegetation ecotype populate Sikkim, such as the Himalayan subtropical broadleaf forests dominate the lower elevations, Eastern Himalayan broadleaf forests populate the temperate zone above the altitude of 1500 m, Eastern Himalayan subalpine conifer forests grow from 3500 to 5000 m; and Eastern Himalayan alpine shrub and meadows in the higher elevations (O'Neill 2019). In terms of human presence, the bulk population of Sikkim reside in the southern and eastern parts with an average population density of 86 km −1 (COI 2011). This becomes evident by realizing that much of the road network and agrarian land dominate these areas of Sikkim (Fig. 1).

Data sources and data processing
Supervised machine learning algorithms require a dataset, often in the form of a table for learning. The table consists of columns and rows. The columns represent attributes or features based on which the predictions are made by the algorithm. Apart from the feature columns, a target or response variable column is also included. The target variable is the output that the algorithm learns to predict. The rows of the table, known as the instances, represent a set of features and their corresponding target variable. In this study, several environmental attributes were considered encompassing meteorological, topographical, ecological, in situ and anthropogenic features (Devisscher et al. 2016;Ghorbanzadeh et al. 2019;Joseph et al. 2009) (Table 1).
Meteorological factors like above-ground air temperature, precipitation and wind speed are important determinants of wildfire. Higher temperature, low precipitation and high wind speed facilitate wildfire occurrences (Flannigan and Harrington 1988;Mhawej et al. 2015). Topographical factors like elevation influence meteorological factors. At higher elevations, the wind speed increases while precipitation and temperature tend to decrease. Also, a steeper slope facilitates the spread of wildfire. Aspect influences the amount of insolation and the humidity of the biomass. Studies have shown that wildfires are more common in the south to the southwest direction (Graham et al. 2004;Jo et al. 2000;Mhawej et al. 2015). Topographic Wetness Index (TWI) is a measure of potential soil wetness (Krawchuk et al. 2016). A low TWI indicates drier soil, a potential facilitator of wildfires. Also, the curvature of the terrain influences convection, radiation and the transport of burning material (Hilton et al. 2017). All the topological features in this study were prepared from the DEM using ArcGIS. Ecological features like NDVI indicate the health of vegetation and potential fuel biomass. Areas with moderate tree cover are more vulnerable to wildfires (Zhang et al. 2019). Proximity to water 1 3 bodies indicates the level of soil moisture and human disturbances. In situ conditions like low soil moisture promotes wildfire (Krueger et al. 2015). Also, soil surface carbon content can be considered as a representative of the litter content of the forest floor. Dry litter acts as fuel to wildfire. Anthropogenic factors like proximity to the road network, settlements and population density were considered in this study. These features substitute for human-induced activities such as sparks of vehicle engines. Other wildfires inducing activities are, slash and burn farming and burning of forest vegetation to deter wildlife from entering settlements. Population density has a strong relation with recreational activities in the forested areas such as bonfires, deforestation for firewood and timber. Also, population density partly explains illegal forest exploitation due to unemployment (Sharma et al. 2014).
The target variable in this study was the presence or absence of wildfires. The historical wildfires dataset of Sikkim Himalaya was prepared using two data sources. The ground-truthing-based dataset was procured from the Forest Environment and Wildlife Management Department (FEWMD), Govt. of Sikkim. The timeframe of FEWMD ranged from 2016 to 2018. The remote sensing-based fire events dataset was accessed from the Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) data archive at the Fire Information for Resource Management System (FIRMS) site (FIRMS 2020). The coarse resolution of 1 km of MODIS is complemented by the relatively finer resolution of 375 m of VIIRS. The fire dataset of Sikkim Himalaya accessed from FIRMS ranged in the timeframe from 2000 to 2019. However, the dataset of FIRMS does not distinguish wildfire from   Fig. 1). The FIRMS fire dataset was intersected with the forest cover raster to include the fire events restricted within the forest cover area. In this way, only the wildfires were identified from the FIRMS dataset. Furthermore, to prevent double-counting from the data sources, wildfires of FIRMS within 200 m proximity from FEWMD were dropped from the dataset. The groundtruthing-based dataset and modified FIRMS dataset were combined to prepare the final wildfire dataset of 754 events. All the feature rasters and the wildfire dataset were projected from the geographic projection system of GCS-WGS-1984 to the plane projection system of WGS-1984-UTM-Zone-45N in the GIS framework. The latter provides an appropriate measurement in the metric system for India. The cell size and extent of all the feature rasters were standardized to be the same. Next, the rasters were exported as GeoTiff rasters from the GIS framework and imported into the R-programming framework for feature extraction and machine learning. The feature rasters were transformed into unitless rasters by normalization method (Chang 2017): where the pixel value z i is the normalized value of a raster ranging from 0 to1 at the ith location of the study area extent. x i is the pixel value of the ith location before the normalization. x min is the lowest value, while x max − x min is the range of the raster.
Apart from the wildfire-presence dataset, an equal number of the wildfire-absence dataset was prepared within the study area extent using random sampling. The two datasets were combined to prepare the presence-absence dataset. Next, the feature rasters were stacked and the presence-absence dataset was used to extract the feature values from the feature stack to populate the presence-absence dataset with the features (Supplement 1). The presence-absence dataset was used to train various machine learning methods. For this, the target variable was treated as a categorical variable while the features were treated as continuous variables. The dataset was split into training and testing subsets with 75% devoted to training and the rest for testing the prediction model (Boutaba et al. 2018;Géron 2017;Luo et al. 2017). There is no hard and fast rule of splitting the dataset for training and testing. Usually, the percentage splitting of the dataset can be arrived at by a trial-and-error method based on the total size of the dataset or by following the prevailing practices in machine learning research. In this study, the latter option was adopted. Furthermore, the training dataset was split into ten subsets for cross-validation. During cross-validation, repeatedly, one of the subsets of the ten training subsets was used to test, while the others were used to train the machine learning algorithm. The average of the errors generated by the repeated tests made it possible to adjust the parameters of the model for better ML performance.
Four machine learning methods were considered for this study, namely, generalized linear model (GLM), SVM, RF and GBM. The selection was based on their prevalence in the disaster prediction mapping literature, relatively higher efficiency and efficacy in predictions and better model interpretability. (1)

Multicollinearity analysis
Studies suggest that a wide range of environmental features can influence wildfire depending on the area of interest. However, identifying the environmental features specific to the present study area was the key step for robust prediction modelling. For this, correlation matrix and multicollinearity analysis were performed (Argyrous 2011). In multicollinearity analysis, the Variance Inflation Factor (VIF) exceeding 10 are used to identify multicollinear variables. This indicated data redundancy of the dataset. The quality of the dataset was improved by dropping one or more of such feature variables and rechecking multicollinearity criteria. Once the multicollinearity criterion was met the dataset was fit for further analysis.

Generalized linear model
GLM is an umbrella term used for linear regression models that are characterized by certain properties. Firstly, the dependent or the target variable of the regression model belongs to the exponential family of the probability distribution. In other words, the predictor behaves nonlinearly with the change in the model parameter. Secondly, the predictor itself is linear. Thirdly, there exists a link function that yields the predictor value when provided with appropriate covariate features of the model. GLM is expressed as (McCullagh and Nelder 1989): where E(⋅) is the expected value of the matrix of dependent variables Y. g is the link function that takes X as the matrix of covariate features and vector of parameters . The predictors, as well as the covariates in GLM, can be continuous or categorical. GLM includes regression models like linear regression, ANOVA, logistic regression, log-linear regression, Poisson regression and multinomial response regression. In the case of a binomialdependent variable, like Y = {0, 1} , GLM takes the form of the logistic regression model. The sigmoidal function used for logistic regression generates an S-shaped probability distribution between 0 and 1: where = 0 , 1 , … , n is the vector of model parameters and x = 1, x 1 , … , x n is the vector of predictor features of an instance from the dataset. The superscript T represents the transpose of a vector. For training of the model, a cost function was used as (Géron 2017): The model was fit to the training dataset using the maximum likelihood method by minimizing the convex log loss function: where M is the size of the training dataset. P (i) = P y (i) |x (i) as given in Eq. (3) and superscript (i) is the index of the instance of training. The partial derivative of the log loss function over all the parameters gives the gradient descent for the model to reach the solution.

Support vector machine
In SVM a vector of features x = x 1 … x M and their corresponding target y, pair up to make the instances (x,y). SVM is a supervised machine learning method that partitions the M-dimensional feature space by a decision boundary. The decision boundary is generated by the inner products of a set of feature vectors called support vectors that defines the partition. The decision boundary can be expressed as: where h(x) T is a vector of basis functions defined as features and N instances. A basis function explains the nonlinear relationship of the feature vector x i with the regression model f (x) . is a vector of coefficients of the corresponding basis function and 0 is the intercept. The decision boundary that maximizes the margin of the partition was considered for regression. Equation (6) can be amended to include a kernel trick K(·) that solves for the hypothesis f (x) by using the Lagrange dual function: where ̂i is a positive constraint. The kernel K x, x i is the generalized form of the inner product of a feature point and support vectors expressed as ⟨ h(x), h x i ⟩ . Depending on the nature of the SVM algorithm, the kernel type is selected. In this study, the radial basis function (RBF) kernel has been considered: Considering a two-class regression, like the wildfires, if the feature point x in the feature space occurs close to a support vector, then the RBF will be approaching one. This will lead Eq. (7) to the prediction of the likely value of f (x) =ŷ . The parameter of RBF controls the Gaussian distribution of RBF. The hypothesis was directed towards the solution by minimizing a cost function: Considering y i = 1 and correct prediction made by the hypothesis, i.e. f x i = 1 during the training of SVM, Eq. (9) reduces to minimizing the term 2 ‖ ‖ 2 . This second term of Eq. (9) was mainly to maximize the margin of the decision boundary. The hyperparameter controls the smoothness of the decision boundary.
Thus, through iterations, the SVM constructed a decision boundary in the feature space using a kernel function and by minimizing the cost function of misclassification. A test instant was regressed, based on its location with respect to the decision boundary and proximity to the support vectors. A detailed account of this discussion can be found in Hastie et al. (2017) (

Gradient boosting model
GBM uses ensemble learning that progressively does weighted stage-wise addition of weak learners in the form of regression trees to compensate for the limitation of the existing weak learners. This process eventually generates a strong learner. A regression tree partitions the feature space into regions R m , m (1, … , M) based on a split criterion that minimizes the sum of squares of the mean of the region from its corresponding target value. The splitting criterion can be the minimization of entropy of information or the Gini index. The outcome of the partitioning is expressed as: where y m is the mean of the target of the R m . Consider T(x;Θ) to be a tree learner, defined by an instance x and Θ as the parameterized form of the expected loss function. Furthermore, a loss function L y i , f x i is constructed to estimate the deviation of f x i from y i . GBM approaches the solution by minimizing the squared sum of the tree learner T x;Θ t from negative of the gradient of the loss function g im for the ith instance in the mth region at each iterative stage: where Equation (12) expresses the slope of the gradient descent of the loss function over the hypothesis space. The GBM solves by taking the steepest gradient descent g im using a learning rate i . This is followed by updating f m (x) giving more weight to trees that give higher Θ , as given in Eq. (13). As a result, the boosting process focuses more on residual errors: The final boosted tree is the weighted sum of all the trees: Thus, GBM starts the iterative ensemble process by initially constructing a weak learner regression tree and compares it with the target. This is followed by the stepwise construction of another such tree. But this time the focus of the algorithm is more towards the residual error. This is achieved by giving more weightage to the residual error. In this way, the progressive learner becomes relatively better than its predecessor. A loss function is used to track the learning success of the trees. GBM fits the residual with the steepest gradient of the loss function to boost the learning process. Eventually, as the learning process approaches a certain tolerance level, a weighted sum of all the weak learners yields a strong learner for testing and regression. In this study, a stochastic GBM was applied that used the Gaussian loss function for training. The stochastic GBM takes a subsample of the training instances. This prevents overfitting, speeds up the computation and helps in the regularization of the model. A more detailed discussion on this topic can be found in Hastie et al. (2017) and Natekin and Knoll (2013).

Random forest model
Random forest is another form of ensemble-type machine learning method that uses regression trees for the solution set. In this method, several regression trees are constructed by selecting a subsample from the dataset by bagging method as discussed in GBM. Furthermore, instead of taking the entire set of features, a subset of the same is randomly selected by the algorithm, usually √ p , where p is the number of features of the model. A small subset of the feature set helps in reducing the variance of the average of the predictions of the trees. This increases the accuracy of the prediction. After that, the best regressor feature and splitting value are selected that minimizes the impurity or cost function. The feature space is dichotomized using the splitting value. This creates two daughter nodes of the decision tree. This process is iterated till the minimum number of terminal nodes of the tree is reached and the error reduction reaches a plateau. The value of the target is estimated as: where B is the total number of regression trees constructed. T x;Θ b is the tree defined by the instance x and loss parameter Θ . Like GBM, RF also enjoys high accuracy in predictions (Géron 2017;Hastie et al. 2017).

Model performance
The performance of a model was assessed by estimating how far was the prediction of the model as compared to the actual observation. In this study, several indices were considered to cover all the aspects of the model performance (Ali et al. 2021;Pham et al. 2020Pham et al. , 2021Tuyen et al. 2021;Zhang et al. 2021). They include Root Mean Square Error (RMSE), Mean Absolute Error (MAE): where f (⋅) is the hypothesis of actual observation y of ith instance from a total of m-sized test set of the dataset matrix X. Ideal value for both the performance criteria is zero. However, RMSE is more sensitive to outliers than the MAE (Géron 2017).
Furthermore, two widely accepted model performance criteria have been used in this study, namely, confusion matrix and Receiver Operating Characteristic (ROC) Curve. The confusion matrix is a tabular representation of correctly and incorrectly classified instances, based on the comparison of the predicted and observed values of the target variable of the test set. Typically, in a binary classification problem, the confusion matrix has four elements. The first element of the primary diagonal of this matrix holds the True Positives (TP), the presence-instances that have been correctly classified as positives. The second element of the primary diagonal holds the True Negatives (TN), the absence-instances that have been correctly classified. In contrast, the first element of the secondary diagonal is the False Negatives (FN), the presence-instances, that have been incorrectly classified as an absence. The second element of the secondary diagonal is the False Positives (FP), the absence-instances, that have been incorrectly classified as presence. The overall accuracy (OAC) of the model is estimated as: An accuracy value not less than 0.7 is usually considered a good prediction. Other relevant performance indices include Sensitivity, Specificity, Precision, Cohen's Kappa and F1 score. Sensitivity is the measure of the proportion of presence-instances that have been classified as positives (Eq. 17). Specificity is the measure of the proportion of absence instances that have been classified as negatives (Eq. 18). Precision is the measure of the accuracy of positive predictions of the model (Eq. 19). Cohen's Kappa is the measure of the deviation of the relative predicted agreement by the model from the hypothetical probability of chance agreement (Eq. 20). F1 score is the harmonic mean of precision and sensitivity values (Eq. 21). It holds a higher value only if both precision and sensitivity are high: where P obs is the proportion of pixels correctly classified as wildfires or as non-wildfires, and P exp is the proportion of pixels for which the amount of agreement is expected by chance only.
The ROC curve is used to visualize the change in sensitivity over the specificity of the prediction model. A perfect prediction by a model should yield an ideal tuple of (1,1) implying perfect sensitivity and specificity. Usually, a good model generates a concave ROC curve with respect to the diagonal connecting (1,0) to (0,1) for the tuple (sensitivity, specificity), with a high value of area under the curve (AUC).
A comparative analysis of the performance of the machine learning methods was done using the box and whisker plot and scatter plot matrices. To do this, 30 samples from the precision × sensitivity precision + sensitivity dataset were selected using the cross-validation method. These plots provided visualization of the similarities between the performances of the machine learning methods.

Wildfire likelihood model
In this study, several R-packages were used for data pre-processing and machine learning (Hunt 2020;Kuhn 2020;Liaw and Wiener 2002;Robin et al. 2011). These packages were used for training and testing the machine learning algorithms. Furthermore, machine learning algorithms also provided the importance of the feature variables used in the model. Features with no importance to the model were dropped from the dataset and algorithms were rerun. This was followed by stacking of the feature rasters. The stack was used by the machine learning algorithms for predicting the wildfire probability over the entire study area. The matrix of predicted values of the entire study area was exported as a GeoTiff raster, that stores the predicted values along with their respective latitude and longitude values. The rasters of predicted values of wildfire were imported into the GIS framework for further analysis that included the categorization of the raster into areas of very low, low, medium, high and very high probability of wildfire (Fig. 2).

Results
The wildfire inventory dataset was used to analyse whether wildfires were on the rise in Sikkim Himalaya. Time series data of wildfires over the years 2000 to 2019 indicated that there was a growing trend. The picture became clearer by forecasting the wildfires using Holt's forecast model. The model predicted that the wildfires were likely to increase from 82 in 2019 to 96 in 2022 with an uncertainty of ± 62.343 events (Fig. 3). To identify the

Multicollinearity analysis
Based on the prevailing literature, initially, 16 environmental features were considered for the prediction. However, multicollinearity analysis brought the number of explanatory environmental features down to 15 (Fig. 4, Table 2).

Impact of environmental features on wildfires
Wildfires showed more propensity over certain intervals of the ranges of environmental features. Except for TWI, they were mostly normally distributed over the topographical feature. For instance, in aspect, wildfires were more common over the interval 140°-220°, covering southeast to southwest direction. Also, wildfires were more common over steep slopes (29°-32°). Wildfire events over plan and profile curvatures showed normal distributions over certain ranges. In the case of plan curvature, all the wildfires occurred in convex curvatures, while the same occurred in concave curvatures for profile curvature (Pourghasemi 2014). In contrast, wildfires showed skewness over lower TWI interval (5-6.5). Wildfires also showed more skewness over the meteorological features. Temperature-wise wildfires were very common in warmer areas of Sikkim, mainly the lower altitude areas with high average temperature (19°-24°). Moreover, wildfires are more concentrated over low average wind speed (about 1.6 ms −1 ). Regarding ecological features, wildfires were more common over moderate NDVI values (about 0.6) and moderate tree cover (43-49%). Also, wildfires were more clustered near the water bodies (400-630 m away from water bodies). Regarding the in situ features, wildfire events were mostly confined to low carbon content

Generalized linear model
The GLM model was run with 15 environmental features as explanatory variables. However, features like Aspect, plan and profile curvatures, and TWI were found to be not significant. Thereby, they were dropped from the model and the model was rerun (Table 2). Hence, the GLM model-based prediction included ten explanatory variables and 1090 instances. Tenfold cross-validation was performed for model tuning. From Table 2, it is evident that proximity to roadways and low wind speed were the strongest determinants of wildfire in Sikkim Himalaya. Also, features such as proximity to water bodies, slope and average ambient temperature were partly accountable for wildfires. Interestingly, distance from human habitations had an inverse effect on wildfire occurrences. Low soil carbon and drier soil promoted wildfires. Also, low tree cover encouraged the chances of wildfires. The model was able to explain 62% of the predictions (Table 3). The model performance was satisfactory, with low RMSE and MAE, while high AUC, Accuracy, Kappa, Sensitivity, Specificity, Precision, F1 Score, and Goodness of fit (R 2 ) ( Table 4).

Support vector machine
The nonlinear kernel, Radial Basis function, was used in SVM for the prediction of wildfires. SVM used 727 support vectors to distinguish between the presence and absence of wildfire instances from the training dataset. Tenfold cross-validation was performed to tune the model. SVM uses several parameters known as hyper-parameters to tune the algorithm and converge to the solution. Model hyper-parameters, namely sigma (σ), epsilon (ε) and cost C settled at 0.106, 0.1 and 1, respectively. The ε tunes SVM by determining the number of support vectors to be considered for regression. C, which is similar to λ in Eq. (9), is accountable for regularization that provides a trade-off between over-and under-fitting of SVM. The objective function value of SVM settled at −292.33 and the training error, the convergent error of the model achieved from the training set, was 0.286. RMSE of the final iteration of SVM was the same as GLM (Supplement Fig. 3a). However, other performance indices were worse than GLM except for sensitivity, F1 score and RMSE (Table 4).

Gradient boosting model
Under GBM, Stochastic Gradient Boosting was used using the Gaussian loss function. While converging to the solution, the GBM takes smaller learning steps to reduce the effect of each additional fitted weak learner tree. This penalization reduces the chances of giving undue importance to erroneous iterations. This method is called 'shrinkage'. The 'n.minobsinnode', another tuning parameter of GBM, is the minimum number of observations in trees at the terminal nodes. The GBM in this study used the default values of shrinkage and n.minobsinnode at 0.1 and 10, respectively. GBM converged to the solution with 150 decision trees (n.trees) with an interaction depth of 3 (Supplement Fig. 3b). Performance-wise GBM outperformed GLM and SVM, except for MAE (Table 4).

Random Forest model
RF was used to predict wildfires in Sikkim Himalaya using 500 decision trees. RF converged to the solution when the algorithm selected eight environmental features at random at each split (mtry) (Supplement Fig. 3c). The mean squared residual of RF was 0.082, while RF explained 67.27% variance of out-of-bag predictions of the target variable of the training set (% Var explained). RF outperformed all the other prediction models (Table 4).

Comparative analysis of models
The box and whisker plots of accuracy and kappa showed a very similar pattern of distribution over the 30 samples selected to form the wildfire dataset using the cross-validation method. In both cases, RF distribution was rightwards than all other models indicating a higher model performance, followed by GBM. The greater width of the box in the case of GBM indicated a greater interquartile range. This showed a generalization approach of classification by GBM which is reflected in its better classification ability than GLM and SVM. The long range of observations as whiskers in the case of GLM indicated its inability to classify the incidents efficiently. In contrast, in the case of MAE and RMSE, RF showed a distribution shifted towards the left with a compact interquartile range and the range of observations skewed towards the left. This showed that RF is a much better contender in classification than other models which provide very similar MAE values, however with varied prediction distributions. RMSE being sensitive to outliers showed a clearer picture with the lowest value for RF followed by GBM. In the case of R 2 , RF showed a compact interquartile distribution with a high value. In contrast, GBM showed the next highest value of R 2 ; however, its mean value has been pulled leftward by the skewed distribution of its predictions (Fig. 6). Scatter plot matrix (SPLOM) showed that RF and GBM have a higher correlation in predictions. Similarly, GLM and SVM showed a strong correlation. In contrast, such a correlation was not observed in the case of MAE, except for RF and GBM (Fig. 7). The ROC curves indicated that all the models performed satisfactorily, while RF outperformed all the other models (Fig. 8).

Variable importance
Proximity to roadways got the highest importance in GLM, GBM and RF. This was followed by average wind speed that got the highest importance in SVM and GLM. Also, features such as average temperature, NDVI and tree cover were found important in SVM.
Methods like RF and GBM that uses regression trees for prediction gave disproportionately high importance to proximity to roadways and average wind speed. In contrast, methods like SVM and GLM that do not rely on regression trees gave more distributed importance to all the features. Topographic features like plan curvature, profile curvature and TWI received no to low importance in all the models. In situ features received low to moderate importance. Anthropogenic features, meteorological features and ecological features were found to be the most important determinants of predictions (Fig. 9).

Wildfire prediction maps
All the feature maps were projected to the plane coordinate system of WGS-1984-UTM-Zone-45 N, as it is appropriate for India. Furthermore, all the feature maps were resampled to 30.7 m resolution. Accordingly, all the WLMs had the same projection system and resolution. In all WLMs, the southern part and valley areas of Sikkim Himalaya were found to be at a higher risk of wildfires. In the case of GLM, wildfire probability in most of the study areas was found to be very low, except for warmer valley areas of southern parts of Sikkim Himalaya. However, GLM put more emphasis on areas with high soil carbon content. This led to the consideration of such areas as high wildfire likelihood values (Fig. 10a). In contrast, SVM gave a slightly higher probability than GLM to most of the study area, along with giving a higher probability of wildfire to a much larger fraction of the study area (Fig. 10b). GBM devoted more areas to wildfire than GLM and SVM (Fig. 10c). RF gave more importance to valley areas than GBM, although the spatial distribution of wildfire probability showed high similarity with GBM (Fig. 10d). Based on the accuracy and performance of the prediction models, the WLM of RF was considered the best for Sikkim Himalaya. The WLM of RF was classified into five categories namely very low, low, medium, high and very high likelihood of wildfire based on natural breaks in the GIS framework (Fig. 11). Compared to high likelihood categories, very high likelihood of wildfire category had a relatively larger area (Fig. 12).

Discussion
The overarching objective of this study was to prepare the WLM of Sikkim Himalaya based on a comparative study of machine learning methods with appropriate explanatory variables. The study yielded prediction maps with good model performance indices.

Comparison between machine learning methods and their implications
In this study instead of just one algorithm, four algorithms were considered. This was mainly to identify the algorithm that performs best in the wildfire prediction out of the popular algorithms considered. Contrary to previous studies, RF outperformed other machine learning methods in wildfire predictions (Ogutu et al. 2011;Tehrany et al. 2019;Xie and Peng 2019). This observation was in harmony with studies performed by other authors (Guo et al. 2016;Kim et al. 2015a, b;Massada et al. 2013). The better performance of RF in comparison with GLM and SVM can be because RF uses the ensemble method Fig. 5 Environmental features. All the maps have been reclassified using Jenks natural breaks method. The natural breaks method minimizes variance within categories while maximizing the variance between categories. This leads to an increase in the quality of the classification (Jenks 1967   In the ensemble method, the average output of several decision trees is considered. This process increases the chances of correct prediction. Also, contrary to SVM, RF is good at handling datasets with many outliers (Andreas 2013). As observed from the dataset, the histograms of several environmental features in this study were skewed. Perhaps, this was another reason for the better performance of RF. The comparative analysis of the models was based on samples extracted from the wildfire dataset through the cross-validation method. It showed that GLM had a much wider range of accuracy and kappa values. This can be explained by the limited number of feature variables considered in the GLM model in comparison with other models. Furthermore, the smallest range of MAE of GLM indicated that the possible reason for the wide ranges of accuracy and kappa values can be due to a large set of outliers in the wildfire dataset (Géron 2017). The higher correlation between GLM and SVM as well as that of GBM and RF in terms of accuracy and MAE showed that out of these pairs of models only one should be considered while making an ensemble of models to improve the prediction capacity of wildfire events (Brownlee 2016).

Importance of feature variables
Consistent with previous studies, this study suggested that meteorological features like wind speed and to some extent ambient temperature were important determinants of wildfires. The low wind speed and warm temperature of the valley areas are features of sub-tropical Sal and Oak deciduous forests prone to wildfires in Sikkim Himalaya. The  1 3 anthropogenic feature like distance from the roadways on average was the strongest predictor of wildfires. To lesser extent proximity to human habitations also contributed to the predictions of wildfires. These observations second the previous studies on wildfires of  . Contrary to the MCDA-based study, namely using AHP, on forest fire risk zones of Sikkim (Laha et al. 2020), the present study gave limited importance to the aspect, except for the SVM model. However, indirect measures of human population density, namely, proximity to human settlements and roadways supported the observations of Laha et al. (2020). Like the observations by Banerjee (2021), this study showed that proximity to roadways was the most important determinant of wildfire in Sikkim. However, contrary to Banerjee (2021) average wind speed has been given more weight in this study than average ambient temperature. Looking at the correlation matrix these two meteorological variables had a significant negative correlation. However, their collinearity in terms of VIF was within acceptable limits. Thereby, they were considered as independent variables in this study. Furthermore, tree cover fraction has been considered as an important factor in wildfire prediction in both the studies.

Future risks of wildfire
The study showed that wildfires were predominantly distributed in the lower altitudes and valley areas of Sikkim Himalaya. Few observations can be made about these areas. The meteorological conditions of these areas were identified as having relatively warmer ambient temperatures and low wind speed. Also, the road network of these areas closely follows the river network. Steep slope facing southeast to southwest aspect with low TWI explained most of the wildfires of these areas (Graham et al. 2004;Jo et al. 2000;Mhawej et al. 2015). Low soil carbon and water content areas had more incidents of wildfire. The role of human activities in the occurrence of wildfires was evident from the study. These observations were similar to previous studies (Arpaci et al. 2014;Kim et al. 2019). However, contrary to previous studies, proximity to settlements as a feature had a contradictory role in this study as the bulk of the wildfires were on average 2.5 km away from the human habitations (Kim et al. 2019;Massada et al. 2013;Nami et al. 2018;Vilar et al. 2016). This may be since the land use around the settlements was mainly non-forest lands like agrarian The WLMs did not effectively predict the wildfires of upper North Sikkim. This may be since in this study meteorological factors, like the occurrence of lightning was not considered. In contrast, a study done earlier does mention the role of lightning in wildfires in North Sikkim (Sharma et al. 2014).
This study is probably the first attempt to systematically prepare the WLM of Sikkim Himalaya using multiple machine learning models. In line with studies done in other locations, this study indicated that anthropogenic and meteorological factors were the most prominent descriptors of wildfires. Also, this study highlighted that machine learning methods were reliable means of preparing hazard maps. However, the reliability of the predictions heavily depends on the wildfire inventory. This can be achieved by pruning instances with incorrect target variable or incomplete instances. Usually, a large and representative inventory leads to better predictions. Also, the engineering of features like normalization and removal of multicollinear features are essential steps for dataset preparation. Regarding the choice of algorithms, consideration of the nature of the dataset, in terms of whether the target variable is binomial, multinomial, categorical, or continuous is important. Moreover, skewness of the features has an important role in the choice of machine learning methods. Cross-validation and choice of hyperparameters for the regularization are essential steps towards reliable algorithm-based predictions.
The outcomes of this study can be useful to the stakeholders for the preparedness and effective allocation of fire-retarding resources and manpower to wildfire-prone areas. Furthermore, vulnerability assessment of wildfire in Sikkim can be performed based on this study by overlaying socioeconomic and environmental cost map on the wildfire likelihood map of Sikkim. Such studies can be very helpful in wildfire mitigation and land-use policies.

Conclusion
Applications of machine learning in geospatial analysis are progressively expanding. One of the prominent niches of this new branch of science is the predictive modelling of natural hazards. This study presents the wildfire prediction map of Sikkim Himalaya using four machine learning methods. These methods were run over the wildfire dataset involving several environmental features encompassing, meteorological, topographical, ecological, in situ and anthropogenic factors. The methods, namely Generalized Linear Model in the form of Logistic Regression, Radial Basis Function Kernel-based Support Vector Machine, Gradient Booster Method and Random Forest, are compared using model performance criteria. Amongst these, Random Forest computes the most accurate prediction followed by Gradient Booster Method. These methods produce high values of AUC, Accuracy, Kappa, Sensitivity, Specificity, Precision, F1 Score and Goodness of fit and low values of RMSE and MAE. These decision tree-based methods marginally outcompeted SVM and GLM.
Furthermore, it is concluded that meteorological factors like ambient temperature and wind speed over the dry season, as well as anthropogenic factors like proximity to roadways, are the most important descriptors of wildfires in Sikkim Himalaya. Most of the wildfires in Sikkim are prevalent in the low altitude valley areas of the south. These observations can be internalized into the wildfire mitigation policies towards the consequences of slash and burn farming, use of fire to discourage entry of wildlife in settlements and traffic-induced wildfires. Also, long-term policy intervention can be prepared from this study regarding the impact of climate change-induced changes in the meteorological conditions of Sikkim Himalaya.
This study shows that machine learning can be combined with GIS to produce robust geospatial models of wildfire predictions. Machine learning can be a reliable wildfire management tool. Such a tool can be further improved by integrating online learning where the prediction model can have an incremental learning from a near real-time database like MODIS FIRMS. The methodology of this study can be further extended to include more in situ and meteorological factors into the feature space. Also, other artificial intelligence methods like ANN, evolutionary algorithms and agent-based learning can be applied to the wildfire dataset to generate better and reliable prediction maps. However, such studies need to trade-off between accuracy and interpretability.