Maximum entropy method-based forest fire prediction mapping of Sikkim Himalaya.

The forest fire episodes of 2019 in Brazil and Australia are tragic reminders of the hazards of forest fire. Globally, forest fire incidents are on the rise due to human encroachment into wilderness and climate change. Sikkim, with a forest cover of more than 47%, suffers frequent seasonal forest fires during the dry winter months. To address this issue, a GIS-aided, MaxEnt machine learning-based forest fire prediction map has been prepared using a forest fire inventory database and maps of environmental features. The study indicates that, amongst the environmental features, population density and proximity to roads are the major determinants of forest fire, which points to the role of human activities in forest fire incidence. Model validation criteria, namely the ROC curve, correlation coefficient and Cohen's Kappa, show good predictive capability (AUC = 0.95, COR = 0.77, κ = 0.77). The resulting forest fire prediction map can aid forest stakeholders in taking informed mitigation measures.


Introduction
Forest fire incidents in Sikkim Himalaya peak during the dry period of the year, from November to March, due to the accumulation of dry biomass on the forest floor. These incidents may arise from natural causes like lightning, as Sikkim falls within the northeast region of India, which is considered a high-lightning zone. Anthropogenic causes of forest fire in Sikkim include intentional and accidental factors. Bonfires lit by cattle herders, burning of the forest floor to deter wild animals from entering agrarian land, and logging-induced decrease in forest canopy cover are the intentional causes of forest fire in Sikkim. Sparks from vehicles moving uphill, electric transformers located in forested areas, use of a traditional torch called Rankoo, and discarding of live bidi and cigarette butts are the accidental causes (S. Sharma, Joshi, and Chhetri 2014).
A forest fire can be considered a mixed blessing. Low-intensity forest fire opens up the canopy cover and removes dead wood, providing a niche for new plants to grow. Burning of the forest also releases the nutrients bound in the biomass to the soil, rejuvenating forest growth, and offers new ecological niches for wildlife to proliferate. In contrast, high-intensity forest fire leads to loss of soil biota, volatilization of soil nutrients, increased soil erosion, and a decline in biodiversity and forest biomass (Chandra and Bhardwaj 2015; Parashar and Biswas 2003).
A wide range of features has been considered for the prediction of forest fire. According to the review of forest fire in the Indian context by Joseph et al. (2009), topographical features like altitude, aspect, slope and Topographic Wetness Index (TWI) have been used in forest fire prediction. Meteorological features like average precipitation, temperature, humidity and wind speed have been used to understand forest fire characteristics. Other studies have focused on lightning to predict forest fire (Chen et al. 2015). Vegetational features like vegetation type and Normalized Difference Vegetation Index (NDVI); human-induced features like proximity to the road network, human habitation or the Wildland-Urban Interface (WUI); and in-situ factors like soil moisture, soil texture, fuel density and tree cover fraction have also been used for forest fire prediction (Mhawej, Faour, and Adjizian-Gerard 2015; Satir, Berberoglu, and Donmez 2016; Jaafari, Zenner, and Pham 2018; Gheshlaghi, Feizizadeh, and Blaschke 2020).
A forest fire or wildfire prediction map has become a valuable tool for disaster management and ecological restoration. Multicriteria decision analysis methods such as the Analytic Hierarchy Process (AHP), the Analytical Network Process (ANP) and other expert opinion-based methods have been applied in forest fire prediction mapping (Yathish et al. 2019; Ljubomir et al. 2019; Regodic et al. 2018; Gheshlaghi, Feizizadeh, and Blaschke 2020; Goleiji et al. 2017). In these methods, the model criteria and alternatives are arranged in a hierarchical structure and then ranked on a chosen scale. Based on the ranking, the importance or weights of the model criteria and alternatives are estimated and then used in GIS-aided prediction mapping (Banerjee, Ghose, and Pradhan 2018). However, expert opinion-based prediction mapping may suffer from subjective bias. Moreover, these methods are deterministic. As a result, they may not be suitable for a phenomenon that involves uncertainty, such as a forest fire (Ishizaka and Labib 2009; Mendoza and Martins 2006).
Machine learning methods such as kernel logistic regression, support vector machine, random forest, fuzzy logic, MaxEnt, multilayer perceptron, deep learning and convolutional neural networks have been extensively used in forest fire prediction mapping.
Contrary to expert opinion-based methods, machine learning methods do not suffer from subjective bias. Moreover, these methods account for the uncertainty associated with the modelling of a phenomenon. However, machine learning may suffer from issues like model overfitting. These methods rely heavily on the training dataset and take time to learn. Furthermore, they require a large dataset of events of interest for proper training of the model. Another important limitation of machine learning methods, specifically those involving artificial neural networks, is that they achieve efficiency and accuracy at the cost of model interpretability (Nami et al. 2018; Tien Bui, Le, and Hoang 2018; Ghorbanzadeh, Kamran, and Blaschke 2019; Tehrany et al. 2019; Tien Bui, Hoang, and Samui 2019; Zhang, Wang, and Liu 2019).
Maximum entropy, or MaxEnt, is a popular machine learning method widely used in species distribution and earth hazard modelling (Harte 2011; Feng and Hong 2009; Pourghasemi and Rossi 2018). Unlike most machine learning methods, such as logistic regression, support vector machine, random forest, k-nearest neighbour and artificial neural network, which use a presence-absence dataset for training, MaxEnt uses a presence-background dataset.
MaxEnt is based on the principle that the probability distribution that maximizes entropy for the current state of knowledge, subject to the constraints of the features, is the best-fit model for the phenomenon under consideration (De Martino and De Martino 2018). It is popular primarily because it makes minimal assumptions while selecting a probability distribution (Warton 2013). Moreover, the method uses a more realistic presence-background dataset, in the sense that absence data are hardly ever available in nature. On the other hand, MaxEnt needs a large presence dataset for reliable prediction. A study has also suggested that MaxEnt is equivalent to the Generalized Linear Model (GLM) when it comes to Point Process Models (PPMs) such as forest fire events (Fithian and Hastie 2013). MaxEnt has been used in several forest fire prediction mapping studies, which indicate that it performs as well as other machine learning methods in predicting forest fire (Arpaci et al. 2014; Massada et al. 2013; Peters et al. 2013; Fonseca et al. 2016; Kim et al. 2015; Fernández-Manso and Quintano 2020; Lim et al. 2017).
In this study, MaxEnt has been applied to prepare a forest fire prediction map of Sikkim Himalaya using a forest fire inventory based on MODIS and ground data. Meteorological, topographical, ecological and human-induced features have been used to train the MaxEnt model. Model validation criteria have been used to evaluate the model. The study indicates that MaxEnt is a reliable machine learning method for predicting areas prone to forest fire events in Sikkim Himalaya.

Study area
Sikkim is a small eastern Himalayan state of India neighboured by Tibet in the north, Nepal in the west, Bhutan in the east and the state of West Bengal in the south. It extends from 27°00′46″ N to 28°07′48″ N latitude and 88°00′58″ E to 88°55′25″ E longitude. The elevation of Sikkim varies from 280 m in the south to 8586 m in the north, crowned by the world's third-highest mountain peak, Mt. Khangchendzonga (Shukla, Garg, and Srivastava 2018). Apart from the four seasons of winter, summer, spring and autumn, Sikkim has a monsoon season lasting from June to September. It has a subtropical climate in the south and a tundra climate in the north. The two main rivers of Sikkim are the Teesta River and its tributary, the Rangeet (ENVIS Sikkim 2019) (Figure 1a

Data sources
The active fire data for 2000-2019 from the Moderate Resolution Imaging Spectroradiometer (MODIS) were accessed from the data archive at the Fire Information for Resource Management System (FIRMS) site. The MODIS dataset was combined with the forest fire inventory prepared from the GPS-tagged dataset of the Forest and Environment Department, Government of Sikkim. The resulting fire incidents dataset was intersected with the forest fraction raster data (Shimada et al. 2014) to exclude fire incidents beyond the forest cover of the study area. This generated a fire dataset of 754 events.
The environmental feature raster maps, or simply features, used in this study included precipitation, ambient air temperature and wind speed averaged over the dry period of Sikkim, prepared from the monthly data accessed from WorldClim 2 (Fick and Hijmans 2017). Amongst the several data resolutions available, the 30 arc-second resolution average monthly climate data for 1970-2000 was taken from WorldClim 2. Also, features like aspect, elevation, slope and TWI were derived from the Digital

The MaxEnt algorithm predicts by minimizing the relative entropy between the probability density of the features at the presence-only instances of the target variable and that at the instances of the background landscape (Elith et al. 2011). For instance, for a landscape L, the algorithm considers forest fire occurrence (y = 1) over a vector of environmental features, z. MaxEnt attempts to minimize the distance between the probability density of the features at forest fire occurrences and the probability density of the features of the background, or null model, over the landscape L. This minimization is achieved by maximizing a penalized likelihood subject to the model constraints given by the probability densities of the features of the landscape (Steven J. Phillips and Dudík 2008). Unlike conventional machine learning methods like logistic regression or random forest, which use presence-absence data, MaxEnt uses presence-background data for the prediction of forest fire. This makes MaxEnt prone to sample selection bias, a condition in which some areas of the landscape are sampled more intensively than others. MaxEnt is also prone to overfitting the predictive model to the presence-only data (Devisscher et al. 2016), although recent MaxEnt software controls overfitting through regularization. On the other hand, events like forest fire rarely come with data on their absence, which makes MaxEnt more appropriate for processes that have presence-only data (Arnold, Brewer, and Dennison 2014; Steven J. Phillips and Elith 2013).
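
As a compact restatement of the formulation described above, the following sketch follows the presentation in Elith et al. (2011). The symbols f, f_1, h, α, β, λ_j and m are notation introduced here for illustration and are not taken from the present study; the exact feature transformations h and regularization weights λ_j used by the MaxEnt software are not specified in the text.

```latex
% f(z):   density of the features over the background landscape L
% f_1(z): density of the features at fire occurrences (y = 1)
% Bayes' rule links the two densities to the quantity of interest:
\Pr(y = 1 \mid z) = \frac{f_1(z)\,\Pr(y = 1)}{f(z)}

% MaxEnt estimates f_1 as the density closest to f in relative entropy,
D(f_1 \,\|\, f) = \int f_1(z)\,\ln\frac{f_1(z)}{f(z)}\,dz ,

% which yields an exponential-family (Gibbs) form in transformed features h(z):
f_1(z) = f(z)\,e^{\eta(z)}, \qquad \eta(z) = \alpha + \beta \cdot h(z)

% The coefficients are fitted on the m presence points z_1, ..., z_m by
% maximizing an L1-penalized log-likelihood (the regularization step):
\max_{\alpha,\,\beta}\;
  \frac{1}{m}\sum_{i=1}^{m} \ln\!\bigl(f(z_i)\,e^{\eta(z_i)}\bigr)
  \;-\; \sum_{j} \lambda_j\,\lvert \beta_j \rvert
\quad \text{subject to} \quad \int f(z)\,e^{\eta(z)}\,dz = 1
```

The penalty term is what shrinks the coefficients of weakly informative features towards zero, which is how regularization limits overfitting to the presence-only data.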

Data processing and preparation of forest fire prediction map
Initially, all feature maps were projected from the geographic coordinate system GCS-WGS-1984 to the projected coordinate system WGS-1984-UTM-Zone-45N, which is suitable for the study area. Next, Euclidean distance raster maps were prepared from the polyline vector maps of the road network and waterbodies network and from the point vector map of human habitations. These raster maps were prepared to measure the proximity of fire events to roads, waterbodies and human habitations. Topographic feature maps of aspect, slope, elevation and TWI were prepared from the DEM. Thereafter, all the feature maps were resampled to the same cell size and extent. Finally, the feature maps were normalised such that the pixel values of the maps ranged from zero to one (Chang 2017).
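
The normalisation step above can be illustrated with a minimal sketch. It assumes a simple min-max rescaling, which is one common way to bring pixel values into the range zero to one; the text does not state which formula was used, and the study itself worked in GIS software and R rather than Python. The function name `min_max_normalise` and the toy grid are hypothetical.

```python
from typing import List, Optional

# A raster is represented here as a nested list; None marks a no-data cell.
Grid = List[List[Optional[float]]]


def min_max_normalise(grid: Grid) -> Grid:
    """Rescale all valid cells of a raster-like grid to the range [0, 1]."""
    values = [v for row in grid for v in row if v is not None]
    if not values:
        # Nothing to rescale: return an unchanged copy.
        return [row[:] for row in grid]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant layer: map every valid cell to 0.0 to avoid dividing by zero.
        return [[None if v is None else 0.0 for v in row] for row in grid]
    return [[None if v is None else (v - lo) / span for v in row] for row in grid]


# Toy elevation values in metres (illustrative only, not the study's raster).
elevation = [[280.0, 1200.0], [None, 8586.0]]
normalised = min_max_normalise(elevation)
print(normalised)
```

Rescaling every layer to a common range keeps features measured in different units, such as metres and degrees Celsius, on a comparable scale before they are stacked.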
The normalised maps were exported in GeoTIFF format, as it is readily readable by the R programming language. Furthermore, the presence-only dataset of forest fire events was stored as a CSV file.

The forest fire prediction map was prepared in the RStudio environment using the R packages 'raster' (Hijmans 2020), 'rgdal' (Bivand, Keitt, and Rowlingson 2019) and 'dismo'. The first two packages were mainly used for raster-related spatial operations, while dismo was used as a bridge between R and the MaxEnt software. The MaxEnt software used in this study is a Java-based program (S.J. Phillips, Dudík, and Schapire n.d.). During the preparation of the prediction map, all the feature maps were stacked with matching extents, and feature attributes were extracted from the feature stack using the fire event coordinates. The fire dataset was divided into five folds for cross-validation. This was followed by the preparation of the background dataset by selecting 1000 random points from the extent of the study area. Like the fire event dataset, the background dataset was populated with the feature attributes and divided into five folds for cross-validation. In this process, repetitively any one sub-dataset out of the five sub- fire. The kappa is the ratio of the difference between the observed and the chance-expected agreement to one minus the chance-expected agreement (Sim and Wright 2005). In all three cases, a value close to one is satisfactory for model validation.
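
The five-fold partitioning and two of the validation criteria described above can be sketched as follows. This is an illustrative stand-alone example, not the study's R workflow: the function names, the random seed, the toy scores and the 0.5 threshold are all assumptions introduced here. AUC is computed as the probability that a presence point receives a higher score than a background point, and kappa is computed from the confusion matrix at a chosen threshold.

```python
import random
from typing import List, Sequence


def k_fold_labels(n: int, k: int = 5, seed: int = 0) -> List[int]:
    """Assign each of n records to one of k folds of near-equal size."""
    labels = [i % k for i in range(n)]
    random.Random(seed).shuffle(labels)
    return labels


def auc(presence: Sequence[float], background: Sequence[float]) -> float:
    """Probability that a presence score exceeds a background score.

    Ties count as one half; this equals the area under the ROC curve.
    """
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


def cohen_kappa(presence: Sequence[float], background: Sequence[float],
                threshold: float) -> float:
    """Cohen's kappa for predictions binarised at the given threshold.

    Undefined (division by zero) when chance-expected agreement equals one.
    """
    tp = sum(p >= threshold for p in presence)
    fn = len(presence) - tp
    fp = sum(b >= threshold for b in background)
    tn = len(background) - fp
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy predicted scores for held-out fire points and background points.
fire_scores = [0.9, 0.8, 0.7, 0.4]
background_scores = [0.6, 0.3, 0.2, 0.1]

folds = k_fold_labels(754, k=5)            # 754 fire events, as in the study
auc_value = auc(fire_scores, background_scores)
kappa_value = cohen_kappa(fire_scores, background_scores, threshold=0.5)
print(auc_value, kappa_value)
```

In a cross-validation run, each fold in turn would be held out for testing while the model is trained on the remaining four, and the criteria would be averaged over the five runs.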
Moreover, feature importance and sensitivity analyses were performed. The methodology of the study is illustrated below (Figure 3).

Results
Starting with the meteorological features, fire events were more common in moderate to warmer areas (11-24 °C) (Figure 4a). Similarly, fire events were more common in areas with moderate to higher average rainfall (35-55 mm) (Figure 4b). In contrast, fire events were common in areas with low average wind speed (1.4-1.7 m s⁻¹) (Figure 4c). Moving on to topographic features, the bulk of the fire events were observed in low-elevation areas (230-1200 m), on flatter slopes (5-7 degrees), and at lower TWI (4-6) and moderate aspect (81-217 degrees) (Figure 4d-g). Looking at the ecological features, fire events were common in areas with moderate to high NDVI (0.5-0.7) and moderate tree cover (31-55% of the area) (Figure 4h-i). Furthermore, fire events were skewed towards areas close to waterbodies (0-800 m), human habitations (0-3000 m) and roadways (0-800 m), and towards areas of high population density (1200-1300 people km⁻²) (Figure 4j-m).
Correlation analysis indicated that population density was strongly correlated with average rainfall and average temperature, primarily because most of the human population is located in the south of Sikkim. Average temperature also had a strong correlation with average rainfall, as the subtropical south of Sikkim receives the bulk of the rainfall. In contrast, elevation had a strong negative correlation with slope (Figure 5, Supplement Figure S1).
In terms of the contribution of the features towards the prediction of forest fire, population density explained 50% of fire events, followed by proximity to roads, which explained almost 30%, while the remaining 20% could be explained by the rest of the features (Figure 6). As for the effect of feature values on the prediction, higher NDVI and moderate tree cover contributed to the forest fire prediction. Lower elevation, moderate slope, moderate aspect and lower TWI contributed to forest fire events. Meteorological features like low wind speed and moderate to higher average temperature and rainfall also contributed to fire events. Higher population density and proximity to human habitations, roadways and waterbodies explained most forest fire events (Figure 7).
The MaxEnt-based forest fire prediction map, at 30.7 m resolution, showed a probability range from 0 to 1, from no chance to a very high chance of forest fire occurrence. For convenience, the prediction map was further categorized into very low, low, medium, high and very high chances of forest fire incidents (Figure 8). Among the model validation criteria, the ROC curve showed an Area Under the Curve (AUC) of 0.952 (Figure 9a). The correlation coefficient was estimated to be 0.771 and Cohen's Kappa was estimated to be 0.77 (Figure 9b).
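
The categorization of the continuous prediction into five classes can be sketched as below. The class breaks used in the study are not stated, so this example assumes five equal-width intervals over the range 0 to 1; other schemes, such as quantile or natural breaks, would give different class boundaries. The names `LABELS` and `classify` are hypothetical.

```python
from typing import List

# Class names as used in the text, ordered from lowest to highest chance.
LABELS: List[str] = ["very low", "low", "medium", "high", "very high"]


def classify(probability: float) -> str:
    """Map a predicted value in [0, 1] to one of five equal-width classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Scale to a class index; the min() keeps a value of exactly 1.0
    # in the top class instead of overflowing to a sixth index.
    index = min(int(probability * len(LABELS)), len(LABELS) - 1)
    return LABELS[index]


# Toy predicted values (illustrative only).
for value in (0.05, 0.3, 0.5, 0.7, 0.95):
    print(value, classify(value))
```

Applied cell by cell to the prediction raster, such a function turns the continuous surface into the categorical map that is easier for forest managers to read.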

Discussion
In this study, an attempt has been made to prepare a forest fire prediction map of Sikkim Himalaya using the MaxEnt machine learning method. The study indicated that the estimation of forest fire probability by MaxEnt was satisfactory as per the model validation criteria.
It was observed that almost 80% of forest fire events were explained by population density and proximity to roadways. This observation was similar to a previous study done in the Amazonian forest of Bolivia (Devisscher et al. 2016). Also, studies conducted in the Huron-Manistee National Forest, Michigan, USA suggested that population density and development were the major determinants of forest fire occurrences (Massada et al. 2013). Similarly, a study in the Tyrolean forests of Austria has shown that population density was a major explanatory variable of forest fire (Arpaci et al. 2014). This clearly indicated that human activities had a substantial role in the forest fire incidents in Sikkim.
In contrast to mainland India, where forest fire is common during the hot, dry summer (Joseph, Anitha, and Murthy 2009), the majority of forest fires in Sikkim Himalaya occur during the cold, dry period from November to March. As observed in this study, the bulk of the forest fire events were in the southern part of Sikkim. This is mainly due to the logging activities there. Also, higher vehicular traffic, reflected in the denser road network of southern Sikkim, makes the dry vegetation vulnerable to fire from engine sparks and cigarette or bidi butts. The limited number of forest fire events at high altitudes in Sikkim is primarily due to lightning. Moreover, the warmer climate of southern Sikkim, in sharp contrast to the very cold conditions of northern Sikkim, makes the former more vulnerable to forest fire (R. Sharma et al. 2012). This study also indicated that forested areas close to human habitations and waterbodies are at a higher risk of forest fire. Aspects facing from east to south-west contributed more towards forest fire. Aspect influences soil moisture, solar radiation, and vegetation composition and density (Estes et al. 2017). Also, forest patches in valley areas that receive moderate rainfall and have moderate temperature and low wind speed were prone to forest fire. The high values of the model validation criteria suggested that the model prediction was satisfactory.
Being fundamentally distinct from other machine learning methods, MaxEnt uses a presence-only dataset to train itself (Elith et al. 2011). However, as many earlier studies have shown, the present study also indicated that this distinction does not limit the capability of MaxEnt to generate reliable hazard prediction maps (Arpaci et al. 2014; Massada et al. 2013; Peters et al. 2013; Fonseca et al. 2016; Kim et al. 2015; Fernández-Manso and Quintano 2020; Lim et al. 2017). In this study, a limited set of features has been considered for forest fire prediction, primarily because of the limited availability of reliable data. Other features, such as in-situ factors like soil type, soil moisture and fuel density, can be considered to improve the model.
The forest fire prediction map of Sikkim Himalaya can serve as a decision support tool for stakeholders of forest resources. Forest managers, such as forest rangers, and forest-dependent communities can mitigate forest fire by allocating their fire control resources to the areas more prone to forest fire. The population of fire-prone areas can be educated about the impact of their activities on the occurrence of forest fire. The prediction map can also guide traffic management and targeted law enforcement against irresponsible activities like illegal logging, negligent smoking and bonfires, and slash-and-burn farming.

Conclusion
Applications of remote sensing imagery, machine learning and geospatial analysis can help mitigate forest fire by identifying areas that are relatively more prone to it. The MaxEnt-based forest fire prediction map of Sikkim Himalaya indicated that road network and population density were mainly accountable for forest fire incidents. Aspect and elevation also contribute to explaining forest fire events. Although a forest fire can be considered an opportunity for the forest to rejuvenate, an increase in the frequency and extent of forest fire can only damage forest health. The prediction map can be used as a decision support tool by the stakeholders to mitigate the occurrence of forest fire. The applications of MaxEnt can be extended to other forms of earth hazards, such as landslide, flood and drought prediction. The MaxEnt model can be further improved by expanding the feature set, followed by factor analysis to identify the most relevant explanatory features of forest fire. A comparative analysis of MaxEnt against other machine learning methods can be performed to assess its efficiency and efficacy. The outcomes of this study can be internalized into forest management policies by applying geographically targeted resource allocation and law enforcement towards forest fire mitigation.

Declarations
Ethics approval and consent to participate: Due permission has been taken from the competent authority.
Consent for publication: Due permission has been taken from the university.
Importance of feature variables. Abbreviations: pop is Population density, roads is Proximity to roadways, aspect is Aspect, elevation is Elevation, NDVI, avgTemp is Average ambient temperature, avgWind is Average wind speed, slope is Slope, TreeCover is Tree cover, places is Proximity to human habitations, avgPrep is Average rainfall, water is Proximity to waterbodies, TWI.
