Multivariate Fire Risk Models Using Sample-Size Reduction and Copula Regression in Kalimantan, Indonesia

The copula-based joint distribution can construct a fire risk model to improve forest fires' early warning system, especially in Kalimantan. In this study, we model and analyze the copula-based joint distribution between climate conditions and hotspots. We used several climate conditions, such as total precipitation, dry spells, and El Nino-Southern Oscillation (ENSO). We used copula functions with sample size reduction to construct the joint distributions and the copula regression model to estimate the fire size. The results show that the probability of extreme hotspots number during normal ENSO conditions is very rare and almost near zero during La Nina. Other than that, extreme hotspot event (more severe than in 2019) during El Nino is more sensitive to total precipitation than dry spells based on the conditional survival function. However, the copula regression model found that the model used dry spells as a climate condition better than total precipitation. In this model, the 95% confidence interval of the expected hotspots can cover all actual hotspots data.


Introduction
Indonesia has a vast forest area. The Ministry of Environment and Forestry of Indonesia (KLHK) said that the forest area reached 94.1 million hectares or about 50.1% of the total land area in 2019 (KLHK 2020). However, forest areas in Indonesia have experienced severe degradation in both quantity and quality in the last few decades (Miettinen et al. 2011-2009, Indonesia lost 1.4 million ha of natural forest/year. In the following period (2009)(2010)(2011)(2012)(2013), the area lost natural forest decreased to 1.1 million ha/year but increased again in 2013-2017 to 1.4 million ha/year (FWI 2020). One of the causes of deforestation is forest fires activity (Austin et al. 2019).
Forest fires have become a national issue every year and get serious attention from the government and researchers, especially in Kalimantan (Aflahah et al. 2019). Generally, forest fires are caused by human and natural factors. Natural factors such as El Nino can reduce rainfall intensity and prolong the dry season so that plants become more dehydrated and more prone to fire (Field et al. 2016;Nikonovas et al. 2020). Severe forest fires in 1997 and 2015 coincided with an extreme El Nino episode Nikonovas et al. 2020).
2 A fire risk model is essential to improve the early warning system of forest and land fires in Indonesia. A fire risk model is a conditional probability of a fire size given specific climate indicators (Madadgar et al. 2020). The concept of joint distribution is needed to develop a conditional probability model. One approach that is often used to calculate the joint distribution is copula functions. The copula is a function that links the joint distribution and its marginal distributions (Schölzel and Friederichs 2008). The copula is often applied in finance, but there is very little application in climatology, especially forest fires analysis. Usually, fire size uses hotspots to quickly monitor forest fires in large areas (LAPAN 2016). Meanwhile, we know that forest fires in Kalimantan are very sensitive to El Nino conditions ). Moreover, we can measure the drought severity using drought indicators, such as total precipitation and dry spells (Marinović et al. 2021;Thoithi et al. 2021).
This study aims to model and analyze the copula-based joint distribution between climate conditions and hotspots in Kalimantan, Indonesia. We constructed the bivariate joint distributions between climate conditions, either total precipitation or dry spells, and hotspots with sample size reduced by ENSO conditions, i.e., La Nina, normal, and El Nino. From the joint distribution, fire risk models are calculated using conditional probability and copula regression. The results are expected to increase understanding of the effect of ENSO on the joint distribution between drought indicators and hotspots in Kalimantan. Moreover, the fire risk model can be developed to predict future hotspots and improve the early warning system for forest fires in Indonesia.

Study area and data
The study area was taken in the Kalimantan fire-prone area, Indonesia ). This fire-prone area is obtained using k-mean clustering by eliminating the cluster with minimum hotspots. Hotspot data has a spatial resolution of 0.25° × 0.25° derived from MODIS sensors of the Terra and Aqua satellites in 2001-2020 (Ardiansyah et al. 2017). Meanwhile, total precipitation and dry spells are derived from CMORPH CRT datasets (Sun et al. 2016). We defined dry spells as the number of days with total precipitation less than 1 mm/day. Based on previous research, we use an average of 2 months of total precipitation (TP) and a total of 3 months of dry spells (DS) because it has the strongest association with hotspots in the Kalimantan fire-prone area . We only observe months with a significant number of hotspots, i.e., July-November. The time series is obtained by taking the average values of hotspots and drought indicators in the Kalimantan fire-prone area (Fig. 1). Moreover, Oceanic Nino Index (ONI) is used for classifying ENSO conditions. There are three classifications for ENSO condition, i.e., El Nino if ONI is more than 0.5, La Nina if ONI is less than -0.5, and normal condition if otherwise.

Copula function
The copula is a function that links the joint distribution and its marginal distributions (Schölzel and Friederichs 2008). For the bivariate case, suppose that random variables 1 and 2 has marginal distributions 1 dan 2 , respectively. Therefore, the joint distribution between 1 and 2 is defined as a function of its marginal distributions, given by where : [0,1] 2 → [0,1] is a copula function (Sklar 1959;Laux et al. 2011). Sklar then said that "every joint probability density is also writable by the product of its marginal probability densities and the copula density ", given by [ ] We use different copula functions because each will further describe the dependency structure, such as one-parameter copulas and two-parameter copulas (Table 1). One-parameter copulas include Clayton, Frank, Gumbel, Joe, and Galambos copula. Meanwhile, two-parameter copulas include BB1, BB6, BB7, and BB8 copula.

Parameter estimation
The inference of functions for margins (IFM) method is the most common method used to estimate the copula parameters (Bouyé et al. 2000;Wei et al. 2020). This method is a two-step method. The first step is to assess the parameters of the marginal distribution of each variable. The marginal distributions are estimated by maximizing the log of the likelihood function, given by 1 arg max ln ( ; ) where � is the estimated parameter of and is the probability density function of for = 1,2. For drought indicators, we use generalized extreme value (GEV) and lognormal (LN) distributions, while negative binomial (NBIN) distribution is employed to fit the distribution of hotspot data. For simplicity, we use the negative of total precipitation to estimate the marginal distribution and copula parameter. Therefore, the dryness of drought indicators and the worse condition of hotspots are on the distribution joint's upper tail (right-top). However, we return the original data for visualization and analysis (Najib et al. 2021b). The probability density functions of each distribution are given by (Hilbe 2011;Pobočíková et al. 2017;Farooq et al. 2018;Sylvi et al. 2018;Baran et al. 2021) The second step of the IFM method is to estimate the copula parameter by maximizing the log of the likelihood function of copula density, given by 11 1 2 2 2 1ˆâ rg max ln ( ; ), ( ; ); where � is the estimated parameter of and is the copula density function. The fittest copula function is selected based on several statistical tests, such as where is the number of parameters, is the likelihood function, and are the theoretical dan empirical frequency, respectively. The theoretical frequency is obtained from the model of copula distribution. Meanwhile, the empirical frequency is calculated using Gringorten's formula (1963), given by where is the sample size of data.

Joint probability and return period
We can estimate the rarity of extreme or outlier events using the joint probability and return period. The joint probability of two conditions that are simultaneously exceeding a specific threshold ( > and > ) is defined by (Afshar et al. 2020) Moreover, the joint return period (in months) is defined by 1/ (Zscheischler and Fischer 2020). However, this joint probability and return period are uncertain depending on the sample size (Link et al. 2020;Najib et al. 2021b). We can calculate the 95% confidence interval of the joint probability (Link et al. 2020), which is Madadgar et al. (2020) said, "the fire risk model is defined as the conditional probability of fire size given a specific climate condition (e.g., temperature or precipitation amount)." The conditional probability density function of fire size ( ) given a specific climate condition ( ) is defined as (Madadgar and Moradkhani 2014) [ ]

Multivariate fire risk model
where ( where can be a specific or interval climate condition, and we called it a conditional survival function. We use drought indicators (either total precipitation or dry spells) and El Nino-Southern Oscillation (ENSO) as the climate conditions. Meanwhile, we use hotspots data as fire size. The ENSO conditions are not used to calculate the joint distribution, but we use them to reduce the sample size of data (Najib et al. 2021b). Therefore, we obtained three joint distributions between climate conditions and fire size for one model. Because we use two options for drought indicators, then we will discuss two models, i.e., total precipitation and dry spells as the climate conditions.

Copula regression
From the conditional probability (Eq. 9), we also can obtain the expected value of given a specific climate condition = as (Noh et al. 2013 This kind of model is also known as copula-based or copula regression models (Kolev and Paiva 2009;Masarotto and Varin 2017;Cooke et al. 2020). We can estimate the expected value (Eq. 10) using Riemann's sum approach (Jha and  Using these marginal distributions, we estimate the copula parameters of TP -H (total precipitation -hotspots) and DS -H (dry spells -hotspots) with different ENSO conditions (Eq. 8). Interestingly, we got the same fittest copula for TP -H and DS -H, i.e., Joe, Galambos, and BB1 copulas during La Nina, normal, and El Nino conditions, respectively. They showed that both TP-H and DS-H had the same tail dependence structure characteristic. The Joe and Galambos copulas have an upper tail dependence but no lower tail dependence, while the BB1 copula has both lower and upper tail dependences (Joe 1997). The upper tail dependence coefficients for Joe and Galambos copulas (Joe 1997)

The joint probability density function
Using the marginal distributions and copula parameters (Eq. 2), we construct the joint probability density functions between total precipitation and hotspots (TP -H) also between dry spells and hotspots (DS -H) during different ENSO conditions (Fig. 2). Both figures show that ENSO conditions significantly affect the joint probability density function between drought indicators and hotspots in Kalimantan. El Nino will increase the probability of extreme hotspots in Kalimantan, while La Nina will reduce the probability. However, there is an outliner during normal ENSO conditions, i.e., in September 2019, where there are 8497 hotspots in a month. This 2019 hotspot event is infrequent to happen. We can calculate the rarity from the joint probability (Eq. 12 and 13) between total precipitation and hotspot in September 2019, i.e., 0.0034 with a 95% confidence interval [0, 0.0196]. In other words, the joint return period is 296 months with a 95% confidence interval [51, ∞] if all months are assumed in normal ENSO conditions. The same approach can be applied using the joint distribution between dry spells and hotspots, and we found that the joint return period is 399 months with a 95% confidence interval [61, ∞] if all months are in normal ENSO conditions.

The conditional probability
To get more detail about the effect of ENSO conditions on extreme hotspots events, we calculated the conditional survival functions (Eq. 15) for TP -H and DS -H (Fig. 3). We use the dryness of climate conditions as a given condition, i.e., lower tercile for total precipitation ( < 5.2201 mm/day) and upper tercile for dry spells ( > 54 days). Given the dry condition of total precipitation (straight lines, Fig. 3), extreme hotspots' probability depends on ENSO conditions. We divide the analysis into three conditions, i.e., more than 90% quantile, more than 95% quantile, and more than 2019 hotspots event, the latest extreme wildfire in Kalimantan. We obtained the conditional probability of more than 90% of hotspots quantile: 2.26%, 12.80%, and 45.92% during La Nina, normal, and El Nino, respectively. El Nino increased the conditional probability more than triple normal conditions, while La Nina significantly suppressed the probability of extreme hotspots in Kalimantan. In more extreme conditions (95% quantile), the probability decreases to 0.22% during La Nina, while during normal conditions, the probability is still a bit high, i.e., 4.91%. However, for very extreme hotspots events (more than in 2019), the condition probability during La Nina and normal conditions is very small, i.e., 0.006% and 1.1%. This probability indicates that extreme hotspot events in September 2019 or more severe are infrequent during normal ENSO conditions even can be impossible during La Nina. Otherwise, the conditional probability is still relatively high during El Nino, i.e., 14.4% or about 1/7 months. The same pattern also appears in the conditional survival function given the dry condition of dry spells (dashed lines, Fig. 3). The conditional probability for extreme hotspots is very small during La Nina. Meanwhile, there is still a bit high probability for 90% and 95% quantiles during normal ENSO conditions, but the probability of extremely high hotspots events (more than in 2019) is very small, i.e., 1.51%. However, the conditional probability is still high during El Nino, i.e., 11.25%. Fig. 3 The conditional survival function of hotspots given the dry condition of total precipitation (straight lines) and dry spells (dashed lines) during La Nina, normal, and El Nino.

The copula regression models
Using the joint probability density function (Fig. 2), we calculate the expected value for hotspots (Eq. 16) with a 95% confidence interval (Eq. 18) given a specific climate condition (precipitation and dry spells) for each ENSO condition. Given specific total precipitation, the result of copula regression (first row, Fig. 4) shows that almost all data points are inside the 95% confidence interval, except one point in normal ENSO conditions. Furthermore, RMSE values of the expected value from the original data are 271, 1062, and 2073 hotspots for La Nina, normal, and El Nino conditions, respectively. Meanwhile, the coefficient of determinations ( 2 ) are 79.65, 48.93, and 49.90% for La Nina, normal, and El Nino conditions. From these results, the fittest copula regression given total precipitation is in La Nina conditions. Moreover, although the RMSE value during El Nino was much higher than normal conditions, the copula regression model's 2 during El Nino better than normal conditions. Other than that, at least all data points during El Nino are inside the 95% confidence interval.
Meanwhile, all data points are inside the 95% confidence interval on the result of copula regression given specific dry spells conditions (second row, Fig. 4). The statistical results show that the RMSE values are 262, 855, and 1.907 hotspots; also the 2 values are 80.98, 66.92, and 57.59% for La Nina, normal, and El Nino conditions. The lowest error of the copula regression model is during La Nina conditions, while the highest is during El Nino, but the 2 value is still high, i.e., 57.59%. The RMSE and 2 values of copula regression given specific dry spells much better than given specific total precipitation during all ENSO conditions.  Fig. 4 The copula regression results: the expected value for hotspots with a 95% confidence interval given specific precipitation (first row) and dry spells (second row) for each ENSO condition.

Fig. 5
The time series of the copula regression model for hotspots given specific total precipitation (top) and dry spells (bottom) for each ENSO condition.
By evaluating all data points into the copula regression model based on specific climate dan ENSO conditions, we obtained the time series of the copula regression model with the 95% confidence interval (Fig. 5). If given specific precipitation, the time series result shows that the number of hotspots in August 2014 is outside the 95% confidence interval of the copula regression model as previously discussed. Several years have also not shown satisfactory results, especially in 2004, 2006, 2014, 2015, and 2019. The expected value in 2014 does not follow the actual hotspot pattern (up, down, and up). In 2006(up, down, and up). In , 2015(up, down, and up). In , and 2019, the expected values were underestimated from the actual hotspots. Even in 2014, the actual hotspots were not reached by the confidence interval. Contrary, the expected values were wildly overestimated in 2015, almost twice the actual hotspots. However, the expected value for hotspots in 2002 is satisfactory. Overall, the copula regression for hotspots given specific total precipitation and ENSO conditions have the RMSE value of 1339 hotspots and the 2 value of 60.70%. Meanwhile, the time series of the copula regression model for hotspots given specific dry spells and ENSO conditions shows the expected value and 95% confidence interval are satisfactory. The expected values given specific dry spells fit better than given total precipitation in 2004, 2006, 2014, 2015, and 2019. In 2004, the expected values followed the pattern of the actual hotspots number. The copula regression model was not too underestimated in 2006 and 2014; also not too overestimated in 2015. Even better, all actual hotspots are within the 95% confidence interval, including in 2014. The expected value was even very fit in 2019. However, the expected value in 2002 is not as good as the previous model. Overall, the copula regression for hotspots given specific dry spells and ENSO conditions have the RMSE value of 1185 hotspots and the 2 value of 69.21%. These statistical results are better than the previous model.

Summary and Conclusion
This study focused on modelling and analyzing the copula-based joint distribution between climate conditions and hotspots with different ENSO conditions, i.e., La Nina, normal, and El Nino. We calculated the expected value of hotspots given specific climate and ENSO conditions with a 95% confidence interval using these joint distributions. This study partially uses two indicators as climate conditions, i.e., precipitation and dry spells, so we obtained two models based on the indicator we used.
The copula parameters fitting results show that the copula functions of total precipitation -hotspots (TP -H) and dry spells -hotspots (DS -H) are the same, i.e., Joe, Galambos, and BB1 copula for La Nina, normal, and El Nino condition, respectively. We know from these copulas that the relationship between climate conditions and hotspots in La Nina and normal ENSO conditions have an upper tail dependence but no lower tail dependence. If climate conditions are in extremely dry weather, hotspots will be produced more than normal weather. Meanwhile, during El Nino, the relationship has both upper and lower tail dependences (Joe 1997;Joe et al. 2010).
The probability of extreme hotspots number during normal ENSO conditions is very rare from the joint probability and conditional survival functions. Although it happened in September 2019, the probability is less than 2%. Meanwhile, the probability of extreme hotspots number during La Nina is almost near zero. In addition, extreme hotspot event (more severe than in 2019) during El Nino is more sensitive to total precipitation than dry spells. Moreover, the copula regression model found that the model used dry spells as a climate condition better than total precipitation. In this model, the 95% confidence interval can cover all actual hotspots data.
This study is limited to bivariate joint distribution. Therefore, the obtained model contains only one climate condition, either total precipitation or dry spells, with ENSO condition as a sample size reducer. The suggestion for further research from this study is to use a trivariate copula or more to build a joint distribution so that the climate condition used is not only one between total precipitation and dry spells.

Declarations Funding
There is no funding for this research

Conflicts of interest
There is no conflict of interest for this research.

Availability of data and material
The raw data of climate conditions provided by NOAA CPC Morphing Technique ("CMORPH") can be downloaded freely on https://ftp.cpc.ncep.noaa.gov/precip/PORT/SEMDP/CMORPH_CRT/DATA. Meanwhile, the raw data of hotspots provided by the Agency for Meteorology, Climatology, and Geophysics of Indonesia were processed from MODIS sensors