Greenhouse gas (GHG) emissions are given as direct, indirect, and cumulative values by area, region, and particular category, with global coverage for the period 1961-present (with annual updates) and projections for 2030 and 2050, for the gases N2O and CO2eq, by crop and N residue content (FAOSTAT, FAO, 2020).
Data on CO2 emissions (direct, indirect, and total) were collected from the open-source Food and Agriculture Organization database (FAO, 2020) for 1961 to 2018.
The trends of direct, indirect, and total CO2 emissions were found to be similar (Figure 1); therefore, we analysed only total CO2 emissions in gigagrams (Gg). In this study, the CO2 emission data were divided into training data, covering 1961 to 2015, and test data, covering 2016 to 2018.
On the basis of the training and test data, this study produced long-term forecasts for seven years (2019 to 2025). Python was used to run the SARIMAX and random forest models for prediction. To calculate model error, the test-period predictions and the actual data were compared for 2016 to 2018. In interpreting the predictions, we used the forecasting accuracy measure MAPE (Mean Absolute Percentage Error) (Kim & Kim, 2016).
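The chronological split described above can be sketched as follows (a minimal illustration assuming the annual series is held in a pandas Series indexed by year; the synthetic values and variable names are illustrative, not the FAO data):

```python
import pandas as pd

# Illustrative annual total CO2 emissions series (Gg), 1961-2018.
years = range(1961, 2019)
emissions = pd.Series([1000.0 + 5.0 * i for i in range(len(years))], index=years)

# Chronological split: no shuffling, since this is a time series.
train = emissions.loc[1961:2015]   # 55 training years
test = emissions.loc[2016:2018]    # 3 test years

print(len(train), len(test))  # 55 3
```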
The Mean Absolute Percentage Error is

\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|

where A_t is the actual value at time t, F_t is the forecast value at time t, and n is the number of forecast points.
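The accuracy measure above can be computed directly; a minimal sketch (the function name is illustrative):

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

# Example: actual vs. forecast values for a three-year test window.
print(mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))  # 5.0
```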
SARIMAX: The SARIMAX method is an extension of the SARIMA model that incorporates exogenous variables to improve forecasting efficiency.
Hence, the model is called Seasonal ARIMA with an Exogenous Factor (i.e. SARIMAX) and is commonly expressed mathematically as follows (Vagropoulos et al., 2016):

\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\, y_t = \sum_{k} \beta_k x'_{k,t} + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t

where x'_{k,t} is the vector containing the kth explanatory input variable at time t, and β_k is the kth exogenous input variable's coefficient value. The stationarity and invertibility conditions are the same as for ARMA models.
Random Forest Regressor (RFR): The Random Forest Regressor is a bagging-based supervised learning algorithm, used in this paper for time series prediction. The algorithm is based on random sampling with replacement, i.e. bagging the data, from which a set of decision trees is created.
Individual decision trees are prone to overfitting; the random forest counters this through the regularizing effect of the randomness it introduces.
Decision trees are built by recursively splitting the dataset so that information gain is maximised, where information content (entropy) is represented by:

H = -\sum_{i} p_i \log_2 p_i

where p_i is the probability of a particular category of values.
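The entropy computation above can be sketched in a few lines of pure Python (the function name is illustrative):

```python
import math

def entropy(probabilities):
    """Information content H = -sum(p * log2(p)) over category probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair two-way split carries exactly one bit of information.
print(entropy([0.5, 0.5]))  # 1.0
```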
A decision tree reaches a particular prediction by applying a set of decisions learnt from the data, passing an input through the tree until a leaf node is reached. A random forest considers several decision-tree weak learners and aggregates their independent predictions by averaging them, as depicted in Figure 2.
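The averaging ensemble just described can be sketched with scikit-learn (synthetic data and illustrative hyperparameters; framing the series for one-step-ahead prediction with a lagged value as the feature is an assumption, not necessarily the study's exact setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative series; the lag-1 value is used as the feature.
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(5.0, 1.0, 56))
X = series[:-1].reshape(-1, 1)  # value at t
y = series[1:]                  # value at t + 1

# Each tree is fit on a bootstrap sample; their predictions are averaged.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
pred = rf.predict(series[-1:].reshape(-1, 1))
print(pred.shape)  # (1,)
```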
Holt-Winters: The Holt-Winters prediction method is categorized into multiplicative Holt-Winters and additive Holt-Winters.
The equations of the multiplicative Holt-Winters method are described as (Ferbar Tratar & Strmčnik, 2016):

L_t = \alpha \frac{y_t}{S_{t-s}} + (1-\alpha)(L_{t-1} + b_{t-1})
b_t = \beta (L_t - L_{t-1}) + (1-\beta) b_{t-1}
S_t = \gamma \frac{y_t}{L_t} + (1-\gamma) S_{t-s}
F_{t+m} = (L_t + m b_t)\, S_{t-s+m}

The component form for the additive method is:

L_t = \alpha (y_t - S_{t-s}) + (1-\alpha)(L_{t-1} + b_{t-1})
b_t = \beta (L_t - L_{t-1}) + (1-\beta) b_{t-1}
S_t = \gamma (y_t - L_t) + (1-\gamma) S_{t-s}
F_{t+m} = L_t + m b_t + S_{t-s+m}

where y_t is the value of y at time t, L_t, b_t, and S_t are the level, trend, and seasonal components, α, β, and γ are the smoothing parameters, s is the seasonal length, and m is the number of steps ahead to be predicted.
Support Vector Regression Method
Because the sample is small and has non-linear features, the support vector machine approach is used to evaluate the selected results. SVM is a machine learning approach grounded in statistical learning theory, proposed by Cortes and Vapnik in 1995 (Cortes & Vapnik, 1995).
It offers excellent results for small-sample, non-linear, and high-dimensional pattern recognition (Cui et al., 2008).
The support vector machine applied to regression, called support vector regression (SVR), uses the support-vector concept and the Lagrange multiplier method to evaluate the data. The basic function of support vector regression is given by (Cortes & Vapnik, 1995; Cui et al., 2008; Chang & Lin, 2011; Cherkassky & Ma, 2004; Baydaroğlu & Koçak, 2014):

f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b

where α_i and α_i^* are the Lagrange multipliers, K(x_i, x) is the kernel function, and b is the bias term.
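A sketch of the regression with scikit-learn's SVR on a small non-linear sample, as in the setting described above (the RBF kernel and hyperparameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative small non-linear sample.
rng = np.random.default_rng(3)
X = np.linspace(0, 10, 40).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

# Epsilon-insensitive SVR with an RBF kernel K(x_i, x).
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)
pred = svr.predict(X)
print(pred.shape)  # (40,)
```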