Prediction of coal mine gas emission based on hybrid machine learning model

Coal mine gas accidents are among the most serious threats to safe coal mining, so accurate prediction of coal mine gas emission is important. To improve prediction accuracy, a hybrid machine learning prediction model combining the random forest (RF) algorithm, the improved grey wolf optimizer (IGWO) algorithm, and the support vector regression (SVR) algorithm is proposed, and its prediction performance is validated using measured gas emission data from a coal mine. Firstly, the RF algorithm is used to screen 13 influencing factors of coal mine gas emission, and 6 of them are selected as the input variables of the prediction model. Secondly, the GWO algorithm is improved using a nonlinear convergence factor and the DLH search strategy to obtain the IGWO algorithm. Finally, the IGWO algorithm is used to optimize the parameters of the SVR algorithm, and the RF-IGWO-SVR model is established. The results show that the mean absolute percentage error, mean absolute error, and root mean square error of the RF-IGWO-SVR model are 1.55%, 0.0759, and 0.1103, respectively, which is better than the other comparative models. This indicates that the model can effectively improve the prediction accuracy of coal mine gas emission and provides a new model for coal mine gas emission prediction.


Introduction
Coal is the main energy source consumed in China and has played an important role in past economic development (Xue et al. 2021). In 2021, China's coal usage increased at a 4.6% annual pace, accounting for 56.0% of the country's total energy consumption. The process of coal mining is often accompanied by gas emission, a phenomenon that affects the safe production of coal mines and the personal health of mine workers and is a major safety hazard in coal production (Liang et al. 2016; Liu et al. 2020; Liu et al. 2018) [...] causes a progressive rise in gas emissions and a corresponding rise in the frequency of coal mine gas accidents (Wei et al. 2011; Gao et al. 2021). Coal mine gas accidents are mainly caused by abnormal gas emissions; therefore, the accurate prediction of gas emissions is of great importance for coal mine safety production.
In the twentieth century, Airey (1968) theoretically derived a partial differential equation for predicting coal mine gas emission, taking the mining time and mining geological conditions as factors affecting gas emission. Leszek (1998) integrated geomechanics into the coal mine gas emission prediction model and obtained relatively good prediction results. In recent years, with the development of computer technology, some researchers have applied machine learning models to predict coal mine gas emission, including the SVR model (Li and Liu 2013), the ELM model (Bing et al. 2016), and the BP model (Guo et al. 2019). Although these models help to improve prediction accuracy, the ELM and BP algorithms are not suitable for small sample prediction problems such as coal mine gas emission prediction. The SVR algorithm is suitable for small sample data, but its penalty factor and kernel function parameter need to be selected reasonably, otherwise its forecasting ability suffers (Yu et al. 2006; Roozbeh et al. 2018). The grey wolf optimizer (GWO) algorithm has the advantages of a simple structure, few set-up parameters, and strong global search capability (Al-Betar et al. 2018; Zhao et al. 2021). The GWO algorithm can be used to search for the optimal SVR parameters, but it may fall into a local optimum during the search and therefore needs to be improved. Numerous influencing factors affect the amount of gas emission from coal mines, and the high dimensionality of the influencing factors affects the accuracy of gas emission prediction to some extent (Ji and Zhang 2021).
Principal component analysis (PCA) is commonly used to reduce the dimensionality of the influencing factors and thus improve the model prediction accuracy, but the components produced by PCA no longer correspond to individual physical factors, which is not conducive to the later analysis of individual influencing factors (Na et al. 2021; Noori and Sabahi 2010). If the RF algorithm is used to select the influencing factors to achieve dimensionality reduction, this method would not only perform better than PCA, but also retain the original data of the influencing factors, which is convenient for the later analysis of individual influencing factors.
In light of this, this study proposes a hybrid machine learning model (RF-IGWO-SVR model) combining random forest (RF), improved grey wolf optimizer (IGWO), and support vector regression (SVR). Firstly, the RF algorithm screens out the main factors influencing the amount of gas emission from coal mines, which reduces the computational work of the prediction model and improves the prediction accuracy. Secondly, the nonlinear convergence factor and DLH search strategy are used to improve the GWO algorithm and prevent it from falling into a local optimum. The nonlinear convergence factor improves the search ability of the population in later iterations, and the DLH search strategy enriches the diversity of the population. Finally, the parameters of the SVR algorithm are optimized using the IGWO algorithm, which improves the stability and prediction accuracy of the prediction model. The main contributions and innovations are as follows:
i. This study uses a hybrid machine learning model with higher prediction performance for predicting coal mine gas emission, which provides a new method for predicting coal mine gas emission.
ii. Many factors influence the amount of gas emission, and the dimensionality of the sample data is too high, so it is necessary to reduce the dimensionality of the data before prediction. In this study, the RF algorithm is used to select the influencing factors, which not only preserves the original values of the influencing factors, but also identifies the main factors affecting coal mine gas emission.
iii. In this study, the GWO algorithm is improved, and the IGWO algorithm is used to better optimize the parameters of the SVR, which provides a reference for the optimization of SVR parameters.
The structure of this study is as follows: Sect. 2 introduces the basic theory of the algorithm and the process of building the RF-IGWO-SVR model. Section 3 investigates the accuracy of the model and designs experiments to compare with other models. Section 4 presents the conclusion.

Methodology
Section 2.1 describes the data preprocessing methods. Sections 2.2-2.4 introduce the basic principles of the algorithms used, among which are the RF algorithm, the GWO algorithm and the SVR algorithm. Section 2.5 describes the process of improving the GWO algorithm. Section 2.6 shows the detailed process of model construction. Section 2.7 shows the method for assessing the accuracy of the model.

Data pre-processing
Normalization is an important method of data preprocessing. It is a dimensionless treatment that turns the absolute values of physical quantities into relative values. The normalization interval in this study is [0, 1], and the normalization process is formulated as follows.
$$Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
In Eq. (1), $X$ is the raw data of the influencing factor, $X_{\min}$ is the minimum value in the data, $X_{\max}$ is the maximum value in the data, and $Y$ is the normalized output value.
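As an illustration of the min-max normalization Y = (X - X_min) / (X_max - X_min), the following minimal Python sketch scales one column of values to [0, 1]. The function name, the sample values, and the handling of a constant column are assumptions made for illustration and are not taken from the study.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]: Y = (X - X_min) / (X_max - X_min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Constant column: the formula is undefined here.
        # Returning zeros is an assumption of this sketch.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical coal seam depths in metres (not data from the study).
depths = [408.0, 411.0, 420.0, 432.0, 456.0]
normalized = min_max_normalize(depths)
```

Each influencing factor would be normalized column by column in this way, so that factors with different units become comparable.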

Random forests (RF)
Random forest is an ensemble learning technique that consists of many decision tree models (Makungwe et al. 2021; Abu El-Magd et al. 2021). To select influencing factors, it first determines the contribution of each influencing factor to each decision tree. This contribution is quantified by the out-of-bag (OOB) data error, where out-of-bag data are the samples that are not selected when the training set of a decision tree is drawn. The steps are as follows:
Step 1: Set the number of decision trees to $N$. For each decision tree, select the corresponding out-of-bag data and calculate its error, noted as $Err_{OOB1}$, $Err_{OOB2}$, ..., $Err_{OOBN}$.
Step 2: Add noise interference to influencing factor $M$ in the out-of-bag data, recalculate the out-of-bag data error, and note it as $Err_{OOBM1}$, $Err_{OOBM2}$, ..., $Err_{OOBMN}$.
Step 3: Eq. (2) is used to find the importance $I_M$ of each influencing factor, and the influencing factors are ranked by their importance.
$$I_M = \frac{1}{N}\sum_{i=1}^{N}\left(Err_{OOBMi} - Err_{OOBi}\right) \tag{2}$$
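The three steps can be sketched in Python as follows, assuming the per-tree out-of-bag errors before and after perturbing a factor are already available. The importance is taken here as the mean increase in out-of-bag error over the N trees. The function names, the example error values, and the normalization of the scores so that they sum to 1 are assumptions of this sketch.

```python
def factor_importance(err_oob, err_oob_noised):
    """Mean increase in out-of-bag error over N trees after perturbing one factor."""
    if not err_oob or len(err_oob) != len(err_oob_noised):
        raise ValueError("need one error per tree, before and after perturbation")
    n_trees = len(err_oob)
    total = sum(after - before for before, after in zip(err_oob, err_oob_noised))
    return total / n_trees


def rank_factors(err_oob, noised_by_factor):
    """Return (factor, score) pairs sorted from most to least important."""
    raw = {name: factor_importance(err_oob, errs)
           for name, errs in noised_by_factor.items()}
    total = sum(raw.values())
    # Assumption: scores are rescaled to sum to 1 when the total is positive.
    scores = {k: v / total for k, v in raw.items()} if total > 0 else raw
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical errors for 3 trees (illustrative values only).
baseline = [0.10, 0.12, 0.11]
perturbed = {"X11": [0.30, 0.32, 0.31], "X5": [0.11, 0.13, 0.12]}
ranking = rank_factors(baseline, perturbed)
```

A factor whose perturbation barely changes the out-of-bag error (here the hypothetical X5) receives a score near zero and is a candidate for removal.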

Support vector regression (SVR)
Support vector regression (SVR) is an important application branch of support vector machines (Cortes and Vapnik 1995). The fundamental idea is to apply a nonlinear mapping that transfers the data from a low-dimensional space to a high-dimensional space and then to solve the regression problem in that high-dimensional space (Bakhtiar et al. 2020; Dendi and Channappayya 2020). The SVR algorithm is widely used in various fields of forecasting and is suitable for predicting small sample data. The SVR algorithm steps are as follows. Suppose the training sample set is $S = \{(x_i, y_i)\},\ i = 1, 2, \cdots, m$, where $x_i$ is the input vector and $y_i$ is the corresponding output. The decision function of the SVR model is as follows.
$$f(x) = \omega^{T}\varphi(x) + b \tag{3}$$
In Eq. (3), $\omega$ is the weight vector, $\varphi(x)$ is a nonlinear mapping function, and $b$ is the bias.
Solving the decision function can be treated as a minimization problem, expressed as follows.
$$\min\ \frac{1}{2}\|\omega\|^{2} + c\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right) \tag{4}$$
$$\text{s.t.}\quad y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i,\qquad \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \tag{5}$$
$$\xi_i \ge 0,\qquad \xi_i^{*} \ge 0 \tag{6}$$
Here, $c$ is the penalty factor, $\xi_i$ and $\xi_i^{*}$ are the slack variables, and $\varepsilon$ is the insensitive parameter.
Applying the Lagrangian and duality theory, expression (4) can be converted into the dual problem of the SVR algorithm, expressed as follows.
$$\max\ -\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)K(x_i, x_j) - \varepsilon\sum_{i=1}^{m}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{m} y_i\left(\alpha_i - \alpha_i^{*}\right) \tag{7}$$
$$\text{s.t.}\quad \sum_{i=1}^{m}\left(\alpha_i - \alpha_i^{*}\right) = 0 \tag{8}$$
$$0 \le \alpha_i \le c,\qquad 0 \le \alpha_i^{*} \le c \tag{9}$$
In Eqs. (7)-(9), $\alpha_i$, $\alpha_i^{*}$, $\alpha_j$, and $\alpha_j^{*}$ are the Lagrange multipliers and $K(x_i, x_j)$ is the kernel function. The radial basis function (RBF) is used as the kernel function of the SVR algorithm. The RBF kernel is a typical local kernel function with excellent local interpolation capability, which is conducive to improving the computational power of the algorithm. The RBF kernel function is expressed as follows.
$$K(x_i, x_j) = \exp\left(-g\|x_i - x_j\|^{2}\right) \tag{10}$$
In Eq. (10), $g$ is the kernel function parameter. The decision function of the SVR algorithm is transformed as follows.
$$f(x) = \sum_{i=1}^{m}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b \tag{11}$$
The values of the penalty factor $c$ and the kernel function parameter $g$ directly affect the overall performance of the SVR algorithm, so these two parameters need to be optimized.
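To make the role of the kernel parameter g concrete, the following minimal Python sketch evaluates an RBF kernel and a kernel-expansion decision function of the form f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b. It assumes the kernel form exp(-g * ||x_i - x_j||^2) and takes the multiplier differences and the bias as given inputs; it does not solve the dual problem. All names and numbers are illustrative assumptions.

```python
import math


def rbf_kernel(x_i, x_j, g):
    """RBF kernel, assuming the form exp(-g * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, g):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b.

    dual_coefs[i] stands for (alpha_i - alpha_i*), assumed to come
    from an already trained model.
    """
    return sum(coef * rbf_kernel(sv, x, g)
               for sv, coef in zip(support_vectors, dual_coefs)) + b


# Hypothetical trained model with a single support vector.
prediction = svr_predict([0.0], [[0.0]], [2.0], b=0.5, g=1.0)
```

A larger g makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood, which is one reason g has to be tuned together with c.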

Grey wolf optimizer (GWO)
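The grey wolf optimizer (GWO) algorithm was proposed by Mirjalili et al. in 2014 (Mirjalili et al. 2014). The algorithm classifies the wolves in a pack into four classes, from highest to lowest $\alpha$, $\beta$, $\delta$, and $\omega$. The hunting process is mainly divided into tracking, encircling, and hunting (Ramadan et al. 2021; Adhikary and Acharyya 2022). The formulas for the tracking process are as follows:
$$D = \left|C \cdot X_p(t) - X(t)\right| \tag{12}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{13}$$
$$A = 2a \cdot r_1 - a,\qquad C = 2 \cdot r_2 \tag{14}$$
$$a = 2 - \frac{2t}{Maxiter} \tag{15}$$
In Eqs. (12)-(15), $t$ represents the $t$th iteration, $X_p(t)$ and $X(t)$ represent the prey position and the grey wolf position at the $t$th iteration, $a$ is the convergence factor, $A$ and $C$ are coefficient vectors, $Maxiter$ is the maximum number of iterations, and $r_1$ and $r_2$ are random numbers in [0, 1].
The encircling process formula is expressed as follows.
$$D_\alpha = \left|C_1 \cdot X_\alpha(t) - X(t)\right|,\quad D_\beta = \left|C_2 \cdot X_\beta(t) - X(t)\right|,\quad D_\delta = \left|C_3 \cdot X_\delta(t) - X(t)\right| \tag{16}$$
In Eq. (16), $D_\alpha$, $D_\beta$, and $D_\delta$ denote the distances between wolves $\alpha$, $\beta$, $\delta$ and wolf $\omega$, respectively. $X_\alpha(t)$, $X_\beta(t)$, and $X_\delta(t)$ represent the locations of wolves $\alpha$, $\beta$, and $\delta$, and $X(t)$ is the location of the current wolf.
The hunting process formula is expressed as follows.
$$X_1 = X_\alpha(t) - A_1 \cdot D_\alpha,\quad X_2 = X_\beta(t) - A_2 \cdot D_\beta,\quad X_3 = X_\delta(t) - A_3 \cdot D_\delta \tag{17}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{18}$$
In Eqs. (17) and (18), $X_1$, $X_2$, and $X_3$ are the candidate positions guided by wolves $\alpha$, $\beta$, and $\delta$, and the coefficients $A_1$-$A_3$ and $C_1$-$C_3$ are computed as in Eq. (14).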
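A single position update of the standard GWO, as described by Mirjalili et al. (2014), can be sketched in Python as below. Each of the three leading wolves proposes a candidate position, and the wolf moves to their average. The function name, the per-dimension sampling of the random numbers, and the example leaders are assumptions of this sketch.

```python
import random


def gwo_update(x, leaders, a, rng):
    """Move one wolf toward the alpha, beta, and delta wolves.

    x       : current position (list of floats)
    leaders : (x_alpha, x_beta, x_delta)
    a       : convergence factor for the current iteration
    rng     : random.Random instance supplying r1 and r2
    """
    new_x = []
    for d in range(len(x)):
        candidates = []
        for leader in leaders:
            coef_a = 2.0 * a * rng.random() - a       # A = 2a*r1 - a
            coef_c = 2.0 * rng.random()               # C = 2*r2
            dist = abs(coef_c * leader[d] - x[d])     # D = |C*X_leader - X|
            candidates.append(leader[d] - coef_a * dist)
        new_x.append(sum(candidates) / 3.0)           # average of X1, X2, X3
    return new_x


# Hypothetical leaders in a 2-D search space, e.g. (c, g) for SVR.
leaders = ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
moved = gwo_update([0.0, 0.0], leaders, a=0.0, rng=random.Random(0))
```

With a = 0 the coefficient A vanishes, so the wolf lands exactly on the mean of the three leaders. Larger values of a allow |A| > 1, which pushes wolves away from the leaders and favours exploration.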

Nonlinear convergence
In the original GWO algorithm, the convergence factor a decreases linearly from 2 to 0 over the iterations, which affects the local search capability of the algorithm in later iterations. Transforming the convergence factor a from linear convergence to nonlinear convergence helps to improve the overall optimization ability of the GWO algorithm. A cosine function is used to convert the convergence factor a to a nonlinear convergence factor, and the specific mathematical expression is as follows.
With the maximum number of iterations set to 100, the nonlinear convergence factor is compared to the original convergence factor, and the detailed findings are displayed in Fig. 1.
According to Fig. 1, the value of the improved nonlinear convergence factor remains higher than the value of the original convergence factor throughout the iterative process, which helps to broaden the population's search space and prevents it from settling on local optimal solutions.
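The comparison between a linear and a cosine-based convergence factor can be illustrated with the Python sketch below. The linear schedule is the standard GWO one, a = 2 - 2t/Maxiter. The cosine schedule a = 2*cos(pi*t / (2*Maxiter)) is only an assumed example of a cosine-shaped schedule that starts at 2, ends at 0, and stays above the linear one; it is not claimed to be the exact expression used in this study.

```python
import math


def linear_a(t, max_iter):
    """Standard GWO convergence factor: decreases linearly from 2 to 0."""
    return 2.0 - 2.0 * t / max_iter


def cosine_a(t, max_iter):
    """Assumed cosine-shaped schedule (illustrative, not the paper's formula)."""
    return 2.0 * math.cos(math.pi * t / (2.0 * max_iter))


MAX_ITER = 100
schedule = [(t, linear_a(t, MAX_ITER), cosine_a(t, MAX_ITER))
            for t in range(MAX_ITER + 1)]
```

Under this assumed schedule the cosine factor decays slowly at first and quickly near the end, so the coefficient A keeps a wider range for more of the run.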

DLH search strategy
The DLH search strategy is an improved idea proposed by Nadimi-Shahraki in 2020 (Nadimi-Shahraki et al. 2020). The original GWO algorithm leads to a reduction in population diversity at a later stage and is prone to fall into local optimal solutions. The introduction of the DLH search strategy in the population increases the diversity of the population and improves the optimal solution-finding ability.
The expression for the gray wolf population location update using the DLH search strategy is as follows.
In Eq. (20), $X_{DLH}(t+1)$ is the new position generated after the wolf adopts the DLH search strategy, $X_r(t)$ is randomly selected from the $\alpha$, $\beta$, and $\delta$ wolves, and $X_n(t)$ is a neighboring wolf randomly selected from $N_t$. The mathematical expression of $N_t$ is as follows.
In Eq. (21), $X_j(t)$ is a candidate neighboring wolf, $D$ is the distance between $X(t)$ and $X_j(t)$, and $X_{GWO}(t+1)$ is the location update of the original GWO. Finally, $X_{GWO}(t+1)$ and $X_{DLH}(t+1)$ are compared, and the better one is selected to perform the location update.
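A DLH-style update can be sketched in Python as follows. The sketch follows the general form given by Nadimi-Shahraki et al. (2020): the neighborhood contains the wolves no farther from the current wolf than its GWO candidate position, and each dimension is updated as x + rand * (x_n - x_r). The function names, the choice of pools for the neighbor and for X_r, and the fallback when no neighbor is found are assumptions of this sketch rather than details confirmed by the study.

```python
import math
import random


def dlh_candidate(x, x_gwo, neighbor_pool, r_pool, rng):
    """Build a DLH candidate position for one wolf.

    x             : current position
    x_gwo         : position proposed by the standard GWO update
    neighbor_pool : wolves that may serve as neighbors (assumed: the pack)
    r_pool        : wolves from which X_r is drawn (assumed: the leaders)
    """
    radius = math.dist(x, x_gwo)
    neighbors = [w for w in neighbor_pool if math.dist(x, w) <= radius]
    if not neighbors:
        neighbors = [x]  # assumption: fall back to the wolf itself
    new_x = []
    for d in range(len(x)):
        x_n = rng.choice(neighbors)
        x_r = rng.choice(r_pool)
        new_x.append(x[d] + rng.random() * (x_n[d] - x_r[d]))
    return new_x


def select_better(x_gwo, x_dlh, fitness):
    """Keep whichever candidate has the lower fitness (minimization)."""
    return x_gwo if fitness(x_gwo) <= fitness(x_dlh) else x_dlh


def sphere(p):
    """Hypothetical fitness function used only for this example."""
    return sum(v * v for v in p)


chosen = select_better([1.0, 1.0], [0.5, 0.5], sphere)
```

In the prediction model the fitness would instead be a prediction error of the SVR trained with the candidate (c, g), so the comparison keeps the parameter pair that predicts better.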

RF-IGWO-SVR model building
Step 1: Collect the raw data of coal mine gas emission and normalize the raw data.
Step 2: Use the RF algorithm to measure the importance of each influencing factor of gas emission and select the main influencing factors.
Step 3: Set the initial parameters of the GWO algorithm, the number of grey wolves, and the maximum number of iterations.
Step 4: Find the fitness value of each wolf in the pack and classify the pack into four types: $\alpha$, $\beta$, $\delta$, and $\omega$.
Step 5: Update the convergence factor a according to Eq. (19) and update the values of A and C according to Eq. (14).
Step 6: Carry out the prey encirclement and hunting processes of the grey wolf population according to Eqs. (16) and (17).
Step 7: Update the grey wolf population locations according to Eq. (18).
Step 8: Update the grey wolf population locations using the DLH search strategy according to Eq. (20), and compare them with the locations obtained in Step 7 to retain the optimal locations.
Step 9: Repeat Steps 5 to 8 until the maximum number of iterations is reached, then stop the run and take the optimal position of the grey wolf population. The coordinate components of the optimal position correspond to the values of the penalty factor c and the kernel function parameter g. The values of c and g are substituted into the SVR algorithm to construct the RF-IGWO-SVR model.
The flow chart of the RF-IGWO-SVR model building process is shown in Fig. 2.

Evaluation of model accuracy
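The evaluation of the model is a key step in model prediction. In this study, mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) are used to evaluate the effectiveness of the predictive model for coal mine gas emission. Lower values of MAPE, MAE, and RMSE indicate higher accuracy of the prediction model, and these metrics are calculated as follows.
$$MAPE = \frac{1}{N}\sum_{n=1}^{N}\left|\frac{y_n - y_n^{*}}{y_n}\right| \times 100\% \tag{22}$$
$$MAE = \frac{1}{N}\sum_{n=1}^{N}\left|y_n - y_n^{*}\right| \tag{23}$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y_n - y_n^{*}\right)^{2}} \tag{24}$$
In Eqs. (22)-(24), $N$ is the total number of test samples, $y_n$ is the actual value of gas emission, and $y_n^{*}$ is the predicted value of gas emission.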
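The three metrics can be computed with a few lines of Python, as in the minimal sketch below. The function names and the example values are illustrative assumptions, and the MAPE function assumes that no actual value is zero.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / n


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


# Hypothetical actual and predicted gas emission values.
y_true = [2.0, 4.0]
y_pred = [1.0, 5.0]
```

MAPE is scale-free, while MAE and RMSE are in the units of gas emission; RMSE penalizes large individual errors more heavily than MAE.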

Experiments and results analysis
In this section, we present the data sources and design relevant experiments to verify the superiority of the RF-IGWO-SVR model performance.

Data source
Thirty groups of historical gas emission detection data from the Lin Sheng coal mine located in Shenyang, China, are selected for prediction and analysis (Dong et al. 2016). The first 25 groups of data are used as the training set, and the last 5 groups of data are used to test the accuracy of the prediction model. The relevant parameters are shown in Table 1 (Dong et al. 2016). The influencing factors of gas emission include coal seam depth $X_1$ (m), coal seam thickness $X_2$ (m), coal seam dip angle $X_3$ (°), original gas content of the mining seam $X_4$ (m³·t⁻¹), coal seam spacing $X_5$ (m), mining height $X_6$ (m), adjacent layer gas content $X_7$ (m³·t⁻¹), adjacent layer thickness $X_8$ (m), interlayer lithology $X_9$, working face length $X_{10}$ (m), advancing speed $X_{11}$ (m·d⁻¹), extraction rate $X_{12}$ (%), and daily production $X_{13}$ (t·d⁻¹). Gas emission is expressed as $Y$ (m³·min⁻¹). The influencing factors in Table 1 have different physical meanings and magnitudes, so the data need to be normalized according to Eq. (1) before the experiment. The normalized data are shown in Table 2.

RF algorithm influencing factors screening
The influencing factors of gas emission have high-dimensional complexity, which will affect the prediction accuracy. To improve the prediction accuracy of the model, it is necessary to screen the influencing factors of gas emission in advance. The RF algorithm is used to select the influencing factors and obtain the importance of the influencing factors. The detailed results of the importance ranking of the influencing factors are shown in Table 3.
From Table 3, the ranking of factor importance in descending order is $X_{11}$, $X_{12}$, $X_6$, $X_4$, $X_2$, $X_1$, $X_{13}$, $X_3$, $X_8$, $X_9$, $X_{10}$, $X_7$, and $X_5$, with the importance values of the 13 influencing factors summing to 1. To select the best combination of factors, 13 different sets of influencing factors are generated by adding factors one at a time from $X_{11}$ to $X_5$ in order of importance, and the sets are substituted into the IGWO-SVR model in turn. MAPE is used as the criterion to assess the accuracy of the model, and the detailed results are shown in Fig. 3. From Fig. 3, the value of MAPE drops to its lowest when the number of influencing factors is 6. Therefore, the first 6 influencing factors are selected as the input of the prediction model: advancing speed $X_{11}$, extraction rate $X_{12}$, mining height $X_6$, original gas content of the mining seam $X_4$, coal seam thickness $X_2$, and coal seam depth $X_1$. To further explore the correlation between the six screened influencing factors and gas emission, the curve fitting toolbox cftool in MATLAB was used to analyze the relationship, and Fig. 4 displays the complete outcomes.
From Fig. 4, the linear regression coefficients of determination ($R^2$) for advancing speed, extraction rate, mining height, original gas content of the mining seam, coal seam depth, and coal seam thickness are 0.8025, 0.8888, 0.9289, 0.9051, 0.9164, and 0.8817, respectively. All six $R^2$ values are larger than 0.8, which shows a significant association between the 6 influencing factors and gas emission. Mining height, original gas content of the mining seam, coal seam depth, and coal seam thickness show a positive correlation with gas emission, while advancing speed and extraction rate show a negative correlation with gas emission.

IGWO-SVR model parameter selection
The initial setup parameters required for the IGWO-SVR model are the number of populations and the number of iterations. Data processed by the RF model are divided into a training set and a test set, with the first 25 data sets being the training set and the last 5 data sets being the test set. The number of iterations is set to 50.
The number of wolves is increased from 2 to 50 in succession, and MAPE is used as an index to assess the accuracy of prediction, and the detailed results of the relationship between the number of wolves and MAPE are shown in Fig. 5. When the number of wolves is from 2 to 35, the value of MAPE is still in an unstable state; when the number of wolves is from 36 to 50, the value of MAPE starts to maintain a stable state at the lowest point, so the number of wolves is set to 36 during initialization.

Experiment 1
The influencing factor dataset screened by the RF algorithm was selected as the input variable, and the first 25 groups and the last 5 groups of the input variable are split into training and test sets, respectively. To evaluate the effectiveness of the RF algorithm in data dimensionality reduction, the PCA-IGWO-SVR model and the IGWO-SVR model are established for comparison with the RF-IGWO-SVR model, and PCA is performed according to the process in Ren et al. (2021). Figure 6 shows the comparison between the predicted and actual values of coal mine gas emission, and Fig. 7 shows the absolute error of the predicted values. Figures 6 and 7 show that the RF-IGWO-SVR model has a smaller error than the PCA-IGWO-SVR model for test samples 1, 2, 3, and 5; only for test sample 4 is its error higher than that of the PCA-IGWO-SVR model. The overall stability of the RF-IGWO-SVR model is better than that of the PCA-IGWO-SVR model.
The RF-IGWO-SVR model outperformed the IGWO-SVR model at all five test points when compared to the unprocessed raw data. In summary, the data set processed with the RF algorithm is more conducive to improving the accuracy and stability of the model, and the RF-IGWO-SVR model has high prediction accuracy when applied to coal mine gas emission prediction.
To further test the superiority of the predictive capability of the RF-IGWO-SVR model, the RF-GWO-SVR, RF-SVR, PCA-IGWO-SVR, PCA-GWO-SVR, PCA-SVR, IGWO-SVR, GWO-SVR, and SVR models are selected for comparison, and the results of the different model predictions of coal mine gas emission are shown in Table 4 and Fig. 8. Table 4 shows that the MAPE, MAE, and RMSE of the RF-IGWO-SVR model are 1.55%, 0.0759, and 0.1103, respectively, and this model has better predictive performance than the other eight comparison models. Compared with the models using PCA-processed data as the input variable, the MAPE of the IGWO-SVR, GWO-SVR, and SVR models with RF data processing is reduced by 0.75, 0.47, and 0.47 percentage points, respectively. Compared with the models using unprocessed data as the input variable, the MAPE of the IGWO-SVR, GWO-SVR, and SVR models with RF data processing is reduced by 1.2, 0.94, and 7.14 percentage points, respectively. These results again indicate that pre-processing the sample data using the RF algorithm helps to improve the prediction accuracy of the models. Under the same data processing conditions, the IGWO-SVR model has better prediction performance than the GWO-SVR model and the SVR model. When the data are processed with the RF algorithm, the MAPE of the IGWO-SVR model is reduced by 0.31 and 1.88 percentage points compared to the GWO-SVR model and the SVR model, respectively. When the data are processed with the PCA algorithm, the MAPE of the IGWO-SVR model is reduced by 0.03 and 1.6 percentage points compared to the GWO-SVR model and the SVR model, respectively. When using unprocessed data, the MAPE of the IGWO-SVR model is reduced by 0.05 and 7.82 percentage points compared to the GWO-SVR model and the SVR model, respectively. These results indicate that the IGWO algorithm has a better search capability and can better find the optimal parameters of the SVR model.

Experiment 2
Experiment 1 only compares models within the SVR family. To further test the superiority of this hybrid model, a comparison with the models used in previous studies is needed, so Experiment 2 compares the IGWO-SVR model with the SVR model, ELM model, and BP model, all using the data processed by the RF algorithm as input. Table 5 and Fig. 9 show the comparison results of the different models in Experiment 2. Table 5 demonstrates that the values of MAPE, MAE, and RMSE of the IGWO-SVR model are smaller than those of the other comparative models, which again validates that the IGWO-SVR model has the advantage of high prediction accuracy. In addition, the values of MAPE, MAE, and RMSE of the SVR model are smaller than those of the BP and ELM models, which indicates that the SVR model is suitable for the prediction of small samples such as coal mine gas emission.

Experiment 3
The data used in this study are small samples of coal mine gas emission, and Experiment 3 is constructed to further verify the reliability and stability of the model. The K-fold cross-validation method is used to group the sample data for training the IGWO-SVR model and comparing it with other models, and the detailed results are shown in Table 6. K-fold cross-validation divides the original data into K groups, takes each subset in turn as the test set without repetition, and combines the remaining K-1 subsets as the training set (Bengio et al. 2004). K-fold cross-validation reduces the risk of over-learning and under-learning, and the final results are more convincing. In this study, sixfold cross-validation is selected, and the procedure is shown in Fig. 10. Table 6 shows that the prediction results obtained by the RF-IGWO-SVR model after cross-validation are better than those of the other comparative models, which validates the stability of the RF-IGWO-SVR model for accurate prediction.
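The K-fold partition described above can be sketched in Python as below. With 30 samples and K = 6, each fold holds out 5 samples for testing and trains on the remaining 25. The function name and the use of contiguous, unshuffled folds are assumptions of this sketch; the study does not state how samples were assigned to folds.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k (train, test) pairs with disjoint test sets."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base_size, remainder = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        size = base_size + (1 if fold < remainder else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Thirty samples and sixfold cross-validation, as in the study.
folds = k_fold_indices(30, 6)
```

A model would be trained and evaluated once per fold, and the error metrics averaged over the six runs, so that every sample is used for testing exactly once.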

Conclusions
In this study, a novel hybrid machine learning prediction model (RF-IGWO-SVR model) is proposed for the problem of gas emission prediction with small samples. In the prediction process, the RF algorithm is used to screen the influencing factors, which not only improves the prediction accuracy and stability of the model, but also retains the original information of the influencing factors. The nonlinear convergence factor and DLH search strategy are used to improve the GWO algorithm, which improves the overall search capability and prevents the algorithm from falling into local optimal solutions. The IGWO algorithm is used to optimize the penalty factor and kernel function parameter of the SVR, which improves the prediction accuracy of the model. The experimental results show that the mean absolute percentage error, mean absolute error, and root mean square error of the RF-IGWO-SVR model are 1.55%, 0.0759, and 0.1103, respectively, which is better than the other comparative models, indicating that the RF-IGWO-SVR model can predict coal mine gas emission more accurately.
In future work, we will explore prediction models with better performance, collect more data to validate the model, and ultimately apply it to coal mine gas emission prediction in practice.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declared that they have no conflicts of interest regarding this work.