An evaluation of the mine water inrush based on the deep learning of ISMOTE

To establish an effective prediction model for coal mine floor water inrush, this paper proposes a neural network prediction method that expands a small-sample dataset with an improved SMOTE (ISMOTE) algorithm and optimizes a deep belief network (DBN). ISMOTE is used to enlarge the measured coal mine dataset, PCA is used to reduce the dimensionality of the data, and the DBN is used to extract the features of the water inrush data and estimate the water inrush risk in coal mines. Because the water inrush samples are few, they cannot provide enough information about the occurrence of water inrush accidents, which hinders the DBN analysis of such accidents, reduces the prediction accuracy, and creates safety hazards in coal mining. The improved SMOTE algorithm is therefore used to expand the dataset, and the DBN extracts secondary features from the processed nonlinear data; finally, a prediction model for coal mine water inrush is established. The superiority of this method is verified by comparing the predictions with the actual conditions of measured working faces in a typical mining area in North China. The prediction accuracy of the proposed model is 94%, compared with 70% for the traditional BP algorithm and 91% for the SAE algorithm, better than the rates of the other methods. The findings of this study can be used to better predict and analyze coal mine water inrush accidents, improve the accuracy of water inrush accident prediction, and encourage the use of deep learning in coal mine floor water inrush prediction, all of which have theoretical and practical value.


Introduction
North China-type coal mines mostly conduct deep coal mining in the Carboniferous Taiyuan Formation. Because this formation is close to the Ordovician limestone, water inrush accidents occur from time to time. Therefore, predicting coal mine water inrush is a necessary part of coal mine safety production. The research and development of coal mine water inrush prediction is based on the study of the coal seam water inrush mechanism. Through research on the water inrush mechanism, many scholars at home and abroad have determined the water inrush index system and clarified the related factors of water inrush. Regression analysis (Shi and Han 2004; Du et al. 2014; Weitao et al. 2015; Liu et al. 2009), classification technology (Liu et al. 2011), geographic information systems (Li and Zheng 2010; Ma et al. 2018), support vector machines (Shi et al. 2017; Cao and Hao 2011; Zhigang et al. 2008; Qiao 2010), neural networks (Zhao and Hu 2014; Zhao et al. 2013; Shi 2012), random forests (Zhao et al. 2018), and other data analysis algorithms have been applied to coal mine water inrush prediction. These methods analyze and evaluate the probability of water inrush accidents and provide data support for the safe production of coal mines.
It is challenging to obtain data because so many variables affect water inrush. Small-sample water inrush data cannot provide enough information on water inrush characteristics, leaving the prediction model with insufficient accuracy. Meanwhile, existing data-driven water inrush prediction algorithms are primarily based on BP neural networks, SVM, ELM, etc., which makes it difficult to mine the complex mapping relationship between water inrush characteristics and accident occurrence, leading to low prediction accuracy and inadequate generalization ability of the constructed models.
To reduce the effect of generated noise samples on the prediction model, this study proposes a modified SMOTE approach that builds a feature space for data sampling, enhances the diversity of synthetic samples, and adds a minority-class sample screening mechanism to the training process. A water inrush prediction model based on a deep belief network is proposed to address the shortcoming of current water inrush prediction algorithms, which find it difficult to uncover the intricate mapping between water inrush characteristics and accident occurrence. To ensure that the characteristics of water inrush incidents can be accurately extracted and represented, they are transferred and optimized layer-by-layer through the multilayer perceptron structure.

Contributions
In this paper, a neural network prediction method for mine floor water inrush is proposed, based on an improved SMOTE algorithm that expands the small-sample dataset and an optimized deep belief network. The ISMOTE method is used to expand the measured data collected in the coal mine, and the DBN is used to extract the characteristics of coal mine water inrush data and estimate the risk of mine water inrush. To the best of our knowledge, few studies have focused on deep learning-based water inrush prediction methods, and deep learning has not previously been introduced into coal mine water inrush prediction. The main contributions of this paper are as follows: (1) Based on the mechanism of water inrush from the coal floor and the main hydrogeological inducing conditions, the causes of floor water inrush and water inrush channels are analyzed. The driving effect of surrounding geological conditions, such as the aquifer, water-resisting layer, coal seam, and geological structure, on the occurrence of floor water inrush accidents is clarified, and the main geological and hydrological factors affecting their occurrence are determined.
(2) The SMOTE algorithm is used to expand the dataset, with spatial random interpolation improving the oversampling technique and the diversity of synthetic sample characteristics; a sample screening step is added to ensure that the generated data belong to the category required for training the prediction model.
(3) To address the difficulty of feature extraction from coal mine floor water inrush samples, a deep feature extraction model is introduced. A deep belief network (DBN) is used to model and analyze floor water inrush accidents. Water inrush data from typical mining areas in North China are used as the training set, and the superiority of the proposed method is demonstrated by comparison with commonly used underground water inrush prediction algorithms. (4) Because floor water inrush samples are high-dimensional, small-sample data, the PCA algorithm is adopted to transform them into low-dimensional, easily processed data, so as to achieve accurate prediction of the target variables of floor water inrush accidents.

Water accident assessment model
The occurrence of water inrush accidents in coal mines is the result of a variety of influencing factors. These factors interact with one another, so the water inrush accident and its influencing factors form a nonlinear system that cannot be described by the precise expressions of classical mathematical models. Research on the law of water inrush in China began in the 1960s, when the Ministry of Coal proposed the water inrush coefficient method and established the empirical formula of the water inrush coefficient (Hao and Jingming 2015). In the 1980s, Professor Jing Zigang of the Shandong University of Science and Technology proposed the "lower three belts" theory (Zhang and Liu 1990). In the 1990s, Dr. Tianquan Liu and Jincai Zhang of the General Coal Mine Institute proposed a "two-zone" model, which holds that the floor rock mass is composed of a mining-induced water-conducting fissure zone and a floor water-resisting zone (Shi and Han 2005; Duan 2012). In the twenty-first century, Professor Longqing Shi of the Shandong University of Science and Technology presented the "lower four belts" theory on the basis of the "lower three belts" theory (Minggao et al. 1995). Also in the 1990s, the Institute of Geology of the Chinese Academy of Sciences presented the "strong permeability channel" theory, which holds that the existence of a water inrush channel is the key to the occurrence of water inrush (Li 2014). Academician Qian Minggao of the China University of Mining and Technology proposed the key strata (KS) theory for the key layer of the stope floor rock mass based on the layered structural characteristics of the floor rock (Shi et al. 2015).
It can be deduced from the above theories that water inrush from the coal seam floor depends mainly on the hydrogeological environment and is caused by a combination of factors, such as water barrier conditions, structural conditions, aquifer conditions, and coal mining methods. Based on a summary of previous studies, the analysis shows that the main factors affecting water inrush from the coal floor are five first-level indicators: water barrier conditions, aquifer conditions, coal seam conditions, structural conditions, and mining conditions, as well as the corresponding 12 secondary indicators. Table 1 shows the influencing factors of coal seam water inrush.

SMOTE
The SMOTE algorithm is a classic oversampling algorithm that was proposed by Chawla et al. in 2002. Its main idea is to achieve the balance of the dataset samples by manually increasing the number of small samples, thereby improving the performance of subsequent algorithms.
In the unbalanced dataset S, for a minority sample X, the m nearest minority samples are found based on the Euclidean distance. We set the oversampling magnification of the dataset to n and randomly select n samples (m > n) from the m nearest neighbors found. We mark the selected minority neighbors as y_i1, y_i2, …, y_in and perform random linear interpolation between X and each selected y_i (i = 1, 2, …, n) to obtain the synthetic data. The interpolation formula of SMOTE is

S_new = X + rand(0, 1) × (y_i − X),

where rand(0, 1) is a random number uniformly distributed in [0, 1].
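As an illustration, the classic SMOTE interpolation step described above can be sketched in pure Python (a minimal sketch; the function names and the tuple representation of samples are our own choices for this illustration, not code from the original work):

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two samples given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def smote_interpolate(X, minority, m=5, n=3, rng=None):
    """Classic SMOTE step for one minority sample X:
    pick n of its m nearest minority neighbours and linearly
    interpolate between X and each chosen neighbour."""
    rng = rng or random.Random(0)
    # m nearest minority neighbours of X (excluding X itself)
    neighbours = sorted((s for s in minority if s != X),
                        key=lambda s: euclidean(X, s))[:m]
    chosen = rng.sample(neighbours, min(n, len(neighbours)))
    synthetic = []
    for y in chosen:
        gap = rng.random()  # rand(0, 1), uniform in [0, 1)
        # S_new = X + gap * (y - X), applied component-wise
        synthetic.append(tuple(xi + gap * (yi - xi)
                               for xi, yi in zip(X, y)))
    return synthetic
```

Each synthetic point lies on the line segment between X and one of its selected neighbours, which is exactly the behaviour the improved algorithm in the next section sets out to generalize.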

Improved SMOTE algorithm based on feature space
Compared with simple oversampling by copying data and with random upsampling, the SMOTE algorithm uses linear interpolation to process minority samples according to set rules and synthesizes new samples to expand the dataset.
However, the SMOTE algorithm also has several problems. Noisy data exist in the actual sample set and cannot be distinguished from clean data, so a uniform oversampling operation amplifies the noise. In addition, linear random interpolation makes dense regions of the minority sample set even denser, while sparse regions remain sparse; in particular, when minority edge samples lie in sparse areas, the classification information is insufficient and the boundary between positive and negative data becomes blurred.
To address these problems in SMOTE, this chapter proposes a new, improved SMOTE algorithm based on a sampling feature space. We randomly select a minority sample S1 from the minority sample space S, compute the Euclidean distance d between S1 and every other minority sample in S, find the sample point with the largest d, and form a sampling space, as shown in Fig. 1. Suppose S1 = (x1, y1) and the farthest minority sample point is Sd = (x2, y2); a data point is then randomly sampled in this feature space as the synthetic sample point S_new = (x_new, y_new), where x_new = random(x2, x1) and y_new = random(y2, y1). In contrast with the traditional SMOTE algorithm, the new synthetic sample point lies in the sampling space formed by the selected sample and its farthest minority sample point, rather than on the line between the two points (Fig. 1, red cross).
To prevent a synthetic sample from falling inside the majority class area, let A = mean(D(S_new, MajorS)) be the average Euclidean distance between the generated sample and the majority samples, and let B = mean(D(S_new, MinorS)) be the average Euclidean distance between the generated sample and the minority samples. If A > B, the generated sample is closer to the minority class area and is saved; if A < B, it may lie in the majority class area and is deleted.
The steps of the improved SMOTE algorithm are as follows: (1) Define K as the number of samples to generate (usually the difference between the number of majority samples and the number of minority samples), and denote the minority samples in the dataset as {S1, S2, …, Sn}.
(2) Based on the Euclidean distance, find the randomly selected minority sample S1(x1, y1) and the farthest point S2(x2, y2) from it in the minority dataset.
(3) Form the sampling space P from the selected minority sample and its farthest point, and perform random interpolation in this space to generate a new synthetic sample S_new = [random(x2, x1), random(y2, y1)]. (4) Calculate the average distances A and B between the synthetic sample and the majority dataset and the minority dataset, respectively. (5) If A > B, save the generated sample and add it to the minority dataset; if A < B, delete it. (6) When the number of iterations reaches K, the program stops; otherwise, return to step (2). Figure 2 shows the situation in which the boundary between the minority and majority data is unclear and the minority samples are scattered. As Fig. 2 shows, because the traditional SMOTE algorithm uses linear random interpolation when generating samples, the generated points remain in the sparse sample area. Figure 3 shows that when noise points exist and the minority data are surrounded by the majority data area, the improved SMOTE algorithm can still perform the oversampling operation well and does not generate additional noise points that would further degrade the quality of the dataset.
The optimization of the improved SMOTE algorithm proposed in this paper focuses on two aspects. (1) The sampling feature space is established by selecting, for a minority sample, the farthest minority sample point within the dataset. This reduces the disadvantage of linear random sampling, which only increases the density of already dense sample areas, improves the distribution diversity of the newly synthesized minority data, and clarifies the fuzzy boundary between positive and negative data after oversampling.
(2) A synthetic sample screening method is proposed to ensure that the samples synthesized by oversampling are the required minority-class data, reducing the influence of noise points and isolated points on the algorithm and improving the quality of the minority dataset.
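The six steps above can be sketched as follows (a hedged illustration in pure Python; `ismote` and `dist` are hypothetical names, the samples are 2-D tuples, and the screening rule keeps a candidate only when its mean distance to the majority class exceeds its mean distance to the minority class, as in steps (4) and (5)):

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two samples given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def ismote(minority, majority, k, rng=None):
    """Improved SMOTE sketch: sample inside the box spanned by a random
    minority point and its farthest minority point, then keep the
    candidate only if it lies closer (on average) to the minority class."""
    rng = rng or random.Random(42)
    synthetic = []
    for _ in range(k):
        s1 = rng.choice(minority)                    # step (1)-(2)
        s_far = max(minority, key=lambda s: dist(s1, s))
        # step (3): random interpolation inside the sampling space P
        cand = tuple(rng.uniform(min(a, b), max(a, b))
                     for a, b in zip(s1, s_far))
        # step (4): mean distances to majority (A) and minority (B)
        a_maj = sum(dist(cand, s) for s in majority) / len(majority)
        b_min = sum(dist(cand, s) for s in minority) / len(minority)
        if a_maj > b_min:        # step (5): closer to minority, keep it
            synthetic.append(cand)
    return synthetic             # step (6): stop after k iterations
```

One design note: unlike classic SMOTE, candidates here fill the whole axis-aligned box between the two chosen points, not just the segment between them, which is what gives the synthesized data its wider spatial diversity.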

The PCA data dimensionality reduction
PCA is a dimensionality reduction algorithm that reduces the dimensionality of the original data and extracts its main information by integrating and classifying comprehensive indicators. The PCA algorithm is used to reduce the dimensionality of the main controlling factors of coal mine water inrush and to standardize the measured data sampled in the coal mine. SPSS software is used to perform principal component analysis on the corresponding measured data. The selection criterion for principal components is that the cumulative variance contribution rate must exceed 80%.
Since the cumulative variance contribution rate of the first six principal components is approximately 83%, these six components contain most of the information required for water inrush prediction; thus, the first six components are used for floor water inrush evaluation. Table 2 shows the contribution rate and cumulative contribution rate of the principal components.
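The 80% cumulative-variance selection criterion can be reproduced outside SPSS, for example with a small numpy sketch (an illustration under our own assumptions: the indicators are standardized first, the principal directions come from an SVD, and `pca_reduce` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def pca_reduce(X, threshold=0.80):
    """Reduce data X (samples x features) to the first principal
    components whose cumulative variance contribution rate exceeds
    `threshold` (the 80% criterion used in the text)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each indicator
    # SVD of the standardized matrix gives the principal directions
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    var_ratio = S ** 2 / np.sum(S ** 2)         # variance contribution rate
    cum = np.cumsum(var_ratio)                  # cumulative contribution
    k = int(np.searchsorted(cum, threshold) + 1)
    return Xs @ Vt[:k].T, cum[:k]               # scores, cumulative rates
```

With 12 correlated water inrush indicators, a call like `pca_reduce(X)` would typically retain about six components, matching the selection reported in Table 2.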

Deep belief network
The deep belief network is a probabilistic generative model composed of a stack of multiple restricted Boltzmann machines with a classification or regression layer at the top. Through layer-wise forward learning combined with a reverse fine-tuning mechanism based on gradient descent, it can achieve comparatively high model training accuracy.

Restricted Boltzmann machine
The restricted Boltzmann machine (RBM) is a probabilistic graphical model that can be interpreted as a stochastic neural network. In the classic RBM structure, neurons in the same layer are not connected to each other. This structure was developed from the Boltzmann machine (BM) and overcomes the slow training speed of the traditional BM (Gao and Ma 2016). The structure of the RBM is shown in Fig. 4. An RBM is composed of two layers of neurons, with undirected full connections between the layers and no connections within a layer. The data are input through the visible layer and, after training through the neurons and the weight matrix, output by the hidden layer.
For a given node configuration (v, h), the energy function of the RBM is

E(v, h | Θ) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i w_ij h_j.

Based on the energy function, the probability distribution under the parameters Θ = {w_(n×m), a, b} is

P(v, h | Θ) = e^(−E(v, h | Θ)) / Z(Θ), with Z(Θ) = Σ_(v,h) e^(−E(v, h | Θ)).

After applying the sigmoid activation function, the activation probabilities of h and v are

P(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i w_ij),
P(v_i = 1 | h) = sigmoid(a_i + Σ_j w_ij h_j).

These activation formulas for h and v are the core of the RBM algorithm. The data are input through the visible layer, and the feature indicators are mapped from the visible layer to the hidden layer neurons by the hidden activation formula. The resulting output is then reconstructed back to the visible layer v by the visible activation formula. We calculate the error between the reconstructed data and the original data and adjust the weight parameters between the visible and hidden layers by the error minimization rule, so that the reconstructed data represent the original input as faithfully as possible, achieving the goal of feature extraction. The training process of the RBM is in fact a maximum likelihood estimation problem: maximizing P_Θ(v) by adjusting the internal parameters of the RBM under fixed input data.
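The map-reconstruct-update cycle described above is commonly implemented with one step of contrastive divergence (CD-1). A minimal numpy sketch, under the assumptions of binary units and a single Gibbs step (`cd1_step` is our illustrative name, not the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1, rng=None):
    """One CD-1 update of an RBM. v0: (batch, n) binary data;
    W: (n, m) weights; a: visible bias (n,); b: hidden bias (m,).
    Updates the parameters in place and returns the reconstruction error."""
    rng = rng or np.random.default_rng(0)
    ph0 = sigmoid(v0 @ W + b)                  # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)                # reconstruction P(v = 1 | h0)
    ph1 = sigmoid(pv1 @ W + b)
    # gradient approximation: data statistics minus model statistics
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((v0 - pv1) ** 2))     # mean squared error
```

Repeated calls on the same training batch drive the reconstruction error down, which is the practical expression of the error minimization rule in the text.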

DBN network structure
A DBN is formed by stacking multiple RBMs to construct a typical DBN network model. Compared with the shallow neural network, this stacked DBN structure has a deeper network level and a better model generalizability. Traditional neural networks rely on the selection of data features, while DBNs can extract hidden features in the input data by setting up multiple hidden layers (Wu et al. 2020).
The DBN is composed of stacked RBMs, with a backpropagation algorithm used at the top layer. The training process is divided into two parts: pretraining and parameter fine-tuning. In pretraining, the input data are trained layer-by-layer by the RBMs from the bottom up, and the output of each layer is used as the input of the next, higher RBM. This structure can effectively filter out the feature information (Fig. 5). Parameter fine-tuning is an overall, supervised tuning: the error between the output data and the expected data is propagated back layer-by-layer to fine-tune the parameters of the entire network.
For a DBN model with observed data x and L hidden layers h1, h2, …, hL, the joint probability distribution is

P(x, h1, …, hL) = P(x | h1) P(h1 | h2) ⋯ P(h(L−2) | h(L−1)) P(h(L−1), hL).

The training goal of the deep belief network is to model this joint distribution, and the training process mainly computes the conditional probability models P(hk | h(k−1)). Training is divided into two steps. The first step is pretraining of the DBN model, in which the initial parameters are obtained by a greedy layer-wise training method. The second step is fine-tuning, in which labeled data are used for global training with a supervised learning algorithm and the network weights are adjusted.
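The greedy layer-wise pretraining in the first step can be sketched as follows (an illustrative sketch under our own assumptions: sigmoid units, CD-1 updates for each layer, `pretrain_dbn` as a hypothetical name, and biases omitted from the return value for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(X, layer_sizes, lr=0.1, epochs=50, rng=None):
    """Greedy layer-wise pretraining sketch: train an RBM on the data,
    use its hidden probabilities as input to the next RBM, and so on.
    Returns the per-layer weight matrices."""
    rng = rng or np.random.default_rng(0)
    weights, data = [], X
    n_in = X.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.normal(size=(n_in, n_hid))
        a = np.zeros(n_in)
        b = np.zeros(n_hid)
        for _ in range(epochs):                # CD-1 updates for this layer
            ph0 = sigmoid(data @ W + b)
            pv1 = sigmoid(ph0 @ W.T + a)
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (data.T @ ph0 - pv1.T @ ph1) / data.shape[0]
            a += lr * (data - pv1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
        weights.append(W)
        data = sigmoid(data @ W + b)           # output feeds the next RBM
        n_in = n_hid
    return weights
```

This is exactly the stacking described in the text: the output of each trained RBM becomes the input of the RBM above it, and the resulting weights initialize the network before supervised fine-tuning.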
The difficulty that current water inrush prediction algorithms have in mining the complicated mapping relationship between water inrush properties and accident occurrence has led to the proposal of a water inrush prediction model based on the deep belief network. The multilayer perceptron structure transmits and optimizes the water inrush characteristics layer-by-layer to guarantee that the information features are accurately extracted and expressed.

DBN model training
The improved SMOTE algorithm proposed in this paper is used to expand 100 sets of data into 300 sets of water inrush data, and PCA is used to reduce their dimensionality. The reduced-dimensional data are input into the DBN for pretraining. Pretraining first initializes the weight matrices between the layers and traverses the input vector and the hidden layer neuron nodes; the neuron outputs after the first RBM is trained are used as the input vector of the second RBM, and so on, layer-by-layer up to the highest layer. According to the results of the pretraining output layer, the error between the output and the expected output is backpropagated from the output layer to each hidden layer to update the parameters of each layer (Chen et al. 2017).
The advantage of the model lies in using the DBN to abstract and extract data features, with a neural network as the top-level unit of the DBN to predict water inrush from the newly extracted features. The model prediction process is shown in Fig. 6.
A four-layer RBM network is established; the number of input layer nodes is determined by the data dimension, and the number of hidden layer nodes is obtained by the "trial and error" method.
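The supervised fine-tuning of the top-level unit can be illustrated with a logistic-regression sketch over the features extracted by the stacked RBMs (a deliberate simplification of full backpropagation through the whole network; `finetune_top_layer` and the toy feature matrix are our own illustration, not the authors' code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_top_layer(F, y, lr=0.5, epochs=500):
    """Train the top-level predictor on extracted features F (samples x d)
    with binary labels y, by gradient descent on the cross-entropy error,
    and return the resulting 0/1 water inrush predictions."""
    w = np.zeros(F.shape[1])
    b0 = 0.0
    for _ in range(epochs):
        p = sigmoid(F @ w + b0)
        grad = p - y                      # derivative of cross-entropy
        w -= lr * F.T @ grad / len(y)
        b0 -= lr * grad.mean()
    return (sigmoid(F @ w + b0) > 0.5).astype(int)
```

In the full model, this gradient signal would additionally be propagated back through the stacked RBM layers to fine-tune all the pretrained weights, as described in the pretraining section.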

Algorithm verification
The proposed method is tested on the water inrush data of measured working faces in a typical mining area in North China, and the water inrush situation is predicted and compared using DBN, SVM, BP, and other classic algorithms. The three types of models use the same number of samples, all taken from the oversampled data. After the original data are processed by PCA, the dimensionality of the feature values is reduced to 6.
The data in Table 3 are entered into the DBN model, and the results are shown in Table 4 and Fig. 7. There are three incorrect predictions, corresponding to a correct rate of 94%. The incorrect predictions may result from an insufficient sample size and from features lost during dimensionality reduction; better dimensionality reduction methods during training could improve the accuracy of the algorithm. The correct rate of the BP neural network trained with the original features is 70%, that of the BP neural network using oversampled data is 80%, that of the water inrush coefficient method is 60%, that of the SVM algorithm using the improved SMOTE oversampled data is 88%, and that of the DBN algorithm trained on the unexpanded dataset is 85%. The accuracy of the water inrush risk prediction model proposed in this paper is better than the rates of all of these methods.
The ISMOTE-DBN algorithm also achieves ideal results, mainly because the ISMOTE algorithm can effectively expand the water inrush dataset, making up for the small size and imbalance of the measured mine data samples, while the DBN can better extract the feature vectors of the water inrush data, providing effective data support for the prediction algorithm.
Current coal mine water inrush prediction methods mostly employ shallow models such as the support vector machine, BP, Fisher, and random forest. However, the majority of these techniques define the accident mechanism in terms of only one or a few particular water inrush-related factors, making it impossible to fully evaluate coal mine water inrush accidents. The shallow neural network falls short of the deep learning model when it comes to extracting the intricate nonlinear aspects of the water inrush accident and determining how each feature relates to the accident. As can be observed from the findings in Fig. 7, the deep learning feature extraction technique DBN is superior in prediction accuracy, F1 value, recall rate, and other measures to the current mainstream prediction methods SVM, SVR, BP, and their improved algorithms. Compared with the other algorithms, the method has the highest probability of correctly predicting both positive and negative samples in the original dataset. This is mostly due to the DBN's ability to comprehensively extract the associated characteristics of water inrush accidents, handle the nonlinear data in water inrush prediction modeling, and clarify the connection between geological and hydrological characteristics and water inrush accidents. At the same time, the ISMOTE method presented in this research is used to enlarge the dataset in order to address the small number of water inrush data samples and the data imbalance. As can be seen from the comparison of the prediction results of the DBN and BP algorithms on the original data and the expanded dataset in Fig. 7, the dataset preprocessed by the ISMOTE algorithm effectively addresses the imbalance and small size of the original data and improves the prediction accuracy of the classification algorithms.
By comparing the existing shallow neural network water inrush models with the deep learning model, and the prediction models trained on the expanded dataset with those trained on the original dataset, it can be concluded that the proposed coal mine water inrush prediction model, which expands the dataset with the ISMOTE algorithm and learns with the deep DBN, has higher prediction accuracy and is more suitable for forecasting and analyzing coal mine water inrush accidents.

Conclusion
There are many factors affecting the risk of water inrush from the coal seam floor, and the data are redundant. Principal component analysis reduces the data dimension without compromising the integrity of the data and saves algorithm training cost. Compared with the BP neural network trained on the original features, the DBN prediction model more effectively extracts the water inrush impact features in the original data, improves the training accuracy and the generalization performance of the model, and eliminates the defects of traditional feature selection. The hidden features in complex hydrogeological information are extracted, missing and noisy data are effectively filtered, and a more reliable water inrush accident evaluation model is established. The case analysis shows that the predicted values of the model are consistent with the actual coal mine water inrush situation, and the following conclusions are obtained. (1) Multidimensional redundant input data complicate the structure of the deep belief network. Principal component analysis is used to reduce the dimensionality of the data, and the extracted features of the high-dimensional data are input into the deep belief network to simplify the network structure and improve the accuracy of the model.
(2) Compared with the traditional network model, the DBN model based on the improved SMOTE algorithm proposed in this paper has the highest prediction accuracy. In follow-up research, we can also start from the structure of the DBN network itself, optimize the network model, integrate other algorithms, and further improve the model accuracy.
The improved SMOTE method used in this paper alleviates the defect of the small sample size, and the added screening step effectively filters out samples that fall in the majority class area. However, because some abnormal training samples lie within the minority-class sample set, excessive noise in the synthetic samples and fuzzy boundaries between positive and negative data may still occur. In addition, owing to the high dimensionality and small sample size of water inrush accident data, overfitting is more likely to occur in deep belief network training, reducing the accuracy of the prediction results.
(1) Applying oversampling technology to the expansion of water inrush data can still produce noise or blurred boundaries between positive and negative data. Although the improved algorithm proposed in this paper alleviates this problem to a certain extent, it cannot completely eliminate it. In follow-up work, new data generation algorithms such as GAN networks can be introduced into the field of floor water inrush, and the data can be preprocessed before prediction.
(2) When the number of auxiliary variables and target variables is large, a deep learning model structure becomes very complex, resulting in high training and prediction costs. Therefore, neural network compression for such problems is also a topic of follow-up research: compressing the deep learning network into an effective form to simplify the model structure and improve modeling and output efficiency.
Authors' contributions All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by TS, SK, TX, and ZY. The first draft of the manuscript was written by ZY, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding The funding was provided by Key Technologies Research and Development Program (Grant No. 2017YFF0205500).

Declarations
Ethical approval I testify on behalf of all co-authors that our article is submitted to Natural Hazards. All authors declare that: (1) this material has not been published in whole elsewhere; (2) the manuscript is not currently being considered for publication in another journal; and (3) all authors have been personally and actively involved in the substantive work leading to the manuscript and will hold themselves jointly and individually responsible for its content.