Risk Assessment of Coal Mine Water Inrush Based on PCA-DBN


To provide an effective risk assessment of water inrush for coal mine safety production, a BP neural network prediction method for water inrush optimized by principal component analysis (PCA) and a deep belief network (DBN) is proposed. Because a DBN requires a long training time when building a classification model on high-dimensional data, PCA is first used to reduce the dimensionality of the many factors affecting water inrush from the coal seam floor. This reduces the number of variables, the redundancy among them and the difficulty of feature extraction, and shortens the training time of the model. A DBN is then used to extract secondary features from the processed nonlinear data; by combining low-level features into a more abstract high-level representation, it captures the nonlinear relationships among the characteristics of water inrush. Finally, a prediction model is established to predict water inrush in coal mines. The superiority of the method is verified by comparing its predictions for actual working faces in typical mining areas of North China with the observed outcomes.


Introduction
Deep mining of the Carboniferous Taiyuan Formation is carried out in most North China-type coal mines. Because of the proximity of this formation to the Ordovician limestone, water inrush accidents often occur. Therefore, the prediction of coal mine water inrush is a necessary part of coal mine safety production. The development of water inrush prediction builds on research into the mechanism of coal seam water inrush. Through this research, many scholars in China and abroad have determined an index system for water inrush and defined its related influencing factors. Regression analysis (Shi and Han 2004; Du Chunlei et al. 2014; Liu Weitao et al. 2015; Liu Zaibin et al. 2009 L et al. 2017; Cao Qingkui and Zhao Fei 2011; Yan Zhigang et al. 2008), neural networks (Qiao Yufeng 2011), extreme learning machines (Zhao Zuopeng and Hu Mengke 2014; Zhao Z et al. 2013) and other data analysis algorithms have been applied to the prediction of coal mine water inrush to analyse and evaluate the probability of water inrush accidents, which provides data support for coal mine safety production.

Analysis of the influencing factors of water inrush
The occurrence of water inrush accidents in coal mines is the result of the joint action of many influencing factors. The interactions between these factors form a nonlinear system that cannot be accurately expressed by a classic mathematical model. In China, the study of the laws of water inrush began in the 1960s. The water inrush coefficient method was proposed by the Ministry of Coal, and the empirical formula of the water inrush coefficient was established (Shi Longqing 2012). Professor Jing Zigang of Shandong University of Science and Technology proposed the theory of the "Lower Three Belts" (Huang Hao 2015). Dr. Liu Tianquan and Zhang Jincai of the General Institute of Coal Mine proposed the "two-zone" model, which considers floor rock masses to be composed of a mining-induced water-conducting fissure zone and a floor water-isolating zone (Zhang Jincai and Liu Tianquan 1990). In the 21st century, Professor Shi Longqing of Shandong University of Science and Technology put forward the theory of the "Lower Four Belts" based on the theory of the "Lower Three Belts" (Shi Longqing and Han Jin 2005). The Institute of Geology, Chinese Academy of Sciences, put forward the theory of a "strong seepage channel" in the 1990s, which holds that the presence of a water inrush channel is the key to the occurrence of water inrush (Duan Hongfei 2012). Qian Minggao, an academician at the China University of Mining and Technology, proposed the KS theory of key strata of stope floor rock according to the layered structure characteristics of floor rock (Qian Minggao et al. 1995).
It can be inferred from the above theories that water inrush from the coal seam floor occurs against the background of the hydrogeological environment and results from the joint action of many factors, such as the condition of the waterproof layer, the structural condition, the condition of the aquifer and the mining method used in the coal mine. Based on a summary of previous studies, the main factors affecting water inrush from the coal seam floor are grouped into five first-level indexes, namely, aquifer condition, water-barrier condition, coal seam condition, structure condition and mining condition, and twelve second-level indexes. The influencing factors of coal seam water inrush are shown in Table 1.

The theory of methods

PCA
PCA is a dimensionality reduction algorithm. Through a linear transformation, it converts multiple correlated indicators into a smaller number of mutually uncorrelated comprehensive indicators, which are then ranked and selected according to a fixed rule. It thereby reduces the dimension of the original data, extracts the main information in the original data, and minimizes the information loss in the dimension reduction process.
There is information overlap among the variables influencing the occurrence of water inrush, which increases the cost and time of the classification prediction algorithm and reduces the success rate of its prediction. PCA is used to carry out dimensionality reduction on the original feature data, eliminate redundant information within an acceptable loss range, retain the key evaluation index factors, and realize the dimensionality reduction of the evaluation index (Li Pei 2014).

DBN is a probabilistic generative model composed of a stack of several restricted Boltzmann machines (RBMs) and a classification or regression layer at the top. Through forward learning combined with backward fine-tuning by gradient descent, higher model accuracy can be achieved.

[Figure: RBM structure, showing the hidden layer and the visible layer]

Based on the energy function, the probability distribution under the parameters Θ = (w_{n×m}, a, b) can be obtained, where Z is the normalization coefficient. The activation probabilities of h and v are then obtained through the sigmoid activation function; these activation formulas for h and v are the core formulas of the RBM algorithm. Data are input from the visible layer, and the characteristic index is mapped from the visible layer to the neurons of the hidden layer through Equation (5). Then, the output value obtained is reconstructed back to the visible layer v through Equation (6), and the error between the reconstructed data and the original data is calculated. The weight parameters between the visible and hidden layers are adjusted by the error minimization rule so that the reconstructed data represent the original input data as closely as possible, which achieves the goal of feature extraction. In fact, the goal of the training process of the RBM algorithm is to solve the Markov maximum likelihood estimation problem; that is, for fixed input data, the value of P_Θ(v) is maximized by adjusting the internal parameters of the RBM.
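For reference, the standard binary RBM formulation that matches the parameters Θ = (w_{n×m}, a, b) can be written as follows. This is a sketch of the textbook form, not a recovery of the paper's own typesetting: the index convention (i over the n visible units, j over the m hidden units), the reading of a as the visible bias and b as the hidden bias, and the assignment of the last two expressions to Equations (5) and (6) are assumptions.

```latex
\begin{align*}
E(v, h \mid \Theta) &= -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j
                       - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \\
P_\Theta(v, h) &= \frac{1}{Z(\Theta)} \, e^{-E(v, h \mid \Theta)},
\qquad Z(\Theta) = \sum_{v, h} e^{-E(v, h \mid \Theta)} \\
P_\Theta(v) &= \frac{1}{Z(\Theta)} \sum_{h} e^{-E(v, h \mid \Theta)} \\
P(h_j = 1 \mid v) &= \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big)
  && \text{(assumed to be Equation (5))} \\
P(v_i = 1 \mid h) &= \sigma\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big)
  && \text{(assumed to be Equation (6))} \\
\sigma(x) &= \frac{1}{1 + e^{-x}}
\end{align*}
```

Under this form, maximizing P_Θ(v) over the training data with respect to w, a and b is the maximum likelihood problem described above.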

DBN network structure
A typical DBN network model is constructed by stacking multiple RBMs. Compared with a shallow neural network, this stacked structure has more layers and better generalization ability. Traditional neural networks rely on the selection of data features, whereas a DBN can extract hidden features from the input data through its multiple hidden layers (LeCun Y et al. 2015).

PCA-DBN prediction model
The data of water inrush accidents in coal mines are nonlinear and high-dimensional, and there are complex interrelations among the factors related to water inrush accidents. Most current prediction and evaluation methods cannot effectively extract the large number of hidden features in the data, so the resulting water inrush models are one-sided, which lowers the prediction accuracy and fails to provide effective support for coal mine safety. Therefore, the design idea of the model in this paper has two parts: converting the high-dimensional influencing factors into low-dimensional data that are easy to train on, and extracting the characteristic quantities in the data more completely (Wu Kai et al. 2020).

PCA data dimension reduction
The PCA algorithm is used to reduce the dimension of the main control factors of water inrush in coal mines. The measured coal mine data are first standardized: a linear transformation is applied using the deviation standardization (min-max) formula, which maps the data to the interval [0, 1] to smooth the subsequent optimization process. A covariance matrix is obtained from the normalized data matrix, and the eigenvalues, the variance contribution rate of each principal component (VCP) and the cumulative variance contribution rate (CVCP) of the original data are calculated. The PCA is carried out on the testing data in SPSS, and the selection criterion for the principal components is that the CVCP must exceed 80%. Because the CVCP of the first six principal components is approximately 83%, these six components already contain most of the information needed for water inrush prediction and are used for the floor water inrush evaluation. The VCP and CVCP of the principal components are shown in Table 2.
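The normalization and component selection described above can be sketched in NumPy as follows. This is a minimal illustration rather than the SPSS procedure used in the paper: the function names `min_max_normalize` and `pca_by_cvcp` are hypothetical, and the 0.80 threshold mirrors the stated 80% criterion (the sketch accepts the first component count whose CVCP reaches the threshold).

```python
import numpy as np

def min_max_normalize(X):
    """Deviation (min-max) standardization: map each column to [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0, avoiding division by zero
    return (X - col_min) / col_range

def pca_by_cvcp(X, threshold=0.80):
    """Keep the fewest principal components whose CVCP reaches `threshold`.

    Returns (scores, vcp, cvcp, k), where k is the number of retained components.
    """
    Xn = min_max_normalize(X)
    Xc = Xn - Xn.mean(axis=0)                 # center before projecting
    cov = np.cov(Xc, rowvar=False)            # covariance of the normalized data
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    vcp = eigvals / eigvals.sum()             # variance contribution rate
    cvcp = np.cumsum(vcp)                     # cumulative variance contribution rate
    k = min(int(np.searchsorted(cvcp, threshold)) + 1, len(cvcp))
    return Xc @ eigvecs[:, :k], vcp, cvcp, k
```

For a sample matrix with one row per observation and one column per second-level index, `pca_by_cvcp(X)` returns the reduced feature matrix together with the VCP and CVCP vectors that a table such as Table 2 would report.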

PCA-DBN model training
The data after PCA dimension reduction are input into the DBN for pretraining. Pretraining initializes the weight matrix between adjacent layers by passing the input vectors through the neuron nodes of each hidden layer. After the first RBM is trained, its output is taken as the input vector of the second RBM, and the process continues layer by layer up to the highest level. The error between the output of the pretrained network and the expected output is then back-propagated from the output layer through the hidden layers to update the parameters of each layer (Chen Kai et al. 2017).
The advantage of the model is that the data features are abstracted by the DBN, and a BP neural network is used as the top unit of the DBN to predict water inrush after the new features are extracted. The prediction process of the model is as follows:
A. Establish a 4-layer RBM network. The number of nodes in the input layer is determined by the data dimension, and the number of nodes in each hidden layer is obtained by the "trial and error" method.
B. Input the unlabelled training data, after dimension reduction by PCA, into the DBN network, pretrain the RBM parameters layer by layer, and locally optimize the RBM parameters.
C. Input the labelled training data, back-propagate the errors layer by layer, and update the DBN network weights using the gradient descent method until convergence.
D. Input all the data into the network for feature learning and extract the output reconstructed feature data.
E. Divide the data into training and test sets. Input the labelled training data into the BP network for training, use the trained network to predict the test set, compare the predicted water inrush outcomes with the actual outcomes, and evaluate the predictions.
The algorithm flow chart is shown in Fig 3.
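Steps A and B (building the RBM stack and pretraining it layer by layer) can be sketched as below. The paper only states that the weights are adjusted by an error minimization rule; one-step contrastive divergence (CD-1) is swapped in here as a common choice for that update, which is an assumption. The class and function names, the layer sizes, the learning rate and the epoch count are all hypothetical, and the inputs are assumed to be scaled to [0, 1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary RBM with weights W, visible bias a and hidden bias b."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)
        self.b = np.zeros(n_hidden)
        self.rng = rng

    def hidden_prob(self, v):
        # Activation probability of the hidden units given the visible layer
        return sigmoid(v @ self.W + self.b)

    def visible_prob(self, h):
        # Reconstruction of the visible layer from the hidden layer
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the mean squared reconstruction error."""
        ph0 = self.hidden_prob(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        v1 = self.visible_prob(h0)
        ph1 = self.hidden_prob(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.a += lr * (v0 - v1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

def pretrain_dbn(X, hidden_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the previous layer's output."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], np.asarray(X, dtype=float)
    for n_hidden in hidden_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(layer_input, lr)
        rbms.append(rbm)
        layer_input = rbm.hidden_prob(layer_input)  # features passed to the next RBM
    return rbms, layer_input
```

The returned feature matrix corresponds to the extracted features that step D passes on; the supervised fine-tuning of step C and the BP classifier of step E are not covered by this sketch.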

Algorithm verification
The proposed method is tested on water inrush data from typical mining areas in North China, and the PCA-DBN and BP algorithms are used to predict water inrush and are compared. Each modelling method uses the same number of samples, 100 groups, of which 80 groups are used as the training set and 20 groups as the test set. After the original data are processed by PCA, the dimension of the feature data is reduced to 6. The data from Table 3 are input into the PCA-DBN model, and the results are shown in Figure 4. The PCA-DBN model can provide a more accurate water inrush risk assessment to better ensure coal mine safety.
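The 80/20 partition and the comparison of predicted and actual outcomes can be sketched as follows. The paper does not state how the 80 training groups were chosen, so the random shuffle here is an assumption, and the function names `split_train_test` and `accuracy` are hypothetical.

```python
import numpy as np

def split_train_test(X, y, n_train=80, seed=0):
    """Shuffle the samples and hold out everything after the first `n_train`."""
    X, y = np.asarray(X), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(X))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted class matches the actual class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Applying the same split to every model being compared keeps the accuracy figures comparable, since each model is then scored on identical test groups.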

Conclusion
There are many risk factors affecting water inrush from the coal seam floor, and some of the data are redundant. Principal component analysis reduces the data dimension without damaging the integrity of the data and lowers the cost of training. Compared with the PCA and BP approaches trained on the original features, the PCA-DBN model more effectively extracts from the original data the characteristics that influence water inrush, improving the training accuracy and the generalization performance of the model. As a result, the PCA-DBN model can overcome the defects of traditional algorithms in feature selection, extract implicit characteristics from complex hydrogeological information, and effectively filter missing and noisy data to establish a more reliable evaluation model for water inrush accidents.
The case analysis shows that the predictions of the model are consistent with the actual water inrush situation in coal mines, and the following conclusions are drawn:
(1) Multidimensional redundant input data complicate the structure of the DBN. PCA is used to reduce the dimensionality of the data before they are input into the deep belief network, which then extracts the nonlinear features of the high-dimensional data; this simplifies the network structure and improves the accuracy of the model.
(2) Compared with the traditional BP network, the PCA-BP network model and the water inrush coefficient method, the PCA-DBN model proposed in this paper has the highest prediction accuracy. In subsequent research, the model can be optimized through the structure of the DBN network itself, and other algorithms can be integrated to further improve its accuracy.