Experimental data and model description
Extensive experiments on coal–gangue recognition in top coal caving demonstrate the superior performance of MBCNN. The coal–gangue dataset contains seven sound classes: the noise induced by the operation of the front conveyor, the rear conveyor, the right cutting of the shearer, the left cutting of the shearer and the transfer machine, as well as the sounds of falling coal and falling gangue that need to be recognized. Table 2 gives a detailed description of the state labels of the noiseless dataset.
Table 2
Description of the noiseless dataset used in coal–gangue recognition
State label | Audio event class
0 | Falling coal
1 | Falling gangue
2 | Rear conveyor
3 | Front conveyor
4 | Transfer machine
5 | Right cutting of shearer
6 | Left cutting of shearer
In this study, an MFCC feature smoothing method is adopted in the MBCNN model. To illustrate the impact of this method on coal–gangue recognition performance, single-branch and double-branch CNN models are introduced for comparison; their main structural parameters are as follows:
Model 1: a single-branch CNN with Conv = 256, kernel size = 6, and Dense(10) + BN.
Model 2: a double-branch CNN with Conv = 256, kernel sizes = 5 + 6, and Dense(10) + BN.
Model 3: the three-branch CNN proposed in this paper, with Conv = 256, kernel sizes = 4 + 5 + 6, and Dense(64) + Dense(32) + BN.
The coal–gangue recognition performance of these three models under different top coal caving conditions is illustrated in the following sections. For all models, the Adam optimizer was used with an initial learning rate of 0.0001, 200 training epochs, and a batch size of 64. A minimal sketch of the three-branch architecture (Model 3) is given below.
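The sketch assumes a Keras-style implementation. Only the branch count, the 256 convolution filters, the kernel sizes 4/5/6, the Dense(64) + Dense(32) + BN head, and the Adam/epoch/batch settings come from the text; the MFCC input shape, pooling, activations, layer ordering, and output layer are illustrative assumptions, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_FRAMES, N_MFCC = 98, 13     # assumed MFCC feature-map size (not specified in the text)
N_CLASSES = 7                 # seven sound classes in the noiseless dataset

def branch(kernel_size):
    # One CNN branch: 256 convolution filters with the given kernel size.
    inp = layers.Input(shape=(N_FRAMES, N_MFCC))
    x = layers.Conv1D(256, kernel_size, padding="same", activation="relu")(inp)
    x = layers.GlobalMaxPooling1D()(x)
    return inp, x

# Three branches with kernel sizes 4, 5 and 6 (Model 3).
inputs, features = zip(*[branch(k) for k in (4, 5, 6)])
x = layers.Concatenate()(list(features))
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
x = layers.BatchNormalization()(x)
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = models.Model(inputs=list(inputs), outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training, with one MFCC input per branch:
# model.fit([x_branch4, x_branch5, x_branch6], y, epochs=200, batch_size=64)
```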
Experiment with noiseless dataset
In this section, we discuss the effectiveness of the MBCNN model on the noiseless coal–gangue dataset. Figures 4 and 5 respectively show the accuracy and loss curves of the training and testing sets for the three CNN models. For Model 1, the training and testing accuracies reach stable values after 69 epochs, with a testing accuracy of 98.50%. For Model 2, they reach stable values after 90 epochs, with a testing accuracy of 98.74%. For Model 3, they reach stable values after 68 epochs, with a testing accuracy of 99.28%. Model 3 therefore converges faster than the other two models, which means training Model 3 requires less time in practice. It is clear from Fig. 5 that the loss of Model 3 is smaller than that of the other two models and oscillates least before convergence, indicating that Model 3 is the most robust on the noiseless dataset.
Then, the mean F1-score of each class, obtained through five-fold cross validation, is shown in Fig. 6, where the error bars represent the standard deviation and thus reflect the stability of the recognition performance. Comparing the recognition results of the three models on the seven sound classes, Model 3 fully recognizes labels 2–6, whereas each of the other two models fails to completely identify at least one class, and Model 3 also outperforms the other two models on labels 0–1. In addition, for labels 2–6 the standard deviations of Model 3 are all zero, indicating that the MBCNN model proposed in this paper has a more reliable and stable performance. These results show that the proposed MBCNN can learn the linked features of the different classes from the sound signals fed to its different branches, and thereby achieve higher-quality recognition.
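As an illustration, the per-class F1-score mean and standard deviation reported in Fig. 6 (and later in Tables 3 and 4) could be computed as in the following sketch; the `build_model` factory and the sklearn-style fit/predict interface are placeholders for any of Models 1–3, not the paper's implementation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def cross_validated_f1(X, y, build_model, n_classes=7, n_folds=5):
    """Per-class F1-score mean and std over five-fold cross validation."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = build_model()                      # placeholder model factory
        clf.fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        # average=None returns one F1 value per class label.
        fold_scores.append(f1_score(y[test_idx], y_pred,
                                    average=None, labels=list(range(n_classes))))
    fold_scores = np.array(fold_scores)          # shape: (n_folds, n_classes)
    return fold_scores.mean(axis=0), fold_scores.std(axis=0)
```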
In summary, the proposed three-branch MBCNN model makes full use of the MFCC feature distribution of the falling coal or gangue sound pressure on different time–frequency scales, and achieves higher sound recognition accuracy than the traditional single-branch CNN and double-branch CNN on most class labels.
Site simulation experiment
To further verify the generalization ability of MBCNN for coal–gangue recognition during top coal caving, the coal and gangue caving sounds and the various noises are mixed simultaneously according to Fig. 3 to simulate the site process of top coal caving, forming a simulated site dataset, called dataset 1. To obtain richer features that can represent the different sound classes, the inputs to the different branches of MBCNN are smoothed: averaging every two adjacent frames of dataset 1 forms a new dataset 2, and averaging every three adjacent frames of dataset 1 forms a new dataset 3 (see the sketch below). For Model 1, the original dataset 1 is input into the convolutional neural network; for Model 2, the original dataset 1 is input into the branch with kernel size = 6 and the smoothed dataset 2 into the branch with kernel size = 5; for Model 3, the first two branches are consistent with Model 2, and the smoothed dataset 3 is input into the branch with kernel size = 4. The simulated site dataset has two class labels: label 0 represents the sound of falling gangue and label 1 represents the sound of falling coal. The difference from previous research5,9−11 is that these two classes of signals are collected while the rear conveyor, the front conveyor, the transfer machine, and the right and left cutting of the shearer are running at the same time, i.e., the noise induced by the operation of the site devices is taken into account. The accuracy and loss curves are shown in Figs. 7 and 8, and the corresponding confusion matrices are shown in Fig. 9.
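A minimal sketch of this frame-averaging smoothing is given below. The moving-average formulation and the 'same'-length boundary handling are assumptions for illustration; the text only states that two (dataset 2) or three (dataset 3) adjacent frames of dataset 1 are averaged.

```python
import numpy as np

def smooth_frames(mfcc, window):
    """Average every `window` adjacent MFCC frames (moving average along time).

    mfcc: array of shape (n_frames, n_coeffs); window: 2 for dataset 2, 3 for dataset 3.
    """
    kernel = np.ones(window) / window
    # Convolve each MFCC coefficient track with the averaging kernel;
    # mode="same" keeps the number of frames unchanged (assumed boundary handling).
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"),
                               axis=0, arr=mfcc)

mfcc_d1 = np.random.rand(98, 13)       # one dataset 1 sample (original MFCC frames)
mfcc_d2 = smooth_frames(mfcc_d1, 2)    # dataset 2: two-frame average
mfcc_d3 = smooth_frames(mfcc_d1, 3)    # dataset 3: three-frame average
```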
As can be seen from Fig. 7, the testing accuracy of Model 1 converges after 96 epochs to 87.72%; the testing accuracy of Model 2 converges after 110 epochs to 90.35%; the testing accuracy of Model 3 converges after only 63 epochs to 92.98%. The loss curves in Fig. 8 show that Model 3 has the fastest convergence and the smallest oscillation amplitude. Figure 9 shows that the correct recognition rate for falling coal is 89.9% for both Model 1 and Model 2 and 90.4% for Model 3, and that Model 3 also has the highest correct recognition rate for falling gangue.
On the simulated site dataset, the convergence accuracy of Model 3 is improved by 6.0% (relative) compared with Model 1 and by 2.9% compared with Model 2, and Model 3 converges to its stable value faster than Model 1 and Model 2. At the same time, for the gangue caving state, the recognition rate of Model 3 is 1.4% higher than that of Model 1 and 5.1% higher than that of Model 2. These results show that Model 3 achieves higher recognition accuracy and requires less time in practical coal–gangue recognition. This is mainly because the three-branch feature smoothing method effectively captures useful state feature information, so that Model 3 retains better recognition performance in a noisy environment.
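As a quick check, the 6.0% and 2.9% figures above are consistent with relative improvements of the convergence accuracy (rather than absolute percentage-point differences), using the testing accuracies reported in Fig. 7:

```python
# Relative improvement of Model 3's convergence accuracy over Models 1 and 2,
# using the testing accuracies reported above (92.98%, 87.72%, 90.35%).
acc = {"model1": 87.72, "model2": 90.35, "model3": 92.98}
for base in ("model1", "model2"):
    rel = (acc["model3"] - acc[base]) / acc[base] * 100
    print(f"vs {base}: {rel:.1f}% relative improvement")
# vs model1: 6.0% relative improvement
# vs model2: 2.9% relative improvement
```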
Comparison with traditional classification algorithm
To demonstrate the advantages of the proposed MFCC–MBCNN, we compare it with several traditional feature extraction and classification algorithms, including the Hilbert–Huang transform (HHT) combined with bimodal deep neural networks (DNN)10, the wavelet packet transform (WPT) combined with a fuzzy neural network (FNN)11, MFCC and the wavelet transform (WT) combined with a K-nearest neighbors (KNN) classifier15, and MFCC combined with a self-attention convolutional neural network (SACNN) and a logistic regression (LG) classifier18.
This section covers two kinds of experiments: target recognition on the noiseless dataset and target recognition on the simulated site dataset. Tables 3 and 4 respectively list the F1-score comparison between the other sound recognition methods and our method in these two experiments; all results are obtained with five-fold cross validation.
Table 3
F1–score (%) comparison with different methods on noiseless dataset
State label | HHT–DNN10 | WPT–FNN11 | (MFCC + WT)–KNN15 | MFCC–SACNN–LG18 | Proposed MFCC–MBCNN
0 | 96.41 ± 3.29 | 82.07 ± 2.37 | 95.72 ± 2.94 | 93.62 ± 3.14 | 94.48 ± 2.48
1 | 95.05 ± 2.13 | 81.94 ± 3.07 | 94.73 ± 2.86 | 93.90 ± 2.93 | 94.61 ± 2.37
2 | 93.88 ± 1.23 | 80.92 ± 2.73 | 94.07 ± 1.56 | 97.92 ± 1.62 | 100.00 ± 0.00
3 | 93.27 ± 1.41 | 79.63 ± 2.38 | 95.84 ± 2.04 | 98.17 ± 1.27 | 100.00 ± 0.00
4 | 94.03 ± 2.62 | 81.53 ± 3.57 | 96.17 ± 2.99 | 100.00 ± 0.00 | 100.00 ± 0.00
5 | 93.47 ± 2.08 | 80.26 ± 1.86 | 95.94 ± 2.13 | 98.11 ± 1.07 | 100.00 ± 0.00
6 | 94.83 ± 1.95 | 81.25 ± 2.14 | 94.19 ± 1.82 | 100.00 ± 0.00 | 100.00 ± 0.00
Average | 94.42 ± 2.10 | 81.09 ± 2.59 | 95.24 ± 2.33 | 97.39 ± 1.43 | 98.44 ± 0.69
Table 4
F1–score (%) comparison with different methods on simulated site dataset
State | HHT–DNN10 | WPT–FNN11 | (MFCC + WT)–KNN15 | MFCC–SACNN–LG18 | Proposed MFCC–MBCNN
Coal | 82.69 ± 3.71 | 75.19 ± 4.38 | 81.26 ± 2.84 | 89.05 ± 2.47 | 91.37 ± 1.04 |
Gangue | 80.83 ± 4.25 | 77.24 ± 4.71 | 81.53 ± 3.17 | 88.11 ± 2.73 | 91.49 ± 1.26 |
Average | 81.76 ± 3.98 | 76.22 ± 4.55 | 81.40 ± 3.01 | 88.58 ± 2.60 | 91.43 ± 1.15 |
Reference 10 transferred knowledge from related data via transfer learning, which alleviates the weakness of deep neural networks under limited labeled samples. In reference 11, fuzzy logic reasoning was applied to the neural network to form a fuzzy neural network, which improves the self-learning ability of the neural network. In reference 15, a genetic algorithm (GA) was used to select sound signal features, and a traditional K-nearest neighbors (KNN) classifier was used to perform sound recognition. Reference 18 introduced a convolutional neural network as a self-attention filtering model consisting of two blocks: the first block is a reduction module, and the second block is a stack of the remaining modules. According to Tables 3 and 4, the traditional target recognition methods10,11,15 have poor F1-score performance on both the noiseless dataset and the simulated site dataset. The main reason is that traditional methods can only extract or select a small number of features at different time scales or frequency bands; in particular, with a single feature extraction method the features of different classes may be similar, which reduces the classification accuracy.
The method proposed in reference 18 remedies some shortcomings of the traditional target recognition methods and obtains better recognition accuracy on the noiseless dataset. The method proposed in this paper achieves an average F1-score of 98.44% on the noiseless dataset, reaching a 100% F1-score for five of the classes (labels 2–6), and on the simulated site dataset it is clearly better than the other methods and shows more stable performance. This is mainly because MBCNN combined with MFCC smoothing can learn useful state classification features at different frequency scales, which retains the information of the original features while smoothing out noise. Thus, the MFCC-smoothing-based MBCNN proposed in this paper provides an effective method for target recognition in noisy environments.