CNN model principle
A typical CNN model comprises convolutional layers, pooling layers, fully connected layers, and a softmax classifier. A key feature of a CNN is adaptive feature extraction; the parameter sharing and sparse inter-layer connectivity within its hidden layers significantly reduce the number of model parameters. Three classical CNNs are adopted in this study, and their principles are as follows.
AlexNet uses an eight-layer neural network with five convolutional layers and three fully connected layers, with max-pooling layers applied after selected convolutional layers. The entire deep learning network contains 630 million connections, 60 million parameters, and 650,000 neuron nodes (Zhao et al. 2021). The AlexNet structural model is shown in Figure 1.
The main differences between the AlexNet architecture and traditional CNNs are its greater network depth, which increases the number of tunable parameters, and its regularization techniques, such as dropout (random deactivation) and data augmentation. Dropout is applied after the first two fully connected layers in the AlexNet architecture, reducing overfitting and improving generalization to unseen examples. Another remarkable feature of AlexNet is the use of the ReLU nonlinear activation after each convolutional and fully connected layer, which significantly improves training efficiency compared with the traditionally used hyperbolic tangent function.
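The training-efficiency argument rests on the activation gradients: a small NumPy check (illustrative inputs, not from the paper) shows that the tanh gradient vanishes for large pre-activations while the ReLU gradient stays at 1 wherever the unit is active:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, which saturates toward 0 for large |x|
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, 0.5, 5.0])
print(relu_grad(x))   # unit gradient for all active (positive) inputs
print(tanh_grad(x))   # near-zero gradient at |x| = 5
```

Because the ReLU gradient does not decay with the magnitude of the pre-activation, gradients propagate through deep stacks of layers with far less attenuation than with tanh.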
VGGNet explores the relationship between the depth of a CNN and its performance: by repeatedly stacking 3 × 3 convolutional kernels and 2 × 2 max-pooling layers, VGGNet successfully constructs CNNs 16 to 19 layers deep (Blok et al. 2021). The network structure is shown in Figure 2.
VGGNet has five convolutional segments with two to three convolutional layers in each segment, and a max-pooling layer is connected at the end of each segment to reduce the image size. The convolutional layers within a segment share the same number of kernels, with more kernels in the later segments: 64-128-256-512-512. Stacking several identical 3 × 3 convolutional layers is a deliberate design choice: two stacked 3 × 3 layers cover the same receptive field as a single 5 × 5 layer while using fewer parameters and adding an extra nonlinearity.
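The parameter saving from stacking small kernels can be checked with a quick count (the channel width 64 is taken from the first VGG segment; bias terms are omitted for clarity):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Trainable parameters in one k x k convolutional layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

c = 64
# Two stacked 3x3 layers span the same 5x5 receptive field as one 5x5 layer
stacked = 2 * conv_params(3, c, c, bias=False)   # 2 * 9 * 64 * 64
single = conv_params(5, c, c, bias=False)        # 25 * 64 * 64
print(stacked, single)  # 73728 102400
```

The stacked design uses about 28% fewer parameters for the same receptive field, which is why VGGNet can afford its depth.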
The ResNet50 network structure comprises several residual modules. Assuming that \(x\) is the input data and \(F(x)\) denotes the residual mapping, the characteristic output \(H(x)\) of the network residual module is
$$H(x)=F(x)+x$$
(1)
When \(F(x)=0\), the convolutional layers perform an identity mapping; when \(F(x) \ne 0\), they learn new feature information. The shortcut connection ensures gradient transfer during backpropagation, which effectively alleviates the problems of gradient vanishing and network degradation during training (Nijat et al. 2019). The ResNet50 network structure is illustrated in Figure 3.
The ResNet50 network comprises 49 convolutional layers and one fully connected layer, and its operation proceeds in six stages. The first stage contains convolution, batch normalization, activation, and max-pooling operations; in the second to fifth stages, the CONV (convolution) BLOCK denotes the convolutional residual block and the ID (identity) BLOCK denotes the identity residual block; the sixth stage contains global average pooling, a fully connected layer, and a softmax classifier.
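The identity-shortcut computation \(H(x)=F(x)+x\) can be sketched in a few lines of NumPy (a generic residual mapping \(F\), not the actual ResNet50 layers):

```python
import numpy as np

def residual_block(x, F):
    """Identity-shortcut residual module: H(x) = F(x) + x."""
    return F(x) + x

x = np.array([1.0, -2.0, 3.0])

# When the residual mapping F(x) = 0, the block reduces to the identity
identity_out = residual_block(x, lambda v: np.zeros_like(v))
print(np.allclose(identity_out, x))  # True

# With a nonzero F, the block adds the learned residual to the input
out = residual_block(x, np.tanh)
```

Because the derivative of \(H\) always contains the identity term, the gradient flowing back through many stacked blocks cannot be fully attenuated by the \(F\) branches.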
AFSA algorithm principle
The artificial fish swarm algorithm (AFSA) was first proposed by Li Xiaolei et al. in a study of optimization modes of animal autonomous bodies (Li et al. 2004). In AFSA, the fish swarm is described as \(Z=\{ {X_1},{X_2},{X_3}, \cdots ,{X_i}, \cdots ,{X_M}\}\), where \(M\) is the total number of artificial fish (AF) and \({X_i}=({x_1},{x_2}, \cdots ,{x_n})\) represents the state of an individual AF, with \({x_i}\) the variables to be optimized. The food concentration at the current position of an artificial fish is \(Y=f(X)\), where \(f\) is the objective function. The distance between two artificial fish is \(d=\left| {{X_i} - {X_j}} \right|\), \(Visual\) is the perception distance of the artificial fish, \(Step\) is its maximum stride length, and \(\delta\) is the crowding factor, \(0 < \delta < 1\).
(1) Foraging behavior. The current state of the artificial fish is \({X_i}\). Within its perception range, a state \({X_j}\) is selected randomly: \({X_j}={X_i}+rand( \cdot ) \times Visual\), where \(rand( \cdot )\) is a random number between 0 and 1. The food concentrations of the two states are compared. If \(Y_i < Y_j\), the fish moves one step in that direction; otherwise, another state \({X_j}\) is selected for comparison. If the advance condition is still not met after the maximum number of attempts, the fish moves one step at random. The movement formula is
$$X_i^{t+1}=X_i^{t}+rand( \cdot ) \times Step \times \frac{{X_j} - X_i^{t}}{\left\| {X_j} - X_i^{t} \right\|}$$
(2)
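The foraging step can be sketched in Python; function and parameter names such as `forage` and `try_number` are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def forage(x_i, f, visual, step, try_number=5):
    """AFSA foraging: probe random states within Visual and advance toward a
    better one; after try_number failures, take one random step instead.
    The food concentration f is maximised."""
    for _ in range(try_number):
        x_j = x_i + rng.uniform(-1, 1, size=x_i.shape) * visual
        if f(x_j) > f(x_i):
            d = x_j - x_i
            # Move at most Step along the direction toward the better state
            return x_i + rng.random() * step * d / np.linalg.norm(d)
    return x_i + rng.uniform(-1, 1, size=x_i.shape) * step

# Food concentration peaks at the origin (illustrative objective)
f = lambda x: -np.sum(x ** 2)
x = forage(np.array([2.0, 2.0]), f, visual=1.0, step=0.5)
```

The swarming and following behaviors reuse the same movement rule, substituting the swarm center \(X_c\) or the best neighbor \(X_j\) as the target.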
(2) Swarming behavior. Let \({n_0}\) be the number of artificial fish within the current field of vision, \({X_c}\) the position of the swarm center, and \({Y_c}\) the food concentration at the center. If \(\frac{{{Y_c}}}{{{n_0}}}>\delta {Y_i}\), the fish moves one step toward the center; otherwise, it performs foraging behavior:
$$X_i^{t+1}=X_i^{t}+rand( \cdot ) \times Step \times \frac{{X_c} - X_i^{t}}{\left\| {X_c} - X_i^{t} \right\|}$$
(3)
(3) Following behavior. Let \({X_j}\) be the artificial fish with the largest \({Y_j}\) within the current field of vision. If \(\frac{{{Y_j}}}{{{n_0}}}>\delta {Y_i}\), the fish moves one step toward \({X_j}\); otherwise, it performs foraging behavior:
$$X_i^{t+1}=X_i^{t}+rand( \cdot ) \times Step \times \frac{{X_j} - X_i^{t}}{\left\| {X_j} - X_i^{t} \right\|}$$
(4)
First, the artificial fish are randomly generated within the parameter interval, the food concentration function (objective function) is calculated, and the optimal value is recorded. Then, the state of each artificial fish after the three behaviors above is compared with the recorded optimum; if it is better, the optimum is replaced. After \(gen\) iterations (the total number of iterations), the recorded state is taken as the optimal state.
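This loop can be sketched in a minimal, foraging-only form (illustrative parameter values; the full algorithm also interleaves the swarming and following behaviors):

```python
import numpy as np

rng = np.random.default_rng(1)

def afsa_minimal(f, n_fish=20, dim=2, gen=100, visual=1.0, step=0.3, lb=-5, ub=5):
    """Simplified AFSA driven by foraging only: each iteration every fish
    probes one candidate within Visual, advances if it improves the food
    concentration f, and the best state ever seen is recorded."""
    fish = rng.uniform(lb, ub, size=(n_fish, dim))
    best = max(fish, key=f).copy()
    for _ in range(gen):
        for i in range(n_fish):
            cand = fish[i] + rng.uniform(-1, 1, dim) * visual
            if f(cand) > f(fish[i]):
                d = cand - fish[i]
                fish[i] = fish[i] + rng.random() * step * d / np.linalg.norm(d)
            else:
                fish[i] = fish[i] + rng.uniform(-1, 1, dim) * step
            if f(fish[i]) > f(best):
                best = fish[i].copy()
    return best

# Maximise the food concentration Y = f(X) = -||X||^2 (optimum at the origin)
best = afsa_minimal(lambda x: -np.sum(x ** 2))
print(best)
```

Because the recorded optimum is only ever replaced by a better state, the returned value improves monotonically over the \(gen\) iterations.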
The performance of the SVM algorithm depends on the penalty factor c and the kernel function parameter g (Qin et al. 2021). In this paper, the artificial fish swarm algorithm, with its strong optimization ability and good global convergence, is used to find the optimal penalty factor c and kernel function parameter g, and the ResNet50_AFSA_SVM mining subsidence basin detection model is constructed (Mrozek and Perlicki 2019; Khan et al. 2021).
Method construction
To effectively monitor illegal mining over large areas and prevent geological disasters, the CNN model, which has achieved excellent results in image detection, is applied to the detection of mining subsidence basins in InSAR interferograms. By introducing the CNN model and replacing its original softmax classifier with an SVM, which has strong classification ability, we constructed a CNN_SVM automatic detection method for mining subsidence basins. The application process of this method is as follows:
(1) Constructing sample datasets: Interferograms were obtained by processing Sentinel-1A radar data with differential radar interferometry (D-InSAR); mining subsidence basins were manually cropped as the positive sample dataset, and other targets were selected as the negative sample dataset.
(2) CNN extracts feature vectors: The CNN model is used to extract the features of the mining subsidence basin and other targets, and the extracted feature vectors are input into the SVM classifier.
(3) SVM classifier: After the feature vectors are input into the SVM classifier, the artificial fish swarm algorithm searches for the optimal penalty factor c and kernel function parameter g, and the SVM classifier is trained and tested to assess the model accuracy.
(4) Detection of mining subsidence basins: After the model is trained and tested, it searches for mining subsidence basins in the large-width InSAR interferogram, using non-maximum suppression to remove duplicate detection boxes, and finally outputs the detection results. The flow of the method is shown in Figure 4.
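The non-maximum suppression step in (4) can be sketched as follows (a generic greedy NMS on axis-aligned boxes; the box format and IoU threshold are illustrative, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box overlapping a kept box above the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - the overlapping duplicate of box 0 is suppressed
```

Each subsidence basin is thus reported once, by its highest-confidence detection box.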
Evaluation criteria
In this study, the precision P, recall R, and F1 score are chosen to evaluate the detection accuracy of the method; they are formulated as follows:
$$\left\{ \begin{gathered} P=\frac{{TP}}{{TP+FP}} \hfill \\ R=\frac{{TP}}{{TP+FN}} \hfill \\ F1=\frac{{2PR}}{{P+R}} \hfill \\ \end{gathered} \right.$$
(5)
Among them, the meaning of each indicator is shown in Table 1.
Table 1

|                 | Positive forecast | Negative forecast |
| --------------- | ----------------- | ----------------- |
| Actual positive | TP                | FN                |
| Actual negative | FP                | TN                |
The precision represents the proportion of samples classified as positive that are actually positive; the recall represents the proportion of actual positive samples that are detected; and the F1 score reflects the combined identification ability for positive and negative samples: the higher the F1 value, the more robust the method (El-Saadawy et al. 2021).
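Equation (5) can be computed directly from the confusion-matrix counts of Table 1; the counts below are hypothetical, chosen only to illustrate the calculation:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts (Table 1)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# e.g. 8 subsidence basins detected correctly, 2 false alarms, 2 missed
p, r, f1 = prf1(tp=8, fp=2, fn=2)
print(p, r, f1)  # precision, recall and F1 all equal 0.8 here
```

Note that TN does not appear in any of the three measures: they are deliberately insensitive to the large number of correctly rejected background regions in a wide-area interferogram.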