First, key features must be learned and extracted from the various attack types. This paper uses multiple autoencoders for feature extraction to improve both extraction efficiency and classification accuracy. Moreover, multiple autoencoders can mine the latent distribution of the different attack types from the characteristics of each type, yielding more representative features. The structure of our feature extraction process is shown in Fig. 1.

Additionally, a BiGRU module is included in the FM model. BiGRU learns the temporal relationships linking the current state to the previous and next moments, mines the latent representation patterns in network threat traffic, and thereby strengthens the network's learning ability. However, BiGRU becomes slow and inefficient when the sequence data is excessively long. An attention mechanism addresses this problem in two ways. First, it can concentrate on data at different positions in the sequence, reducing the total length of the input data. Second, because the feature information in the threat data at different times contributes differently to the classification and detection of the current attack type, the attention mechanism can assign weights to the features that affect the detection results. This enables the model to learn the latent features more effectively and improves its ability to detect attacks. The FM model consists of the following steps:

First, the extracted feature data \({x_{i,j}}\) is input into the network to obtain the output \({y_{i,j}}\):

$${y_{i,j}}=o\left( {{x_{i,j}}} \right)$$

1

Further, assign weights to features via the attention layer

$${h_{i,j}}=\tanh \left( {{\mathbf{W}}{y_{i,j}}+b} \right)$$

2

$${w_{i,j}}={\text{softmax}}\left( {{h_{i,j}},{\mathbf{w^{\prime}}}} \right)$$

3

where \({h_{i,j}}\) is the state of the hidden layer, \({\mathbf{W}}\) is the weight matrix, \(b\) is the bias term, and \({\mathbf{w^{\prime}}}\) is the initial weight matrix of the attention layer. Then we have

$${w_i}=\sum\limits_{j} {{w_{i,j}}} {h_{i,j}}$$

4
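As an illustration of Eqs. (2) to (4), the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that \({\text{softmax}}({h_{i,j}},{\mathbf{w^{\prime}}})\) denotes a softmax over \(j\) of the dot products \({h_{i,j}} \cdot {\mathbf{w^{\prime}}}\), which is a common reading of this notation but is not stated in the text. The names `attention_pool`, `w_ctx`, and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(ys, W, b, w_ctx):
    """Attention pooling in the spirit of Eqs. (2)-(4).

    ys    : list of T output vectors y_{i,j}, each of length d_in
    W     : weight matrix as a list of d_h rows, each of length d_in
    b     : bias vector of length d_h
    w_ctx : attention weight vector w' of length d_h
            (ASSUMPTION: scores are the dot products h . w')
    Returns (weights, pooled): the weights w_{i,j} and the pooled vector w_i.
    """
    # Eq. (2): h_{i,j} = tanh(W y_{i,j} + b)
    hs = [
        [math.tanh(sum(wk * yk for wk, yk in zip(row, y)) + bias)
         for row, bias in zip(W, b)]
        for y in ys
    ]
    # Eq. (3): w_{i,j} = softmax over j of the score h_{i,j} . w'
    scores = [sum(hk * ck for hk, ck in zip(h, w_ctx)) for h in hs]
    weights = softmax(scores)
    # Eq. (4): w_i = sum_j w_{i,j} h_{i,j}
    pooled = [
        sum(weights[j] * hs[j][k] for j in range(len(hs)))
        for k in range(len(b))
    ]
    return weights, pooled


# Toy usage with three time steps and two features (values are arbitrary).
ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.25, 0.75]]
b = [0.0, 0.1]
w_ctx = [1.0, -1.0]
weights, pooled = attention_pool(ys, W, b, w_ctx)
```

Under the same assumption, the same function could be reused for the second attention stage by passing the corresponding second-stage parameters.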

Input the calculated local weights into BiGRU to get the global distribution weights

$${y_i}=o\left( {{w_i}} \right)$$

5

$${h_i}=\tanh \left( {{\mathbf{W^{\prime}}}{y_i}+b} \right)$$

6

$${w_i}={\text{softmax}}\left( {{h_i},{\mathbf{w^{\prime\prime}}}} \right)$$

7

where \({\mathbf{W^{\prime}}}\) and \({\mathbf{w^{\prime\prime}}}\) are the weight matrix and the initial weight matrix of the attention layer, respectively. We then obtain

$$w=\sum\limits_{i} {{w_i}} {h_i}$$

8

The global feature weights are then fed into the softmax layer to obtain the final prediction results. The overall framework of the FM method is shown in Fig. 2.

Further, this paper quantifies various indicators in NSSA. The first is the severity of the attack. For attack detection, thousands of samples are chosen at random from the data collection and fed into the trained threat detection model. Suppose the number of detected attacks is \({N_i}\) and the actual number of occurrences of each attack type is \({N^{\prime}_i}\); then the error probability is

$${p_{i,j}}=\frac{{{{N^{\prime}}_j}}}{{{{N^{\prime}}_i}}}$$

9

Further, the different types of attacks are classified into three categories according to the attack severity level, and the attack severity operator \(a{o_i}\) is calculated as follows:

when attack level \(1 \leqslant {l_i} \leqslant 0.5n\)

$$a{o_i}=\frac{{3+\sqrt { - 2\ln 2{l_i}+2\ln n} }}{6}$$

10

when attack level \({l_i}=0.5n\)

$$a{o_i}=\frac{1}{2}$$

11

when attack level \(0.5n \leqslant {l_i} \leqslant n\)

$$a{o_i}=\frac{{3 - \sqrt { - 2\ln 2{l_i}+2\ln n} }}{6}$$

12
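The piecewise definition of \(a{o_i}\) can be sketched as a small Python function. This is an illustrative sketch rather than the authors' code. Two assumptions are made: the value at \({l_i}=0.5n\) is taken as 0.5, which is what both branch formulas evaluate to at that point, and for \({l_i}>0.5n\) the sketch uses the magnitude of the radicand, because \(-2\ln 2{l_i}+2\ln n\) is negative there and the square root would otherwise be undefined. The function name `attack_severity_operator` is hypothetical.

```python
import math


def attack_severity_operator(level, n):
    """Attack severity operator ao_i for attack level l_i in [1, n].

    ASSUMPTIONS (not stated in the text):
      * at l_i = 0.5 n the value is 0.5 (both branches agree there);
      * for l_i > 0.5 n the magnitude of the radicand is used, since
        -2 ln(2 l_i) + 2 ln(n) is negative in that range.
    """
    if not 1 <= level <= n:
        raise ValueError("level must satisfy 1 <= level <= n")
    # -2 ln(2 l_i) + 2 ln(n), written as a single logarithm
    radicand = 2.0 * math.log(n / (2.0 * level))
    if 2 * level < n:
        # lower branch, l_i < 0.5 n: radicand is positive
        return (3.0 + math.sqrt(radicand)) / 6.0
    if 2 * level == n:
        # midpoint, l_i = 0.5 n
        return 0.5
    # upper branch, l_i > 0.5 n: magnitude of the radicand (assumption)
    return (3.0 - math.sqrt(abs(radicand))) / 6.0


# Example with n = 10 attack levels.
values = [attack_severity_operator(l, 10) for l in range(1, 11)]
```

With \(n=10\), these formulas give roughly 0.80 at \({l_i}=1\) and exactly 0.5 at \({l_i}=5\).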

The higher the attack severity level, the larger the attack severity operator and the more serious the threat caused by the attack. Further, following references 7 and 11, this paper defines the NSSA levels shown in Table 1.

Table 1

| NSSA Level | Value Range |
| --- | --- |
| Safety | [0, 0.3] |
| Low risk | [0.3, 0.6] |
| Medium risk | [0.6, 0.9] |
| High risk | [0.9, 1.2] |
| Danger | [1.2, +∞) |
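A lookup of the NSSA level for a given situation value can be sketched as follows. This is a minimal illustration, not part of the original method. Because adjacent ranges in Table 1 share their endpoints, the sketch assumes that a boundary value (for example 0.3) belongs to the higher level; the names `NSSA_BOUNDS`, `NSSA_LEVELS`, and `nssa_level` are hypothetical.

```python
from bisect import bisect_right

# Upper boundaries between consecutive levels, taken from Table 1.
NSSA_BOUNDS = [0.3, 0.6, 0.9, 1.2]
NSSA_LEVELS = ["Safety", "Low risk", "Medium risk", "High risk", "Danger"]


def nssa_level(value):
    """Map a situation value to its NSSA level.

    ASSUMPTION: a value lying exactly on a shared boundary is assigned
    to the higher of the two adjacent levels.
    """
    if value < 0:
        raise ValueError("situation value must be non-negative")
    # bisect_right counts how many boundaries are <= value
    return NSSA_LEVELS[bisect_right(NSSA_BOUNDS, value)]


# Example: classify a few situation values.
examples = {v: nssa_level(v) for v in (0.1, 0.3, 0.75, 1.5)}
```

If boundary values should instead fall into the lower level, `bisect_left` can be substituted for `bisect_right`.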