DESNN Algorithm for Communication Network Intrusion Detection

Intrusion detection is a crucial technology in the field of communication network security. In this paper, a dynamic evolutionary sparse neural network, named the DESNN algorithm, is proposed for intrusion detection. Firstly, an ensemble neural network model is constructed, which is processed by a dynamic pruning rule and further divided into advantage subnetworks and disadvantage subnetworks. The dynamic pruning rule effectively reduces the number of subnetwork weight parameters, thereby increasing the speed of subnetwork intrusion detection. Then, considering the subnetwork performance loss caused by the dynamic pruning rule, a novel evolutionary mechanism is proposed to optimize the training process of the disadvantage subnetworks. Under the evolutionary mechanism, the weights of the disadvantage subnetworks approach the weights of the advantage subnetworks, such that the performance of the ensemble neural network is improved. Finally, an optimal subnetwork is selected from the ensemble neural network and used to detect multiple types of intrusion. Experiments show that the proposed DESNN algorithm improves intrusion detection speed without significant performance loss compared with other fully-connected neural network models.


Introduction
An intrusion detection system (IDS) plays a crucial role in defending communication networks [1,2], and several state-of-the-art intrusion detection algorithms have been proposed in recent years [3][4][5][6][7]. However, the rapid development and popularization of networks bring many challenges [8], such as: (1) intrusion detection performance is unsatisfactory due to the diversification of attack types; (2) the detection speed of the IDS is reduced because the volume of data both stored in and passing through networks persistently increases. Fortunately, deep learning (DL) can learn and model the complex nonlinear relationships required to solve high-dimensional classification and prediction problems, so DL-based IDSs have been deployed as a potential solution to effectively detect network intrusions.
It has been shown in recent years that DL-based IDSs can effectively detect intrusions. A novel intrusion detection model joins a nonsymmetric deep autoencoder (NDAE) with the random forest (RF) classification algorithm [3]; this model offers high accuracy, precision and recall together. An intrusion detection model based on a fully-connected network, a variational autoencoder (VAE) and Sequence-to-Sequence (Seq2Seq) learning is proposed in [4]; experimental results show that it achieves better intrusion detection performance than other models. Invoking a convolutional neural network (CNN) based on machine learning (ML), an intrusion detection algorithm is presented in [5] that improves the accuracy and adaptive ability of intrusion detection. Combining Conditional Random Fields (CRF) and spider monkey optimization (SMO), a new model for the intrusion detection problem is proposed in [6]: the CRF first selects the contributing features, the SMO then finalizes the useful features from the reduced feature dataset, and a CNN classifies the data as normal or attack. A unique ensemble framework is proposed in [7], which builds an ensemble by ranking the detection ability of different base classifiers to identify various types of attacks. It is clear that DL-based IDSs perform well in classification and can accurately identify attack types.
Though the aforementioned methods are promising, neural networks contain considerable redundant parameters, which may increase the time needed to detect intrusion types. Neural network pruning can effectively improve the real-time performance of intrusion detection while maintaining detection accuracy. The existence of redundant connections in neural networks has been proved [8]; therefore, with a proper strategy, it is possible to compress neural networks without significantly losing prediction accuracy. A data-free model compression method is introduced in [9]: the squared difference of the neural network output determines whether a neuron connection is deleted, so the impact of data on pruning is avoided. A simple regularization method based on soft weight-sharing is presented in [10], which includes both quantization and pruning in the network retraining procedure. A network compression method named dynamic network surgery (DNS) is proposed in [11], which significantly reduces network complexity through dynamic pruning. However, it is difficult to determine appropriate weight evaluation criteria, so the pruning process may degrade neural network performance.
In order to solve the problem that pruning may damage the detection performance of neural networks, a dynamic evolutionary sparse neural network (DESNN) is proposed. Firstly, a dynamic pruning (DP)-based ensemble neural network model is constructed to detect intrusion. Then a novel evolutionary mechanism is proposed to optimize the training process of the ensemble neural network. Finally, an optimal subnetwork is selected from the ensemble neural network, which can detect intrusion faster and better than the other models.
The remainder of the paper is organized as follows. The problem model is described in Sect. 2. Section 3 introduces the proposed DESNN algorithm. Section 4 presents the experimental results, and the conclusion is given in Sect. 5.

System Model
DL can be utilized in intrusion detection for both dimensionality reduction and classification tasks, since it automatically learns complex features from multi-dimensional and large-scale data [12][13]. Thus, deep learning models can be trained with large amounts of historical data to build an intrusion detection model. The model classifies new traffic into either the normal or the anomaly class. If multi-class classification is used, the model can further classify the infected traffic into different classes and subclasses of attacks. Figure 1 illustrates the overall architecture of a DL-based IDS.
The KDD-Cup 99 data set is used as the input data to verify the performance of the proposed algorithm. This data set contains nine weeks of communication network connection data collected from a simulated United States Air Force network, and it is divided into labeled training data and unlabeled test data. The test data and training data have different probability distributions, and the test data contains some intrusion types that do not appear in the training data, which makes the intrusion detection task more realistic.
The data pre-processing includes the numericalization of text features and the normalization of numerical features: the text features are converted into numerical features, and then all numerical features are scaled into the same range through normalization.
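The two pre-processing steps described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the function name, the dict-based record format, and the choice of integer coding plus min-max scaling into [0, 1] are all assumptions for the sketch.

```python
import numpy as np

def preprocess(records, text_cols, num_cols):
    """Sketch of the described pre-processing: text features are mapped to
    integer codes (numericalization) and all columns are then min-max
    scaled into [0, 1] (normalization). `records` is a list of dicts."""
    # Numericalization: map each distinct text value to an integer code.
    codebooks = {c: {} for c in text_cols}
    for r in records:
        for c in text_cols:
            codebooks[c].setdefault(r[c], len(codebooks[c]))

    rows = []
    for r in records:
        row = [codebooks[c][r[c]] for c in text_cols]
        row += [float(r[c]) for c in num_cols]
        rows.append(row)
    X = np.array(rows, dtype=np.float64)

    # Normalization: min-max scale every column into the same range [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span
```

For example, a `protocol_type` column with values `tcp`/`udp` becomes codes 0/1 before scaling, so every feature ends up in the same numeric range.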
The neural network is trained to find its optimal weights. The objective function can be expressed as

$$\Theta^{*} = \arg\min_{\Theta}\left(-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{J} y_{i,j}\,\log f_{j}(x_{i};\Theta)\right) \tag{1}$$

where $N$ denotes the number of intrusion samples, $J$ represents the number of intrusion types, $y_{i,j}$ is the expected probability that the $i$th sample belongs to intrusion type $j$, $f_{j}(x_{i};\Theta)$ stands for the predicted probability, $\Theta$ indicates the neural network weight set, and $\Theta^{*}$ is the optimal neural network weight set. Through the training process, the neural network learns from the data samples so that its weights $\Theta$ gradually converge to the optimal weights $\Theta^{*}$. After training, the performance of the neural network is tested on the unlabeled test data. If the intrusion detection performance meets the IDS requirements, the network can be used as a detection model; if not, it needs further training.
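The objective above is the average categorical cross-entropy between expected and predicted probabilities. A minimal numpy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def cross_entropy_objective(Y, P):
    """Average categorical cross-entropy: Y holds the expected one-hot
    probabilities y_{i,j}, P the predicted probabilities f_j(x_i; theta);
    both are (N, J) arrays. Training seeks weights that minimize this."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))
```

A perfectly confident correct prediction drives the objective toward 0, while a uniform prediction over J classes yields ln J per sample.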

Algorithm Formula
Redundant connections between neurons may increase intrusion detection time. Combining a dynamic pruning rule and an evolutionary mechanism, the DESNN algorithm is proposed to eliminate redundant connections without causing significant performance loss. The ensemble neural network based on the dynamic pruning rule and the evolutionary mechanism is shown in Fig. 2. The process of eliminating redundant connections in the neural network is expressed as an optimization problem. In this section, the optimization problem, the dynamic pruning rule and the evolutionary mechanism are clarified.

Optimization Problem
The DESNN algorithm reduces redundant neuron connections through unstructured pruning. Figure 3 shows a neural network with unstructured pruning. The corresponding optimization problem can be expressed as

$$\min_{\{W_k\}}\; L\big(\{W_k \odot T_k\}\big) \quad \text{s.t.}\quad T_k^{(a,b)} = h_k\big(W_k^{(a,b)}\big) \tag{2}$$

where $L(\cdot)$ stands for the loss function, $W_k$ denotes the matrix of neural network connection weights in the $k$th layer, $\odot$ represents the Hadamard product operator, $T_k$ indicates the filter matrix, and $h_k(\cdot)$ is a pruning rule. The following describes how problem (2) is optimized by the dynamic pruning rule.

Dynamic Pruning Rule
In this paper, a dynamic pruning rule based on weight contribution is used to find redundant neuron connections. The neural network is pruned by the pruning rule $h_k(\cdot)$, which can be expressed as

$$T_k^{(a,b)} = h_k\big(W_k^{(a,b)}\big) = \begin{cases} 1, & \big|W_k^{(a,b)}\big| \ge p_k \\ 0, & \big|W_k^{(a,b)}\big| < p_k \end{cases} \tag{3}$$

where $W_k^{(a,b)}$ denotes the neuron connection with indicator $(a,b)$ in the $k$th layer, $T_k^{(a,b)}$ stands for the filter matrix element with indicator $(a,b)$ in the $k$th layer, the set $\mathcal{I}_k$ represents all the entry indices in matrix $W_k$, $a$ represents the input neuron indicator, $b$ is the output neuron indicator, and $p_k$ denotes the pruning threshold of the $k$th layer.
In the proposed pruning rule $h_k(\cdot)$, the pruning threshold $p_k$ is determined by the pruning rate $\alpha$ and the total number $n$ of elements in $W_k$. The absolute values of all elements in the weight matrix $W_k$ are sorted in ascending order to form a weight contribution vector $c_k$, and the pruning threshold $p_k$ is set to the $(n \times \alpha)$th value in $c_k$. The weight matrix $W_k$ is then processed by expression (3): the filter matrix element $T_k^{(a,b)}$ is 1 if the corresponding weight has a higher contribution to the neural network (i.e., $|W_k^{(a,b)}| \ge p_k$), and 0 otherwise.
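The threshold-and-mask computation above can be sketched as follows. This is a minimal sketch of the described rule; the function name and the clamping of the threshold index are implementation assumptions.

```python
import numpy as np

def pruning_mask(W, rate):
    """Dynamic pruning rule sketch: the threshold p_k is the
    (n * rate)-th smallest absolute weight; entries whose absolute
    value is at or above the threshold keep mask value 1, the rest 0."""
    c = np.sort(np.abs(W).ravel())           # contribution vector c_k
    idx = int(W.size * rate)                 # position n * rate
    p = c[min(idx, W.size - 1)]              # threshold p_k (clamped)
    return (np.abs(W) >= p).astype(W.dtype)  # filter matrix T_k
```

With a pruning rate of 0.5, half of the smallest-magnitude weights are masked out; the mask is recomputed as training proceeds, so a pruned connection can later return.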
The weight matrix $W_k$ and the filter matrix $T_k$ are updated through the stochastic gradient descent (SGD) method, and the back-propagation process of the proposed neural network can be expressed as

$$W_k^{(a,b)} \leftarrow W_k^{(a,b)} - \eta\,\frac{\partial L\big(\{W_k \odot T_k\}\big)}{\partial\big(W_k^{(a,b)}\,T_k^{(a,b)}\big)}, \quad (a,b) \in \mathcal{I}_k \tag{4}$$

where $\eta$ is a positive learning rate and the set $\mathcal{I}_k$ represents all the entry indices in matrix $W_k$.
Since the filter matrix $T_k$ is introduced into the back-propagation process of the neural network, the weight optimization of the sparse neural network $\{W_k \odot T_k : k \in \mathcal{K}\}$ differs from that of the fully-connected neural network $\{W_k : k \in \mathcal{K}\}$, where the set $\mathcal{K}$ denotes all the neural network layer indices. Expression (4) updates all weight elements in the weight matrix $W_k$: not only the important parameters, but also the unimportant parameters of $W_k$. Processing the neural network with the dynamic pruning rule thus realizes the dynamic connection and inhibition of neurons, which enhances the flexibility of pruning.
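A minimal sketch of one such update step, assuming a single linear layer and a squared-error loss (both assumptions for illustration; the paper's networks and loss differ). The forward pass uses only the masked weights, yet every element of W is updated, which is what allows a pruned connection to be revived when the mask is recomputed:

```python
import numpy as np

def masked_sgd_step(W, T, x, y, lr=0.1):
    """One SGD step of the dynamic pruning scheme: the forward pass uses
    the sparse weights W ⊙ T, but the update touches every element of W
    (pruned entries included), per expression (4)."""
    y_hat = (W * T) @ x               # forward with W_k ⊙ T_k
    err = y_hat - y                   # gradient of 0.5 * ||y_hat - y||^2
    grad_sparse = np.outer(err, x)    # dL / d(W ⊙ T)
    return W - lr * grad_sparse       # all entries of W_k move
```

Note the returned matrix is not multiplied by `T`: masked weights keep learning in the background even while they are excluded from the forward computation.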

Evolutionary Mechanism
Although the dynamic pruning rule can effectively reduce redundant connections, the pruning rate affects the pruning effect: an excessive pruning rate may cause important connections of the neural network to be inhibited, which degrades the intrusion detection performance. To address this problem, an ensemble neural network model is constructed, and the subnetworks in the ensemble are divided into advantage and disadvantage subnetworks according to their intrusion detection accuracy. Specifically, the subnetwork with the best intrusion detection accuracy is called the advantage subnetwork, and the others are defined as disadvantage subnetworks. To improve the detection performance of the disadvantage subnetworks, an evolutionary mechanism is proposed to optimize the back propagation of the ensemble neural network, which can be described as

$$W_k^{d} \leftarrow \frac{1}{2}\big(W_k^{op} + W_k^{d}\big) \tag{5}$$

where $W_k^{op}$ denotes the weight matrix of the advantage subnetwork in the $k$th layer and $W_k^{d}$ represents a weight matrix of a disadvantage subnetwork in the $k$th layer. Expression (5) moves the disadvantage subnetwork weight matrices $W_k^{d}$ toward the advantage subnetwork weight matrices $W_k^{op}$ in each epoch of the training process. Since $W_k^{d}$ is thereby brought closer to the local optimum, the performance of the disadvantage subnetworks is improved.
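The evolutionary step is a per-layer averaging of weights. A minimal sketch (function name and list-of-arrays representation are assumptions):

```python
import numpy as np

def evolve(advantage_weights, disadvantage_weights):
    """Evolutionary mechanism sketch: each disadvantage subnetwork's
    layer weights move halfway toward the advantage subnetwork's,
    W_d <- (W_op + W_d) / 2, applied once per training epoch."""
    return [(w_op + w_d) / 2.0
            for w_op, w_d in zip(advantage_weights, disadvantage_weights)]
```

Repeated application makes the disadvantage weights converge geometrically toward the advantage weights while still letting ordinary SGD updates act between evolutionary steps.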

Experiments
In this section, several experimental results are provided to evaluate the performance of the proposed DESNN algorithm. In the following simulation experiments, the software setup is Python 3.7.6 and Keras 2.3.1; the hardware setup is an Intel® Xeon® E5-2560 CPU, an NVIDIA Tesla K80 GPU, and 64 GB of RAM.

Dataset
The intrusion detection dataset KDD-Cup 99 is widely used to verify the performance of intrusion detection algorithms. Each network connection in the KDD-Cup 99 dataset is marked as normal or attack. Specifically, attacks are divided into 4 categories comprising 39 attack types: the four categories are DoS, Probing, R2L, and U2R. Among the 39 attack types, 22 appear in the training dataset and 17 unknown attack types appear in the test dataset. In this paper, the performance of all intrusion detection algorithms is obtained on the same KDD-Cup 99 dataset. The details of the dataset are shown in Table 1.

Model Parameters
To ensure the reproducibility of the experiments, the parameters of the DNN model used in this paper are shown in Table 2. The model is a shallow model, which not only has good intrusion detection classification performance but also fast intrusion detection speed. To ensure model performance, the parameters were selected based on performance across multiple experiments. Although the model processed by the dynamic pruning rule effectively reduces neuron connections, its intrusion detection performance may decrease; the DESNN algorithm is proposed to solve this problem. Its implementation requires some hyperparameters, which have a certain influence on the neural network model; the hyperparameters of the DESNN algorithm are shown in Table 3.
The iteration count is 50 × 10, where 50 represents the number of training epochs of the ensemble neural network and 10 denotes the number of retraining epochs of each subnetwork. The retraining epoch count can be understood as the pruning interval, which affects the degree of subnetwork convergence: with retraining, the weights of a subnetwork can converge to a better local optimum.
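One possible reading of this 50 × 10 schedule is sketched below: each of the 50 ensemble training epochs begins by recomputing the pruning masks, followed by 10 subnetwork retraining epochs. This interpretation, along with the function and event names, is an assumption made for illustration.

```python
def train_schedule(ensemble_epochs=50, retrain_epochs=10):
    """Sketch of the 50 x 10 iteration scheme from Table 3: masks are
    recomputed (dynamic pruning) once per ensemble epoch, with
    `retrain_epochs` retraining epochs in between as the pruning interval."""
    events = []
    for cycle in range(ensemble_epochs):
        events.append(("prune", cycle))        # recompute filter matrices T_k
        for _ in range(retrain_epochs):
            events.append(("retrain", cycle))  # let subnetwork weights re-converge
    return events
```

A longer pruning interval gives each subnetwork more time to recover before the next mask update, at the cost of a longer total schedule.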

The Evaluation Indicators
This paper evaluates network performance from two perspectives: classification performance and running time. Invoking [14][15][16][17][18], the evaluation indicators of classification performance are accuracy, precision, recall and F-score. Accuracy can be expressed as

$$A_{acc} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \tag{6}$$

where $A_{acc}$ represents the classification accuracy of the neural network, $T_p$ and $T_n$ are the numbers of intrusion and normal samples correctly classified, respectively, and $F_p$ and $F_n$ are the numbers of normal and intrusion samples mistakenly classified, respectively.
Precision is defined as

$$P_{precision} = \frac{T_p}{T_p + F_p} \tag{7}$$

where $P_{precision}$ denotes the ratio of $T_p$ to the total number of samples predicted as intrusions.
Recall is defined as

$$R_{recall} = \frac{T_p}{T_p + F_n} \tag{8}$$

where $R_{recall}$ represents the ratio of $T_p$ to the total number of actual intrusion samples.
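These indicators can be computed directly from the four confusion counts. A minimal sketch (the F-score included here is the standard harmonic mean of precision and recall):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from the confusion counts
    T_p, T_n, F_p, F_n. Assumes at least one predicted and one actual
    positive, so the denominators are nonzero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score
```

For example, 8 correctly detected intrusions, 90 correctly classified normal samples, 2 false alarms and 0 misses give accuracy 0.98, precision 0.8 and recall 1.0.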
The F-score is defined as

$$F_{score} = \frac{2\,P_{precision}\,R_{recall}}{P_{precision} + R_{recall}} \tag{9}$$

where $F_{score}$ is the harmonic mean of $P_{precision}$ and $R_{recall}$. To evaluate the running time of the intrusion detection model, the running time of the model code is used as an evaluation indicator; the running times in the experimental results are obtained using the Python time module.

Experiment 1: Accuracy of the DNN Under Different Pruning Rates

Figure 4 shows the accuracy as a function of iterations for different pruning rates. The pruning rate used by DNN1 is 0.5, and the pruning rate used by DNN2 is 0.3. It can be observed that the accuracy of the DNN may be reduced by the dynamic pruning rule, a phenomenon referred to as degradation. Due to the selection of an unsuitable pruning threshold $p_k$, some important connections in the neural network are deleted, which leads to degradation.

Experiment 2: Accuracy Comparison of the DNN, the DESNN and the DPNN
The accuracy of the DNN may be degraded by the dynamic pruning rule, and the DESNN algorithm is proposed to solve this problem. Figure 5 shows accuracy versus pruning rate for the different neural network models; the x-axis represents the pruning rate, in the range [0.1, 0.9] with an interval of 0.1, and the y-axis denotes the accuracy of the neural network. DESNN stands for the DNN processed by the DESNN algorithm, and DPNN represents the DNN processed by the dynamic pruning rule alone.
In Figure 5, the accuracy of the DPNN and the DNN is generated by averaging 5 independent runs. The accuracy of the DESNN is that of the optimal network in the ensemble neural network, also averaged over 5 independent runs. The accuracy of the DNN is 0.97633. When the pruning rate is 0.2 or 0.3, the corresponding DESNN accuracy is 0.97426 or 0.97535, close to the DNN and higher than the DPNN, whose accuracy is 0.97126 or 0.96820, respectively. When the pruning rate is greater than 0.3, the accuracy gradually decreases as the pruning rate increases. Experiment 2 shows that the DESNN algorithm can reduce the damage caused by the dynamic pruning rule.

Experiment 3: Weight Distribution Diagram of the DESNN and Fully-Connected DNN
In Experiment 3, the weight elements of the DESNN and the fully-connected DNN are extracted and their frequency distribution histograms (FDH) are drawn. Figure 6 shows the FDH of the fully-connected DNN weight elements and Fig. 7 shows the FDH of the DESNN weight elements.
It can be seen from Figs. 6 and 7 that the distribution of the weight elements is approximately normal. The reason is that the weights of the neural network are initialized according to a Gaussian distribution; the training process adjusts the weights to make the neural network converge gradually, but it does not change the type of weight distribution. Figure 8 compares the weight distributions of the two neural networks, with the x-axis denoting the weight value and the y-axis representing the unit frequency.

Experiment 4: Performance Comparison Between the Proposed Algorithm and Other Intrusion Detection Models
This experiment tests the classification performance and running time of various deep learning models. The model parameters of the DESNN are shown in Table 2, the hyperparameters of the DESNN algorithm are shown in Table 3, and the pruning rate is 0.3. It is worth noting that, to allow the data to be entered into the 2D-CNN model, data slicing is used to convert the 1D data into 2D data. Table 4 shows that the accuracy of all models exceeds 90% and that the model with the highest accuracy is the LSTM. The accuracy of the DESNN is higher than that of the DNN, indicating that the DESNN algorithm can improve the accuracy of the DNN; this is because the DNN may overfit, and pruning is one means of mitigating overfitting. From the perspective of running time, the DESNN is faster than all the other models, since the number of model parameters is directly related to the running time. In this experiment, the DNN has only 10 thousand weight parameters, while the LSTM has 69 thousand and the 2D-CNN has 30 thousand, so the running time of the DNN is lower than that of the LSTM and the 2D-CNN. With a pruning rate of 0.3, the DESNN has 7 thousand weight parameters, so its running time is lower than that of the DNN.

Conclusion
In this paper, the DESNN algorithm is proposed by combining an evolutionary mechanism with a dynamic pruning rule to solve the problem of network performance degradation caused by an inappropriate pruning threshold. The proposed DESNN algorithm constructs a DP-based ensemble neural network that is divided into advantage subnetworks and disadvantage subnetworks according to intrusion detection performance. Furthermore, a new evolutionary mechanism is proposed and used to optimize the training process of the disadvantage subnetworks, such that the performance of the ensemble neural network is improved. Compared with the fully-connected neural network, the proposed algorithm effectively improves intrusion detection performance, and it also improves intrusion detection speed compared with all the other neural network models considered.

Future Scope
Our future work will focus on the following aspects: 1. To solve the problem of network performance degradation caused by an inappropriate pruning threshold, the proposed DESNN algorithm constructs a DP-based ensemble neural network that is divided into advantage subnetworks and disadvantage subnetworks according to intrusion detection performance. However, this process involves training multiple neural network models and carries a certain computational complexity; how to select an appropriate scale for the ensemble neural network is therefore future research work. 2. In the proposed evolutionary mechanism, the evolutionary rule $W_k^{d} = (W_k^{op} + W_k^{d})/2$ is used to optimize the back propagation of the ensemble neural network. In this paper, the evolution parameter is 1/2; in fact, the intrusion detection performance of the proposed algorithm is affected by this parameter. Therefore, how to select the optimal evolutionary parameter is another direction for future research.