Python was used to implement the proposed method for detecting UDP flood anomalies in the N-BaIoT dataset. The experiments were performed on a 64-bit Windows operating system with 8 GB of RAM and an Intel i7 CPU 2730 processor.
Evaluation metrics were calculated to demonstrate the performance of the approach proposed in this study. The proposed method combines MRMR feature selection with a fine-tuned SSAE, trained on N-BaIoT data to detect UDP flood activity in network traffic. True positive (TP), true negative (TN), false negative (FN), and false positive (FP) counts are used as the verification parameters [31]. The accuracy, precision, sensitivity, specificity, F1 score, and Cohen's kappa metrics are calculated from these four counts. Accuracy is the proportion of feature vectors in the test set that are classified correctly, and it is calculated by Eq. 12.
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \tag{12}$$
Precision shows what fraction of the samples that the proposed model labels as UDP flood are actually UDP flood traffic (true positives). It is calculated by Eq. 13.
$$\text{Precision}=\frac{TP}{TP+FP} \tag{13}$$
Sensitivity indicates the rate at which UDP flood attacks are detected. It is calculated by Eq. 14.
$$\text{Sensitivity}=\frac{TP}{TP+FN} \tag{14}$$
Specificity is the proportion of benign (non-attack) samples that are correctly identified as benign. It is calculated by Eq. 15.
$$\text{Specificity}=\frac{TN}{TN+FP} \tag{15}$$
The F1 score is used to assess classification success on imbalanced datasets. It is the harmonic mean of precision (Prc) and sensitivity (Sn); the closer its value is to one, the more successful the model. It is calculated by Eq. 16.
$$F_{1}\ \text{score}=\frac{2 \cdot Prc \cdot Sn}{Prc+Sn} \tag{16}$$
Cohen's kappa (κ) coefficient is used in imbalanced classification problems to express classification performance beyond the agreement expected by chance. On an imbalanced dataset, a classifier can reach high accuracy simply by favoring the majority class, which undermines the robustness and stability of accuracy as a measure; κ corrects for this and therefore measures classification success on imbalanced data. It is calculated by Eq. 17.
$$\kappa=\frac{Acc-Exp}{1-Exp} \tag{17}$$
Eq. 17 relates the accuracy (Acc) to the expected accuracy under chance agreement, \(Exp=\frac{A+B}{TP+TN+FP+FN}\), where \(A=\frac{\left(TP+FN\right)\left(TP+FP\right)}{TP+TN+FP+FN}\) and \(B=\frac{\left(FP+TN\right)\left(FN+TN\right)}{TP+TN+FP+FN}\). The maximum value of κ is 1. Values approaching 1 indicate successful classification, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance [32].
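The definitions in Eqs. 12 to 17 can be collected into a single helper. The following Python sketch is illustrative only: `ConfusionCounts`, `evaluation_metrics`, and the zero-denominator guard are hypothetical names and choices, not part of the original implementation. The usage line applies it to the MRMR-FTSSAE (15 features) counts from Table 2, treating UDP flood as the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (sketch-level guard)."""
    return num / den if den else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    n = c.tp + c.tn + c.fp + c.fn
    accuracy = _ratio(c.tp + c.tn, n)                                  # Eq. 12
    precision = _ratio(c.tp, c.tp + c.fp)                              # Eq. 13
    sensitivity = _ratio(c.tp, c.tp + c.fn)                            # Eq. 14
    specificity = _ratio(c.tn, c.tn + c.fp)                            # Eq. 15
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)  # Eq. 16
    # Expected (chance) accuracy Exp = (A + B) / N used by Cohen's kappa
    a = _ratio((c.tp + c.fn) * (c.tp + c.fp), n)
    b = _ratio((c.fp + c.tn) * (c.fn + c.tn), n)
    expected = _ratio(a + b, n)
    kappa = _ratio(accuracy - expected, 1 - expected)                  # Eq. 17
    return {
        "accuracy": accuracy, "precision": precision, "sensitivity": sensitivity,
        "specificity": specificity, "f1": f1, "kappa": kappa,
    }


# MRMR-FTSSAE with 15 features (Table 2), UDP flood taken as the positive class
m = evaluation_metrics(ConfusionCounts(tp=39682, tn=11115, fp=0, fn=976))
print(f"accuracy = {100 * m['accuracy']:.2f}%")  # prints accuracy = 98.11%
```

Which class is treated as positive changes precision, sensitivity, and specificity. Accuracy and κ do not change if the two classes are swapped, because the swap only exchanges A and B in Exp.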
In the proposed method, an experiment was conducted on samples from the N-BaIoT dataset to detect UDP flood attacks. Of the 198465 samples in the dataset, 70% were allocated for training and 30% for verification. During training, the two AEs and the softmax layer were first trained separately and then combined into a stacked AE, which achieved successful classification on the test data. The stacked AE was then fine-tuned by retraining it as a single network, which further improved performance. The confusion matrices obtained from the binary classification results are shown in Table 2.
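The layer-wise training and fine-tuning procedure described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the layer sizes (8 and 4 hidden units), learning rates, epoch counts, sigmoid encoders with linear decoders, plain full-batch gradient descent, and the synthetic two-class data are all assumptions, and no sparsity penalty is included. Each AE is pretrained on the codes of the previous one, a softmax layer is trained on the last codes, and the stack is then fine-tuned end to end with backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_autoencoder(X, n_hidden, epochs=500, lr=0.1):
    """Pretrain one AE (sigmoid encoder, linear decoder, squared-error loss).
    Only the encoder is returned; it becomes one layer of the stack."""
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)       # encode
        R = H @ W_dec + b_dec                # reconstruct
        dR = (R - X) / n                     # gradient of the mean (half) squared error
        dH = (dR @ W_dec.T) * H * (1.0 - H)  # backpropagate through the sigmoid
        W_dec -= lr * H.T @ dR
        b_dec -= lr * dR.sum(axis=0)
        W_enc -= lr * X.T @ dH
        b_enc -= lr * dH.sum(axis=0)
    return W_enc, b_enc


def train_softmax(H, Y, epochs=500, lr=0.5):
    """Train the softmax output layer on fixed codes with cross-entropy loss."""
    W = np.zeros((H.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = (softmax(H @ W + b) - Y) / len(H)  # d(cross-entropy)/d(logits)
        W -= lr * H.T @ G
        b -= lr * G.sum(axis=0)
    return W, b


def forward(X, layers):
    (W1, b1), (W2, b2), (W3, b3) = layers
    H1 = sigmoid(X @ W1 + b1)
    H2 = sigmoid(H1 @ W2 + b2)
    return H1, H2, softmax(H2 @ W3 + b3)


def fine_tune(X, Y, layers, epochs=1000, lr=0.5):
    """Retrain the whole stack as a single network with backpropagation."""
    layers = [(W.copy(), b.copy()) for W, b in layers]  # leave the pretrained stack untouched
    (W1, b1), (W2, b2), (W3, b3) = layers
    for _ in range(epochs):
        H1, H2, P = forward(X, layers)
        G3 = (P - Y) / len(X)
        G2 = (G3 @ W3.T) * H2 * (1.0 - H2)
        G1 = (G2 @ W2.T) * H1 * (1.0 - H1)
        W3 -= lr * H2.T @ G3; b3 -= lr * G3.sum(axis=0)  # in-place updates of the copies
        W2 -= lr * H1.T @ G2; b2 -= lr * G2.sum(axis=0)
        W1 -= lr * X.T @ G1; b1 -= lr * G1.sum(axis=0)
    return layers


def build_stacked_ae(X, y, h1=8, h2=4, n_classes=2):
    Y = np.eye(n_classes)[y]               # one-hot labels
    W1, b1 = train_autoencoder(X, h1)      # AE 1 on the input features
    H1 = sigmoid(X @ W1 + b1)
    W2, b2 = train_autoencoder(H1, h2)     # AE 2 on the codes of AE 1
    H2 = sigmoid(H1 @ W2 + b2)
    W3, b3 = train_softmax(H2, Y)          # softmax on the codes of AE 2
    stacked = [(W1, b1), (W2, b2), (W3, b3)]
    return stacked, fine_tune(X, Y, stacked)


def predict(X, layers):
    return forward(X, layers)[2].argmax(axis=1)


# Synthetic two-class data (0 = benign, 1 = flood); a stand-in, not N-BaIoT records.
X = np.vstack([rng.normal(-1.0, 0.5, (100, 10)), rng.normal(1.0, 0.5, (100, 10))])
y = np.repeat([0, 1], 100)
idx = rng.permutation(len(X))
train, test = idx[:140], idx[140:]         # 70% / 30% hold-out split
stacked, tuned = build_stacked_ae(X[train], y[train])
acc_stacked = (predict(X[test], stacked) == y[test]).mean()
acc_tuned = (predict(X[test], tuned) == y[test]).mean()
print(f"stacked AE: {acc_stacked:.3f}, fine-tuned: {acc_tuned:.3f}")
```

Keeping the pretrained stack separate from the fine-tuned copy makes it possible to compare the two stages, mirroring the SSAE versus fine-tuned SSAE comparison in Tables 2 and 3.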
Table 2
Confusion matrices of the experiments
Model | Actual Benign, Predicted Benign | Actual Benign, Predicted UDP Flood | Actual UDP Flood, Predicted Benign | Actual UDP Flood, Predicted UDP Flood |
MRMR-FTSSAE with normalization (Optimum Accuracy) | 11115 | 0 | 3 | 40655 |
MRMR-FTSSAE with 15 features (Optimum Accuracy) | 11115 | 0 | 976 | 39682 |
MRMR-SSAE with 15 features | 11113 | 2 | 1120 | 39538 |
MRMR-AE with 15 features | 11100 | 15 | 1575 | 39083 |
FTSSAE with 15 features (Optimum Accuracy) | 11110 | 5 | 2259 | 38399 |
SVM | 11091 | 24 | 12202 | 28456 |
Active Model with NSL-KDD Test set | 10781 | 273 | 12 | 1971 |
The test set, reserved for verifying the proposed method with the hold-out validation method, contains 51773 feature vectors: 11115 benign and 40658 UDP flood samples. The proposed method (MRMR-FTSSAE with 15 features) detected 39682 of the 40658 UDP flood samples, that is, 97.6% of the attack traffic; the remaining 976 UDP flood samples were classified as benign. All benign traffic (100%) was classified correctly, and the overall accuracy was 98.11%. Without fine-tuning (MRMR-SSAE), 1120 UDP flood samples were classified as benign and the accuracy was 97.83%, so fine-tuning improved performance slightly. Other verification parameters are presented in Table 3.
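The accuracies quoted above follow directly from Eq. 12 applied to the confusion matrices in Table 2. The short script below recomputes them as a consistency check; the counts are transcribed from Table 2, while the dictionary name, key labels, and cell ordering are illustrative choices.

```python
# Counts transcribed from Table 2, ordered as
# (benign->benign, benign->flood, flood->benign, flood->flood), i.e. actual->predicted
TABLE_2 = {
    "MRMR-FTSSAE with normalization": (11115, 0, 3, 40655),
    "MRMR-FTSSAE with 15 features": (11115, 0, 976, 39682),
    "MRMR-SSAE with 15 features": (11113, 2, 1120, 39538),
    "MRMR-AE with 15 features": (11100, 15, 1575, 39083),
    "FTSSAE with 15 features": (11110, 5, 2259, 38399),
    "SVM": (11091, 24, 12202, 28456),
    "Active model with NSL-KDD test set": (10781, 273, 12, 1971),
}


def accuracy_percent(bb, bf, fb, ff):
    """Eq. 12: correctly classified samples (diagonal cells) over all samples, in percent."""
    return 100.0 * (bb + ff) / (bb + bf + fb + ff)


for model, counts in TABLE_2.items():
    print(f"{model:40s} {accuracy_percent(*counts):6.2f}%")
```

For every N-BaIoT experiment the four cells sum to the 51773 test vectors, and the recomputed values match the 98.11%, 97.83%, and 97.81% figures reported in the text.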
The UDP measurements from the N-BaIoT dataset used in this study are described by only 60 of the 115 features. When these 60 features were given to the MRMR algorithm, 57 features were selected. The deep SSAE trained and fine-tuned with the selected features produced an FPR as high as 8%. In experiments in which batch normalization was applied to the SSAE input before fine-tuning in order to reduce the FPR, the FPR dropped to a value very close to zero. In this normalization step, z-scores were computed from the mean and standard deviation of the selected features, so each standardized value expresses the distance of a feature value from the mean in units of standard deviation. This transformation preserves the shape of the original data distribution. Consequently, flood detection accuracy reached almost 100% in the experiments in which the training data and the test data were each normalized as a separate batch.
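The per-batch z-score normalization described above can be sketched as follows. This is an illustration under stated assumptions: the function name `zscore_batch`, the small `eps` guard against zero variance, and the random stand-in data are ours, and a trainable batch-normalization layer would additionally learn a scale and shift, which is omitted here.

```python
import numpy as np


def zscore_batch(X, eps=1e-8):
    """Standardize each feature (column) using the mean and std of this batch only."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)  # distance from the mean in units of std


rng = np.random.default_rng(1)
X_train = rng.normal(5.0, 2.0, (700, 57))  # hypothetical stand-in for 57 MRMR-selected features
X_test = rng.normal(5.0, 2.0, (300, 57))
Z_train = zscore_batch(X_train)            # training data normalized as one batch
Z_test = zscore_batch(X_test)              # test data normalized as a separate batch
```

Because the transformation is a per-feature shift and scale, correlations between features are unchanged, which is one concrete sense in which the shape of the data is preserved. Normalizing the test set with its own statistics, as the text describes, differs from the more common practice of reusing the training-set mean and standard deviation at test time.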
Table 3
Performance of the UDP flood experiments
Performance Metric (%) | MRMR-FTSSAE with normalization (Optimum Accuracy) | MRMR-FTSSAE with 15 features (Optimum Accuracy) | MRMR-SSAE with 15 features | MRMR-AE with 15 features | FTSSAE with 15 features (Optimum Accuracy) | SVM | Active model with NSL-KDD test set |
Sensitivity | 100 | 100 | 99.99 | 99.96 | 99.99 | 99.92 | 99.89 |
Precision | 99.99 | 97.60 | 97.25 | 96.19 | 94.62 | 69.99 | 97.53 |
f1 Score | 100 | 98.79 | 98.60 | 98.04 | 97.23 | 82.32 | 98.70 |
Specificity | 99.97 | 91.93 | 90.79 | 86.89 | 81.21 | 47.62 | 87.83 |
Kappa | 100 | 95.31 | 93.8 | 91 | 86.9 | 76.41 | 92 |
On the imbalanced dataset, the proposed model obtained a markedly higher kappa value than the other models. The experiment in which only an AE was used, with effective features selected by the MRMR algorithm, reached an accuracy of 96.9%. When the SSAE was fine-tuned without MRMR feature selection, the accuracy remained at 95.62%. MRMR feature selection therefore contributes substantially to the accuracy of the system. Although sensitivity, precision, and F1 score were close to each other across four of the experiments, the proposed model achieved clearly higher specificity and kappa values than the other experimental models, which matters given the uneven class distribution. As shown in Table 3, the proposed method achieved a sensitivity of 100%, precision of 99.99%, specificity of 99.97%, F1 score of 100%, and kappa of 100%. The receiver operating characteristic (ROC) curve of the proposed method is shown in Fig. 4, and the area under the curve (AUC) is 99.99%.
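An AUC value such as the one reported above can be computed from per-sample classifier scores without drawing the curve, because the ROC AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below is illustrative; the function name is hypothetical, and its pairwise comparison needs memory proportional to the number of positive-negative pairs, so a rank-based formulation would be preferable at the scale of the full test set.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # compare every positive/negative pair
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```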
Experiments were also performed with ACK flood, scan flood, SYN flood, and UDP-plain flood attacks under the same conditions as the UDP flood experiment, and each type of flood traffic was detected with very high success; the proposed method achieved almost 100% accuracy for every flooding attack in the Mirai attack set. In the ACK flood experiment, only 1 false positive was observed in the 23121-sample test set. In the scan flood experiment, 12141 of 12149 attack cases were identified, with only 4 false positives. In the SYN flood experiment, all 32987 attacks were detected; no FP was observed and only 3 false negatives occurred. In the UDP-plain flood experiment, 20447 of 20449 attacks were detected, with only 2 FPs. The model trained on the N-BaIoT dataset was then validated on the NSL-KDD test set, which was recorded in real time and is widely used by other researchers. Of the 1983 UDP attack records in this set, the proposed method detected 1971; only 12 UDP flood records were placed in the benign class. In addition, 10781 of the 11054 benign records were placed in the benign class, and only 273 benign-labeled records were falsely detected as flooding. Despite the imbalanced class distribution of this dataset, flooding attacks were detected with high accuracy (97.81%). As shown in Table 3, the proposed method achieved a sensitivity of 99.89%, precision of 97.53%, specificity of 87.83%, F1 score of 98.70%, and kappa of 92% on the NSL-KDD test set.
In terms of running time, SVM classification of the MRMR-reduced data took 0.33 seconds, whereas the proposed method evaluated all 51773 samples in 0.2 seconds. Classifying a single sample with 57 features took 0.0521 seconds with SVM and 0.0133 seconds with the proposed method. These results show that the proposed method classifies faster than the SVM method.
Table 4 compares the proposed method with similar studies in the literature, including studies on detecting UDP-related problematic traffic in the N-BaIoT dataset. Aminanto et al. verified traffic-based attack recognition in WiFi networks on the AWID dataset. The authors stated that a deeper AE architecture was effective in detecting attacks, and they compared their deep AE model with SVM, Decision Tree (DT), and ANN. The most successful configuration used the AE for feature selection and SVM for classification with the selected features. However, the need to train the SVM on a large amount of data is a disadvantage of that study [33]. Aldweesh et al. studied models that detect anomalies in network traffic, examining recurrent neural networks, convolutional neural networks, Boltzmann machines, and autoencoder approaches, and evaluated how effectively they classify abnormal network traffic. They noted that security on SCADA and IoT platforms had been handled with shallow neural networks and machine learning algorithms, and that deep learning algorithms can be adapted to this field but should be tested on an IoT-specific dataset [34]. Ferrag et al. examined the effectiveness of different deep learning models in detecting anomalies in network traffic. Training and test data generated from the Bot-IoT and CSE-CIC-IDS2018 datasets were split 80% and 20%, respectively, and DBN, RNN, and CNN models were evaluated on these data. UDP flood attacks were detected with average accuracies of 96.66%, 96.85%, and 97.34% with the DBN, RNN, and CNN models, respectively [35]. On the N-BaIoT dataset, accuracies of 96.118% with DBN, 96.666% with RNN, and 97.006% with CNN were achieved.
In addition, in experiments with RBM, DBN, DBM, and deep autoencoder (DA) models for detecting UDP flood attacks in the N-BaIoT dataset, using 20% of the data for testing, Ferrag et al. achieved accuracies of 96.522% with RBM, 96.623% with DBN, 96.111% with DBM, and 97.991% with DA. Alharbi et al. analyzed the Mirai attacks in the N-BaIoT dataset. The authors reported that optimizing the features with PSO and the Local-Global best Bat Algorithm (LGBA) increased the classification success of the neural network used as the classifier. On UDP flood attacks, the PSO-optimized network achieved a precision of 0.997233, recall of 0.923866, and F1 score of 0.959148, whereas the LGBA-optimized network achieved a precision of 0.9982, recall of 0.9987, and F1 score of 0.9985. Palla and Tayeb detected abnormal traffic on different IoT devices in the N-BaIoT dataset. Using ANN and RF, they detected anomalous traffic on the security camera with an accuracy of only 83.9% with ANN and 75.6% with RF [18].
Table 4
Comparison of metrics between the proposed method and state-of-the-art studies
Study | Description | Metric (%) |
Aminanto et al. [33] | Autoencoder with SVM | 99.91 Acc, 0.012 FPR |
Ferrag et al. [35] | DBN, RNN, CNN, DA | 96.118 Acc (DBN), 96.666 Acc (RNN), 97.006 Acc (CNN), 97.991 Acc (DA) |
Shafiq et al. [16] | Wrapper-based feature selection algorithm | 95 Acc |
Alharbi et al. [17] | PSO-NN | 99.72 Pr, 92.38 Sn, 95.91 F1 |
Alharbi et al. [17] | Local-Global best Bat Algorithm NN | 99.82 Pr, 99.87 Sn, 99.85 F1 |
Palla and Tayeb [18] | ANN (Security Camera) | 89.3 Acc, 84 Pr, 99 Sn, 92 F1 |
Palla and Tayeb [18] | RF (Security Camera) | 75.6 Acc, 68 Pr, 92 Sn, 78 F1 |
Kushwah and Ranga [36] | ANN and Imperialistic Competitive Algorithm with NSL-KDD test set | 83.5 |
Al-Qatf et al. [37] | SAE-SVM with NSL-KDD test set | 84.96 |
Kushwah and Ranga [38] | Extreme learning machine with NSL-KDD test set | 86.80 |
Yusof et al. [39] | MLP with NSL-KDD test set | 91.7 |
Ma et al. [40] | Deep learning | 92.99 |
Proposed Method | Deeper hybrid model | 99.99 Acc, 99.99 Pr, 100 F1, 99.97 Sp, 100 Kappa |
Proposed Method | Deeper hybrid model with NSL-KDD test set | 97.81 Acc |
Recent studies validated on the NSL-KDD test set have detected flooding with a variety of methods. Kushwah and Ranga achieved an 83.5% success rate on the NSL-KDD test set with an ANN and the Imperialistic Competitive Algorithm [36]. Al-Qatf et al. achieved 84.96% success on the NSL-KDD test set using SAE and SVM [37]. In another study, Kushwah and Ranga achieved a success rate of 86.80% on the NSL-KDD test set with a flood detection method based on an extreme learning machine [38]. Yusof et al. achieved 91.7% success with an MLP [39]. Ma et al. achieved 92.99% success when they validated their deep-learning-based flood detection method on the NSL-KDD dataset [40]. The method proposed in this study, trained on the N-BaIoT dataset, achieved 97.81% accuracy on the NSL-KDD test set.
In addition, unsupervised learning methods have a high FP rate in recognizing abnormal traffic [41]. An autoencoder is an unsupervised neural network used to learn an efficient encoding of the input data; typically, it is used for dimensionality reduction by encoding the inputs. Stacked autoencoders were used in this study to discover nonlinear representations of the data, and with this architecture UDP flood attacks were detected effectively: only 3 UDP flood samples were misclassified as benign, and the overall accuracy was 99.99%. This result indicates that the proposed method detects UDP flood attacks more effectively than the other studies compared in Table 4.