4.1. Dataset Description
The authors of [45] argued that the dearth of relevant public data for evaluating ML-based anomaly detection systems, and in particular the widespread use of replicated datasets such as the DARPA and KDD’Cup datasets from the 1990s, makes it difficult to evaluate and compare solutions.
With the rapid growth of digital data over the past few years, the lack of standard or publicly accessible datasets in this field is no longer a barrier. The availability of these data allows us to assess the model from a real-world standpoint. NSL-KDD is a prominent dataset that is still used by many academics; however, in our analysis, it does not reflect the traffic conditions and security problems that occur in today's WSNs. We therefore conducted a second survey and chose the more recent UNSW-NB15 and CICIDS2017 datasets. They are among the few datasets that include current attacks, and they were chosen as the most complete IDS baselines to verify and test the proposed methods. The NSL-KDD, UNSW-NB15, and CICIDS2017 datasets are all used to test our proposed technique in this research. The last two are the most recent, and they contain benign traffic and up-to-date common attacks that closely reflect a realistic network context.
The UNSW-NB15 dataset was constructed at the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS), which generated a combination of realistic contemporary normal activities and modern attacks from network traffic [24]. Each record contains 47 characteristics, plus one label for classification. The 47 characteristics describe real-world network traffic packets, whilst the labeling feature indicates whether the network access is authorized or unauthorized. These 47 features are subdivided into five categories: basic features, flow features, time features, content features, and additional generated features. The UNSW-NB15 dataset has 2,540,044 records. In addition, an appropriate split is proposed for training and testing.
The authors of [46], from the Canadian Institute for Cybersecurity (CIC), proposed the CICIDS2017 dataset to overcome the limitations of existing data and to provide accurate and credible data for intrusion detection. The CICIDS2017 dataset includes benign traffic and contemporary common attacks gathered over 5 days, from Monday, July 3, 2017, to Friday, July 7, 2017. The CICFlowMeter utility extracts 80 network flow features from the captured network data to describe each record. The CICIDS2017 dataset consists of 8 files with a total of 2,830,743 entries.
4.2. NSLKDD Dataset
The authors of [22] presented the NSL-KDD dataset, a revised form of the KDD'99 dataset that removes redundant data and restructures the format, making it more realistic in terms of both data quantity and format. The NSL-KDD dataset comprises TCP connection records with 41 features and one labeling attribute. The 41 informative attributes describe the specifics of each TCP connection in the data; the labeling attribute categorizes each connection as normal or anomalous.
4.3. UNSW-NB15 Dataset
Table 4 displays the 42 features (inputs) extracted from the UNSW-NB15 dataset [47] for this study. Three of the inputs are nominal attributes, while the remaining 39 are binary, integer, and float numeric attributes. UNSW-NB15 includes a training set and a testing set [48]. In this study, we divided the training set into two subsets: UNSW-NB15-75, which represents 75% of the training set, and UNSW-NB15-25, which represents the remaining 25%. After training on UNSW-NB15-75, UNSW-NB15-25 is used as a validation set. This method prevents the model from learning from the validation or test sets. During the model's initialization phase, the technique also ensures that the results obtained on the test and validation data are free of interference and bias.
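The 75%/25% partition of the training set can be sketched as follows. This is a minimal illustration that assumes a simple shuffled split; the function name `split_train_validation`, the seed, and the stand-in list of record identifiers are hypothetical and are not taken from the original experiments.

```python
import random


def split_train_validation(records, train_fraction=0.75, seed=0):
    """Shuffle the records and split them into training and validation subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the UNSW-NB15 training partition: 1,000 record ids.
records = list(range(1000))
train_75, validation_25 = split_train_validation(records)
print(len(train_75), len(validation_25))   # 750 250
```

Because the two subsets are disjoint, no validation record is seen during training.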
Table 4
UNSW-NB15 Attributes list
Number | Name | Category | Number | Name | Category |
A1 | Dur | Float | A22 | Dtcp | Integer |
A2 | Protoc | Categorical | A23 | Dwin | Integer |
A3 | Servce | Categorical | A24 | Tcprt | Float |
A4 | Stat | Integer | A25 | Synac | Float |
A5 | Spkt | Integer | A26 | Ackda | Float |
A6 | Dpkt | Integer | A27 | Smea | Integer |
A7 | Sbyte | Integer | A28 | Dmea | Integer |
A8 | Dbyte | Integer | A29 | Trans_dept | Integer |
A9 | Rat | Float | A30 | Response_body_le | Integer |
A10 | Sttl | Integer | A31 | Ct_srv_sr | Integer |
A11 | Dttl | Integer | A32 | Ct_state_tt | Integer |
A12 | Sloa | Float | A33 | Ct_dst_sport_lt | Integer |
A13 | Dloa | Integer | A34 | Ct_src_dpor_ltm | Integer |
A14 | Slos | Integer | A35 | Ct_dst_spor_ltm | Integer |
A15 | Dlos | Integer | A36 | Ct_dst_src_lt | Integer |
A16 | Sinpk | Float | A37 | Is_ftp_login | Binary |
A17 | Dinpk | Float | A38 | Ft_st_cmd | Integer |
A18 | Sji | Float | A39 | Ct_flw_http_mth | Integer |
A19 | Dji | Float | A40 | Ct_src_lt | Integer |
A20 | Swin | Integer | A41 | Ct_srv_ds | Integer |
A21 | Stcp | Integer | A42 | Is_sm_ip_port | Binary |
4.4. CICIDS Dataset
The CICIDS 2017 dataset [46] was created in 2017 by the Faculty of Computer Science at the University of New Brunswick. CICIDS 2017 is a refinement of the ISCX 2012 dataset [49], building on previous work by Ali Shiravi [50]. The CICIDS 2017 dataset is derived from realistically generated traffic. The study [46] describes the characteristics of the IDS dataset and the methodology used to develop it. CICIDS 2017 was collected over five days, comprising 225,745 packets with more than 80 attributes and capturing both normal and attack network connections. The attack simulations in the CICIDS 2017 dataset are classified into seven categories: Heartbleed, brute force, DDoS, botnet, DoS, infiltration, and web attacks.
4.5. Experimental Analysis
We normalized the dataset to the range [0, 1] beforehand to remove the adverse effect of differing feature units and to prevent features with large value ranges from dominating those with small ranges. Our proposed intrusion detection model was evaluated using 10-fold cross-validation (CV), a standard method for training and testing, as recommended by the authors of [45]. The original data are randomly partitioned into ten equal-sized, mutually exclusive subsets. In each run, nine (9) subsets are used to train the intrusion detection model, while the remaining one is used to test it. Repeating the process 10 times therefore gives each subset an equal chance of being used for training and testing. Finally, the proposed model's performance is calculated by averaging the results over the test subsets. Both the CICIDS2017 and UNSW-NB15 datasets are large and severely class-imbalanced, resulting in higher loading and processing overhead and a bias toward the majority class [51]. To circumvent these limitations, we sampled a portion of the attack-class data from these two datasets, as Chiba et al. [52] did. Table 5 provides more information.
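The normalization to [0, 1] and the 10-fold partitioning can be sketched as below. This is a minimal illustration under stated assumptions: min-max scaling is assumed as the normalization rule, and the helper names `min_max_scale` and `k_fold_indices`, the seed, and the toy inputs are hypothetical rather than part of the original implementation.

```python
import random


def min_max_scale(column):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant feature: map every value to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


scaled = min_max_scale([2.0, 4.0, 10.0])
print(scaled)                                   # [0.0, 0.25, 1.0]

splits = list(k_fold_indices(100, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 10 90 10
```

Each sample appears in exactly one test fold, so averaging the ten test results covers the whole dataset once.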
Table 5
The sample data size of the attack class
Datasets | Original size | Extracted size |
CICIDS2017 | 557,646 | 133,045 |
UNSW-NB15 | 119,341 | 62,000 |
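The reduction of the attack class to the extracted sizes in Table 5 can be illustrated with the sketch below. The exact sampling procedure is not detailed here, so uniform random sampling without replacement is an assumption; the `subsample` helper, the seed, and the integer record identifiers are hypothetical.

```python
import random


def subsample(records, target_size, seed=0):
    """Draw target_size records uniformly at random, without replacement."""
    if target_size > len(records):
        raise ValueError("target_size exceeds the number of available records")
    return random.Random(seed).sample(records, target_size)


# Attack-class sizes from Table 5: (original size, extracted size).
table_5 = {"CICIDS2017": (557_646, 133_045), "UNSW-NB15": (119_341, 62_000)}

for name, (original, extracted) in table_5.items():
    attack_ids = range(original)            # hypothetical record identifiers
    sample = subsample(attack_ids, extracted)
    kept = extracted / original
    print(f"{name}: kept {len(sample)} of {original} attack records ({kept:.1%})")
```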
To assess the proposed method's intrusion detection performance, 10-fold CV was used. The classifier was run ten times, and the final results were averaged. Because minimizing detection errors, particularly false positives, is a high priority, we employ accuracy, detection rate (DR), and false alarm rate (FAR) to assess the performance of the proposed model and to compare it with other intrusion detection approaches. These indicators do not depend on the sample size, which is extremely useful when evaluating the effectiveness of an intrusion detection system [53].
These indicators can be determined using Eqs. 1 to 3:
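In their standard confusion-matrix form, which is assumed here, the three indicators are accuracy = (TP + TN) / (TP + TN + FP + FN), DR = TP / (TP + FN), and FAR = FP / (FP + TN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, and the attack class is treated as positive. A minimal sketch of the computation follows; the function names and the toy labels are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` as the attack label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def ids_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, detection rate, false alarm rate) as fractions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn) if tp + fn else 0.0    # share of attacks that are detected
    far = fp / (fp + tn) if fp + tn else 0.0   # share of normal traffic flagged
    return accuracy, dr, far


# Toy example: 4 attack records (1) and 6 normal records (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, dr, far = ids_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2%} DR={dr:.2%} FAR={far:.2%}")
# accuracy=80.00% DR=75.00% FAR=16.67%
```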
4.6. Detection Performance Analysis on the NSLKDD dataset
We began by comparing the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN models (without hybrid feature dimensionality reduction PCA + FA) on the NSLKDD dataset. PCA + FA-RF gave a DR of 99.19%, an accuracy of 99.23%, and a FAR of 1.02%. PCA + FA-DT gave a DR of 98.63%, an accuracy of 98.62%, and a FAR of 2.41%. PCA + FA-NB gave a DR of 89.95%, an accuracy of 85.85%, and a FAR of 2.95%. PCA + FA-DBN gave a DR of 99.52%, an accuracy of 99.46%, and a FAR of 0.40%. The individual RF model without feature dimensionality reduction (PCA + FA) gave a DR of 98.98%, an accuracy of 99.04%, and a FAR of 1.40%. The individual DT model gave a DR of 98.37%, an accuracy of 98.34%, and a FAR of 2.60%. The individual NB model gave a DR of 89.59%, an accuracy of 85.29%, and a FAR of 3.10%. The individual DBN model gave a DR of 99.41%, an accuracy of 99.10%, and a FAR of 0.60%. From the performance evaluation of the various models in Table 6, it is concluded that the proposed technique yielded the best results for the NSLKDD dataset.
Table 6
Results of hybrid (PCA + FA) and without hybrid feature dimensionality on NSLKDD dataset
Metric | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
DR | 99.19 | 98.63 | 89.95 | 99.52 | 98.98 | 98.37 | 89.59 | 99.41 |
Accuracy | 99.23 | 98.62 | 85.85 | 99.46 | 99.04 | 98.34 | 85.29 | 99.10 |
FAR | 1.02 | 2.41 | 2.95 | 0.40 | 1.40 | 2.60 | 3.10 | 0.60 |
On the NSLKDD dataset, Fig. 5 illustrates the 10-fold cross-validation performance of the PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN models and of the individual RF, DT, NB, and DBN models (without hybrid feature dimensionality reduction PCA + FA) in terms of DR, FAR, and accuracy. As can be seen in Fig. 5, each proposed model outperforms its individual counterpart in DR, accuracy, and FAR, suggesting that the presented hybrid PCA + FA feature dimensionality reduction can improve detection ability.
As shown in Fig. 6, the RF model achieves an AUC of 1 for the normal class (class 0), 1 for attack class 1, 1 for attack class 2, 0.75 for attack class 3, and 1 for attack class 4. These values describe the AUC of some of the attack classes in the NSLKDD dataset.
As shown in Fig. 7, the NB model achieves an AUC of 0.95 for the normal class (class 0), 0.96 for attack class 1, 0.97 for attack class 2, 0.79 for attack class 3, and 0.95 for attack class 4. These values show the AUC of the NB algorithm for all the attack classes in the NSLKDD data.
As seen in Fig. 8, the DT model achieves an AUC of 0.99 for the normal class (class 0), 1 for attack class 1, 0.98 for attack class 2, 0.50 for attack class 3, and 0.84 for attack class 4. These values describe the AUC of the DT model for all the attack classes in the NSLKDD data.
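Per-class AUC values of the kind reported above can be obtained with a one-vs-rest computation, sketched below. The sketch uses the rank interpretation of the AUC (the probability that a sample of the class receives a higher score than a sample outside it, with ties counted as one half); the function names and the toy scores are illustrative and are not the models' actual outputs.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(y_true, class_scores):
    """class_scores maps each class label to its per-sample scores."""
    return {
        c: roc_auc([1 if y == c else 0 for y in y_true], scores)
        for c, scores in class_scores.items()
    }


# Toy data: three classes, six samples, one score list per class.
y_true = [0, 0, 1, 1, 2, 2]
class_scores = {
    0: [0.9, 0.8, 0.1, 0.2, 0.3, 0.1],
    1: [0.05, 0.1, 0.6, 0.3, 0.3, 0.2],
    2: [0.05, 0.1, 0.3, 0.5, 0.4, 0.7],
}
auc = one_vs_rest_auc(y_true, class_scores)
print(auc)   # {0: 1.0, 1: 0.9375, 2: 0.875}
```

Under this interpretation, an AUC of 1 means the class is perfectly separated from the rest, while an AUC of 0.50 corresponds to chance-level ranking.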
4.7. Detection Performance Analysis on the UNSW-NB15 dataset
In this section, we compare the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN models (without hybrid feature dimensionality reduction PCA + FA) on the UNSW-NB15 dataset. As seen in Table 7, PCA + FA-DBN outperformed all other models in terms of DR, accuracy, and FAR, whereas the individual NB gave the lowest DR and accuracy, with a DR of 70.11%, an accuracy of 70.80%, and a FAR of 4.89%.
Table 7
Results of hybrid (PCA + FA) and without hybrid feature dimensionality on UNSW-NB15 dataset
Metric | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
DR | 99.98 | 98.79 | 71.21 | 100 | 98.19 | 97.80 | 70.11 | 98.90 |
Accuracy | 99.99 | 99.00 | 71.37 | 100 | 98.98 | 98.00 | 70.80 | 99.40 |
FAR | 1.51 | 2.51 | 2.64 | 0.30 | 3.80 | 4.71 | 4.89 | 4.90 |
The comparison results in Fig. 9 demonstrate that our proposed method outperforms the individual RF, DT, NB, and DBN models in DR, accuracy, and FAR, indicating that the proposed hybrid PCA + FA feature dimensionality reduction can significantly improve detection capability on the UNSW-NB15 data. Additionally, the proposed PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN models all have a FAR below 3%, whereas the individual RF, DT, NB, and DBN models all have a FAR above 3%. Interestingly, the individual DBN model achieved both a high accuracy and a high FAR, which suggests that it is biased and less reliable at detecting intrusions.
According to Fig. 10, the RF model achieves an AUC of 1 for the normal class (class 0) and 1 for the attack class (class 1), which indicates that the distributions of the two classes do not overlap.
As seen in Fig. 11, the NB model achieves an AUC of 0.82 for the normal class (class 0) and 0.82 for the attack class (class 1).
According to Fig. 12, the DT model achieves an AUC of 1 for the normal class (class 0) and 1 for the attack class (class 1), which indicates that the distributions of the two classes do not overlap.
The SHAP values of the DT model are given in Fig. 13, where the value for the normal class (class 0) is 0.5 and the value for the attack class (class 1) is 0.5. Red pixels reflect positive SHAP values that increase the class's probability, whereas blue pixels indicate negative SHAP values that decrease it. Each of the attributes belonging to the attack class in Fig. 13 (starting from worms) contributes to the DT model output.
4.8. Detection Performance Analysis on the CICIDS2017 dataset
In this section, we compare the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN models (without hybrid feature dimensionality reduction PCA + FA) on the CICIDS2017 data. PCA + FA-RF gave a DR of 99.58%, an accuracy of 98.95%, and a FAR of 2.80%. PCA + FA-DT gave a DR of 99.42%, an accuracy of 98.89%, and a FAR of 2.99%. PCA + FA-NB gave a DR of 99.36%, an accuracy of 98.81%, and a FAR of 2.98%. PCA + FA-DBN gave a DR of 99.99%, an accuracy of 99.98%, and a FAR of 1.51%. The individual RF model without feature dimensionality reduction (PCA + FA) gave a DR of 98.10%, an accuracy of 97.20%, and a FAR of 6.90%. The individual DT model gave a DR of 98.90%, an accuracy of 98.00%, and a FAR of 7.08%. The individual NB model gave a DR of 98.74%, an accuracy of 97.89%, and a FAR of 5.10%. The individual DBN model gave a DR of 99.10%, an accuracy of 99.50%, and a FAR of 6.98%. According to the performance evaluations in Table 8, the proposed models (with hybrid feature dimensionality reduction PCA + FA) produced better results on the CICIDS2017 dataset than the individual models (without hybrid feature dimensionality reduction PCA + FA).
Table 8
Results of hybrid (PCA + FA) and non-hybrid feature dimensionality on the CICIDS dataset
Metric | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
DR | 99.58 | 99.42 | 99.36 | 99.99 | 98.10 | 98.90 | 98.74 | 99.10 |
Accuracy | 98.95 | 98.89 | 98.81 | 99.98 | 97.20 | 98.00 | 97.89 | 99.50 |
FAR | 2.80 | 2.99 | 2.98 | 1.51 | 6.90 | 7.08 | 5.10 | 6.98 |
More specifically, as shown in Fig. 14, our proposed models (with feature dimensionality reduction PCA + FA) performed better than the individual RF, DT, NB, and DBN models in terms of DR, accuracy, and FAR. In terms of FAR, the proposed PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN models are all below 3%, while the individual RF, DT, NB, and DBN models are all over 5%. Most notably, the individual DBN model produced both a high FAR and a high DR, which suggests that it is biased and less reliable at detecting intrusions.
4.9. The required Training time of the proposed models on NSLKDD data
The training time (TT) needed by our suggested models on the NSLKDD dataset is given in Table 9 to further highlight the benefits of our proposed techniques.
Table 9
Required training time to build the model on NSLKDD data
Algorithms | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
Training Time | 1.1056 | 1.2453 | 1.1190 | 1.1200 | 3.2930 | 2.1520 | 2.0200 | 3.1100 |
Figure 15 shows that our suggested models train faster than the individual models (RF, DT, NB, DBN). On the NSLKDD dataset, the individual RF without hybrid feature dimensionality reduction (PCA + FA) requires approximately 2.98 times the training time of PCA + FA-RF (3.2930 versus 1.1056, a difference of about 2.19), and the individual DBN requires approximately 2.78 times the training time of PCA + FA-DBN (3.1100 versus 1.1200, a difference of 1.99).
4.10. The required Training time for the proposed models on UNSW-NB15 data
To validate the value of our proposed techniques, the TT required by our suggested models on the UNSW-NB15 dataset is presented in Table 10.
Table 10
Training time required to build the model on UNSW-NB15 data
Algorithms | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
Training Time | 1.567 | 1.188 | 1.427 | 1.715 | 2.676 | 2.297 | 2.516 | 2.804 |
Figure 16 indicates that our suggested models train faster than the individual models (RF, DT, NB, DBN). On the UNSW-NB15 dataset, the individual DT without hybrid feature dimensionality reduction (PCA + FA) requires approximately 1.93 times the training time of PCA + FA-DT (2.297 versus 1.188, a difference of about 1.11), and the individual NB requires approximately 1.76 times the training time of PCA + FA-NB (2.516 versus 1.427, a difference of about 1.09).
4.11. The required Training time of the proposed models on CICIDS 2017 data
Table 11 shows the training time required by our proposed models on the CICIDS-2017 dataset to highlight the effectiveness of our proposed strategies.
Table 11
Training time required to build the model on CICIDS-2017 data
Algorithms | PCA + FA-RF | PCA + FA-DT | PCA + FA-NB | PCA + FA-DBN | RF | DT | NB | DBN |
Training Time | 1.4780 | 1.1761 | 1.2380 | 1.2134 | 3.4560 | 2.3780 | 3.5160 | 3.7045 |
According to Fig. 17, our proposed models train faster than the individual models (RF, DT, NB, DBN). On the CICIDS2017 dataset, the individual RF without hybrid feature dimensionality reduction (PCA + FA) requires approximately 2.34 times the training time of PCA + FA-RF (3.4560 versus 1.4780, a difference of about 1.98), and the individual DT requires approximately 2.02 times the training time of PCA + FA-DT (2.3780 versus 1.1761, a difference of about 1.20).
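The training-time ratios and differences implied by Tables 9 to 11 can be recomputed directly from the tabulated values, as in the short sketch below. The values are copied from the tables in the units reported there; the dictionary layout and the `compare` helper are illustrative.

```python
# Training times from Tables 9-11 as (hybrid PCA + FA model, individual model).
TRAINING_TIME = {
    "NSLKDD": {
        "RF": (1.1056, 3.2930), "DT": (1.2453, 2.1520),
        "NB": (1.1190, 2.0200), "DBN": (1.1200, 3.1100),
    },
    "UNSW-NB15": {
        "RF": (1.567, 2.676), "DT": (1.188, 2.297),
        "NB": (1.427, 2.516), "DBN": (1.715, 2.804),
    },
    "CICIDS2017": {
        "RF": (1.4780, 3.4560), "DT": (1.1761, 2.3780),
        "NB": (1.2380, 3.5160), "DBN": (1.2134, 3.7045),
    },
}


def compare(hybrid_time, single_time):
    """Return (ratio, difference) of the individual model relative to the hybrid."""
    return single_time / hybrid_time, single_time - hybrid_time


for dataset, models in TRAINING_TIME.items():
    for model, (hybrid, single) in models.items():
        ratio, diff = compare(hybrid, single)
        print(f"{dataset:<11} {model:<3} ratio={ratio:.2f}x difference={diff:.2f}")
```

A ratio above 1 means the individual model takes longer to train than its PCA + FA counterpart; the difference is the absolute gap in the tables' units.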
4.12. Comparison with the Previous studies
To further evaluate our proposed intrusion detection architecture, we compare the proposed hybrid models with other current models on the NSLKDD, UNSW-NB15, and CICIDS2017 datasets. Tables 12, 13, and 14 summarize the comparison results for the three datasets, respectively. In these tables, Null indicates that no feature dimensionality reduction technique was considered.
Table 12
On the NSLKDD dataset, the performance of various intrusion detection algorithms is compared.
Authors/Year | Algorithms | Feature Dimensionality Reduction | Dataset utilized | Accuracy | DR | FAR |
[15] | FFDNN | Filter technique | NSL-KDD | 81.19% | x | x |
[25] | DFFN | Wrapper | NSL-KDD | 98.60% | x | x |
[18] | DCNN | Null | NSL-KDD | 85.00% | x | x |
[23] | APCA + I + ELM | Wrapper | NSL-KDD | 81.22% | x | x |
Proposed hybrid RF method | PCA + FA-RF | Filter + Wrapper | NSL-KDD | 99.23% | 99.19% | 1.02% |
(Null: No feature dimensionality reduction strategy considered) |
Table 13
On the UNSW-NB15 dataset, the performance of various intrusion detection algorithms is compared
Authors/Year | Algorithms | Feature Selection Technique | Dataset utilized | Accuracy | DR | FAR |
[16] | DNN | Null | UNSW-NB15 | 78.50% | x | x |
[25] | DEA | Wrapper | UNSW-NB15 | 92.40% | x | x |
[24] | DT, ANN, NB | Filter | UNSW-NB15 | 81.34% | x | x |
[21] | DT + GA + LR | Wrapper | UNSW-NB15 | 81.42% | x | x |
[23] | APCA + I + ELM | Wrapper | UNSW-NB15 | 70.51% | x | x |
Proposed hybrid DT | PCA + FA-DT | Filter + Wrapper | UNSW-NB15 | 99.00% | 98.79% | 2.51% |
(Null: No feature dimensionality reduction method considered) |
Table 14
The performance of various intrusion detection techniques is compared using the CICIDS 2017 dataset
Authors/Year | Algorithms | Feature Selection Technique | Dataset utilized | Accuracy | DR | FAR |
[20] | LSTM, RNN | Null | CICIDS2017 | 84.83% | x | x |
[54] | DBN | Wrapper | NSLKDD, CICIDS2017 | 98.24% | 99.0% | 2.10% |
[55] | AE-QDA | Filter | CICIDS2017 | 94.20% | 96.40% | 6.30% |
[56] | DNN | Wrapper | CICIDS2017 | 92.92% | 92.38% | 3.24% |
[57] | DT-EnSVM | Filter | CICIDS2017 | 98.46% | 99.15% | 4.00% |
[58] | BRS | Filter | CICIDS2017 | 97.96% | 96.38% | 1.42% |
Proposed Hybrid DBN | PCA + FA-DBN | Filter + Wrapper | CICIDS2017 | 99.98% | 99.99% | 1.51% |
(Null: No feature dimensionality reduction method considered) |
Some recent studies were selected to help demonstrate the benefits of our proposed intrusion detection methodology. The results in Tables 12–14 show that, compared with FFDNN in [15], DT + GA + LR in [21], and LSTM and RNN in [20], our proposed technique obtains better overall performance, particularly in DR, accuracy, and FAR. Our proposed models also outperform the other detection systems, including DL detection methods, on the three performance measures. Intrusion detection is frequently confronted with massive amounts of data; feature dimensionality reduction is therefore of high significance, as shown by our proposed models. Consequently, small changes in the evaluation criteria might have a substantial impact in practice if a significant number of attacks/intrusions are misclassified and allowed through. It should be noted, however, that Tables 12–14 provide a comparison between our proposed intrusion detection models and other current IDS methods. Nonetheless, based on the comparison results above, our proposed solution remains competitive and may inspire future studies on intrusion detection in WSNs.
4.13. Threats to validity
This section discusses possible issues with the validity of the verification results obtained during this investigation.
4.14. Internal Validity
Internal validity is the extent to which published findings reflect actual reality in the population under study and are not due to methodological flaws. There are two crucial factors to consider in this case.
4.14.1 Instrumentation: This term refers to inconsistencies resulting from changes in the instrument's calibration, as well as variances in the scorers, observers, or most likely the device itself. Accuracy, detection rate, and false alarm rate are all well-known validation metrics. There have been no changes that could have influenced the outcome of the evaluation.
4.14.2 Selection: A selection threat is any factor other than the system that contributes to posttest differences. In this work, the absence of feature scaling and data that are not on the same scale could play such a role.
4.15. Construct Validity
Construct validity is the extent to which the measuring instrument 'interacts' with conceptual assumptions and the scores appropriately represent the framework's complexity. This risk arises from the question of whether the experiment accurately replicates the real-world phenomena under investigation. The proposed model is evaluated consistently on the basis of the high accuracy and DR obtained under the evaluation criteria.
4.16. External Validity
External validity pertains to our ability to apply the study findings to practical settings. This risk raises the question, "Can this effect be extended across a range of contexts, populations, treatments, and measurement characteristics?"
The suggested hybrid feature dimensionality models for WSN threat identification were implemented and validated on the NSLKDD, UNSW-NB15, and CICIDS datasets. The findings corroborate what has been discovered in the literature. Validation will be conducted in the future in an industry context or on a recent WSN dataset.