An Efficient Machine Learning and Deep Belief Network Models for Wireless Intrusion Detection System

doi:10.21203/rs.3.rs-2110380/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Wireless Sensor Networks (WSNs) have emerged as a critical component of the Internet of Things (IoT), yet despite their obvious benefits, security challenges persist. As more devices connect to the internet, new cyber attacks join established ones, posing serious threats to the confidentiality, integrity, and availability of data in WSNs. Securing WSNs is therefore a critical and difficult task, and anomaly detection is essential to it. The detection of abnormal data using machine learning (ML) algorithms has gained popularity in recent years, and numerous ML classifiers have been employed for WSN intrusion detection. However, existing research has rarely considered feature dimensionality reduction, which is critical for developing a well-performing intrusion detection system (IDS). The purpose of this study is to develop a hybrid solution for intrusion detection in WSNs. The hybrid technique employs both principal component analysis and the firefly algorithm (PCA + FA) for feature dimensionality reduction. We investigated both ML algorithms (random forest, decision tree, Naïve Bayes) and deep belief networks for intrusion detection in WSNs. The experiments were run on the well-known NSL-KDD dataset in addition to the more recent CICIDS2017 and UNSW-NB15 datasets, to create a stable dataset with a proportionate number of regular traffic and malicious samples. The results demonstrated that the proposed hybrid models PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN outperform the individual base models RF, DT, NB, and DBN (without feature dimensionality reduction) in terms of accuracy (ACC), detection rate (DR), and false alarm rate (FAR), making the proposed hybrid feature dimensionality reduction a viable option for intrusion detection in WSNs.
Additionally, the training-time results showed that our proposed models train faster than the single individual models on the UNSW-NB15, NSL-KDD, and CICIDS2017 datasets. On NSL-KDD, single-RF requires approximately 2.18 times more training time and single-DBN approximately 1.99 times more training time than PCA + FA-RF and PCA + FA-DBN, respectively. On UNSW-NB15, single-DT requires approximately 1.10 times more training time and single-NB approximately 1.08 times more training time than PCA + FA-DT and PCA + FA-NB. On CICIDS2017, single-RF requires roughly 1.97 times more training time and single-DT around 1.20 times more training time than PCA + FA-RF and PCA + FA-DT. The results show that the proposed models detect intrusions with a high DR and ACC and a low FAR, indicating an advantage over state-of-the-art methods.

Wireless Sensor Networks

Intrusion Detection System

machine learning

Deep Learning

UNSW-NB15

CICIDS 2017

Internet of Things

Dimensionality Reduction

A wireless sensor network (WSN) is a collection of resource-constrained sensor nodes that perform functions such as sensing, processing, and communication to meet the needs of various applications [1]. In other words, a WSN is a network made up of several sensor nodes that can be used as part of the Internet of Things (IoT). An institution, such as a government entity (e.g., a military unit), deploys such a network in a predetermined region called the target field or deployment area, and the sensor nodes then automatically form a network through wireless communication [2]. WSNs can be employed in a variety of applications, including ocean monitoring, industrial machine performance monitoring, earthquake monitoring, and numerous military applications [3]. In addition, future applications including pollution monitoring, highway traffic, building security, wildfires, and water quality monitoring are projected to include WSN principles in their structures. WSNs have several benefits, including the ability to turn raw data into meaningful aggregated and categorized information [4].

Wireless handheld devices, wireless computer networks (WCN), and wireless communication technologies have made it possible to process massive data in real time. At the same time, these systems are vulnerable to a variety of malware attacks and security flaws. Developing proactive, efficient, and accurate wireless intrusion detection systems (WIDS) to mitigate these attacks is therefore critical [5].

Sensor nodes are typically static; however, mobile nodes can also be installed depending on the application requirements. The nodes are dispersed across the deployment area to gather the required data and transmit it to a central node known as the base station (BS), which is more capable and powerful [6]. They are employed in a variety of real-time applications, including security and healthcare surveillance, environmental and climate change monitoring, and military surveillance systems. Nowadays, sensors are also used to regulate electrical devices (such as lighting, geysers, and air conditioners). As a result, smart homes can be built by deploying sensor nodes, which can provide better lighting, heating, and cooling systems [7]. Figure 1 depicts a typical smart home scenario in which a user can use his or her smartphone to access and control various smart devices, such as temperature, light, and humidity devices, over the Internet.

The network also includes a resource-rich node known as the base station (BS), sometimes called the gateway node. The BS is usually the gateway to the network, with powerful data processing and large storage capabilities, as well as a human interface access point. The BS receives sensor information, administers the network, and performs costly operations on behalf of the sensor nodes. In most applications the BS is regarded as a trusted entity that is assumed never to be compromised by an attacker; otherwise, the entire WSN would be compromised. Sensor nodes are placed within one or more hops of the BS. The BS may be placed at the network's core or at a corner, depending on the application [8]. All sensor nodes continue to monitor the network area after deployment. If an event of interest occurs, one of the sensor nodes in the vicinity detects it and creates a report based on the perceived and analyzed data, which it then transmits to the base station through wireless communication. If multiple nodes in the vicinity notice the same occurrence, a collaborative report can be generated. The BS receives data from the sensor nodes and analyses it before sending it to the outside world via high-quality wireless or cable channels [9].

When designing these applications, security is a crucial element that must be considered. In addition to threats to data confidentiality and integrity, the WSN must be protected against all threats to its availability, such as DoS attacks and other present-day attacks. In security terms, an attacker who gains physical access to a node can obtain vital data, allowing them to join the network as a legitimate member. As a result, a WSN should be capable of detecting internal attacks, which are difficult to detect without an active IDS. Several researchers have proposed approaches to overcoming the potential security vulnerabilities of WSNs, including key exchange, secure routing, authentication, and other security mechanisms for specific types of intrusions. IDS are among the most widely used, versatile, and useful technologies for defending WSNs against various attacks and threats. The structure of an IDS is given in Fig. 2.

A large body of research over the last few decades has constructed IDS for WSNs using trust-based, data mining-based, regression model-based, and artificial intelligence-based approaches [1]. A short outline of the established IDS for WSNs is included in the literature section. An IDS is an effective instrument for identifying intrusion risks in both wireless and wired networks. When an intrusion is detected, the system notifies the controller, who can then take appropriate action [10]. Today, WSN security is primarily concerned with packet transfer between the network's sensor nodes. Because the security of WSNs is becoming increasingly vital, intrusion detection is critical [11]. Another major challenge is that the IDS used in WSNs produce a large number of false-positive warnings, which hamper the security analyst's ability to detect successful attacks and take corrective action. Such warnings have not been categorized according to the level of threat they pose, and they must also be processed to determine the most serious warnings and the response time. Each IDS produces a large number of warnings, the majority of which are true, while the rest are either false (i.e., false alerts) or redundant (i.e., redundant alerts) [12]. False warnings are therefore a significant problem for IDS in WSNs, and it is extremely difficult and time-consuming to differentiate between intrusions and routine network traffic. In recent years, numerous IDS for diverse WSN architectures have been presented. However, based on current breakthroughs in this field, there is still a considerable demand for IDS performance enhancement and optimization [6],[13]. An optimized, efficient, and high-performance IDS for WSNs is therefore necessary to address these issues.
This study aims to propose an enhanced and efficient dimensionality reduction approach for IDS in WSNs. The study's motivations were as follows:

To detect wireless intrusion attempts early, allowing network managers to identify attacks at an earlier stage and with more accuracy.

To concentrate on the most important features of an attack while ignoring the less important ones, in order to achieve higher accuracy.

To propose an enhanced model that reduces false-positive warnings.

The following are some of the proposed work's unique contributions:

A hybrid feature dimensionality reduction was proposed to improve the dataset's quality so that it contains only significant and useful features for training the proposed models.
The PCA + FA solution was proposed and used to substantially shorten the training time of the ML-based models (RF, DT, NB) and the DBN model.
The proposed models outperformed the state-of-the-art techniques in terms of DR, FAR, and accuracy.
Contemporary, up-to-date datasets were used, in contrast to obsolete datasets that do not reflect recent attacks in the WSN environment.

The remaining sections of this paper are organized as follows. Section 2 provides a summary of the literature on WSN intrusion detection systems. Section 3 discusses the methodology of this study, including PCA, the firefly algorithm, RF, DT, NB, and deep belief networks. Section 4 describes the datasets and presents the experimental results and their analysis. Section 5 concludes the paper and discusses future research.

There are numerous studies on intrusion detection in WSNs, as reviewed in [14]. The authors of [15] proposed a system for detecting wireless network attacks using a filter-based method and DL to reduce the feature space. Information gain (IG) is the cornerstone of the feature selection method employed in that study. The FFDNN-IDS was compared against the NB, kNN, SVM, RF, and DT classifiers. Experiments were performed on the NSL-KDD dataset for both binary and multiclass attack classification. The results showed that the IG-FFDNN-IDS performed better than the other models. Using the reduced NSL-KDD attribute set, the FFDNN-IDS achieved a binary classification accuracy of about 99.37 percent on the training data and 87.74 percent on the test set. For the multiclass setup, it achieved a training accuracy of 94.54 percent and a test accuracy of 86.19 percent.

A DNN-based IDS was proposed in [16]. The authors did not attempt to extract any features in this case. The DNN model was compared with traditional ML approaches in a series of trials on the NSL-KDD, UNSW-NB15, KDDCup-99, and WSN-DS datasets. According to the findings, the DNN outperformed the other methods in both multiclass and binary classification. A binary classification accuracy of 92.7 percent and a multiclass accuracy of 92.5 percent were achieved. On NSL-KDD, a five-layer DNN had a detection accuracy of about 78.9 percent for binary classification and 78.5 percent for multiclass classification. On WSN-DS, a five-layer DNN obtained 98.2 percent accuracy for binary classification and 96.44 percent for multiclass classification. On UNSW-NB15, a five-layer DNN had an accuracy of 76.1 percent for binary classification and 65.1 percent for multiclass classification.

The authors of [17] demonstrated a network IDS based on RF and SVM. The feature selection procedure used RF with a feature importance score. The KDD Cup 99 dataset was used to test the method. With SVM as the predictor, a reduced set of fourteen features attained an accuracy of 93 percent on the training set, compared with 90 percent for the complete set of 41 attributes.

The paper [18] investigated the use of DNNs for anomaly detection. A DCNN, different autoencoders, and an RNN with LSTM were used. These DNN models were compared with traditional ML models such as ELM, DT, kNN, NB, RF, and SVM. The NSL-KDD data was used to train and test the systems. According to the findings, DNN-based IDSs outperformed traditional ML-based IDSs. The DCNN, for instance, achieved an accuracy of 85 percent on the test data, which was superior to the conventional ML techniques in the experiments. The authors concluded that DL is a viable technology for information security applications.

The authors of [19] proposed an IDS built on the LSTM algorithm and evaluated it on the CIDDS dataset. Detection accuracy was the key criterion used to estimate the efficiency of the LSTM IDS. The system was also compared with NB, SVM, and MLP, among other techniques. The results showed that the LSTM-IDS beat its counterparts on the validation data, with an accuracy of 84.83 percent. In [20], a bidirectional LSTM technique was used to categorize attacks in the UNSW-NB15 data. An LSTM is a specific type of RNN. The LSTM-IDS was able to detect malicious activity successfully, although its performance was hampered by the class imbalance of the UNSW-NB15 dataset. The authors concluded that the next phase of their research would be to address the class imbalance problem.

The paper [21] proposed a wrapper method for feature extraction based on an evolutionary process, the genetic algorithm (GA). The LR methodology was also applied to pick the best subset of attributes. The network intrusion detection tests were conducted on the KDD Cup 99 and UNSW-NB15 data using three types of DT classifiers. The best feature vector for KDD Cup 99 [22] included 18 features and had an accuracy of 99.90%, a DR of 99.81%, and a FAR of 10.5. The top feature vector for the UNSW-NB15 dataset had 20 attributes, with an accuracy of 81.42% and a FAR of 6.39%. In [23], the authors used an IDS based on the I-ELM and an APCA. An ELM is a type of DNN with numerous hidden layers made up of a large number of neurons. In most cases, ELMs are developed in a feed-forward fashion. The APCA is employed in that study as a wrapper selection strategy, in which the best subsets of attributes are gradually extracted and trained before being evaluated via the I-ELM. The authors used the UNSW-NB15 and NSL-KDD datasets in their investigations. According to the findings, the model gave an accuracy of 81.22 percent on NSL-KDD and an overall accuracy of 70.51 percent on UNSW-NB15.

The authors of [24] investigated the performance of five different models on the KDD Cup 99 and UNSW-NB15 datasets: NB, ANN, DT, LR, and EM clustering. Accuracy and FAR were the performance metrics used. On the KDD Cup 99 data, the ANN achieved an overall detection accuracy of 97.04 percent and a FAR of 1.48 percent. On UNSW-NB15, the ANN produced 81.34 percent accuracy and a FAR of 21.13 percent. Based on the findings for the ANN and the four remaining classifiers, the researchers determined that the UNSW-NB15 data is substantially more complicated than the KDD Cup 99 data. This level of complexity mirrors the complexity of real-world wireless network traffic.

Muna et al. [25] proposed an IDS strategy based on DFFNNs and DAEs. The UNSW-NB15 and NSL-KDD data were used in the studies. The DAEs were used to obtain a more detailed representation of the data. These representations are then supplied to the classification process via DFFNNs. The DAE-DFFNN approach demonstrated its ability to create and extract critical features that boost the model's effectiveness.

The researchers in [26] used the AWID dataset to develop a WIDS built on a semi-supervised DL approach. To pick the inputs needed to detect threats efficiently, the strategy used a ladder network approach. The ladder network's architecture included an SAE, as well as a clean decoding unit, a noisy encoder unit, and a non-noisy decoder unit. This wrapper design was able to select the most appropriate attributes for improved classification accuracy. In addition, a focal loss function was used to improve the generalization capability of the framework. The simulations addressed the multiclass scheme. The framework achieved 89.32 percent, 73.41 percent, 82.79 percent, and 99.77 percent in the flooding, injection, impersonation, and normal classes, respectively. Overall, 98.54 percent of the predictions were correct.

In the study [27], the authors constructed a WIDS on the AWID dataset, deriving classification rules with an ACO-based approach. They used a correlation-based technique to develop a filter-based feature extraction methodology in their tests. As a result, the full feature representation of AWID was condensed to 35 features. According to the results of their simulations, the RF algorithm achieved an overall accuracy of 98.87 percent for multiclass classification and 99.10 percent for binary classification.

We summarize the existing studies in Table 1, which lists the classification techniques and their weaknesses, together with the data used, the attribute selection method employed, and the accuracy on the test data. We use the term Null in the "Feature Selection Technique" field to indicate that no feature selection technique was employed.

Table 1

Existing methods with the datasets and feature selection strategies
Authors/Year	Algorithms	Feature Selection Technique	Dataset utilized	Accuracy	Weaknesses
[15]	FFDNN	Filter technique	NSL-KDD	81.19%	Deprecated dataset, accuracy remains an issue
[16]	DNN	Null	UNSW-NB15	78.50%	Higher training time
[25]	DEA	Wrapper	UNSW-NB15	92.40%	The risk of over-fitting is high
[25]	DFFN	Wrapper	NSL-KDD	98.60%	Due to the complexity of the data models, training is quite expensive
[18]	DCNN	Null	NSL-KDD	85.00%	The model tends to be biased
[24]	DT, ANN, NB	Filter	UNSW-NB15	81.34%	It is confronted by the 'zero-frequency problem.'
[21]	DT + GA + LR	Wrapper	UNSW-NB15	81.42%	DT requires more time for training the model.
[23]	APCA + I + ELM	Wrapper	UNSW-NB15	70.51%	Training is fundamentally an issue of linear learning.
[23]	APCA + I + ELM	Wrapper	NSL-KDD	81.22%	The risk of over-fitting is very high
[20]	LSTM, RNN	Null	CICIDS	84.83%	It requires a lot of memory to train
[26]	SAE	Wrapper	AWID	98.57%	The risk of over-fitting is high
[17]	SVM	Wrapper	KDD Cup 99	93.00%	Extensive training time required for huge datasets
[27]	ACO, RF	Filter	AWID	98.87%	It is afflicted by the problem of premature convergence.

2.1. Motivation for the Present Work

WSNs are susceptible to a variety of attacks due to their characteristics, which include an unreliable channel, a dynamic topology, reliance on node routing mechanisms, and the absence of a monitoring and management center. Intrusion detection is critical in network security because it acts as a "second firewall." Existing intrusion detection models have several significant shortcomings, including a lack of adaptive capability, an inability to identify new threats, a low detection rate, and a high rate of false positives. Numerous types of intrusions must be identified using a diverse set of features, so methods for improving the classification of anomalies are required. The analysis of feature selection highlights the relationship between the type of attribute and the type of attack detected; a feature-based methodology is therefore crucial. A low detection rate, primarily due to a high false-positive rate, damages intrusion detection performance. As the related work shows, most research on detecting wireless intrusions applies several ML and DL models and compares their performance. Less attention has been paid to improving the quality of the wireless intrusion dataset, which might lead to higher accuracy. The accuracy of the results given by ML and DL models depends on the dataset's characteristics. Extracting the most important attributes from the data and using suitable dimensionality reduction methods help to improve the accuracy of the ML and DL models' predictions. Another challenge in the existing approaches is that the IDS used in WSNs produce a high rate of false-positive warnings (i.e., false alarms), which hamper the security analyst's ability to detect successful attacks and take corrective action.
Such warnings have not been categorized according to the level of threat they pose. They must also be processed to determine the most serious warnings and the response time. Many of the existing methods in the literature nevertheless neglect the false alarm (warning) issue. The current study addresses these issues and uses a two-layered strategy (PCA + FA) for feature dimensionality reduction. In the classification phase, machine learning models (RF, DT, NB) and a deep belief network model are proposed and used for intrusion detection in WSNs.

Figure 3 depicts the experimental environment of the proposed methodology. As a pre-processing step, the Standard Scaler is applied so that features measured on different scales do not bias the prediction outcomes; it standardizes every feature to a common scale. PCA is then applied to the standardized data. The primary goal of PCA is to remove irrelevant features from consideration when training the ML algorithms (NB, RF, DT) and the DBN model. The FA optimization algorithm, a popular nature-inspired technique, was employed in this work to enhance the feature engineering process. The FA's key strength is that it adjusts the variables in such a way that it finds the best parameters with a fast convergence rate while avoiding local minima. This makes the FA a good choice for feature engineering, as it allows the selection of optimal variables that have a beneficial impact on classification and hence reduce training time. The dimensionally reduced NSL-KDD, UNSW-NB15, and CICIDS2017 datasets were then classified using NB, RF, DT, and DBN. The Adam optimizer was used, and the softsign activation function was employed at every layer except the output layer. Because the task is binary classification, the output layer employed the sigmoid activation function to categorize the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. For backpropagation, the root mean square (RMS) propagation error was utilized. The data were divided 80:20 for training and testing. Instead of training the model on the entire 80% of the data and then testing it on the remaining 20%, 64 records were supplied to the model for each epoch, with 80% of the records used to train the classifier and the remaining 20% used to test the model. Table 2 gives the pseudocode of the proposed model.

Table 2

Pseudocode of the proposed model
	Algorithm 1: Pseudocode of the Model
	Input: NSL-KDD, UNSW-NB15, CICIDS2017 datasets
	Output: Class label classification
i.	Transformation of data: Use the Standard Scaler to normalize the input dataset.
ii.	Dimension reduction: Pass the transformed data to PCA for dimensionality reduction. Use the FA optimization approach to fine-tune feature engineering.
iii.	Classification: To categorize the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets, feed the extracted features to the NB, RF, DT, and DBN.
iv.	Evaluation: Assess the model's performance using a variety of metrics such as DR, accuracy, and FAR.
v.	Comparison without dimensionality reduction: Compare the experimental findings of the proposed method with the individual NB, RF, and DT algorithms and the DBN model without dimensionality reduction.
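
The steps in Table 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset from make_classification as a stand-in for the real datasets, omits the FA and DBN stages, and computes DR and FAR with the usual confusion-matrix definitions (DR = TP / (TP + FN), FAR = FP / (FP + TN)), which the text does not spell out. The function names evaluate and run_pipeline and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate(y_true, y_pred):
    """Accuracy, detection rate and false alarm rate for labels 0 (normal) / 1 (attack)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "DR": tp / (tp + fn),    # share of attacks that were detected
        "FAR": fp / (fp + tn),   # share of normal traffic flagged as attack
    }


def run_pipeline(X, y, n_components=10, seed=0):
    """Scale -> PCA -> classify, with an 80:20 train/test split (FA step omitted)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    classifiers = {
        "RF": RandomForestClassifier(n_estimators=50, random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
    }
    results = {}
    for name, clf in classifiers.items():
        model = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        model.fit(X_tr, y_tr)
        results[name] = evaluate(y_te, model.predict(X_te))
    return results


# Synthetic stand-in for an intrusion dataset (assumption, not NSL-KDD itself).
X, y = make_classification(n_samples=500, n_features=40, n_informative=10, random_state=0)
results = run_pipeline(X, y)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Wrapping the scaler and PCA in one pipeline per classifier keeps both transformations fitted on the training split only, so no information from the test split leaks into the reduced features.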

3.1. Principal Component Analysis

PCA aims to lower the dimension of a data set composed of numerous highly correlated variables while retaining as much of the data set's variability as possible [28]. The algorithm converts the features in the data into a new set of orthogonal principal components, ordered so that the amount of the original variables' variance they retain decreases along the ordering. As a result, the first principal component keeps the most variation from the original variables. The principal components are the orthogonal eigenvectors of the covariance matrix. Each principal component can be defined as a "linear combination of ideally weighted observed variables". PCA is sensitive to the relative scaling of the variables, so it requires a scaled dataset. The number of principal components PCA generates is less than or equal to the number of original features. The first step in performing PCA is to normalize the data by subtracting the mean from each column of the data set, resulting in data with a zero mean. In the second step, the covariance matrix is computed. The covariance matrix's eigenvalues and eigenvectors are then computed. The eigenvalues are arranged in descending order to rank the significance of the components, and the dimension is reduced by keeping the first group of eigenvalues and disregarding the rest. The eigenvectors that correspond to the retained eigenvalues form the feature vector, a matrix of vectors onto which the data is projected. PCA has a broad range of applications in pattern recognition for high-dimensional data in psychology, data mining, finance, and bioinformatics [29].
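
The PCA steps described above (centering, covariance matrix, eigendecomposition, ordering, projection) can be sketched with NumPy as follows. This is an illustrative sketch under the assumption that the input has already been scaled; the function name pca_reduce and the random example data are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)            # step 1: zero-mean columns
    cov = np.cov(X_centered, rowvar=False)     # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 3: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]          # step 4: sort by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    feature_vector = eigvecs[:, :n_components] # retained eigenvectors as columns
    return X_centered @ feature_vector, eigvals


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy features
Z_demo, eigvals_demo = pca_reduce(X_demo, n_components=2)
print(Z_demo.shape, eigvals_demo / eigvals_demo.sum())        # reduced shape, variance ratios
```

The variance of each projected column equals the corresponding eigenvalue, which is why sorting the eigenvalues in descending order ranks the components by how much variability they retain.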

3.2. Firefly Algorithm

The Firefly algorithm (FA) is a swarm intelligence-driven optimization technique that imitates the social behavior of fireflies. Table 3 shows the pseudocode of the FA. The attraction between fireflies determines the FA's search pattern, with a less bright firefly moving toward a brighter one. The FA is classified as a "nature-inspired" algorithm, driven by the behavior of fireflies [30]. Nature-inspired algorithms are widely used in numerous stages of the ML process [31]. The natural light that fireflies emit from their bodies helps them attract or locate mates [32]. Additionally, it helps them catch prey and protect themselves against predators. Three fundamental assumptions support the algorithm [33]:

The artificial fireflies are unisex, so any firefly can be attracted to any other regardless of sex.

A firefly's attractiveness is proportional to the brightness of its flashes, and it diminishes as the fireflies move apart because the air absorbs light. Because all fireflies produce light, the brightest one attracts the most of its neighbors. In the absence of a brighter firefly, however, the fireflies move randomly in any direction.

The brightness of a firefly's flashing light, which serves as the criterion for attractiveness, is determined by the algorithm's objective function.

Table 3

The Pseudocode of the FA [32]
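Initialize the population randomly and generate M fireflies X_k, where k = 1, 2, ..., M;
Evaluate the fitness F(X_k) of each firefly;
DEs = M;
While DEs ≤ Max_DEs do
    For k = 1 to M do
        For j = 1 to M do
            If F(X_j) < F(X_k) then
                Move X_k toward X_j;
                Evaluate the fitness of the new X_k;
                DEs++
            End If
        End For
    End For
End While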
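
A runnable sketch of the loop in Table 3 is given below for a generic minimization problem. The table does not specify the movement rule, so the sketch assumes the commonly used firefly update, x_k ← x_k + β0·exp(−γ·r²)·(x_j − x_k) + α·(rand − 0.5), where r is the distance between the two fireflies; the parameter values, the search bounds, the decay of α, and the sphere test function are illustrative assumptions and are not taken from this paper.

```python
import numpy as np


def firefly_minimize(f, dim, n_fireflies=20, max_evals=4000, max_sweeps=1000,
                     lower=-5.0, upper=5.0, alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Minimize f over [lower, upper]^dim with a basic firefly algorithm."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_fireflies, dim))  # random initial population
    fitness = np.array([f(x) for x in X])
    evals = n_fireflies                                     # evaluations used so far
    for _ in range(max_sweeps):                             # hard cap guarantees termination
        if evals >= max_evals:                              # budget is checked once per sweep
            break
        for k in range(n_fireflies):
            for j in range(n_fireflies):
                if fitness[j] < fitness[k]:                 # firefly j is brighter than k
                    r2 = np.sum((X[j] - X[k]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)      # attractiveness decays with distance
                    noise = alpha * (rng.random(dim) - 0.5)
                    X[k] = np.clip(X[k] + beta * (X[j] - X[k]) + noise, lower, upper)
                    fitness[k] = f(X[k])
                    evals += 1
        alpha *= 0.97                                       # assumed: slowly reduce randomness
    best = int(np.argmin(fitness))
    return X[best], float(fitness[best])


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))


best_x, best_f = firefly_minimize(sphere, dim=2)
print(best_x, best_f)
```

Only a firefly that has a brighter neighbor ever moves, so the best fitness in the population never gets worse from one sweep to the next. For feature selection, the position vector would have to be mapped to a feature subset (for example by thresholding each coordinate), a step this sketch leaves out.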

3.3. Random Forest

RFs are ensemble learners (EL) that are used to perform regression and classification on intrusion detection data [34]. A random forest is constructed from a large number of decision trees, each of which serves as a model [35]. Instead of considering every feature at each split, features are selected at random from the remaining features, and the best attribute is then determined from the randomly chosen attributes [36]. RF generates a variety of decision trees during the training phase and outputs the class label that receives the majority of votes. RF is highly accurate at classifying data and is capable of dealing with outliers and noise in the data [37]. In this investigation, the RF approach is used because it is less susceptible to overfitting and has shown high classification results in the past [34].

3.4. Decision Tree

Decision tree induction algorithms have been implemented in numerous fields. ID3, C4.5, and C5.0 are examples of decision tree algorithms. C4.5 is a modification of the standard ID3 algorithm [38]. The proposed system employs enhanced C4.5, which is an improvement over C4.5. The ID3 and C4.5 algorithms categorize network traffic patterns using an information-theoretic approach [39]. Construction of the decision tree begins with the pre-classified dataset, in which each case is defined by the values of its attributes. A decision tree comprises nodes, edges, and leaves. A node represents an attribute used to partition the instances. Each node has a set of edges that are labeled with the possible values of the attribute in the parent node. An edge connects two nodes or a node and a leaf. Leaves are marked with class labels to classify the instance. The information gain of each attribute is calculated, and at each stage the optimal attribute for dividing the subset is selected according to this information gain. The examples are classified based on the values of these attributes. If the attribute is nominal, a branch is created for each value; otherwise, a threshold is decided and two branches are established [40]. This operation is performed recursively on each partitioned subset of the cases. When all instances in the current subset belong to the same class, the procedure terminates. Note that information gain favors attributes with a large number of possible values.
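
As an illustration of the attribute-selection step, the sketch below computes the entropy of a label set and the information gain of a nominal attribute, using the standard definitions from ID3. The function names and the toy traffic records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in entropy obtained by splitting the labels on a nominal attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy records: the protocol separates the classes perfectly, the flag does not.
labels = ["attack", "attack", "normal", "normal"]
protocol = ["tcp", "tcp", "udp", "udp"]
flag = ["S0", "SF", "S0", "SF"]
print(information_gain(protocol, labels), information_gain(flag, labels))
```

A tree builder would evaluate every candidate attribute this way at each node and split on the one with the largest gain; C4.5 normalizes the gain by the split information (gain ratio) to counter the bias toward many-valued attributes.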

3.5. Naïve Bayes

NB is a straightforward and highly scalable [38] classification algorithm based on Bayes' theorem [39]. Here, NB is used to estimate the probability that a record belongs to the normal or the attack class. It is simple to use during the training and validation phases. The NB assumption is that all attributes of the feature vector are mutually independent and equally important [40].
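Under the independence assumption, the class score is the prior multiplied by the per-attribute likelihoods. The following is a minimal sketch for nominal attributes with add-one (Laplace) smoothing; the smoothing choice, the function names, and the toy records are assumptions for illustration, not details given in the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab

def predict_nb(model, row):
    """Return the class with the highest log-posterior under the NB assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        log_p = math.log(n_y / total)  # class prior
        for j, v in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            log_p += math.log((value_counts[(j, y)][v] + 1) / (n_y + len(vocab[j])))
        scores[y] = log_p
    return max(scores, key=scores.get)

# Hypothetical toy records: (protocol, flag).
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("icmp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

model = train_nb(rows, labels)
prediction = predict_nb(model, ("tcp", "SF"))
```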

3.6. Deep Belief Network

DL is an advance in ML that derives from the artificial neural network (ANN). DL classifiers are among the century's most important breakthroughs, driving many applications of artificial intelligence (AI) [41]. These algorithms extract features in a way loosely analogous to the human brain and eye. DL algorithms are built from large, layer-by-layer neural networks that can learn complex representations of the data while extracting features. The feature learned by a single neuron is examined and refined by its thousands of sub-neurons, resulting in the complete categorization. The classic ANN can handle nonlinear scenarios, but it lacks the confidence to make decisions as certain as the human brain, whereas deep learning algorithms are designed to grasp the attributes and make decisions as surely as the brain. There are many different types of DL classifiers; in this work, the DBN is used for classification because the nature of the classification process suits the application. A DBN is composed of numerous layers built from restricted Boltzmann machines (RBMs), stacked in a multi-stage manner as shown in Fig. 4 [42].

DBN utilizes 130 hidden layers in a single number to expedite the learning process. The RBM is a type of Markov random field (MRF), also referred to as a log-linear model. To improve accuracy, the energy function of the RBM has its own independent parameters [43]. As a result, each RBM connects with the adjacent RBM to pass on its learned features [44].
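For reference, the energy function of a standard binary RBM is usually written as shown below. The symbols are assumptions introduced here rather than notation from the paper: v and h are the visible and hidden unit vectors, a and b their biases, W the weight matrix, and Z the normalizing constant. The exact parameterization used in [43] may differ.

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i W_{ij} h_j,
\qquad
P(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}
```

Lower energy corresponds to higher probability, and the log-probability is linear in the parameters a, b, and W, which is why the model is called log-linear.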

4.1. Dataset Description

The authors of [45] argued that the dearth of relevant public data for evaluating ML-based anomaly detection systems, and in particular the widespread use of simulated data such as the DARPA and KDD Cup datasets from the 1990s, makes it difficult to evaluate and compare solutions.

With the exponential growth of digital data over the past few years, the lack of standard or publicly accessible datasets in this field is no longer a barrier. The availability of these data allows us to assess the model from a real-world standpoint. NSL-KDD is a prominent dataset that is still used by many researchers; however, in our analysis, it does not reproduce the traffic conditions and security problems that occur in today's WSNs. We therefore conducted a second survey and chose the more recent UNSW-NB15 and CICIDS2017 datasets. They are among the few datasets that include current attacks, and they were chosen as the most complete IDS baseline to verify and test the proposed methods. The NSL-KDD, UNSW-NB15, and CICIDS2017 datasets are all used to test our proposed technique in this research. The last two are the most recent, and they contain benign traffic and up-to-date common attacks that closely reflect a realistic network context.

The UNSW-NB15 data was created in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS), which generated a combination of realistic contemporary normal activities and modern attacks from network traffic [24]. Each record contains 48 characteristics, plus one for classification. The 47 characteristics describe real-world network traffic packets, whilst the labeling feature indicates whether the network access is authorized or unauthorized. These 47 features are further subdivided into five categories: basic features, flow features, time features, content features, and additional generated features. The UNSW-NB15 data has 2,540,044 records. In addition, a suitable split is proposed for training and testing.

The authors of [46], from the Canadian Institute for Cybersecurity (CIC), proposed the CICIDS2017 dataset to overcome the limits of existing data and to provide accurate and credible data for intrusion detection. The CICIDS2017 data includes benign traffic and contemporary common attacks gathered over 5 days, from Monday, July 3, 2017, to Friday, July 7, 2017. The CICFlowMeter utility extracts 80 network flow characteristics from the captured network data to describe each record. The CICIDS2017 dataset consists of 8 files with a total of 2,830,743 entries.

4.2. NSLKDD Dataset

The authors of [22] presented the NSL-KDD dataset, a revised form of the KDD'99 dataset that removes redundant records and restructures the data, making it more realistic in terms of both data quantity and format. The NSL-KDD data consists of TCP connection records with 41 features and one labeling attribute. The 41 informative attributes describe the specifics of each TCP connection in the data; the labeling attribute categorizes each connection as normal or anomalous.

4.3. UNSW-NB15 Dataset

Table 4 displays the 42 features (inputs) extracted from the UNSW-NB15 dataset [47] for this study. Three of the inputs are nominal attributes, while the remaining 39 are binary, integer, and float numeric attributes. UNSW-NB15 includes a training set and a testing set [48]. In this study, we divided the training set into two subsets: UNSW-NB15-75, which represents 75% of the training set, and UNSW-NB15-25, which represents the remaining 25%. Following training on UNSW-NB15-75, UNSW-NB15-25 is used as a validation set. This method prevents the model from learning from the validation or test sets. During the model's initialization phase, the technique also ensures that the findings obtained on the test data and validation data are free of interference and bias.

Table 4

UNSW-NB15 Attributes list
Number	Name	Category	Number	Name	Category
A1	Dur	Float	A22	Dtcp	Integer
A2	Protoc	Categorical	A23	Dwin	Integer
A3	Servce	Categorical	A24	Tcprt	Float
A4	Stat	Integer	A25	Synac	Float
A5	Spkt	Integer	A26	Ackda	Float
A6	Dpkt	Integer	A27	Smea	Integer
A7	Sbyte	Integer	A28	Dmea	Integer
A8	Dbyte	Integer	A29	Trans_dept	Integer
A9	Rat	Float	A30	Response_body_le	Integer
A10	Sttl	Integer	A31	Ct_srv_sr	Integer
A11	Dttl	Integer	A32	Ct_state_tt	Integer
A12	Sloa	Float	A33	Ct_dst_sport_lt	Integer
A13	Dloa	Integer	A34	Ct_src_dpor_ltm	Integer
A14	Slos	Integer	A35	Ct_dst_spor_ltm	Integer
A15	Dlos	Integer	A36	Ct_dst_src_lt	Integer
A16	Sinpk	Float	A37	Is_ftp_login	Binary
A17	Dinpk	Float	A38	Ft_st_cmd	Integer
A18	Sji	Float	A39	Ct_flw_http_mth	Integer
A19	Dji	Float	A40	Ct_src_lt	Integer
A20	Swin	Integer	A41	Ct_srv_ds	Integer
A21	Stcp	Integer	A42	Is_sm_ip_port	Binary
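The 75/25 division of the UNSW-NB15 training set described in this section can be sketched as a shuffled split. The function name, the fixed seed, and the placeholder records are assumptions for illustration; the paper does not state how the partition was drawn.

```python
import random

def split_train_validation(records, train_fraction=0.75, seed=42):
    """Shuffle record indices and split them into training and validation parts."""
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(records)))
    train = [records[i] for i in indices[:cut]]
    validation = [records[i] for i in indices[cut:]]
    return train, validation

# Hypothetical stand-in for the UNSW-NB15 training set (1000 placeholder records).
records = [{"id": i} for i in range(1000)]
unsw_nb15_75, unsw_nb15_25 = split_train_validation(records)
```

Because the two parts are disjoint, nothing in the validation part is seen during training.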

4.4. CICIDS Dataset

The CICIDS 2017 dataset [46] was created in 2017 by the Faculty of Computer Science at the University of New Brunswick. CICIDS 2017 is a refinement of the ISCX 2012 data [49], building on previous work by Shiravi Ali [50]. The CICIDS 2017 data is derived from realistic traffic generation. The study [46] describes the features of the IDS dataset and the methodology used to develop it. CICIDS 2017 collected data over five days, utilizing 225,745 packages with more than 80 attributes and capturing above seven (7) days of network connections (both normal and attack). The attack simulations in the CICIDS 2017 dataset are classified into seven categories: Heartbleed, brute force, DDoS, botnet, DoS, infiltration, and web attacks.

4.5. Experimental Analysis

We pre-normalized the dataset to the range [0, 1] to remove the adverse effect of differing feature units and to prevent features with large value ranges from dominating those with small ranges. Our proposed intrusion detection model was evaluated using 10-fold cross-validation (CV), a standard method for training and testing, as recommended by the authors of [45]. The original data is randomly partitioned into ten equal-sized, mutually exclusive subsets. In each run, nine (9) subsets are used to train the intrusion detection model, while the remaining one is used to test it. The process is repeated 10 times, so each subset is used once for testing and nine times for training. Finally, the proposed model's performance is calculated by averaging the outcomes over the test subsets. Both the CICIDS2017 and UNSW-NB15 datasets are large and severely class-imbalanced, resulting in higher loading and processing overhead and a bias towards the majority class [51]. To circumvent these limitations, we sampled a portion of the attack-class data from these two datasets, as Chiba et al. [52] did. Table 5 provides more information.

Table 5

The sample data size of the attack class
Datasets	Original size	Extracted size
CICIDS2017	557,646	133,045
UNSW-NB15	119,341	62,000
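The normalization to [0, 1] and the 10-fold partition described in this section can be sketched in plain Python as follows. Min-max scaling is assumed as the normalization method, and the helper names, seed, and sample values are illustrative; none of these details are specified in the paper.

```python
import random

def min_max_scale(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: map everything to 0
    return [(x - lo) / (hi - lo) for x in column]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

scaled = min_max_scale([2.0, 4.0, 10.0])
splits = list(k_fold_indices(100, k=10))
```

Each of the ten splits holds out a different tenth of the samples for testing and trains on the other nine tenths; the reported metric is then the mean over the ten test folds.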

To assess the proposed method's intrusion detection performance, 10-fold CV was used: the classifier was run ten times, and the final results were averaged. Because minimizing detection errors, particularly false positives, is a high priority, we use accuracy, DR, and FAR to assess the performance of the proposed model and to compare it with other detection approaches for intrusion detection systems. These indicators do not depend on the sample size, which is extremely useful when evaluating the effectiveness of an intrusion detection system [53].

These indicators can be determined using Eqs. 1 to 3:
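The conventional confusion-matrix definitions of the three indicators are given below as an assumption about the form of these equations. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with the attack class taken as positive; both these symbols and the pairing of equation numbers with indicators are assumed here.

```latex
\begin{align}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\mathrm{DR}  &= \frac{TP}{TP + FN} \tag{2}\\
\mathrm{FAR} &= \frac{FP}{FP + TN} \tag{3}
\end{align}
```

Under these definitions, a high DR means few missed attacks, while a low FAR means few normal records flagged as attacks.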

4.6. Detection Performance Analysis on the NSLKDD dataset

We began by comparing the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN (without hybrid PCA + FA feature dimensionality reduction) on the NSLKDD dataset. PCA + FA-RF gave a DR of 99.19%, an accuracy of 99.23%, and a FAR of 1.02. PCA + FA-DT gave a DR of 98.63%, an accuracy of 98.62%, and a FAR of 2.41. PCA + FA-NB gave a DR of 89.95%, an accuracy of 85.85%, and a FAR of 2.95. PCA + FA-DBN gave a DR of 99.52%, an accuracy of 99.46%, and a FAR of 0.40. The individual model RF without feature dimensionality reduction (PCA + FA) gave a DR of 98.98%, an accuracy of 99.04%, and a FAR of 1.40. The individual model DT gave a DR of 98.37%, an accuracy of 98.34%, and a FAR of 2.60. The individual model NB gave a DR of 89.59%, an accuracy of 85.29%, and a FAR of 3.10. The individual model DBN gave a DR of 99.41%, an accuracy of 99.10%, and a FAR of 0.60. From the performance evaluation in Table 6, we conclude that each proposed hybrid model improves on its individual counterpart on the NSLKDD dataset, with PCA + FA-DBN giving the best overall results.

Table 6

Results with and without hybrid (PCA + FA) feature dimensionality reduction on the NSLKDD dataset
Metric	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
DR	99.19	98.63	89.95	99.52	98.98	98.37	89.59	99.41
Accuracy	99.23	98.62	85.85	99.46	99.04	98.34	85.29	99.10
FAR	1.02	2.41	2.95	0.40	1.40	2.60	3.10	0.60

On the NSLKDD dataset, Fig. 5 illustrates the 10-fold cross-validation performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, PCA + FA-DBN models, and individual RF, DT, NB, and DBN (without hybrid features dimensionality PCA + FA) in terms of DR, FAR, and accuracy. As can be seen in Fig. 5, the proposed models outperformed the individual models in DR, accuracy, and FAR.

The comparison results in Fig. 5 demonstrate that our proposed method outperforms individual-RF, individual-DT, individual-NB, and individual-DBN in DR, accuracy, and FAR, suggesting that the presented hybrid PCA + FA feature dimensionality reduction can improve detection ability.

As shown in Fig. 6, the random forest ROC curves yield an AUC of 1 for the normal class (class 0), 1 for attack class 1, 1 for attack class 2, 0.75 for attack class 3, and 1 for attack class 4. These values describe the AUC of the RF model for the classes in the NSLKDD dataset.

As shown in Fig. 7, the NB ROC curves yield an AUC of 0.95 for the normal class (class 0), 0.96 for attack class 1, 0.97 for attack class 2, 0.79 for attack class 3, and 0.95 for attack class 4. These values describe the AUC of the NB model for all the classes in the NSLKDD data.

As seen in Fig. 8, the DT ROC curves yield an AUC of 0.99 for the normal class (class 0), 1 for attack class 1, 0.98 for attack class 2, 0.50 for attack class 3, and 0.84 for attack class 4. These values describe the AUC of the DT model for all the classes in the NSLKDD data.
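Per-class AUC values such as those reported for Figs. 6 to 8 are typically obtained one class against the rest. A minimal sketch of that computation follows; it uses the pairwise-ranking definition of AUC, and the scores and labels are invented for illustration rather than taken from the experiments.

```python
def auc_one_vs_rest(scores, labels, positive_class):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win. Runs in O(P * N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive_class]
    neg = [s for s, y in zip(scores, labels) if y != positive_class]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for "class 3" on six records.
labels = [0, 0, 1, 1, 3, 3]
scores_class3 = [0.1, 0.6, 0.2, 0.3, 0.5, 0.9]
auc3 = auc_one_vs_rest(scores_class3, labels, positive_class=3)
```

An AUC of 1 means the class is perfectly separated from the rest, whereas an AUC of 0.50 is what random scoring would produce.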

4.7. Detection Performance Analysis on the UNSW-NB15 dataset

In this section, we compare the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN (without hybrid PCA + FA feature dimensionality reduction) on the UNSW-NB15 dataset. As seen in Table 7, PCA + FA-DBN outperformed all other proposed models in terms of DR, accuracy, and FAR, while the individual NB gave the lowest DR (70.11%) and accuracy (70.80%), with a FAR of 4.89.

Table 7

Results with and without hybrid (PCA + FA) feature dimensionality reduction on the UNSW-NB15 dataset
Metric	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
DR	99.98	98.79	71.21	100	98.19	97.80	70.11	98.90
Accuracy	99.99	99.00	71.37	100	98.98	98.00	70.80	99.40
FAR	1.51	2.51	2.64	0.30	3.80	4.71	4.89	4.90

The comparison results in Fig. 9 demonstrate that our proposed method outperforms individual-RF, individual-DT, individual-NB, and individual-DBN in DR, accuracy, and FAR, indicating that the proposed hybrid PCA + FA feature dimensionality reduction can significantly improve detection capability on the UNSW-NB15 data. Additionally, the proposed PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN all have a FAR below 3%, whereas the individual models RF, DT, NB, and DBN all have a FAR above 3%. Interestingly, the individual model DBN achieved a high accuracy together with a high FAR, showing that it is skewed and incapable of detecting intrusions.

According to Fig. 10, the RF ROC curves yield an AUC of 1 for the normal class (class 0) and 1 for the attack class (class 1), which indicates that the score distributions of the two classes do not overlap.

As seen in Fig. 11, the NB ROC curves yield an AUC of 0.82 for the normal class (class 0) and 0.82 for the attack class (class 1).

According to Fig. 12, the DT ROC curves yield an AUC of 1 for the normal class (class 0) and 1 for the attack class (class 1), which indicates that the score distributions of the two classes do not overlap.

The SHAP value of the DT model is given in Fig. 13, where the class 0 normal class is 0.5, and the attack class 1 is 0.5. Red pixels reflect positive SHAP values that enhance the class's probability, whereas blue pixels indicate negative SHAP values that decrease the class's probability. Each of the attributes as seen in Fig. 13 belonging to the attack class (starting from worms) contributes to the DT model output.

4.8. Detection Performance Analysis on the CICIDS2017 dataset

In this section, we compare the detection performance of PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN with that of the individual RF, DT, NB, and DBN (without hybrid PCA + FA feature dimensionality reduction) on the CICIDS data. PCA + FA-RF gave a DR of 99.58%, an accuracy of 98.95%, and a FAR of 2.80. PCA + FA-DT gave a DR of 99.42%, an accuracy of 98.89%, and a FAR of 2.99. PCA + FA-NB gave a DR of 99.36%, an accuracy of 98.81%, and a FAR of 2.98. PCA + FA-DBN gave a DR of 99.99%, an accuracy of 99.98%, and a FAR of 1.51. The individual model RF without feature dimensionality reduction (PCA + FA) gave a DR of 98.10%, an accuracy of 97.20%, and a FAR of 6.90. The individual model DT gave a DR of 98.90%, an accuracy of 98.00%, and a FAR of 7.08. The individual model NB gave a DR of 98.74%, an accuracy of 97.89%, and a FAR of 5.10. The individual model DBN gave a DR of 99.10%, an accuracy of 99.50%, and a FAR of 6.98. According to the performance evaluations in Table 8, the proposed models (with hybrid feature dimensionality reduction PCA + FA) produced better results on the CICIDS dataset than the individual models (without hybrid feature dimensionality reduction PCA + FA).

Table 8

Results with and without hybrid (PCA + FA) feature dimensionality reduction on the CICIDS dataset
Metric	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
DR	99.58	99.42	99.36	99.99	98.10	98.90	98.74	99.10
Accuracy	98.95	98.89	98.81	99.98	97.20	98.00	97.89	99.50
FAR	2.80	2.99	2.98	1.51	6.90	7.08	5.10	6.98

More specifically, as shown in Fig. 14, with respect to DR, accuracy, and FAR, our proposed models (with feature dimensionality reduction PCA + FA) performed better than the individual RF, DT, NB, and DBN models. Moreover, in terms of FAR, the proposed PCA + FA-RF, PCA + FA-DT, PCA + FA-NB, and PCA + FA-DBN are all below 3%, while the individual models RF, DT, NB, and DBN are all over 5%. Most notably, the individual model DBN produced both a high FAR and a high DR, indicating that it is biased and incapable of detecting intrusions.

4.9. The required Training time of the proposed models on NSLKDD data

The training time (TT) needed by our proposed models on the NSLKDD dataset is given in Table 9 to further highlight the benefits of our proposed techniques.

Table 9

Required training time to build the model on NSLKDD data
Algorithms	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
Training Time	1.1056	1.2453	1.1190	1.1200	3.2930	2.1520	2.0200	3.1100

Figure 15 shows that our proposed models train faster than the single individual models (RF, DT, NB, DBN). On the NSLKDD dataset, single-RF without hybrid feature dimensionality reduction (PCA + FA) requires approximately 2.19 more training time, in the units of Table 9, than PCA + FA-RF (3.2930 versus 1.1056, about 2.98 times as long), and single-DBN requires approximately 1.99 more than PCA + FA-DBN (3.1100 versus 1.1200, about 2.78 times as long).
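The training times in Table 9 can be compared both as an absolute difference and as a ratio. The short sketch below computes both from the table values; the dictionary and function names are illustrative, and the time unit is whatever unit Table 9 uses.

```python
# Training times copied from Table 9 (NSLKDD data).
hybrid = {"RF": 1.1056, "DT": 1.2453, "NB": 1.1190, "DBN": 1.1200}
single = {"RF": 3.2930, "DT": 2.1520, "NB": 2.0200, "DBN": 3.1100}

def compare(single_time, hybrid_time):
    """Return (extra time needed by the single model, ratio single / hybrid)."""
    return single_time - hybrid_time, single_time / hybrid_time

comparison = {name: compare(single[name], hybrid[name]) for name in hybrid}

for name, (difference, ratio) in comparison.items():
    print(f"{name}: +{difference:.4f} ({ratio:.2f}x as long)")
```

For RF, for example, the single model needs about 2.19 more than the hybrid one, which corresponds to roughly 2.98 times the training time.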

4.10. The required Training time for the proposed models on UNSW-NB15 data

To validate the value of our proposed techniques, the TT required by our suggested models on the UNSW-NB15 dataset is presented in Table 10.

Table 10

Training time required to build the model on UNSW-NB15 data
Algorithms	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
Training Time	1.567	1.188	1.427	1.715	2.676	2.297	2.516	2.804

Figure 16 indicates that our proposed models train faster than the single individual models (RF, DT, NB, DBN). On the UNSW-NB15 dataset, single-DT without hybrid feature dimensionality reduction (PCA + FA) requires approximately 1.11 more training time, in the units of Table 10, than PCA + FA-DT (2.297 versus 1.188, about 1.93 times as long). Single-NB without hybrid feature dimensionality reduction (PCA + FA) requires approximately 1.09 more than PCA + FA-NB (2.516 versus 1.427, about 1.76 times as long).

4.11. The required Training time of the proposed models on CICIDS 2017 data

Table 11 shows the training time required by our model predictions on the CICIDS-2017 dataset to highlight the effectiveness of our proposed strategies.

Table 11

Training time required to build the model on CICIDS-2017 data
Algorithms	PCA + FA-RF	PCA + FA-DT	PCA + FA-NB	PCA + FA-DBN	RF	DT	NB	DBN
Training Time	1.4780	1.1761	1.2380	1.2134	3.4560	2.3780	3.5160	3.7045

According to Fig. 17, our proposed models train faster than the single individual models (RF, DT, NB, DBN). On the CICIDS dataset, single-RF without hybrid feature dimensionality reduction (PCA + FA) requires approximately 1.98 more training time, in the units of Table 11, than PCA + FA-RF (3.4560 versus 1.4780, about 2.34 times as long). Single-DT without hybrid feature dimensionality reduction (PCA + FA) requires approximately 1.20 more than PCA + FA-DT (2.3780 versus 1.1761, about 2.02 times as long).

4.12. Comparison with the Previous studies

As a further evaluation of our proposed intrusion detection architecture, we compare the proposed hybrid models with other recent models that use the NSLKDD, UNSW-NB15, and CICIDS datasets. Tables 12, 13, and 14 summarize the comparison results for the three datasets, respectively. In these tables, Null indicates that no feature dimensionality reduction technique was considered.

Table 12

On the NSLKDD dataset, the performance of various intrusion detection algorithms is compared.
Authors/Year	Algorithms	Feature Dimensionality Reduction	Dataset utilized	Accuracy	DR	FAR
[15]	FFDNN	Filter technique	NSL-KDD	81.19%	x	x
[25]	DFFN	Wrapper	NSL-KDD	98.60%	x	x
[18]	DCNN	Null	NSL-KDD	85.00%	x	x
[23]	APCA + I + ELM	Wrapper	NSL-KDD	81.22%	x	x
Proposed hybrid RF method	PCA + FA-RF	Filter + Wrapper	NSLKDD	99.23	99.19	1.02
(Null: No feature dimensionality reduction strategy considered)

Table 13

On the UNSW-NB15 dataset, the performance of various intrusion detection algorithms is compared
Authors/Year	Algorithms	Feature Selection Technique	Dataset utilized	Accuracy	DR	FAR
[16]	DNN	Null	UNSW-NB15	78.50%	x	x
[25]	DEA	Wrapper	UNSW-NB15	92.40%	x	x
[24]	DT, ANN, NB	Filter	UNSW-NB15	81.34%	x	x
[21]	DT + GA + LR	Wrapper	UNSW-NB15	81.42%	x	x
[23]	APCA + I + ELM	Wrapper	UNSW-NB15	70.51%	x	x
Proposed hybrid DT	PCA + FA-DT	Filter + Wrapper	UNSW-NB15	99.00	98.79	2.51
(Null: No feature dimensionality reduction method considered)

Table 14

The performance of various intrusion detection techniques is compared using the CICIDS 2017 dataset
Authors/Year	Algorithms	Feature Selection Technique	Dataset utilized	Accuracy	DR	FAR
[20]	LSTM, RNN	Null	CICIDS2017	84.83%	x	x
[54]	DBN	Wrapper	NSLKDD, CICIDS2017	98.24	99.0	2.10
[55]	AE-QDA	Filter	CICIDS2017	94.20	96.40	6.30
[56]	DNN	wrapper	CICIDS2017	92.92	92.38	3.24
[57]	DT-EnSVM	filter	CICIDS2017	98.46	99.15	4.00
[58]	BRS	Filter	CICIDS2017	97.96	96.38	1.42
Proposed Hybrid DBN	PCA + FA-DBN	Filter + Wrapper	CICIDS2017	99.98	99.99	1.51
(Null: No feature dimensionality reduction method considered)

Some recent studies were chosen to help illustrate the benefits of our proposed intrusion detection methodology. The results in Tables 12–14 show that, compared with FFDNN in [15], DT + GA + LR in [21], and LSTM, RNN in [20], our proposed technique obtains improved overall performance, particularly in DR, accuracy, and FAR. Our proposed models also outperform the other detection systems, including DL detection methods, on the three performance measures. Intrusion detection is frequently confronted with massive amounts of data, so feature dimensionality reduction is of high significance, as our proposed models show. Consequently, small changes in the evaluation metrics can have a substantial impact in practice, given the significant number of attacks/intrusions that are either recognized or allowed through. However, it should be noted that Tables 12–14 provide a comparison between our proposed intrusion models and other current IDS methods. Nonetheless, based on the comparison results provided above, our proposed solution remains competitive and may inspire future studies on intrusion detection in the WSN arena.

4.13. Threats to validity

This section discusses possible issues with the validity of the verification results obtained during this investigation.

4.14. Internal Validity

Internal validity is the extent to which published findings reflect actual reality in the population under study and are not due to methodological flaws. There are two crucial factors to consider in this case.

4.14.1 Instrumentation: This term refers to inconsistencies resulting from changes in the instrument's calibration, as well as variances in the scorers, observers, or most likely the device itself. Accuracy, detection rate, and false alarm rate are all well-known validation metrics. There have been no changes that could have influenced the outcome of the evaluation.

4.14.2 Selection: A selection threat is any element other than the system that contributes to posttest discrepancies. As a result, the absence of feature scaling and data that are not on the same scale could play a role in this work.

4.15. Construct Validity

Construct validity is the extent to which the measuring instrument agrees with the underlying conceptual assumptions and the scores appropriately represent the complexity of the construct. This risk arises from the question of whether the experiment accurately replicates the investigated real-world phenomena. The proposed model is evaluated consistently using the accuracy and DR evaluation criteria.

4.16. External Validity

This pertains to our ability to apply study findings to practical issues. This risk raises the question, "Can this effect be extended across a range of contexts, populations, treatments, and measurement characteristics?"

The suggested hybrid feature dimensionality models for WSN threat identification were implemented and validated on the NSLKDD, UNSW-NB15, and CICIDS datasets. The findings corroborate what has been discovered in the literature. Validation will be conducted in the future in an industry context or on a recent WSN dataset.

In this paper, an enhanced hybrid feature dimensionality reduction IDS is presented to detect anomalies in WSNs. The empirical evidence on the NSLKDD, UNSW-NB15, and CICIDS datasets demonstrates that the proposed intrusion detection approach obtains robust results with high DR, high accuracy, a low false alarm (warning) rate, and fast training, demonstrating the effectiveness of hybrid PCA + FA feature dimensionality reduction for intrusion detection. On the NSLKDD dataset, PCA + FA-RF achieved an accuracy of 99.23%, PCA + FA-DT 98.62%, PCA + FA-NB 85.85%, and PCA + FA-DBN 99.46%, outperforming the individual base models RF, DT, NB, and DBN (without feature dimensionality reduction). On the UNSW-NB15 dataset, PCA + FA-RF achieved a DR of 99.98%, PCA + FA-DT 98.79%, PCA + FA-NB 71.21%, and PCA + FA-DBN 100%, again superior to the individual base models RF, DT, NB, and DBN (without feature dimensionality reduction). On the CICIDS dataset, PCA + FA-RF gave a FAR of 2.80, PCA + FA-DT 2.99, PCA + FA-NB 2.98, and PCA + FA-DBN 1.51, lower than the false alarm rates of the individual base models RF, DT, NB, and DBN (without feature dimensionality reduction). Furthermore, when compared with other current intrusion detection techniques, our proposed methods have an advantage in terms of DR, training time, accuracy, and false alarm rate, which makes them very competitive.
Whereas this paper focuses exclusively on binary intrusion detection, future research will address how to adapt our findings to scenarios involving a variety of attack types. In addition, we will focus more on the generalization ability of the IDS and on methods for dealing with extreme minority classes.

Compliance with Ethical Standards

Not Applicable

Competing Interests

Not Applicable

Research Data Policy and Data Availability Statements

Not Applicable

N. Mohd, A. Singh, and H. S. Bhadauria, “A Novel SVM Based IDS for Distributed Denial of Sleep Strike in Wireless Sensor Networks,” Wirel. Pers. Commun., vol. 111, no. 3, pp. 1999–2022, 2020, doi: 10.1007/s11277-019-06969-9.
K. R. C. Boni, L. Xu, Z. Chen, and T. D. Baddoo, “A security concept based on scaler distribution of a novel intrusion detection device for wireless sensor networks in a smart environment,” Sensors (Switzerland), vol. 20, no. 17, pp. 1–20, 2020, doi: 10.3390/s20174717.
S. Ramesh, C. Yaashuwanth, K. Prathibanandhi, A. R. Basha, and T. Jayasankar, “An optimized deep neural network based DoS attack detection in wireless video sensor network,” J. Ambient Intell. Humaniz. Comput., 2021, doi: 10.1007/s12652-020-02763-9.
M. Sadeghizadeh and O. R. Marouzi, “A Lightweight Intrusion Detection System Based on Specifications to Improve Security in Wireless Sensor Networks,” J. Commun. Eng., vol. 7, no. 2, pp. 29–60, 2018.
S. M. Kasongo and Y. Sun, “A deep learning method with wrapper based feature extraction for wireless intrusion detection system,” Comput. Secur., vol. 92, 2020, doi: 10.1016/j.cose.2020.101752.
M. Alqahtani, A. Gumaei, H. Mathkour, and M. M. Ben Ismail, “A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks,” Sensors (Switzerland), vol. 19, no. 20, 2019, doi: 10.3390/s19204383.
R. Verma and S. Bharti, “A Survey of Network Attacks in Wireless Sensor Networks,” Commun. Comput. Inf. Sci., vol. 1170, no. 5, pp. 50–63, 2020, doi: 10.1007/978-981-15-9671-1_4.
A. K. Das, P. Sharma, S. Chatterjee, and J. K. Sing, “A dynamic password-based user authentication scheme for hierarchical wireless sensor networks,” J. Netw. Comput. Appl., vol. 35, no. 5, pp. 1646–1656, 2012, doi: 10.1016/j.jnca.2012.03.011.
S. Pundir, M. Wazid, D. P. Singh, A. K. Das, J. J. P. C. Rodrigues, and Y. Park, “Intrusion Detection Protocols in Wireless Sensor Networks Integrated to Internet of Things Deployment: Survey and Future Challenges,” IEEE Access, vol. 8, pp. 3343–3363, 2020, doi: 10.1109/ACCESS.2019.2962829.
F. Zhang, H. A. D. E. Kodituwakku, J. W. Hines, and J. Coble, “Multilayer Data-Driven Cyber-Attack Detection System for Industrial Control Systems Based on Network, System, and Process Data,” IEEE Trans. Ind. Informatics, vol. 15, no. 7, pp. 4362–4369, 2019, doi: 10.1109/TII.2019.2891261.
S. Messaoud, A. Bradai, S. H. R. Bukhari, P. T. A. Quang, O. Ben Ahmed, and M. Atri, “A survey on machine learning in Internet of Things: Algorithms, strategies, and applications,” Internet of Things (Netherlands), vol. 12, p. 100314, 2020, doi: 10.1016/j.iot.2020.100314.
Y. K. Saheed and F. E. Hamza-Usman, “Feature Selection with IG-R for Improving Performance of Intrusion Detection System,” Int. J. Commun. Networks Inf. Secur, vol. 12, no. 3, pp. 338–344, 2020.
Y. Kayode Saheed, A. Idris Abiodun, S. Misra, M. Kristiansen Holone, and R. Colomo-Palacios, “A machine learning-based intrusion detection for detecting internet of things network attacks,” Alexandria Eng. J., vol. 61, no. 12, pp. 9395–9409, 2022, doi: 10.1016/j.aej.2022.02.063.
R. Zhang and X. Xiao, “Intrusion detection in wireless sensor networks with an improved NSA based on space division,” J. Sensors, vol. 2019, no. 1, 2019, doi: 10.1155/2019/5451263.
S. M. Kasongo and Y. Sun, “A deep learning method with filter based feature engineering for wireless intrusion detection system,” IEEE Access, vol. 7, pp. 38597–38607, 2019, doi: 10.1109/ACCESS.2019.2905633.
R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, “Deep Learning Approach for Intelligent Intrusion Detection System,” IEEE Access, vol. 7, pp. 41525–41550, 2019, doi: 10.1109/ACCESS.2019.2895334.
Y. Chang, W. Li, and Z. Yang, “Network intrusion detection based on random forest and support vector machine,” Proc. – 2017 IEEE Int. Conf. Comput. Sci. Eng. IEEE/IFIP Int. Conf. Embed. Ubiquitous Comput. CSE EUC 2017, vol. 1, pp. 635–638, 2017, doi: 10.1109/CSE-EUC.2017.118.
S. Naseer et al., “Enhanced network anomaly detection based on deep neural networks,” IEEE Access, vol. 6, no. 8, pp. 48231–48246, 2018, doi: 10.1109/ACCESS.2018.2863036.
S. A. Althubiti, E. M. Jones, and K. Roy, “LSTM for Anomaly-Based Network Intrusion Detection,” 2018 28th Int. Telecommun. Networks Appl. Conf. ITNAC 2018, pp. 1–3, 2019, doi: 10.1109/ATNAC.2018.8615300.
Y. Su, “Research on network behavior anomaly analysis based on bidirectional LSTM,” Proc. 2019 IEEE 3rd Inf. Technol. Networking, Electron. Autom. Control Conf. ITNEC 2019, pp. 798–802, 2019, doi: 10.1109/ITNEC.2019.8729475.
C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,” Comput. Secur., vol. 70, pp. 255–277, 2017, doi: 10.1016/j.cose.2017.06.005.
M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” IEEE Symp. Comput. Intell. Secur. Def. Appl. CISDA 2009, 2009, doi: 10.1109/CISDA.2009.5356528.
J. Gao, S. Chai, B. Zhang, and Y. Xia, “Research on network intrusion detection based on incremental extreme learning machine and adaptive principal component analysis,” Energies, vol. 12, no. 7, 2019, doi: 10.3390/en12071223.
N. Moustafa and J. Slay, “The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Inf. Secur. J., vol. 25, no. 1–3, pp. 18–31, 2016, doi: 10.1080/19393555.2015.1125974.
M. AL-Hawawreh, N. Moustafa, and E. Sitnikova, “Identification of malicious activities in industrial internet of things based on deep learning models,” J. Inf. Secur. Appl., vol. 41, pp. 1–11, 2018, doi: 10.1016/j.jisa.2018.05.002.
J. Ran, Y. Ji, and B. Tang, “A semi-supervised learning approach to IEEE 802.11 network anomaly detection,” IEEE Veh. Technol. Conf., vol. 2019-April, pp. 1–5, 2019, doi: 10.1109/VTCSpring.2019.8746576.
F. D. Vaca and Q. Niyaz, “An ensemble learning based Wi-Fi network intrusion detection system (WNIDS),” NCA 2018 - 2018 IEEE 17th Int. Symp. Netw. Comput. Appl., pp. 1–5, 2018, doi: 10.1109/NCA.2018.8548315.
H. Mohsen, E.-S. A. El-Dahshan, E.-S. M. El-Horbaty, and A.-B. M. Salem, “Classification using deep learning neural networks for brain tumors,” Futur. Comput. Informatics J., vol. 3, no. 1, pp. 68–71, 2018, doi: 10.1016/j.fcij.2017.12.001.
E. Zisselman, A. Adler, and M. Elad, Compressed Learning for Image Classification: A Deep Neural Network Approach, 1st ed., vol. 19. Elsevier B.V., 2018.
Y. K. Saheed, “A Binary Firefly Algorithm Based Feature Selection Method on High Dimensional Intrusion Detection Data,” in Illumination of Artificial Intelligence in Cybersecurity and Forensics. Lecture Notes on Data Engineering and Communications Technologies, S. Misra and C. Arumugam, Eds. Springer, Cham, 2022.
G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, D. S. Rajput, R. Kaluri, and G. Srivastava, “Hybrid genetic algorithm and a fuzzy logic classifier for heart disease diagnosis,” Evol. Intell., vol. 13, no. 2, pp. 185–196, 2020, doi: 10.1007/s12065-019-00327-1.
H. Wang et al., “Firefly algorithm with neighborhood attraction,” Inf. Sci. (Ny)., vol. 382–383, pp. 374–387, 2017, doi: 10.1016/j.ins.2016.12.024.
D. Sánchez, P. Melin, and O. Castillo, “Optimization of modular granular neural networks using a firefly algorithm for human recognition,” Eng. Appl. Artif. Intell., vol. 64, pp. 172–186, 2017, doi: 10.1016/j.engappai.2017.06.007.
I. Ahmad, M. Basheri, M. J. Iqbal, and A. Rahim, “Performance Comparison of Support Vector Machine, Random Forest, and Extreme Learning Machine for Intrusion Detection,” IEEE Access, vol. 6, pp. 33789–33795, 2018, doi: 10.1109/ACCESS.2018.2841987.
X. K. Li, W. Chen, Q. Zhang, and L. Wu, “Building Auto-Encoder Intrusion Detection System based on random forest feature selection,” Comput. Secur., vol. 95, p. 101851, 2020, doi: 10.1016/j.cose.2020.101851.
A. Verma and V. Ranga, “Machine Learning Based Intrusion Detection Systems for IoT Applications,” Wirel. Pers. Commun., vol. 111, no. 4, pp. 2287–2310, 2020, doi: 10.1007/s11277-019-06986-8.
Y. K. Saheed and M. A. Hambali, “Customer Churn Prediction in Telecom Sector with Machine Learning and Information Gain Filter Feature Selection Algorithms,” in 2021 International Conference on Data Analytics for Business and Industry (ICDABI), 2021, pp. 208–213, doi: 10.1109/ICDABI53623.2021.9655792.
M. O. Mughal and S. Kim, “Signal Classification and Jamming Detection in Wide-band Radios Using Naïve Bayes Classifier,” IEEE Commun. Lett., vol. 14, no. 8, pp. 8–11, 2018, doi: 10.1109/LCOMM.2018.2830769.
S. M. Kasongo and Y. Sun, “A Deep Learning Method with Filter Based Feature Engineering for Wireless Intrusion Detection System,” IEEE Access, vol. 7, pp. 38597–38607, 2019, doi: 10.1109/ACCESS.2019.2905633.
L. Li et al., “A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset,” Genomics, vol. 85, no. 1, pp. 16–23, 2005, doi: 10.1016/j.ygeno.2004.09.007.
N. Balakrishnan, A. Rajendran, D. Pelusi, and V. Ponnusamy, “Deep Belief Network enhanced intrusion detection system to prevent security breach in the Internet of Things,” Internet of Things (Netherlands), vol. 14, p. 100112, 2021, doi: 10.1016/j.iot.2019.100112.
R. Arunkumar and P. Karthigaikumar, “Multi-retinal disease classification by reduced deep learning features,” Neural Comput. Appl., vol. 28, no. 2, pp. 329–334, 2017, doi: 10.1007/s00521-015-2059-9.
S. Otoum, B. Kantarci, and H. T. Mouftah, “On the Feasibility of Deep Learning in Sensor Network Intrusion Detection,” IEEE Netw. Lett., vol. 1, no. 2, pp. 68–71, 2019, doi: 10.1109/lnet.2019.2901792.
H.-J. Nam et al., “Security and Privacy Issues of Fog Computing,” J. Korean Inst. Commun. Inf. Sci., vol. 42, no. 1, pp. 257–267, 2017, doi: 10.7840/kics.2017.42.1.257.
R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” Proc. - IEEE Symp. Secur. Priv., pp. 305–316, 2010, doi: 10.1109/SP.2010.25.
I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” ICISSP 2018 - Proc. 4th Int. Conf. Inf. Syst. Secur. Priv., vol. 2018-Janua, pp. 108–116, 2018, doi: 10.5220/0006639801080116.
N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” 2015 Mil. Commun. Inf. Syst. Conf. MilCIS 2015 - Proc., 2015, doi: 10.1109/MilCIS.2015.7348942.
Y. K. Saheed, “Performance Improvement of Intrusion Detection System for Detecting Attacks on Internet of Things and Edge of Things,” in Artificial Intelligence for Cloud and Edge Computing. Internet of Things (Technology, Communications and Computing), S. Misra, A. K. Tyagi, V. Piuri, and L. Garg, Eds. Springer, Cham, 2022.
A. Yulianto, P. Sukarno, and N. A. Suwastika, “Improving AdaBoost-based Intrusion Detection System (IDS) Performance on CIC IDS 2017 Dataset,” J. Phys. Conf. Ser., vol. 1192, no. 1, 2019, doi: 10.1088/1742-6596/1192/1/012018.
A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Comput. Secur., vol. 31, no. 3, pp. 357–374, 2012, doi: 10.1016/j.cose.2011.12.012.
S. Wang and Y. Yue, “Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm,” PLoS One, vol. 13, no. 4, pp. 1–20, 2018, doi: 10.1371/journal.pone.0195636.
Z. Chiba, N. Abghour, K. Moussaid, A. El omri, and M. Rida, “Intelligent approach to build a Deep Neural Network based IDS for cloud environment using combination of machine learning algorithms,” Comput. Secur., vol. 86, pp. 291–317, 2019, doi: 10.1016/j.cose.2019.06.013.
J. Gu and S. Lu, “An effective intrusion detection approach using SVM with naïve Bayes feature embedding,” Comput. Secur., vol. 103, p. 102158, 2021, doi: 10.1016/j.cose.2020.102158.
P. Krishnan, S. Duttagupta, and K. Achuthan, “VARMAN: Multi-plane security framework for software defined networks,” Comput. Commun., vol. 148, pp. 215–239, 2019, doi: 10.1016/j.comcom.2019.09.014.
R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, “Features dimensionality reduction approaches for machine learning based network intrusion detection,” Electron., vol. 8, no. 3, 2019, doi: 10.3390/electronics8030322.
W. Elmasry, A. Akbulut, and A. H. Zaim, “Evolving deep learning architectures for network intrusion detection using a double PSO metaheuristic,” Comput. Networks, vol. 168, p. 107042, 2020, doi: 10.1016/j.comnet.2019.107042.
J. Gu, L. Wang, H. Wang, and S. Wang, “A novel approach to intrusion detection using SVM ensemble with feature augmentation,” Comput. Secur., vol. 86, pp. 53–62, 2019, doi: 10.1016/j.cose.2019.05.022.
M. Prasad, S. Tripathi, and K. Dahal, “An efficient feature selection based Bayesian and Rough set approach for intrusion detection,” Appl. Soft Comput. J., vol. 87, p. 105980, 2020, doi: 10.1016/j.asoc.2019.105980.

No competing interests reported.
