IoTID20 Dataset
The testbed used for the IoTID-20 dataset (I. Ullah and Mahmoud 2020) combines conventional networked infrastructure with IoT devices. A typical smart-home environment was established, using an EZVIZ Wi-Fi camera and an SKT NGU, to generate the IoTID-20 dataset. Figure 4.1 depicts the testbed setup for the IoTID-20 dataset. The two IoT devices are connected to the smart home's Wi-Fi router, which also serves other devices such as tablets, laptops, and smartphones. Within the testbed, the EZVIZ Wi-Fi camera and the SKT NGU act as victim devices, while the remaining devices act as attackers. The IoTID-20 dataset was recently produced from Pcap files accessible on the Internet (I. Ullah and Mahmoud 2020).
The researchers used the CICFlowMeter tool (Lashkari et al. 2017) to extract relevant features from the Pcap files and produce the IoTID-20 dataset in CSV format, and then labeled each instance. The dataset comprises 83 features and 625,783 instances, with labels at three levels: binary (intrusion detection), category, and subcategory. Table 2 summarizes the binary, category, and subcategory labels included in the IoTID20 dataset.
Table 2
Binary, category, and sub-category of IoTID20 Dataset
Binary | Category | Subcategory |
Normal | Normal | Normal |
Anomaly | DoS | DoS-Synflooding |
 | Mirai | Mirai-UDP Flooding, Mirai-Hostbruteforceg, Mirai-HTTP Flooding, Mirai-Ackflooding |
 | MITM | MITM ARP Spoofing |
 | Scan | Scan Hostport, Scan Port OS |
The distribution of dataset records between normal and attack traffic is shown in Table 3. The proliferation of increasingly powerful threats has made IoT devices more susceptible to security breaches. The dataset contains several malicious activities, with DoS, MITM, Distributed Denial of Service (DDoS), and active scanning identified as the most frequently observed.
Table 3
Normal and attacked instances in IoTID20 Dataset
Subcategory | Count |
Normal | 40073 |
DoS-Synflooding | 59391 |
Mirai-UDP Flooding | 183554 |
Mirai-Hostbruteforceg | 121181 |
Mirai-HTTP Flooding | 55818 |
Mirai-Ackflooding | 55124 |
MITM ARP Spoofing | 35377 |
Scan Hostport | 22192 |
Scan Port OS | 53073 |
The IoTID-20 dataset offers a significant advantage by mirroring contemporary trends in IoT networks, and it is one of the limited number of publicly accessible datasets specifically designed for IoT intrusion detection. The present study considers a specific kind of DoS attack (Cloudflare 2022) that deliberately floods TCP-based connections with synchronize (SYN) packets. SYN packets are normally used to initiate TCP connections between communicating entities, allocating resources such as ports and buffers on both ends. This kind of attack therefore targets the availability of the victim's servers and/or computers. In addition, Hyper-Text Transfer Protocol (HTTP) flooding, acknowledgment flooding, and User Datagram Protocol (UDP) packets were used to replicate Mirai-style DDoS attacks in the IoT context.
Additionally, a brute-force attack (Kara 2019) was employed to decrypt the data, compromising its confidentiality. A MITM attack was performed to alter the Address Resolution Protocol (ARP) table, linking the attacker's Media Access Control (MAC) address with the Internet Protocol (IP) address of the router. The attacker thereby impersonates the network router and interferes with communication between network entities. The primary objective of this attack is to intercept or alter data in transit. The dataset features are listed and described in Table 4.
Table 4
Dataset Features Description
# | Feature Name | Feature Description |
1 | Flow ID | Flow Identifier |
2 | Src IP | Source IP Address |
3 | Src Port | Source Port Number |
4 | Dst IP | Destination IP Address |
5 | Dst Port | Destination Port Number |
6 | Protocol | IP Protocol Used |
7 | Timestamp | Packet Time-stamp |
8 | Flow duration | Flow Duration in Microseconds |
9 | total Length of Fwd Packet | Total Packet Size Forward |
10 | total Fwd Packet | Total Packets Forward |
11 | total Bwd packets | Total Packets Backward |
12 | Fwd Packet Length Min | Minimal Packet Size Forward |
13 | total Length of Bwd Packet | Total Packet Size Backward |
14 | Fwd Packet Length Max | Maximal Packet Size Forward |
15 | Bwd Packet Length Min | Minimal Packet Size Backward |
16 | Fwd Packet Length Mean | Average Packet Size Forward |
17 | Fwd Packet Length Std | Standard Deviation Packet Size Forward |
18 | Bwd Packet Length Mean | Average Packet Size Backward |
19 | Bwd Packet Length Max | Maximal Packet Size Backward |
20 | Flow Packets/s | Flow Packets per Second |
21 | Bwd Packet Length Std | Standard Deviation Packet Size Backward |
22 | Flow Bytes/s | Flow Bytes per Second |
23 | Flow IAT Mean | Average Time Between Flow Packets |
24 | Flow IAT Max | Maximal Time Between Flow Packets |
25 | Flow IAT Std | Time Standard Deviation Between Flow Packets |
26 | Flow IAT Min | Minimal Time Between Flow Packets |
27 | Fwd IAT Min | Minimal Time Between Forward Packets |
28 | Fwd IAT Std | Standard Deviation Time Between Forward Packets |
29 | Fwd IAT Total | Total Time Between Forward Packets |
30 | Fwd IAT Max | Maximal Time Between Forward Packets |
31 | Fwd IAT Mean | Average Time Between Forward Packets |
32 | Bwd IAT Min | Minimal Time Between Backward Packets |
33 | Bwd IAT Std | Standard Deviation Time Between Backward Packets |
34 | Bwd IAT Max | Maximal Time Between Backward Packets |
35 | Bwd IAT Mean | Average Time Between Backward Packets |
36 | Bwd IAT Total | Total Time Between Backward Packets |
37 | Fwd URG Flags | URG Flag Set in Forward Packets |
38 | Fwd PSH flags | PSH Flag Set in Forward Packets |
39 | Bwd PSH Flags | PSH Flag Set in Backward Packets |
40 | Bwd URG Flags | URG Flag Set in Backward Packets |
41 | FWD Packets/s | Forward Packets per Second |
42 | Fwd Header Length | Total Forward Header Bytes |
43 | Bwd Header Length | Total Backward Header Bytes |
44 | Bwd Packets/s | Backward Packets per Second |
45 | Packet Length Min | Minimal Packet Length |
46 | Packet Length Std | Standard Deviation Packet Length |
47 | Packet Length Max | Maximal Packet Length |
48 | Packet Length Mean | Average Packet Length |
49 | Packet Length Variance | Variance of Packet Length |
50 | SYN Flag Count | Packets with SYN Flag |
51 | FIN Flag Count | Packets with FIN Flag |
52 | ACK Flag Count | Packets with ACK Flag |
53 | RST Flag Count | Packets with RST Flag |
54 | PSH Flag Count | Packets with PUSH Flag |
55 | CWR Flag Count | Packets with CWR Flag |
56 | URG Flag Count | Packets with URG Flag |
57 | ECE Flag Count | Packets with ECE Flag |
58 | Down/Up Ratio | Download/Upload Ratio |
59 | Average Packet Size | Mean Packet Size |
60 | Fwd Bytes/Bulk Avg | Mean Forward Bulk Rate (Bytes) |
61 | Fwd Packet/Bulk Avg | Average Forward Bulk Rate (Packets) |
62 | Fwd Segment Size Avg | Mean Observed Size Forward |
63 | Bwd Segment Size Avg | Average Backward Bulk Rate (Bytes) |
64 | Bwd Bytes/Bulk Avg | Mean Backward Bulk Rate (Bytes) |
65 | Fwd Bulk Rate Avg | Average Forward Bulk Rate (Packets) |
66 | Bwd Packet/Bulk Avg | Mean Backward Bulk Rate (Packets) |
67 | Subflow Fwd Bytes | Mean Bytes in Forward Sub Flow |
68 | Subflow Bwd Packets | Mean Packets in Backward Sub Flow |
69 | Bwd Bulk Rate Avg | Mean Backward Bulk Rate (Packets) |
70 | Subflow Fwd Packets | Mean Packets in Forward Sub Flow |
71 | Fwd Init Win bytes | Total Bytes Sent in Initial Window Forward |
72 | Subflow Bwd Bytes | Mean Bytes in Backward Sub Flow |
73 | Bwd Init Win bytes | Total Bytes Sent in Initial Window Backward |
74 | Active Min | Minimal Active Flow Duration Before Idle |
75 | Active Mean | Average Active Flow Duration Before Idle |
76 | Fwd Act Data Pkts | Packets with ≥ 1 Byte Payload (Forward) |
77 | Fwd Seg Size Min | Minimal Observed Segment Size (Forward) |
78 | Active Max | Maximal Active Flow Duration Before Idle |
79 | Active Std | Standard Deviation Active Flow Duration Before Idle |
80 | Idle Mean | Average Idle Flow Duration Before Active |
81 | Idle Max | Maximal Idle Flow Duration Before Active |
82 | Idle Min | Minimal Idle Flow Duration Before Active |
83 | Idle Std | Standard Deviation Idle Flow Duration Before Active |
84 | Label | Anomaly or Normal |
85 | Cat | Attack Category or Normal |
86 | Sub-Cat | Attack Sub-category or Normal |
Methodology
This section describes the methodology used in this study; the framework is shown in Fig. 2. The first step acquires the IoTID20 dataset, laying the foundation for subsequent analyses. Data preprocessing then removes noise and improves the dataset's integrity. Feature selection follows, combining correlation coefficient assessment, PSO, and GWO to identify a reduced set of informative features. Using these curated features, the classification stage applies a decision tree algorithm to learn the underlying patterns. Finally, hyperparameters are optimized with the Coronavirus herd immunity optimizer (CHIO), which further improves accuracy and yields a robust framework for analyzing the IoTID20 dataset with precision and efficacy.
Data Preprocessing
In the data preprocessing phase, a systematic approach was adopted to improve dataset quality. First, features tied to network packet identification, namely 'Flow_ID', 'Src_IP', 'Dst_IP', and 'Timestamp', were deemed extraneous and excluded. A subsequent examination of feature distributions revealed attributes with noisy, constant distributions: 'Fwd_Byts/b_Avg', 'Fwd_Pkts/b_Avg', 'Fwd_Blk_Rate_Avg', 'Bwd_Byts/b_Avg', 'Bwd_Pkts/b_Avg', 'Bwd_Blk_Rate_Avg', 'Fwd_PSH_Flags', 'Fwd_URG_Flags', 'Init_Fwd_Win_Byts', and 'Fwd_Seg_Size_Min'. In addition, 'Flow_Byts/s' and 'Flow_Pkts/s' contained null values.
The initial dataset, comprising 83 features and 3 target attributes, underwent comprehensive refinement during this preprocessing stage, resulting in a streamlined set of 67 features. This curation provides a more focused and robust foundation for subsequent analysis. Figure 3 shows an example feature distribution after preprocessing; the non-noisy distributions illustrate the pivotal role of this stage.
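The preprocessing described above can be sketched as follows. The column names follow the IoTID20 CSV, but the helper function and its exact steps are illustrative assumptions, not the authors' code:

```python
# Sketch of the preprocessing step: drop identification features,
# rows with null/infinite rate values, and constant ("noisy") columns.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop packet-identification features.
    id_cols = ["Flow_ID", "Src_IP", "Dst_IP", "Timestamp"]
    df = df.drop(columns=[c for c in id_cols if c in df.columns])

    # 2. Drop rows with null or infinite values (e.g. in Flow_Byts/s).
    df = df.replace([float("inf"), float("-inf")], pd.NA).dropna()

    # 3. Drop constant (zero-variance) numeric features.
    targets = ["Label", "Cat", "Sub_Cat"]
    num = df.drop(columns=[c for c in targets if c in df.columns])
    num = num.select_dtypes("number")
    constant = [c for c in num.columns if num[c].nunique() == 1]
    return df.drop(columns=constant)
```

Step 3 is what removes columns such as 'Fwd_PSH_Flags', whose standard deviation is zero across the whole dataset.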
Feature Selection
Feature selection is a critical facet of data analysis and machine learning, as it enables the identification and extraction of the most relevant attributes from a dataset. This process involves discerning the subset of features with the greatest discriminatory or predictive power, while discarding those that introduce noise, redundancy, or insignificance. Its importance lies in enhancing model performance, alleviating the curse of dimensionality, and conserving computational resources. By focusing on the most influential attributes, feature selection accelerates model training and mitigates overfitting, where a model becomes excessively tailored to noise in the data. It also aids interpretability, enabling researchers and practitioners to understand and explain the patterns driving model decisions. Feature selection strategies may be classified into three primary classes: filter (Lyu et al. 2017), wrapper (Maldonado, Riff, and Neveu 2022), and embedded (Mahendran and P M 2022) methods. In our proposal we used the correlation coefficient, an example of a filter method, together with PSO and GWO, which are examples of embedded methods.
Filter Methods
Filter methods (Lyu et al. 2017) evaluate the relevance of features based on their statistical properties or information-theoretic measures. These strategies do not involve training a machine learning model; the dataset is examined independently of any particular learning technique. One widely used approach is the correlation-based filter technique (Mohamad et al. 2021), which quantifies the degree of association between each feature and the target variable. Information Gain (Kurniabudi et al. 2020), Chi-Square (McHugh 2012), and Mutual Information (Bennasar, Hicks, and Setchi 2015) are other prominent filter methods. These methods are computationally efficient and can serve as a preprocessing step to reduce the dimensionality of the dataset.
Correlation Coefficient
The correlation coefficient (Akoglu 2018) is a statistical measure of the strength and direction of the linear association between two variables. It ranges from −1 to +1. A value close to +1 indicates a strong positive linear association: as one variable grows, the other also tends to increase. A value close to −1 indicates a strong negative linear association: as one variable grows, the other tends to decrease. A value close to zero suggests little or no linear association between the variables. The correlation coefficient can be computed using Eq. (1).
$$r=\frac{n\sum XY-\sum X\sum Y}{\sqrt{n\sum {X}^{2}-{\left(\sum X\right)}^{2}}\ \sqrt{n\sum {Y}^{2}-{\left(\sum Y\right)}^{2}}} \left(1\right)$$
where X and Y represent the two variables and n is the count of instances for the variables.
The application of the correlation coefficient for feature selection yielded valuable insights into the dataset. By identifying and retaining the top 35 features based on their correlation with the target variable, we focused on the most influential factors driving the predictive power of our model. This selection process highlighted the critical variables that exhibit strong associations with the target outcome. These chosen features are expected to play a pivotal role in the accuracy and performance of our model.
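A minimal sketch of this filter step, assuming pandas inputs and an integer-encoded class target (the function name and encoding choice are illustrative):

```python
# Rank features by absolute Pearson correlation with the target and
# keep the top k. k=35 follows the text; the rest is a sketch.
import pandas as pd

def top_k_by_correlation(X: pd.DataFrame, y: pd.Series, k: int = 35) -> list:
    y_num = pd.factorize(y)[0]  # encode class labels as integers
    corr = X.corrwith(pd.Series(y_num, index=X.index)).abs()
    return corr.sort_values(ascending=False).head(k).index.tolist()
```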
Embedded Methods
Embedded methods (Mahendran and P M 2022) seamlessly integrate feature selection within the model training process. These techniques assess feature importance during the training phase of a machine learning model. LASSO (Least Absolute Shrinkage and Selection Operator) (Kim et al. 2019) is a widely used embedded method. A penalty component is included into the linear regression equation, so promoting the model to prioritize a subset of significant characteristics. Ridge Regression (Dorugade 2014) is another example, which introduces L2 regularization. Additionally, techniques like PSO (Jain et al. 2022) and GWO (Negi et al. 2021) can be employed as embedded methods. These metaheuristic algorithms can optimize feature selection within the context of specific machine learning models.
Particle Swarm Optimization
PSO (Jain et al. 2022) is a computational optimization method inspired by the collective behavior of bird flocks and fish schools. This heuristic emulates the movement of individual particles through a multi-dimensional search space in pursuit of optimal solutions. Each particle represents a candidate solution to the optimization problem at hand, and the swarm's movement is guided by each particle's own experience and the experiences of its neighbors.
At the core of PSO is the concept of fitness evaluation, where each particle's solution is assessed based on a predefined fitness function. The particles adjust their positions in the search space iteratively to find the best possible solution. During each repetition, particles modify their velocities by including two primary components: their cognitive component, which accounts for their historical best position, and their social component, influenced by the best position achieved by their neighbors. These components effectively balance exploration (diversification) and exploitation (intensification) of the search space. Figure 4 shows the framework of PSO.
Particles are drawn towards promising regions in the solution space as they iteratively update their positions. Over time, this collaborative behavior steers the particles towards optimal or near-optimal solutions. The utilization of PSO for feature selection has demonstrated itself to be a remarkably effective strategy in our analysis. Notably, the PSO algorithm demonstrated a consistent and balanced selection of features across all three target classes—34 features each for label, category, and sub-category. This uniformity suggests that PSO intelligently identified a cohesive set of attributes that collectively contribute significantly to the classification process. By doing so, PSO effectively reduced dimensionality while preserving the discriminative power of the selected features.
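A minimal binary-PSO sketch for feature selection, using the common sigmoid transfer function to turn velocities into bit probabilities. The fitness callable, swarm size, and coefficients are illustrative assumptions, not the paper's settings:

```python
# Binary PSO: each particle is a 0/1 mask over features; `fitness`
# scores a mask (e.g. validation accuracy on the masked features).
import math
import random

def binary_pso(n_features, fitness, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    pbest_fit = [fitness(p) for p in pos]
    gbest_fit = max(pbest_fit)                  # global best
    gbest = pbest[pbest_fit.index(gbest_fit)][:]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Cognitive (pbest) and social (gbest) pulls.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: map velocity to a bit probability.
                p_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < p_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest
```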
Grey Wolf Optimization
The GWO algorithm (Negi et al. 2021), is a nature-inspired optimization technique that derives its principles from the hierarchical social structure and hunting patterns seen in grey wolves in their natural habitat. The objective of this method is to replicate the collaborative hunting dynamics seen in wolf packs in order to address optimization difficulties. GWO operates on the principles of exploration, exploitation, and adaptation, aiming to iteratively refine solutions within a given search space.
In GWO, the optimization process involves a pack of wolves led by an alpha (the leader), a beta (subordinate to the alpha), and a delta (subordinate to both alpha and beta); the remaining wolves follow these leaders. Each wolf corresponds to a candidate solution in the search space, and its fitness is evaluated using the objective function. During each iteration, the wolves adjust their positions based on their hierarchical roles and a set of equations derived from the hunting behavior of real wolves. The alpha's position update emphasizes exploitation, refining the best solution found so far, while the beta and delta guide exploration of promising areas of the search space. The movement of each wolf is influenced by control parameters and the positions of the alpha, beta, and delta. The framework of GWO is shown in Fig. 5.
Through successive iterations, the wolf pack collaboratively refines their positions in the search space, converging toward optimal or near-optimal solutions. GWO has demonstrated effectiveness in solving a wide range of optimization problems, including those with both continuous and discrete variables. Its inherent balance between exploration and exploitation, inspired by the cooperative nature of grey wolves, makes it a promising tool for solving complex optimization challenges across different domains. The integration of GWO for feature selection has yielded noteworthy results in our analysis. Impressively, GWO exhibited a discerning approach by selecting 63 features for label, 64 for category, and 63 for sub-category. The slightly varying feature counts across different target classes suggest that GWO efficiently adapts to the distinct characteristics of each class, emphasizing its adaptability.
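A simplified GWO sketch adapted to feature selection by thresholding continuous positions at 0.5, so each wolf encodes a feature mask. The settings and helper names are illustrative, not the authors' implementation:

```python
# GWO for feature selection: wolves move in [0,1]^n; a dimension
# counts as "selected" when its value exceeds 0.5.
import random

def gwo_select(n_features, fitness, n_wolves=10, iters=50, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)]
              for _ in range(n_wolves)]
    mask = lambda w: [1 if x > 0.5 else 0 for x in w]

    for t in range(iters):
        # The three fittest wolves lead the update as alpha, beta, delta.
        ranked = sorted(wolves, key=lambda w: fitness(mask(w)), reverse=True)
        alpha, beta, delta = (r[:] for r in ranked[:3])
        a = 2.0 - 2.0 * t / iters   # exploration coefficient decays 2 -> 0
        for w in wolves:
            for d in range(n_features):
                x = 0.0
                for leader in (alpha, beta, delta):
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    x += leader[d] - A * D
                # Average the three leader-driven moves, clamp to [0,1].
                w[d] = min(1.0, max(0.0, x / 3.0))
    best = max(wolves, key=lambda w: fitness(mask(w)))
    return mask(best)
```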
Hyperparameters Optimization
The optimization of hyperparameters is an essential stage in the development and refinement of machine learning models. Hyperparameters refer to the configuration settings that dictate the behavior of a model and have an impact on its performance. In contrast to the parameters of a model, which are estimated from data during the training process, hyperparameters need pre-determined values provided by the user prior to training. The process of hyperparameter optimization is conducting a methodical exploration of various combinations of these parameters in order to identify the configuration that produces the most optimal results on a selected measure, such as accuracy or error rate.
The challenge lies in the fact that the impact of hyperparameters on the model's performance can be intricate and non-linear, making manual selection a time-consuming and often suboptimal process. In order to tackle this issue, researchers have devised a range of methodologies, including grid search, random search, as well as more sophisticated methods such as Bayesian optimization, genetic algorithms, and particle swarm optimization. These methods efficiently explore the hyperparameter space to identify combinations that maximize model performance.
Hyperparameter optimization significantly impacts a model's effectiveness. A well-tuned set of hyperparameters can enhance a model's generalization capability, prevent overfitting, and expedite convergence during training. It plays a pivotal role in achieving the best possible results from a machine learning model and ensures that the model is adaptable to the specific problem at hand. As the search for optimal hyperparameters is often time-consuming and resource-intensive, choosing an appropriate optimization method and striking a balance between exploration and exploitation is crucial for achieving efficient and effective model configurations. For this reason, we used CHIO (Al-Betar et al. 2021) for finding the best parameters for decision tree (DT) model to improve its accuracy.
Coronavirus Herd Immunity Optimizer
CHIO (Al-Betar et al. 2021) is an optimization algorithm inspired by the concept of herd immunity in epidemiology, particularly relevant in the context of managing and mitigating the spread of diseases like COVID-19. CHIO leverages the principles of natural immunity build-up within populations to guide its optimization process.
The CHIO system works via the simulation of herd immunity dynamics, wherein a certain segment of the population acquires immunity to a disease, resulting in a decrease in its transmission. Similarly, in the optimization context, CHIO divides the solution space into "immune" and "susceptible" regions. Solutions in the immune region are considered favorable, while susceptible solutions are subject to potential improvements. During each iteration, CHIO adjusts the positions of solutions based on their "immune" or "susceptible" status, mimicking the idea that immune solutions are less likely to change drastically, while susceptible solutions have more room for exploration.
This approach combines exploration and exploitation, analogous to how herd immunity balances the risk of infection and immunity within a population. By incorporating this concept into optimization, CHIO aims to enhance convergence speed, balance exploration-exploitation trade-offs, and find optimal or near-optimal solutions. The algorithm's unique foundation in epidemiology sets it apart, potentially making it a valuable tool for tackling complex optimization challenges. The CHIO approach is shown in Fig. 6.
CHIO's adaptive selection of hyperparameters resulted in a notable improvement in accuracy. By fine-tuning the model's parameters, CHIO enhanced the model's ability to discern intricate patterns within the data, strengthening both its predictive power and its robustness across different scenarios.
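A much-simplified sketch of how a CHIO-style optimizer might tune the two decision-tree hyperparameters used later (criterion and max_depth). The gene-update scheme below is a coarse approximation of CHIO's infected/susceptible/immune dynamics, and every name and setting is illustrative:

```python
# Simplified CHIO-style search over decision-tree hyperparameters.
# `fitness` would be validation accuracy of a tree trained with the
# candidate settings; here it is an arbitrary callable.
import random

def chio_tune(fitness, max_depth_range=(1, 500), pop=10, iters=30,
              brr=0.2, seed=0):
    rng = random.Random(seed)
    def rand_case():
        return {"criterion": rng.choice(["gini", "entropy"]),
                "max_depth": rng.randint(*max_depth_range)}
    cases = [rand_case() for _ in range(pop)]
    for _ in range(iters):
        scores = [fitness(c) for c in cases]
        best = cases[max(range(pop), key=scores.__getitem__)]
        new = []
        for c in cases:
            child = dict(c)
            for gene in child:
                r = rng.random()
                if r < brr / 3:            # "infected": copy a random case
                    child[gene] = rng.choice(cases)[gene]
                elif r < 2 * brr / 3:      # "susceptible": re-randomize
                    child[gene] = rand_case()[gene]
                elif r < brr:              # "immune": copy the best case
                    child[gene] = best[gene]
            # Greedy acceptance: keep the child only if it is no worse.
            new.append(child if fitness(child) >= fitness(c) else c)
        cases = new
    return max(cases, key=fitness)
```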
Classification
In the classification stage, the decision tree (Kotsiantis 2013) machine learning model was used. The decision tree is a widely used and straightforward technique capable of both classification and regression. Its structure resembles an inverted tree, in which each internal node represents a decision based on a particular feature and each leaf node corresponds to a class label or predicted value. The decision-making process starts at the root node and proceeds down the tree, with each decision node splitting the data into subsets based on feature values. This continues until a leaf node is reached, providing the final prediction or classification.
The decision tree method operates by iteratively dividing the dataset into subsets, using the characteristics that provide the most significant information. The choice of feature and the corresponding splitting criteria are determined using various metrics like Gini impurity or information gain. These metrics evaluate the effectiveness of a certain characteristic in dividing the data into discrete and distinguishable groups or values. The method proceeds with the process of dividing nodes until it reaches a predetermined stopping condition, which may include reaching a maximum depth or a minimum number of samples per leaf. This recursive nature of decision tree construction enables it to capture complex decision boundaries and relationships within the data.
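The splitting criterion described above can be illustrated with a toy Gini-based search over feature/threshold pairs; this is a sketch of the criterion only, not a full decision-tree implementation (helper names are ours):

```python
# Choose the feature/threshold pair minimizing weighted Gini impurity.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """X: list of feature vectors, y: labels. Returns (feature, threshold)."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted impurity of the two child nodes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1]
```

A real tree applies this split recursively until a stopping condition (e.g. max_depth) is met.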
The selection of the Decision Tree algorithm for classification stems from a meticulous comparative analysis of various classifiers including SVM (Chauhan, Dahiya, and Sharma 2019), kNN (Zhang 2021), and Extreme Learning Machine (ELM) (Wibawa, Malik, and Bahtiar 2018). Our findings unequivocally favored the Decision Tree as the optimal choice for this specific intrusion detection system. The algorithm's performance surpassed its counterparts in terms of accuracy, precision, and computational efficiency. This can be attributed to the Decision Tree's innate ability to create interpretable, hierarchical structures that effectively capture the underlying relationships in the data. This characteristic proved invaluable in discerning complex intrusion patterns. The interpretability of the resulting model also lends itself to meaningful insights, allowing for a deeper understanding of the features driving the classification process.
Experimental Results
In the context of intrusion detection systems within the IoT environment, our proposed methodology exhibits several notable strengths. Our approach leverages the IoTID20 dataset, a comprehensive and diverse collection of real-world IoT network traffic data. This foundational choice ensures that our system is grounded in empirical observations, enhancing its relevance and applicability to actual operational environments.
The data preprocessing step removed 16 features: four identification features, ten features exhibiting a noisy (constant) distribution, and two features containing null values. The statistical distribution of the ten removed noisy features is shown in Table 5.
Table 5
Noisy Features Statistical Analysis
| count | mean | std | min | max |
Fwd_PSH_Flags | 625783 | 0 | 0 | 0 | 0 |
Fwd_URG_Flags | 625783 | 0 | 0 | 0 | 0 |
Fwd_Byts/b_Avg | 625783 | 0 | 0 | 0 | 0 |
Fwd_Pkts/b_Avg | 625783 | 0 | 0 | 0 | 0 |
Fwd_Blk_Rate_Avg | 625783 | 0 | 0 | 0 | 0 |
Bwd_Byts/b_Avg | 625783 | 0 | 0 | 0 | 0 |
Bwd_Pkts/b_Avg | 625783 | 0 | 0 | 0 | 0 |
Bwd_Blk_Rate_Avg | 625783 | 0 | 0 | 0 | 0 |
Init_Fwd_Win_Byts | 625783 | -1 | 0 | -1 | -1 |
Fwd_Seg_Size_Min | 625783 | 0 | 0 | 0 | 0 |
The decision to exclude these features from the dataset was driven by a thorough analysis of their statistical properties. Upon examination, it became evident that these specific features exhibited a consistently low variance, with a standard deviation of zero. This indicates that these features maintained a constant value across the entire dataset, rendering them devoid of any discriminatory power. Additionally, the mean, minimum, and maximum values for each of these features also remained constant, further affirming their lack of variability. Given this lack of variability and informativeness, it was a judicious decision to remove these features from the dataset. This curation process aims to streamline the dataset, focusing on features that contribute significantly to the characterization of the underlying patterns and behaviors in the data, ultimately enhancing the effectiveness of subsequent analyses and modeling efforts.
The features after preprocessing are crucial for accurate classification. Figures 7, 8, and 9 present heatmaps that visualize the relationships between the top 10 features and the three corresponding target classes: label, category, and sub-category. These visualizations provide valuable insights into the correlations between features and target classes. The heatmap patterns indicate potential strong associations, which further emphasizes the importance of these features in distinguishing between different classes.
The IoTID20 dataset is partitioned into two subsets: an 80% training set and a 20% testing set. The training set is used to train the classifier, while the testing set is used to assess its performance. Table 6 shows the classification results using the full set of 67 features.
Table 6
Results using all 67 features
| SVM | kNN | DT | ELM |
Label | 96.68 | 99.76 | 99.94 | 98.25 |
Cat | 73.66 | 97.97 | 99.55 | 78.85 |
Sub_Cat | 47.08 | 74.31 | 74.65 | 65.12 |
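The 80/20 split-and-score protocol can be sketched with plain helpers (the function names and shuffling seed are illustrative; any classifier with fit/predict would slot in):

```python
# 80/20 hold-out evaluation: shuffle indices, split, compute accuracy.
import random

def train_test_split(X, y, test_ratio=0.2, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(X) * (1 - test_ratio))
    tr, te = idx[:cut], idx[cut:]
    pick = lambda data, ids: [data[i] for i in ids]
    return pick(X, tr), pick(X, te), pick(y, tr), pick(y, te)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
```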
The 67 features remaining after preprocessing are used as input to three feature selection techniques: correlation coefficient, PSO, and GWO. For each target, the intersection between the 35 features with the highest absolute correlation coefficient and the PSO result is computed; the results are shown in Table 7. Table 8 shows the classification results using the selected features.
Table 7
Feature Selection Results
Target | Length | Features |
Label | 17 | 'Dst_Port', 'SYN_Flag_Cnt', 'Down/Up_Ratio', 'Bwd_Pkt_Len_Mean', 'Init_Bwd_Win_Byts', 'Fwd_Pkt_Len_Mean', 'Flow_IAT_Mean', 'Fwd_Pkt_Len_Std', 'Fwd_Header_Len', 'Pkt_Len_Var', 'Pkt_Len_Max', 'Flow_IAT_Min', 'Bwd_PSH_Flags', 'Bwd_Seg_Size_Avg', 'Fwd_Pkt_Len_Max', 'Src_Port', 'Protocol' |
Category | 16 | 'Idle_Max', 'Bwd_Pkt_Len_Max', 'Flow_Duration', 'Pkt_Len_Var', 'SYN_Flag_Cnt', 'Src_Port', 'Fwd_Pkt_Len_Max', 'Bwd_Pkt_Len_Mean', 'Active_Max', 'Dst_Port', 'Init_Bwd_Win_Byts', 'Bwd_Pkt_Len_Std', 'Idle_Mean', 'Bwd_Seg_Size_Avg', 'ACK_Flag_Cnt', 'Pkt_Len_Min' |
Sub-Category | 22 | 'Flow_IAT_Max', 'Subflow_Bwd_Pkts', 'Subflow_Fwd_Pkts', 'Flow_IAT_Std', 'Src_Port', 'Bwd_IAT_Max', 'ACK_Flag_Cnt', 'Fwd_Header_Len', 'Protocol', 'Tot_Fwd_Pkts', 'Bwd_Seg_Size_Avg', 'Bwd_Pkt_Len_Std', 'Bwd_IAT_Mean', 'Pkt_Len_Var', 'Idle_Mean', 'Dst_Port', 'Idle_Max', 'Active_Min', 'Idle_Min', 'Flow_IAT_Min', 'Pkt_Len_Max', 'Flow_Duration' |
Table 8
Results using selected features
| SVM | kNN | DT | ELM |
Label | 93.42 | 99.81 | 99.94 | 97.59 |
Cat | 71.24 | 98.68 | 99.61 | 88.08 |
Sub_Cat | 34.78 | 75.4 | 74.56 | 56.05 |
The meticulous data preprocessing step plays a crucial role in purging noise and improving the dataset's integrity. This upfront effort ensures that subsequent analyses and decision-making are built upon a solid and reliable foundation; by addressing noise at its source, the system demonstrates robustness in handling complex and dynamic IoT network environments. The feature selection phase represents a pivotal advancement in our methodology: through the integration of correlation coefficient assessment, PSO, and GWO, we distill the feature set to its most informative components. This reduces computational overhead while ensuring that the selected features capture the relevant information.
The intersection of the three per-target feature sets contains exactly four features: 'Bwd_Seg_Size_Avg', 'Pkt_Len_Var', 'Src_Port', and 'Dst_Port'. These four features serve as input for hyperparameter tuning of the decision tree model. The optimization determines the optimal values of two hyperparameters: the criterion and max_depth. The criterion is the function used to assess the quality of a split in a decision tree; the two alternatives are Gini impurity and entropy (Shannon information gain). The parameter max_depth bounds the depth of the tree. The decision tree hyperparameters obtained with CHIO are presented in Table 9. The tuned model is trained on the training data and used to predict the label, category, and subcategory of the testing data, and the predictions are used to evaluate the model. The proposed system achieved accuracies of 99.96%, 99.56%, and 77.6% for label, category, and subcategory, respectively.
Table 9
Decision tree hyperparameters values after optimization
Target | Criterion | Max_depth |
Label | entropy | 17 |
Category | entropy | 5 |
Subcategory | entropy | 429 |
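The four-feature intersection reported above reduces to a simple set operation. The per-target sets below are abbreviated stand-ins for the full selections in Table 7:

```python
# Intersect the per-target feature selections (abbreviated here) to
# obtain the common subset used for hyperparameter tuning.
label_feats = {"Bwd_Seg_Size_Avg", "Pkt_Len_Var", "Src_Port", "Dst_Port",
               "SYN_Flag_Cnt", "Protocol"}
cat_feats = {"Bwd_Seg_Size_Avg", "Pkt_Len_Var", "Src_Port", "Dst_Port",
             "Idle_Max", "ACK_Flag_Cnt"}
subcat_feats = {"Bwd_Seg_Size_Avg", "Pkt_Len_Var", "Src_Port", "Dst_Port",
                "Flow_Duration", "Idle_Mean"}

common = label_feats & cat_feats & subcat_feats
```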
Our classification stage, centered around the decision tree algorithm, is a strategically chosen approach for deciphering underlying patterns in IoT network traffic. Decision trees are known for their interpretability and ability to handle non-linear relationships, which is crucial in detecting anomalies and potential intrusions within complex network behaviors. The commitment to refinement through hyperparameter optimization is another hallmark of our methodology. By introducing the innovative CHIO, we bring a cutting-edge optimization technique to bear. This demonstrates our willingness to explore and integrate state-of-the-art methodologies to fine-tune the system's performance.
Our proposed intrusion detection system for IoT environments is fortified by a series of carefully crafted steps. From data acquisition to feature selection, classification, and hyperparameter optimization, each phase is designed to capitalize on the strengths of the chosen techniques. This comprehensive strategy not only strengthens the resilience of the system but also establishes it as a powerful instrument for protecting IoT networks from security threats. After the system identifies and categorizes a potential intrusion, an efficient and effective follow-up protocol is essential: the detected intrusion is blocked, and the network administrator receives a prompt notification detailing the nature of the threat along with recommended mitigation actions. Additionally, comprehensive logs and reports are generated to document the event, aiding post-incident analysis and compliance with regulatory requirements.