Deep Learning Based Attack Detection in IIoT using Two-Level Intrusion Detection System

The Industrial Internet of Things (IIoT), also known as Industry 4.0, has brought a revolution in the production and manufacturing sectors as it assists in the automation of production management and reduces the manual effort needed in auditing and managing the pieces of machinery. IoT-enabled industries, in general, use sensors, smart meters

stolen, or even the devices may be subjected to a Denial of Service

Introduction
An IDS can be either hardware or software that detects anomalous activity and reports it to the administrator. It helps alleviate the impact of an attack by informing the system administrator in advance. James Anderson presented the definition of IDS in 1980 [1]. He examined access logs and server case logs using a collection of methods. Dorothy E. Denning created an anomaly-based IDS based on statistics in 1986 and called it the Intrusion Detection Expert System (IDES). Teresa F. Lunt applied Artificial Neural Networks (ANN) to IDES to improve it. Wisdom & Sense created a rule-based anomaly detector using mathematical analysis. Researchers at the University of California, Davis developed the Distributed Intrusion Detection System (DIDS) in 1991.
In general, based on the type of detection, there are two types of IDS: signature-based detection and anomaly-based detection [2]. Signature-based detection matches the incoming network traffic against previously extracted patterns. This is also called misuse-based detection or knowledge-based detection. An Anomaly-based Intrusion Detection System (AIDS) is used to detect unknown attacks. It works on the principle that any abnormal network traffic could be malicious. As a result, such network activity can be identified as an exception and subjected to further investigation to determine the nature of the traffic. Machine Learning (ML) algorithms have been shown to work effectively and reliably for anomaly detection [3].
Many research papers have presented machine learning based IDS. Both supervised and unsupervised techniques are used to detect intrusions. Ensemble techniques have been used to detect attacks [4]. Semi-supervised models have also been found beneficial for balancing the dataset to detect anomalies [5]. Attacks can also be detected using unsupervised approaches such as clustering and Self-Organizing Maps (SOM) [6,7]. Later, deep learning models were used to train intrusion detection models, because machine learning techniques normally require a huge processing time. Deep learning has been employed as it is capable of training on large datasets. It can be combined with big data techniques to train on a huge dataset and has been shown to detect attacks better than other methods [8]. It has also been shown to reduce the false positive rate [9].
The challenges faced in building the IDS are twofold. First, an imbalanced dataset leads to a low detection rate for the minority attacks. Second, even though the IDS model can detect the vast majority of attacks, some attacks go undetected due to misclassification. Such attacks necessitate sophisticated identification. Hence, the proposed work aims to develop DL-TL-NIDS, which detects malicious activity. The workflow is as follows. Initially, the dataset is balanced and standardized. In first-level detection, Deep Neural Networks (DNN) are trained and tested. The attacks that obtain a lower detection rate or lower precision are declared challenging attacks and are fed to second-level detection. Second-level detection employs two models, namely the Negative Selection Algorithm (NSA) and a DNN trained using the Dragonfly Algorithm. Dempster-Shafer's combination rule is used to combine the outputs of both models. We have evaluated our work against the CICIDS 2017, CICIDS 2018 [10] and TON IoT datasets [11][12][13][14][15][16][17][18].

Motivation
The major concern for IIoT is security and privacy [19]. Real-world examples demonstrate that securing IIoT environments has become a necessity, since failure to do so has severe consequences. Many researchers have advocated the use of Intrusion Detection Systems (IDS) to identify and mitigate attacks. However, IDS in the IIoT context poses several problems. The difficulties are listed below.

1. IIoT consists of heterogeneous components interconnected via networks. Due to the distributed nature of IIoT devices, hackers can simply get access to them via networks [19][20][21].
2. IIoT devices generate massive data on benign traffic and less data on intrusions that occur sporadically [22][23][24]. That is, the dataset's distribution is extremely skewed, which has an impact on the model's detection performance.
3. Even though IDS can be constructed with Machine Learning methods, it might lead to overfitting when a large dataset is used for training [25].
We propose a DL-TL-NIDS to combat the above challenges. The main contributions of our work are as follows.
4. Second-level detection is applied only to challenging attacks. Easy-to-detect attacks are not subjected to second-level detection, to avoid additional computations.
5. In second-level detection, two detectors, namely the NSA and a DNN (trained using the enhanced Dragonfly Algorithm), are employed. The output obtained from each detector is the probability of a test sample being an attack. Dempster-Shafer's combination rule is used to fuse the decisions obtained from the detectors.

Literature Survey
In the IIoT, attacks mainly occur in (1) Operational Technology (OT) and (2) Information Technology (IT). IP spoofing, eavesdropping, brute-force password guessing, and data manipulation may occur at the OT level for IoT devices and controllers such as programmable logic controllers, gateways, and operator stations. Phishing, SQL injection, brute-force attacks, and DoS attacks can affect IT-level components such as data centers, online and mail services, edge devices, and mobile devices. The following are some of the consequences of the aforementioned attacks in IIoT.
1. Business secret data can be stolen by competitors, resulting in the manufacture of duplicate items.
2. Data collected from sensors and smart meters can be tampered with, resulting in a loss of product quality.
3. End-user devices and IoT devices could be compromised, resulting in service unavailability.
Hence, it is essential to detect and mitigate attacks in the IIoT context. Many researchers have proposed various types of IDS for the IIoT environment that are capable of detecting attacks. Aboelwafa et al. [26] detected falsified data injection attacks using Autoencoders (AE) and cleaned them using Denoising Autoencoders (DAE). The evaluation was done via simulation, with an accuracy of 97.02%. Maede et al. [27] addressed the use of machine learning approaches to identify attacks. The machine learning models were tested on real-world test datasets and were successful in detecting backdoor, command injection, and SQL injection attacks. However, the study lacks complex hybrid models, and the number of false negatives is significant. To discover abnormal behaviors, Li et al. [28] employed time series analysis, using a multilayer Long Short-Term Memory (LSTM) network and an enhanced bidirectional LSTM. The work was evaluated against the CTU-13 and AWID datasets, with 95.01% and 97.58% accuracy, respectively. Mahbub et al. [29] concentrated their research on selecting adversarial samples that can deceive a classifier. The models were retrained after eliminating adversarial samples. The samples were chosen based on their closeness to the malware cluster center and the likelihood calculated using Kernel-Based Learning (KBL). The work was evaluated against a publicly available dataset, with an accuracy of 86.08%.
Yan et al. [30] created a deep learning model trained using mini-batch gradient descent with an adjustable learning rate and momentum. The work was evaluated against web domains obtained from Alex top1w and 360 netlab and achieved a precision of approximately 90%. Bhunia et al. [31] proposed Soft-Things, a Software Defined Networks (SDN) based IIoT security framework. Machine learning algorithms were employed at the SDN controller to monitor the network. The precision of Soft-Things was tested using the Mininet emulator and was found to be 98%. Deep-IFS, implemented in a fog environment, was proposed by Abdel-Basset et al. [19]. The master fog node distributes the training parameters to the workers and then combines their decisions. The work was tested against the Bot-IIoT dataset, yielding 98.1% accuracy. Latif et al. [32] proposed a Deep Random Neural Network (DRaNN) and evaluated it using UNSW-NB15. The experimental results show that the model was able to detect attacks with an accuracy of 99.41%.
Yao et al. [33] proposed a multi-level intrusion detection framework named MSML to overcome the imbalance of network traffic and the non-identical distribution between the training set and the test set in feature space. This framework includes pure cluster extraction, pattern discovery, fine-grained classification, and model updating. The KDD'99 dataset was used for evaluation. The framework can effectively distinguish known and unknown pattern samples, with an accuracy of 99.3%. Liang et al. [34] addressed inherent problems in IDS, such as low detection rate, low real-time performance, and high false-positive rate, by proposing a multi-feature data clustering optimization model. The clusters are formed based on the distance between the cluster center and the data point. For the NSL-KDD dataset, the average time saving is 7.8% compared with existing models, and an overall detection rate (accuracy) of 97.8% is achieved. Yan et al. [35] proposed a multi-level DDoS mitigation framework to identify DDoS attacks in the IIoT. The approach was evaluated against real-time DDoS attacks generated using the ping of death and TCP SYN flood.
In [36], the Apache framework is used and a hybrid algorithm is proposed to exploit the advantages of both deep learning and machine learning. The latent features are extracted using stacked autoencoders. The accuracy achieved on the ISCX 2012 dataset is 90%, and an overall accuracy of 99% is achieved on the CICIDS 2017 dataset. Maharani et al. [37] presented IIoT attack detection in the fog layer of the IIoT environment. The model employed ML models such as Decision Trees, K-means, and Random Forest and was evaluated against the KDD Cup'99 dataset. The study concluded that K-means outperformed the other algorithms, reaching 93% accuracy. A non-symmetric deep autoencoder (NDAE) was proposed in [9] for unsupervised feature learning. The benchmark KDD Cup '99 and NSL-KDD datasets were used for random forest evaluation, obtaining 85.43% and 97.85% accuracy, respectively. Though the model has better overall performance, it could not detect the User to Root (U2R) and Remote to Local (R2L) attacks. Zhong et al. [38] adopted big data techniques so that the machine learning model could train on a massive amount of data. Behavioral and content features were extracted, and a combination of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) was used for the prediction. The method was evaluated against the ISCX2012, CICIDS2017, and DARPA1998 datasets.
It is inferred from the existing literature that many works present only the overall performance of the model. They lack an analysis of attack-wise performance, and they perform single-level detection. Single-level detection becomes an issue when certain extremely harmful attacks (e.g., ransomware, DoS, infiltration) cannot be identified by the model. Hence, in our proposed work, we focus on attack-wise performance and two-level detection. We propose DL-TL-NIDS, a two-level IDS framework. At the first level, the challenging attacks are identified and segregated using a DNN. At the second level, the challenging attacks are detected and classified using hybrid models, namely the NSA and a DNN (trained using the enhanced dragonfly algorithm). The decisions are fused using Dempster-Shafer's combination rule.

Proposed Methodology
This section explains the DL-TL-NIDS in detail. Figure 1 illustrates the workflow of DL-TL-NIDS. As mentioned earlier, challenging attacks are those attacks identified with low precision or low accuracy by the model. The key justification for focusing on challenging attacks is that, while the model identifies other types of attacks effectively, challenging attacks might cause the model's performance to deteriorate. Hence, the proposed work seeks to develop a two-level IDS that identifies attacks in the IIoT and focuses on challenging attacks. The sensors and smart meters in IIoT create massive traffic, and the data span a wide range of magnitudes. Therefore, data preprocessing and normalization are done before first-level detection.

Oversampling and Standardization
The attack distribution in the IIoT dataset is highly disproportionate because benign traffic occurs frequently, whereas intrusion traffic seldom occurs. Hence, balancing the dataset is necessary to overcome the skew in the IIoT dataset distribution. The attack distribution of the dataset has a significant impact on the model's detection performance. The minority classes are oversampled using SMOTE: after finding the nearest neighbors of a minority-class sample, the algorithm takes each set of neighbors and produces a new data point. The flowchart of SMOTE is given in Figure 2.
The data is standardized using the Z-score technique, which uses the mean and standard deviation of a feature. The formula of the Z-score is given in equation (1):

$$x_{norm} = \frac{x - \mu}{\sigma} \qquad (1)$$

Here $x_{norm}$ is the normalized feature value, $x$ is the original feature value, $\mu$ is the mean value of the feature, and $\sigma$ is the standard deviation of the feature.
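The two preprocessing steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `z_score` and `smote`, the neighbour count `k`, and the toy data are assumptions made for illustration, and the interpolation follows the commonly described SMOTE idea of placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random

def z_score(column):
    """Standardize one feature column as (x - mean) / std (population std)."""
    mu = sum(column) / len(column)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(x - mu) / sigma for x in column]

def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic minority samples by interpolating toward neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (hypothetical values)
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=4)
scaled = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the test data.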

Level-1 Detection
First-level detection distinguishes between easy-to-detect and challenging attacks. A challenging attack is one that may not be classified appropriately, because it is detected as benign or assigned to the wrong attack class. The challenging attacks are handled in Level-2 detection. The components of Level-1 detection are as follows.

Hyperparameter Optimization
Hyperparameter optimization is employed to find the optimal hyperparameters required for the model. Hyperparameters are the parameters set before learning, and they can exert a strong influence on the performance of the model (here, the DNN). Learning rate, momentum, epoch size, batch size, kernel initialization, and dropout regularisation are the hyperparameters considered here and optimized for the DNN. A reasonably well-performing baseline model is chosen for hyperparameter optimization. The hyperparameter optimization algorithm adopted is Random Search, which picks a random set of hyperparameter values at each instance. Since the values are random, there is a high likelihood that the entire search space of hyperparameters is explored. In Figure 3, we have considered two hyperparameters, hyperparameter-1 and hyperparameter-2. The black dots represent the sets of values taken by random search at each instance.
The model is trained and tested using the set of hyperparameters chosen at each instance. After a fixed number of iterations, the random search returns the hyperparameters for which the model works best.
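The loop described above can be sketched as follows. This is a hedged illustration rather than the configuration used in the paper: the candidate values in `space`, the iteration count, and `toy_evaluate` (a stand-in for training and validating the DNN) are all hypothetical.

```python
import math
import random

def random_search(evaluate, space, n_iter=20, seed=0):
    """Sample random hyperparameter sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # One "instance": draw a random value for every hyperparameter
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space; the paper's actual candidate values are not given here
space = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128, 256],
    "epochs": [10, 20, 50],
}

def toy_evaluate(params):
    """Stand-in for 'train the DNN and return validation accuracy'."""
    lr_penalty = 0.1 * abs(math.log10(params["learning_rate"]) + 3)
    batch_penalty = abs(params["batch_size"] - 64) / 1000
    return 1.0 - lr_penalty - batch_penalty

best_params, best_score = random_search(toy_evaluate, space, n_iter=30)
```

Replacing `toy_evaluate` with a function that trains the model and returns a validation metric gives the behaviour described in the text.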

Deep Neural Networks
Deep Neural Networks (DNN) are similar to ANNs; however, DNNs have more hidden layers. Each DNN has an input layer, multiple hidden layers, and an output layer. Each hidden layer has several neurons, and each neuron either fires or remains neutral based on the inputs it receives. In Figure 4, $x_i$ is the input feature, $W^{(i)}$ are the weights of the connections between neurons from layer $L_i$ to layer $L_{i+1}$, $y_i$ is the output, and $g$ is the activation function that fires the neuron based on the computation done during forward propagation. Each layer can use a different activation function. In our proposed work, we employed the relu function for the hidden layers, as it worked better in the baseline model, and the softmax function as the activation function for the output layer, as the output we need is in terms of probability. The relu and softmax functions are given in (2) and (3):

$$\mathrm{relu}(z) = \max(0, z) \qquad (2)$$

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \qquad (3)$$

During forward propagation, the inputs are multiplied by the weights, the bias of each neuron is added, and the result travels through each hidden layer until the network finally predicts the output $y_i$. Each neuron in hidden layer $l$ calculates the following:

$$z^{(l)} = W^{(l-1)} a^{(l-1)} + b^{(l)} \qquad (4)$$

$$a^{(l)} = g\left(z^{(l)}\right) \qquad (5)$$

In the above equations, $g$ denotes the activation function, $a^{(l-1)}$ is the output of the preceding layer (with $a^{(0)} = x$), and $b^{(l)}$ is the bias. The DNN is trained using backpropagation, which employs gradient descent. The motive of backpropagation is to update the weights such that the error between the expected and predicted output is minimal. Gradient computation involves computing the change in the error with respect to the weights and biases (i.e., dW and db). The error between the actual and predicted output in the output layer is calculated and backpropagated to the preceding hidden layers. The weights and bias values are updated according to the values of the gradients.
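As a small illustration of the forward pass described above, the sketch below pushes one input vector through a hidden layer with relu and an output layer with softmax. The layer sizes, weights, and helper names (`dense`, `forward`) are hypothetical and chosen only to keep the example short; they do not reflect the architecture in Figure 4.

```python
import math

def relu(z):
    """Element-wise max(0, z)."""
    return [max(0.0, v) for v in z]

def softmax(z):
    """Convert raw scores to probabilities (shifted by max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Weighted sum plus bias for every neuron; weights[j] belongs to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Forward propagation: relu on hidden layers, softmax on the output layer."""
    a = x
    for i, (W, b) in enumerate(layers):
        z = dense(a, W, b)
        a = softmax(z) if i == len(layers) - 1 else relu(z)
    return a

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 3 output classes
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.0, -1.0], [-0.5, 0.5], [0.2, 0.3]], [0.0, 0.0, 0.0]),
]
probs = forward([1.0, 2.0, 3.0], layers)
```

The output `probs` sums to one, which is what allows it to be read as a per-class probability.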
Gradient descent has many optimized extensions, one of which is the Adam optimizer. This algorithm is efficient and combines gradient descent with momentum and the Root Mean Square Propagation (RMSProp) algorithm. In the momentum method, the velocity with which the gradient is changing is calculated, and RMSProp applies an exponentially weighted average to the second moment of the gradients ($dW^2$). The Adam optimizer decays both the past squared gradient (V) and the past momentum (S), calculated using (6) and (7). Adam adds bias correction to V and S using (8) and (9). Finally, the weights are updated using (10).
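A minimal sketch of one Adam update on a single scalar weight is given below. It follows the commonly published form of Adam; the names S and V follow the usage in the text (S for the momentum term, V for the squared-gradient term), while the decay rates, epsilon, learning rate, and the toy objective are assumptions for illustration.

```python
import math

def adam_step(w, grad, S, V, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w at step t (t starts at 1)."""
    S = beta1 * S + (1 - beta1) * grad        # decayed past momentum
    V = beta2 * V + (1 - beta2) * grad ** 2   # decayed past squared gradient
    S_hat = S / (1 - beta1 ** t)              # bias correction
    V_hat = V / (1 - beta2 ** t)
    w = w - lr * S_hat / (math.sqrt(V_hat) + eps)
    return w, S, V

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, S, V = 0.0, 0.0, 0.0
first_w = None
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, S, V = adam_step(w, grad, S, V, t, lr=0.05)
    if t == 1:
        first_w = w
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is visible in `first_w`.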
In our proposed work, DNN, as shown in Figure 4, is trained and tested using the preprocessed IIoT dataset. The hyperparameters obtained from the optimization algorithm were used for training the DNN.

Classwise Accuracy Measure
The module assesses the performance of the DNN by examining the accuracy and precision obtained for each IIoT attack. An attack is considered easy to detect if it has both high accuracy and high precision; otherwise, it is labeled as a challenging attack.
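One way to sketch this module is shown below. Several points are assumptions: per-class accuracy is taken as the fraction of a class's samples that are classified correctly, the 0.90 threshold mirrors the 90% cut-off mentioned in the experimental results, and the confusion matrix values are made up for illustration.

```python
def classwise_scores(confusion, labels):
    """Per-class accuracy and precision from a confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    scores = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        actual = sum(confusion[i])                    # samples truly in class i
        predicted = sum(row[i] for row in confusion)  # samples predicted as class i
        accuracy = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        scores[label] = (accuracy, precision)
    return scores

def challenging_attacks(scores, threshold=0.90):
    """A class is challenging if its accuracy or precision is below the threshold."""
    return [label for label, (acc, prec) in scores.items()
            if acc < threshold or prec < threshold]

# Hypothetical first-level results (rows: true class, columns: predicted class)
labels = ["Benign", "DoS", "Bot"]
confusion = [
    [90, 5, 5],
    [2, 97, 1],
    [10, 0, 40],
]
scores = classwise_scores(confusion, labels)
hard = challenging_attacks(scores)
```

With these toy numbers, Benign is flagged because of its precision (90/102) and Bot because of its accuracy (40/50), while DoS passes both checks.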

Level-2 Detection
The goal of second-level detection is to detect the challenging IIoT attacks that were misclassified at first-level detection. As hybrid models can enhance detection accuracy [39], two models, a DNN (trained using the enhanced dragonfly algorithm) and the NSA, are employed in second-level detection to identify challenging attacks.

Deep Neural Networks (Trained Using Enhanced Dragonfly Algorithm)
The same DNN architecture as in the first level is used, with softmax as the output activation function. The DNN is trained using the enhanced dragonfly algorithm. Backpropagation is sensitive to noisy input and to the learning rate, and it has an issue with local optima. We therefore use the dragonfly algorithm to determine ideal weights and biases for the DNN. The dragonfly algorithm is a swarm-based metaheuristic technique that is used to identify the best solutions to problems. The algorithm is motivated by dragonflies' quest for food sources while fleeing from predators. From an algorithmic standpoint, a food source is the best solution found so far in terms of fitness, whereas an enemy is the worst solution found so far. The algorithm considers five elements, namely separation (S), cohesion (C), alignment (A), food source (F) (i.e., the best solution), and enemy (E) (i.e., the worst solution). Each element is associated with a coefficient, namely s, c, a, f, and e, respectively. The values of S, C, A, F, and E are computed as follows.
The population and the number of iterations are both initialized, and each individual represents a vector of weight and bias values. At each iteration, the fitness of an individual is calculated using the fitness function. The fitness function for Neural Networks (NN) can be any of the performance metrics used to evaluate an NN. In our work, we have used accuracy as the fitness function. Among all the individuals, the one with the highest fitness is taken as the best solution and assigned as the food source. The individual with the lowest fitness is taken as the worst solution and marked as the enemy. The coefficients w, s, a, c, f, and e are assigned. For each individual, if there exists at least one neighbor, the values of S, A, C, F, and E are calculated using equations (11)-(15), and the weight and bias values are updated using equation (16). If an individual has no neighbor, the solution is updated using equation (17). The Levy flight is calculated using (18). Here b and r1 are constant values.
$$\mathrm{levy}(x) = 0.01 \times \frac{r_1 \times \sigma}{|r_2|^{1/b}} \qquad (18)$$

The drawbacks of this algorithm are as follows [40].
1. The lack of internal memory in this algorithm is a disadvantage, since it might lead to premature convergence to a local optimum.
2. The Levy flight mechanism is used to model the random flying behavior of dragonflies in nature. The disadvantages of the Levy flight are overflowing of the search area and interruption of random flights due to its big searching steps.
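For orientation, the baseline update for an individual that has neighbours can be sketched as below. Because equations (11)-(16) are not reproduced in this text, the sketch assumes the commonly published form of the Dragonfly Algorithm; the function name `dragonfly_step`, the coefficient values, and the toy vectors are hypothetical, and the paper's exact formulation may differ.

```python
def dragonfly_step(x, dx, neighbours, neighbour_steps, food, enemy,
                   w, s, a, c, f, e):
    """One baseline dragonfly update for an individual with >= 1 neighbour.

    x, dx           : current position (weights/biases) and step vector
    neighbours      : positions of neighbouring individuals
    neighbour_steps : step vectors of those neighbours
    food, enemy     : best and worst solutions found so far
    """
    n = len(neighbours)
    dims = range(len(x))
    S = [-sum(x[d] - nb[d] for nb in neighbours) for d in dims]     # separation
    A = [sum(st[d] for st in neighbour_steps) / n for d in dims]    # alignment
    C = [sum(nb[d] for nb in neighbours) / n - x[d] for d in dims]  # cohesion
    F = [food[d] - x[d] for d in dims]                              # attraction to food
    E = [enemy[d] + x[d] for d in dims]                             # distraction from enemy
    new_dx = [s * S[d] + a * A[d] + c * C[d] + f * F[d] + e * E[d] + w * dx[d]
              for d in dims]
    new_x = [x[d] + new_dx[d] for d in dims]
    return new_x, new_dx

# Toy two-dimensional example with made-up coefficients
new_x, new_dx = dragonfly_step(
    x=[0.0, 0.0], dx=[0.0, 0.0],
    neighbours=[[1.0, 1.0], [3.0, 1.0]],
    neighbour_steps=[[0.2, 0.0], [0.4, 0.0]],
    food=[2.0, 2.0], enemy=[-1.0, -1.0],
    w=0.9, s=0.1, a=0.1, c=0.1, f=1.0, e=1.0,
)
```

When the algorithm is used to train a DNN, each position vector would hold the flattened weights and biases, and the fitness used to pick the food and enemy would be the accuracy obtained with those weights.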
The aforementioned flaws are addressed in our enhanced dragonfly algorithm. The changes concern the use of the Levy flight and the introduction of the concepts of local best and global best solutions. Algorithm 1 describes the algorithm. For the neighbors in a cluster, the local best solution is used as the food source. The reason for selecting the local best is that the global best solution found thus far may lead to a local minimum in subsequent iterations, narrowing the search space. Using a local best solution at each cycle can overcome this. When a dragonfly has no neighbors, the global best and worst solutions are used.

Algorithm 1
...
3: for each x ∈ X_t do
4:   if no_of_neighbors(x) != 0 then
5:     l_best, l_worst = get_local_extreme(x, local_best, local_worst)
6:     Update w, s, a, c, f, e using equation ...
...
11:    x = x + Δx
12:  else
13:    x = x + w * |g_best − x|
...

The Negative Selection Algorithm [41,42] is a kind of Artificial Immune System (AIS). AIS is a type of rule-based machine learning system [43,44] inspired by the human immune system. It works similarly to a pattern-based selection. There are two phases, namely:
1. Training phase
2. Testing phase
During the training phase, each IIoT traffic instance in the dataset is compared with the current detectors using similarity scores. The instance is added as a detector if it belongs to a new pattern or has no similarity to the current detectors. During the testing phase, incoming IIoT traffic is considered, and the similarity scores are used to compare the incoming test traffic with the detectors.
Since the IIoT dataset is large, checking each data point and then selecting it as a detector would be difficult. Hence, in our proposed algorithm, the data of each IIoT attack in the dataset are clustered. The detectors are then generated by taking random samples from each cluster. In the testing phase, the average similarity score between the test data and the detectors of each challenging IIoT attack is calculated and stored. The probability is determined by dividing the average score of each attack by the overall score. At the end of the test phase, we obtain the probability of the test data belonging to each IIoT attack. The training and testing phases of the NSA are presented in Algorithm 2 and Algorithm 3.

Algorithm 2
...
   for each cluster ∈ cluster_class do
     data_point = take random points from cluster
     detector.add(data_point)
4: end for
5: end for
6: return detector
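The clustered detector generation and the probability computation described above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the tiny `kmeans` helper stands in for whatever clustering method the authors used, the similarity score is taken to be 1 / (1 + Euclidean distance), and the attack names, cluster count, and sample values are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, iters=10):
    """Very small k-means used only to group one attack's samples."""
    centres = rng.sample(points, k)
    clusters = [points]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        centres = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centres[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def train_detectors(data_by_attack, k=2, per_cluster=2, seed=0):
    """Training phase: sample random points from each cluster as detectors."""
    rng = random.Random(seed)
    detectors = {}
    for attack, points in data_by_attack.items():
        detectors[attack] = []
        for cluster in kmeans(points, min(k, len(points)), rng):
            detectors[attack] += rng.sample(cluster, min(per_cluster, len(cluster)))
    return detectors

def similarity(a, b):
    """Assumed similarity score: 1 for identical points, approaching 0 when far apart."""
    return 1.0 / (1.0 + math.dist(a, b))

def attack_probabilities(sample, detectors):
    """Testing phase: average similarity per attack, divided by the overall score."""
    avg = {attack: sum(similarity(sample, d) for d in ds) / len(ds)
           for attack, ds in detectors.items()}
    total = sum(avg.values())
    return {attack: score / total for attack, score in avg.items()}

# Hypothetical, already standardized samples for two challenging attacks
data = {
    "Bot": [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.2]],
    "PortScan": [[5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3]],
}
detectors = train_detectors(data)
probs = attack_probabilities([0.1, 0.1], detectors)
```

The resulting dictionary has one probability per challenging attack, which is the form needed for fusion with the DNN output.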

Dempster Shafer's Combination rule
The Dempster-Shafer theory is a probabilistic combination strategy that has been used to combine the outputs of classifiers [45]. The output obtained from the DNN is the probability of the test data belonging to a particular IIoT attack. The output obtained from the NSA is likewise the probability that a data point belongs to a particular class. Dempster-Shafer's theory is therefore used to combine the two. Equation (19) presents the combination rule:

$$m(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K} \qquad (19)$$

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$

where $m_1(B)$ is the probability obtained from the deep neural network and $m_2(C)$ is the probability obtained from the proposed Negative Selection Algorithm. $K$ is a measure of the amount of conflict between the two mass sets.
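A minimal sketch of this fusion is given below for the simple case in which each detector assigns its mass only to single classes, so that two hypotheses intersect only when they are the same class. The function name `dempster_combine` and the probability values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for masses placed on single classes only.

    m1, m2: dicts mapping class name -> probability (each summing to 1).
    """
    classes = sorted(set(m1) | set(m2))
    # Agreement: both detectors support the same class
    agreement = {c: m1.get(c, 0.0) * m2.get(c, 0.0) for c in classes}
    # Conflict K: the detectors support different classes (empty intersection)
    conflict = sum(m1[b] * m2[c] for b in m1 for c in m2 if b != c)
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {c: agreement[c] / (1.0 - conflict) for c in classes}

# Hypothetical outputs of the two second-level detectors for one test sample
m_dnn = {"Bot": 0.6, "PortScan": 0.3, "Benign": 0.1}
m_nsa = {"Bot": 0.5, "PortScan": 0.4, "Benign": 0.1}
fused = dempster_combine(m_dnn, m_nsa)
decision = max(fused, key=fused.get)
```

Here the agreement terms are 0.30, 0.12, and 0.01, so K is 0.57 and the fused belief in Bot is 0.30 / 0.43, higher than either detector reported on its own.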

Experimental Setup
The proposed work is implemented in Python using Google Colab. Our proposed model is evaluated using benchmark datasets, namely the CICIDS-2017, CICIDS-2018, and TON IoT datasets. Accuracy, precision, recall, and F1-score are the performance measures used to evaluate DL-TL-NIDS. These measures are computed from the confusion matrix. The following four terms make up the confusion matrix:
1. True Positive (TP): a positive sample correctly predicted as positive.
2. True Negative (TN): a negative sample correctly predicted as negative.
3. False Positive (FP): a negative sample incorrectly predicted as positive.
4. False Negative (FN): a positive sample incorrectly predicted as negative.
With the help of these terms, the performance metrics can be calculated. Each performance metric is described below, and the formulas are given in equations (21)-(24).

1. Accuracy: This metric takes into consideration the number of samples correctly predicted. It is the ratio of the correct predictions to all the predictions made by the model.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (21)$$

2. Precision: It is the ratio of correct positive predictions to the total positive predictions made by the model.

$$Precision = \frac{TP}{TP + FP} \qquad (22)$$

3. Recall: It is the ratio of correct positive predictions made by the model to the actual number of positive samples in the dataset.

$$Recall = \frac{TP}{TP + FN} \qquad (23)$$

4. F1-score: The F1-score is the harmonic mean of recall and precision.

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (24)$$
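The four metrics can be computed directly from the confusion-matrix counts, using the standard definitions. The short sketch below uses made-up counts, and the function name `metrics` is illustrative.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct among actual positives
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for one attack class
result = metrics(tp=90, tn=80, fp=10, fn=20)
```

For these counts the accuracy is 170/200, the precision is 90/100, the recall is 90/110, and the F1-score is 180/210.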
The performance of our proposed model across different datasets is evaluated and compared against the existing state-of-the-art models. At both levels, the DNN uses the hyperparameters optimized by Random Search. Table 1 summarises the default hyperparameter values used across all datasets. Learning rate, batch size, epoch size, and kernel initializer were the hyperparameters tuned via Random Search. Table 2 shows the results of the Random Search executed for each of the datasets mentioned above.

... classes and 83 features. The upside of this dataset is that it has recorded up-to-date attacks, and the drawback is that it has profoundly imbalanced classes, ranging from SQL Injection (0.0007%) to Benign (80.30%). Table 3 presents the performance of our proposed DL-TL-NIDS on the CICIDS-2017 dataset. The classes that were highly misclassified (accuracy or precision below 90%) in first-level detection were Benign, Bot, and Portscan. The overall accuracy, precision, recall, and F1-score are given in Table 4. The challenging attacks were fed to advanced (second-level) detection. The false alarm rate (flagging Benign as an attack) was around 5% in first-level detection and came down to 1% in second-level detection. In [46], DNNs were studied with the number of hidden layers varied from 1 to 5, and that work is referred to here for comparison with DL-TL-NIDS.
For each attack, the best accuracy obtained by the existing work is taken and compared with our results. It can be inferred from Table 5 that our model performs well. The False Positive Rate (FPR) is reduced in DL-TL-NIDS when compared with [46].

... Table 5. In first-level (simple) detection, six classes, namely Benign, XSS, SQL Injection, DoS-SlowHTTPTest, Infiltration, and FTP-Bruteforce, had accuracy or precision below 90%. These were marked as challenging attacks and fed to Level-2 detection. Table 7 presents the comparison, in terms of accuracy, with an existing state-of-the-art method [47] that employed ensemble learning along with feature selection. It can be inferred from Table 8 that the percentage of misclassification in our proposed method is far lower than in the ensemble method.
When compared against [48], which implemented a CNN, DL-TL-NIDS performed well; in particular, detection was better for Infiltration, FTP-Bruteforce, and DoS attacks. Figures 5 to 12 present comparisons between [48] and DL-TL-NIDS in terms of accuracy, recall, precision, and F1-score.

TON IoT
The dataset was generated by UNSW Sydney specifically for attack detection in IIoT. It was created by collecting information from sensors in IoT devices such as Garage, GPS-tracker, Fridge, and Thermostat. It comprises Backdoor, Command Injection, XSS, Scanning, DDoS, and Ransomware attacks.
Our proposed DL-TL-NIDS is compared against Kumar et al. [49], which presents ensemble anomaly-based detection. It can be inferred from Table 9 that the false alarm rate is reduced in our proposed model. Other performance metrics, such as accuracy and precision, have also improved.

Conclusion
The Industrial Internet of Things (IIoT) provides a number of benefits. This study explored various applications and the security issues induced by IIoT, and presented a deep learning based two-level Intrusion Detection System. The proposed model first segregates challenging attacks in first-level detection and then performs advanced detection at the second level. The model is evaluated using benchmark IoT and network datasets, and it is shown to perform better than the existing state-of-the-art models.
In the future, the proposed two-level IDS can be extended to perform well on zero-day or otherwise new attacks. The proposed work can also be extended to fit other IoT-specific environments, such as the Internet of Medical Things (IoMT) and Machine-to-Machine (M2M) communication, to protect the devices and data.