A novel reinforcement learning-based hybrid intrusion detection system on fog-to-cloud computing

doi:10.21203/rs.3.rs-3864298/v1

The increasing growth of the Internet of Things and its open and shared character has led to a rise in new attacks exponentially. As a consequence, the quick detection of attacks in IoT environments is essential. The intrusion detection system (IDS) is responsible for protecting and detecting the type of attacks. Creating an IDS that works in real-time and adapts to environmental changes is critical. In this paper, we propose a deep reinforcement learning-based (DRL) adaptive IDS that addresses the mentioned challenges. DRL-based IDS helps to create a decision agent, who controls the interaction with the indeterminate environment and performs binary detection (normal/intrusion) in fog. For multi-class classification of attacks, we use the ensemble method in the cloud. The proposed approach is evaluated on the CICIDS2018 dataset, in binary and multiclass classification. The results show that the proposed model achieves a comparable performance in detecting intrusions and identifying attacks and it reduces the prediction time significantly. Furthermore, it is a suitable solution for underrepresented attacks. Overall, combining multiple methods can be a great way for an intrusion detection system.

IoT

Intrusion detection system

Deep learning

Reinforcement learning

Deep Q-networks

The Internet of Things (IoT) provides a global network of devices, computers, and sensors able to interact with each other [1], [2]. Cloud computing is a centralized solution, and provides unlimited, remote computing and storage resources for the IoT applications. While cloud computing supplies a large amount of resources, it faces challenges like limited and unreliable bandwidth, security and high communication latency between IoT devices and cloud servers[3]. Therefore, latency-sensitive applications in IoT cannot be efficiently executed. To solve the mentioned problems Fog computing extends cloud computing and provides storage, processing, and other things. It is a heterogeneous paradigm and distributed in the proximity of IoT devices. The fog layer is an intermediate layer between the Cloud and the IoT and is able to deploy resources close to the edge of the network. Therefore, Fog is an optimal solution for latency-sensitive IoT applications. However, the fog has restricted resources in comparison to the cloud[4]. Fog-to-Cloud computing[5] is also a hybrid type of computing that is used in many aspects of IoT, such as healthcare, agriculture, driving, and industry.

The characteristics of the IoT network have made it susceptible to various kinds of attacks. These attacks damage both devices and data. Therefore, the implementation of security systems is essential. An Intrusion Detection System (IDS) is a helpful security tool to detect malicious actions or attacks[6]. IDS can be implemented in IOT devices, cloud layer, or fog layer. It should be noted that due to limitations in processing and storage in most IOT devices, they are not able to run such applications. On the other hand, deploying IDS in the cloud layer will lead to delays. Consequence the most suitable option is to implement IDS in fog. Therefore, we need to design a new generation of intrusion detection systems that are suitable for the fog layer.

The new generation (IDS) requires automatic and intelligent intrusion detection strategies to manage threats. The new IDS must be able to operate based on an independent agent and with minimal human intervention. At the same time, they should be able to evolve and improve themselves.

Although deep learning methods had a significant impact on the development of intrusion detection systems, their efficiency is highly dependent on the selection of appropriate features and the setting of hyperparameters used for DL training[7]. In addition, DL-based techniques face limitations in adapting to changing attack patterns. In other words, they cannot adjust themselves to any possible change in the attack pattern of the environment. As a result, accuracy will decrease and FPR will increase. Reinforcement learning (RL) is the most effective solution to security IoT against new various attacks. It can integrate environmental behavior into the learning process simultaneously[8]. On the other hand, it can model an autonomous agent to take actions optimally without knowledge of the environment. Therefore, it is useful in real-time and adversarial environments. Reinforcement learning is modeled using a Markov decision process (MDP) consisting of five entities, state, action, reward, policy, and value. The foundation aim of reinforcement learning is to find an optimal policy. The policy function is used to compute the next best action, based on the current state and reward provided by the environment.

Since the IoT environment is growing and becoming more complex, the RL agent will not be able to learn a larger number of parameters. On the other hand, the fog environment is stochastic in several aspects such as resource requirements of applications and the arrival rate of application requests. Thus, heuristic-based techniques are not able to efficiently adapt to changes in the fog. Deploying the deep reinforcement learning (DRL) model that combines deep learning (DL) and reinforcement learning (RL) can overcome this issue. DRL agents can learn the optimal policy and long-term rewards without prior knowledge[9]. In other words, deep learning significantly helps to optimize functions in reinforcement learning to create an optimal policy. DRL can deal with complex problems in dynamic and stochastic fog environments[10]. Moreover, DRL is also excellent for security applications where attacks are increasingly sophisticated, rapid, and ubiquitous[11].

The efficiency of IDS is highly dependent on the deep learning technique and their model training mechanisms. Therefore, feature selection methods and hyperparameter tuning greatly impact IDS performance, especially in complex environments where the number of parameters is increasing[12]. Using DRL for IDS causes DNNs as performance estimators to extend the scalability of RL as well as adapt to complex IoT. DLR increases the capability of continuous self-learning by interacting with the environment and improving its ability over time.

To address the challenges raised and due to the advantages of DRL and DL, we have proposed a Deep Reinforcement Learning approach to separate normal events from intrusions in fog and we have used several machine learning methods to recognize types of attacks in the cloud.

The main contributions of this paper are summarized as follows:

· A two-level state-of-the-art suitable for online learning in a dynamic environment based on the Deep Reinforcement Learning approach for binary, and ensemble learning for multiclass detection.

· Employing GRU to learn the internal relations between events based on a relatively long duration of the past time. Using Deep Q-learning to estimate the Q-function and choosing the appropriate action (permeation/normal) in fog. Utilizing multiple machine learning models in the cloud to detect the attack type of suspicious events

· Sharing the information obtained in different fog nodes to the summary module in the cloud to update the parameters.

· Extensive experiments were conducted to evaluate and compare the efficiency of the proposed DQL models. Our experimental results based on the CIC-IDS2018 dataset, demonstrate that it is effective in detecting intrusion of underrepresented classes and is excellent in achieving prediction time in comparison to another approach.

The rest of the paper has been organized into six different sections. Section 2 is about the related work on IDS using deep learning and reinforcement learning techniques. Section 3 describes detailed of the model description. The experimental evaluation, results of the proposed model, and the comparison models are given in Section 4. Section 5 presents limitations and future scope. Finally, Section 6 concludes our paper.

Cybersecurity concerns have become a significant obstacle to the faster adoption of IoT technology. Although numerous IDS have been proposed in the literature, they frequently have poor accuracy because of creating various attacks in IoT environments and the lack of adaptation of attack patterns.

A lot of research has been conducted to enhance IDS performance. Researchers use a variety of techniques including machine learning and deep learning[13],[14],[15]. Recently Deep Reinforcement Learning has provided a promising solution that can improve the efficiency of IDS in dynamic environments[16],[17]. Considering the large number of works on IDS techniques, we study in this section, some related works of DRL-based and DL-based techniques in IoT environment.

Sethi et al. [18] have proposed a deep reinforcement learning-based IDS. It is adaptive to cloud architecture. It consists of three components, namely the host network, agent network, and administrator network. This method can provide higher accuracy against new and complex attacks and is able to maintain a balance between high accuracy and low false positive rate (FPR) compared to state-of-the-art IDSs. Experiments have been done on the UNSW-NB15 test dataset. The result demonstrates 83.80% average accuracy and 2.6% FPR. Moreover, this model has been successful in detecting all types of attacks.

Yang et al. [19] have conducted an intrusion detection model based on a two-layered bidirectional long short-term memory (Bi-LSTM) on fog. In the first stage fog nodes identify attacks based on traffic received from IoT devices and then send them to the cloud to summarize the global security condition. The first level is organized into three phases, including: preprocessing, categorizing and collecting data. If the predicted events are classified into normal groups, IoT devices are permitted to continue the process, otherwise activities are categorized as suspicious. The model has achieved the highest accuracy (99.05%), the highest detection rate (99.36%), and the highest f1-measure (99.154%) on the UNSW-NB15 dataset.

Strickland et al. [20] have proposed a novel hybrid technique that can perform binary and multiclass classification. The model uses GANs to improve the learning capabilities of the proposed model. The GAN model is trained with the NSL-KDD dataset for four attack categories and normal traffic. DRL is used to implement this IDS. Although overall accuracy decreased, the results show generating synthetic data can improve the precision and recall for underrepresented classes in imbalanced datasets.

Lazzarini et al. [21] suggest a novel intrusion detection approach named Deep Integrated Stacking for the IoT (DIS-IoT). This model works based on combining four different DL models (MLP, DNN, CNN and LSTM) into a fully connected DL layer, creating a standalone ensemble model. DIS-IoT implements binary and multiclass classification techniques. Experiments demonstrate the model on ToN-IoT and CICIDS2017 datasets has achieved great scores in accuracy, recall, precision and F1- Measure in both binary and multi-class classification. This method tries to show that the combination of several methods can be effective in improving the metrics.

Authors in [22] present a distributed binary IDS method based on neural networks and genetic algorithms to efficiently detect various intrusions on Fog nodes. Genetic algorithm is used to optimize the weights of the network and the biases associated with each neuron. Experiments have been implemented on UNSW-NB15 and ToN_IoT datasets. The results show that optimized weights and biases can improve the performance model compared to the neural network without optimization. The proposed model compared to other state-of-the-art methods achieved a 16.35% and 37.07% reduction in execution time for both datasets. Therefore convergence process will increase and will save processing power.

Stefanova et al. in [23] have proposed an intrusion response mechanism that is based on reinforcement learning techniques (RLT). The model uses a Q-learning approach. The agent can control the process of interacting with the indeterminate environment and finding an optimal policy to represent the intrusion response. Evaluation of the Model shows proposed IDS based on Q-learning establishes the balance between exploration and exploitation.

Lopez-Martin et al. [24] present a novel application of several deep reinforcement learning (DRL) algorithms to IDS. It performs supervised learning based on a DRL framework. The model is suitable for dynamic environments where online learning is necessary. The authors supply a comparison study of four algorithms (DQN, DDQN, Policy gradient and actor-critic). They compare four algorithms with machine learning models on NSL-KDD and AWID datasets. Results illustrate DDQN algorithm has an excellent performance in accuracy, F1, precision and recall. Moreover, the performance of prediction scores, training and prediction times in DDQN was better than other approaches.

In addition to the methods mentioned above, there are also other different proposals such as[25], [26], [27], [28], [29], [30] in intrusion detection. Stressing privacy preservation in the IoT is important. Therefore adopting a strategy to detect attacks is essential. Each of the existing suggested methods has examined specific attacks or specific aspects of the intrusion detection system and has advantages and disadvantages. Due to the effective features of deep learning and reinforcement learning, the study based on these methods in intrusion detection systems is still ongoing.

Intrusion detection and classifying the types of attacks are the two most important of our purpose. Our study uses deep Q-learning and machine learning approaches to binary and multiclass intrusion detection, respectively. Since we need fast response in binary diagnosis, binary detection is implemented in the fog layer. In other words, deep Q-learning identifies attacks in the fog layer. Events marked as intrusions are sent for more analysis and to identify types of attacks on the cloud. Cloud can perform complex analysis because there is no need for an urgent classifier and it has flexibility in response time. In the cloud layer, there is a robust method. It includes an ensemble machine learning approach of multi-class classification.

In the first step, we model the environment using GRU architecture to learn the internal relations between events in fog. In general, we integrate the influence of the historical information of the dynamic environment on policy optimization. Because we aim to detect attacks with small mutants compared to previous attacks, and recognition zero-day attacks in an IoT environment.

The important point is to send and share the information obtained in different fog nodes to the summary module in the cloud. That means the information created in each fog node is sent to the cloud so that it will be visible to all nodes. This module is responsible for updates based on received information. Generally, the proposed method is implemented in three phases, which are: 1) the preprocessing and environment modeling phase, 2) the binary detection and parameter updating phase, and 3) the multi-class detection phase. The overview of the proposed architecture is shown in Fig. 1. Three proposed phases with detail are described in the following subsections.

3.1 preprocessing and environment modeling phase

Since the IoT environment is completely variable and new attacks are being produced with very few differences compared to previous attacks, finding the relationship between events will significantly help in detecting intrusions and their types. GRU can learn values over short and long periods. It was proposed to solve the vanishing gradient problem in RNN. GRU is similar to LSTM but simpler than LSTM and uses fewer parameters. Therefore, it will be more computationally efficient. The use of GRU can lead to the benefit of the information on previous events in the detection of new attacks. In other words, prior information can be incorporated into an internal state that is a suitable representation of the interactive environment. These states are entered into Deep Q-Learning to perform binary recognition. We use Deep Q-Learning to deploy agents in discrete action space environments.

According to Fig. 2, B_n (Mini Batch) records are sampled from the dataset in the first step. Choosing B_n records means that each state (S) will be equal to B_n records, and each record will contain S₁ to S_m features.

In the Next step (Fig. 3) (S_i), (h_i-1) and (A_i-1) which represent the current state, the hidden state of the previous step, and the previous action, respectively, enter the GRU module at any time (t). Eq. (1) calculations are performed in each GRU unit.

Z_i = δ (W_zS_i + U_zh_i−1)

r_i = δ (W_rS_i + U_rh_i−1) (1)

h^'_i = tanh (WS_i ʘ Uh_i−1)

h_i = (1-Z_i) ʘ h_i−1 + z_i ʘ h_i)

Z_i is the update gateway for state i. This gateway decides what information should be discarded and what new information should be kept in it. W_z and U_z are the weights of S_i and h_i−1, respectively. Next, the sigmoid activator function is used to produce the output (0 or 1). In this stage, the model decides how much information to transfer to the future. This action also avoids the problem of gradient vanishing. r_i is the reset gate. In this gate is decided what information from the past should be forgotten. W_r shows the weight of S_i and U_r shows the weight of h_i−1. We use h^'_i to specify how the two update and reset gates determine the output. The output of this function tells what to remove from the previous step. In the last step, the network decides to keep the information related to the current memory and transfer it to the network by calculating the h_i−1 vector. This requires the update gate to determine what information to retrieve from the current memory contents of hi and what from the previous step h_i−1. This step is calculated by the equation h_i.

The output of this module includes the prediction of the next observation S_i+1 and the next hidden state h_i. In fact, the produced h_i indicates what information from the previous step h_i−1 and what information from the current observation S_i should be stored. In addition to being used as an input at time t + 1 in the GRU, the generated output h_i is also sent to the binary detection module to perform attack detection. In general, we perform binary detection on data that have useful information from previous events and current events. In other words, we have provided the ability to learn in earlier time steps.

3.2 Binary detection phase

RL is based on the Markov Decision Process (MDP) theory. An MDP consists of tuples (S, A, T, R), where S is a set of states, A is a set of actions, T is a mapping function that presents the probability of transition from the pair (S, A) to a new state, and R is the reward function. The mapping function uses Markov probability. In such a way that the probability of transition to a new state depends only on the current state. MDP aims to learn the optimal policy so that it chooses the best action for each state. As mentioned deep Q-learning is used as a binary detection technique. Deep Q-learning is a model-free algorithm and it doesn’t create a model of the environment’s transition function. The purpose of Deep Q-learning is to estimate the Q-Value and solve the Q(S, A) function based on the experience samples in each time step according to the following formula:

Q(S.A)←Q(S_i,A_i)+α(R_i + γQ(S_i+1,A_i+1)-Q(S_i,A_i)) (2)

To obtain Q(S, A), it is necessary to calculate Q (S_i,A_i) which represents the function in state i, (S_i+1,A_i+1) the function in state i + 1, α is the learning rate and R_i is the reward for the action is A_i. Therefore, to update Q(S, A) for each action A in state S must calculate the estimated yield R_i + γQ (S_i+1, A_i+1). The estimated yield is also called TD-Target. Repeated execution of the update rule results in the correct Q(S, A).

Before implementing binary detection, the initialization of the parameters and the DQL must be done. It means that the output from the modeling phase is entered into the DQL and it determines the QV, SV, AV and B_n vectors.

As shown in algorithm1, we use two categories of iteration (inner iteration and outer iteration) to detect intrusion in the binary detection phase. The outer iteration is used to train the DQL model, and the inner iteration is responsible for improving the Q-values. Each B_n represents a state (S) which is determined at the beginning of the outer iteration. The specified current state (S_i) is entered into the outer loops to perform the DQL training process. At the end of the external iteration, the B_n value will be reset and the DNN parameters will be updated if needed.

As mentioned, inner iteration is used to improve Q-values. In the inner iteration, the Q-function is estimated by the DNN function. The values of B_n records of the environment represent the current state of the environment (note that the current state in this step is the output of the modeling step. It means that previous events have also affected the current state).

We use a deep neural network (Q-Network) to calculate Q(S, A). In each inner iteration, the characteristics of each record (values of variables) are entered into the DNN input layer. The DNN output layer also displays Q-values (intrusion/normal). All values obtained for each iteration are stored in the vector Q (Q-Value). The action that has the highest Q value (Q-Value) is predicted as the current action (A'_i) for the desired record according to the following equation.

Action = argmax (Q-Value) (3)

The noteworthy point is that the epsilon-greedy approach is used for exploration in DRL. Epsilon-greedy is a learning strategy based on the definition of reinforcement learning that helps the agent discover all possible actions by increasing the number of explorations and finding the optimal policy. An action selected by the epsilon-greedy approach is an action that is either randomly selected with probability ɛ or an action predicted with probability (1-ɛ). In the first inner iteration, the probability of choosing a random action is high. Over time, due to the use of the epsilon-greedy approach, this probability has decreased and actions are predicted with a probability of (1-ɛ). The DNN used is a three-layer deep neural network that uses the ReLU activation function for all layers. The loss function used is the Mean Square Error function.

In the following, the values in (A^'_i) and the label of each state (A_i) are used to calculate the reward, Eq. (4).

RV_i =Reward (AV_i,label_i) (4)

According to the Eq. (2) we need to calculate TD-Target. For this purpose, we must calculate QV_i+1. As you can see in Fig. 4 the next state (S_i+1) is used to calculate QV_i+1 and obtain TD-Target. Here we also use another deep neural network (Target-Network) with separate parameters from Q-Network). According to the Eq. (5):

QT_i=RV_i + γ Q (S_i+1, Ai_+ 1)

QT_i=RV_i + γ (QV_i+1) (5)

RV_i is the earned reward for S_i, [0, 1] € γ is the discount factor for future rewards and QV_i+1 is the state vector of S_i+1. The value of γ is updated in each iteration. After each iteration, the loss function will be calculated to improve the performance of the neural network (Target-Network). According to Eq. (6) loss function is calculated at the end of each inner iteration.

Loss= (QV_i-QT_i)

Loss = 1/n∑ (Q_s-Q_T) r + γQ(S^,,a^,))² (6)

At the end of the outer iteration, the updated parameters are sent to the Q-Network and Target-Network.

Using two neural networks (Target-Network, Q-Network) will lead to better stability of the model. Note that outer iterations continue until the all dataset is covered.

The event that is detected as an attack is provided to the cloud in order to detect the type of attack.

3.3 Multi-class detection phase

Classifying attacks into different types requires a more complex model and more resources. The multi-class detection phase of the proposed method is less sensitive to delay. Therefore, are used several algorithms in the cloud. The main idea of ensemble methods and the combination of several conceptually different machine learning classifiers is to achieve higher improvement and robustness than individual classifiers. In other words, ensemble methods often provide better estimates than the best estimators. Ensemble models perform best when independent predictors are used as much as possible because the chance of making different mistakes increases and the accuracy of the model increases. The main challenge of Ensemble-based methods is how to combine base classifiers (base learner). Here we use majority voting to aggregate the results.

All the information related to detected events (binary detection phase, multi-class detection phase) is sent to the summary module. This module is responsible for updating the model. The updated parameters are sent for the binary detection phase and the multi-class detection phase.

To evaluate the performance of the proposed method, experiments are carried out on CICIDS2018[31] datasets. First, we describe the dataset used in the experiment, then consider the metrics to evaluate the model, and elaborate on the setting parameters. Finally, we analyze the experimental results and compare our approach and existing techniques.

4.1 Dataset

Assessing the dataset accuracy and collecting based on the real world is a challenging issue. In this study, CICIDS2018 is used as input and the model is evaluated using the CSE-CIC-IDS2018 dataset. As a result of cooperation between the Communications Security Establishment (CSE) and The Canadian Institute for Cybersecurity (CIC) the CIC-IDS2018 datasets were created. They used the concept of profiles to make a cybersecurity dataset that contains an explanation of intrusions and abstract distribution models for protocols and applications. Heartbleed, Botnet, Web attacks, Brute force attacks, DoS, DDoS and infiltration are seven kinds of attacks in the CIC-IDS2018 dataset. Table 1 shows the percentage distribution of data types in CIC-IDS2018 dataset. The dataset includes the captured network traffic and system logs of each machine, along with 80 features extracted.

4.2 Machine Learning Performance Evaluation

We used different measurements to evaluate the performance of our DRL model and other ML algorithms, such as Accuracy, Precision, Recall, and F1- Measure (combines precision and recall). Accuracy is one of the most common metrics to evaluate a model. It measures the percentage of samples that are correctly classified and ignores samples that are misclassified. In general, the accuracy metric is insufficient to evaluate unbalanced datasets such as network traffic data, which usually include normal traffic. Therefore, using various metrics in IDS is necessary. All metrics are derived from the confusion matrix (True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). We analyze our proposed method based on performance metrics as shown in Table 2.

Table 1

Value and percentage distribution of data types in CIC-IDS2018 dataset
Traffic type	Distribution (%)
Benign	83.070
DDoS	7.786
DoS	4.031
Brute force	2.347
Botnet	1.763
Infiltration	0.997
Web attack	0.006

Table 2

Evaluation metrics
Metric	Formula
Accuracy	\(\frac{(TP+TN)}{(TP+FP+FN+TN)}\)
Precision	\(\frac{TP}{\left(TP+FP\right)}\)
Recall	\(\frac{TP}{\left(TP+FN\right)}\)
F1- Measure	\(\frac{2}{\frac{1}{Recal}+ \frac{1}{Precision}}\)

4.3 Implementation details and Parameter setting

Experiments were operated on Intel Core i7-8700 LGA1151 @ 3.70 GHz, 16 GB DDR4 RAM 2400 MHz machine. The proposed model was implemented in Python language. Tensorflow, Keras and Scikit-learn are important machine learning libraries that were used.

The hyperparameters involved in the experiment and the value of the hyperparameters of the algorithms are shown in Table 3.

Table 3

The hyperparameters value of the proposed model
Parameters	Values
Inner itration	50
Outer itration	100
ɛ	0.9
Decoy rate	0.99
γ	0.001
α	0.001
Mini Batch	500
Activation function(Q-network)	ReLU
Activation function(target-network)	ReLU
Hiden layer (Q-network)	3
Hiden layer (Target-network)	2

4.4 Result

At first, we experimented the proposed approach based on three different values of the discount factor (γ) (γ = 0.001, γ = 0.01, and γ = 0.1). The results obtained for Accuracy, Precision, Recall and F-Measure metrics are displayed in the table. According the Table 4, it can be concluded that with the increase of the discount factor (γ), the investigated metrics decrease, and with the decrease of the discount factor, these metrics increase. In general, the amounts of loss and reward obtained during the DQN training process depend on the value of the discount factor.

Table 4

Comparison of performance for different discount factors
	0.001 = γ	0.01 = γ	0.1 = γ
Accuracy	0.9597	0.8629	0.8234
Precision	0.9034	0.8276	0.8128
Recall	0.9171	0.7790	0.7640
F-Measure	0.9689	0.8803	0.8269

Since we get better results with γ = 0.001 in the proposed method, we will do the next comparisons with this value.

Figure 5 shows the binary detection results obtained from the comparison of the proposed method with NB, KNN, DNN, [32] and [33] in terms of accuracy, Precision, Recall and F- Measure. As can be seen, the proposed approach has obtained the highest value compared to other methods in all metrics. For example, the proposed method achieved a Precision of 0.9969.

Our proposed model can detect all kinds of attacks. In the continuation of the experiments, we determined the power of the proposed model in detecting types of attacks with accuracy and F1- Measure metrics. The results obtained are shown in Fig. 6 and Fig. 7 respectively. As it is clear the proposed method has achieved the highest accuracy and F1- Measure compared to other methods. According to Fig. 6 and Fig. 7, our findings demonstrate that training the DRL can result in better performance in multi-class classifying minority classes (web attacks and infiltration) over training on the true imbalanced dataset.

DRL models are suitable to be used in an industrial and online production environment. One of the most important advantages of it is that once trained, it provides the correct action for a specific state (intrusion features). Table 5 shows the training time and prediction time for the proposed DRL-based model and other models (KNN, NB and DNN). We have implemented a Bernoulli NB using only the discrete features in Naive Bayes (NB). The DNN was applied with three hidden layers with 1024, 512 and 128 nodes. Our proposed method requires more training time than methods KNN and NB, but it has much less prediction time. In general, DRL-based methods significantly reduce the prediction time, which makes them suitable for online detection and IoT network services.

Table 5

Training time and prediction time for the proposed DRL-based model and other models (KNN, NB and DNN).
Method	Training Time (sec)	Prediction Time (sec)
KNN	92.73	84.39
NB	6.82	0.98
DNN	308.69	0.82
Our approach	276	0.52

4.5 Summary of results

Our proposed DRL-based IDS provides binary detection and attack type detection in two separate steps. In the evaluation section, both components were examined and compared. The results showed that our proposed method was able to obtain better values in different metrics and provided a more robust solution.

Reducing the number of false negatives is one of the main goals of IDS. Therefore Recall is important in evaluation. According to the results, the model has achieved a high Recall in comparison to the other models. In addition, F1- Measure has also obtained excellent results and it can act as a solution for imbalanced datasets Considering the outcomes presented in Table 5, the proposed model provides much smaller prediction times than the experimented other model. Our solution makes the convergence slower but more stable. Overall, we can conclude that the performance of DRL models depends to a large extent on the choice of the value of the discount factor.

The results of the experiment support the claim that our approach can provide an effective and practical solution to automatically IDS in IoT. However, like other intrusion detection systems, our proposed IDS has some defects. We consider these weaknesses. By examining the limitations and solving them, more effective approaches can be done in the future.

One of the most fundamental limitations in the evaluation of intrusion detection systems is the lack of access to real datasets. We evaluated our proposed method on the CIC-IDS2018 dataset. This dataset has a limited number of attack types. Many other types of attacks that exist in IoT networks have not been considered. In addition, we have trained our method only on the CIC-IDS2018 dataset, thus, it needs to be retrained to be implemented in other environments. Therefore, investigating this method in other data sets with different types of attacks or creating a large dataset with various types of attacks can be future works.

Examining the generative algorithms in the proposed solution to overcome the challenges of unbalanced data and deal with new attacks is another future work.

In this work, we have proposed a Deep Q-learning-based (DQL) IDS in IoT networks. We identify traffic behavior hierarchically. Binary classification is used to allow fast detection in fog, especially for benign traffic. In the next step attack identification can be trained for different classes in the cloud. We model the environment based on GRU. This process aims to find internal relationships between events over a relatively long time. Deep Q-learning received the internal state from GRU as the input to the Q-function approximator, which is used for the action (normal/intrusion). In each iteration, parameters are updated to achieve better results. As mentioned, all events detected as intrusions are sent to the cloud for intrusion detection. In the cloud, several machine learning methods are used to increase robustness and improve performance. Using the ensemble method can be an excellent method for IDS.

We showed that the discount factor parameter is important in regulating the speed of convergence of the algorithm. The low value of this parameter converges the DQL algorithm quickly. Our experiment demonstrated that the suggested DQL approach can learn effectively from the environment in an autonomous manner, be successful in detecting unrepresented classes and can significantly reduce prediction times, which makes it suitable for online detection in IoT networks.

Ethical Approval Not applicable.

Competing interests Not applicable.

Author Contributions All authors contributed to the writing of the manuscript text and reviewed it.

Funding Not applicable.

Availability of data and materials Not applicable.

Wójcicki K (2022) Biega nska, M.; Paliwoda, B.; Górna, J. Internet of Things in Industry: Research Profiling, Application, Challenges and Opportunities—A Review. Energies 2022, 15, 1806
Lee I, Lee K (2015) The Internet of Things (IoT): Applications, investments, and challenges for enterprises. Business horizons 58:431–440
Sabireen H, Neelanarayanan V (2021) A review on fog computing: Architecture, fog with IoT, algorithms and research challenges. Ict Express 7:162–176
Goudarzi M, Palaniswami M, Buyya R (2019) A fog-driven dynamic resource allocation technique in ultra dense femtocell networks. Journal of Network and Computer Applications 145:102407
De Souza CA, Westphall CB, Machado RB (2022) Two-step ensemble approach for intrusion detection and identification in IoT and fog computing environments. Computers & Electrical Engineering 98:107694
Labiod Y, Amara Korba A, Ghoualmi N (2022) Fog Computing-Based Intrusion Detection Architecture to Protect IoT Networks. Wireless Pers Commun 125:231–259. https://doi.org/10.1007/s11277-022-09548-7
Halim Z, Sulaiman M, Waqas M, Aydın D (2023) Deep neural network-based identification of driving risk utilizing driver dependent vehicle driving features: a scheme for critical infrastructure protection. J Ambient Intell Human Comput 14:11747–11765. https://doi.org/10.1007/s12652-022-03734-y
Uprety A, Rawat DB (2020) Reinforcement learning for iot security: A comprehensive survey. IEEE Internet of Things Journal 8:8693–8706
Wang J, Hu J, Min G, et al (2020) Fast adaptive task offloading in edge computing based on meta reinforcement learning. IEEE Transactions on Parallel and Distributed Systems 32:242–253
Huang L, Bi S, Zhang Y-JA (2019) Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Transactions on Mobile Computing 19:2581–2593
Lu X, Xiao L, Xu T, et al (2020) Reinforcement learning based PHY authentication for VANETs. IEEE transactions on vehicular technology 69:3068–3079
Wang T, Liu Z, Zhang T, et al (2022) Adaptive feature fusion for time series classification. Knowledge-Based Systems 243:108459
Abou El Houda Z, Brik B, Khoukhi L (2022) “why should i trust your ids?”: An explainable deep learning framework for intrusion detection systems in internet of things networks. IEEE Open Journal of the Communications Society 3:1164–1176
Sadaf K, Sultana J (2020) Intrusion detection based on autoencoder and isolation forest in fog computing. IEEE Access 8:167059–167068
Illy P, Kaddoum G, Moreira CM, et al (2019) Securing fog-to-things environment using intrusion detection system based on ensemble learning. In: 2019 IEEE wireless communications and networking conference (WCNC). IEEE, pp 1–7
Deng Q, Goudarzi M, Buyya R (2021) FogBus2: a lightweight and distributed container-based framework for integration of IoT-enabled systems with edge and cloud computing. In: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments. ACM, Virtual Event China, pp 1–8
Goudarzi M, Wu H, Palaniswami M, Buyya R (2020) An application placement technique for concurrent IoT applications in edge and fog computing environments. IEEE Transactions on Mobile Computing 20:1298–1311
Sethi K, Kumar R, Prajapati N, Bera P (2020) Deep reinforcement learning based intrusion detection system for cloud infrastructure. In: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE, pp 1–6
Yang Y, Tu S, Ali RH, et al (2023) Intrusion detection based on bidirectional long short-term memory with attention mechanism
Strickland C, Saha C, Zakar M, et al (2023) DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection. arXiv preprint arXiv:230103368
Lazzarini R, Tianfield H, Charissis P (2023) A Stacking Ensemble of Deep Learning Models for IoT Network Intrusion Detection. A Stacking Ensemble of Deep Learning Models for IoT Network Intrusion Detection
Mohamed D, Ismael O (2023) Enhancement of an IoT hybrid intrusion detection system based on fog-to-cloud computing. J Cloud Comp 12:41. https://doi.org/10.1186/s13677-023-00420-y
Stefanova ZS, Ramachandran KM (2018) Off-policy q-learning technique for intrusion response in network security. World Academy of Science, Engineering and Technology, International Science Index 136:262–268
Lopez-Martin M, Carro B, Sanchez-Esguevillas A (2020) Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Systems with Applications 141:112963
Sharma P, Jain S, Gupta S, Chamola V (2021) Role of machine learning and deep learning in securing 5G-driven industrial IoT applications. Ad Hoc Networks 123:102685
Ashenafi A (2022) A Model to Detect MiTM Attack in IoT Networks: A Machine Learning Approach. PhD Thesis, St. Mary’s University
Daoud WB, Mahfoudhi S (2022) SIMAD: Secure Intelligent Method for IoT-Fog Environments Attacks Detection. Computers, Materials & Continua 70:
Sarwar A, Alnajim AM, Marwat SNK, et al (2022) Enhanced anomaly detection system for iot based on improved dynamic SBPSO. Sensors 22:4926
Sewak M, Sahay SK, Rathore H (2022) Deep Reinforcement Learning in the Advanced Cybersecurity Threat Detection and Protection. Inf Syst Front. https://doi.org/10.1007/s10796-022-10333-x
Abou Ghaly M, Hannan SA (2024) Protecting Software Defined Networks with IoT and Deep Reinforcement Learning. International Journal of Intelligent Systems and Applications in Engineering 12:138–147
A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018) - Registry of Open Data on AWS. https://registry.opendata.aws/cse-cic-ids2018/. Accessed 26 Dec 2023
Labiod Y, Amara Korba A, Ghoualmi N (2022) Fog Computing-Based Intrusion Detection Architecture to Protect IoT Networks. Wireless Personal Communications 1–29
de Souza CA, Westphall CB, Machado RB (2022) Two-step ensemble approach for intrusion detection and identification in IoT and fog computing environments. Computers & Electrical Engineering 98:107694

No competing interests reported.

A novel reinforcement learning-based hybrid intrusion detection system on fog-to-cloud computing

Status:

Version 1

Abstract

Figures

Introduction

Related Work

Model description

3.1 preprocessing and environment modeling phase

3.2 Binary detection phase

3.3 Multi-class detection phase

Experimental Evaluation

4.1 Dataset

4.2 Machine Learning Performance Evaluation

4.3 Implementation details and Parameter setting

4.4 Result

4.5 Summary of results

Limitation and Future Scope

Conclusion

Declarations

References

Additional Declarations

Status:

Version 1