IoT-based intrusion detection system for healthcare using RNNBiLSTM deep learning strategy with custom features

The need for security in healthcare environments has become increasingly important due to the rise of cyber-attacks and data breaches. To address this issue, this paper proposes an Internet of Things (IoT)-based intrusion detection system for healthcare using a deep learning strategy with custom features. The proposed system uses IoT technology to gather real-time data from medical devices and sensors deployed in a healthcare environment, and incorporates a recurrent neural network (RNN) combined with a bidirectional long short-term memory (BiLSTM) algorithm to detect and classify intrusion attempts. Custom features are extracted from the incoming data streams and used to train the deep learning models. The proposed system is evaluated on a dataset comprising different types of intrusion scenarios; on IoTID20 with custom features, the RNNBiLSTM model achieves an accuracy of 99.16%, an error rate of 0.008371%, a sensitivity of 99.89%, and a specificity of 98.203%. The results demonstrate the effectiveness of the proposed system in detecting and mitigating security threats in a healthcare environment. The system has the potential to improve patient privacy and security, ultimately leading to better healthcare outcomes.


Introduction
IoT devices generate vast amounts of data, which makes detecting attacks almost impossible without proper mechanisms. Intrusion Detection Systems (IDS), defense systems that monitor network activity in IoT devices, can offer this security. A network IDS can detect vulnerability exploits within network traffic as well as various security attacks. An IDS monitors traffic and alerts administrators when behavior deviates from the norm, watching the system's networks for potential intrusions [21]. When unusual activity is found, the security information and event management system notifies the administrator [23], and the security system is alerted to the suspicious activity.
IoT comprises several layers, among them a network layer. This layer moves data packets between hosts and is designed around the customary layers used for internet communication. A key component of the IoT architecture, the network layer is multifaceted and prone to many security concerns, and several security frameworks exist to address them. Figure 1 outlines the proposed IDS framework environment for IoT.

Figure 1: Proposed IDS Framework Environment
Machine learning is used to model new network behavior through trustworthiness and to evaluate it. Machine learning approaches are considered the ideal way to handle the massive, continuous volume of data produced by IoT devices [22]. Deep learning (DL) is used to derive interpretations and predictions from deep analysis of patterns in the data, and IoT environments employ various DL techniques to detect patterns of abnormal behavior. Using DL techniques, we can build a prediction system that is effective and adaptive in exploring and classifying volatile and unpredicted intrusions, which are inherent in dynamic attack techniques. A large portion of the literature lacks both an examination of feature-misperception aspects during learning and an assessment of false-positive (FP) and false-negative (FN) rates during prediction [7,8,21,25].
For intrusion prediction, the proposed model utilizes the IoTID20 dataset. By generating a set of novel features from the dataset, the model fills the gaps in existing studies, removing the potential for misperception during learning. Eliminating such misconceptions raises prediction accuracy and lowers the FN and FP rates [23].
This paper proposes a system that inspects intrusions against IoT devices in smart-home and medical environments using the IoTID20 dataset. With a deep learning strategy, the prediction system recognizes intrusions to achieve the following goals. (i) As part of pre-processing, null-value columns and redundant rows are removed and data are encoded to a uniform format, so that features can be evaluated further during prediction. (ii) To avoid misperceptions during feature evaluation, novel custom features are derived from the cleaned dataset. (iii) To obtain significant features for prediction, a feature selection strategy is applied to the cleaned set. Both the custom feature set and the significant feature set are used to assess prediction accuracy. Intrusions are classified using three advanced deep learning algorithms, CNN, ANN, and RNNBiLSTM, the last of which combines two deep learning algorithms to produce accurate results.

Research Contributions
➢ Probability of improving IDS models with an uncluttered dataset.
➢ Construction of a custom feature set to avoid misperceptions of data.
➢ Application of the significant feature selection algorithm to an uncluttered dataset.
➢ Forecasting attacks in large datasets using custom features.
➢ Checking prediction accuracy with both the significant feature set and the custom feature set.

Paper Organization
This section outlines the organization of the paper, which comprises five sections. Sections 1 and 1.1 give the introduction and a summary of the paper. Background information and a review of the literature are covered in Section 2. Section 3 contains the entire proposal, including dataset collection, custom feature construction, feature selection, and classification. Section 4 evaluates the performance of the methods used to detect attacks. The conclusions of the study are presented in Section 5.

Review of Literature
Hasan Alkahtani et al. describe a robust IoT intrusion detection system that employs PSO for feature selection and CNN with LSTM to identify attacks [2]. Using the NSL-KDD and UNSW-NB15 datasets, Kun Xie et al. created an intrusion detection system by combining a fuzzy multivariate optimizer algorithm with an ANN to identify unique threats [3][26]. Nahida and Farhin et al. [4] performed data analysis for intrusion detection in the IoT environment using the public datasets NSL-KDD, DS2OS, IoTDevNet, IoTID20, and IoTBotnet, applying various shallow and deep learning algorithms to identify IoT threats. Yahalom et al. [5] developed techniques that enhance detection performance and lower the false positive rate using ensemble algorithms on hierarchical data produced by IP cameras and IoT devices over the MIL-STD-1553 communication protocol [27].
To construct an adaptive intrusion detection system with higher accuracy, Liu et al. [20] used fuzzy-rough-set-based feature selection and a genetic-algorithm-based learning approach. Chiba et al. [6] developed an intelligent method for building deep-neural-network-based IDS by maximizing fitness-value hashing to find anomalies in networks using the CICIDS2017 dataset. The intelligent intrusion detection system designed by Vinayakumar et al. [7] detects unpredictable cyber-attacks based on deep learning; the framework was trained and tested on the NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and CICIDS17 datasets.
To detect low-frequency attacks, Pajouh et al. [8] presented a unique two-layer dimension-reduction model employing supervised and unsupervised feature selection algorithms for intrusion detection. The model is also proposed to detect attacks not only in the network layer but also in other layers, such as the application and support layers, using various protocols. Using a deep neural network classifier and binary classification, Mohammed Maithem et al. [9] developed a model to identify unexpected threats; detection is carried out by analyzing network packets and connection parameters without knowledge of the packet payload [24]. To reduce the processing cost of computation by reducing instances in the IoTID20 dataset, Qaddoura et al. [10] introduced a novel data-reduction approach using K-means++ clustering and SVM-SMOTE oversampling with SLFN classification.
Lirim Ashiku et al. [11] applied DNN architectural principles in a resilient and adaptive network intrusion detection system for sensing known and zero-day attacks, using a CNN with a regularized multi-layer perceptron. Saba et al. [12] employed Principal Component Analysis to select significant features and ensemble classifiers to detect assaults on Internet of Medical Things (IoMT) devices in a smart healthcare environment [29], demonstrating the efficiency of their intrusion detection system using accuracy, precision, recall, and F-score. Abu Al-Haija et al. [13] proposed an efficient autonomous darknet traffic detection system, a multipurpose high-performance IoT IDS built on the CICDarknet2020 dataset with six machine learning techniques for the darknet classes VPN and TOR, evaluated using accuracy, positive prediction rate, harmonic mean score, and classification error percent.
Tavallaee et al. [14] proposed NSL-KDD, a subset of KDD99, to overcome the issues of KDD99 through statistical analysis, improving the performance measures; it contains traces of DoS, Probe, U2R, and R2L attacks. The AWID dataset developed by Kolias et al. [15] is a well-tailored dataset with real traffic traces from 802.11 networks to assist researchers working on wireless IDS; it harvested traces of real-time network utilization including key-retrieving attacks (FMS, KoreK family, PTW, ARP injection), key-stream-retrieving attacks (Chop-Chop, Caffe Latte), availability attacks, and MITM attacks. Moustafa et al. [16] published the UNSW-NB15 dataset for academic research, which includes a hybrid real-time model of normal and contemporary synthesized attack network traces covering Worm, ShellCode, Reconnaissance, Generic, Fuzzers, Exploit, DoS, and Backdoor attacks. Sharafaldin et al. [17] generated the CICDDoS2019 dataset to detect common DDoS attacks such as NTP, DNS, LDAP, NetBIOS, SNMP, SSDP, UDP-Lag, WebDDoS, SYN, TFTP, PortScan, MSSQL, and UDP, along with the most important features an IDS needs, to support the design of real-time DDoS detectors. Special attention is given to the Mirai botnet attack in the IoT Network Intrusion Dataset devised by Hyunjae et al. [18], which contains packets captured from a Wi-Fi camera and SKT NUGU (NU 100). The IoTID20 dataset generated by Ullah et al. [19] is purposely designed to detect IoT botnet attacks and their families, such as DoS, MITM, and Scan; it includes packet traces of IoT attacks against AI speakers, security cameras, and smartphones, with both flow-based and network features to detect intrusions in smart objects.

Problem Formulation and Motivation
Analysis of the related research shows that most work builds intrusion detection systems on the benchmark datasets NSL-KDD and UNSW-NB15, while research based on the IoTID20 dataset is scarce. Related research reports that these existing benchmark datasets lack the features needed to handle the latest IoT-based attacks, leading to low prediction levels, which is one of the key challenges. In addition, most work focuses on host-based and network-based intrusion detection, a key motivation for this work to focus instead on flow-based and IoT-based intrusion detection. Another motivation is the method of collecting significant features and using them for prediction. Because rule-based, data mining, and machine learning approaches cannot predict future IoTID20 attack patterns and tend to misinterpret attacks, deep learning techniques are offered as a solution to these problems.

Proposed Model
In addition to the standard features, the dataset's innovative custom features function as a crucial component in designing the system that detects intrusions within the IoT context and offer an efficient IDS. The architecture of the proposed model, shown in Figure 2, includes five layers containing the components that predict attacks. Among these layers, the novel contribution of this work is performed in the Construction Layer, whose Custom Attribute Constructor derives novel features from the dataset by statistically modelling its existing features. This novel custom attribute construction assists in sophisticated assessment for exact prediction. In previous work, features are reduced with feature selection strategies, which may reduce accuracy and introduce imperceptions. To avoid this, the work proposes a custom feature set as a unique selection of attributes, and the resulting set is compared with the significant feature set obtained using feature selection strategies. The components of the five layers are described as follows.

Data Acquisition
The IoTID20 dataset contains a wide range of network attack types that can occur in an Internet of Things (IoT) context. It was created using a general smart environment comprising smart devices including a smart lock, Wi-Fi camera, smart alarms, and smart lights. Tablets, laptops, and smartphones are among the smart devices connected to the smart Wi-Fi router. The Wi-Fi cameras are the IoT victim devices, whereas all other devices act as attackers.

Dataset Description
Packet files are captured using a wireless network adapter in monitor mode. The IoTID20 dataset contains 222 MB of records for training and testing, comprising 625,784 records with 86 fields [28]; the input dataset is therefore represented by an array of size 86 x 625784. The dataset is produced in CSV format. In IoTID20, each instance is identified by one of three labels and described by 83 network features. The label fields of the dataset are binary, category, and sub-category.

Data Parser and Cleaner
Deep learning algorithms are used in the data cleaning process to identify possible errors in large datasets. A second challenge is avoiding learning from noisy data and building a biased model when there is no indication of how the data quality might have been compromised. It takes considerable time and effort to clean the dataset and create an error-free dataset for machine learning. Best practices for cleaning data for machine learning are to fill missing values, remove unnecessary rows, reduce the data's size, and implement a data quality plan. Missing values in the training or test datasets are tolerated by ignoring, deleting, or removing them, and fields containing unwanted or irrelevant information are removed. The fields Bwd_Byts/b_Avg and Fwd_Byts/b_Avg contain only the value 0, so these columns are removed. Redundant columns are also eliminated, with the reasons for eliminating them. For example, it is unnecessary to include Cat or Sub_Cat, since a column named 'Label' (Anomaly and Normal) exists; these columns are excluded to avoid duplicate values. The Redundant Column Removal using Brute Force (RCRBF) algorithm is used to remove duplicate columns.
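The brute-force idea behind RCRBF can be sketched in a few lines of Python; the representation of columns as plain lists keyed by name, and the sample column names, are illustrative assumptions rather than the authors' implementation:

```python
def rcrbf(columns):
    """Brute-force duplicate-column detection (sketch of the RCRBF idea).

    `columns` maps column name -> list of values; returns the names of
    columns whose values duplicate an earlier column.
    """
    names = list(columns)
    duplicates = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Pairwise comparison of whole columns, as in the brute-force scheme.
            if columns[names[i]] == columns[names[j]] and names[j] not in duplicates:
                duplicates.append(names[j])
    return duplicates

# Example: 'Cat' carries the same information already present in 'Label'.
data = {
    "Label": [0, 1, 1, 0],
    "Cat":   [0, 1, 1, 0],   # identical to Label -> flagged for removal
    "Flow":  [3, 5, 2, 9],
}
print(rcrbf(data))  # ['Cat']
```

A real pipeline would drop the flagged columns afterwards; quadratic comparison is acceptable here because the number of columns (86) is small.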

Field Encoder
For the deep learning technique, the training set must contain numerical values, so all features are converted to numerical form. The dataset contains information about flow IDs, source IP addresses, and destination IP addresses, and a binary feature is included in the "Label" column. In the encoding phase, string and binary features are converted into numeric features, and all data fields are transformed into a common type.

Feature Name              Feature Values      Encoded Value
Source / Destination IP   192.X.X.X           0 to N
Label                     Anomaly / Normal    0/1

Table 1: Encoding Features Values of IoTID20
As shown in Table 1, IoTID20 fields are encoded with binary values and string values; the table lists the fields with their binary-encoded and string-encoded values. For example, the anomaly label is encoded with the binary value '0', and the IP address fields are string values encoded as integers from 0 to N.
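The encoding in Table 1 can be sketched as follows; the helper name and the sample IP values are hypothetical, but the "0 to N" integer mapping for strings and the binary label mapping follow the table:

```python
def encode_strings(values):
    """Map each distinct string (e.g. an IP address) to an integer 0..N,
    mirroring the '0 to N' string encoding in Table 1."""
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused integer code
        encoded.append(mapping[v])
    return encoded, mapping

ips = ["192.168.0.13", "192.168.0.24", "192.168.0.13"]
codes, mapping = encode_strings(ips)
print(codes)  # [0, 1, 0]

# Binary label encoding as read from Table 1: Anomaly -> 0, Normal -> 1.
label_map = {"Anomaly": 0, "Normal": 1}
print([label_map[x] for x in ["Anomaly", "Normal"]])  # [0, 1]
```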

Custom Attribute Constructor
This section describes the component that builds novel custom attributes from the fields of the cleaned dataset Dc. The key aim of constructing the novel feature set DCA is to integrate the dataset fields so as to avoid imperceptions during attack prediction with large amounts of data. The custom features are derived by statistical modelling to obtain non-zero-value features and to filter zero-value features from the dataset. Applying analysis and prediction to the set DCA reduces the learning time and the false-positive and false-negative rates, while increasing the learning rate and accuracy of the attack recognizer. The custom attribute constructor builds the following custom attributes:

c) Average Ratio of Backward Segment Size (ARBSS):
The average ratio of the backward packet segment size is computed relative to the sum of the forward and backward segment sizes:
ARBSS = Bwd_Seg_Size_Avg / (Fwd_Seg_Size_Avg + Bwd_Seg_Size_Avg) (3)

d) Ratio of Forward IAT (RFIAT):
This feature computes the ratio of inter-arrival time (IAT) for the transmitted forward packets. RFIAT is computed by evaluating the IAT of forward packets proportional to the combined IAT of forward and backward packets:
RFIAT = Fwd_IAT / (Fwd_IAT + Bwd_IAT) (4)
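The two ratio features above can be sketched as plain functions; the argument names are assumptions standing in for the corresponding IoTID20 fields, and a zero-denominator guard is added for safety:

```python
def arbss(fwd_seg_avg, bwd_seg_avg):
    """Average Ratio of Backward Segment Size (feature c):
    backward average segment size over the forward + backward total."""
    total = fwd_seg_avg + bwd_seg_avg
    return bwd_seg_avg / total if total else 0.0

def rfiat(fwd_iat, bwd_iat):
    """Ratio of Forward IAT (feature d):
    forward inter-arrival time over the forward + backward total."""
    total = fwd_iat + bwd_iat
    return fwd_iat / total if total else 0.0

print(arbss(120.0, 80.0))  # 0.4
print(rfiat(30.0, 10.0))   # 0.75
```

In practice these would be applied row by row over the cleaned dataset Dc to populate the custom feature set.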

Attribute Reducer
In accordance with the feature selection technique, the Attribute Reducer decreases the fields in the cleaned dataset Dc to retrieve the significant features. Using a novel feature selection technique, the attribute reducer finds the significant features from the set Dc as follows:

Proposed Method-Accessible Contiguous Attribute Assessment and Selection (ACAAS)
In the Accessible Contiguous Attribute Assessment and Selection (ACAAS) method, the significance of contiguous attributes (i.e., those obtained by the arithmetic mean of a sequence-variable subset) may differ, since the more rapidly a variable can be accessed, the greater its effect on the prediction of the target variable. It is assumed that the most relevant features among the accessible contiguous attributes determine the target feature variable most accurately. Let {fs1, fs2, ..., fsm} be the 'm' numerical variables, {x1, x2, ..., xn} the 'n' observations of dataset DN, and y = (y1, y2, ..., yn) the target variable. In ACAAS, the selected FSi represents the arithmetic mean of the target variable's accessible contiguous attributes from the training set, a real number derived from the selected variable's value in the observation xi from the test set.
Here, ACAttr denotes the accessible contiguous attributes, and yTarget(j) is the target-variable feature labelling of the ACAttr of testing-set observation xi. However, this attribute-levelling extension assumes that observations correlated with one another are similar. Therefore, estimates of the target feature variable should be more strongly affected by the correlated contiguous attributes, even when all attributes with accessible contiguous attributes are equally important. A levelling assessment of the attributes can also be used to determine an attribute's effectiveness for determining the target feature variable.

The selection of the target variable ŷi after considering the correlation assessment levelling can be defined as follows, where Cij represents the correlation between xi and xj, and l(Cij) is a levelling function of the association Cij taking values between 0 and 1.
The proposed ACAAS has the advantage of attribute levelling and selection. The assumption here is that the most relevant attributes among the most accessible ones are most likely to determine the target feature variable accurately. To determine correlations between observations, a feature levelling technique is used. A levelling correlation exists between two random observations xi and xj if Lv is the levelling vector and Cij is the correlation between them; within this correlation association, an accessible contiguous attribute is selected. It is important to note, however, that a change in the levelling can result in a change in the feature vector with ACAttr.
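As a rough sketch of the correlation-levelling idea (not the authors' exact ACAAS procedure), one can keep only the features whose correlation with the target exceeds a levelling threshold in [0, 1]; the threshold value and feature names below are illustrative:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def select_features(features, target, level=0.5):
    """Keep features whose |correlation| with the target exceeds `level`,
    a stand-in for the ACAAS levelling function l(C) in [0, 1]."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) > level]

features = {
    "RFIAT": [0.9, 0.8, 0.1, 0.2],  # tracks the target closely
    "Noise": [0.5, 0.5, 0.4, 0.6],  # essentially unrelated
}
target = [1, 1, 0, 0]
print(select_features(features, target))  # ['RFIAT']
```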

Attack Recognizer
The attack recognizer describes the classification strategies used in the detection layer to identify attacks throughout the network. In accuracy analysis, detection rates and false alarm rates are key factors, so the intrusion detection system must be enhanced to increase the prediction rate and reduce false alarms. A deep learning algorithm can detect anomalies in networks without labelled datasets by learning the network's typical patterns and behaviours; while it is capable of detecting new types of intrusions, it is also prone to false positives.

Training and Testing Splitter
Using the FS algorithm, the significant features SFS are determined from the cleaned dataset Dc. For the purposes of the prediction model, the dataset Dc is divided into a training set and a testing set. The Training and Testing Splitter partitions the dataset by randomly sampling a fraction (one-fourth) of the data. Along with the class labels, SFS is assigned to DTrain. A 75:25 split ratio divides the set into training (DTrain) and testing (DTest) portions.

DTrain = 75% and DTest = 25%
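The 75:25 random split can be sketched as follows; the helper name and the fixed seed are illustrative choices, not part of the paper's method:

```python
import random

def split_75_25(records, seed=42):
    """Randomly hold out one-fourth of the records for testing (75:25)."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = split_75_25(data)
print(len(train), len(test))  # 75 25
```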
To train the structure data, one of the well-known models is the recurrent neural network.The conventional RNN model has learning issues when training large sets.The proposed model for RNN with bidirectional long short-term memory is used to solve this problem.

Recurrent Neural Networks with Bi-Directional Long Short Term Memory (RNNBiLSTM)
Recurrent neural networks (RNNs) are feed-forward neural networks with recurrent cycles. Because an RNN is cyclic, it is ideal for modeling sequences of events. X, H, and Y denote the input, hidden vector, and output sequences, respectively, with the input sequence X = (x1, x2, ..., xT). For t = 1 to T, an RNN computes the hidden vector sequence H = (h1, h2, ..., hT) and the output vector sequence Y = (y1, y2, ..., yT) as follows.
h_t = σ(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y (22)
Here 'σ' is a nonlinearity function, 'W' a weight matrix, and 'b' a bias term. With back-propagation, RNNs deal with inputs of variable length during training. The model is first trained using the TT data, and a gradient describing the output error is saved at each step. Despite the difficulty of training an RNN, it is easy to spot when the gradient vanishes or explodes during TT training.
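The recurrence above can be sketched in NumPy; the weight matrices here are random placeholders, and tanh stands in for the nonlinearity σ:

```python
import numpy as np

def rnn_forward(X, W_xh, W_hh, W_hy, b_h, b_y):
    """Unrolled forward pass of a vanilla RNN:
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),  y_t = W_hy h_t + b_y."""
    H, Y = [], []
    h = np.zeros(W_hh.shape[0])          # h_0 initialised to zeros
    for x_t in X:                        # one step per time slice t = 1..T
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        H.append(h)
        Y.append(W_hy @ h + b_y)
    return np.array(H), np.array(Y)

rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 4, 8, 2, 5
X = rng.normal(size=(T, d_in))
H, Y = rnn_forward(X,
                   rng.normal(size=(d_h, d_in)),
                   rng.normal(size=(d_h, d_h)),
                   rng.normal(size=(d_out, d_h)),
                   np.zeros(d_h), np.zeros(d_out))
print(H.shape, Y.shape)  # (5, 8) (5, 2)
```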

(i) BiLSTM RNN Model
In sequence classification problems, bidirectional LSTMs can improve model performance when every time step of the input sequence is available. In a bidirectional LSTM, two LSTMs are trained on the input sequence rather than one: the first recurrent layer of the network is duplicated, so two layers exist side by side. Figure 3 shows how the input sequence is given to the first layer in its original form and to the second layer as a reversed copy.

Figure 3. RNNBiLSTM Layers Formulation
Multilayer LSTM RNNs are constructed by stacking LSTM layers. LSTM RNNs are already deep architectures: because each LSTM layer shares the same set of parameters across time, they can be regarded as deep feed-forward neural networks when unrolled. In this model, the inputs pass through multiple non-linear layers, whereas a single non-linear layer would process each time instant's features only once before producing that instant's output.
A bidirectional recurrent neural network (BiRNN), which can be trained using all the input data in the past and the future of a given time frame, overcomes the constraints of the conventional RNN described in the preceding section. Essentially, the state neurons of a standard RNN are split into two parts: one managing the forward states (positive direction of time) and the other managing the backward states (negative direction of time). There is no connection between the inputs of the backward states and the outputs of the forward states.
In the absence of the forward states, an RNN with a reversed time axis is obtained. Because both time directions are handled by a single network, an objective function can be minimized directly with input data from the past and the future of the time frame under consideration, whereas the traditional unidirectional RNN requires delaying the inclusion of future data.
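The bidirectional idea can be sketched by running a single recurrent step in both time directions and concatenating the aligned hidden sequences; a simple tanh cell is used here in place of a full LSTM, and the dimensions are illustrative:

```python
import numpy as np

def rnn_pass(X, W_xh, W_hh, b_h):
    """One directional pass: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])
    out = []
    for x_t in X:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        out.append(h)
    return np.array(out)

def bidirectional(X, fwd_params, bwd_params):
    """Forward pass over t = 1..T, backward pass over t = T..1, then
    concatenate, so each time step sees both past and future context."""
    h_fwd = rnn_pass(X, *fwd_params)
    h_bwd = rnn_pass(X[::-1], *bwd_params)[::-1]  # re-align after reversal
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(1)
T, d_in, d_h = 6, 4, 8

def make_params():
    return (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)),
            np.zeros(d_h))

X = rng.normal(size=(T, d_in))
H = bidirectional(X, make_params(), make_params())
print(H.shape)  # (6, 16): forward and backward states concatenated
```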

(b) Input Layer of RNNBiLSTM
An RNN processes data in one direction only: from the input layer through the hidden layers to the output layer. Information is transported through the network from one node to another without visiting any node twice, flowing around the RNN through a loop. The network considers the current input as well as what it has learned from prior inputs before making a decision. RNNs typically have only short-term memory; the LSTM adds long-term memory as well. In this sense, the memory can be compared to a gated cell, where the cell decides whether to store or delete information based on the priority it assigns [31]. The input layer of the RNNBiLSTM model is initiated with an input shape of 25 x 337006 for FS_IoTID20 and 17 x 337006 for the CF_IoTID20 dataset; the input layer NI is therefore constructed with sizes 25 and 17 for FS_IoTID20 and CF_IoTID20, respectively.

(c) Hidden Layer of RNNBiLSTM
RNNs link the hidden layer through a feedback loop. Due to this recurrent feedback link, the hidden-layer input of the RNN at a given time is partially derived from the hidden-layer output at a prior moment, meaning the RNN has memory of all previous moments. In the RNNBiLSTM algorithm, the input sequence is scanned by two distinct hidden recurrent layers in opposing directions, and because the output layer is connected to both layers, data can be retrieved from both directions.
The hidden layer is created using input sizes of 25 and 17 for FS_IoTID20 and CF_IoTID20, respectively. At time t in the RNN's forward direction, x(t) is fed into the hidden layer, which constructs the hidden state. Forward RNN status values are transmitted to the forward layer, while backward RNN status values are transmitted only to the backward RNN layer.

(d) Bi-LSTM Layers
Both the forward and reverse RNNs receive the input at each time node. The RNNBiLSTM analyzes the entire sequence before calculating the final result. Forward and backward passes across the BiRNN network are conducted using the TT in a manner similar to a standard MLP. To determine all predicted outputs, input data pass through the BiRNN one time slice at a time (1 ≤ t ≤ T). The forward pass covers the forward states for t = 1 to T, and the backward pass covers t = T to 1. The output-layer forward pass uses the forward-layer activation 'Softmax' with the return sequence set to 'True' for both CF_IoTID20 and FS_IoTID20. The unbiased derivative of the function is also evaluated for the forward-layer time slice (1 ≤ t ≤ T), after which the layer's neurons are backward-passed. During the backward phase, the forward states run from t = T to 1 and the backward states from t = 1 to T. In the Bi-LSTM layers, the backward layer is configured with 'Adam' and the return sequence set to 'True'.

(e) Output Layer of RNNBiLSTM
The forward and backward outputs are synthesized by combining them and connecting them to the RNNBiLSTM outputs (0/1). Training calculates the loss at the current time node. RNNBiLSTM then performs the reverse transmission process of the RNN, which is trained by the TT: the output-layer error is first calculated and then passed along to the hidden layers of the forward and reverse RNNs, after which the model parameters are optimized according to the gradient of the error.

(f) Optimizer
Using the RMSprop optimizer, the proposed model restricts variation in the vertical direction while enhancing the learning rate. To reduce losses, the RMSprop optimizer updates network attributes such as weights and learning rates; optimization problems are solved by minimizing functions. The predicted class label is obtained with the 'RMSprop' optimizer and 'categorical_crossentropy' as the loss function.

Performance Evaluation
The framework is implemented in Python 3.2 in Anaconda on a 64-bit Windows machine with an Intel Core i7-4600M processor at 2.90 GHz, 16 GB of RAM, and an 8 GB GPU. The effectiveness of the proposed model is assessed, and it is found to perform better than others in terms of prediction accuracy and error rates. The following performance metrics illustrate how well the proposed model performs.
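The metrics used in this evaluation (accuracy, error rate, sensitivity, specificity, FDR, FOR) follow from confusion-matrix counts; the sample counts below are illustrative, not the paper's results:

```python
def metrics(tp, tn, fp, fn):
    """Standard IDS evaluation metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,
        "error_rate":  (fp + fn) / total,
        "sensitivity": tp / (tp + fn),   # recall / detection rate
        "specificity": tn / (tn + fp),   # true negative rate
        "fdr":         fp / (fp + tp),   # false discovery rate
        "for":         fn / (fn + tn),   # false omission rate
    }

m = metrics(tp=990, tn=980, fp=20, fn=10)
print(round(m["accuracy"], 3), round(m["sensitivity"], 3))  # 0.985 0.99
```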

Comparative Analysis
This section presents a comparative analysis, with illustrations, of existing algorithms applied to the IoTID20 dataset. Because attack patterns keep changing, the accuracy of the machine learning algorithm must be increased to forecast new attacks [19]. The comparative analysis compares benchmark methods applied to IoTID20 with the proposed RNNBiLSTM model using the accuracy and error-rate metrics, illustrated as follows.

Figure 7: Accuracy Ratio
Figure 7 compares the accuracy ratio of the benchmark methods with that of the proposed RNNBiLSTM model. The figure shows that the proposed RNNBiLSTM model achieves a higher accuracy rate than the others when applied to IoTID20.

Conclusion
To enhance the operational functionality of IoT devices that connect various utility, government, and health services for citizen welfare, an enormously reliable and assured safety network must be provided. One component of achieving this is an efficient intrusion detection system. This study suggests a framework for an effective intrusion detection system that employs deep learning techniques to analyse and identify intrusions on the IoTID20 dataset. Based on the dataset and the novel custom feature derivation, the proposed model makes effective predictions, using the IoTID20 and IoTID20_CF features for deep-learning-based prediction. A data cleaner and parser eliminate duplicate records, missing fields, and null values from the raw dataset, and novel custom features are constructed from the cleaned dataset to improve the learning rate and avoid misperception during prediction.
ACAAS is employed in this study to improve detection and prediction accuracy, since it selects the most significant features. RNNBiLSTM is used to identify attacks in CF_IoTID20 and FS_IoTID20. In terms of the performance measures accuracy rate, error rate, specificity, sensitivity, FDR, and FOR, the results show that the proposed RNNBiLSTM technique with the innovative custom feature set CF_IoTID20 outperforms existing methods.

• Data Extraction Layer with Data Acquisition component
• Pre-Processing Layer with Data Parser and Cleaner, Field Encoder components
• Construction Layer with Custom Attribute Constructor component
• Selection Layer with Attribute Reducer component
• Detection Layer with Attack Recognizer component

Figure 2. Proposed Architecture

Algorithm RCRBF() {
    Input: Dataset D
    Output: Duplicate records DDupli
    DSize = Find_Size(D)                  // calculate the size of the dataset
    Initialize the array DDupli as a null set  // duplicate-record array is initialized as empty
    For each i in 1 to DSize
        For each j in i+1 to DSize
            Fetch records from the dataset D
            If (D[i] == D[j] and D[i] not in DDupli)
                DDupli.append(D[i])
    Return DDupli
}

a) Unique ID (UPID): This feature describes the unique reference ID for the networking transaction associated with an intrusion attack. It is the combination of the source and destination IP addresses with their ports and the protocol:

UPID = [Src_IP, Src_Port, Dst_IP, Dst_Port, Protocol]    (1)

where UPID ∈ D.

b) Average Ratio of Forward Segment Size (ARFSS): The average forward segment size of the transmitted packets in proportion to the sum of the average forward segment size and the average backward segment size:

ARFSS = Avg_Fwd_Seg_Size / (Avg_Fwd_Seg_Size + Avg_Bwd_Seg_Size)    (2)
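The duplicate-record scan in RCRBF() above can be sketched in Python. This is a minimal sketch, not the paper's implementation; the function name and the list-of-tuples input format are illustrative, and a seen-set replaces the quadratic pairwise comparison of the pseudocode.

```python
def find_duplicate_records(dataset):
    """RCRBF-style scan: collect records that appear more than once.

    `dataset` is a list of hashable flow records (e.g. tuples). Each record
    seen a second time is appended once to the duplicates list, mirroring
    the DDupli set built by the pseudocode.
    """
    seen = set()
    duplicates = []
    for record in dataset:
        if record in seen:
            if record not in duplicates:
                duplicates.append(record)  # record is a duplicate, keep once
        else:
            seen.add(record)               # first occurrence
    return duplicates
```

A cleaner would then drop every record listed in the returned set, leaving one copy of each in the dataset.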

e) Ratio of Backward IAT (RBIAT): The backward IAT ratio is computed by evaluating the backward IAT in proportion to the sum of the IATs of the forward and backward packets transmitted:

RBIAT = Bwd_IAT / (Fwd_IAT + Bwd_IAT)    (5)

f) Subflow Forward Byte Packet Ratio (SFBPR): The sub-flow forward byte packet ratio is evaluated by apportioning the bytes with the packets that flow in the forward direction:

SFBPR = Subflow_Fwd_Bytes / Subflow_Fwd_Packets    (6)

g) Subflow Backward Byte Packet Ratio (SBBPR): The ratio of backward bytes in proportion to the backward packets flowing in the backward direction:

SBBPR = Subflow_Bwd_Bytes / Subflow_Bwd_Packets    (7)

h) Ratio of Sub-flow Forward Packets (RSFFP): This feature evaluates the ratio of sub-flow packets that flow in the forward direction, based on the number of forward sub-flow packets in proportion to the sum of the forward and backward sub-flow packets:

RSFFP = Subflow_Fwd_Packets / (Subflow_Fwd_Packets + Subflow_Bwd_Packets)    (8)

i) Ratio of Sub-flow Backward Packets (RSFBP): The backward sub-flow packets in proportion to the number of packets flowing in the forward and backward directions:

RSFBP = Subflow_Bwd_Packets / (Subflow_Fwd_Packets + Subflow_Bwd_Packets)    (9)

j) Ratio of Total Length of Forwarding Packets (RTLFP): The total length of forwarding packets in proportion to the total length of packets in the forward and backward directions:

RTLFP = TotLen_Fwd_Packets / (TotLen_Fwd_Packets + TotLen_Bwd_Packets)    (10)

k) Ratio of Total Length of Backward Packets (RTLBP): The total length of backward packets in proportion to the sum of the total lengths of packets in the forward and backward directions:

RTLBP = TotLen_Bwd_Packets / (TotLen_Fwd_Packets + TotLen_Bwd_Packets)    (11)

l) Forward Packet Duration (FPD): The duration time is in microseconds and is calculated as the total number of forwarding packets divided by the flow time of the forwarding packets:

FPD = Tot_Fwd_Packets / Flow_Duration    (12)

m) Backward Packet Duration (BPD): The backward packet duration is evaluated as the fraction of the total number of backward packets over the duration taken for the flow of packets:

BPD = Tot_Bwd_Packets / Flow_Duration    (13)

n) Skew of Forwarding Packet Length (SFPL): The skew of the forwarding packet length is computed from the average of the maximum and minimum forward packet lengths relative to the mean, in proportion to the standard deviation of the forward packet length:

SFPL = 3 × ((Fwd_Pkt_Len_Max + Fwd_Pkt_Len_Min)/2 − Fwd_Pkt_Len_Mean) / Fwd_Pkt_Len_Std    (14)

o) Skew of Backward Packet Length (SBPL): This feature computes the skew of the backward packet length from the maximum, minimum, and mean backward packet lengths in proportion to the standard deviation of the backward packet length:

SBPL = 3 × ((Bwd_Pkt_Len_Max + Bwd_Pkt_Len_Min)/2 − Bwd_Pkt_Len_Mean) / Bwd_Pkt_Len_Std    (15)

p) Skew of Active (SA): This feature computes the skew of the active flow of packets in the network by combining the maximum and minimum active times with the active mean time and standard deviation:

SA = 3 × ((Active_Max + Active_Min)/2 − Active_Mean) / Active_Std    (16)

q) Skew of Idle (SI): This feature evaluates the idle skew from the minimum and maximum idle times of the packets in proportion to the idle mean time and standard deviation:

SI = 3 × ((Idle_Max + Idle_Min)/2 − Idle_Mean) / Idle_Std    (17)
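A few of the ratio and skew features above can be sketched as follows. This is a minimal sketch only: the dictionary key names are illustrative stand-ins for the dataset's flow fields, not the paper's actual column names, and zero denominators are guarded to 0.0 as an assumption.

```python
def ratio(numer, denom):
    # Guard against division by zero in sparse flows (assumed convention).
    return numer / denom if denom else 0.0

def custom_flow_features(flow):
    """Compute a sample of the custom features (in the style of Eqs. 5-17)
    from a single flow record; key names are hypothetical."""
    f = {}
    # Eq. (5): backward IAT over total IAT
    f["RBIAT"] = ratio(flow["bwd_iat"], flow["fwd_iat"] + flow["bwd_iat"])
    # Eq. (8): forward sub-flow packets over all sub-flow packets
    f["RSFFP"] = ratio(flow["subflow_fwd_pkts"],
                       flow["subflow_fwd_pkts"] + flow["subflow_bwd_pkts"])
    # Eq. (12): forward packet count over flow duration
    f["FPD"] = ratio(flow["tot_fwd_pkts"], flow["flow_duration"])
    # Eq. (16): skew of active time, 3*((max+min)/2 - mean)/std
    f["SA"] = ratio(3 * ((flow["active_max"] + flow["active_min"]) / 2
                         - flow["active_mean"]),
                    flow["active_std"])
    return f
```

Applying this per record to the cleaned dataset yields the additional columns that form the custom feature set.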

1: Read the DN set and retrieve the feature set FS
2: Obtain the features from the set FS = {fs1, fs2, ..., fsm}
3: For each i to 'n' from FS(i)
4:     Choose the contiguous attribute from the set xi
5:     Evaluate the target variable yTarget from the set
6:     Calculate the correlation among the features for the leveling vector Cij(LV)
7:     Assign leveling for the target variable based on the correlation level l(Cij)
8:     Compute the correlation Cij among the observations xi and xj
9:     Assign the leveling values of the evaluated features in the vector LV
10:    Sort the attribute assessment based on the vector LV
11:    Select the top assessed features with a non-zero level for ACAttr
12: Return ACAttr
13: End for
}
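The steps above can be sketched as a correlation-based ranking pass. This is a minimal sketch in the spirit of the pseudocode, not the paper's implementation: the function name, the `top_k` cutoff, and the use of the absolute Pearson correlation as the leveling value are assumptions.

```python
import numpy as np

def acaas_select(X, y, feature_names, top_k=5):
    """Level each feature by |corr(x_i, y)|, sort by level, and keep the
    top non-zero features, mirroring steps 3-12 of the pseudocode.

    X: (n_samples, n_features) array; y: target vector.
    Constant features yield a NaN correlation and are leveled as 0.0.
    """
    levels = []
    for i, name in enumerate(feature_names):
        c = np.corrcoef(X[:, i], y)[0, 1]       # correlation with the target
        levels.append((name, 0.0 if np.isnan(c) else abs(c)))
    levels.sort(key=lambda t: t[1], reverse=True)  # rank by leveling value
    # Keep the top assessed features whose level is non-zero (ACAttr)
    return [name for name, lv in levels[:top_k] if lv > 0]
```

The returned list plays the role of ACAttr, the selected attribute set handed to the detection layer.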

Figure 4 shows the evaluated metrics for the prediction accuracy rate and the classification error rate. The figure shows that the proposed classifier RNNBiLSTM gives a high accuracy rate of 99.16% and a lower error rate for the IoTID20_CF set.

Figure 4. Accuracy and Error Rate Evaluation

Figure 5. Sensitivity and Specificity Ratio

Figure 5 illustrates how the sensitivity and specificity ratios are evaluated. According to the figure, the proposed model RNNBiLSTM provides high sensitivity and specificity of 99.49% and 99.90%, respectively. For the IoTID20 and IoTID20_CF sets, Figure 6 illustrates the evaluation of the False Omission Rate and the False Discovery Rate. FOR and FDR are lower for the proposed RNNBiLSTM model than for the other methodologies.
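The four metrics compared in these figures follow directly from the confusion-matrix counts. A short sketch of their standard definitions (the function name is illustrative):

```python
def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, FDR, and FOR from confusion-matrix counts.

    tp/fp/tn/fn: true/false positive and true/false negative counts.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    fdr = fp / (fp + tp)           # false discovery rate
    for_rate = fn / (fn + tn)      # false omission rate
    return sensitivity, specificity, fdr, for_rate
```

A high-performing detector pushes sensitivity and specificity toward 1 while driving FDR and FOR toward 0, which is the pattern the figures report for RNNBiLSTM.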

Figure 8: Error Rate

The error rate comparison in Figure 8 shows that the proposed RNNBiLSTM model yields a lower error rate than the other methods.