Two main architectures were proposed to generate our four different models on the three different datasets: CNN-LSTM (Fig. 1) and LSTM-FCN (Fig. 2). The proposed CNN-LSTM architecture uses a one-dimensional convolutional hidden layer that operates over a 1D sequence with 3 filters (collections of kernels that store the values learned during training) and a kernel size of 32. The convolutional hidden layer is accompanied by batch normalization, which normalizes its input by applying a transformation that keeps the mean output close to 0 and the output standard deviation close to 1. The hidden layer is used for feature extraction. An activation function is applied in the hidden layers of a neural network so that the model can learn more complex functions; in our architecture, we used the Rectified Linear Unit (ReLU) to improve training. The ReLU is followed by a MaxPooling1D layer, which reduces learning time by downsampling the previous layer's output to its most salient values. A dropout layer was introduced to avoid overfitting, a common issue in LSTM models; it drops each of the layer's outputs with a probability of 0.2. The output of the dropout layer is then passed into the LSTM block, which comprises a single hidden layer of 8 LSTM units and an output layer used to make a prediction. The LSTM block is followed by a Dense layer (a Dense layer receives input from all neurons of the previous LSTM output layer) that produces one value for the sigmoid activation function. The sigmoid accepts any real-valued input and maps it to the range (0, 1), from which the binary outcome (Benign, Attack) is obtained.
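The layer stack described above can be sketched in Keras as follows. This is a minimal illustration, not the paper's code; in particular, the input shape of 100 time steps with 1 feature per step is an assumption made only so the sketch is runnable.

```python
import numpy as np
from tensorflow.keras import layers, models

# Sketch of the described CNN-LSTM stack. The input shape (100 time
# steps, 1 feature) is an illustrative assumption, not a value from the paper.
model = models.Sequential([
    layers.Input(shape=(100, 1)),
    layers.Conv1D(filters=3, kernel_size=32),  # 3 filters, kernel size 32
    layers.BatchNormalization(),               # mean ~0, std ~1
    layers.Activation("relu"),                 # ReLU activation
    layers.MaxPooling1D(pool_size=2),          # keep the most salient outputs
    layers.Dropout(0.2),                       # drop outputs with p = 0.2
    layers.LSTM(8),                            # single hidden layer of 8 LSTM units
    layers.Dense(1, activation="sigmoid"),     # one output value in (0, 1)
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Thresholding the single sigmoid output (e.g., at 0.5) yields the Benign/Attack decision.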
As part of optimizing the algorithm, a binary cross-entropy loss function was used to estimate the loss of the proposed architecture on each iteration so that the weights can be updated to reduce the loss on the next iteration [63],[64],[65],[66],[67],[68].
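As a concrete illustration of binary cross-entropy (a sketch, not the paper's code), the loss penalizes predicted probabilities that drift away from the true 0/1 labels, so confident correct predictions produce a much smaller loss than poor ones:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# toy labels and two sets of predicted Attack probabilities (illustrative values)
y_true = np.array([1.0, 0.0, 1.0, 1.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8, 0.95]))
bad = binary_cross_entropy(y_true, np.array([0.4, 0.6, 0.3, 0.5]))
# good is far smaller than bad: the optimizer updates weights to push
# predicted probabilities toward the true labels on the next iteration
```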
LSTM-FCN augments the fast classification performance of temporal convolutional layers with the precise classification of LSTM neural networks [69]. Temporal convolutions have proven to be an effective learning model for time series classification problems [35]. The proposed LSTM-FCN has an architecture similar to the proposed CNN-LSTM, but it instead utilizes a GlobalAveragePooling1D layer, which retains more information about the "less important" outputs [65]. The layers are then concatenated into one final Dense layer with a sigmoid activation function.
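The difference between the two pooling choices can be seen on a toy feature map (illustrative values, not from the paper): global average pooling lets every time step contribute to the pooled value, while max pooling keeps only the peak activation.

```python
import numpy as np

# toy feature map: 3 time steps x 2 channels (illustrative values)
x = np.array([[0.1, 0.9],
              [0.2, 0.1],
              [0.3, 0.2]])

gap = x.mean(axis=0)  # GlobalAveragePooling1D: every time step contributes
gmp = x.max(axis=0)   # max pooling: only the largest activation survives
# gap = [0.2, 0.4], gmp = [0.3, 0.9]: the average reflects the "less
# important" time steps that the max discards
```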
Both models utilized the Adam optimization algorithm [70] with a fixed learning rate of 0.03 (the proportion by which weights are updated throughout the 3 training epochs of the proposed architecture); 0.03 is a mid-range value that allows for steady learning. There was no need to optimize the hyperparameters (e.g., finding the optimal number of LSTM cells) because of the near-0% misclassification rate of the proposed models. The default weight initializer used in the proposed architecture is GlorotUniform (Xavier uniform). Since k-fold cross-validation (CV) is not commonly used in DL, it is introduced here on each model to investigate whether it produces different results by preventing overfitting. Moreover, the k value is chosen as 5, which is very common in the field of ML [71],[72]. The models utilized StratifiedKFold [73] to ensure that each fold of the dataset has the same proportion of observations (balanced) with respect to the response feature. Where k-fold CV was not introduced, the train_test_split function from Scikit-learn [74] was utilized to split the data into 80% for training and 20% for testing. A summary of the accuracy and loss results for all applied models on all datasets is listed in Table 5.
Table 5
Accuracy and Loss values for all datasets.
Method | BoT-IoT Accuracy | BoT-IoT Loss | UNSW-NB15 Accuracy | UNSW-NB15 Loss | TON-IoT Accuracy | TON-IoT Loss |
CNN-LSTM | 99.99% | 0.0016 | 99.99% | 0.0001 | 98% | 0.05 |
LSTM-FCN | 100% | 0.0068 | 100% | 0.0054 | 90% | 0.36 |
CNN-LSTM 5-folds CV | 99.99% | 0.0020 | 99.99% | 0.0001 | 95% | 0.25 |
LSTM-FCN 5-folds CV | 100% | 0.0015 | 100% | 0.0002 | 85% | 0.59 |
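The two data-splitting setups described above can be sketched with scikit-learn on a toy imbalanced label vector (80 Benign, 20 Attack — illustrative numbers): StratifiedKFold preserves the class ratio in every fold, and train_test_split produces the 80%/20% hold-out used in the non-CV runs.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)  # toy labels: 80 Benign (0), 20 Attack (1)

# 5-fold stratified CV: every test fold keeps the 80/20 class ratio
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    assert np.bincount(y[test_idx]).tolist() == [16, 4]  # 16 Benign, 4 Attack

# non-CV setting: a single 80% train / 20% test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
```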
Accuracy describes the percentage of test data that is classified correctly. All of these models perform a binary classification: Attack or Benign. An accuracy of 99.99% means that out of 10,000 rows of data, the model correctly classifies 9,999. Table 5 shows that very high accuracy levels (≈ 99.99%) were achieved for the BoT-IoT and UNSW-NB15 datasets. However, this was not the case for the TON-IoT dataset, where accuracy ranged from 85% to 98%. The table also reveals that using 5-folds CV decreased the accuracy of the models on the TON-IoT dataset. The proposed LSTM-FCN models showed slightly better performance than the proposed CNN-LSTM models in detecting attacks on the BoT-IoT and UNSW-NB15 datasets (100% vs. 99.99%), while the CNN-LSTM performed significantly better than the LSTM-FCN in detecting attacks on the TON-IoT dataset (98% vs. 90% and 95% vs. 85%).
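Accuracy on a labeled test set reduces to a simple count (a minimal sketch with made-up labels, not the paper's evaluation code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0]  # 1 = Attack, 0 = Benign (illustrative labels)
y_pred = [1, 0, 1, 0, 0]  # one misclassification
acc = accuracy(y_true, y_pred)  # 4 of 5 correct -> 0.8
```

At 99.99% accuracy, this count would come out to 9,999 correct out of every 10,000 rows.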
The models output a probability between 0 and 1 to predict the binary classes Attack (1) or Benign (0). If the predicted probability of Attack is 0.6, the probability of Benign is 0.4, and the outcome is classified as an Attack. The loss is the binary cross-entropy over the test set: the farther the predicted probability of an outcome's true class falls from 1, the larger the penalty. Table 5 shows that very low loss values were achieved for the BoT-IoT and UNSW-NB15 datasets. At the same time, using 5-folds CV reduced the loss values for the LSTM-FCN from 0.0068 to 0.0015 and from 0.0054 to 0.0002 on the BoT-IoT and UNSW-NB15 datasets, respectively. However, this was not the case for the TON-IoT dataset, where loss values increased when 5-folds CV was implemented.
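The 0.6/0.4 example above can be worked through directly; assuming the true class of that sample is Attack (y = 1), its binary cross-entropy contribution is −ln(0.6):

```python
import math

p_attack = 0.6             # sigmoid output for the Attack class
p_benign = 1 - p_attack    # 0.4
label = "Attack" if p_attack >= 0.5 else "Benign"

# assuming the true class is Attack, the per-sample cross-entropy is -ln(p_attack)
loss = -math.log(p_attack)  # ~0.51; shrinks toward 0 as p_attack approaches 1
```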
The Area Under the Receiver Operating Characteristic curve (AUROC) is a performance measure for classification models. The AUROC captures the model's ability to separate the classes, Attack or Benign in this case: it is the probability that the binary classifier ranks a randomly chosen Attack example above a randomly chosen Benign example, averaged across all possible decision thresholds. An AUROC of 1 indicates that the model has an ideal capacity to distinguish between Attack and Benign. An AUROC of 0 indicates that the model is reciprocating the classes, i.e., predicting the Benign class as Attack and vice versa. An AUROC of 0.5 indicates that the model is incapable of distinguishing between Attack and Benign. Table 6 shows a summary of AUROC values for all proposed models on the three datasets.
Table 6
Summary of AUROC values from different models.
Dataset | CNN-LSTM | LSTM-FCN | CNN-LSTM 5-folds CV (folds 1-5) | LSTM-FCN 5-folds CV (folds 1-5) |
BoT-IoT | 1.00 | 1.00 | 0.500, 0.500, 0.500, 0.500, 0.500 | 0.998, 0.976, 0.987, 0.993, 0.998 |
UNSW-NB15 | 1.00 | 1.00 | 1.00, 1.00, 1.00, 1.00, 1.00 | 1.00, 1.00, 1.00, 1.00, 1.00 |
TON-IoT | 0.993 | 0.868 | 0.887, 0.997, 0.999, 0.583, 0.999 | 0.543, 0.892, 0.891, 0.807, 0.660 |
All models showed ideal capacity (AUROC = 1.00) for predicting the Attack and Benign classes on the UNSW-NB15 dataset. On the BoT-IoT dataset, the CNN-LSTM and LSTM-FCN models likewise showed high capacity (AUROC = 1.00). The CNN-LSTM 5-folds CV had an AUROC of 0.5 on every fold, indicating that this model is incapable of distinguishing between Attack and Benign on that dataset. The LSTM-FCN 5-folds CV had AUROC values of at least 0.976 across the folds, which means that this model remains highly capable of predicting the Attack and Benign classes.
The TON-IoT dataset showed a slightly different picture than the first two datasets, with AUROC values of 0.993 and 0.868 for the CNN-LSTM and LSTM-FCN models, respectively, indicating that they come close to fully separating the Attack and Benign classes. The 5-folds CV runs of both models produced AUROC values ranging from 0.543 to 0.999. Figure 3 demonstrates the accuracy, precision, and recall achieved by the CNN-LSTM and LSTM-FCN models on all datasets.
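The ranking interpretation of AUROC used above can be computed directly (a sketch on made-up scores, not the paper's evaluation code): count how often an Attack example is scored above a Benign example.

```python
import numpy as np

def auroc(y_true, scores):
    """Probability a random Attack (1) outranks a random Benign (0); ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([1, 1, 0, 0])          # illustrative labels
s = np.array([0.9, 0.8, 0.3, 0.1])  # illustrative Attack probabilities
# perfect separation -> 1.0; reversed scores (reciprocated classes) -> 0.0;
# constant scores (no separating ability) -> 0.5
```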