Malicious Traffic Classification Using Long Short-Term Memory (LSTM) Model

Malicious traffic classification is the initial and primary step for any network-based security system. Such traffic classification systems include behavior-based anomaly detection systems and Intrusion Detection Systems (IDS). Existing methods rely on conventional techniques that process data in a fixed sequence, which may lead to performance issues. Furthermore, conventional techniques require proper annotation to process large volumes of data, and relying on data annotation for traffic classification may lead to network loops and bandwidth issues within the network. To address these issues, this paper presents a novel solution from an artificial intelligence perspective: a malicious traffic classification system based on the Long Short-Term Memory (LSTM) model. An experimental setup was built to validate the efficiency of the proposed model, and an experimental validation was carried out on it. The experimental results show that the proposed model outperforms state-of-the-art models in terms of accuracy and throughput. It improves accuracy by 5% over the existing models and reaches an overall accuracy of 99.5%.


Introduction
An Intrusion Detection System (IDS) is a widely used alerting system for monitoring computers and networks. An IDS inspects network traffic and user authentication activity, and it alerts the user if any suspicious activity occurs in the network. Such software is used in various applications to defend systems from threats. The IDS continuously monitors the computer or network to identify any violation. It then reports the result to the network administrator or alerts a centralized monitoring system such as a Security Information and Event Management (SIEM) system. A SIEM uses alarm filtering techniques to separate false alarms from genuine malicious activity. Honeypots are one IDS technique for customizing detection rules and security policies to detect specific malicious threats. Some systems detect intrusions in the network but do not deliver the expected level of system monitoring. For this reason, organizations have concentrated on both prevention and detection. They use Intrusion Detection and Prevention Systems (IDPS), which focus on the system's defined security policy, network protocol rules and the documentation of existing threats. An IDPS is an extension or upgrade of the IDS, and both are used to identify malicious activity and intruders in the network. The main difference is that an IDS can only detect threats, whereas an IDPS can actively block or prevent the threats it detects. An Intrusion Prevention System (IPS) can take such actions in the network and send alert messages to the computer. It can also monitor network traffic, drop malicious packets, block access from unauthorized IP addresses, mitigate TCP issues, and check and correct Cyclic Redundancy Check (CRC) errors. IDSs can use various detection and classification methods.
These are signature-based detection, stateful protocol analysis and anomaly-based detection. IDSs are classified into Network Intrusion Detection Systems (NIDS), Host Intrusion Detection Systems (HIDS), Protocol-based Intrusion Detection Systems (PIDS), Application Protocol-based Intrusion Detection Systems (APIDS) and Hybrid Intrusion Detection Systems. Experts test a newly developed or updated IDS against various attack strategies to verify its efficiency and quality. Such attacks include DDoS, Trojan horse attacks, IDS evasion attacks, surveillance attacks and exploit attacks. Anomaly-based IDS is a recent trend in intrusion detection. This method uses machine learning to detect different types of malware in the network or computer. It focuses on detecting unknown attacks and overcoming the drawbacks of traditional signature-based IDSs. It also combines the defined-pattern approach with machine learning to create new defensive patterns that detect unknown malicious attacks and report them to the system administrator. However, the above-mentioned methods suffer from limited detection rates and high false positive rates.

Contributions of This Paper
• A novel malicious traffic classification system using the LSTM model. The LSTM used in this paper replaces the conventional node-based units in the hidden layer of a recurrent network with "memory cells". This overcomes the problems faced by the plain RNN architecture, and in practice the LSTM architecture has shown improved results over the RNN architecture.
• The proposed traffic classification model is suited to sequence-to-sequence prediction. It exploits the inherent ability of LSTM to recognize long-term dependencies in sequences, so the next traffic sequence can be predicted fairly accurately.
• Multiple data sources are considered to detect malware activities and distinguish them from user activities.

The remainder of this paper is organized as follows. Section 2 reviews the literature on malware and intrusion detection techniques, Sect. 3 presents the proposed methodology, and Sect. 4 describes the experiments. Sect. 5 presents the results and performance analysis, and Sect. 6 concludes with the impact of the results and future work.

Literature Survey
This section reviews the state-of-the-art methodologies and techniques used in current IDSs for malicious traffic classification.
Zhan [5] proposed a new IDS to detect threats in WLAN networks. The system architecture adopts the browser/server mode: clients interact with the server through web browsers, and the system displays the results to the client. The architecture comprises a data storage layer, a data acquisition layer, a detection and analysis layer, and a result analysis layer. The authors also introduced blockchain-based intrusion detection for more secure and reliable services. This method addresses privacy challenges and protects the system more efficiently from malicious threats. Aung [6] presented a collaborative intrusion detection approach based on k-means to improve detection accuracy. It uses data mining both as a single method and in a hybrid method. Their results show a comparative reduction in time complexity between the single method and the hybrid data-mining method. The authors also describe the concept of projective adaptive resonance, which is used specifically to reduce model training time while maintaining detection accuracy. These results highlight the role of data mining algorithms in IDS in terms of time complexity.
Xiao [7] presented a CNN-based intrusion detection system. The authors used the KDD-CUP99 dataset to evaluate the performance of the CNN-based IDS, which provided a higher detection rate and a lower false negative rate. Kumar & Sharma [8] proposed an IDS combining signature-based and anomaly-based methods. The authors describe intrusion detection in the cloud computing environment and a hybrid IDS algorithm that improves the detection rate in a private cloud environment. Ali [9] proposed a model for apple-based intrusion detection and validation using the NSL-KDD dataset. The authors compared the Extreme Learning Machine (ELM) approach for IDS with a version hybridized with Particle Swarm Optimization (PSO), and the resulting PSO-ELM model improved intrusion detection accuracy. Li [10] proposed an IDS for large networks. The authors developed two algorithms, reduce and cluster, which detect intrusions and improve the filtering rate of the system. These algorithms reduce the false alert rate and improve the analysis process.
Hu [10] used an AdaBoost-based intrusion detection algorithm with decision stumps as weak classifiers. Their system outperformed other published results, with a lower false alarm rate, a higher detection rate and faster computation. Its drawback is that it does not adopt an incremental learning approach. Pan [11] proposed an automated approach to hybrid intrusion detection. The proposed algorithm detects intrusions from data logs with an accuracy of 73% on the related dataset. However, this approach is not suitable for larger datasets, because capturing log files in the system is very difficult. Hurley [12] proposed HMM-based intrusion detection techniques for software-defined networks. The authors used the Android OS platform to detect anomalous behavior in Android systems with an accuracy of 85%. The drawback of this approach is that it expands the feature vectors and increases the amount of malicious code in the datasets.
Md Zahangir [13] carried out a detailed discussion of cyber security issues. The author applies neuromorphic cognitive computing with deep learning to IDS networks in cyber security, using the NSL-KDD dataset with a vector factorization approach. The results show classification accuracy in the range of 81.31% to 90.12%. The deep learning method reduces human effort in the task and improves system performance. Aydin [14] proposed a hybrid intrusion detection system built on Snort. It uses Packet Header Anomaly Detection (PHAD) and Network Traffic Anomaly Detection (NETAD) to detect intrusions with a low false positive rate. The system was evaluated on the DARPA 1998 dataset. The authors also describe how the hybrid IDS combines anomaly detection with signature-based (misuse) detection. Safwan Mawlood Hussein [15] studied the effectiveness of a hybrid IDS that combines Snort with a naive Bayes network to improve the performance of the hybrid IDS. The study used the KDD Cup 99 dataset. The results show an improved average false alarm rate, and results are also reported for the J48graft classifier.
Dimitrios Damopoulos [18] presented a novel methodology to detect attacks on mobile devices using anomaly-based IDS with machine learning classification. The authors used four machine learning algorithms to detect anomalous attacks on mobile devices: Bayesian networks, k-nearest neighbors, random forest and radial basis function. The approach achieved a true positive rate as high as 99.8%. Souparnika Jayaprakash [16] proposed a database intrusion detection system using octraplets and a machine-learning-based anomaly detection system. The architecture implements role-based access control together with a new data structure, called an octraplet, which stores SQL queries. This method improves system performance and the detection rate. Nikolov [17] proposed a recurrent neural network classifier for network intrusion detection based on long short-term memory units. This approach focuses mainly on intrusion detection for HTTP servers.
Pamukov [19] proposed an IoT IDS to improve the detection rate and performance of intrusion detection in IoT devices. The authors use multiple negative selection algorithms to reduce intrusion detection errors, and the system can run without operator input. The device detects intrusions with a 90% success rate. Chungming [20] proposed a host-based IDS using machine learning, inspired by adaptable agent-based artificial intelligence. This approach detects malicious attacks from system calls and protects the system from host-based intrusion attacks. It also shows the exchange of packets carrying detection signals between computers. Ved Prakash Mishra [25] proposed a simulator system for IDS that detects DDoS attacks and alerts the administrator. The approach uses the core of IDS and IPS to run the simulation software in self-execution mode and protects the system based on traffic information. The results show improved detection performance in terms of accuracy and security. The approach is used for educational purposes because it is difficult to implement on real-time network devices.
Hafeez [1] proposed a security system focused on IoT network devices to detect malicious activity in IoT networks. The authors use the IoT Keeper method to detect and analyze malicious activity in IoT devices. They combine C-means clustering and fuzzy interpolation to detect intrusions in the network effectively. Han [2] proposed a system to detect anomalous traffic in a network controller. The authors use novel classifier techniques to detect intrusions in controller network devices, applying cross entropy and SVM algorithms to classify traffic. This improves the detection rate and accuracy of the system. Wei Wang [3] proposed a malware traffic classification technique that analyzes network traffic to detect malicious activity. A CNN is trained and tested on the dataset to detect malicious traffic, and the system achieves high detection accuracy. Chakkaravarthy [23,24] proposed an IDS to detect intrusions in wireless LAN networks. The authors [4] discuss the various causes of novel wireless intrusion attacks. They combine KDE and HMM algorithms, which operate through a tandem queue feedback method, to detect intrusions in the network with an accuracy of 98%.

Proposed Model: LSTM
Long Short-Term Memory (LSTM) is an advancement of the Recurrent Neural Network (RNN). RNNs were among the first deep learning models able to retain state from previous inputs, but RNN predictions are based on short-term dependencies. For time-series data, predictions based only on short-term dependencies are not accurate. A piece of information about earlier data can change the interpretation of a network's behavior. Minute information carried in long-term dependencies may therefore lead to a completely different and more accurate prediction. LSTM predictions are based on both long-term and short-term dependencies, which increases prediction accuracy on time-series data. An LSTM network retains both the input and output states. The LSTM approach is therefore well suited to analyzing time-series data with time lags of unknown size.

LSTM Structure
LSTM is a three-layer model with a loop-like structure that can add, delete and modify information easily. LSTM cells are built from an input gate, an output gate and a forget gate, as shown in Fig. 1. Each of the three layers contains an activation function. The sigmoid function (σ) serves as the gate activation and enhances the fitting ability of the model. The first layer contains the forget gate, which decides whether information should be retained or deleted. The second layer contains the input gate and the candidate state. It decides which information should be stored in the memory cell, that is, whether a value is written as new content or used to update the existing memory. The third layer contains the output gate, which controls how the value stored in the cell is used to compute the output activation. These three gates therefore govern the retention, update and deletion of cell content. The network as a whole is composed of memory cells and multiplicative gates. The LSTM process is detailed below gate by gate.

Forget Gate
The forget gate removes information that is no longer useful. It takes the current input $x_t$ and the previous output $h_{t-1}$, multiplies them by the gate's weights and adds a bias. The sigmoid activation then maps the result into the range (0, 1), so the output is a soft weight rather than a strictly binary value. Values close to 0 mean that the information is forgotten (reset), and values close to 1 mean that it should be retained.
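The forget-gate computation can be sketched in plain Python to make the output range concrete. This is a minimal sketch that assumes the standard formulation f_t = σ(W_f·[h_{t-1}, x_t] + b_f). The helper names (`sigmoid`, `forget_gate`) and the toy weights `W_f` and `b_f` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_prev, x_t] + b_f), one value per memory cell.

    W_f is a list of rows (one row per memory cell), each with
    len(h_prev) + len(x_t) weights; b_f holds one bias per memory cell.
    """
    concat = list(h_prev) + list(x_t)  # the concatenation [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]


# Hypothetical toy weights: two memory cells, one previous hidden unit, one input feature.
W_f = [[4.0, 4.0], [-4.0, -4.0]]
b_f = [0.0, 0.0]
f_t = forget_gate(W_f, b_f, h_prev=[1.0], x_t=[1.0])
print([round(v, 4) for v in f_t])  # → [0.9997, 0.0003]
```

In this run, the first cell is pushed close to 1 (retain) and the second close to 0 (forget). Neither value is exactly 0 or 1.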

Input Gate
The input gate decides which new information is stored in the cell state. The sigmoid activation regulates this information and filters the values to be retained. The input is also passed through a tanh function, which produces values from −1 to +1.
The tanh-generated vector and the sigmoid-regulated values are then multiplied to obtain the contribution of the input gate to the cell state.
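The interplay between the sigmoid filter and the tanh candidate is sketched below. This sketch assumes the standard forms i_t = σ(W_i·[h_{t-1}, x_t] + b_i) and c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c), combined element-wise. The names `affine` and `input_contribution` and all weight values are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W . v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def input_contribution(W_i, b_i, W_c, b_c, h_prev, x_t):
    """Return i_t * c~_t, the new information written into the memory cell.

    i_t  = sigmoid(W_i . [h_prev, x_t] + b_i)  -> filter values in (0, 1)
    c~_t = tanh(W_c . [h_prev, x_t] + b_c)     -> candidate values in (-1, 1)
    """
    concat = list(h_prev) + list(x_t)
    i_t = [sigmoid(z) for z in affine(W_i, b_i, concat)]
    c_tilde = [math.tanh(z) for z in affine(W_c, b_c, concat)]
    return [i * c for i, c in zip(i_t, c_tilde)]


# One memory cell: zero gate weights give i_t = sigmoid(0) = 0.5, and a large
# candidate bias saturates tanh near 1, so the contribution is close to 0.5.
contrib = input_contribution(W_i=[[0.0, 0.0]], b_i=[0.0], W_c=[[0.0, 0.0]], b_c=[10.0],
                             h_prev=[0.3], x_t=[0.7])
print(contrib)
```

Because the sigmoid factor lies in (0, 1) and the tanh factor in (−1, 1), every contribution stays strictly between −1 and 1.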

Output Gate
In the output gate, the output information is extracted from the current cell state. The tanh function is applied to the cell-state vector. As in the input gate, the sigmoid activation regulates the information and filters the values to be retained. The value of the output gate is calculated using the formula below.
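All three gates use the sigmoid activation function σ, so each gate value lies in the range (0, 1) rather than being strictly binary:

$$
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)
\end{aligned}
\tag{1}
$$

where $x_t$ is the current input, $h_{t-1}$ is the previous output, and $W$ and $b$ are the weights and biases of each gate.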

Candidate Memory Cell
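The candidate memory cell is computed in the same way as the gates above, except that it uses the tanh activation function. Its output therefore varies from −1 to +1, as shown in Eq. 4:

$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$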

Memory Cell
To compute $c_t$, the LSTM multiplies the forget gate value element-wise with the old memory cell content $c_{t-1}$ and the input gate value with the candidate memory cell. The two products are then summed. The current memory cell value is computed as given in Eq. 5:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
When the forget gate is 1 and the input gate is 0, the previous memory cell value $c_{t-1}$ is preserved and can be used at the current time step whenever necessary.
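The memory-cell update and its retention case can be checked directly. This is a minimal sketch of the standard update c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t applied element-wise. The function name `update_memory_cell` and the example values are hypothetical.

```python
def update_memory_cell(f_t, i_t, c_tilde, c_prev):
    """c_t = f_t * c_{t-1} + i_t * c~_t, applied element-wise per memory cell."""
    return [f * c_old + i * c_new
            for f, i, c_new, c_old in zip(f_t, i_t, c_tilde, c_prev)]


# Retention case: forget gate = 1 and input gate = 0, so c_{t-1} is carried over unchanged.
c_prev = [0.42, -0.7]
c_t = update_memory_cell(f_t=[1.0, 1.0], i_t=[0.0, 0.0], c_tilde=[0.9, 0.9], c_prev=c_prev)
print(c_t)  # → [0.42, -0.7]
```

The opposite setting (forget gate 0, input gate 1) discards the old content and replaces it with the candidate values.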

Hidden States
The hidden state is computed from the output gate and the memory cell, as given in Eq. 6:

$$h_t = o_t \odot \tanh(c_t) \tag{6}$$

The hidden state therefore varies from −1 to 1. When the output gate is close to 1, the memory information is passed on to the predictor. When it is close to 0, the information is retained within the memory cell. The notation used above is expanded in Table 1.
More hidden states can be added to the network, but increasing the number of hidden states does not necessarily increase prediction accuracy.
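This sketch shows how the output gate scales the memory content passed to the predictor, assuming the standard hidden-state formula h_t = o_t ⊙ tanh(c_t). The function name `hidden_state` and the values are illustrative only. With c_t = 2, an output gate of 0.999 passes almost all of tanh(2) ≈ 0.964, while 0.001 withholds nearly all of it.

```python
import math


def hidden_state(o_t, c_t):
    """h_t = o_t * tanh(c_t): the output gate scales the squashed cell content."""
    return [o * math.tanh(c) for o, c in zip(o_t, c_t)]


c_t = [2.0, 2.0]
# Output gate open (~1) passes the memory content on; closed (~0) keeps it in the cell.
print([round(v, 3) for v in hidden_state([0.999, 0.001], c_t)])  # → [0.963, 0.001]
```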

LSTM-Based Network Traffic Analysis
A network is managed based on the prediction of real-time traffic volume, and long-term dependencies play a vital role in accurate and efficient prediction. LSTM extracts temporal information from the traffic flow to analyze the behavioral data of a network and predict whether an application is malicious. Network traffic is time-series data that is time-variant and nonlinear. This makes real-time traffic prediction difficult and leads to low accuracy. Network traffic volume prediction is often treated as a regression problem, but here it is treated as a classification problem, and LSTM is well suited to such network volume prediction problems. When dealing with massive data for network traffic prediction, congestion control should be applied to predict network behavior accurately. To detect the existence of malware, the opcode sequence is extracted, and the LSTM learns the features of malicious code sequences and the pattern of the network. If the opcode sequence of a file varies, the file may have been modified dynamically by an attacker who injected malicious code into a normal file to launch an attack. The LSTM prediction may reveal malicious features or abnormal network behavior. If the LSTM's analysis of the network traffic also confirms a deviation from normal traffic, a malware suspicion is raised. The forget gate allows such abnormal dependencies to be retained in the memory cell for a very long time. Analyzing and detecting malware with LSTM is precise, because LSTM adapts well to networks with dynamic behavior. Here, the LSTM works as a classifier that distinguishes normal behavior from abnormal behavior and detects the existence of malware based on the increase in abnormality.
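The following sketch shows how an LSTM can act as a sequence classifier. It runs a single-layer LSTM over a per-time-step feature sequence, feeds the final hidden state to a logistic output unit, and thresholds the resulting probability. The class `LSTMTrafficClassifier`, the random (untrained) weights and the 0.5 threshold are assumptions for illustration, not the paper's trained model. In practice the weights would be learned from labelled normal and malicious sequences.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class LSTMTrafficClassifier:
    """Hypothetical single-layer LSTM followed by a logistic output unit."""

    def __init__(self, n_input, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so results are reproducible
        n_z = n_hidden + n_input
        self.n_hidden = n_hidden
        # One weight matrix and bias vector per gate: f(orget), i(nput), o(utput), c(andidate).
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n_z)] for _ in range(n_hidden)]
                  for g in "fioc"}
        self.b = {g: [0.0] * n_hidden for g in "fioc"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def step(self, h, c, x_t):
        """One LSTM time step using the standard gate equations."""
        z = h + x_t  # concatenation [h_{t-1}, x_t]
        f = [sigmoid(v) for v in affine(self.W["f"], self.b["f"], z)]
        i = [sigmoid(v) for v in affine(self.W["i"], self.b["i"], z)]
        o = [sigmoid(v) for v in affine(self.W["o"], self.b["o"], z)]
        g = [math.tanh(v) for v in affine(self.W["c"], self.b["c"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict_proba(self, sequence):
        """Probability that the whole sequence is malicious (final hidden state -> sigmoid)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x_t in sequence:
            h, c = self.step(h, c, list(x_t))
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)

    def classify(self, sequence, threshold=0.5):
        return "malicious" if self.predict_proba(sequence) >= threshold else "normal"


model = LSTMTrafficClassifier(n_input=2, n_hidden=8)
flow = [[0.1, 0.0], [0.8, 1.0], [0.9, 1.0]]  # toy per-time-step features
print(model.predict_proba(flow), model.classify(flow))
```

Using only the final hidden state keeps the sketch short. Because the memory cell carries information across steps, that state can still reflect abnormal patterns seen early in the sequence.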

Experimental Setup
The proposed LSTM model is experimentally validated on a real-time testbed consisting of a wireless router, a laptop, an attacker machine and an external packet capturing device. The external packet capturing device is attached to the laptop, which runs the Tshark utility to log all tapped packets. The laptop is allowed to access the internet and connects to the local Wi-Fi network established for the experiment. The attacker machines are configured with Parrot OS, and an automated Python script runs the payloads required to launch the attacks. The configuration of these peripherals is given in Table 2, and the experimental setup is illustrated in the accompanying figure.

Data Set Formation
The dataset is collected from the internal network designed for the experiment. Entire TCP flows are tapped and recorded, and every payload byte of each TCP packet is recorded along with its TCP session. Each byte takes a value from 0 to 255, and these values are normalized to the range [0, 1]. Data collection took about 5 h and covered a range of scenarios, such as malware network attacks, normal network operation and DDoS attacks. The attacks performed during data collection (listed in Table 3) include malware traffic and authentication-based attacks such as Fakeauth and Deauth. They also include ARP replay (ARP injection), used to generate IVs, and other attacks such as SSL and DNS attacks. The payload sequence length is capped at 1000. Nearly 60 protocols are monitored and recorded. A naive Python data-cleansing step is applied to remove duplicates, which removes nearly 0.5 million duplicated records.
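The preprocessing steps described above are scaling bytes to [0, 1], bounding the sequence length at 1000, and removing duplicates. They can be sketched as follows. Zero-padding short payloads and exact-match deduplication on raw payload bytes are assumptions, and the helper names `payload_to_sequence` and `deduplicate` are hypothetical.

```python
MAX_LEN = 1000  # maximum payload sequence length stated in the text


def payload_to_sequence(payload: bytes, max_len: int = MAX_LEN) -> list:
    """Scale each byte (0-255) to [0, 1], then truncate or zero-pad to max_len."""
    scaled = [b / 255.0 for b in payload[:max_len]]
    return scaled + [0.0] * (max_len - len(scaled))


def deduplicate(records):
    """Drop exact duplicate payloads while keeping first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


records = [b"\x00\xff\x80", b"GET / HTTP/1.1", b"\x00\xff\x80"]
clean = deduplicate(records)
seqs = [payload_to_sequence(p) for p in clean]
print(len(clean), len(seqs[0]), seqs[0][:3])
```

One design caveat is that a padding value of 0.0 is indistinguishable from a scaled 0x00 byte. A separate mask or padding token may therefore be preferable in practice.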

Feature Learning
Feature learning comprises two important steps: feature extraction and feature selection. The proposed LSTM model performs both by merging node layers. The deep layers handle the merging operations and select information from the shallow layers. The information processed in all nodes except those of the outer layer is considered to be the features, and the process is called parameter tuning or learning. In the input layer the nodes represent the features, whereas in the hidden layers the features carry activations of deeper significance. An advantage of the proposed LSTM model is its robustness in dimensionality reduction, which matters because the data volume is large. The entire process is automated: the extracted features are automatically mapped to a new feature space in which redundant information is filtered out.

Feature Selection
Feature selection is performed based on hyperparameter tuning. The entire LSTM model is trained as described in the previous section. The proposed model is stabilized by running it on the collected dataset for multiple trials. Each trial records the error cost and the epoch count, and the aim is the lowest error cost within the fewest epochs. A trial is considered to have reached optimal hyperparameters when its error cost falls to a very low value (about 1/1000) within 80 to 100 epochs. This is modelled using the standard hypotheses given below.
• Hypothesis 1: The first two layers are considered to represent the original features.
• Hypothesis 2: Hidden layers always look for the features with appropriate weights.
• Hypothesis 3: The absolute weights are summed for each node.
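The trial-selection criterion and Hypothesis 3 can both be sketched in a few lines. The trial records, the tie-break on epoch count, and the interpretation of Hypothesis 3 as summing absolute weights per input node are assumptions for illustration. In that interpretation, `W[j][k]` is the weight from input node k to hidden node j. The function names `select_trial` and `input_feature_importance` are hypothetical.

```python
def select_trial(trials, max_error=1e-3, epoch_range=(80, 100)):
    """Pick the qualifying trial with the lowest error cost (ties broken by fewer epochs).

    Each trial is a dict with 'error' and 'epochs'. Returns None if no trial qualifies.
    """
    lo, hi = epoch_range
    ok = [t for t in trials if t["error"] <= max_error and lo <= t["epochs"] <= hi]
    return min(ok, key=lambda t: (t["error"], t["epochs"])) if ok else None


def input_feature_importance(W):
    """Hypothesis 3 sketch: sum the absolute weights leaving each input node.

    W[j][k] is the weight from input node k to hidden node j.
    """
    n_inputs = len(W[0])
    return [sum(abs(row[k]) for row in W) for k in range(n_inputs)]


trials = [
    {"id": 1, "error": 0.004, "epochs": 90},    # error too high
    {"id": 2, "error": 0.0008, "epochs": 95},   # qualifies, lowest error
    {"id": 3, "error": 0.0009, "epochs": 85},   # qualifies
    {"id": 4, "error": 0.0005, "epochs": 140},  # too many epochs
]
best = select_trial(trials)
W = [[0.5, -0.1, 0.0], [-0.5, 0.2, 0.1]]
print(best["id"], input_feature_importance(W))
```

Input nodes with larger summed absolute weights would be ranked as more informative features.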

Results and Discussion
The proposed LSTM structure defines the layers of the system, which can perform dataset CRUD operations easily. Consider a scenario in which the attacker is able to push malicious traffic into the wireless network. Figure 4 shows the results obtained by tapping the traffic. Figures 4 and 5 indicate that the proposed LSTM-based method is efficient in traffic classification. It is also observed that the attacker uses UDP-based data transmission methods for exfiltration. These attacks target the packet flow and involve hijacking the network.
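The sketch below shows one way to surface the observations above, namely a heavy-volume source and UDP-based exfiltration. It aggregates bytes per source IP per time window and computes each source's UDP byte share. The tuple layout `(timestamp, src_ip, protocol, length)`, as might be exported from a packet capture, and the example addresses are hypothetical.

```python
from collections import defaultdict


def bytes_per_window(packets, window=1.0):
    """Sum packet lengths per (source IP, time-window index)."""
    volume = defaultdict(int)
    for ts, src, proto, length in packets:
        volume[(src, int(ts // window))] += length
    return dict(volume)


def udp_share(packets):
    """Fraction of each source's bytes carried over UDP."""
    total, udp = defaultdict(int), defaultdict(int)
    for _, src, proto, length in packets:
        total[src] += length
        if proto == "UDP":
            udp[src] += length
    return {src: udp[src] / total[src] for src in total}


packets = [  # hypothetical capture records
    (0.1, "192.168.1.10", "TCP", 60),
    (0.4, "192.168.1.10", "TCP", 1500),
    (0.2, "192.168.1.66", "UDP", 1400),
    (0.3, "192.168.1.66", "UDP", 1400),
    (0.9, "192.168.1.66", "TCP", 60),
]
print(bytes_per_window(packets))
print(udp_share(packets))
```

Such aggregates could serve as per-time-step features for the LSTM or as a simple sanity check on what the classifier flags.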

Performance Analysis
The proposed model is tested in various environments on the real-time testbed. The attacker machine is configured with Parrot OS and Python scripts to attack the system. The proposed model uses feature extraction to detect the predefined attacks and to create patterns for possible future attacks, so that malicious attacks in the wireless system are detected efficiently. The entire experimental setup is shown in Fig. 2 and Fig. 3. In this section, the training and testing datasets are plotted from various aspects. In the execution phase, each test trial records errors, which can be controlled with the hypothesis values listed for the LSTM. Figures 4 and 5 show the variations in the network: panel (a) shows the system analyzing two IPs, and panel (b) shows an increased number of IPs. Figure 5b, c, e, g and i show a huge deviation in the traffic compared with Fig. 5a, d, f, h and j. Figures 4 and 5 show that the attacker machine generates very heavy traffic over a period of time, whereas the normal machine operates with its usual traffic. Figure 4d represents normal traffic behaviour, whereas the other figures represent abnormal traffic behaviour. Figure 6 compares the performance of the proposed LSTM model with the state-of-the-art models. The results show that the proposed LSTM model outperforms all of the listed models.
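The metrics used to compare detectors (accuracy, detection rate and false positive rate) follow directly from a confusion matrix. The following is a minimal sketch. The labels below are toy values, not the paper's data, and the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(y_true, y_pred, positive="malicious"):
    """Accuracy, detection rate (recall on the malicious class) and false positive rate."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


y_true = ["malicious"] * 3 + ["normal"] * 5
y_pred = ["malicious", "malicious", "normal", "normal", "normal", "malicious", "normal", "normal"]
print(detection_metrics(y_true, y_pred))
```

Here two of three malicious samples are caught and one of five normal samples is flagged. The result is an accuracy of 0.75, a detection rate of 2/3 and a false positive rate of 0.2.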

Conclusion
In this paper, an LSTM model was implemented, drawing on artificial intelligence techniques, to analyze network traffic. Existing models in the literature detect attacks based on patterns and suffer from performance and scalability problems when deployed. The proposed model overcomes these drawbacks and improves system accuracy to 99.5%. Figures 7, 8 and 9 show the performance metrics of the proposed model. The results show that it outperforms the state-of-the-art models available for traffic classification. Compared with state-of-the-art malicious traffic classifiers, the LSTM achieves 99.5% accuracy in malicious traffic detection, which in turn supports attack detection.
The experimental results confirm that the proposed LSTM efficiently detects all the attacks listed in [21][22][23][24][25][26]. The experiments also show improved accuracy, a higher overall detection rate and reduced learning time. The proposed LSTM does not require any additional hardware, protocol modification or firmware upgrade on either the client or the server. In future work, this LSTM model can be extended to a mobile version. The metric selection for LSTM can also be optimized using emerging learning techniques.