Cloud Zero-Day Attack Detection Using Hidden Markov Model with Transductive Learning

In cloud security, detecting attack software is an essential task. Among the several attack types, a zero-day attack is considered the most problematic because antivirus software is unable to remove it. Existing attack detection models rely on stored data about attack characteristics and therefore fail to detect zero-day attacks, in which an altered attack is used to evade detection by the antivirus system. To detect and prevent zero-day attacks, this paper proposes a model called Hidden Markov Model Transductive Deep Learning (HMM_TDL), which generates hyper alerts when an attack is carried out. HMM_TDL also assigns labels to data in the network and periodically updates the database (DB). Initially, the HMM detects attacks and records hyper alerts in the database. In the next stage, transductive deep learning incorporates k-medoids to cluster attacks and assign labels. Finally, the trust value of the original data is computed and updated in the database; based on this value, the network is able to classify attacks and data. The developed HMM_TDL is trained on two datasets, NSL-KDD and CIDD. In the comparative analysis, HMM_TDL achieves an accuracy of 95%, higher than existing attack classification techniques.

for attack identification. With k-medoids clustering, soft labels are assigned to attacks and data and updated in the database. In the last phase, the HMM_TDL database is updated with the computed trust value for attack prevention within the cloud.
The remainder of the paper is organized as follows: Section II reviews conventional techniques for attack detection. Section III presents the problem domain related to the zero-day attack and the model framework. Section IV describes the developed HMM_TDL for attack prevention. The results obtained for the developed HMM_TDL are presented in Section V with a comparative analysis. Finally, Section VI provides the overall conclusion on the developed HMM_TDL.

Related Works
This section presents existing methods proposed for securing against zero-day attacks. Through analysis, existing security schemes are categorized as signature-based, hybrid, statistical-based, and behaviour-based approaches [10].
In statistical-based techniques, an attack profile is generated from past elements. Through those identified elements, the profile parameters are updated from historical exploits for the detection of attacks [15]. However, statistical-based techniques cannot be applied instantaneously for attack protection and detection [2]. The technique relies on a static attack profile and requires the attack detection settings to be modified manually. Signature-based attack detection methods use a library of different malware signatures. Depending on the requirements of the users, those signatures are cross-verified against local files, web downloads, and network files. The created libraries are updated periodically with new signatures, so that the signature mechanism accounts for new vulnerabilities within the network. Signature-based schemes are widely used, and they demand high-quality signature generation mechanisms.
A behaviour-based mechanism derives worm characteristics for accurate estimation of attacks on the web server. By identifying attacks on the server machine, it is able to identify victims or deny services that are not involved [16]. This involves estimating the traffic flow within the network. To overcome the challenges associated with statistical-based and behaviour-based techniques, hybrid techniques have evolved. In a hybrid mechanism, signature schemes are integrated based on the applications [17][18][19][20][21]. On this basis, Kaur and Singh [18] developed a hybrid approach to zero-day attack identification. However, the developed hybrid approach is applicable to polymorphic worm detection alone. To evaluate vulnerability risk levels, hazard metrics are developed in [23,24], and hazard-level frequency and impact factors are estimated in [25] and [26]. However, the risk level of a zero-day attack cannot be measured when the vulnerability severity is not defined [27][28][29]. To make it measurable, the degree of exploitability needs to be defined for zero-day risk assessment. Moreover, existing technology faces difficulty owing to the dynamic characteristics of attacks. Hence, this paper develops HMM_TDL for attack detection to improve the robustness of the cloud platform.
Problem Domain and Model Framework
An attack is a complex behaviour with several goals, on the basis of which the attack phases are implemented. As figure 1 illustrates, attackers aim to steal data by exploiting vulnerabilities within the intranet. The vulnerabilities are exploited through the following steps. Initially, to gain root privileges, attackers need to scan the target machine. The attackers scan the target machine by bypassing a firewall. In the next stage, Trojans are injected into the file server via the network file interface system. The inclusion of a Trojan within the system leads to data leakage. Figure 2 illustrates a zero-day attack on the cloud server.

Model Framework
The overall framework of the proposed zero-day attack detection model is shown in figure 3. The proposed model comprises four phases: 1) data collection, which obtains the intrusion dataset; 2) data preprocessing, which prepares the pre-processed data for training and testing; 3) model training, which provides historical alerts for processing; and 4) model testing, in which real-time attack data are processed. Initially, the Hidden Markov Model (HMM) is provided with the transfer relationship between variables to attach temporal alerts to the data. In the next stage, the two-layer HMM is converted into a Bayesian network with a set of rules for probabilistic inference. This HMM with probabilistic inference uses transductive learning for rule updates. In this way, attacks within the network are identified at each time step, including unknown attacks.

Figure 4: Zero-day attack alerts in HMM_TDL

Original alerts cannot be obtained directly from unprocessed data. This creates the demand for extracting effective attack sequences from IDS alerts and rebuilding them with effective characteristics, as shown in figure 4.

Developed HMM_TDL for zero-day attack
This section presents a detailed description of the developed HMM_TDL for zero-day attack detection and prevention. First, the HMM is explained for attack detection with hyper alerts. The next subsection describes the transductive model for soft labelling. Finally, the overall operation of the developed HMM_TDL is presented with its algorithm.

Hidden Markov Model for Attack Detection
An HMM is a dynamic Bayesian network model with probabilistic characteristics over a time series. Its states are hidden: a hidden Markov chain generates a random sequence of states, and each state generates an observation. HMMs have been applied in a vast range of applications such as image recognition, biology, and signal processing. Here, the HMM is applied in an intrusion detection system, where it provides a significant temporal relationship between undefined sequences and attacks. Attacks from malicious insiders incorporate multiple attack steps that differ from the normal characteristics of the network, which increases the detection accuracy. Through the identification of temporal relationships, complex relations in attacks are identified. On this basis, the proposed scheme constructs an attack plan with several hidden-layer states and transmits alerts to the observation layer. With the application of the HMM, malicious attacks in the cloud are identified with transfer probabilities, as presented in figure 5. Similarly, for horizontal attack intents, the conditional probability of $S_j$ is $P(S_j \mid S_i)$.

In general, an HMM involves three problems: probability calculation, parameter estimation, and decoding. Among these, this paper examines parameter estimation, which provides information about the sequence of attacks. A transductive transfer learning mechanism is used to examine the attack stages. The corresponding expressions are given in equations (2), (3) and (4). The state probability is given in equation (5), and the backward probability in equation (6). For the defined model and the observation sequence N, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is given in equation (8). The simplified form is given in equation (11). Through this, the HMM constructs a trained model.
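To make the forward probability, the backward probability, and the joint probability of two consecutive states concrete, the following is a minimal Python sketch of the standard forward-backward recursions for a discrete HMM. It is not the paper's implementation and does not reproduce its equations: the two hidden states, the two alert symbols, the values of `pi`, `A`, `B` and `obs`, and all function names are illustrative assumptions.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_1..o_t, state_t = i) for a discrete HMM."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]  # beta at the final time step is 1 for every state
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


def transition_posteriors(pi, A, B, obs):
    """xi[t][i][j] = P(state_t = i, state_{t+1} = j | obs)."""
    n = len(pi)
    alpha = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    likelihood = sum(alpha[-1])  # P(obs | model)
    xi = []
    for t in range(len(obs) - 1):
        o_next = obs[t + 1]
        xi.append([
            [alpha[t][i] * A[i][j] * B[j][o_next] * beta[t + 1][j] / likelihood
             for j in range(n)]
            for i in range(n)
        ])
    return xi, likelihood


# Toy model: all values below are assumptions chosen for illustration only.
pi = [0.8, 0.2]                      # initial state distribution
A = [[0.9, 0.1], [0.3, 0.7]]         # state transition probabilities
B = [[0.8, 0.2], [0.1, 0.9]]         # alert emission probabilities per state
obs = [0, 1, 1]                      # observed alert symbols over time

xi, likelihood = transition_posteriors(pi, A, B, obs)
```

In a Baum-Welch style parameter estimation, the `xi` values would be summed over time to re-estimate the transition probabilities; that update step is omitted here.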

Transductive transfer learning framework
In the next phase, this paper constructs a transductive transfer learning (TL) framework for preventing zero-day attacks. The constructed TL framework performs binary classification by leveraging the labelled instances, denoted $L_s$, to prevent the zero-day attack $A_s$. To improve zero-day attack prevention performance, labels are assigned based on the class labels of the source cluster. In this technique, target instances whose soft label satisfies the threshold value take part in constructing the cluster for data transfer. The labels assigned by TL are based on the values derived from the HMM. A cluster or group without any attack is denoted separately. The developed transductive model is applied to the HMM for detection and prevention of zero-day attacks. Figure 6 presents the overall architecture of HMM_TDL.
Step 1: Initially, set the label value to zero.
Step 2: If the source cluster is ranked as $r_i$, the attack is included; otherwise it is eliminated from the cluster group.
Step 3: For source cluster ranked as 1  i r , then attack 2  is included within the system else it will be removed from cluster group.
Step 4: For source cluster ranked as 1  i r , then attack 2  is included within the system else it will be removed from cluster group.
Finally, the estimated target scores are normalized to values between 0 and 1 to distinguish attack instances from normal cluster instances. Instances whose soft label meets threshold $T_1$ are labelled "attack", while instances that fall below threshold $T_2$ are labelled "normal". In this label assignment scheme, attacks are classified using the labelled instances and, through the generation of soft labels, are prevented from being injected into the network.
The assigned labels are incorporated within the cluster group, which comprises three parts for zero-day attack classification. Each node within the cluster consists of the following factors: prior knowledge, edge probability, and a conditional probability table (CPT). The HMM estimates attacks in the network with consideration of causality. This paper uses the HMM with transductive transfer learning. Hence, based on the assigned labels, unknown attack labels are estimated with the rule update in equation (12). With respect to the assigned labels $T_1$ and $T_2$, the unknown attack is estimated with consideration of the attack CPT. Figure 7 illustrates the overall flow of the developed HMM_TDL with its attack detection and prevention mechanism. Once an attack is detected by the HMM, the transductive network is trained on the attack based on the computed trust value in the database. The transductive framework generates a source mapping for constructing the target domain with consideration of the latent space. After conversion to the latent space, the source domain is designed with instance labels for probable classification of attacks. With the soft labels assigned to the data, zero-day detection accuracy is improved by building the training classifier with target instances. Algorithm 2 presents the label assignment for zero-day attack estimation.

Transform target domain with consideration of latent data dimensions
Zero-day attack estimation is carried out by the HMM. The HMM performs zero-day attack detection using a threshold in the calculation; based on the estimated threshold, labels are assigned and a set of parameters is constructed.
The developed transductive deep neural network is analysed in the training phase for discriminator generation. The zero-day attack is denoted $A$ and the real data in the network is denoted $D$; the first-order training on real-time data is involved in formulating convergence with estimation of the distributed data. The transductive learning output for attack detection is denoted $Z_{ij}$ and is defined in equation (13) in terms of the weights $w$ and the inputs $x$. The zero-day attack is computed from the TDL as given in equations (14) and (15). The complete training data from HMM_TDL is stated in equation (16). The attack detected by the HMM is incorporated in the transductive deep learning model, which is subject to two constraints, defined from equation (17). Case 2: Consider the minimal value for the data.

Experimental Analysis & Results
This research focuses on improving cloud security against zero-day attack scenarios. The proposed scheme transmits hyper alerts with estimation of the time window. The HMM estimates the attacks in the network, and each attack is classified based on threshold values. The developed transductive framework was evaluated on two datasets, NSL-KDD and CIDD. The features of both datasets were examined; the datasets are described below.

Availability of Data and Materials

NSL-KDD
NSL-KDD is a standard dataset of packet distributions for IDS experimentation. The dataset has 43 features and 147,907 instances, of which 76,967 are attack instances and 70,940 are normal instances. NSL-KDD includes attacks such as Probe, U2R, DoS, and R2L. The features selected for analysis are presented in Table 1.

CIDD Dataset
The CIDD dataset is a cloud intrusion dataset covering DoS attacks and other attacks in the cloud environment. It is a time-based dataset with 5274 instances and 25 features. CIDD includes the udp_flood, tcp_syn_flood, pod, dns_flood, land, icmp_flood (smurf) and slowloris attacks. Table 2 presents the features of CIDD.

The F1-Score summarizes classification performance as the harmonic mean of precision and recall, and is calculated using equation (29):

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (29)$$
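For reference, the F1-Score and the other evaluation parameters reported in the results (accuracy, FPR, and sensitivity) can be computed from a confusion matrix as in the sketch below. The definitions are the standard ones; the function names and the tiny label vectors are illustrative assumptions and are not taken from the paper's datasets.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, sensitivity (recall), FPR and F1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "f1": f1,
    }


# Hypothetical labels: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = evaluate(y_true, y_pred)
```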

Experimental Setup
To evaluate the performance of the developed model, zero-day attack detection in the cloud is considered.

Test Scenario: Zero-day attack detection in Cloud
The CIDD dataset contains several different attacks; this paper considers the DoS attack module for detection of zero-day attacks in the cloud dataset. From the analysis, it is observed that smurf, pod, and land are attack modules common to both the CIDD and NSL-KDD datasets. Here, NSL-DoS and CIDD share these common instances as the source and target domains, respectively. For the analysis, the fully labelled NSL-DoS module and the unlabelled CIDD module are included within the framework, as stated in Table 3. As presented in Table 3, the cluster purity α is evaluated by generating cluster groups with a size of 13. As stated, hyper alerts use k-medoids for the R2L modules with DoS and the cloud server. The value of k for the medoids is set to 25, with an assigned value of α = 0.92327. Based on this, R2L instances are generated in the cloud server with 25 clusters. The thresholds estimated from the cluster purity α are T1 = 0.92327 and T2 = 0.03566. The attack label is assigned to the set of R2L-module instances with a cloud value of 0.92327, and the normal label to the R2L-module instances with a cloud value of 0.03566. The labelled instances of the DoS module are combined with the soft-labelled instances in the R2L module to train the DNN module. The DoS→R2L task achieves an accuracy of 0.9275 and an FPR of 0.0563; the corresponding ROC curve is presented in figure 8. The transductive DNN provides improved classification for zero-day attack detection compared with the existing classifiers, as presented in figure 9. The comparative results show that the transductive deep learning scheme performs better than existing ML methods.
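The clustering step in this scenario, k-medoids followed by a per-cluster purity value, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: it uses a basic alternating k-medoids with Euclidean distance, takes the first k points as initial medoids, and treats purity as the fraction of labelled members of a cluster that carry the attack label. The six two-dimensional points and all function names are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign_points(points, medoids):
    """Assign every point to the index of its nearest medoid."""
    return [
        min(range(len(medoids)),
            key=lambda c: euclidean(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=50):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(range(k))  # naive initialization: first k points (assumption)
    for _ in range(max_iter):
        assign = assign_points(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to its cluster members.
            new_medoids.append(min(
                members,
                key=lambda m: sum(euclidean(points[m], points[o])
                                  for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_points(points, medoids)


def cluster_purity(assign, labels, k, positive="attack"):
    """Fraction of members labelled `positive` in each cluster (assumed definition)."""
    purity = []
    for c in range(k):
        member_labels = [labels[i] for i, a in enumerate(assign) if a == c]
        purity.append(member_labels.count(positive) / len(member_labels)
                      if member_labels else 0.0)
    return purity


# Hypothetical labelled source instances forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

medoids, assign = k_medoids(points, k=2)
purity = cluster_purity(assign, labels, k=2)
```

Under this reading, a cluster whose purity reaches T1 would pass the attack label to the unlabelled target instances assigned to it, and a cluster whose purity falls below T2 would pass the normal label.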
In the zero-day attack detection module, the labelled instances of the DoS module are used on the Probe module for the task DoS→Probe. The best validation result for the transductive DNN is obtained with a batch size of 100 and 90 epochs. The DoS→Probe task achieves an accuracy of 0.9249 and an FPR of 0.8416, with the ROC curve presented in figure 10; figure 12 provides the ROC curve for NSL_DoS→CIDD. Also, from figures 11 and 13, it is observed that the developed HMM_TDL exhibits improved performance on all parameters: accuracy, FPR, F1-Score, and sensitivity.