Adapted Convolutional Neural Networks and Long Short-Term Memory for Host Utilization Prediction in Cloud Data Center

The infrastructure service model provides virtual computing resources such as networking, storage, and hardware according to user demand. Host load prediction is an important element of cloud computing because it improves resource allocation. Host initialization remains a problem in cloud computing: allocating hardware resources introduces a delay of several minutes in the response process. To address this issue, prediction techniques are used in the cloud data center to scale the cloud dynamically and maintain a high quality of service. In this paper, we therefore propose a hybrid convolutional neural network and long short-term memory (CNN-LSTM) model for host load prediction. In the proposed hybrid model, the vector autoregression (VAR) method is first applied to the input data to filter the linear interdependencies among the multivariate data. The residual data are then computed and fed into the convolutional neural network layer, which extracts complex features from each central processing unit and virtual machine usage component. After that, long short-term memory is used, which is suitable for modeling the temporal information of irregular trends in time series components. The main contribution is the use of the scaled polynomial constant unit activation function, which is most suitable for this kind of model. Because load in the data center is highly variable, accurate prediction is important in cloud systems. For this reason, two real-world load traces were used to evaluate performance: one is a load trace from the Google data center, and the other is from a traditional distributed system. The experimental results show that the proposed method achieves state-of-the-art performance, with higher accuracy on both datasets than the ARIMA-LSTM, VAR-GRU, VAR-MLP, and CNN models.


Introduction
Cloud computing delivers three main services: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). These services are provided to the user over the internet on a pay-per-use basis. One of the main roles of cloud computing is to provide a huge amount of virtualized resources to the end user [1,2]. The main characteristic of cloud computing is the delivery of computation as a service, in which resources such as the central processing unit (CPU), software, hardware, and applications are granted to the user through the internet. Cloud technology is widely used in various fields owing to its on-demand resource delivery, low resource cost, and elastic resource scaling. Many applications have been developed on cloud platforms, and different techniques are used for resource allocation to improve them; prediction is one of these techniques. However, the technology still faces several issues, such as resource and application balancing, whose resolution can improve the performance of the system [3,4]. Resource and workload prediction are important parts of cloud management systems and platforms. Accurate prediction directly affects security, quality of service, economics, and management, and thereby improves the performance of cloud computing. Load and application prediction are normally used to describe the future behavior of resources and applications on the basis of collected information [5,6]. Figure 1 shows the prediction dimensions used in cloud data centers.

Fig. 1. Prediction types
Load prediction performance can be measured by parameters such as CPU utilization, response time, throughput, memory utilization, and network utilization. Prediction approaches are further divided into application prediction and load prediction. For the proactive approach, different kinds of machine learning techniques are used that draw on the historical records of cloud applications over a particular period of time. One of the main aims of the machine learning approach is to create an intelligent resource management system based on previous data. Application prediction is a mandatory step for effective resource management in cloud computing because it predicts future demand. Application prediction works in different domains, such as quality of service (QoS), workload prediction, and service-level agreement (SLA) metrics [7,8]. A prediction approach is important in cloud computing in order to meet future resource demand quickly and accurately. Resource usage in a cloud environment should be predicted accurately while applications run. Accurate prediction is therefore needed to reduce cost and manage resource usage optimally [9,10]. IaaS provides flexible and fast information technology (IT) resources on demand; the majority of cloud providers therefore offer scalable services that automatically provide computing resources (such as CPU, memory, and storage). However, the scaling time needed to initialize a CPU and VM typically introduces a delay of several minutes. To reduce this scaling delay, it is important to fix the exact amount of resources in advance. Consequently, predicting CPU and VM utilization is the key to solving this problem [11].
The rest of the paper is organized as follows. Section 2 discusses the necessary background on resource prediction. Section 3 presents the contribution of the paper. Section 4 proposes the model. Section 5 presents the dataset information. Section 6 analyzes the results, and Section 7 presents the conclusion. For the convenience of readers, Table 1 lists the most frequently used acronyms in the paper.

Related work
Cloud computing offers flexible resource allocation on the demand of cloud customers. Establishing a resource prediction model is difficult because cloud user demand changes over time [12]. Because the cloud server round-trip time is important, the author in [13] proposed a neuro-fuzzy network together with eight probability distribution functions for predicting the round-trip time (RTT). This technique measures the time in which data travel from a source node to the destination and back. The proposed technique improves efficiency and reduces the error rate. The results enhanced the offloading system and improved the prediction rate.
The author in [14] proposed an algorithm known as the swarm intelligence-based prediction approach (SIBPA), designed to achieve a higher prediction accuracy in resource allocation systems in terms of CPU, memory, and storage utilization. The results of the proposed algorithm are compared with those of well-known algorithms.
In [15], the author presents a multi-agent system (MAS) for dynamic monitoring and prediction in computational resource allocation for cloud computing. In the proposed technique, reasoning agents work cooperatively within an architecture that consists of three layers. Multiple linear regression models were used for prediction in order to reduce the mean error. Based on the results, the author claims that the technique achieves a good error rate and prediction performance on the Google platform.
Load balancing and the balanced optimization of hardware resource utilization are important in cloud computing. The author in [16] therefore proposed a model known as the long short-term memory (LSTM) encoder-decoder. The first part extracts contextual features from the historical workload, and the second part integrates the attention mechanism into the decoder network and carries out the prediction. All experiments were performed on the Alibaba and Dinda workload trace datasets. Based on the results, the proposed technique is claimed to be more accurate and to suit small-sequence monitoring systems.
To estimate underloaded or overloaded resource utilization in the cloud data center, most existing estimation methods use a single model. To address this issue, the authors in [17] proposed an approach that trains a classifier on statistical features of historical resource usage. The proposed model was implemented on a real data set and used to estimate resource utilization for a specific time interval. Based on the results, the proposed approach achieves better results than the baseline approaches.
Load balancing is very important for reducing resource wastage by optimizing resource utilization in cloud data centers. The author in [18] therefore used a supervised learning technique known as support vector regression (SVRT), which is suitable for non-linear cloud resource workload forecasting, to forecast the future usage of multi-attribute host resources. To improve training and regression, the sequential minimal optimization algorithm (SMOA) was used in the proposed technique, which was implemented with different types of dataset. Based on the results, the proposed technique yields an improvement of 4%-16% and reduces the error percentage by approximately 8%-60% compared with the state-of-the-art methods.
It is important to note that accurate prediction of data center resource utilization is needed for proper planning, such as scheduling, energy saving, workload placement, and load balancing in the cloud data center. However, accurately predicting resource utilization is a big challenge because of the dynamic nature and heterogeneous infrastructure of the data center. The authors in [19] therefore proposed a deep learning model with adaptive window size selection. The sliding window technique captures the trend of the most recent resource utilization, builds an estimate for each trend period, and evaluates resource utilization on that basis. The proposed estimation technique yields a 16% to 54% improvement in prediction accuracy compared with the baseline methods.
Load balancing is one of the main parameters of cloud computing, and it can improve the lifetime of the system. The author in [20] therefore proposed the osmotic hybrid artificial bee and ant colony (OHBAC) algorithm. The proposed algorithm is a combination of the artificial bee colony and ant colony algorithms and is used to reduce the number of active virtual machines and ultimately improve the network lifetime. For resource prediction, the author uses simple linear regression and optimal piecewise linear regression; with the help of the prediction results, the algorithm accurately selects the best VM among all of them for better utilization. The proposed algorithm improves network stability and minimization of the system compared with the standard algorithm.
The author in [21] proposed gradient descent (GD) and Levenberg-Marquardt (LM) algorithms for a dynamic load prediction model in cloud computing. The proposed models are validated on CPU usage prediction using Google traces, and their efficiency is compared with that of different standard models. Based on the results, the proposed models provide better prediction results.
Table 2 shows a summary of the related work, consisting of the technique name, dataset, prediction section, and platform. The study of related work shows that researchers are trying to improve accuracy and efficiency, but further improvement is still needed. This paper therefore applies a hybrid technique to improve prediction of host utilization in the cloud data center. Two prediction parameters, CPU and RM, are used to check the performance of the proposed model.

Paper Contribution
This section highlights the contributions made by the authors in this paper.
The paper mainly aims to optimize cloud resource utilization by enhancing the load prediction approach. The main contributions of the paper are summarized below, and Figure 2 shows the working of the proposed model.
(1) We propose a hybrid CNN and LSTM model for multivariate resource prediction in the cloud data center.
(2) The main contribution of the proposed model is the implementation of the scaled polynomial constant unit activation function.
(3) The proposed model is used for more accurate host load prediction in the cloud data center.
(4) We estimate the proposed hybrid model and compare it with different standard techniques.
(5) We carry out an extensive experimental evaluation using the publicly available Google cluster trace and traditional distributed system data sets for different data centers in the cloud environment.

Proposed Model
CNN is a good technique for removing noise and for taking into account the correlation between variables in multivariate data, while the LSTM model captures temporal information and maps the time series into a separable space to generate predictions. Our proposed CNN-LSTM model is used to predict CPU and VM utilization. VM utilization is a multivariate time series recorded over time that includes spatial information among variables and irregular patterns of temporal information. The metrics used for resource prediction are CPU and RM. In the initial stage, the input data are analyzed by the VAR regression technique to filter the linear interdependencies among the multivariate data. The residual data are then computed and fed into the CNN layer, which extracts the complex features of each VM usage and CPU component; the LSTM then models the temporal information of irregular trends in the time series components and generates the prediction. Our multivariate time series data are composed of two parts, a linear part and a nonlinear part, so they can be written as:
$$y_t = L_t + N_t + \varepsilon_t \quad (1)$$
where $L_t$ represents the linear part of the data at time $t$, $N_t$ represents the nonlinear part, and $\varepsilon_t$ is the error term. The linear part of the multivariate time series is analyzed by the VAR model, which captures the linear trends. The nonlinear part, that is, the residual of the VAR model ($N_t$), contains spatial and temporal information [22]:
$$N_t = S_t + T_t \quad (2)$$
where $S_t$ and $T_t$ denote the spatial and temporal components. The spatial features are extracted with the CNN model, and the hybrid CNN-LSTM model, which is appropriate for modeling temporal information, then processes them to generate the final prediction. Before presenting our model, we introduce these two related models and then move on to multivariate workload prediction in the cloud data center.
Vector autoregressive (VAR) models are natural tools for forecasting because they are designed so that the current values of a set of variables are partly explained by the past values of the variables involved. The main role of this model is to describe the joint generation mechanism of the variables involved [23,24]. Each variable is a linear function of its own past lags and the past lags of the other variables, as in the equation below.
$$y_1(t) = a_1 + w_{11}\,y_1(t-1) + w_{12}\,y_2(t-1) + w_{13}\,y_3(t-1) + w_{14}\,y_4(t-1) + e_1(t)$$
where $y_1(t)$, $y_2(t)$, $y_3(t)$ and $y_4(t)$ are the CPU and RM usage at moment $t$, and $y_1(t-1)$, $y_2(t-1)$, $y_3(t-1)$ and $y_4(t-1)$ are the CPU and RM usage at moment $t-1$ (here the lag value is 1). $a_1$, $a_2$, $a_3$ and $a_4$ are constant terms, $w_{1j}$ are the coefficients, and $e_1(t)$ is the error term. Candidate models are compared using the Akaike information criterion (AIC):
$$\mathrm{AIC} = -2\ln(\hat{\mathcal{L}}) + 2k$$
where $\hat{\mathcal{L}}$ is the value of the likelihood function and $k$ is the number of degrees of freedom, that is, the number of parameters used in the model. A model with a smaller AIC value is generally the better model. The residual values are calculated and passed to the subsequent CNN-LSTM model. Because the VAR model has identified the linear trend, the residual is expected to contain the nonlinear features:
$$y_t - \hat{L}_t = r_t$$
where $\hat{L}_t$ is the VAR forecast and $r_t$ is the residual.
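The VAR filtering step can be illustrated with a minimal sketch. It assumes a lag of 1 and coefficients that have already been fitted; the function names, the toy two-variable series, and the coefficient values are hypothetical and are not taken from the experiments in this paper.

```python
def var1_predict(a, W, y_prev):
    """One-step VAR(1) forecast: y_hat(t) = a + W * y(t-1)."""
    return [a_i + sum(w_ij * y_j for w_ij, y_j in zip(row, y_prev))
            for a_i, row in zip(a, W)]


def var1_residuals(a, W, series):
    """Residuals y(t) - y_hat(t) for t = 1 .. T-1 (the part left for the CNN-LSTM)."""
    out = []
    for t in range(1, len(series)):
        y_hat = var1_predict(a, W, series[t - 1])
        out.append([y - yh for y, yh in zip(series[t], y_hat)])
    return out


def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln(L) + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical, already-fitted parameters for two usage variables.
a = [0.1, 0.2]
W = [[0.5, 0.1],
     [0.0, 0.8]]
series = [[1.0, 2.0], [0.9, 1.9], [0.8, 1.7]]

res = var1_residuals(a, W, series)
```

In practice the coefficients would be estimated from the training trace, and the lag order would be chosen by comparing the AIC of candidate models.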

CNN-LSTM section
The convolutional neural network (CNN) is inspired by the human neural system and shows strong results in a wide range of applications. The main characteristics of a CNN are sparse connectivity and shared weights. A typical CNN is a hierarchical model that performs its computation in layers (convolutional layers and subsampling or pooling layers) and finally performs classification through a fully connected layer [27]. It is a specialized type of neural network designed to work with two- or three-dimensional image data. A one-dimensional (1D) CNN is a CNN model with a convolutional hidden layer that operates over a 1D sequence. In time series forecasting problems, a 1D CNN can read across the input sequence and automatically learn its salient features. A 1D CNN is very effective at deriving features from fixed-length segments of the overall dataset, regardless of where in the data the segment is located [28]. In some cases, for example with long input sequences, the first convolutional layer is followed by a second convolutional layer. Equation (9) gives the output vector $y_{ij}^{1}$ of the first convolutional layer.
$$y_{ij}^{1} = \sigma\left(b_{j}^{1} + \sum_{m=1}^{M} w_{m,j}^{1}\, x_{i+m-1,j}^{0}\right) \quad (9)$$
where $x_{ij}^{0}$ is the input vector, $b_{j}^{1}$ is the bias of the $j$th feature map, $w$ is the kernel weight, $m$ is the index value of the filter, and $\sigma$ is the activation function, such as ReLU. The convolutional layer is followed by the pooling layer, whose job is to distill the output of the convolutional layer to its most salient elements. The main role of the pooling layer is to reduce the size of the representation, the number of parameters, and the network computation cost. Max pooling is used for resource usage prediction; it takes the maximum value from each cluster of neurons in the previous layer, which also helps to limit overfitting [29]. Equation (10) presents the max pooling layer.
$$p_{ij}^{l} = \max_{r \in R}\; y_{i \times T + r,\, j}^{l-1} \quad (10)$$
where $T$ is the stride, which decides how far to move the area of input data, and $R$ is the pooling size, which is less than the size of the input $y$. The convolutional and pooling layers are followed by the LSTM layer, which interprets the features extracted by the convolutional part of the proposed model. A flatten layer is used between the convolutional layers and the LSTM layer to reduce the feature maps to a single one-dimensional vector [30].
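The convolution and max pooling operations described in this section can be sketched in a few lines. This is a simplified illustration with a single feature map, a hand-picked kernel, and ReLU as the activation; the kernel values and the toy input are assumptions made for the example only.

```python
def relu(v):
    """Rectified linear unit."""
    return max(0.0, v)


def conv1d(x, kernel, bias=0.0):
    """Valid 1D convolution of a sequence with one kernel, followed by ReLU."""
    m = len(kernel)
    return [relu(bias + sum(kernel[k] * x[i + k] for k in range(m)))
            for i in range(len(x) - m + 1)]


def max_pool1d(y, pool_size, stride):
    """Keep the maximum of each window of length pool_size, moving by stride."""
    return [max(y[i:i + pool_size])
            for i in range(0, len(y) - pool_size + 1, stride)]


# Toy usage sequence; the kernel [1, -1] responds to drops between steps.
x = [0.1, 0.4, 0.3, 0.8, 0.2, 0.6]
features = conv1d(x, kernel=[1.0, -1.0])
pooled = max_pool1d(features, pool_size=2, stride=2)
```

With a pooling size and stride of 2, the five convolution outputs are reduced to two values, which is the size reduction the pooling layer is meant to provide.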

Long Short Term Memory neural networks
The $\sigma$ function, denoted with the same symbol in Fig. 3, is the gate activation function that gives the model its nonlinear capabilities. In a standard LSTM it is the logistic (sigmoid) function; in this model it is the scaled polynomial constant unit activation function. As mentioned previously, the LSTM has two state values: the hidden state $h_t$, the value of the cell that changes with time, and the cell state $C_t$, which makes it possible to conserve memory over the long term. The LSTM can add information to and remove information from the cell state through gates. The forget gate $f_t$ combines the input $x_t$ with the previous hidden state $h_{t-1}$ and lets the cell state $C_t$ remember or forget information. Because its output is multiplied with the cell state at each time step, it drops the values that are not needed and keeps those that are necessary for prediction. On the input side, the input gate $i_t$ and the candidate $C'_t$ determine what input is fed into the cell state $C_t$. The output gate $o_t$ determines the output based on the cell state [32].
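For reference, the gate operations described in this section correspond to the standard LSTM formulation sketched below. The weight matrices $W$ and biases $b$ are the conventional symbols and are assumed here; $\sigma$ stands for the gate activation function, which is the sigmoid in a standard LSTM and the scaled polynomial constant unit in the proposed model.

```latex
\begin{aligned}
f_t  &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t  &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
C'_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t  &= f_t \odot C_{t-1} + i_t \odot C'_t \\
o_t  &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t  &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```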
The description above presents the working of the LSTM gates, and Figure 2 shows the structure of the cell and how it works.
In this model we therefore change the gate activation function and use the scaled polynomial constant unit instead. In the next step, the input gate and the candidate gate operate together to render the new cell state $C_t$, which is passed to the next time step as the renewed cell state. The input gate uses the scaled polynomial constant unit as its activation function, which is explained in the coming section, and the input candidate uses the hyperbolic tangent; they output $i_t$ and $C'_t$, respectively.
The prediction algorithm of the proposed hybrid CNN-LSTM model works in four steps: data preprocessing, fixing the model, model fitting with estimation, and prediction. As mentioned before, the residual values calculated earlier are passed to the CNN-LSTM model. In the proposed technique, four time steps are used per sample, and every sample is split into a pair of subsequences. The CNN interprets each subsequence, and the LSTM pieces together the interpretations from the subsequences. Each subsequence has two time steps, so the CNN is defined to expect two time steps per subsequence with four features. The whole CNN model is then wrapped in a time-distributed wrapper layer so that it is applied to every subsequence in the sample. The result is then interpreted by an LSTM layer with fifty blocks (neurons), and finally a dense layer outputs the prediction. Figure 3 presents the activation function.
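The windowing described in this section can be sketched as follows. The example is univariate for brevity, whereas the model in this paper works on multivariate data; the helper names and the toy series are hypothetical.

```python
def split_sequence(series, n_steps):
    """Build (input window, next value) pairs with n_steps time steps per sample."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return X, y


def to_subsequences(sample, n_seq, n_sub_steps):
    """Split one sample into n_seq subsequences of n_sub_steps time steps each."""
    return [sample[k * n_sub_steps:(k + 1) * n_sub_steps] for k in range(n_seq)]


series = [10, 20, 30, 40, 50, 60, 70]

# Four time steps per sample, as described in the text.
X, y = split_sequence(series, n_steps=4)

# Each sample becomes a pair of subsequences with two time steps each.
X_sub = [to_subsequences(s, n_seq=2, n_sub_steps=2) for s in X]
```

Each entry of `X_sub` is what the time-distributed CNN would read one subsequence at a time before the LSTM combines the two interpretations.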

Fig.3. Activation function
The rectified linear unit (ReLU) and scaled polynomial constant unit activation functions are used in the CNN layer and the LSTM block [33].
The scaled polynomial constant unit activation function was defined by Kiseľák et al. in 2020. Its definition uses the polynomial $r(x) = x^{3}(x^{5} - 2x^{4} + 2)$ with $1 \le c < \infty$; we admit that $c$ goes to infinity with $r(c) \to \infty$. The network is trained for 100 epochs with a batch size of 1. Here $X$ denotes the input of the neuron; the vanishing gradient problem can be greatly reduced by using the ReLU activation function. Table 3 presents the parameter settings of the proposed model.
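A minimal sketch of the activation function follows. It assumes the form $s(x) = \alpha\,h(x/\gamma + \beta) - \alpha\,h(\beta)$, where $h(x) = r(c)$ for $x \ge c$, $h(x) = r(x)$ for $0 \le x < c$, and $h(x) = 0$ for $x < 0$. Only $r(x)$ and the range of $c$ are stated in this section; the outer form and the parameter values $\alpha$, $\beta$, $\gamma$ below are assumptions and should be checked against the original definition.

```python
def r(x):
    """Polynomial from the text: r(x) = x^3 (x^5 - 2 x^4 + 2)."""
    return x ** 3 * (x ** 5 - 2 * x ** 4 + 2)


def h(x, c=float("inf")):
    """Assumed piecewise helper: 0 below zero, r(x) on [0, c), constant r(c) above c."""
    if x < 0:
        return 0.0
    if x >= c:
        return r(c)
    return r(x)


def spocu(x, alpha=1.0, beta=0.5, gamma=1.0, c=float("inf")):
    """Assumed SPOCU form; alpha, beta, gamma are illustrative values only."""
    return alpha * h(x / gamma + beta, c) - alpha * h(beta, c)


values = [spocu(v) for v in (-1.0, 0.0, 0.5)]
```

By construction the function passes through the origin, and for inputs below $-\beta\gamma$ it stays at the constant value $-\alpha\,h(\beta)$.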

Dataset information
The workload trace records the arrival, execution, and termination of different tasks along with time stamps. In this study, we analyze and predict the CPU and RM resource usage metrics. We generate and analyze out-of-sample predictions for the succeeding 80 (50 minutes), 200 (60 minutes), and 400 (120 minutes) steps ahead. Resource usage values are aggregated at a 4-second time interval. The Google cluster trace is based on a cluster of about 12500 machines and provides information about the times at which different tasks arrive at the center over a 29-day period. We took 70800 samples from 7 days to train the resource prediction models, and the next 30 samples (4 minutes) of the time series served as validation data for selecting appropriate parameters. Before training the network, we preprocess the data by standardization: we first subtract the mean value of the training data and then divide by its standard deviation. Scaling helps convergence when gradient descent is applied to the network, and it noticeably improves the performance of the model. During the experiment, standardization was applied to all related methods in order to obtain a fair comparison. The parameters chosen with the validation set are given in Table 3. The data set traces over 670,000 jobs and about 4 million task events across more than 12500 machines during 29 days. More than ten metrics were collected in the Google trace, including CPU usage, assigned memory, page-cache memory usage, disk I/O time, and disk space. As in the other methods, we only predict the CPU and RM usage values.
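The standardization step can be sketched as below. The key point is that the mean and standard deviation come from the training data only and are then reused for the validation data. The sample values are made up, and the use of the population standard deviation is an assumption.

```python
import statistics


def fit_standardizer(train):
    """Return the mean and (population) standard deviation of the training data."""
    return statistics.fmean(train), statistics.pstdev(train)


def standardize(values, mean, std):
    """Subtract the training mean and divide by the training standard deviation."""
    return [(v - mean) / std for v in values]


# Hypothetical usage values.
train = [0.2, 0.4, 0.6, 0.8]
validation = [0.5, 1.0]

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
val_z = standardize(validation, mean, std)
```

Reusing the training statistics keeps information from the validation and test periods out of the preprocessing step.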

Experimental Results and Analysis
This section presents the prediction effectiveness of our proposed model against four existing models (ARIMA-LSTM, VAR-GRU, VAR-MLP, and CNN) and compares their predictive results.

Mean load Prediction
To make the results comparable with other models, a metric known as the exponentially segmented pattern was used to characterize the host load fluctuation over consecutive time intervals whose lengths increase exponentially [35]. The mean segment squared error (MSSE) was applied to quantify the performance of mean load prediction.
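A minimal sketch of the MSSE follows. It assumes the commonly used definition in which segment $i$ has length $s_i = b \cdot 2^{i-1}$ and the squared error of each segment mean is weighted by its length, $\mathrm{MSSE} = \frac{1}{s}\sum_{i=1}^{n} s_i (l_i - L_i)^2$ with $s = \sum_i s_i$. The baseline length $b$ and the toy values are hypothetical, and the definition should be checked against [35].

```python
def segment_lengths(b, n):
    """Exponentially increasing segment lengths: b, 2b, 4b, ..."""
    return [b * 2 ** i for i in range(n)]


def msse(predicted_means, true_means, b):
    """Length-weighted mean of squared errors between segment mean loads."""
    if len(predicted_means) != len(true_means) or not predicted_means:
        raise ValueError("inputs must be non-empty and of equal length")
    lengths = segment_lengths(b, len(predicted_means))
    weighted = sum(s * (p - t) ** 2
                   for s, p, t in zip(lengths, predicted_means, true_means))
    return weighted / sum(lengths)


# Three segments of length 1, 2 and 4 (b = 1).
score = msse([0.5, 0.6, 0.7], [0.5, 0.5, 0.5], b=1)
```

Longer, later segments carry more weight, so an error far into the prediction horizon affects the score more than the same error in the first segment.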

Prediction Result
Accurate prediction of CPU and RM utilization in the cloud data center is vital to improving resource utilization. The mean squared error (MSE) was used to evaluate the accuracy of the prediction results and is defined as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \quad (21)$$
where $N$ is the prediction length, $y_i$ is the predicted value, and $\hat{y}_i$ is the real load value.
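As a small worked example of the MSE, the sketch below averages the squared differences between predicted and real load values. The numbers are invented for illustration.

```python
def mse(predicted, actual):
    """Mean squared error over a prediction of length N."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Squared errors are 0.01, 0.00 and 0.04, so the mean is 0.05 / 3.
error = mse([0.5, 0.7, 0.9], [0.4, 0.7, 1.1])
```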
After the simulation, two types of results were obtained: specific results and overall results. Based on these results, our proposed model provides an accurate prediction from the history values because of the simple, regular changes, as presented in Figure 7.

Distribution of Load traces on System
For load trace prediction, we used an HPC system. The time series chosen here are from the four most interesting host loads, axp0, axp7, sahara, and themis, taken from the load traces on Unix systems collected by [36,37]. These four load traces differ both in capture period and in machine type, as illustrated in Table 4 along with other parameters. Each load trace was scaled to the range [0.1, 0.9], and the normalization formula above was applied to each load trace.
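Scaling a load trace to the range [0.1, 0.9] can be sketched as a min-max mapping. The use of min-max scaling is an assumption, since only the target range is stated in this section, and the sample trace is made up.

```python
def scale_to_range(values, low=0.1, high=0.9):
    """Map values linearly so that the minimum becomes low and the maximum high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant trace")
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


# Hypothetical raw load values.
trace = [2.0, 4.0, 6.0, 10.0]
scaled = scale_to_range(trace)
```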

Fig. 9. Prediction result
We predict only the actual load value and compare our proposed model with the four models. The original hyper-parameters from the Google cluster dataset were applied to the different models to compare their generalization ability. In this study, the first 80% of each load trace was used as the training set and the rest as the testing set. The prediction results are shown in Figure 9; based on those results, the proposed model shows strong generalization in terms of time and execution. Figure 10 shows two load predictions: the actual load prediction results (left) and the load trace of axp7 on the Unix system (right). The noisy load trace in the Google cluster data fluctuates drastically. On those results, our proposed model provides barely satisfactory performance.
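The 80/20 split of each load trace can be sketched as a chronological cut, so that the test set always follows the training set in time. The function name and the toy trace are hypothetical.

```python
def chronological_split(trace, train_fraction=0.8):
    """Split a time-ordered trace into a leading training part and a trailing test part."""
    cut = int(len(trace) * train_fraction)
    return trace[:cut], trace[cut:]


trace = list(range(10))
train, test = chronological_split(trace)
```

The trace is not shuffled before splitting, which keeps future values out of the training data.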

Conclusion
An important feature of cloud computing is the ability to allocate resources and applications based on actual usage. However, resource allocation operations require start-up time, so the amount of resources needed in the future must be planned in advance. For that reason, in this paper we proposed a new approach for host load prediction in terms of CPU and RM utilization. The proposed hybrid CNN-LSTM model for multivariate workload prediction extracts the complex features of the CPU and VM usage components and then models the temporal information of irregular trends in the time series components; for that purpose we used a new activation function. We also evaluated the proposed model on two types of dataset, and based on the experimental results it achieved satisfactory performance on both. Our future work is to integrate the proposed method into the scheduling algorithm, which will improve resource utilization and lower the cost of the data center.

Compliance with ethical standards
Conflict of interest: All authors declare that they have no conflict of interest.
Funding: This paper was selected for a closed section and has no funding.
Ethical approval: This paper has not been submitted to any other journal and is the work of the authors.