An Exotic IWD-SVR Based Approach for Failure Prognostication in Cloud-Based Scientific Workflows

Scientific workflows have attracted growing attention in sophisticated large-scale scientific problem-solving environments. Because of the task-dependency nature of workflow-based applications, even a single task failure can drastically degrade the reliability of the overall system. Hence, proactive measures are vital in scientific workflows, rather than purely reactive fault-tolerance approaches. This work explores the design of an Exotic Intelligent Water Drops - Support Vector Regression (IWD-SVR) based approach for task failure prognostication, which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study are implemented through SVR-based machine learning, their prediction accuracy is optimized by the IWD algorithm, and various performance metrics are evaluated. The experimental results show that the proposed approach performs better than other existing techniques.


Introduction
Cloud is the buzzword in computational technologies that has brought a paradigm shift in the way data is stored and computing is performed. Cloud computing is a subscription-based service that delivers computation as a utility. The key characteristics of cloud computing are elastic, on-demand delivery of services with dynamically configurable resources and innovative pricing models. In the past decade the concept of the scientific workflow has emerged as a booming paradigm for modeling large-scale complex data in diverse computing domains. Scientific workflows are abstractions composed of activities and data with intricate dependencies, managed by complex engines. Fault tolerance is one of the critical factors in guaranteeing the reliability of cloud services. Fault tolerance is the ability of a system to keep working in the presence of one or more faults, albeit with degraded performance. Fault tolerance strategies in the cloud are classified into two types: reactive and proactive. Reactive fault tolerance strategies are techniques used to effectively troubleshoot a system upon the occurrence of failure(s); they include replication, task resubmission, checkpointing, and so on. The execution of such techniques inevitably degrades system performance. Proactive fault tolerance instead uses a prediction approach to anticipate failures in advance, thereby reducing system downtime. Failure prediction influences the delivery of on-demand services in the cloud, and accurate identification of failure-prone machines can help mitigate the impact of failures proactively. Hence this work emphasizes a mathematical prediction-based approach to accurately anticipate failure-prone hosts in the cloud resource pool, facilitating early migration of virtual machines to other active hosts without performance degradation.

Machine Learning Approaches for Failure prediction
Cagatay Catal (2011) suggested Naive Bayes as a robust machine learning algorithm for supervised software fault prediction [1]. Malhotra and Ankita Jain (2012) observed that random forest and bagging gave the best results for fault prediction [2], but they did not consider the effect of size or resource-related issues on fault proneness and its severity. Sadeka Islam, et al. (2012) showed that an Error Correction Neural Network (ECNN) demonstrated superior accuracy for forecasting resource utilization in the cloud [3]. Ilenia Fronza, et al. (2013) introduced a new approach for predicting failures based on Support Vector Machines (SVM) and Random Indexing (RI) [4]; their results showed that weighted SVMs improve sensitivity, and the approach proved reliable in predicting both failures and non-failures. Anju Bala and Inderveer Chana (2015) designed intelligent task failure prediction models implemented through machine learning approaches such as ANN, Naive Bayes, Random Forest, and LR, evaluated with standard performance metrics [5]; however, their approaches only predict whether a task fails and do not determine the rate of failure of each task. Upasna Kothari and Moe Momayez (2018) proposed the use of machine learning to predict the time of failure [6]; their study showed that machine learning produced predictions that were closer to the actual time of failure 86% of the time compared with traditional methods. P. Padmakumari, et al. [7] used an ensemble combining machine learning methods in a workflow environment to improve the accuracy and efficiency of the prediction model. To date, no work has combined SVR and IWD for failure prediction in scientific workflows.

Nature Inspired Support Vector Regression
Xiang-Ming, et al. [8] (2017) constructed a PV power prediction model based on EMD and ABC-SVM and showed that the method is superior to other approaches. The model proposed by Jui-Sheng Chou, et al. [9] (2015) is a hybrid of LS-SVR and SFA, in which SFA integrates an artificial firefly colony algorithm with chaotic maps, adaptive inertia weight, and Levy flight. It uses SFA to optimize the LS-SVR hyperparameters (i.e., the regularization parameter and the sigma parameter) and then uses LS-SVR for prediction.

Approaches Used for Task Failure Prognostication

Support Vector Machines for Regression
The concept of the Support Vector Machine (SVM) [10] was first introduced by Boser, Guyon and Vapnik. SVM is a supervised machine learning algorithm based on statistical learning theory that addresses both non-linear classification and regression problems. It performs well because it can handle non-linear classification efficiently by mapping samples from a low-dimensional input space to a high-dimensional feature space with a non-linear kernel function. The key parameter of an SVM is the type of kernel function used; kernel functions perform the mapping into the higher-dimensional feature space. Mercer's theorem is used for the construction of positive definite kernels for the SVM regressor.
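The paper does not include an implementation; as a minimal illustrative sketch (not the authors' code), epsilon-SVR with an RBF kernel can be fitted with scikit-learn. The toy sine-curve data here is purely for illustration; the paper's data comes from CloudSim/WorkflowSim task traces.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: a noisy sine curve (illustrative only).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon-SVR with an RBF kernel: samples are implicitly mapped into a
# high-dimensional feature space, as described above.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

pred = model.predict(X)
print("number of support vectors:", len(model.support_))
```

Only the samples lying on or outside the epsilon-tube become support vectors, which keeps the fitted model sparse.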
Support Vector Regression (SVR) has been applied to many practical problems, such as predicting the performance of compact heat exchangers (Peng and Ling, 2015) [11], modeling heat transfer in a thermosyphon reboiler (Zaidi, 2015) [12], predicting the sorption capacity of lead (II) ions (Nusrat Parveen et al., 2016) [13], predicting prices in a car leasing application (Mariana Listiani, 2009) [14], modeling and predicting Turkey's electricity consumption (Kadir Kavaklioglu, 2010) [15], and analyzing the prognosis of infants with congenital muscular torticollis (Suk-Tae Seo, 2010) [16]. The architecture of Support Vector Regression is given as follows:

Fig.1: Architecture of a regression machine constructed by the support vector algorithm [10]

Selection of Kernel Functions and Parameters
The key to SVR is the selection of the kernel function and the model parameters, both of which have a great influence on the prediction accuracy of the SVR. Some common kernel functions are:

Type of Kernel  | Formula
Linear          | K(x_i, x_j) = x_i · x_j
Polynomial      | K(x_i, x_j) = (x_i · x_j + c)^d
RBF (Gaussian)  | K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²)
Sigmoid         | K(x_i, x_j) = tanh(κ x_i · x_j + c)
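Kernel and parameter selection is commonly automated with a cross-validated grid search; the sketch below (an illustration, not the paper's method, and using made-up toy data) searches over the kernel type and the hyperparameters C and gamma, which are the quantities the IWD algorithm later optimizes in this paper.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(1)
X = rng.uniform(0, 5, (100, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

# Candidate kernels and SVR hyperparameters to search over.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1.0, 10.0],
    "gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(SVR(), param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print("best kernel:", search.best_params_["kernel"])
```

Grid search is exhaustive and expensive; nature-inspired optimizers such as IWD aim to find good (C, σ) values with far fewer model evaluations.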

Intelligent Water Drops Algorithm for Optimization
The behavior of natural systems is a rich source of inspiration for tackling numerous real-life optimization problems. The Intelligent Water Drops Algorithm (IWDA) [17] is one such method: the way water drops flowing along the twisting paths of natural rivers find an optimal path toward their destination can be formulated as an algorithm for solving optimization problems. The IWD algorithm is a population-based constructive optimization algorithm whose idea is based on the behavior of natural water drops in rivers.
In the IWD algorithm, each intelligent water drop is created with two properties: velocity of the water drop and amount of soil each water drop carries. These properties commence with an initial value and change during the flow of an IWD from a source to its destination. The trip of an IWD begins with an initial velocity and zero soil.
During its journey, the IWD flows in discrete steps from its current location to its next location. The velocity of the IWD increases in a manner non-linearly proportional to the inverse of the amount of soil between the two locations; therefore, a path with less soil lets the IWD move with greater velocity than a path with more soil. An IWD gathers soil during its journey, in an amount non-linearly proportional to the inverse of the time needed to travel from the current location to the next; this time is in turn inversely proportional to the velocity of the IWD.
The IWD uses a mechanism to select its path to the next location. According to this mechanism, it prefers to travel in the path with low soil so that it can move with greater velocities. The same mechanism can be adopted to find an optimal solution in several real-time problems. Some of the popular applications of IWDA are Vehicle routing problem, Multiple Knapsack problem, Job Shop scheduling, Travelling Salesman problem and so on.
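The path-selection mechanism described above can be sketched as a soil-dependent probability distribution over neighboring locations. This is an illustrative simplification (the function `path_probability` and the small epsilon term are assumptions for the sketch, not taken from the paper):

```python
def path_probability(soil, i, neighbors, eps=0.01):
    """Probability of an IWD moving from node i to each neighbor:
    inversely related to the soil on the connecting path, so low-soil
    paths are preferred."""
    def f(s):
        # eps avoids division by zero on soil-free paths
        return 1.0 / (eps + max(s, 0.0))
    total = sum(f(soil[(i, j)]) for j in neighbors)
    return {j: f(soil[(i, j)]) / total for j in neighbors}

# Hypothetical 3-node example: the low-soil path (0 -> 1) is preferred.
soil = {(0, 1): 1.0, (0, 2): 10.0}
probs = path_probability(soil, 0, [1, 2])
print(probs)
```

Because soil on well-travelled good paths is eroded over iterations, the probability mass gradually concentrates on the best route.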
The pseudo-code of an IWD-based algorithm can be outlined in eight stages:

The Proposed IWD-SVR Approach

The proposed model adopts the intelligent water drops algorithm to optimize the support vector regression parameters, which involves the following steps. The velocity of an IWD moving from location i to location j is updated as

vel^IWD(t + 1) = vel^IWD(t) + a_v / (b_v + c_v · soil²(i, j))

The soil carried by the IWD is updated as

soil^IWD = soil^IWD + Δsoil(i, j)

Δsoil(i, j) = a_s / (b_s + c_s · time(i, j ; vel^IWD(t + 1)))

where the travel time is

time(i, j ; vel^IWD(t + 1)) = HUD(i, j) / vel^IWD(t + 1)

and HUD(i, j) is the heuristic undesirability defined for the problem. The soil along the path is updated using

soil(i, j) = (1 − ρ_n) · soil(i, j) − ρ_n · Δsoil(i, j)

[Step 6.5] Update the soil along the path based on the current iteration-best solution:

soil(i, j) = (1 + ρ_IWD) · soil(i, j) − ρ_IWD · (1 / (N_IB − 1)) · soil^IWD_IB, for all (i, j) ∈ S_IB

where N_IB is the number of nodes in the iteration-best solution and S_IB is the iteration-best solution.
[Step 7] Update the global best solution based on the objective function.
[Step 8] Increment the iteration count and repeat the steps until the stopping criterion is met. After all IWD iterations complete, the N water drops yield N candidate parameter pairs (C, σ).
[Step 9] For the (C, σ) pair found by each IWD, train the SVR model on the training set as in Step 3.
[Step 10] Adopt the best parameter values for failure prognostication, yielding a high-accuracy prediction process.
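The IWD update rules above can be written directly as small functions. This is a simplified sketch: the constants a_v, b_v, c_v, a_s, b_s, c_s and ρ_n are problem-dependent, and the concrete values below are illustrative assumptions, not the paper's settings.

```python
# Illustrative IWD constants (assumed values, tuned per problem in practice).
A_V, B_V, C_V = 1.0, 0.01, 1.0   # velocity-update constants
A_S, B_S, C_S = 1.0, 0.01, 1.0   # soil-update constants
RHO_N = 0.9                      # local soil-update coefficient

def update_velocity(vel, soil_ij):
    # vel(t+1) = vel(t) + a_v / (b_v + c_v * soil(i,j)^2)
    return vel + A_V / (B_V + C_V * soil_ij ** 2)

def travel_time(hud_ij, vel_next):
    # time(i,j; vel(t+1)) = HUD(i,j) / vel(t+1)
    return hud_ij / vel_next

def delta_soil(hud_ij, vel_next):
    # soil picked up: inversely related to the travel time
    return A_S / (B_S + C_S * travel_time(hud_ij, vel_next))

def update_path_soil(soil_ij, dsoil):
    # soil(i,j) = (1 - rho_n) * soil(i,j) - rho_n * delta_soil(i,j)
    return (1 - RHO_N) * soil_ij - RHO_N * dsoil

# One step of an IWD crossing a path with soil 1.0 and heuristic cost 2.0.
vel = update_velocity(4.0, soil_ij=1.0)
ds = delta_soil(hud_ij=2.0, vel_next=vel)
print(vel, ds)
```

In the full IWD-SVR loop, each drop's constructed path encodes a candidate (C, σ) pair, its fitness is the SVR validation error, and the soil updates bias later drops toward better parameter regions.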

Evaluation Criteria
This work identifies the tasks that are likely to fail due to resource over-utilization. To assess its performance, data were collected from different scientific workflows with 25, 50 and 100 tasks at fixed intervals using the CloudSim [18] and WorkflowSim [19] tools. The task failure prediction accuracy has been evaluated using the following evaluation metrics. Pegasus [20] takes care of mapping abstract workflows to concrete workflows.

Sensitivity (Recall)
Sensitivity is a measure of the proportion of actual positive cases that are predicted as positive (true positives).

Sensitivity = (TP) / (TP + FN)

- TP (True Positive): tasks predicted as failures that actually failed.
- FN (False Negative): tasks predicted as non-failures that actually failed.

Specificity
Specificity is defined as the proportion of actual negatives that are predicted as negative.

Specificity = (TN) / (TN + FP)

- TN (True Negative): tasks predicted as non-failures that did not fail.
- FP (False Positive): tasks predicted as failures that did not fail.

Precision
Precision is defined as the number of true positives divided by the sum of true positives and false positives.

Precision = (TP) / (TP + FP)

F1 Score

The F1 Score is the harmonic mean of precision and recall:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

The range of the F1 Score is [0, 1]; the greater the F1 Score, the better the performance of the model.

Classification Accuracy
Classification Accuracy is the ratio of the number of correct predictions to the total number of input samples:

Classification Accuracy = (TP + TN) / (TP + TN + FP + FN)
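The five metrics above all derive from the four confusion-matrix counts; a small helper (an illustrative sketch with hypothetical counts, not the paper's results) makes the definitions concrete:

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the evaluation metrics used in this work from
    confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for 100 predicted tasks.
m = classification_metrics(tp=45, fn=5, tn=40, fp=10)
print(m)
```

With these counts, sensitivity is 0.90, specificity 0.80 and accuracy 0.85, illustrating how a model can score differently on each metric from the same predictions.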

Experimentation results and discussions
Our experimental setup has three steps:
- Data collection
- Evaluation of performance metrics using percentage splits
- Comparison of the proposed approach with existing models for various scientific workflows

Data Collection
CloudSim [18] and WorkflowSim [19] classes are used to collect the dataset for failure prognostication. The data were collected from different scientific workflows with 25, 50 and 100 tasks at fixed intervals. The attributes considered in the failure prediction process are listed below:
- The maximum dynamic threshold is set using utilization parameters such as CPU, Bandwidth, RAM, Memory, Disk and EET, based on the current values and historical data. If the current utilization value is greater than the maximum threshold value, the task is categorized as a failure.

Tid and VMid denote the ids of the failed task and its corresponding virtual machine, and Did denotes the id of the datacenter where the task failure occurs. The percentage split used for training and testing data is 66% and 34% respectively.
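The dynamic-threshold labeling rule described above can be sketched as a simple predicate. The function name, attribute keys, and threshold values here are hypothetical, for illustration only:

```python
def label_task(utilization, max_threshold):
    """Label a task 'failure' when any monitored utilization attribute
    exceeds its maximum dynamic threshold (the rule described above)."""
    exceeded = any(utilization[k] > max_threshold[k] for k in utilization)
    return "failure" if exceeded else "non-failure"

# Hypothetical task record: CPU utilization exceeds its threshold.
task = {"cpu": 0.95, "ram": 0.60, "bandwidth": 0.40}
thresholds = {"cpu": 0.90, "ram": 0.85, "bandwidth": 0.80}
print(label_task(task, thresholds))
```

These binary labels, together with the utilization attributes, form the supervised training set for the SVR-based predictor.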

Evaluation of Performance Metrics Using Percentage Split
The main contribution of this work is an SVR-IWD based approach whose failure prognostication accuracy is high compared with existing approaches. The evaluation metrics for the proposed approach are chosen as given by Aditya Mishra (2018) [21]. Figures 4, 5 and 6 illustrate the comparison of various performance metrics such as sensitivity, specificity, precision, recall and F1 score.

Comparison evaluation of various scientific workflows with existing approaches
To evaluate the algorithm, a library of realistic workflows used in the scientific community is employed: Montage (astronomy), Epigenomics (biology), LIGO (gravitational physics), SIPHT (biology) and CyberShake. These are available as abstract workflows in XML format (DAX) on the website [15]. Each workflow application is evaluated at four different sizes: small, medium, large and extra-large. Small workflows have 24 to 30 tasks, medium workflows have 45 to 50 tasks, large workflows have 100 tasks, and extra-large workflows have 995 to 1000 tasks.
Montage is an astronomy application used to create custom mosaics of the sky from a set of input images. The tasks of the Montage workflow are mostly I/O-sensitive and require little CPU processing capacity to execute. Epigenomics is used in the bioinformatics field, and its basic purpose is to automate the execution of various genome-sequencing operations. Most of the tasks of this workflow require high CPU capacity with low I/O utilization.

Fig.8: Structure of Epigenomics Workflow
The LIGO Inspiral Analysis workflow is used in physics to detect gravitational waves; it analyzes data from the coalescing of compact binary systems such as binary neutron stars and black holes. Most of its tasks are CPU-intensive and consume large amounts of memory. The CyberShake workflow characterizes earthquake hazards by generating synthetic seismograms; most of its tasks are CPU-intensive with high memory requirements. The SIPHT workflow, from a bioinformatics project at Harvard, automates the search for small untranslated RNAs (sRNAs) in bacterial replicons in the NCBI database; its jobs are the most computationally intensive and have relatively high CPU utilization. The proposed IWD-SVR based approach is evaluated using the scientific workflows Epigenomics, Montage, CyberShake, Inspiral and SIPHT. Figure 7 illustrates the accuracy comparison of the proposed approach with the existing Naive Bayes, Random Forest, Rule Based and Logistic Regression approaches. The overall accuracy of the proposed approach is 98.2%, which is 2.18% higher than that of the existing Naive Bayes approach.

Conclusion
Proactive fault tolerance mechanisms are preferred for scientific workflows in cloud environments because they can anticipate failures well in advance, leaving sufficient time for migration and mitigation. Proactive mechanisms rely mainly on historical data to predict future faults, and their efficiency depends on several factors, of which prediction accuracy plays the vital role. Hence the focus of this work is to enhance the prediction accuracy of the proactive fault tolerance mechanism. The proposed IWD-SVR based approach achieves a prediction accuracy nearly 2.18% higher than that of existing approaches.