AN IMPROVED MULTI-OBJECTIVE WORKFLOW SCHEDULING USING NSPSO WITH FUZZY RULES

Many scientific problems in various domains, from modelling the sky as mosaics to genome sequencing in biological applications, are modelled as workflows with a large number of interconnected tasks. Particle Swarm Optimization (PSO) based metaheuristics are currently used to address many optimization problems, as they are simple to implement and able to quickly produce optimal or sub-optimal solutions based on their learning capabilities. Even though many works on workflow scheduling are cited in the literature, most of the existing works focus on reducing the makespan alone, and energy efficiency is considered in only a few of them. Constraints on dynamic workload allocation are not introduced in the existing systems. Moreover, the optimization techniques used in the existing systems have improved the QoS with little scalability in the cloud environment, since they consider only the Infrastructure as a Service model. In this work, a new multi-objective optimization model called F-NSPSO, based on the NSPSO metaheuristic, is proposed. This method allows the user to choose a suitable configuration dynamically. An average energy reduction of more than 15% over simple DVFS was achieved by the proposed system for all types of workflow applications with different dimensions. Similarly, when compared to NSPSO, an energy reduction of at least 10% has been observed for F-NSPSO for all three types of workflow applications.


Introduction
Cloud computing has revolutionized the Information and Communication Technology (ICT) field by providing dynamic and highly scalable resources on demand to clients on a pay-per-use basis. It helps to reduce the high up-front investment cost for infrastructure, as well as maintenance and upgrade costs, by allowing an organization to either outsource its computational needs or build a private cloud data centre. Different fields of computing, such as grid, distributed cluster and cloud, aim at providing computational power as a utility to many end users. Many scientific problems in various domains, from modelling the sky as mosaics to genome sequencing in biological applications, are modelled as workflows with a large number of interconnected tasks. The scientific community is showing increasing interest in adopting the cloud platform for deploying workflow applications because of its attractive features such as dynamic resource provisioning, heterogeneous resources, pay-per-use pricing and flexible billing models. In a white paper, Delforge & Whitney (2014) reported that energy consumption by IT equipment is very high, contributing 40% of overall energy consumption. Another survey (Sareh, 2016) reports that cloud data centers contribute 3% of global energy consumption, and this share is expected to rise in the future. The major sources of energy consumption in cloud data centers are servers, networking devices, memory, storage devices and cooling equipment. Improper use of resources by applications also contributes to the high energy consumption of cloud data centers.
High energy consumption by data centers leads to high maintenance costs for cloud service providers; hence it is very important for them to minimize energy consumption while at the same time satisfying SLA parameters such as the deadline and budget constraints of users.

A. Features of Scientific Workflow
A scientific workflow application contains thousands of interconnected tasks with input and output data dependencies among them. These applications are mathematically modelled using Directed Acyclic Graphs (DAGs), with vertices representing the tasks of the application and edges depicting the dependencies between tasks. Figure 1.1a shows the DAG representation of an example workflow application. The numeric value associated with each vertex is the task execution time; the numeric value associated with an edge is the size of the data transfer between two tasks. The start and end of the graph are indicated by two special pseudo tasks, Tentry and Texit, which are inserted into the workflow.

Figure 1.1 (a) and (b): DAG and matrix representation of the example workflow application
The dependencies of the DAG are stored in data structures such as two-dimensional matrices, which capture both the dependencies and the data transfer sizes. Figure 1.1b shows the matrix representation of the example workflow application given in Figure 1.1a.
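As a minimal sketch of this representation, the adjacency matrix below encodes a small hypothetical five-task workflow (the runtimes and transfer sizes are illustrative, not taken from Figure 1.1):

```python
# Sketch: a workflow DAG as an adjacency matrix (hypothetical 5-task
# example; runtimes and transfer sizes are illustrative).
runtime = {0: 4, 1: 3, 2: 5, 3: 2, 4: 6}  # execution time per task

n = len(runtime)
# transfer[i][j] > 0 means task j depends on task i and receives that
# much data from it; 0 means no edge.
transfer = [[0] * n for _ in range(n)]
transfer[0][1] = 10   # T0 -> T1, 10 units of data
transfer[0][2] = 7    # T0 -> T2
transfer[1][3] = 4    # T1 -> T3
transfer[2][3] = 9    # T2 -> T3
transfer[3][4] = 5    # T3 -> T4 (exit-like sink)

def predecessors(task):
    """Tasks that must finish before `task` can start."""
    return [i for i in range(n) if transfer[i][task] > 0]

print(predecessors(3))  # tasks feeding data into T3 -> [1, 2]
```

The matrix form makes dependency queries a simple column scan, which is why it is a convenient structure for schedulers.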

II. Literature Survey
Efficient scheduling of workflows with advanced optimization techniques has been achieved by applying new methods based on Directed Acyclic Graphs (DAGs), which perform the scheduling operations in parallel on distributed systems. In general, resource management and scheduling for application execution on a cloud platform is very complex, and many authors have carried out research in this area. Figure 2.1 shows the Google Trends graph of research interest in the search term "Task scheduling in cloud computing". To perform efficient workflow scheduling, all resource and task combinations have to be checked, which results in a large solution search space. To satisfy both cloud provider and scientific user requirements, a high-performing multi-objective optimization algorithm that generates all the quality solutions is required.
As the set of quality solutions is huge for large-scale applications, an efficient fuzzy mechanism is needed that quickly analyses these solutions based on the user's requirements.

3. Background
In this section, the basic concepts of multi-objective optimization are presented. Metaheuristic techniques have become popular for solving many practical problems: these algorithms promise lower cost and are easy to understand and implement for many real-time applications with multi-objective constraints [13]. The traditional approach to multi-objective optimization converts the problem into a single-objective optimization; this is known as the apriori approach.
The weighted sum approach is one of the most popular apriori methods.
The advantage of the weighted sum approach is that it is simple and requires little calculation to obtain a solution. However, its main drawback is that it provides only a single solution, so no trade-off analysis can be performed. It is also very important to apply proper weight values based on the user preferences: a small deviation in the weight values will result in a different solution. To overcome these drawbacks, most multi-objective optimization problems are solved using a Pareto-optimal set, which is generated using the Pareto-dominance relation. The Pareto-dominance relation determines which solution is better. Instead of obtaining a single optimal solution, it emphasizes finding a set of alternatives with different trade-off values among the objectives. These solutions are called Pareto-optimal solutions or non-dominated solutions.
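The weight-sensitivity drawback can be seen in a small sketch (the candidate solutions and weights below are hypothetical):

```python
# Sketch of the weighted-sum (apriori) approach: two objectives are
# collapsed into one scalar. Candidate values are illustrative.
candidates = [
    {"energy": 120, "makespan": 30},
    {"energy": 100, "makespan": 40},
    {"energy": 140, "makespan": 25},
]

def weighted_cost(sol, w_energy, w_makespan):
    return w_energy * sol["energy"] + w_makespan * sol["makespan"]

# A small shift in the weights selects a different single solution,
# which is why trade-off analysis is not possible with this method.
best_a = min(candidates, key=lambda s: weighted_cost(s, 0.5, 0.5))
best_b = min(candidates, key=lambda s: weighted_cost(s, 0.3, 0.7))
print(best_a)  # {'energy': 100, 'makespan': 40}
print(best_b)  # {'energy': 120, 'makespan': 30}
```

Each run yields exactly one point; the Pareto approach described next keeps all non-dominated alternatives instead.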

Basic concepts of multi-objective optimization:
Many engineering problems currently use multi-objective optimization, which refers to optimization with more than one objective (criterion) function.

Pareto Optimal Solutions
The Pareto-optimal set is built on the notion of solution dominance in multi-objective optimization problems. A solution X dominates another solution Y if X is better than or equal to Y in all objectives and strictly better than Y in at least one objective.

Definition 2: Pareto-Optimal Set
Formally, a solution x* ∈ Ω is Pareto optimal if there does not exist another solution x ∈ Ω such that F(x) ≺ F(x*).
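The dominance relation underlying this definition can be sketched as a small predicate (minimization is assumed; the objective vectors are illustrative):

```python
# Sketch: Pareto dominance for minimization objectives.
def dominates(x, y):
    """True if x dominates y: x is no worse than y in every objective
    and strictly better in at least one (minimization assumed)."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))

# (energy, makespan) vectors
assert dominates((90, 20), (100, 25))      # better in both objectives
assert not dominates((90, 30), (100, 25))  # trade-off: neither dominates
assert not dominates((100, 25), (90, 30))
```

Solutions for which no dominating solution exists form the Pareto-optimal (non-dominated) set.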

Non-Dominated Sorting PSO
The main goal of a multi-objective optimization problem is to find as many solutions as possible that are close to the Pareto front. Figure 3.1 shows an example of the non-dominated sorting technique in which three fronts, namely the first, second and third fronts, are considered. In NSPSO, each particle is compared against the personal bests and offspring of all other particles in the entire population, providing an appropriate selection procedure that pushes the swarm particles towards the true Pareto-optimal front. This helps to identify a large number of non-dominated solutions.
In basic PSO, at each generation t, each particle is compared only with its own offspring, which leads to losing some important non-dominated solutions.
Let P1(t) and P2(t) represent two particles in the PSO algorithm at time instant t, and let F(P1(t)) and F(P2(t)) represent their fitness values. F(X1(t+1)) and F(X2(t+1)) represent the fitness values of their offspring at time instant t+1. In basic PSO, the comparison is performed only between a particle and its own offspring: F(P1(t)) is compared only with F(X1(t+1)), and similarly F(P2(t)) only with F(X2(t+1)). This results in losing some valuable non-dominated solutions. NSPSO overcomes this drawback by comparing each particle with all the offspring, giving a combined population of 2N candidates. The fast convergence of basic PSO also reduces the diversity of the swarm; in NSPSO, the diversity of the solutions is maintained using the concept of niching or crowding-distance calculation, and in the proposed system the niche count is used to maintain diversity in the population.
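The combined 2N comparison can be sketched as follows (the fitness vectors are hypothetical (energy, makespan) pairs under minimization):

```python
# Sketch: NSPSO-style selection over the combined parent+offspring set.
def dominates(x, y):
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))

def first_front(fitness):
    """Indices of non-dominated solutions (the first front)."""
    return [i for i, f in enumerate(fitness)
            if not any(dominates(g, f)
                       for j, g in enumerate(fitness) if j != i)]

# Parents and their offspring are ranked together (2N candidates), so a
# non-dominated parent is not lost just because its own offspring is worse.
parents   = [(100, 30), (90, 40)]
offspring = [(110, 35), (85, 45)]
combined = parents + offspring
print(first_front(combined))  # [0, 1, 3]
```

Note that particle 0 survives even though its offspring (index 2) is worse, which a parent-vs-own-offspring comparison would not guarantee.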

Workflow Energy Consumption
Energy consumption E(G) for running the workflow application G in a virtualized environment is directly proportional to the number of VMs required for running the application. The dynamic energy E_dynamic is consumed by the processor while executing tasks, and the energy consumed by the processor in the idle state over the period t_idle_time is

E_idle = P_idle × t_idle_time

The total energy consumption of the tasks on a VM depends on the energy consumption of the processor of the physical machine on which the VM is running:

E_total(VM) = E_dynamic + E_idle    (4.6)

The total energy consumption of the workflow application is the sum of the energy consumed by all m VMs:

E(G) = Σ_{i=1}^{m} E_total(VM_i)
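A minimal numeric sketch of this energy model (all power ratings and time values below are hypothetical, not measured):

```python
# Illustrative sketch of the per-VM energy model: dynamic energy while
# executing tasks plus idle energy, summed over all VMs in use.
def vm_energy(p_busy, t_busy, p_idle, t_idle):
    e_dynamic = p_busy * t_busy   # energy while executing tasks
    e_idle = p_idle * t_idle      # energy while the processor is idle
    return e_dynamic + e_idle     # E_total(VM) = E_dynamic + E_idle

# Total workflow energy = sum over all VMs used by the application.
vms = [
    (200.0, 50.0, 70.0, 10.0),   # (P_busy W, t_busy s, P_idle W, t_idle s)
    (180.0, 40.0, 65.0, 20.0),
]
e_total = sum(vm_energy(*vm) for vm in vms)
print(e_total)  # 19200.0 joules for this hypothetical pair of VMs
```

Reducing either the number of VMs or their idle periods lowers E(G), which is what the scheduler optimizes.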

4.1.2 Workflow Makespan Calculation
Another important performance measure for a scientific workflow application is the makespan, i.e., the total elapsed time from the start of the entry task to the completion of the exit task.
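The makespan can be computed by a forward pass over a topologically ordered DAG; the small workflow below (task runtimes and transfer times are hypothetical) is a sketch of that calculation:

```python
# Sketch: makespan as the finish time of the exit task, computed by a
# forward pass over a topologically ordered DAG (illustrative values).
runtime = {"Tentry": 0, "A": 4, "B": 3, "C": 5, "Texit": 0}
edges = {            # parent -> {child: data-transfer time}
    "Tentry": {"A": 0, "B": 0},
    "A": {"C": 2},
    "B": {"C": 1},
    "C": {"Texit": 0},
}
order = ["Tentry", "A", "B", "C", "Texit"]  # a valid topological order

finish = {}
for t in order:
    # A task is ready once every parent has finished and transferred data.
    ready = max((finish[p] + edges[p][t] for p in edges if t in edges[p]),
                default=0)
    finish[t] = ready + runtime[t]

makespan = finish["Texit"]
print(makespan)  # 11: Tentry -> A(4) -> transfer(2) -> C(5) -> Texit
```

The critical path (here Tentry, A, C, Texit) determines the makespan regardless of how fast the other branch completes.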

PARETO-BASED WORKFLOW SCHEDULING USING F-NSPSO
In the proposed work, the non-dominated solutions generated by the NSPSO workflow scheduling algorithm are used to construct the Pareto fronts, and fuzzy rules together with a newly proposed fitness function are used to perform resource and schedule optimization. The resulting Pareto fronts have been used in this work to provide an optimal cloud configuration to the scientific users. Moreover, the proposed F-NSPSO based multi-objective optimization is useful for workflow scheduling, and the proposed model has been tested by applying the algorithm to various scientific workflow applications of different sizes. In addition, the quality of the fuzzy rules and the Pareto front generated by the proposed F-NSPSO has been evaluated using different Pareto-front analysis metrics and the newly proposed fitness function.

Particle Encoding and Initialization
The following steps are used to represent particles in the proposed F-NSPSO based workflow scheduling problem.
Step 1: Apply topological sorting algorithm to maintain task dependencies of the workflows.
Step 2: Initialize the virtual machine array VM[Id, VM_Type, Available_time].
Step 3: Map the tasks in the sorted result to various instances to generate different particles.
Table 4.1 shows the sample particle encoding used in F-NSPSO for the tasks of the example workflow application shown in Figure 4.1 and the VM details given.
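The three steps above can be sketched as follows (the task graph and VM types are hypothetical, not the ones from Table 4.1):

```python
import random

# Sketch of the particle-encoding steps: topological sort, VM array,
# and a task-to-VM mapping as one particle.
deps = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}

def topo_sort(deps):
    """Step 1: order tasks so every task follows all of its parents."""
    order, done = [], set()
    while len(order) < len(deps):
        for t, parents in deps.items():
            if t not in done and all(p in done for p in parents):
                order.append(t)
                done.add(t)
    return order

# Step 2: VM array entries [Id, VM_Type, Available_time]
vms = [[0, "small", 0.0], [1, "medium", 0.0], [2, "large", 0.0]]

# Step 3: one particle = a mapping of each (ordered) task to a VM id
random.seed(1)
particle = {t: random.choice(vms)[0] for t in topo_sort(deps)}
print(particle)
```

Each particle in the swarm is one such mapping; the topological order guarantees that decoding a particle always yields a dependency-respecting schedule.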

F-NSPSO Algorithm
As the proposed system targets workflow applications, where the tasks have to be executed in a predefined order, the initial population is obtained by applying the list-based HEFT algorithm. Algorithm 4.1 shows the steps used to perform workflow scheduling with F-NSPSO, and the fitness function computation is given in Algorithm 4.2.
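As a sketch of the HEFT-style seeding, the upward rank of each task can be computed recursively; the small DAG below is hypothetical, and rank_u(t) = w(t) + max over children of (transfer cost + rank_u(child)):

```python
# Sketch: HEFT upward ranks used to order tasks for the initial
# population (runtimes and transfer costs are illustrative).
runtime = {"A": 4, "B": 3, "C": 5, "D": 2}
children = {"A": {"B": 1, "C": 2}, "B": {"D": 1}, "C": {"D": 0}, "D": {}}

def rank_u(t):
    """Length of the longest path from t to the exit, including t."""
    succ = children[t]
    return runtime[t] + max(
        (cost + rank_u(c) for c, cost in succ.items()), default=0)

# Tasks scheduled in decreasing rank order (a valid topological order).
priority = sorted(runtime, key=rank_u, reverse=True)
print(priority)  # ['A', 'C', 'B', 'D']
```

Because a task's rank always exceeds its children's ranks, sorting by decreasing rank respects every dependency, which is exactly what the particle encoding requires.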

Fuzzy Rules
Fuzzy logic helps to perform reasoning under uncertainty. It incorporates a basic rule-based approach on well-formed formulas, with rules represented as IF x AND y THEN z. A fuzzy inference system applies rules stored in a knowledge base against the facts present in the application to perform inference. Table 4.3 shows the fuzzy rules used in the proposed algorithm for finding the best solution from a set of workflow scheduling solutions with an optimal configuration of the cloud data centre in which the scheduling activities are carried out. The rules in Table 4.3 are interpreted as IF... THEN rules; for example, the last row of the table represents the rule:

IF MEMORY_SIZE IS HIGH AND ENERGY_CONSUMPTION IS LESS AND AVAILABLE_TIME IS SMALL THEN SOLUTION_TYPE IS EXCELLENT
Similarly, the first row of the Table 4.3 represents the rule:

IF MEMORY_SIZE IS LOW AND ENERGY_CONSUMPTION IS HIGH AND AVAILABLE_TIME IS LARGE THEN SOLUTION_TYPE IS POOR
In this way, the fuzzy inference system developed in this work applies the fuzzy rules using a forward-chaining inference mechanism to perform deductive inference, in order to make efficient decisions on configurations and scheduling methods. For the MEMORY_SIZE variable, Low indicates a memory size of up to 32 GB.
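A simplified sketch of this rule table as a forward-chaining lookup is shown below; the 32 GB boundary for Low memory follows the text, while the crisp (rather than graded) matching and the AVERAGE default are simplifying assumptions:

```python
# Sketch of the IF-THEN rule table as a forward-chaining lookup.
# Only two rules from the text are encoded; unmatched facts fall back
# to a hypothetical AVERAGE class.
def memory_level(gb):
    return "LOW" if gb <= 32 else "HIGH"   # 32 GB boundary from the text

rules = {
    # (MEMORY_SIZE, ENERGY_CONSUMPTION, AVAILABLE_TIME) -> SOLUTION_TYPE
    ("HIGH", "LESS", "SMALL"): "EXCELLENT",
    ("LOW", "HIGH", "LARGE"): "POOR",
}

def classify(mem_gb, energy, avail_time):
    fact = (memory_level(mem_gb), energy, avail_time)
    return rules.get(fact, "AVERAGE")  # default when no rule fires

print(classify(64, "LESS", "SMALL"))  # EXCELLENT (last rule in the table)
print(classify(16, "HIGH", "LARGE"))  # POOR (first rule in the table)
```

A full fuzzy system would assign graded memberships to each linguistic term and aggregate fired rules; this crisp lookup only illustrates the rule structure.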

Results and Discussion
In this section, the parameter settings used for implementing the proposed F-NSPSO and the set of real-world workflow applications used in the experiments are presented. Table 5.1 shows the parameters used in the experiments. An average energy reduction of more than 15% over simple DVFS was achieved by the proposed system for all types of workflow applications with different dimensions. Even though simple DVFS is an efficient energy saving mechanism, the proposed system performs better because it explores all the combinations of resources that can exist in the dynamic cloud environment. Similarly, when compared to NSPSO, an energy reduction of at least 10% has been observed for F-NSPSO for all three types of workflow applications.

Energy Consumption analysis for Cybershake Workflow with F-NSPSO
From the above performance analysis, it can be observed that the energy consumption of the proposed F-NSPSO is lower than that of the existing task scheduling algorithms DVFS and NSPSO. This is due to the effective optimization process carried out by incorporating an appropriate fuzzy membership function.

6. Conclusion
In this work, a Pareto-based solution for workflow scheduling using multi-objective optimization based on Fuzzy-NSPSO has been developed. The algorithm uses a fitness function to minimize the energy consumption, makespan and paid idle time of the resources. For large-scale scientific applications, the number of non-dominated solutions will be large; for deciding the quality of a solution, memory utilization is also considered as an additional criterion.