A Multi-Objective Cloud Workflow Scheduling Optimization Based on Evolutionary Multi-objective Algorithm with Decomposition

Abstract: In the cloud computing environment, cost-effective workflow task scheduling is a key problem that cloud service providers need to solve. However, previous scheduling methods often consider only one-sided demands, such as minimizing running time or running cost. In this paper, a cloud workflow scheduling model with two objectives, minimizing completion time and minimizing execution cost, is established, and an MOEA/D algorithm based on weight vector adjustment and local search is proposed and applied to solve the model. First, a weight vector adjustment method is employed to obtain more evenly distributed solutions. Second, to speed up convergence, local search operators are added to the evolutionary solution process. The resulting algorithm is an improved multi-objective optimization algorithm for the cloud workflow scheduling model based on time and execution cost. Experiments show that the MOEA/D algorithm based on local search and weight vector adjustment obtains more evenly distributed solutions than the MOEA/D and NSGA-II algorithms while converging faster, which provides decision support for cloud workflow scheduling decision-makers.


Introduction
In astronomy, geography, bioinformatics and physics, workflows are often used to model and execute large-scale complex problems [1]-[2]. Workflows in scientific research can support large-scale and complex scientific processes, such as simulating experiments, falsifying scientific hypotheses and visualizing scientific data [3]-[5]. Workflows provide an effective way to process and extract information from the growing mass of data. Because a workflow is often composed of many tasks with complex control and data dependencies, its execution usually involves invoking many different distributed computing services for intensive data analysis and collaborative knowledge discovery. Workflows can make full use of the characteristics of the cloud computing environment, such as resource elasticity, scalability and pay-on-demand pricing, which enable users to deploy large-scale and complex applications at a very low cost and to adjust the resource configuration dynamically at different stages of the application. Therefore, workflow tasks are well suited to running in the cloud computing environment.
The cloud workflow scheduling algorithm is a key factor in enabling workflows to use the characteristics of the cloud computing environment effectively. The scheduling algorithm is responsible for assigning workflow tasks to a group of computing resources while preserving the data dependencies between tasks. Cloud workflow scheduling is a typical NP-hard problem [6], so no algorithm is known that finds its optimal solution in polynomial time. Traditional scheduling algorithms, such as first come first served and priority ranking, produce only one solution, so users cannot choose among alternatives according to their own preferences. In addition, many traditional scheduling algorithms seldom consider the multi-objective nature of workflow scheduling in the cloud computing environment, for example that users want to minimize cost while also minimizing completion time.
Multi-objective optimization makes it possible for users to choose among multiple optimized solutions according to their preferences. In multi-objective cloud workflow scheduling, no single decision optimizes all objectives at the same time; instead, a set of compromise Pareto decisions over the conflicting objectives can be obtained [7]. The true Pareto decision set of cloud workflow scheduling is difficult to obtain, and obtaining it is often unnecessary. Generally, a group of approximately Pareto-optimal decisions that are uniformly distributed in the objective space is sought. Evolutionary algorithms can solve such complex problems effectively by borrowing the evolutionary operations of natural organisms, such as heredity and mutation [8]-[9]. Based on the above analysis, cloud workflow scheduling based on evolutionary multi-objective optimization has important value and significance. This paper establishes a multi-objective optimization model of cloud workflow scheduling based on completion time and execution cost, and then proposes an improved MOEA/D algorithm based on weight vector adjustment and local search. Compared with the MOEA/D and NSGA-II algorithms, the proposed algorithm obtains a group of uniformly distributed Pareto optimal solutions while converging faster, which can provide decision support for the cloud workflow scheduling problem.
The rest of this paper is organized as follows. Section 2 reviews the literature on the cloud workflow scheduling problem and multi-objective algorithms. Section 3 analyzes the goal conflict of cloud workflow scheduling and establishes the workflow scheduling model in the cloud environment. Section 4 designs the MOEA/D algorithm based on local search and weight vector adjustment and applies it to the cloud workflow scheduling model based on completion time and execution cost. Section 5 summarizes the paper and discusses future work.
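The notion of a compromise Pareto decision set can be illustrated with a small sketch. It assumes that every scheduling scheme is summarized by a hypothetical pair (completion time, execution cost), both to be minimized; the sample values are invented for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (completion time, execution cost), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical schedules: (time, cost)
schedules = [(10.0, 8.0), (12.0, 5.0), (11.0, 9.0), (15.0, 4.0), (12.0, 6.0)]
front = pareto_front(schedules)
print(front)
```

The remaining points trade time against cost, so a decision-maker can pick one according to preference rather than being handed a single schedule.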

Literature Review
Cloud workflow scheduling
Cloud workflow scheduling maps each task in the workflow's task set to a computing resource in the set of resources that run the tasks. Once a task is successfully executed, the results are returned to users through the Internet. According to the number of optimization objectives, cloud workflow scheduling can be divided into single-objective and multi-objective cloud workflow scheduling.
Single-objective research on cloud workflow scheduling mainly targets one indicator, such as cost or time. Wu et al. [10] adopted a task-level scheduling method based on a market model to minimize the overall operation cost while meeting the quality-of-service constraints of cloud workflow tasks. Liu et al. [11] used a co-evolutionary genetic algorithm to reduce the running cost and running time while meeting the user deadline. Tirapat et al. [12] used a hybrid of genetic algorithm and particle swarm optimization to minimize the total cost, including execution cost and data transmission cost, under a limit on task completion time. Mosleh et al. [13] calculated the completion time of data access from the network service time and the arrival rate of network input and output requests under a deadline limit, and then analyzed and allocated the cost of the data path according to task priority, so as to save execution cost and data transmission cost. Sossa et al. [14] used deadline-constrained meta-heuristics to schedule scientific workflow applications on infrastructure-as-a-service clouds. Calheiros et al. [15] adopted an optimal scheduling method in the hybrid cloud environment to optimize the execution cost while meeting the execution time constraint. Feller et al. [16] modeled the load migration problem as a multidimensional bin packing problem and solved it with a method based on the ant colony algorithm. Lin et al. [17] focused on execution time, which they reduced by allocating resources flexibly.
Single-objective optimization of cloud workflow scheduling can no longer meet the growing needs of users, who may want, for example, to minimize both the running time and the execution cost. As a result, multi-objective optimization of workflow scheduling in the cloud computing environment has attracted increasing attention from researchers. Barrionuevo et al. [18] proposed a heuristic list scheduling algorithm based on Pareto dominance, which optimizes the task completion time and the user cost of task execution at the same time and provides users with a set of alternative optimal scheduling schemes. Zhu et al. [19] used an evolutionary multi-objective optimization algorithm to solve the cloud workflow scheduling problem of optimizing task completion time and task execution cost at the infrastructure-as-a-service level, and proposed new schemes for encoding, population initialization, fitness evaluation and genetic operators for the problem. Wang et al. [20] proposed a scheduling algorithm that optimizes the task completion time and the reliability of cloud workflow applications at the same time. Fard et al. [21] proposed a multi-objective workflow scheduling algorithm for heterogeneous systems that considers task completion time, user cost of task execution, reliability and energy consumption, and gives an effective scheduling scheme that meets user-related constraints. Wu et al. [22] considered deadline and budget constraints, optimized energy consumption and reliability simultaneously, and used a general list scheduling algorithm with a tuning mechanism to solve the multi-constraint multi-objective cloud workflow scheduling problem. Padmaveni et al. [23] used a memetic algorithm to optimize task completion time and task execution cost simultaneously; compared with the genetic algorithm, this algorithm produces better workflow scheduling schemes. Pandey et al. [24] used a particle swarm optimization algorithm to optimize the execution cost and data transmission cost. Saurabh et al. [25] used a multi-objective optimization method based on cat swarm optimization to schedule workflow tasks in the cloud computing environment, with the objectives of minimizing the task execution cost, task running time and CPU idle time. Yang et al. [26] proposed a novel intermediate data storage strategy to reduce the cost of scientific workflow execution and data transmission; this strategy automatically and dynamically selects appropriate intermediate data sets to store or delete in the cloud environment. Duan et al. [27] proposed a communication- and storage-aware method to optimize both task execution time and execution cost under bandwidth and storage constraints.

Multi-objective optimization algorithm
Based on the characteristics of population evolution in nature, the evolutionary algorithm (EA) achieves a diverse and global search during evolution. It is not limited by the shape or continuity of the search space, and has good robustness and versatility, so it is widely used to solve complex NP-hard problems. Moreover, an EA can obtain a set of optimal solutions in one run, from which users can choose according to their own preferences. In multi-objective algorithm research, Schaffer proposed the vector evaluated genetic algorithm (VEGA) [28]; Goldberg proposed selecting the non-dominated solution set by the Pareto dominance relation and maintaining its diversity by niching [29]. Fonseca and Fleming proposed the first MOEA based on the Pareto dominance relation [30], namely the multi-objective genetic algorithm (MOGA). Among the early algorithms, the non-dominated sorting genetic algorithm (NSGA) [31] proposed by Srinivas and Deb in 1994 and the niched Pareto genetic algorithm (NPGA) [32] proposed by Horn in the same year are the most representative. After that, Zitzler and Thiele proposed the strength Pareto evolutionary algorithm (SPEA) [33], and elite individuals became a focus of evolutionary multi-objective algorithm design. For example, Knowles et al. proposed the Pareto archived evolution strategy (PAES) [35] in 1999, Zitzler proposed the second-generation strength Pareto evolutionary algorithm (SPEA2) [34] in 2002, and Deb et al. [36] proposed the second-generation non-dominated sorting genetic algorithm (NSGA-II) in 2002.
In SPEA, individual fitness is determined by the Pareto strength value, and population diversity is maintained by clustering; SPEA2 adopts enhanced fitness assignment and clustering methods; PAES uses grid-based methods for individual selection and diversity preservation; NSGA-II selects individuals based on fast non-dominated sorting and crowding distance, which reduces the computational complexity and effectively improves the convergence of the solution.
In addition, new concepts such as hybrid methods, coevolution, parallel methods and quantum evolution have been proposed one after another. Other nature-inspired algorithms, such as particle swarm optimization, ant colony optimization and artificial immune algorithms, have also been introduced into multi-objective algorithm design. The individual selection mechanism is no longer limited to the concept of Pareto dominance. For example, in 2004 Zitzler proposed the indicator-based evolutionary algorithm (IBEA) [37], which uses a binary performance indicator, such as one based on HV, as the criterion for evaluating individuals. In 2007, Zhang et al. [38] proposed a multi-objective evolutionary algorithm based on decomposition (MOEA/D), which decomposes the multi-objective optimization problem into a set of scalar subproblems and then uses an evolutionary algorithm to solve the subproblems [39]-[42].
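To make the decomposition idea concrete, the sketch below uses the weighted Tchebycheff function, one of the standard scalarizing functions used with MOEA/D. Whether this paper uses Tchebycheff or another decomposition is not stated in the text, so the choice is an assumption, as are the helper names `uniform_weights` and `tchebycheff` and the sample objective vectors.

```python
from typing import List, Sequence


def uniform_weights(n: int) -> List[List[float]]:
    """n evenly spaced weight vectors for two objectives (requires n >= 2)."""
    return [[i / (n - 1), 1.0 - i / (n - 1)] for i in range(n)]


def tchebycheff(f: Sequence[float], weight: Sequence[float],
                z_star: Sequence[float]) -> float:
    """Weighted Tchebycheff value max_j w_j * |f_j - z*_j|; smaller is better."""
    return max(w * abs(fj - zj) for fj, w, zj in zip(f, weight, z_star))


weights = uniform_weights(5)   # one scalar subproblem per weight vector
z_star = [0.0, 0.0]            # assumed ideal point
candidates = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]]  # hypothetical (time, cost) vectors

# Each subproblem prefers the candidate with the smallest scalarized value.
best_per_subproblem = [
    min(candidates, key=lambda f: tchebycheff(f, w, z_star)) for w in weights
]
for w, best in zip(weights, best_per_subproblem):
    print(w, best)
```

Different weight vectors select different trade-off solutions, which is how a set of scalar subproblems can cover the whole front in a single run.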

Cloud workflow scheduling problem model
Usually, the workflow of a complex problem is modeled as a directed acyclic graph (DAG), in which each vertex represents a computing task of the workflow and each edge represents a data and execution dependency between tasks. Given a workflow with $n$ tasks and $m$ virtual machines provided by a cloud service provider, where each virtual machine is one of $k$ common types, the number of candidate scheduling schemes grows combinatorially with $n$, $m$ and $k$. The scheduling algorithm must not only meet the task constraints of the workflow but also consider the user's quality-of-service requirements, which often conflict, such as completion time and cost. This brings great challenges to cloud workflow scheduling, and multi-objective optimization of cloud workflow scheduling is the focus of this paper.

Task model
The task model of a cloud workflow in the design phase can be modeled as a directed acyclic graph. To describe the task model in more detail, a small workflow with tasks $S_1$ to $S_5$ is used as an example, as shown in Fig. 1. $S_1$ sends data of volumes $W_{1,2}$, $W_{1,3}$, $W_{1,4}$ to the subsequent tasks $S_2$, $S_3$, $S_4$, which start execution after obtaining the data. After their execution is completed, data of volumes $W_{2,5}$, $W_{3,5}$, $W_{4,5}$ are sent to task $S_5$, which can be executed only when it has all the required data.
Fig. 1 Cloud workflow task model illustration

Resource model
In order to describe the resource model in the form of a directed acyclic graph more specifically, a virtual machine set containing four virtual machines is used as an example. In $G = (V, E)$, a reasonable scheduling plan always maps each task in the cloud workflow to a suitable virtual machine. In the model adopted in this article, the total number of tasks is not greater than the total number of virtual machines, and it is assumed that each workflow task can be allocated to only one virtual machine resource at a time. The parameters of the above graph mapping model are defined as follows:

Scheduling goals
Service-based cloud computing must improve the user experience, and the execution time (ET) is the most influential factor. Commercial cloud computing must also consider the user's economic cost, of which the execution cost (EC) is the main component and therefore another goal that users care about. In summary, ET and EC are selected as the scheduling goals, and the cloud workflow scheduling model based on multi-objective optimization is established as formula (1).
The population consists of $N$ points $x_1, x_2, \ldots, x_N$, where $x_i$ is the current optimal solution of subproblem $i$.
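As a rough illustration of how the two goals could be evaluated for one candidate schedule, the sketch below computes a completion time and an execution cost for the workflow with tasks S1 to S5 described in the task model. It is not formula (1). The workloads, data volumes, virtual machine speeds, prices, bandwidth and the rule that tasks on the same virtual machine run one after another are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def evaluate_schedule(
    order: List[str],
    work: Dict[str, float],
    edges: Dict[Tuple[str, str], float],
    vm_of: Dict[str, str],
    speed: Dict[str, float],
    price: Dict[str, float],
    bandwidth: float,
) -> Tuple[float, float]:
    """Return (completion time, execution cost) of one task-to-VM mapping."""
    finish: Dict[str, float] = {}
    vm_free: Dict[str, float] = {}  # time at which each VM becomes idle
    cost = 0.0
    for task in order:  # order must be a topological order of the DAG
        vm = vm_of[task]
        ready = 0.0
        for (pred, succ), data in edges.items():
            if succ == task:
                # Data moved between different VMs pays a transfer delay.
                transfer = 0.0 if vm_of[pred] == vm else data / bandwidth
                ready = max(ready, finish[pred] + transfer)
        start = max(ready, vm_free.get(vm, 0.0))
        run = work[task] / speed[vm]
        finish[task] = start + run
        vm_free[vm] = finish[task]
        cost += run * price[vm]  # pay only for the time the task runs
    return max(finish.values()), cost


# Hypothetical workloads, data volumes and VM parameters.
work = {"S1": 4.0, "S2": 6.0, "S3": 8.0, "S4": 6.0, "S5": 4.0}
edges = {("S1", "S2"): 2.0, ("S1", "S3"): 2.0, ("S1", "S4"): 2.0,
         ("S2", "S5"): 2.0, ("S3", "S5"): 2.0, ("S4", "S5"): 2.0}
speed = {"fast": 2.0, "slow": 1.0}
price = {"fast": 3.0, "slow": 1.0}
vm_of = {"S1": "fast", "S2": "slow", "S3": "fast", "S4": "slow", "S5": "fast"}

et, ec = evaluate_schedule(["S1", "S2", "S3", "S4", "S5"], work, edges,
                           vm_of, speed, price, bandwidth=2.0)
print(et, ec)
```

Moving more tasks to the fast virtual machine shortens the completion time but raises the cost, which is the conflict between the two goals.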

Adjustment of weight vector
To obtain Pareto optimal solutions that are uniformly distributed on the Pareto front (PF), a direct method is to delete solutions in crowded regions and add solutions to sparse regions. However, when the PF is discontinuous, identifying the discontinuous regions is a challenge. Subproblems that correspond to a discontinuous region waste computing resources, because there is no Pareto optimal solution in that region. This paper uses an external elite population to guide the change of subproblems, that is, to add subproblems to truly sparse regions rather than discontinuous regions, and to delete subproblems from dense regions. When an elite individual is located in a sparse region, it is introduced into the evolutionary population, and a new weight vector is generated and added for the corresponding subproblem. This elite population strategy helps to add subproblems to the truly sparse regions, because when the evolutionary population evolves to a certain extent, the number of subproblems increases. In addition, this paper adopts a strategy based on adaptive weight vector adjustment to obtain uniformly distributed Pareto optimal solutions. The adjustment strategy removes redundant subproblem solutions, which helps improve the computational efficiency of the algorithm.
To evaluate the sparse and crowded areas in the population, this paper adopts the crowding evaluation method based on k nearest neighbors proposed by Deb. With this method, the sparseness of individuals in the population can be evaluated by formula (4-3). The specific adjustment algorithms are as follows.
evol_pop': the evolutionary population after deleting the subproblems in crowded areas.
Step 1 Update the external population EP with the current evolutionary population: for each individual in the evolutionary population, if ...
Step 2 Calculate the sparseness of each individual in the evolutionary population according to formula (4-3).
Step 3 Delete the crowded subproblem.
Step 4 Termination condition: if the number of deleted subproblems has not reached the required number, delete the individual with the smallest sparseness and go to Step 2; otherwise, output the remaining population as evol_pop', the evolutionary population after deleting the crowded subproblems.

Step 3 Add a subproblem to the sparse area.
Step 4 Termination condition: if the number of inserted subproblems reaches nus, output the current population as evol_pop''; otherwise go to Step 2.
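The deletion loop can be sketched as follows. Formula (4-3) is not reproduced in the text, so the sparseness measure used here, the product of the Euclidean distances from an individual to its k nearest neighbors in objective space, is an assumption, as are the function names and the sample population.

```python
import math
from typing import List, Sequence


def sparseness(j: int, pop: List[Sequence[float]], k: int) -> float:
    """Product of the distances from individual j to its k nearest neighbors."""
    dists = sorted(math.dist(pop[j], pop[i]) for i in range(len(pop)) if i != j)
    return math.prod(dists[:k])


def delete_crowded(pop: List[Sequence[float]], n_delete: int,
                   k: int = 2) -> List[Sequence[float]]:
    """Repeatedly remove the individual with the smallest sparseness."""
    remaining = list(pop)
    for _ in range(n_delete):
        # Sparseness is recomputed after every deletion, mirroring "go to Step 2".
        worst = min(range(len(remaining)),
                    key=lambda j: sparseness(j, remaining, k))
        remaining.pop(worst)
    return remaining


# Hypothetical objective vectors (time, cost); two of them nearly coincide.
population = [(0.0, 10.0), (1.0, 9.0), (1.1, 8.9), (5.0, 5.0), (10.0, 0.0)]
thinned = delete_crowded(population, n_delete=1)
print(thinned)
```

Recomputing the sparseness after each removal avoids deleting both members of a close pair when deleting one of them already relieves the crowding.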

Local search
In order to obtain better convergence, this paper introduces a local search method into the MOEA/D framework and uses a three-point quadratic interpolation approximation as the local search operator. This method is simple to calculate and well suited to local search. The local search algorithm is as follows:
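A minimal sketch of a three-point quadratic interpolation step is given here for orientation. It uses the common coordinate-wise form of the approximation, in which a parabola is fitted through three points and their scalar objective values and its stationary point becomes the trial solution. The function name, the fallback to the best of the three points when the denominator vanishes, and the test function are assumptions; the paper's own Algorithm 4.3 may differ.

```python
from typing import Callable, List, Sequence


def quadratic_interpolation(
    xa: Sequence[float],
    xb: Sequence[float],
    xc: Sequence[float],
    f: Callable[[Sequence[float]], float],
    eps: float = 1e-12,
) -> List[float]:
    """Coordinate-wise stationary point of the parabola through three points."""
    fa, fb, fc = f(xa), f(xb), f(xc)
    # Fallback coordinates come from the best of the three input points.
    best = min(((fa, xa), (fb, xb), (fc, xc)), key=lambda pair: pair[0])[1]
    new_point = []
    for i, (a, b, c) in enumerate(zip(xa, xb, xc)):
        denom = (b - c) * fa + (c - a) * fb + (a - b) * fc
        if abs(denom) < eps:
            new_point.append(best[i])  # degenerate fit: keep the best coordinate
        else:
            numer = ((b * b - c * c) * fa + (c * c - a * a) * fb
                     + (a * a - b * b) * fc)
            new_point.append(0.5 * numer / denom)
    return new_point


# One-dimensional check on a quadratic with its minimum at x = 3.
f = lambda x: (x[0] - 3.0) ** 2
trial = quadratic_interpolation([0.0], [1.0], [5.0], f)
print(trial)
```

For a discrete task-to-virtual-machine encoding, the trial point would still need to be rounded or repaired into a valid schedule and accepted only if it improves the subproblem value.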

MOEA/D algorithm based on local search and weight vector adjustment
Based on the above weight vector adjustment strategy and local search method, this paper proposes an improved MOEA/D algorithm, denoted LS-MOEA/D. The solution process is as follows:
Step 1 Initialization
Step 1.1 Set $EP = \emptyset$, $gen = 0$.
Step 1.2 Calculate the Euclidean distance between any two weight vectors, and for each weight vector select the nearest weight vectors as its neighbors. Let $B(i) = \{i_1, i_2, \ldots, i_T\}$, $i = 1, 2, \ldots, N$, where $\lambda^{i_1}, \ldots, \lambda^{i_T}$ are the $T$ weight vectors nearest to $\lambda^i$.
Step 1.3 Initial population ...
Step 2.2 For the newly generated $y$, a problem-related improvement strategy is used to generate $y'$.
Step 2.3 If ...
Step 2.6 Local search: Algorithm 4.3 is used for local search.
Step 3 Weight vector adjustment: if $gen \ge evolrate \times G_{max}$ and $gen \bmod wag = 0$, adjust the weight vector:
Step 3.1 For the newly generated population, the non-dominated sorting method based on proximity distance is used to update the EP.
Step 3.2 Algorithm 4.1 is used to delete the subproblems of crowded areas.
Step 3.3 Algorithm 4.2 is used to add subproblems to sparse regions.
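Two pieces of the procedure above lend themselves to a short sketch: the neighborhood construction of Step 1.2 and the trigger condition of Step 3. The function names are hypothetical, and the comparison in the trigger is taken to be "greater than or equal", which is an assumption because the operator is not legible in the source.

```python
import math
from typing import List, Sequence


def neighborhoods(weights: List[Sequence[float]], T: int) -> List[List[int]]:
    """B(i): indices of the T weight vectors closest to weights[i]."""
    result = []
    for wi in weights:
        ranked = sorted(range(len(weights)),
                        key=lambda j: math.dist(wi, weights[j]))
        result.append(ranked[:T])  # includes i itself, at distance 0
    return result


def should_adjust(gen: int, max_gen: int, evol_rate: float, wag: int) -> bool:
    """Adjust weights only late in the run and only every wag generations."""
    return gen >= evol_rate * max_gen and gen % wag == 0


weights = [[0.0, 1.0], [0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]
B = neighborhoods(weights, T=3)
print(B)
print([g for g in range(0, 101, 5) if should_adjust(g, 100, 0.5, 10)])
```

Restricting mating and replacement to B(i) keeps each subproblem cooperating with similar subproblems, while delaying weight adjustment lets the population approach the front before its distribution is corrected.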

Experimental results
Each of the three algorithms was run independently 30 times on 12 workflows. Table 4.2 gives the statistics of the last-generation PF, including the mean and variance of the HV index, and ranks the three algorithms by mean HV; the larger the HV index, the higher the ranking. To test whether LS-MOEA/D is superior to MOEA/D and NSGA-II, a t-test at a significance level of 0.05 is performed on the HV data of the 30 runs. If LS-MOEA/D is significantly better than the compared MOEA/D or NSGA-II at the 0.05 level, this is indicated by a '+' sign; if it is significantly worse, by a '-' sign; and if the difference is not significant, by an '=' sign. According to the final HV statistics, the proposed LS-MOEA/D is superior to NSGA-II and MOEA/D for scheduling based on time and execution cost: its mean HV is the best and its HV variance is small, which shows that the proposed algorithm is relatively stable.
For the scheduling of the 12 real workflows based on time and execution cost, the average HV trend over 30 independent runs is shown in Fig. 4. The HV trend shows that the proposed LS-MOEA/D converges faster for most of the 12 workflows, and this trend becomes more obvious as the problem scale increases. This indicates that the local search operator used in LS-MOEA/D accelerates convergence on the cloud workflow scheduling problem based on time and execution cost. Moreover, at 0.5 times the maximum number of function evaluations, LS-MOEA/D begins to use the weight vector adjustment method to adjust the population distribution, and from that point the HV of the proposed method improves further. The POF of the last-generation population in Fig. 5 shows that, compared with the other two methods, the proposed method has a better distribution for most of the 12 workflows, which indicates that the weight vector adjustment method improves the distribution of solutions to the cloud workflow scheduling problem. To sum up, LS-MOEA/D can quickly and effectively obtain a group of evenly distributed decisions when solving the cloud workflow scheduling problem based on time and execution cost. For cloud workflow scheduling with more tasks, the proposed algorithm obtains scheduling schemes with less time and lower execution cost within the same running time, and can provide good decision support for cloud workflow scheduling based on time and execution cost.
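For two minimized objectives, the HV index can be computed with a simple sweep, as in the sketch below. The reference point and the sample front are hypothetical, and the paper's normalization and reference point are not given in the text, so this shows only the general computation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, cost), both minimized


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_cost = ref[1]
    for time, cost in pts:  # ascending time
        if cost < best_cost:  # dominated points add nothing and are skipped
            hv += (ref[0] - time) * (best_cost - cost)
            best_cost = cost
    return hv


front = [(10.0, 8.0), (12.0, 5.0), (15.0, 4.0)]
hv = hypervolume_2d(front, ref=(20.0, 10.0))
print(hv)
```

A larger value means the front lies closer to the ideal corner and covers it more evenly, which is why a single HV number can be compared across algorithms and runs.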

Conclusion and future work
This paper establishes a cloud workflow scheduling model based on completion time and execution cost, and proposes an MOEA/D algorithm based on local search and weight vector adjustment to solve it. First, by analyzing the cloud workflow scheduling process and its characteristics in depth, the cloud workflow scheduling model based on completion time and execution cost is established. Then, the MOEA/D algorithm based on local search and weight vector adjustment is proposed and applied to the cloud workflow scheduling problem. The experimental results show that the proposed algorithm performs better than the MOEA/D and NSGA-II algorithms on most real workflow scheduling problems. At the same time, a group of uniformly distributed Pareto optimal solutions is obtained, which can effectively provide decision support for the cloud workflow scheduling problem.
In future research on the workflow scheduling problem, scheduling schemes produced by traditional heuristic methods can be added to the initial population on the basis of this work. In addition, since this paper adopts random crossover and mutation operators, heuristic crossover and mutation operators can be used to accelerate the convergence of the algorithm.

Ethics Approval and Consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and material
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Competing interests
The authors declare that they have no competing interests.

Funding
Key Scientific Research Project Plan of Henan Provincial Department of Education (20A120012).

Authors' contributions
Xue Li-Yao, as the primary contributor, completed the analysis, experiments and paper writing.