Scientific Workflow Scheduling in Mobile Edge Computing Based on a Discrete Butterfly Optimization Algorithm

Mobile Edge Computing (MEC) is a technology aimed at providing processing and storage resources at the edge of Internet of Things (IoT) networks. However, MEC environments contain limited resources, which should be managed effectively to improve resource utilization. Workflow scheduling is the process of mapping workflow tasks to the most suitable set of computing resources with respect to given objectives. For this purpose, this paper presents DBOA, a discrete version of the Butterfly Optimization Algorithm (BOA) that applies Levy flight to improve convergence speed and avoid the local optima problem. DBOA is then applied to DVFS-based data-intensive workflow scheduling and data placement in MEC environments. This scheme also employs the HEFT algorithm's task prioritization method to find the task execution order in scientific workflows. To evaluate the performance of the proposed scheduling scheme, extensive simulations are conducted on various well-known scientific workflows of different sizes. The experimental results indicate that this method outperforms other algorithms in terms of energy consumption, data access overheads, and related metrics.

Figure 1: MEC architecture

Three major energy models have been considered in MEC environments: the conventional energy model, the renewable energy model, and the dynamic voltage and frequency scaling (DVFS)-based energy model. This article focuses on the latter. DVFS is a technique that reduces the processor frequency to mitigate its power usage. However, reducing the processor frequency increases execution time; thus, such frequency reductions should be handled while considering deadlines and other QoS factors. Although numerous DVFS-based studies have been carried out on different cloud computing platforms, very few MEC scheduling schemes are available in the literature.
One of the essential issues in scientific workflow scheduling is the data placement strategy, which can mitigate data transmission and data storage costs in MECs. Both the workflow scheduling and data placement problems are proven to be NP-hard [7,8]. However, little research has been conducted on optimal data placement and data-intensive workflow scheduling strategies in MEC environments. In this context, several heuristic and metaheuristic scheduling methods have been widely studied for scheduling independent tasks, but fewer studies address workflow scheduling in the MEC context. Different optimization algorithms, such as PSO and the genetic algorithm, are used in metaheuristic fog scheduling approaches. Among optimization algorithms, BOA is a population-based algorithm inspired by the movement of butterflies in their food foraging behavior; it was first introduced in [9] for solving continuous optimization problems. Furthermore, BOA has been successfully applied in different domains such as feature selection [10,11], optimization algorithms [12][13][14][15][16], engineering optimization problems [17][18][19][20][21], enhancing WSNs [22], localization problems [23], artificial neural networks [24,25], etc.
However, despite BOA's success in continuous optimization, it cannot be applied to discrete problems without modification. Thus, given the scheduling problem's discrete nature, some improvements are needed to make this algorithm usable for discrete optimization. This article presents DBOA, a discrete version of BOA that uses genetic algorithm operators such as mutation and crossover to solve discrete optimization problems. A workflow scheduling and data placement framework based on DBOA is then provided for MEC environments. For this purpose, a discrete encoding is used in this scheme, which is processed by DBOA to place the data replicas and the workflow tasks in the best available MEC sites and VMs, respectively. Furthermore, this scheme uses the HEFT scheduling algorithm's task prioritization method to find the task execution order in scientific workflows, and then uses DBOA to assign tasks to suitable MEC virtual machines. Experiments on various scientific workflows indicate that the proposed scheme can reduce the scheduling process's energy consumption and data access overheads.
The rest of this article is organized as follows: Section 2 reviews the research background and previously published scheduling approaches for MEC, and Section 3 discusses the BOA algorithm. Section 4 presents the proposed discrete version of BOA, and Section 5 introduces the proposed workflow scheduling framework. Section 6 reports the experimental results, and finally, Section 7 puts forward the concluding remarks and directions for future studies.

II. Research background
This section studies some of the primary schemes proposed for task and workflow scheduling and data placement in the different MEC environments.

A. SDN and NFV-based MEC schemes
This subsection discusses MEC solutions built on techniques such as big data, SDN (Software-Defined Networking), and NFV (Network Function Virtualization). For instance, in [26], Li et al. introduced a data migration-based heterogeneous task scheduling scheme to reduce the task execution time and the energy consumption of data centers. This scheme considers the data locality property, which captures the relationships between tasks, data blocks, and servers. The authors evaluated their algorithm by comparing the costs of remote task execution and data migration. The experimental results show that the data migration method can decrease the task execution time.
The approach proposed in [27] introduces a computing migration solution for next-generation networks. It also presents a MEC strategy based on SDN and NFV technologies as well as multi-attribute decision making and computing migration. The authors conducted their experiments in MATLAB and showed that multi-attribute decision making based on SDN and NFV could select the appropriate MEC center, reduce the server response time, and improve the QoS experience.
In [28], the authors addressed resource allocation on NFV-enabled MECs, aiming to minimize the latency of mobile services and the costs of the MECs. They proposed a resource allocation method consisting of a fast heuristic-based incremental allocation mechanism that allocates resources dynamically based on operational cost. They showed through simulations that this scheme can allocate resources to guarantee applications' low-latency requirements while saving cost compared to fixed MECs.
The scheme proposed in [29] by Pham tries to optimize gateway placement and multihop routing in NFV-enabled IoT (NIoT) and the service placement in the MEC and cloud layers. The authors developed approximation algorithms such as SP1A, SP2A, and GMA for dealing with large NIoT systems and optimizing routing, resource allocation for service functions, and gateway deployment. They indicated that these approximation algorithms can reduce the computation time and achieve near-optimal results.
In [30], the authors introduced an open-source NFV/SDN-based MEC, which handles traffic management and application provisioning in MECs. In this scheme, the MEC applications are managed as VNFs in virtual environments provided by the Juju VNF Manager; an SDN controller manages traffic on the MEC, and the control plane is used to find appropriate traffic-management states. They evaluated their approach in two use cases: in the first, MEC caching is used to improve user QoS and latency; in the second, a public safety communication method provides communication for rescue teams when the network core and a public data network are not available.
In [31], the authors introduced ADE2WiNFV, a new network system that applies software-defined wireless network virtualization (WiNV) in WiFi. This scheme combines software-defined WiNV with NFV-based MEC to handle application-based end-to-end slicing in heterogeneous networks.
Developments in the IoT domain enable remote monitoring in E-Health systems. For instance, in [32], the authors tried to securely integrate big data processing with cloud M2M systems based on Remote Telemetry Units and proposed an E-Health architecture built on Exalead CloudView, a search-based application.
In [39], the authors presented a task scheduling approach in which computational tasks are offloaded to MEC servers. This scheme tries to reduce power usage while meeting all tasks' deadlines and considering the MDs' mobility, the resources of heterogeneous MEC servers, and the computation demands of the tasks. They optimized the assignment of computation tasks to minimize the energy consumption of MDs and MEC servers while satisfying task deadlines. The authors also analyzed their algorithm's performance and showed that it can reduce energy consumption.
The approach proposed in [40] studies workflow scheduling in MEC by formulating scheduling as an integer programming problem, aiming to handle different tasks while mitigating the makespan. This scheme uses a greedy method, IGS (Improved Greedy Search), to deal with constraints. The authors also proposed an improved heuristic algorithm that uses IGS for initialization and applies a two-layer scheme to improve the initialized solutions. They showed that IGS achieves a high probability of generating feasible solutions and that ICH can better reduce the makespan.
In [41], Cao et al. presented UARP, an uncertainty-aware resource provisioning method for scheduling workflow in the software-defined network-based MECs. The UARP applies the NSGA-III optimization algorithm to elaborate on the workflow scheduling strategy. The authors showed that the UARP could reduce uncertainty, processing time, and energy consumption.
In [42], the authors proposed PIOTS, a pattern-identified online task scheduling mechanism for the networking infrastructure, where multitier MEC is provided to handle the offloaded tasks. They used the pattern of IoT tasks to train a self-organizing map, which represents the task pattern features in defined dimensions. Then, optimal task scheduling on MECs is conducted using SOM neurons with the Hungarian method. The authors showed that the PIOTS method could provide better service capability, computation performance, and task processing latency for handling IoT tasks.
The scheduling approach proposed in [43] provides location-aware and proximity-aware scheduling of multiple workflows on MEC servers. This approach minimizes monetary costs while meeting user-required workflow completion deadlines, and it employs the discrete firefly algorithm to find optimal scheduling solutions. For validation, the authors showed that their approach can outperform other schemes on a real-world dataset of edge resource locations and multiple scientific workflows.
In [44], Liu et al. proposed a Markov decision process-based method to schedule tasks with regard to factors such as the buffer queueing state, the local processing unit's execution state, and the transmission unit's state. By analyzing each task's delay and the average power consumption at the mobile device, a power-constrained delay minimization problem is formulated, and a one-dimensional search algorithm is proposed to find the optimal task scheduling policy. Simulation results demonstrate that the proposed optimal stochastic task scheduling policy achieves a shorter average execution delay than the baseline policies.
The scheduling method in [45] tries to reduce the task execution delay in a MEN (Mobile Edge Network) by considering task properties, user mobility, and the network's constraints. It formalizes this scheduling problem as a constraint satisfaction problem and introduces a lightweight heuristic solution. Based on the conducted experiments, the authors showed that this scheme can reduce the task execution delay in MENs and mitigate the end-to-end delay for MEC tasks.
In [46], the authors tried to combine the optimal placement of data blocks and scheduling tasks to reduce the delay and response time and increase users' satisfaction in MEC. They considered the data blocks' popularity, storage capacity, and the MEC servers' replacement ratios for optimal data placement. This placement approach can prevent replacing the data blocks to reduce the bandwidth overhead. In this optimal task scheduling method, the containers are applied as resource units to use MEC servers' data storage and increase their performance. The authors conducted some experiments and exhibited that this scheduling algorithm's performance is better than the other approaches.
In [47], Lin et al. presented GA-DPSO, a discrete PSO algorithm with GA operators, to optimize the transmission of scientific workflows' data on MEC and cloud computing. The GA's mutation and crossover operators are applied to prevent the local optima problem and the premature convergence of PSO, and to reduce the data transmission time. This scheme considers factors such as the bandwidth between DCs, the number of edge DCs, and their storage capacity in the scheduling and placement processes. The experimental results show that this scheme can reduce the data transmission time during workflow execution on MEC and cloud.
In the rest of this paper, the proposed hybrid optimization algorithm to solve workflow scheduling and data placement issues is presented.

BOA algorithm
In the BOA algorithm, each butterfly is a search agent whose fitness varies with its movement from one location to another. Each butterfly participates in the optimization process by generating a fragrance based on its stimulus intensity. The fragrance is emitted over the area and can be sensed by the other butterflies; in this way, they share their information. When a butterfly senses fragrance from other butterflies, it moves toward the best of them (global search); otherwise, it moves randomly (local search). The BOA algorithm has three main phases: initialization, iteration, and the final step. In each run of BOA, the initialization step is executed first, the searching step is performed iteratively, and in the last step the algorithm terminates when the best solution is found.
In the initialization phase, the algorithm defines the objective function and its solution space. After setting the values, the algorithm proceeds to create an initial population of butterflies for optimization. As the total number of butterflies remains unchanged during the BOA simulation, a fixed size memory is allocated to store their information. The positions of butterflies are randomly generated in the search space, with their fragrance and fitness values calculated and stored. This finishes the initialization phase, and the algorithm starts the iteration phase, which performs the search with the artificial butterflies created. In each iteration in the second phase of the BOA, all butterflies move to new positions in solution space, and their fitness values are computed and evaluated. Afterward, the butterflies generate fragrance using Equation 1.
Equation 1 defines the fragrance as f = c I^a, in which f is the fragrance value, c is the sensory modality, I denotes the stimulus intensity, and the power exponent a depends on the modality and accounts for the varying degree of absorption. The parameters a and c lie in [0, 1] and affect the algorithm's convergence speed. When a = 1, there is no fragrance absorption, and the amount of fragrance emitted by a butterfly is sensed with the same intensity by all other butterflies; thus, an emitting butterfly can be sensed from anywhere in the domain, and a single global optimum can be reached. On the other hand, if a = 0, the fragrance emitted by a butterfly cannot be sensed by the others. The parameter c likewise determines the convergence speed and how the BOA algorithm behaves. In the BOA algorithm, butterflies search for mating partners and food at both the global and local scales. In the global search phase, each butterfly moves toward the best butterfly using Equation 2, x_i^{t+1} = x_i^t + (r^2 × g* − x_i^t) × f_i, in which x_i^t is the i-th butterfly in iteration t, g* is the best butterfly in the current iteration, f_i is the butterfly's fragrance, and r is a random number between 0 and 1. The local search phase is performed using Equation 3, x_i^{t+1} = x_i^t + (r^2 × x_j^t − x_k^t) × f_i, in which r is a random number between 0 and 1, while x_j^t and x_k^t are the j-th and k-th butterflies of the same swarm. Figure 2 depicts the pseudo-code of the BOA algorithm, in which the probability p is used to switch between the global and local search.
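The BOA search loop described above can be sketched in Python as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: the parameter values, bounds, and the use of fitness magnitude as the stimulus intensity I are assumptions.

```python
import random

def boa_minimize(objective, dim, n_butterflies=20, iters=100,
                 c=0.01, a=0.1, p=0.8, lo=-10.0, hi=10.0):
    """Minimal continuous BOA sketch; parameter values are illustrative."""
    pop = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_butterflies)]
    fitness = [objective(x) for x in pop]
    best_idx = min(range(n_butterflies), key=lambda i: fitness[i])
    g_star = pop[best_idx][:]
    for _ in range(iters):
        # Fragrance f = c * I^a, with stimulus intensity I taken from fitness.
        frag = [c * (abs(f) ** a) for f in fitness]
        for i in range(n_butterflies):
            r = random.random()
            if random.random() < p:
                # Global search (Equation-2 style): move toward g*.
                step = [(r * r * g - x) * frag[i]
                        for g, x in zip(g_star, pop[i])]
            else:
                # Local search (Equation-3 style): random move using peers j, k.
                j, k = random.sample(range(n_butterflies), 2)
                step = [(r * r * xj - xk) * frag[i]
                        for xj, xk in zip(pop[j], pop[k])]
            pop[i] = [min(hi, max(lo, x + s)) for x, s in zip(pop[i], step)]
            fitness[i] = objective(pop[i])
            if fitness[i] < objective(g_star):
                g_star = pop[i][:]
    return g_star, objective(g_star)

# Toy usage: minimize the sphere function in two dimensions.
best, val = boa_minimize(lambda x: sum(v * v for v in x), dim=2)
```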

III. The proposed discrete BOA algorithm
Figure 2: Pseudo-code of the BOA algorithm
    Produce initial solutions containing n butterflies
    Compute their intensity
    Define c, a, and p (the sensory modality, power exponent, and switch probability)
    For i = 1 to Max_iteration
        Compute the fragrance of the population
        Find the best butterfly
        For each butterfly do
            Produce a random number r
            If r < p Then
                Move toward the best butterfly using Equation 2
            Else
                Make a random move using Equation 3
            End
        Next
        Update the power exponent a
    Next i
    Return the best solution

This section presents the proposed DBOA algorithm, whose contributions can be listed as follows:
- Using chaotic maps to produce a discrete random initial population and to generate the various random numbers applied in the DBOA algorithm.
- Introducing new equations to convert the discrete population into another set of discrete solutions. For this purpose, the global and local search methods are changed.
- Adding another step to the BOA algorithm to conduct a Levy flight-based search and prevent the local optima problem.
To provide a discrete version of the BOA algorithm, a method to create a discrete initial population is introduced first. Then, the operators required to modify these discrete solutions and convert them into other discrete solutions are designed. To produce a discrete population, Equation 4 is used to build the initial discrete solutions in the DBOA algorithm, in which Chaotic_MAP() is one of the chaotic maps [48,49], used to produce random numbers in [0, 1]. The other parameters are the maximum number of VMs in the MEC, the number of DVFS levels (Ndvfs), the number of data fragments, the number of tasks (Ntask), and the number of MEC environments (Nmec). Genetic operators such as swap, crossover, and mutation are then used to modify Equations 2 and 3 of the basic BOA algorithm. In the proposed algorithm, Equation 11 computes the expression r^2 × g*, which is part of Equation 2.
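The Equation-4 style chaotic initialization can be sketched as follows. The logistic map is used as a representative chaotic map, and the encoding layout (a (VM, DVFS level) pair per task plus a MEC site per data fragment) is an assumption for illustration only.

```python
def logistic_map(x):
    # Logistic chaotic map; x stays in [0, 1] for the 4.0 coefficient.
    return 4.0 * x * (1.0 - x)

def chaotic_solution(n_tasks, n_vms, n_dvfs, n_frags, n_mecs, seed=0.37):
    """Illustrative discrete encoding: one (VM, DVFS level) pair per task
    and one MEC site per data fragment (layout and names are assumptions)."""
    x = seed
    task_part, data_part = [], []
    for _ in range(n_tasks):
        x = logistic_map(x)
        vm = int(x * n_vms) % n_vms        # scale chaotic value to a VM index
        x = logistic_map(x)
        level = int(x * n_dvfs) % n_dvfs   # scale to a DVFS level
        task_part.append((vm, level))
    for _ in range(n_frags):
        x = logistic_map(x)
        data_part.append(int(x * n_mecs) % n_mecs)  # MEC site for this fragment
    return task_part, data_part

tasks, data = chaotic_solution(n_tasks=5, n_vms=4, n_dvfs=3, n_frags=2, n_mecs=3)
```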
In Equation 11, RandV1 is a real random number in [0, 1] obtained using the chaotic maps, and Δg* is the offset value that should be added to the best butterfly to obtain a solution near it. Δg* can be computed using Equation 12, which produces larger changes in the first rounds to benefit from a higher exploration rate and the smallest changes in the last rounds to profit from exploitation. The next expression to be computed is (r^2 × g* − x_i^t) × f_i, which is computed using Equation 13. This equation performs a two-point crossover to compute the required expression and uses a random variable RandV2, which is between 0 and 1. Another expression that should be adapted is (r^2 × x_j^t − x_k^t) × f_i, which is modified as Equation 18, in which RandV6 is a random variable in [0, 1] and x_k^t is the k-th butterfly in the same swarm as x_j^t. Afterward, to compute x_i^t + (r^2 × x_j^t − x_k^t) × f_i, Equation 19 is applied, which performs a multi-point crossover operation; in this equation, NC is a random number in [1, SolutionLength], and SolutionLength indicates the length of each solution. The proposed DBOA algorithm applies the Levy flight method to search the problem space more thoroughly, increase its convergence speed, and prevent the local optima problem. The Levy flight helps generate new butterflies in the problem space to achieve these objectives.
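The two-point and multi-point crossover operators used in the Equation-13 and Equation-19 style moves can be sketched as follows. Function names are illustrative, and the number of cut points passed to the multi-point variant must not exceed the solution length minus one.

```python
import random

def two_point_crossover(parent_a, parent_b):
    """Two-point crossover used to move a solution toward another (e.g. g*)."""
    n = len(parent_a)
    p1, p2 = sorted(random.sample(range(1, n), 2))
    # Keep the outer segments of parent_a, copy the middle segment of parent_b.
    return parent_a[:p1] + parent_b[p1:p2] + parent_a[p2:]

def multi_point_crossover(parent_a, parent_b, nc):
    """Multi-point crossover with nc cut points (1 <= nc <= len - 1)."""
    n = len(parent_a)
    cuts = sorted(random.sample(range(1, n), nc))
    child, src, prev = [], 0, 0
    for c in cuts + [n]:
        # Alternate the source parent at every cut point.
        child.extend((parent_a if src == 0 else parent_b)[prev:c])
        src, prev = 1 - src, c
    return child
```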
Generally, Levy flight is a random process with a non-Gaussian distribution, proposed by the French mathematician Levy in the 1930s. Typically, the Levy flight distribution can be expressed by Equation 20, in which s is the sample (step length) and µ is the location parameter; the parameter γ controls the scale of the distribution. The definition of the Levy flight distribution in the Fourier transform is expressed by Equation 21, in which γ is the scale parameter and β lies in (0, 2]. Furthermore, using the Mantegna algorithm, the step length s can be computed by Equation 22, in which the parameters v and u have Gaussian distributions and can be computed using Equations 23 and 24. Afterward, the stepsize parameter, which determines the search-space step size, can be computed using Equation 25, where s depends on the problem dimension. Also, Equation 26 is added to the butterfly algorithm to perform a better local search and mitigate the local optima problem; in it, Levy(x_j) is the Levy flight of the j-th solution in the population and can be computed using Equation 27, in which the step is calculated using Equation 28, ⊗ is an element-wise multiplication operator, and random(size(x_j)) is a random solution generated by the chaotic operators. Figure 3 depicts the pseudo-code of the DBOA algorithm.
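The Mantegna step computation (the Equation 22–24 style calculation) can be sketched as follows; the default β = 1.5 is a common illustrative choice, not a value prescribed by the paper.

```python
import math
import random

def levy_step(beta=1.5, dim=1):
    """One Levy-flight step vector via the Mantegna algorithm (beta in (0, 2))."""
    # Mantegna scale sigma_u for u; v is drawn with unit standard deviation.
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    step = []
    for _ in range(dim):
        u = random.gauss(0, sigma_u)
        v = random.gauss(0, 1)
        step.append(u / abs(v) ** (1 / beta))  # s = u / |v|^(1/beta)
    return step
```

Each component of the returned vector is then multiplied element-wise with the solution's offset, as in the Equation 26–28 style update.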

IV. Workflow Scheduling Using the DBOA Algorithm
This scheme's primary goal is to assign proper VMs to the workflow tasks to minimize energy consumption and makespan of the workflow scheduling process. This section explains how the proposed DBOA algorithm is used for efficient workflow scheduling in MEC environments. For this purpose, a formal formulation of the problem is provided, and then the proposed approach to deal with it is discussed.

A. DVFS
This subsection provides a formal definition of the energy model considered in this scheduling approach. Typically, the DVFS method reduces the CPU operating frequency and voltage, mitigating the processors' energy consumption during task execution. The DVFS method can be incorporated in all computing systems, from mobile systems to cloud computing DCs. However, reducing the CPU frequency also reduces its speed; as a result, the application's QoS constraints must be considered when using DVFS. Thus, the main task of a DVFS-based scheduling framework is to determine the minimum necessary operating frequency according to each task's deadline. Typically, in DVFS-based scheduling [50,51], a task executes faster at a higher frequency and takes longer to finish at a lower frequency level. Figure 4 shows the slack time, which can be exploited by DVFS-based scheduling. As shown in Equation 29, the slack time is the period between a task's deadline and its finish time.

Figure 5: Data aggregation point in a workflow

Figure 5 depicts part of a workflow that performs data aggregation and should be scheduled in the MEC environment. As shown in this figure, T1, T2, T3, T4, and T5 must be executed and completed before T6 starts. However, in some cases, T1 to T5 may be heterogeneous and have different execution times. For example, when T1 takes longer than the other tasks, the VMs running T2 to T5 will be idle after executing their tasks. Using DVFS, the CPU frequency of the VMs running T2 to T5 can be lowered, reducing their execution speed and extending their execution time while still meeting the deadline determined by task T1.
The deadlines may be specified by the user or by the scheduling algorithm itself. The primary goal of DVFS-based scheduling is to find the lowest possible frequency for each task's VM with respect to its deadline; in this case, the least power is consumed for task and workflow execution while the deadline is still met. In this scheme, it is assumed that power consumption consists of static and dynamic energy consumption. Typically, static energy consumption is ignored, since dynamic energy consumption dominates.
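Finding the lowest feasible frequency can be illustrated with the following sketch. It assumes a simplified linear time-scaling model (execution time inversely proportional to frequency); the function name and model are illustrative assumptions, not the paper's exact formulation.

```python
def lowest_feasible_frequency(exec_time_fmax, f_max, levels, slack):
    """Pick the lowest DVFS frequency whose stretched execution time still
    fits in exec_time_fmax + slack (slack = deadline - finish time at f_max)."""
    deadline_budget = exec_time_fmax + slack
    for f in sorted(levels):               # try the lowest frequencies first
        stretched = exec_time_fmax * (f_max / f)
        if stretched <= deadline_budget:
            return f
    return f_max                           # no slack to exploit: run at full speed
```

For instance, a task that takes 10 time units at 2.0 GHz with 10 units of slack can run at 1.0 GHz (20 units, exactly the budget), while the same task with no slack must stay at 2.0 GHz.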
Typically, in the idle periods of VMs, their voltage should be set to the lowest level to save energy. The energy consumption of the idle periods of all available processors can be defined using Equation 30, in which V_j^min and f_j^min are the minimum voltage and frequency of the j-th VM, and t_j^idle is the idle time of the j-th VM. The idle-time power consumption is computed using Equation 31. Using Equation 32, the energy consumption of the VMs in their busy periods can be computed, in which K is the dynamic power consumption constant and depends on the capacities of the devices. Also, V_{j,s}^2 is the square of the s-th level's voltage in the j-th VM, f_{j,s} is the frequency of the j-th VM at the s-th voltage level, and t_{i,j} is the execution time of the i-th task on the j-th VM. According to these equations, the total energy consumption required for workflow scheduling in a MEC environment can be computed using Equation 33.
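The busy- and idle-period energy terms can be sketched as follows, using the common K·V²·f·t dynamic-power form that the equations above describe; the tuple layouts and function names are assumptions for illustration.

```python
def busy_energy(assignments, K=1.0):
    """Equation-32 style busy energy: sum of K * V^2 * f * t over task
    assignments, each given as (voltage, frequency, execution_time)."""
    return sum(K * v * v * f * t for (v, f, t) in assignments)

def idle_energy(idle_periods, K=1.0):
    """Equation-30 style idle energy at each VM's minimum voltage/frequency,
    each period given as (v_min, f_min, idle_time)."""
    return sum(K * vmin * vmin * fmin * t for (vmin, fmin, t) in idle_periods)

def total_energy(assignments, idle_periods, K=1.0):
    # Equation-33 style total: busy plus idle energy.
    return busy_energy(assignments, K) + idle_energy(idle_periods, K)
```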

Problem formulation
This section provides a formal definition of the considered workflow model and the environment used to run the workflows. In this scheme, a set of MECs is considered, denoted as MEC = {MEC_1, MEC_2, …}. As shown in Equation 34, the total storage size of the MECs is assumed to be far smaller than the size of the dataset. Each workflow submitted to the MEC environment is modeled as a directed acyclic graph (DAG). The set of workflows submitted to the MEC is indicated by W = {W_1, W_2, W_3, …}, in which each workflow contains some tasks, W_i = {T_1, T_2, T_3, …}. In each workflow DAG, each node represents a task, and the edges specify the data or control dependencies between tasks; E_ij defines the edge between T_i and T_j, where T_i ≠ T_j. Such an edge indicates that the child task can only be executed after all of its parent tasks have been fully executed and their output data have been received. Control dependencies only transfer the configuration parameters needed to run the child task and carry less data than data dependencies.
However, the data transferred over data dependencies are used as input data by the child task. speed(j, k) specifies the speed of the j-th VM at the k-th DVFS level, and task_length(T_i) specifies the length of the task in millions of instructions. The average time to execute task T_i on the j-th VM can be computed using Equation 38, in which Ndvfs is the number of DVFS levels of the VM, and task T_i's average execution time over all VMs can be computed using Equation 37. In this scheme, the earliest start time of each task can be computed using Equation 38, where avail(j) is the time at which the j-th VM becomes available to execute the requested task. The communication time of the data transfer between T_i and T_j can be computed using Equation 39, in which bandwidth(vm(T_i), vm(T_j)) is the bandwidth between the two VMs that execute tasks T_i and T_j, and data(T_i, T_j) denotes the amount of data that should be transferred between these tasks. Furthermore, the finish time of each task can be computed using Equation 40, in which the deadline of the i-th workflow is also taken into account. Moreover, the makespan of workflow W_i can be calculated using Equation 41.
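The per-task timing quantities above (execution time, communication time, and earliest start time) can be sketched as follows; the function names mirror the prose but are illustrative.

```python
def exec_time(task_len_mi, vm_mips):
    # Execution time = task length (MI) / VM speed (MIPS).
    return task_len_mi / vm_mips

def comm_time(data_amount, bandwidth):
    # Transfer time between the VMs running two dependent tasks.
    return data_amount / bandwidth if bandwidth > 0 else 0.0

def earliest_start(vm_avail, parent_finishes):
    """EST = max(VM available time, latest parent finish + its transfer time).
    parent_finishes: list of (finish_time, transfer_time) pairs."""
    arrival = max((ft + ct for ft, ct in parent_finishes), default=0.0)
    return max(vm_avail, arrival)
```

The makespan of a workflow is then simply the maximum finish time over all of its tasks.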

C. Finding task order
List-based scheduling methods first compute the priorities of the workflow tasks in a DAG and rank them in non-increasing order. HEFT (Heterogeneous Earliest Finish Time) is one of the popular list scheduling methods in the literature [52]. To find the order of task execution in the scientific workflows, this scheme benefits from HEFT's task prioritization method. HEFT is a heuristic method for scheduling inter-dependent tasks onto a network of heterogeneous workers that takes communication time into account. As input, HEFT takes a set of tasks represented as a directed acyclic graph, a collection of workers, the time each worker needs to execute each task, and the time needed to communicate the results of each task to its children between each pair of workers. It descends from list scheduling algorithms: it first determines the priorities of the tasks and then assigns the tasks to the workers. The rank of each task indicates its execution turn in the workflow scheduling; tasks with higher ranks have higher priority and are scheduled first. Equation 48 indicates how the rank is computed for each workflow task, where T_i is the i-th task in the workflow and avg_exec(T_i) is the average execution cost of the i-th task. Also, Successor(T_i) specifies the successor tasks of T_i, and avg_comm(T_i, T_j) specifies the average communication cost between T_i and T_j.
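The HEFT upward-rank computation can be sketched as follows: the rank of a task is its average execution cost plus the maximum, over its successors, of the average communication cost plus the successor's rank. The tiny diamond DAG and its costs are made-up illustrative values.

```python
def upward_rank(task, avg_exec, successors, avg_comm, memo=None):
    """HEFT upward rank: rank(t) = avg_exec[t] + max over successors s of
    (avg_comm[(t, s)] + rank(s)); exit tasks get rank = avg_exec[t]."""
    if memo is None:
        memo = {}
    if task in memo:
        return memo[task]
    succ = successors.get(task, [])
    tail = max((avg_comm[(task, s)]
                + upward_rank(s, avg_exec, successors, avg_comm, memo)
                for s in succ), default=0.0)
    memo[task] = avg_exec[task] + tail
    return memo[task]

# Rank a tiny diamond DAG: T1 -> {T2, T3} -> T4 (illustrative numbers).
avg_exec = {"T1": 3.0, "T2": 2.0, "T3": 4.0, "T4": 1.0}
successors = {"T1": ["T2", "T3"], "T2": ["T4"], "T3": ["T4"]}
avg_comm = {("T1", "T2"): 1.0, ("T1", "T3"): 0.5,
            ("T2", "T4"): 1.0, ("T3", "T4"): 2.0}
order = sorted(avg_exec,
               key=lambda t: upward_rank(t, avg_exec, successors, avg_comm),
               reverse=True)  # highest rank first, as in HEFT
```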

DBOA-based workflow scheduling
The pseudo-code of the DVFS-based workflow scheduling using DBOA is shown in Figure 7. As shown in this figure, the required parameters for running the algorithm are tuned first. The HEFT algorithm's task prioritization method is then used to find the workflow's task execution order. Afterward, the DBOA algorithm is used to find the best possible locations for the tasks and data replicas. Finally, the tasks are allocated to the selected VMs for execution in the order and with the settings specified by the best solution.
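At a high level, the flow just described can be sketched as the following search skeleton. This is a deliberately simplified single-solution stand-in for the population-based DBOA of Figure 7: the `evaluate`, `random_solution`, and `neighbor` callables are placeholders for the energy/makespan fitness, the chaotic initialization, and the crossover/mutation/Levy moves, respectively.

```python
def dboa_schedule(evaluate, random_solution, neighbor, iters=100):
    """Simplified search loop: generate an encoding, repeatedly apply a DBOA
    move, and keep the candidate with the better (lower) fitness."""
    best = random_solution()
    best_fit = evaluate(best)
    for _ in range(iters):
        cand = neighbor(best)            # one DBOA move on the incumbent
        fit = evaluate(cand)
        if fit < best_fit:               # keep the lower-energy/makespan solution
            best, best_fit = cand, fit
    return best, best_fit

# Toy usage with stub operators (not the real scheduling objective):
best, fit = dboa_schedule(lambda s: float(s[0]),
                          lambda: [5],
                          lambda s: [max(0, s[0] - 1)],
                          iters=10)
```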

Simulation Results
This section presents the results of the experiments conducted to evaluate the performance of the proposed scheduling framework. Several simulation frameworks exist for MEC, such as iFogSim, FogNetSim++, MobFogSim, and EdgeCloudSim. This scheduling scheme uses the iFogSim simulator, an efficient open-source tool for modeling and simulating resource management in IoT and MEC networks, to conduct the required simulations.

Figure 7: DVFS-based workflow scheduling using DBOA

The iFogSim simulator works with CloudSim, an open-source Java-based simulator for modeling cloud computing environments and managing their resources, and applies CloudSim to deal with the events among MEC components. The proposed workflow scheduling algorithm is evaluated on the Epigenomics, SIPHT, Montage, LIGO, and CyberShake scientific workflows. Figure 8 depicts the structure of these five well-known scientific workflows.
Epigenomics is a data processing workflow that represents the genome sequencing operations used by the Epigenome Center. In this workflow, the DNA sequence data produced by a genetic analysis system is split into many chunks that can be processed in parallel. Each data chunk is converted to the format required by the sequence aligner, noisy sequences are filtered, and a map of the sequence density at every genome position is created.
SIPHT is a program that uses a workflow to automate the search for sRNA-encoding genes over all bacterial replicons. It was created for a bioinformatics project at Harvard University to search for untranslated RNAs that regulate processes such as virulence or secretion in bacteria.
NASA IPAC created Montage as an open-source toolkit for generating custom mosaics of the sky; it is presented as a workflow that can be run in various Grid, cloud, and even MEC environments.
The LIGO workflow is used to generate and analyze gravitational waveforms of compact binary systems.
The Southern California Earthquake Center typically employs the CyberShake workflow to analyze earthquake effects.
For evaluating scheduling algorithms on scientific workflows, the authors of [34] provided a workflow generator tool that creates scientific workflows of arbitrary size in XML format, containing data about the task sizes and the amount of communication between dependent tasks. To conduct the required experiments and evaluate the proposed algorithm, as shown in Table 1, three workflows with 50, 100, and 1000 tasks are utilized for the LIGO, CyberShake, and Montage scientific workflows. For Epigenomics, three workflows with 46, 100, and 997 tasks are used, whose DAX files are provided by the workflow generator. For the SIPHT scientific workflow, three workflows with 60, 100, and 1000 tasks are utilized. Furthermore, the depicted results are averaged over 40 different runs of the investigated optimization algorithms.
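To make the workflow input format concrete, the following is a minimal sketch of extracting task runtimes and dependencies from a DAX-like XML description of the kind such generators emit. The structure here is deliberately simplified (real DAX files carry XML namespaces and richer attributes such as file sizes), so treat it as an illustration rather than a full parser.

```python
# Minimal sketch: read task runtimes and parent/child dependencies
# from a simplified DAX-like XML workflow description.
import xml.etree.ElementTree as ET

def parse_dax(xml_text):
    root = ET.fromstring(xml_text)
    # Each <job> carries an id and a runtime estimate.
    tasks = {j.get("id"): float(j.get("runtime", 0)) for j in root.iter("job")}
    # Each <child> element lists its <parent> jobs, giving DAG edges.
    deps = [(p.get("ref"), c.get("ref"))
            for c in root.iter("child") for p in c.iter("parent")]
    return tasks, deps

dax = """<adag>
  <job id="ID00000" runtime="12.5"/>
  <job id="ID00001" runtime="3.1"/>
  <child ref="ID00001"><parent ref="ID00000"/></child>
</adag>"""
tasks, deps = parse_dax(dax)
# tasks: runtimes keyed by job id; deps: [('ID00000', 'ID00001')]
```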

D. Energy consumption improvements
This subsection presents the improvements achieved in terms of energy consumption. In these experiments, two sets of DVFS levels are used in the simulation scenarios: the first case applies the three DVFS levels specified in Table 2, and the second case applies the five DVFS levels shown in Table 3. To indicate the improvements achieved by our scheme, we consider the ratio of the energy consumption of DVFS-based workflow scheduling to that of non-DVFS-based workflow scheduling using different optimization algorithms. Table 4 exhibits the percentage of the energy consumption ratio in workflow scheduling conducted with three DVFS levels by algorithms such as DBOA, ACO, PSO (Particle Swarm Optimization), DE (Differential Evolution), and the two DE variants presented in [53] and [54]. In this table, the simulation results are provided for three different sizes of each scientific workflow. As can be concluded from this table, DBOA can better mitigate the energy required to schedule workflows of different sizes. The reason for this improvement is that our proposed discrete optimization algorithm is better adapted to the scheduling problem and can find better VMs and better DVFS settings for workflow scheduling. Finally, these experiments show that the amount of energy consumption reduction in DVFS-based scheduling depends on the following factors:
The workflow scheduling deadline.
Number of VMs applied in the scheduling.
Workflow structure.
The deadline distribution policy applied to distribute the slack time among different workflow levels.
The number of rounds in which the optimization algorithms are executed.
The storage capacity considered for each MEC environment.
The bandwidth of the broker, MECs, and VMs.
The number of DVFS levels considered for VMs.
Our proposed scheduling scheme attempts to reduce the energy consumption of the MECs in scheduling data-intensive workflows by minimizing the number of VMs used, assigning interacting tasks to the same VM as much as possible, and placing proper data fragments at the best MECs. Table 5 shows the total energy consumption ratio when five DVFS levels are applied in the workflow scheduling process. As shown in this table, better results can be achieved by incorporating more DVFS levels, regardless of the optimization algorithm used in the scheduling. However, the proposed DBOA algorithm outperforms other algorithms such as ACO, PSO, DE, and its two variants, even with more DVFS levels. As shown in Table 1, 60 particles or solutions are considered for each optimization algorithm in these experiments, and the algorithms are run for a maximum of 900 rounds.

E. Communication overhead
Communication overhead is one of the important metrics that most scheduling schemes try to reduce through better assignment of workflow tasks to the MEC's VMs. As outlined in the previous section, this scheme can further reduce communication overheads by placing proper data fragments in the locations where they will be used. In this subsection, we indicate the effectiveness of the proposed discrete optimization algorithm (DBOA) in reducing the communication overheads of data-intensive workflows. For this purpose, we compute the percentage ratio of the communication overheads of scheduling with data placement to those of scheduling without data placement. In these experiments, we compare the proposed scheme's results with those of other optimization algorithms such as ACO, PSO, DE, and the two DE variants presented in [53] and [54]. Figure 12 indicates the percentage of the communication overheads ratio in the SIPHT scientific workflows with 60, 100, and 1000 tasks. Finally, Figure 13 indicates the percentage of the communication overheads ratio in the Epigenomics scientific workflows with 46, 100, and 997 tasks. As shown in these figures, data placement with the proposed DBOA discrete optimization algorithm incurs much lower data access overheads than the other optimization algorithms and the no-data-placement policy on MECs. The reasons for these improvements are the discretization of the BOA algorithm, the application of Levy flight-based search in the DBOA algorithm, and the chaotic variables applied to produce better random numbers.
In these scenarios, two places for the data are considered: cloud data center storage and MEC storage, where MEC storage is limited. This scheme tries to put specific data fragments in these storages to mitigate the data access overhead of the scheduling process in MECs. However, data that are not in the MEC storage must inevitably be accessed from cloud storage. Since the DBOA discrete optimization algorithm can place data replicas in better storage locations, it incurs much lower data access overheads than the other optimization algorithms such as ACO, PSO, DE, and the DE variants.
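The trade-off between the limited MEC storage and the costlier cloud storage can be sketched with a simple greedy placement heuristic. This is an illustration of the general idea only, not the paper's DBOA-driven placement: the fragment list, the capacity, and the per-access cost values are all assumed.

```python
# Illustrative greedy data-placement sketch: keep the most frequently
# accessed fragments in the limited MEC storage; overflow falls back
# to cloud storage with a higher per-access cost (assumed cost model).

def place_fragments(fragments, mec_capacity, mec_cost=1.0, cloud_cost=10.0):
    """fragments: list of (size, access_count); returns (placement, total cost)."""
    placement, used, total = {}, 0, 0.0
    # Favor fragments with many accesses per unit of storage consumed.
    order = sorted(range(len(fragments)),
                   key=lambda i: -fragments[i][1] / fragments[i][0])
    for i in order:
        size, accesses = fragments[i]
        if used + size <= mec_capacity:
            placement[i], used = "mec", used + size
            total += accesses * mec_cost
        else:
            placement[i] = "cloud"
            total += accesses * cloud_cost
    return placement, total

# Three equal-sized fragments, MEC storage holds only one of them.
placement, total = place_fragments([(5, 100), (5, 10), (5, 1)], mec_capacity=5)
# the hot fragment lands on the MEC; the other two are served from the cloud
```

A metaheuristic such as DBOA replaces this greedy rule with a global search over placements, which is what allows it to find lower-overhead configurations than simple heuristics.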

F. Makespan
Almost all scheduling schemes attempt to minimize the makespan in some way. This subsection indicates how successfully our scheme reduces the scheduling makespan compared to other optimization algorithms. Figures 14 to 18 show the ratio of makespan to deadline for our proposed algorithm (DBOA), ACO, PSO, DE, and the two DE variants presented in [53] and [54] when applied to the scheduling of the five scientific workflows discussed in the previous subsections. These experiments are conducted on the Epigenomics workflow with 997 tasks and the LIGO, Montage, SIPHT, and CyberShake workflows with 1000 tasks. As specified in Table 6, 60 solutions are considered as the population of each algorithm, and the exhibited results are the average of 40 different runs of the studied optimization algorithms. Since all of the applied optimization algorithms use the same initial population, their fitness is the same at first. The algorithms are then run for 900 rounds, and as shown in these figures, our proposed scheduling scheme can quickly converge to a near-optimal solution. Generally, the achieved results indicate that some of the algorithms cannot reach the proposed DBOA algorithm's result.
In contrast, some others achieve the same result as our scheme, but our algorithm converges to the final result more quickly. The reason for these improvements is the modifications made to the basic BOA algorithm. With these changes, the proposed DBOA algorithm can better handle discrete solutions and explore the problem space more effectively, while the Levy flight method helps it avoid the local optima problem and find near-optimal solutions more quickly.
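A Levy-flight perturbation of the kind DBOA relies on can be sketched with Mantegna's algorithm, which produces occasional long jumps that help candidates escape local optima. The tail index `beta` and the step scale below are common illustrative choices, not the paper's tuned parameters.

```python
# Sketch of a Levy-flight perturbation step (Mantegna's algorithm).
import math
import random

def levy_step(beta=1.5):
    """Draw one heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma_u), v ~ N(0, 1)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = random.gauss(0, sigma_u)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def perturb(position, best, scale=0.01):
    """Move a candidate solution toward the best-known one with a Levy jump."""
    return [x + scale * levy_step() * (b - x) for x, b in zip(position, best)]
```

Most steps are small (local exploitation), but the heavy tail occasionally produces a large jump, which is exactly the behavior that prevents the swarm from stagnating around a local optimum.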

V. Conclusion
Mobile Edge Computing (MEC) is an intermediate layer between the IoT and cloud computing data centers. Its resources are located at the edge of the IoT network, close to the devices that use them. Data-intensive workflows need to access several large datasets located in different cloud data centers, and MEC can provide a cost-effective and low-latency computing model to deploy and run such workflows. MECs also support storing dataset replicas, but optimizing data and replica placement and the execution of scientific workflows to mitigate data transmission delays and energy consumption is challenging. Dynamic voltage and frequency scaling (DVFS) is a useful energy management method that allows MEC virtual resources to reduce their processors' frequency and voltage in order to reduce their energy consumption. Designing an efficient data placement method to decrease the data access costs of data-intensive scientific workflows is therefore highly important in the MEC environment.
This article presented DBOA, an improved and discrete version of the Butterfly Optimization Algorithm (BOA) that solves BOA's local optima problem and improves its convergence speed. The DBOA optimization algorithm is then used for DVFS-based data placement and data-intensive workflow scheduling in MECs to reduce the VMs' energy consumption while meeting deadlines. By allocating proper VMs to the workflow tasks and placing data fragments in the proper MEC environments, this scheme tries to minimize data access overheads while respecting the storage constraints of the MECs. To verify the effectiveness of the proposed scheduling approach, extensive simulations were carried out on five well-known scientific workflows of different sizes. The results show that the proposed approach outperforms scheduling solutions produced by other optimization algorithms such as ACO, PSO, DE, and the two DE variants regarding metrics such as energy consumption, communication overheads, and makespan.
In future studies, we plan to address the MEC scheduling problem using a multi-objective DBOA algorithm. Furthermore, combining MEC environments with multi-cloud environments can be investigated. Also, factors such as reliability and the effect of DDoS attacks on MEC resources should be examined in subsequent analyses.