MAA: multi-objective artificial algae algorithm for workflow scheduling in heterogeneous fog-cloud environment

With the development of modern computing technology, workflow applications have become increasingly important in a variety of fields, including research, education, health care, and scientific experimentation. A workflow application consists of a group of tasks with complicated dependency relationships, and it can be difficult to create an acceptable execution sequence while maintaining precedence constraints. Workflow scheduling algorithms (WSA) are therefore gaining more attention from researchers. Even though a variety of research perspectives have been demonstrated for WSAs, it remains challenging to develop a single coherent algorithm that simultaneously meets a variety of criteria, and very little research is available on WSA in heterogeneous computing systems. Classical scheduling techniques, evolutionary optimisation algorithms, and other methodologies are the available solutions to this problem. The workflow scheduling problem is regarded as NP-complete and is constrained by various factors, such as Quality of Service, interdependence between tasks, and user deadlines. In this paper, an efficient meta-heuristic approach named the Multi-objective Artificial Algae (MAA) algorithm is presented for scheduling scientific workflows in a hierarchical fog-cloud environment. In the first phase, the algorithm pre-processes the scientific workflow and prepares two task lists; to speed up execution, bottleneck tasks are executed with high priority. In the following stage, the MAA algorithm is used to schedule tasks so as to reduce execution time, energy consumption and overall cost. To use fog resources effectively, the algorithm also employs a weighted-sum-based multi-objective function. The proposed approach is evaluated using five benchmark scientific workflow datasets. To verify its performance, the proposed algorithm's results are compared to those of conventional and specialised WSAs. In comparison to previous methodologies, the average results demonstrate significant improvements of about 43% in execution time, 28% in energy consumption and 10% in total cost without any trade-offs.


Introduction
A popular term in computer science for decades, cloud computing (CC) encompasses innovations such as complexity abstraction and concealment, resource virtualisation, and effective utilisation of distributed resources. GoGrid, Google App Engine, Microsoft Azure, and Amazon EC2 are a few popular CC platforms [1]. Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) are three categories into which CC services can be divided [2]. The computation needs of real-time, latency-sensitive applications in highly dispersed Internet of Things (IoT) systems are served by a newer computation pattern termed "fog computing" (FC) [3]. Along the cloud-to-things continuum, FC employs computation, storage, management and networking to distribute these services among end users [4]. FC builds on earlier embedded-system platforms by bringing cloud services onto the network edge [5]. We also see the challenge of IoT data processing as an interesting application area [6]. FC supports different applications, including IoT, fifth-generation (5G) wireless and embedded artificial intelligence [4]. FC is good for reducing latency and cloud pricing, whereas CC alone is useful mainly for satisfying the rising demands of compute-intensive offloading programs [7]. Fog's primary characteristics are low latency and location awareness, extensive geographical distribution, mobility, a very large number of nodes, a prominent role for wireless access, a substantial presence of streaming programmes and real-time interactions [8].
Workflows are modelled as Directed Acyclic Graphs (DAGs) with n tasks, where the vertices correspond to the tasks and the edges to their dependencies. Scientific workflows contain a huge number of tasks. These tasks also carry dependencies, which makes it challenging for the scheduler to organise tasks and use cloud resources effectively. The scheduler serves as a bridge between workflow tasks and cloud resources. Scheduling workflows in a cloud context is regarded as NP-complete. The efficiency of scheduling algorithms is affected by a variety of variables, including quality of service (QoS), user deadlines, financial cost, execution time, data privacy and security, etc. The vast computational resources required by workflow scheduling algorithms (WSA) make heterogeneous computing systems (HCS) an appropriate target. Workflow tasks can be carried out on scalable and affordable FC infrastructure. Workflow tasks vary in execution duration and computing demand [9]. While certain workflows require a lot of computing power, others may require a lot of memory and bandwidth.
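To make the DAG model concrete, the sketch below shows one minimal way such a workflow could be represented in code. The `Task` structure and the Kahn's-algorithm ordering are illustrative assumptions, not part of the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A workflow task; 'size' is in millions of instructions (MI)."""
    task_id: int
    size: float
    parents: list = field(default_factory=list)   # IDs of tasks this task depends on
    children: list = field(default_factory=list)  # IDs of tasks that depend on this task

def topological_order(tasks):
    """Return one precedence-preserving execution order (Kahn's algorithm)."""
    indegree = {tid: len(t.parents) for tid, t in tasks.items()}
    ready = [tid for tid, d in indegree.items() if d == 0]
    order = []
    while ready:
        tid = ready.pop()
        order.append(tid)
        for child in tasks[tid].children:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("graph contains a cycle; not a valid workflow DAG")
    return order
```

Any order produced this way respects the parent-before-child constraint that the scheduler must preserve.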

Motivation
The use of modern computing technologies is constantly expanding in a variety of fields, including research, education, healthcare, and others. As a result, the number of dynamic applications, also referred to as workflow applications (WAs), is growing steadily. Modern high-performance HCSs service this diverse spectrum of WAs. Different forms of HCSs, such as powerful cloud systems, pervasive computing systems, high-performance grid systems, and fog/edge computing, aid in the execution of heterogeneous WAs [10]. WSAs are used to analyse and run the WAs. In general, a WSA primarily focuses on executing the WA as quickly as feasible.
Other key performance aspects, like energy usage, monetary cost, load balancing, resource utilisation, reliability, etc., suffer as a result of minimising the total completion time alone. WSA design is made more difficult still by task dependency constraints [11].
More crucially, a WSA uses a vast array of resources, including servers, data centres, network interfaces, and very high-speed CPUs, to execute WAs. The resources deployed during an execution use a lot of energy and generate a lot of heat. It should be emphasised that increased energy use directly affects financial aspects through the cooling of resources like data centres and servers. The greater amount of heat from the resources also leads to excessive CO2 emissions, so environmental sustainability is a major concern as well. Additionally, it has been noted that other performance considerations are not adequately taken into account while WAs are executed purely to reduce makespan. All of these problems become more acute as workloads grow and technology advances [26].
Therefore, there is a need to create a WSA for heterogeneous FCE that takes various goals into account simultaneously. We are thus motivated to propose a metaheuristic-based WSA for heterogeneous FCE employing several conflicting goals, namely makespan, energy consumption, and total cost minimisation. To tackle the problem, a variety of techniques, from heuristics to meta-heuristics, have been researched. Heuristic-based strategies produce an acceptable solution quickly but do not properly explore the solution space. Metaheuristic methods, by contrast, have received a great deal of attention for their capacity to locate nearly optimal solutions to NP-complete problems.

Contributions
The purpose of this research is to propose a meta-heuristic algorithm for scheduling and distributing tasks across fog nodes in order to maximise the usage of network resources by users [6]. The following are our major contributions in this paper:

A) We have analysed a heterogeneous fog-cloud-based hierarchical architecture in which we can schedule workflow tasks that are both time-sensitive and computationally demanding. To reduce the time needed for data transmission, real-time operations are performed in the fog layer, and computationally expensive operations are processed in the cloud layer. The task scheduling procedure is carried out away from the end user, in the fog and cloud layers.

B) An efficient meta-heuristic approach, namely the Multi-objective Artificial Algae (MAA) Algorithm, is proposed for workflow scheduling in a Heterogeneous Fog-Cloud Environment (HFCE). It includes a pre-processing stage that segregates the tasks into separate task lists based on the number of offspring. It is challenging for the scheduler to create an ideal plan since workflow tasks are interdependent; the proposed approach focuses on these dependencies in order to achieve better scheduling.

C) The proposed workflow scheduling approach is evaluated on five realistic scientific workflow datasets, namely Montage, CyberShake, Epigenomics, LIGO, and SIPHT, to optimise three performance objectives: total execution time (make-span), energy consumption and total cost of computation and communication.

D) We conducted thorough experiments and comparisons with various other population-based meta-heuristic techniques, such as ACO, PSO, GWO, and one hybrid algorithm, HPSOGWO, to show the efficacy of MAA for workflow scheduling problems.
The remaining part of the paper is structured as follows: Sect. 2 examines the classification of task scheduling strategies with related work, and Sect. 3 explains the system model and framework. In Sect. 4, we present our proposed workflow scheduling method. The simulation results are discussed in Sect. 5, and the conclusions are presented in Sect. 6. Table 1 contains the list of acronyms used in the research.

Related work
An extensive body of literature is available on workflow scheduling in CC settings, while literature on FC settings is very rare. To address this problem, some academics have employed conventional scheduling algorithms, while others have concentrated on optimisation techniques. There are single-objective, bi-objective, and multi-objective solutions. The majority of researchers have focused on makespan, cost, load balancing, etc. Heuristics and meta-heuristics are two possible approaches. The literature [12, 13] uses heuristics like min-min, max-min, etc., or combines these methods with meta-heuristics. The ability to exploit and explore is a characteristic of meta-heuristic algorithms. Exploitation indicates how effective the algorithm is at conducting local searches. Exploration indicates how well the approach can locate initial solutions that may lie close to the global optimum. An effective meta-heuristic algorithm strikes a balance between exploitation and exploration potential. Despite having a great capacity for exploration, particle swarm optimisation, for example, has a limited capacity for exploitation. Reliability and fault tolerance under constraints like deadlines and budget can also be considered. Table 2 describes the literature analysis done during the research. The concept of workflow, which divides a complicated scientific application into manageable activities, is quite well known among scientists [14]. These activities may be carried out via distributed and parallel computing, such as FC. In FC, workflow scheduling is a well-known NP-hard issue. To improve the efficiency of fog computing, a number of list-based algorithms have been proposed for task scheduling, including first come, first served (FCFS), round-robin (RR), shortest job first (SJF), minimal completion time (MCT), etc. The main principle of list-based heuristics is to prioritise each task and allocate the resources at hand in accordance with the priorities specified. The Heterogeneous Earliest Finish Time (HEFT) algorithm was developed to accommodate systems with heterogeneous multiprocessors. The improved version of HEFT suggested by Dubey et al. [15] can shorten the make-span when compared to the existing HEFT and Critical Path on a Processor (CPOP) algorithms.
The Min-Min method maps the job with the shortest possible execution time to the device with the shortest possible completion time [16]. A related technique is the Max-Min algorithm, which assigns the task with the longest possible execution time to the device with the shortest possible completion time. Tasks are not assigned to resources as they arrive when using the offline scheduling methods Min-Min and Max-Min, which operate in batch mode [17]. The problem with the Min-Min and Max-Min algorithms is that they suffer from starvation [18]. In addition, they take only time into account when evaluating resources. List-based heuristics focus exclusively on user viewpoints; resource quality factors are given less attention. The aforementioned standard heuristic methods are straightforward, simple to use, and quick, but meta-heuristic techniques can provide a near-optimal solution for complicated issues like workflow scheduling and can further increase the quality of the solution [19]. Additionally, heuristic algorithms are problem-dependent approaches, whereas meta-heuristic methods are problem-independent strategies. Because they are straightforward and offer powerful searching capabilities at modest cost in time and money, meta-heuristic algorithms are frequently employed. Many meta-heuristic techniques have been suggested for the workflow scheduling problem [20-28]. Ant Colony Optimization (ACO), Particle Swarm Optimisation (PSO), and the Genetic Algorithm (GA) are a few examples of common algorithms.
A genetic-evolution-based approach for solving the job scheduling problem was proposed by Dasgupta et al. [1]. In comparison to Round Robin (RR), First Come First Serve (FCFS), and the local search method Stochastic Hill Climbing (SHC), the test results demonstrate better performance in terms of make-span. It has been claimed, however, that the GA takes a long time to arrive at the best solutions [20]. In order to reduce the make-span, Tawfeek et al. [21] applied the Ant Colony Optimization (ACO) task scheduling method and discovered that ACO outperformed FCFS and RR. Although ACO is a fairly complicated algorithm, it takes some time to achieve the best results, and task dependencies are not taken into consideration. In [22], the authors proposed a hybrid of heuristic and metaheuristic techniques for the scheduling of tasks. A bi-objective optimisation approach is presented to minimise the make-span and cost factors, and the results of exhaustive simulations established the significance of the presented Bi-objective HEFT FireWorks Algorithm (BH-FWA). One of the well-known meta-heuristic methods is particle swarm optimisation (PSO). It converges quickly and is easy to implement. Despite its benefits, it is unable to escape local optima for complicated problems [23]. Similarly, Wu et al. [24] proposed the Revised Discrete Particle Swarm Optimisation (RDPSO) algorithm to schedule workflow applications over various resources. The studies were carried out using a variety of workflow applications with various data transfer and processing costs. The results demonstrated that the proposed RDPSO algorithm outperforms the conventional PSO and Best Resource Selection (BRS) algorithms in terms of cost reduction and make-span. The suggested approach, however, is inefficient for vast search spaces. PSO was also employed by Pandey et al. [25] for scheduling WAs in a CC environment. Although PSO is a quick optimisation technique, it has drawbacks such as early convergence and entrapment in locally optimal solutions [26]. More recently, a meta-heuristic method called Grey Wolf Optimization (GWO), which imitates the leadership structure of grey wolves, was suggested [16]. According to Mirjalili et al. [27], GWO has a good blend of exploitative and exploratory abilities. An enhanced version of GWO was suggested by Khalil and Babamir [28] as a solution to the workflow scheduling problem.
For complicated issues like scientific workflow scheduling, a single meta-heuristic may not yield the best solution and may instead become trapped in a local optimum. Choosing two or more meta-heuristic algorithms and combining them according to their strongest traits is a superior strategy, and hybrid algorithms have gained popularity over the past couple of decades. Only existing algorithms that are hybrids of PSO or GWO are discussed here. The GA-PSO algorithm, which Manasrah and Ali [29] devised, is a combination of the Genetic Algorithm and Particle Swarm Optimisation. Compared to GA, PSO, and other algorithms, the hybrid GA-PSO method decreases the overall execution time. Another hybrid method, a combination of PSO and the gravitational search algorithm (GSA), has been published in [30]. In terms of cost, this hybrid method outperforms several heuristics as well as the PSO and GSA algorithms. Bouzary and Frank [31] suggested a combination of Grey Wolf Optimization (GWO) and the Genetic Algorithm (GA), and they discovered that the proposed algorithm outperformed GWO and GA in terms of cost. Compared to flower pollination with genetic algorithms, Khurana and Singh's [32] hybrid flower pollination algorithm with GWO gives more effective outcomes while taking less time and money. Although the aforementioned hybrid algorithms have advantages, one can wonder why the technique proposed here was chosen. The no free lunch (NFL) theorem [33] holds the key to the answer: no single method is effective for handling all optimisation problems. A method might perform better for a specific optimisation problem yet perform poorly on others. Optimisation problems do not have a single, universal solution.

Formulation of workflow scheduling problem and proposed system model
In this section, we first formulate the system model and then discuss the objective function used during the designing of the proposed algorithm. The standard Artificial Algae Algorithm (AAA) employed in the proposed approach is also described in this section.

System model
DAGs are used to represent workflows. Dependencies among tasks in a workflow are defined as G = (V, E), where V denotes the vertices, which stand for the tasks in the workflow, and E denotes the edges, which stand for dependencies between tasks. Prior to beginning the execution of any child task, the parent task must be completed [34]. Workflow tasks include several attributes, including execution time, data to be sent or received, and parent-child task relationships. Tasks in the workflow may be computation-intensive or data-intensive, and sometimes both. Several heterogeneous Fog Devices (FDs), Cloud Datacenters (CDs) and End Devices (EDs) make up the hierarchical FCE model. Numerous physical machines make up each FD/CD. The physical machines comprise resources for computation and storage, and each resource has capabilities for processing, storage, memory, and bandwidth. Resources are represented as Virtual Machines (VMs) in the FCE. The bandwidth, computing power, and cost of storage per unit of time for VMs are all fixed. Any of the accessible resources can execute the workflow tasks scheduled on them. The quantity and strength of each Processing Element (PE) are used to calculate the processing capability of a VM. Figure 1 demonstrates the heterogeneous fog-cloud environment system model employed in this research, taken from our previous study [50].

Objective function formulation
The objective function describes the desired outcomes to be optimised with the proposed scheduling method [35]. An objective function can be made multi-objective in two ways: a priori and a posteriori [36]. In the a priori method, each associated aim is given a weight based on its importance, resulting in a single-valued function, also known as the fitness value. In contrast, the a posteriori technique uncovers the collection of non-dominated options. To design the fitness function, we use the a priori technique. Table 3 defines the list of mathematical notations used in the research. Make-span (MS_W), Total Cost (TC_W) and Energy Consumption (EC_W) are the components of the fitness function. The fitness function can be described mathematically using Eq. (1):

F(M) = \alpha_1 \cdot MS_W + \alpha_2 \cdot EC_W + \alpha_3 \cdot TC_W \qquad (1)

where M denotes the mapping of the workflow's n tasks to the m available VMs in the EDs, FDs, and CDs; MS_W stands for the total execution time of the workflow; EC_W stands for the energy consumption of the workflow; and TC_W stands for the total cost of the workflow, which consists of the cost of computation and communication.

The weights allocated to each aim are α1, α2 and α3; we set α1 = α2 = α3 = 0.33 to weight the objectives equally. The following subsections provide a comprehensive description of the total execution time (make-span), total energy consumption and total cost.
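As a minimal sketch of how Eq. (1) might be evaluated in code, the snippet below combines the three objectives with equal weights. The `ref` normalisation scales are our own assumption, introduced because the three objectives carry different units and magnitudes; the paper itself does not specify a normalisation scheme.

```python
ALPHA = (0.33, 0.33, 0.33)  # equal weights for make-span, energy and cost

def fitness(makespan, energy, cost, ref=(1.0, 1.0, 1.0), weights=ALPHA):
    """Weighted-sum fitness of Eq. (1); smaller is better. The 'ref'
    scales bring the differently-scaled objectives onto comparable
    magnitudes before summing (an assumption of this sketch)."""
    a1, a2, a3 = weights
    return a1 * makespan / ref[0] + a2 * energy / ref[1] + a3 * cost / ref[2]
```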

Total execution time (make-span)
The total execution time (make-span) is the time it takes for all tasks in a workflow to complete. To put it another way, make-span is the amount of time it takes to complete all of the tasks assigned to the various virtual machines [27]. The workflow's make-span can be calculated mathematically using Eq. (2):

MS_W = \max_{1 \le i \le n} CT_i \qquad (2)

where CT_i is the completion time of task T_i in the workflow. The completion time is the entire time spent completing a task; when tasks are interdependent, the time spent waiting for previous tasks is taken into account. Equation (3) represents the completion time CT_i of task T_i executed on virtual machine VM_j:

CT_i = WT_i + ET_{i,j} \qquad (3)

As indicated in Eq. (4), the waiting time of task T_i is determined by the completion times of all its predecessor tasks:

WT_i = \max_{T_p \in \mathrm{pred}(T_i)} CT_p \qquad (4)

Equation (5) is used to compute the execution time of task T_i on virtual machine VM_j, where SZ_{Task} is the task's size in millions of instructions (MI), Num(PE_j) is the number of cores assigned to the virtual machine VM_j, and PE_{Unit} is the capacity of each core in MIPS:

ET_{i,j} = \frac{SZ_{Task}}{Num(PE_j) \times PE_{Unit}} \qquad (5)
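The short sketch below, reusing the `Task` structure introduced earlier, shows how Eqs. (2)-(5) could be computed for a candidate schedule. It is a simplification: like Eq. (4), it accounts for precedence waits but ignores queuing delays when several tasks share a VM, as well as data-transfer times.

```python
def execution_time(task_size_mi, num_pe, pe_unit_mips):
    """Eq. (5): ET = SZ_Task / (Num(PE_j) x PE_Unit)."""
    return task_size_mi / (num_pe * pe_unit_mips)

def makespan(order, tasks, schedule, vms):
    """Eqs. (2)-(4) over a precedence-preserving task 'order'.
    'schedule' maps task_id -> vm_id; 'vms' maps vm_id -> (num_pe, pe_unit)."""
    ct = {}
    for tid in order:
        num_pe, pe_unit = vms[schedule[tid]]
        et = execution_time(tasks[tid].size, num_pe, pe_unit)
        wt = max((ct[p] for p in tasks[tid].parents), default=0.0)  # Eq. (4)
        ct[tid] = wt + et                                           # Eq. (3)
    return max(ct.values())                                         # Eq. (2)
```

For instance, a 10,000 MI task on a VM with two 1,000-MIPS cores takes 10,000 / (2 × 1,000) = 5 time units under Eq. (5).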

Total cost
Because FC is based on a pay-as-you-go billing structure [37], cost is an important objective to minimise. The majority of fog service providers charge for a fixed period of time based on the fog services used. Execution, connection, and storage costs are all included in the cost of FC. The total execution cost of a VM is the product of the VM's cost per unit interval and the time it takes to complete tasks on that VM. The total execution cost (TC_W) of workflow W is calculated using Eq. (6) [38]:

TC_W = \sum_{j=1}^{m} CO_j \times \tau_j, \qquad \tau_j = \sum_{T_i \,\mapsto\, VM_j} ET_{i,j} \qquad (6)

where CO_j is the cost of a type-j VM instance in the CD/FD per unit of time, τ_j is the amount of time for which the user uses resource VM_j, and ET_{i,j} is the time it takes the type-j VM instance to complete task T_i.
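A sketch of Eq. (6) in code, under the same simplifying assumption that a VM's billed time is the sum of the execution times of the tasks mapped to it:

```python
def total_cost(schedule, exec_time, unit_cost):
    """Eq. (6): sum over VMs of CO_j times the busy time tau_j on that VM.
    'schedule' maps task_id -> vm_id, 'exec_time' maps task_id -> ET_ij on
    its assigned VM, and 'unit_cost' maps vm_id -> CO_j."""
    busy = {}
    for tid, vm in schedule.items():
        busy[vm] = busy.get(vm, 0.0) + exec_time[tid]
    return sum(unit_cost[vm] * t for vm, t in busy.items())
```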

Energy consumption
The energy consumption model is taken from [49]; it contains an active energy component, denoted E_active, and an idle energy component, denoted E_idle. E_active relates to the energy used while performing a task, whereas E_idle refers to the energy consumed by idle resources. The active energy can be determined using Eq. (7):

E_{active} = \sum_{i=1}^{n} \alpha \cdot f_i \cdot v_i^2 \cdot ET_i \qquad (7)

where α is a constant, f_i represents the frequency, and v_i represents the supply voltage of the resource on which task i is being performed. When idle, the resource enters a sleep state with a low supply voltage and a lower relative frequency. As a result, [49] is used to calculate the energy consumed over this period, as in Eq. (8):

E_{idle} = \sum_{j=1}^{m} \sum_{IDLE_{jk}} \alpha \cdot f_{min_j} \cdot v_{min_j}^2 \cdot L_{jk} \qquad (8)

where IDLE_{jk} is the set of all idle slots of resource j, f_{min_j} and v_{min_j} represent the lowest frequency and supply voltage of resource j, respectively, and L_{jk} is the amount of idle time in IDLE_{jk}. During the execution of the tasks in the workflow, the overall energy consumed by the FCE is given by Eq. (9):

EC_W = E_{active} + E_{idle} \qquad (9)
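The following is a literal transcription of Eqs. (7)-(9) into code; the constant α = 1.0 is a placeholder assumption, as the paper does not specify its value.

```python
ALPHA_E = 1.0  # DVFS model constant 'alpha' of Eqs. (7)-(8); value assumed

def active_energy(assignments):
    """Eq. (7): sum of alpha * f_i * v_i^2 * ET_i over executed tasks.
    'assignments' is an iterable of (frequency, voltage, exec_time)."""
    return sum(ALPHA_E * f * v * v * et for f, v, et in assignments)

def idle_energy(idle_slots):
    """Eq. (8): idle resources drop to (f_min, v_min) for each idle slot
    of length L_jk. 'idle_slots' is an iterable of (f_min, v_min, L_jk)."""
    return sum(ALPHA_E * f * v * v * l for f, v, l in idle_slots)

def total_energy(assignments, idle_slots):
    """Eq. (9): EC_W = E_active + E_idle."""
    return active_energy(assignments) + idle_energy(idle_slots)
```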

Modelling the solution vector
In this research, tasks can be scheduled to run on resources like EDs, FDs, or CDs, as previously described. All computing resources have their own processing capability and communication bandwidth with respect to the sensor nodes. EDs can only offload their tasks to FDs and CDs; they cannot offload tasks to other EDs. As a result, only one representative ED is included in the encoding process for each sensor node when tasks are scheduled. Each artificial algae cell of an algal colony is represented using natural numbers because task scheduling in FCE is a discrete problem. Just as a solution is represented as an individual chromosome in GA or a particle in PSO, we use artificial algae cells in MAA; here, task-resource schedules are taken as artificial algae cells. Each vector has a length n equal to the total number of tasks in the workflow. Each index in the vector corresponds to a task number, and the value stored in that slot is the ID of the VM used to complete the task. The VM ID is chosen from all VMs accessible in the three-tier architecture of the FCE. Assume a workflow includes ten tasks scheduled on five VMs: one ED, two FDs, and two CDs. The length of the individual, in this case, is ten, and each element is an integer between one and five. This individual's task assignment may look like this: [1, 1, 2, 2, 3, 4, 4, 4, 5, 5]. Tables 4 and 5 show a complete depiction of the solution vector and schedule.

Table 4 Solution vector example:

Task ID: T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
VM ID:    4  3  2  1  5  4  2  1  5   1
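A minimal sketch of this encoding, assuming the `random_solution` and `decode` helpers below as illustrative names:

```python
import random

def random_solution(num_tasks, num_vms, rng=None):
    """One algal colony: index i holds the VM ID (1..num_vms) for task i."""
    rng = rng or random.Random(0)
    return [rng.randint(1, num_vms) for _ in range(num_tasks)]

def decode(solution, task_order):
    """Map the i-th task of the prioritised task list to its assigned VM ID."""
    return {tid: vm for tid, vm in zip(task_order, solution)}

# The Table 4 example: T1 -> VM4, T2 -> VM3, ..., T10 -> VM1.
table4 = [4, 3, 2, 1, 5, 4, 2, 1, 5, 1]
```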

Artificial algae algorithm
Using idealised versions of the attributes of algae, artificial algae are matched to each solution in the problem space. Artificial algae are similar to actual algae in that they can migrate toward a light source to photosynthesise by helical swimming, adapt to their environment, alter the dominant species, and reproduce through mitotic division. The algorithm thus consists of three fundamental components, referred to as "Evolutionary Process", "Adaptation", and "Helical Movement". Algae are the primary genera in the algorithm, and the entire population is made up of algal colonies. A collection of living algae cells is referred to as an algal colony, as represented in Eqs. (10) and (11) [39]:

\text{Population} = \{x_1, x_2, \ldots, x_N\} \qquad (10)

x_i = [x_i^1, x_i^2, \ldots, x_i^D], \quad i = 1, 2, \ldots, N \qquad (11)

where x_i^j is the algal cell in the jth dimension of the ith algal colony, N is the number of colonies and D is the problem dimension. One algal cell splits into two new algal cells, which live next to one another. When these two divide again, another four cells live next to one another, and so on. An algal colony functions like a single cell that moves as a unit, and its cells are susceptible to death in unfavourable environmental conditions. The colony may be divided into smaller parts by an external force such as a shear force or by unfavourable conditions, and as life continues, each divided portion develops into a new colony. The colony that exists at the optimum point is known as the colony of optimums and is made up of the best-performing algae cells.
• Evolutionary process: When given adequate nutrients and light, an algal colony may expand and replicate, producing two new algal cells at time t, analogous to an actual mitotic division. On the other hand, an algal colony that does not get enough light persists for a time before dying. The Monod model, provided in [39], is used to calculate the growth kinetics of the algal colony:

\mu_i^t = \frac{\mu_{max} \, S}{K + S}

Here, μ is the specific growth rate, μ_max is the maximum specific growth rate, S is the nutrient concentration, which is the fitness value f^t(x_i) at time t in the model, and K is the algal colony's substrate half-saturation constant. μ_max is taken to be 1, as the conservation-of-mass principle states that the maximum amount that can be converted to biomass should equal the maximum amount of substrate that can be consumed in a given amount of time. K is calculated as the algal colony's growth rate at time t under conditions of half nutrients.

Following the Monod equation, the size of the ith algal colony at time t+1 is determined by Eq. (12) [39]:

G_i^{t+1} = \mu_i^t \, G_i^t, \quad i = 1, 2, \ldots, N \qquad (12)

where N is the total number of algal colonies in the system and G_i^t is the size of the ith algal colony at time t.
Algal colonies that offer good solutions (the most effective and economical) grow more as the amount of nutrients they receive increases. In the course of evolution, an algal cell from the largest colony is duplicated for every algal cell that dies in the smallest colony, as shown in Eqs. (13), (14) and (15) [39]:

biggest^t = \max_i G_i^t, \quad i = 1, 2, \ldots, N \qquad (13)

smallest^t = \min_i G_i^t, \quad i = 1, 2, \ldots, N \qquad (14)

smallest_m^t = biggest_m^t, \quad m \in \{1, 2, \ldots, D\} \qquad (15)

where biggest^t represents the largest algal colony, smallest^t represents the smallest, and D represents the problem dimension. Algal colonies are arranged in AAA according to their sizes at time t. Algal cells of the smallest colony perish, whereas those of the largest colony replicate themselves in a randomly chosen dimension m.
• Adaptation: When an algal colony struggles to expand adequately in a given environment, it tries to adapt, which changes the dominant species. An inadequately developed algal colony attempts to imitate the largest algal colony in its surroundings through the process of adaptation; a comparison against the adaptation parameter ends this process. For each artificial alga, the starvation value is initially set to zero. As an algal cell receives inadequate light, its starvation value rises over time t. The artificial alga with the highest starvation value according to Eq. (16) is adapted according to Eq. (17) [39]:

starving^t = \max_i A_i^t, \quad i = 1, 2, \ldots, N \qquad (16)

starving^{t+1} = starving^t + (biggest^t - starving^t) \times rand \qquad (17)

where A_i^t is the ith algal colony's starvation value at time t and starving^t is the algal colony with the highest starvation value at time t. Whether the adaptation process is applied at time t is determined by the adaptation parameter (A_p), which remains constant in [0, 1].
• Helical movement: Algal colonies and cells often swim and strive to remain near the water's surface, because there they can get enough light to survive, thanks to the helical swimming motion of their flagella. Their flagella allow them to move ahead but are constrained by gravity and viscous drag as they swim helically in the fluid. Different algae cells move in different ways. Growing algal cells have a bigger friction surface, which enhances their capacity to conduct local searches and increases the frequency of helical motions. The amount of energy an algal cell has determines how much movement it can make, and the quantity of nutrition taken in by an algal cell at time t is directly related to its energy level at that moment. As a result, an algal cell closer to the surface has more energy, giving it a better opportunity to travel around the liquid. Smaller cells, on the other hand, have a reduced friction surface, so their moving distance in the liquid is greater; consequently, they have a wider range of search possibilities, although they are less mobile relative to their energy. An algal cell moves in a helical pattern, much like in real life. In AAA, viscous drag is represented as a shear force proportional to the size of the algal cell, and the gravitational force that restricts mobility is taken as 0. The algal cell has a spherical form, and its volume in the model determines its size. As a result, the friction surface is taken to be the surface area of a hemisphere, as represented in Eqs. (18) and (19) [39]:

r_i^t = \sqrt[3]{\frac{3 \, G_i^t}{4\pi}} \qquad (18)

\tau^t(x_i) = 2\pi \, (r_i^t)^2 \qquad (19)
where τ(x_i) is the friction surface. The three dimensions of the algal cell's helical movement are chosen at random. One of them allows for linear movement, as in Eq. (20), and the other two dimensions allow for angular movement, as in Eqs. (21) and (22) [39]:

x_{ik}^{t+1} = x_{ik}^t + (x_{jk}^t - x_{ik}^t)\,(\Delta - \tau^t(x_i))\, p \qquad (20)

x_{il}^{t+1} = x_{il}^t + (x_{jl}^t - x_{il}^t)\,(\Delta - \tau^t(x_i)) \cos\alpha \qquad (21)

x_{im}^{t+1} = x_{im}^t + (x_{jm}^t - x_{im}^t)\,(\Delta - \tau^t(x_i)) \sin\beta \qquad (22)

where x_{ik}^t, x_{il}^t, and x_{im}^t represent the ith algal cell's x, y, and z coordinates at time t; x_j is a randomly selected neighbouring colony; α, β ∈ [0, 2π]; p ∈ [−1, 1]; Δ is the shear force; and τ^t(x_i) represents the ith algal cell's friction surface area.
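The sketch below shows one helical-movement step following Eqs. (20)-(22). It treats positions as continuous values; for the discrete scheduling problem, the result would be rounded back to valid VM IDs, an adaptation we assume rather than one specified by the paper.

```python
import math
import random

def helical_move(colony, neighbour, shear, friction, rng):
    """One helical-movement step (Eqs. (20)-(22)): pick three random
    dimensions and move toward a neighbour colony, linearly in one
    dimension and angularly in the other two. Requires >= 3 dimensions."""
    new = list(colony)
    k, l, m = rng.sample(range(len(colony)), 3)
    p = rng.uniform(-1.0, 1.0)
    alpha = rng.uniform(0.0, 2.0 * math.pi)
    beta = rng.uniform(0.0, 2.0 * math.pi)
    gap = shear - friction  # the (Delta - tau(x_i)) term of Eqs. (20)-(22)
    new[k] += (neighbour[k] - colony[k]) * gap * p                # Eq. (20)
    new[l] += (neighbour[l] - colony[l]) * gap * math.cos(alpha)  # Eq. (21)
    new[m] += (neighbour[m] - colony[m]) * gap * math.sin(beta)   # Eq. (22)
    return new
```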

Proposed methodology
This section starts by explaining how tasks are scheduled onto the available resources and how this schedule is used to build an optimal solution vector. It then introduces an efficient meta-heuristic approach for scheduling workflows. Most WSAs are primarily composed of two basic components. The first is the prioritisation of the task execution sequence while not violating the precedence constraints. Second, an effective task-to-processor mapping is carried out to reduce the overall completion time as much as possible. Here, prioritisation of tasks is accomplished by pre-processing the provided scientific workflows, while an effective task-to-processor mapping is produced by multi-objective optimisation using the proposed MAA algorithm. Flowcharts of the proposed methodology are shown in Fig. 2 (flow of the pre-processing phase of the proposed algorithm) and Fig. 3 (initialising the population).

Pre-processing the workflow
The proposed approach employs a pre-processing stage to prepare task lists and resource lists before the MAA algorithm is applied. The method organises tasks based on the number of offspring; hence, tasks having a large number of descendants are handled first. Such tasks act as a bottleneck for fog-based resources, resulting in lengthy execution durations [40]. The method also organises fog resources based on their processing power, categorising them as high-processing-power and low-processing-power resources. Two resource lists are generated for executing the workflow tasks.
Parent tasks requiring a significant amount of execution time are executed on nodes with a high processing speed in order to resolve dependencies swiftly. After executing parent tasks, child tasks are executed based on their location in the graph, i.e. leaf tasks are executed on nodes with a low processing speed, while parent and intermediate tasks are executed on nodes with a high processing speed. Lines 4-5 of Algorithm 1 separate the root tasks of workflow W; these root tasks are kept in a parent task list L1. The workflow is then checked for leaf tasks in lines 6-7, and these leaf tasks are transferred to the child task list L2. Before including an intermediate/dependent task in the list, line 8 first verifies the status of its parent tasks: if the parent tasks are already included in the list, the intermediate task is transferred to the parent task list; otherwise, it waits until all its parent tasks have been included. In line 16, the two separate task lists L1 and L2 are handed over for processing by the MAA workflow scheduling algorithm.
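A sketch of this pre-processing phase is shown below, reusing the `Task` structure from earlier. The direct-children count stands in for the descendant count, a simplifying assumption of this sketch rather than the paper's exact rule.

```python
def preprocess(tasks):
    """Sketch of Algorithm 1: root and intermediate tasks go to the parent
    list L1, leaf tasks to the child list L2. An intermediate task enters
    L1 only after all of its parents have (precedence preserved)."""
    l1 = [tid for tid, t in tasks.items() if not t.parents]   # root tasks
    l2 = [tid for tid, t in tasks.items() if not t.children]  # leaf tasks
    pending = [tid for tid, t in tasks.items() if t.parents and t.children]
    # handle bottleneck tasks (many offspring) first
    pending.sort(key=lambda tid: len(tasks[tid].children), reverse=True)
    while pending:
        for tid in list(pending):
            if all(p in l1 for p in tasks[tid].parents):
                l1.append(tid)
                pending.remove(tid)
    return l1, l2
```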

Initialising the population of algal colonies
The proposed MAA algorithm starts with a fixed maximum number of iterations, MaxIter, which is set to 100 here. The population of algal colonies refers to the collection of solution vectors. Before the first iteration, the algal colonies are initialised with solution vectors whose positions follow the serial order of task list L1. At the end of each iteration, a new algal colony is added to the population using task list L2. Each iteration of the algorithm improves the solution vectors. Figure 3 shows k algal colonies that have been randomly initialised using task list L1.
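A one-line sketch of this initialisation, reusing `random_solution` from the solution-vector sketch above:

```python
def init_population(k, task_list, num_vms, rng):
    """Initialise k algal colonies over the prioritised task list L1 (Fig. 3)."""
    return [random_solution(len(task_list), num_vms, rng) for _ in range(k)]
```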

Applying the multi-objective artificial algae (MAA) algorithm
The proposed approach is based on the AAA [39] optimisation technique to decrease workflow execution time (make-span), energy consumption and cost while distributing the workload evenly across all computing layers. AAA's major benefit is a quicker algorithmic convergence rate than other meta-heuristics. The technique has not previously been utilised in the FCE literature for workflow tasks such as scheduling and resource allocation. It is assumed that the scheduler is aware of how the different workflow tasks depend on one another, and the execution times of workflow tasks are known in advance. The proposed algorithm's goal is to allocate computing resources (VMs) to workflow tasks while minimising make-span, energy consumption and total cost. The proposed MAA WSA aims to assign resource R_i to workflow task T_j in a way that makes effective use of the computing resources (VMs). While allocating resources to the tasks, the scheduler must optimise all objective factors taken into account. The MAA WSA assigns computing resources to tasks from both lists. The algorithm begins by generating N algal colonies from task list L1. Each algal colony, representing a solution vector, is assessed by its fitness value, determined by Eq. (1) from the execution time, energy consumption and total cost. During each iteration, all the variables are updated, and the procedure is repeated until the halting criterion is met.
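To tie the pieces together, the following is a high-level sketch of such a loop, reusing `init_population` and `helical_move` from the sketches above. The shear/friction constants and the greedy acceptance rule are simplifying assumptions, and the evolution and adaptation operators of AAA are omitted for brevity; this is an illustration, not the paper's exact procedure.

```python
import random

def maa_schedule(task_list, num_vms, evaluate, max_iter=100, pop_size=25,
                 shear=2.0, friction=0.5, seed=1):
    """'evaluate' maps a solution vector to its Eq. (1) fitness (lower is
    better). Returns the best colony found and its fitness."""
    rng = random.Random(seed)
    colonies = init_population(pop_size, task_list, num_vms, rng)
    fits = [evaluate(c) for c in colonies]
    for _ in range(max_iter):
        for i in range(pop_size):
            j = rng.randrange(pop_size)  # random neighbour colony
            cand = helical_move(colonies[i], colonies[j], shear, friction, rng)
            cand = [min(num_vms, max(1, round(v))) for v in cand]  # valid VM IDs
            f = evaluate(cand)
            if f < fits[i]:              # keep the move only if it improves
                colonies[i], fits[i] = cand, f
    best = min(range(pop_size), key=fits.__getitem__)
    return colonies[best], fits[best]
```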
It is quite difficult to design an effective mapping of tasks and resources [43]. We employ a search space with n dimensions for n tasks, with a set of discrete potential values ranging from 1 to m, where m is the number of VMs. We employ notation from earlier research [45] to denote the allocation of VMs to tasks, i.e. x_{ij}^t indicates that VM i is allocated to the jth algae cell of an algal colony at time t. The number of tasks in a workflow represents the dimension of an algal colony. The proposed algorithm stores viable solutions, i.e. colonies that are not dominated, in a repository. The repository is initially empty and is updated whenever the algorithm discovers a new solution; it contains only non-dominated solutions. If the current solution is surpassed by any other solution during the process, the existing solution in the repository is replaced with the new solution. The judgement is based on the fitness criteria employed. In the final stage of the algorithm, the repository contains only viable solutions, which are non-dominated in nature.
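A minimal sketch of such a non-dominated archive, assuming each objective (make-span, energy, cost) is to be minimised:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_repository(repo, schedule, objectives):
    """Keep the archive mutually non-dominated: discard a dominated
    candidate, otherwise insert it and evict entries it dominates."""
    if any(dominates(o, objectives) for _, o in repo):
        return repo
    repo = [(s, o) for s, o in repo if not dominates(objectives, o)]
    repo.append((schedule, objectives))
    return repo
```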

Fitness function evaluation
The method starts with the computation of execution times, assigning these values to the make-span matrix, as depicted by Eq. (23). Each element represents an execution time; for instance, ET_{1,1} represents the execution time of task T_1 on VM_1. The values of the matrix are computed using Eq. (5):

ET_W = \begin{bmatrix} ET_{1,1} & ET_{1,2} & \cdots & ET_{1,m} \\ \vdots & \vdots & \ddots & \vdots \\ ET_{n,1} & ET_{n,2} & \cdots & ET_{n,m} \end{bmatrix} \qquad (23)

As stated in Eq. (24), a task dependency matrix (TD_W) is used to depict the interdependence of tasks in a workflow. Each matrix entry is either 1 or 0; if d_{1,2} equals 1, then task T_2 is performed after task T_1:

TD_W = \begin{bmatrix} d_{1,1} & \cdots & d_{1,n} \\ \vdots & \ddots & \vdots \\ d_{n,1} & \cdots & d_{n,n} \end{bmatrix} \qquad (24)

As indicated in Eq. (25), the cost matrix records the execution cost per unit time for each VM, where C_1, C_2, …, C_m represent the unit execution costs for VMs VM_1, VM_2, …, VM_m, respectively:

C_W = [C_1, C_2, \ldots, C_m] \qquad (25)

As depicted in Eq. (26), the energy consumption matrix records the energy consumption per unit time for each VM, where EC_1, EC_2, …, EC_m represent the unit energy consumption for VMs VM_1, VM_2, …, VM_m, respectively:

EC_W = [EC_1, EC_2, \ldots, EC_m] \qquad (26)
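A short sketch of how the matrices of Eqs. (23) and (24) could be assembled, reusing `execution_time` and the `Task` structure from the earlier sketches:

```python
def build_matrices(tasks, order, vms):
    """Build the n x m execution-time matrix ET (Eq. (23), via Eq. (5)) and
    the n x n dependency matrix TD (Eq. (24)), where TD[i][j] = 1 when task
    j must run after task i. 'vms' is a list of (num_pe, pe_unit) pairs."""
    et = [[execution_time(tasks[tid].size, num_pe, pe_unit)
           for (num_pe, pe_unit) in vms] for tid in order]
    index = {tid: i for i, tid in enumerate(order)}
    td = [[0] * len(order) for _ in order]
    for tid in order:
        for child in tasks[tid].children:
            td[index[tid]][index[child]] = 1
    return et, td
```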

Time complexity analysis
The time complexity of our proposed strategy is O(T + T·M·N·log E). It depends on the following two factors:

The time complexity of the workflow pre-processing strategy
Task classification is performed by a for loop containing nested if-else statements, so the average time complexity of task classification is O(T), where T is the total number of tasks in the given workflow.

The time complexity of MAA workflow scheduling algorithm
The task scheduling strategy is based on the MAA algorithm, whose complexity is usually measured in terms of average convergence and is influenced by the population size and the number of iterations. However, the randomness and population-based nature of such algorithms lead to complex and variable stochastic processes, which adds difficulty to their time complexity analysis; we can therefore only approximate it. The time complexity of the MAA algorithm is approximately O(T·M·N·log E), where T is the total number of tasks, M is the maximum number of iterations, N is the population size, and E is the total energy of the algal colony.

Datasets used
This section describes the experimental setup, followed by the findings and discussion of the experiment. The suggested technique was experimentally evaluated using scientific workflows [42] from various fields of study. Workflows comprise varying numbers of tasks, degrees of task dependency, and data transmission between tasks. Some of the most practical scientific workflow datasets, including Montage, CyberShake, Epigenomics, LIGO, and SIPHT, were published by the Pegasus project [51]. Figure 5 depicts the structures of these five realistic scientific workflows [42]. Table 6 provides information about the scientific workflow datasets utilised during simulation. The algorithms were assessed based on their make-span, cost, and energy consumption. Make-span refers to the sum of all task execution times inside a workflow. Cost refers to the cost associated with the execution and transport of data for WA processes. Energy consumption is the metric that indicates whether the system's power consumption is optimal; it is assessed as the summation of the energy consumed during idle and active duty-cycles over all levels of computing resources. Minimum values are desirable for all performance parameters.

Simulation environment settings
Researchers have utilised simulated data to evaluate their proposed algorithms because of the novelty of fog computing and its very few real-world deployments, and we have done the same in this article [44]. All simulations are executed on a machine with a Windows 10 Pro 64-bit operating system, an Intel(R) Xeon(R) processor running at 3.70 GHz, and 16 GB of RAM. We used the Java IDE Eclipse to run the FogWorkflowSim-1.1 toolkit. FogWorkflowSim [41] is an extension of iFogSim that simulates user-defined task workflows in order to evaluate resource management strategies in FCE. We employed the "Simple" offloading strategy of the FogWorkflowSim toolkit for taking offloading decisions during task scheduling. The proposed approach is compared to the Particle Swarm Optimization (PSO) [46], Ant Colony Optimization (ACO) [21], Grey Wolf Optimization (GWO) [47], and HPSOGWO [48] techniques. For each algorithm, the number of iterations and the size of the population are taken as 100 and 25, respectively. Table 7 displays the simulation environment parameter settings for the three layers of the HFCE, and Table 8 displays the parameter settings used specifically for evaluating each method. We have modelled each technique with weighted-sum objectives based on TIME, ENERGY, and COST; all three weighted coefficients w1, w2, and w3 are set to the same value of 0.33. The algorithms are evaluated using different realistic scientific workflows; the five under consideration are Montage, CyberShake, Epigenomics, LIGO (Inspiral), and SIPHT. The Pegasus-generated scientific workflow structures are represented via a DAG XML file for each workflow [51]. These workflows are available with a range of task counts; for example, Montage is available with 20, 40, 60, 80, 100, 200, 300 and 1000 tasks. Simulations are conducted ten times for each combination of workflow type and task count in order to examine the algorithms' average performance.
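For reference, the evaluation settings described above could be consolidated as follows; the dictionary keys are illustrative names of our own, not FogWorkflowSim API identifiers.

```python
# Assumed consolidation of the settings summarised in Tables 7 and 8.
SIM_CONFIG = {
    "max_iterations": 100,
    "population_size": 25,
    "objective_weights": {"time": 0.33, "energy": 0.33, "cost": 0.33},
    "workflows": ["Montage", "CyberShake", "Epigenomics", "LIGO", "SIPHT"],
    "offloading_strategy": "Simple",  # FogWorkflowSim offloading option used
    "runs_per_scenario": 10,
}
```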

Results and discussion
The results are discussed based on the three performance parameters evaluated during the experiment. In this section, we demonstrate the performance comparison of our suggested MAA algorithm with existing WSAs, namely PSO [46], ACO [21], GWO [47], and HPSOGWO [48]. The performance is evaluated for five well-known workflows, Montage, CyberShake, Epigenomics, LIGO (Inspiral), and SIPHT, with task counts ranging from 20 to 1000 (997 in the case of the Epigenomics workflow) in terms of Make-span (MS), Energy Consumption (EC), and Total Cost (TC). The number of iterations was limited to 100. Each scenario is run 10 times, and the average value of the results is taken into account. The simulation results are compiled in Tables 9, 10 and 11. Figures 6 through 20 demonstrate the three performance metrics for the five workflows under the weighted-sum-based objective function.

Performance evaluation using the Montage workflow dataset
The findings for make-span, cost, and energy usage for the Montage workflow are shown in Figs. 6, 7 and 8. As expected, all the measures increase with the number of tasks. The results demonstrate that ACO performs somewhat worse than the other four techniques. On the other hand, among the three remaining compared techniques, PSO performs somewhat better. This is most likely due to PSO's well-known ability to carry out global and local searches simultaneously. The MAA algorithm, which combines evolution and exploitation, clearly benefits from its capacity to explore a wide variety of solutions thanks to the random reproduction and adaptation operators.

Performance evaluation using Cybershake workflow dataset
The findings for make-span, cost, and energy usage for the CyberShake workflow are shown in Figs. 9, 10 and 11. The results demonstrate that MAA performs better than the other four techniques, while the remaining techniques perform at a comparable level. However, there is an exception in the case of energy, where PSO performs worse than all the other alternatives. This is most likely due to PSO's prevalent issue of early convergence and becoming stuck in local minima. The statistics for ACO, GWO, HPSOGWO and MAA remain fairly low, but those for PSO are high and rise rapidly as the number of tasks increases: for example, while the make-span increases from less than 60 seconds for 30 tasks to more than 130 seconds for 1000 tasks, energy rises drastically from less than 8 KJ for 30 tasks to more than 57 KJ for 1000 tasks, and cost climbs correspondingly.

Performance evaluation using Epigenomics workflow dataset
Figures 12, 13 and 14 demonstrate the findings for the Epigenomics workflow in terms of make-span, cost, and energy usage. The statistics for 24 and 47 tasks are fairly small, but they rise dramatically as the number of tasks increases. For example, make-span increases from less than 20 seconds for 24 and 47 tasks to more than 300 seconds for 1000 tasks; cost climbs from less than ten thousand dollars for 24 and 47 tasks to more than eight hundred thousand dollars for 1000 tasks; and energy increases from less than 10 KJ for 24 and 47 tasks to more than 600 KJ for 1000 tasks. This is because the mapping tasks in the Epigenomics workflow, which are responsible for matching genome sequences, become much more computationally expensive, making turnaround times even longer as the number of activities increases. Compared to the other techniques, MAA still performs better in terms of make-span, energy and cost.

Performance evaluation using LIGO(Inspiral) workflow dataset
The findings for make-span, cost, and energy usage for the LIGO workflow are shown in Figs. 15, 16 and 17. The make-span of all methods increases consistently in Fig. 15. MAA performs best for all task counts, while ACO appears to perform worse than it did on the prior three workflows. As shown in Fig. 16, with the rise in the number of tasks, the energy consumed by all algorithms is consistent except for HPSOGWO, which increases dramatically. As shown in Fig. 17, with the increase in the number of tasks, all algorithms perform consistently. For 1000 tasks, MAA continues to outperform all the other techniques. As the number of tasks rises, it appears that the HPSOGWO algorithm distributes more tasks to FDs/CDs.

Superiority of the proposed algorithm over existing WSAs
• To the best of our knowledge, the majority of algorithms uphold dependence constraints using a heuristic, a meta-heuristic or a hybrid of the two, as described in the related work section, and such approaches can carry a significant level of time complexity. In contrast, our research proposes a distinct pre-processing algorithm that consistently upholds the precedence relationships, as described in Algorithm 1, at low cost: the time complexity of the task-classification-based pre-processing is only O(T), where T is the number of tasks in the workflow. Two task lists ordered by priority are used to maintain the dependency relationships between tasks, and a task t_i can only be appended to the task list after all of its parent tasks have been included.

• The characterisation of the solution vector as an algal colony of the MAA algorithm is one of the fundamental elements of our meta-heuristic algorithm. The algal colony must be built from quantitative data because the AAA depends on mathematical formulae. In this study, we have developed a system model of real-valued entities (prioritised task lists from workflows) that can provide comprehensive solutions to the workflow scheduling problem. Additionally, it is ensured that, after updating the energy loss and modifying the most starving colony, the validity of the best algal colony is maintained.

• Even though there has been a lot of research on task scheduling, most algorithms do not take multiple parameters into account at once. The major goals of the existing meta-heuristic approaches [20-28] and heuristic approaches [11, 12] have been to minimise make-span, reduce monetary cost, balance load, etc., one at a time. The suggested method, by contrast, takes into account weighted-sum-based multiple objectives, namely make-span, energy consumption and the total cost of computation and communication.

Conclusion
CC is the most popular choice for conducting scientific experimentation on CDs. Using FC along with CC can be an even more efficient strategy for allocating resources and executing operations on both FDs and CDs. Complex scientific workflow operations need effective use of virtual machines. Most of the research articles available to date either focus on task scheduling rather than workflow scheduling or consider cloud computing settings instead of an integrated fog-cloud environment. Hence, there was a research gap, which has been addressed by our proposed workflow scheduling optimisation technique. A Multi-objective Artificial Algae (MAA) algorithm for scheduling scientific workflows in a heterogeneous FCE is presented in this article. The MAA algorithm aims to minimise a weighted-sum objective function based on execution time, energy consumption and cost. At first, the proposed algorithm pre-processes the scientific workflows to prioritise bottleneck tasks. The algorithm is then used to schedule the pre-processed task lists on the available VMs. The proposed approach is supported by experimental findings on scientific workflows taken from various research areas. With respect to the specified performance parameters of execution time, energy consumption and cost, the method outperforms all the compared algorithms. Because of system and cost constraints, real-time implementation and analysis of the proposed algorithm have not been carried out; however, this could be done on any cloud platform provider without much hassle. This empirical study is bounded to 20-1000 tasks to measure the performance of the optimisation algorithm.