Energy-aware and carbon-efficient VM placement optimization in cloud datacenters using evolutionary computing methods

Balancing renewable and fossil energy consumption is one of the most critical concerns of cloud service providers. At the same time, organizations and governments pursue policies to reduce energy consumption and greenhouse gas emissions in cloud data centers. Recently, a great deal of research has been conducted on optimizing the placement of virtual machines on physical machines to minimize energy consumption. However, many previous studies have not considered the deadlines and scheduling of Internet of Things (IoT) tasks, so their models are poorly suited to IoT environments, where requests are time-constrained. Unfortunately, both sub-problems, energy consumption minimization and task scheduling, are NP-hard. This study proposes a multi-objective virtual machine placement method that jointly minimizes energy costs and the response time of IoT requests. After presenting a modified Memetic algorithm, we compare its performance with baseline methods and state-of-the-art ones. Simulation results on the CloudSim platform show that the proposed method reduces energy costs, carbon footprint, service-level agreement violations, and the total response time of IoT requests.


Introduction
A big challenge in cloud data centers is high power consumption. Electricity and maintenance are two of the highest costs a cloud data center incurs, and the cost of electricity is the more critical of the two. Studies show that an idle server consumes about 50-70% of its maximum power. Also, according to Amazon estimates, energy accounts for 40% of a data center's total Operational Expenditure (OPEX) (Khosravi et al. 2017). The rate of carbon generated in a cloud data center depends on the type of energy source it consumes; it varies for sources such as oil, coal, and renewable energy. Thus, even a slight reduction in energy consumption leads to significant cost savings and a smaller carbon footprint. Much of this problem can be alleviated through the dynamic placement of Virtual Machines (VMs). For this purpose, the real-world parameters affecting Virtual Machine Placement (VMP) must be captured in the model. Optimizing energy consumption directly affects the carbon footprint and carbon taxation, which, in turn, reduces the costs of cloud service providers.
In the past, a great deal of research has been conducted on the VMP problem. The main challenge is to perform the VMP in a way that minimizes energy consumption while satisfying users' Service Level Agreements (SLAs) and Quality of Experience (QoE). Some researchers have used classical mathematical methods (Ahvar et al. 2019; Thiam and Thiam 2019; Xu and Buyya 2020; López et al. 2019). The VMP belongs to the class of bin-packing problems, which are NP-hard (Parvizi and Rezvani 2020; Aboutorabi and Rezvani 2020). Therefore, some researchers have used heuristic methods to reduce allocation time (Khosravi et al. 2017; Tavakoli-Someh and Rezvani 2019; Mohammadi and Rezvani 2019; Laganà et al. 2018). Another way to reduce convergence time on NP-hard problems is to use metaheuristic methods (Parvizi and Rezvani 2020; Aboutorabi and Rezvani 2020; Tavakoli-Someh and Rezvani 2019; Iwendi et al. 2021; RM et al. 2020; Zhang et al. 2019; Rashida et al. 2019; Zhao et al. 2018).
The emergence of the Internet of Things (IoT) has recently generated large volumes of data, much of it from delay-sensitive applications. Some of the most critical applications are agriculture (Khoobkar et al. 2022), vehicular traffic management (Esfandiari and Rezvani 2020; Babazadeh Nanehkaran and Rezvani 2021), e-health (Karuppiah and Gurunathan 2021), and so on. Each IoT request is assigned to a VM for execution, and each VM must, in turn, be mapped to a suitable physical server using a VMP approach. Here, it is essential to respect the deadlines of tasks. Unfortunately, many previous studies have focused solely on optimizing green energy consumption and have not considered task scheduling at all. Therefore, previous models are unsuitable for IoT environments, where requests are time-constrained. Scheduling optimization, like energy efficiency, is NP-hard.
This study proposes a multi-objective VMP algorithm for the joint minimization of energy costs and response time. After presenting a modified Memetic algorithm, we compare its performance with baseline methods such as the Genetic Algorithm (GA) and with state-of-the-art techniques. Roughly speaking, we seek to answer how to optimize the VMP with respect to the SLA and the QoE of users while considering both energy and scheduling constraints. Our most important assumptions are:
• Each IoT task has a deadline.
• We have information about changes in the outdoor temperature of the servers over a certain period.
• Solar energy is considered the main type of renewable energy.
• Data centers are geographically distributed.
• Energy costs vary across the geographically dispersed data centers.
The most important innovations of this research are as follows:
• We formulate the two sub-problems of energy cost optimization and IoT task scheduling optimization. Then, we solve them jointly with the deterministic weighted sum method, one of the well-known techniques of nonlinear optimization. To the best of our knowledge, no research has been conducted on the joint optimization of green energy and task scheduling.
• Because the problem is NP-hard, we solve it at large scale using a modified Memetic Algorithm (MA). We then evaluate the performance of the proposed method against baseline methods, such as the GA, and state-of-the-art techniques.
The remainder of this paper is organized as follows: Sect. 2 reviews the most critical studies on energy-efficient VMP; Sect. 3 explains the problem formulation; Sect. 4 describes the proposed method; Sect. 5 evaluates the results of simulating the proposed method on the CloudSim simulator (Calheiros et al. 2011); Finally, Sect. 6 concludes the study and provides suggestions for future research.

Related work
So far, there has been a great deal of literature on energy cost minimization. Before reading this section, interested readers are highly recommended to refer to Ahvar et al. (2019) for a comprehensive survey of the relevant architectures. After studying previous research, we divided it into three categories: exact (classical mathematical), heuristic, and meta-heuristic. For each category, we present the purpose, limitations, methodology, and evaluation criteria in Tables 1 and 2. Some researchers address the VMP problem directly. To tackle the carbon dioxide problem, Xu and Buyya (2020) proposed a strategy for managing greenhouse gas and renewable energy effects across several data centers in California, Virginia, and Dublin, located in different time zones. Their results show that the proposed algorithm could reduce carbon emissions by nearly 40% compared to state-of-the-art data centers while guaranteeing an average response time to user requests. Other related research can be found in Khosravi et al. (2017). Earlier research using classical mathematical and heuristic methods is shown in Table 1.
Another group of studies has used meta-heuristic methods. Gao et al. (2013) developed an algorithm inspired by Ant Colony Optimization (ACO); the goal is to obtain a set of non-dominated solutions (a Pareto-optimal set) that simultaneously minimizes resource wastage and energy consumption. Parvizi and Rezvani (2020) solved the VMP problem using the Non-dominated Sorting Genetic Algorithm (NSGA-III) to minimize total resource loss, energy consumption, and the number of active physical machines. For this purpose, a multi-objective optimization problem was designed and, after introducing a nonlinear convex optimization formulation, solved with the NSGA-III method. Rashida et al. (2019) optimized VMP in heterogeneous multi-cloud systems, aiming to reduce energy costs by considering peak demand times and the geographical location of allocated resources; the authors also presented a dynamic energy model for physical machines and cloud communication components. The GA has been used in research such as (Tavakoli-Someh and Rezvani 2019; Misra and Kuila 2022; Keshavarznejad et al. 2020). Besides the methods mentioned above, the artificial bee colony algorithm (Aboutorabi and Rezvani 2020) and the Cuckoo algorithm (Mohammadi and Rezvani 2019) have also been applied. Iwendi et al. (2021) developed a method to minimize the energy consumption of sensors in IoT networks by combining the Whale Optimization Algorithm (WOA) with Simulated Annealing (SA); they select the most suitable Cluster Heads (CHs) in the IoT network by considering criteria such as the number of nodes, load, temperature, and residual energy. Maddikunta et al. (2020) conducted a study with the same objectives, this time combining the WOA with the Moth-Flame Optimization (MFO) algorithm. RM et al. (2020) proposed a new architecture to minimize energy consumption in the IoT: their method first clusters the different IoT networks using a wind-driven optimization algorithm and then selects an optimized CH for each cluster using the Firefly algorithm. It reduces data traffic compared with other non-clustered state-of-the-art designs. CH selection is a complex issue in almost all types of networks; for example, wireless sensor networks commonly use evolutionary methods to select CHs that minimize energy consumption. For a model of such systems, readers can refer to Chauhan et al. (2021a).
Optimization methods are also widely used in other fields of engineering. Vashishtha and Kumar (2022) proposed a deep-learning scheme to identify bucket defects in the Pelton wheel: the raw vibration signal is passed through a time-varying filter whose parameters are optimized by the Amended Grey Wolf Optimization (AGWO), and a Convolutional Neural Network (CNN) model is trained on the resulting dataset. Another example is the minimization of passband, stopband, and transition-band errors in the design of a two-channel quadrature mirror filter bank (Chauhan et al. 2021b). Interested readers can refer to Kumar (2021a, b) and Vashishtha et al. (2021) for more engineering applications of such optimization methods. In summary, the most important differences between this study and the above research are as follows:
• Some studies (Iwendi et al. 2021; RM et al. 2020; Maddikunta et al. 2020; Chauhan et al. 2021a) have tried to balance the load while minimizing response time. Although load balancing may reduce response time, it does not necessarily reduce energy consumption. Our focus is neither on load balancing nor on the selection of CHs.
• Many previous studies (Khosravi et al. 2017; Parvizi and Rezvani 2020; Tavakoli-Someh and Rezvani 2019; Mohammadi and Rezvani 2019) focus exclusively on minimizing energy consumption within the data center and ignore everything outside it. This study considers both the energy consumed during offloading from the end device to the base station and the renewable energy consumed within the data center.
• None of the above research considers the time constraints of IoT tasks, which makes their models inconsistent with real-world requirements, especially for delay-sensitive tasks. This study addresses the joint minimization of brown energy consumption and task response time. Although trading off these two conflicting goals may slightly increase energy consumption, it improves the overall acceptance ratio of user requests. The results of our simulation confirm this claim.

System model
In this section, we formulate the energy consumption optimization sub-problem and the scheduling optimization sub-problem, respectively. Then, by combining them, we present a joint optimization problem.

Main parameters
In this study, the total cost of the data center, c_total, consists of two components: the cost of the energy the system consumes, c_energy, and the cost of the carbon dioxide it produces, c_FP. We assume that three types of energy sources can power each data center. First, we try to use green or renewable energy (for example, solar energy) if it is available; this reduces the cost incurred by carbon dioxide. If renewable energy is not available, off-site grid energy can be used. Finally, if neither of these is available, brown fuels, such as diesel generators, can power the cloud data center. Several geographically distributed data centers are interconnected through the same network infrastructure. The notation used in this research is shown in Table 3. In particular, each virtual machine v_i is associated with a 1×3 row vector Y whose elements indicate whether v_i is mapped to the brown, green, or off-site power supply network.

The datacenter energy efficiency
We use the Power Usage Effectiveness (PUE) metric to report the performance of a data center; it was introduced by the Green Grid (a non-profit organization of IT professionals) in 2007 and has become the most common metric for reporting "energy efficiency." Organizations such as the Uptime Institute track and report the average PUE of data centers using numerous surveys (Brady et al. 2013). Conceptually, PUE is the total power consumed by a data center divided by the power used by its IT devices (Fawaz et al. 2019). The higher the PUE, the more energy a data center wastes on maintaining its servers (Khosravi et al. 2017). Accordingly, companies such as Google and Facebook have significantly reduced their PUE in recent years by focusing on utilization and custom hardware design. A PUE of 1 is ideal for a data center and indicates that 100% of the supplied energy reaches the IT equipment.
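As a concrete illustration, the PUE ratio and the overhead it implies can be computed directly. The following Python sketch uses illustrative numbers, not figures from this study, and assumes PUE = total facility energy / IT energy:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_equipment_kwh

def overhead_fraction(p: float) -> float:
    """Fraction of total energy spent on non-IT overhead (cooling, etc.)."""
    return 1.0 - 1.0 / p

# An ideal data center has PUE = 1.0 (no overhead).
print(pue(1.0, 1.0))           # 1.0
print(pue(1.8, 1.0))           # a typical legacy facility
print(overhead_fraction(2.0))  # half of the energy is overhead
```

At PUE = 2, for every kilowatt-hour reaching the servers, another is spent on cooling and other overhead, which is why driving PUE toward 1 matters.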
Interested readers can refer to Khosravi et al. (2017) to see how the coefficients of the PUE model in Eq. (1) are calculated.

Overhead power
We denote the overhead energy required to run the virtual machine v_i by E^O_{v_i}. Today, server cooling systems are essential elements that have a remarkable effect on overhead energy costs.

Server energy
A previous study by Gao et al. (2013) has shown a linear relationship between the power drawn by a server h_l and its CPU utilization u_{h_l}(t):

P_{h_l}(t) = P^{idle}_{h_l} + (P^{peak}_{h_l} − P^{idle}_{h_l}) · u_{h_l}(t),    (2)

where P^{idle}_{h_l} and P^{peak}_{h_l} are the average power values of the server h_l at idle and peak loads, respectively.
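The linear power model is straightforward to evaluate. In the sketch below, the idle and peak wattages (100 W and 250 W) are purely illustrative:

```python
def server_power(u: float, p_idle: float, p_peak: float) -> float:
    """Linear power model: P(u) = P_idle + (P_peak - P_idle) * u, u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return p_idle + (p_peak - p_idle) * u

# Example: a server drawing 100 W idle and 250 W at peak.
print(server_power(0.0, 100, 250))  # 100 (idle)
print(server_power(0.5, 100, 250))  # 175.0
print(server_power(1.0, 100, 250))  # 250 (peak)
```

Note that even at zero utilization the server draws its full idle power, which is the basis for the observation that idle servers consume 50-70% of their maximum power.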

Energy cost
Let us denote the costs of the brown, renewable, and off-site energy consumed to run a virtual machine v_i on a server by c^{br}_{energy}, c^{gr}_{energy}, and c^{off}_{energy}, respectively. Here, energy consumption is priced in cents per kilowatt-hour (cent/kWh). Although the cost of renewable energy, c^{gr}_{energy}, is negligible, the cost of the other two sources is high and, given the geographical distribution of data centers, depends on governmental policies in different countries. The total brown energy available at the data center d_j is denoted by E^{T,br}_{d_j}.

The following notation (from Table 3) is used for the scheduling model:
• L_{v_i}: the size of a request on a virtual machine v_i (bytes)
• s_{h_j}: the time duration of each instruction inside a physical server h_j (s)
• h_{h_j}: the size of each instruction inside a physical server h_j (bytes)
• t_trans: the transmission delay of each byte of a request from a user device to the cloud data center (s)
• t^{offload}_{v_i}: the offloading latency of a request from the IoT device i to the cloud data center (s)
• t^{prc}_{i,j}: the processing latency of a request v_i on a physical server h_j (s)
• t^{d}_{v_i}: the deadline of an IoT request running on a virtual machine v_i (s)
• t^{resp}_{i,j}: the response time of an IoT request running on a virtual machine v_i when it is placed on a physical server h_j (s)
• t^{resp}_{v_i}: the response time of an IoT request running on a virtual machine v_i (s)
• t_R: the aggregate response time of all IoT requests (s)

Density/taxation of carbon dioxide

The density of carbon dioxide produced by data centers varies with the energy source and location. Three energy sources are used in this research, and the carbon density of renewable energy is considered zero; accordingly, the corresponding tax cost is also zero. The rate of carbon dioxide produced is measured in tonnes per megawatt-hour (Ton/MWh). Due to environmental concerns, especially global warming, some countries tax the carbon dioxide produced, in dollars per tonne (Dollar/Tonne), to promote environmental sustainability.
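The carbon-tax component can be illustrated with a small calculation. The densities and tax rate below are hypothetical placeholders, not values used in this study; only the zero density for renewable energy follows the text:

```python
def carbon_tax_cost(energy_mwh: float, density_ton_per_mwh: float,
                    tax_usd_per_ton: float) -> float:
    """Carbon tax incurred for a given energy draw from one source:
    energy (MWh) * carbon density (Ton/MWh) * tax rate (Dollar/Tonne)."""
    return energy_mwh * density_ton_per_mwh * tax_usd_per_ton

# Hypothetical densities (Ton/MWh) and a hypothetical tax of 25 $/Tonne.
DENSITY = {"green": 0.0, "offsite": 0.35, "brown": 0.8}
for source, d in DENSITY.items():
    print(source, carbon_tax_cost(10.0, d, 25.0))
```

As the text states, drawing the same 10 MWh from a renewable source incurs no tax, while the brown source incurs the largest carbon cost.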

Temperature
Another primary variable significantly affecting the PUE is the temperature outside the data center. Let θ_{dc_j}(t) denote the outside temperature of a data center dc_j at time t. The higher the outdoor temperature of a server, the more energy the cooling system consumes; this, in turn, leads to a significant increase in the PUE.

Cost optimization sub-problem
Here, we formulate the objective function and constraints of the energy efficiency sub-problem.

Objective function
Our goal is to reduce the total cost of the system, c_T, which is the sum of the energy cost, c_E, and the cost of producing carbon dioxide, c_FP:

c_T = c_E + c_FP.

Each element x^{d_j}_{v_i} of the assignment matrix X_{V×D} is a binary variable; a value of 1 means that v_i is assigned to the data center d_j. Next, we describe each of the components of the above equation.

(a) Energy cost
The cost of energy, c_E, can be divided into two components. The first, c^S_{v_i}, is the cost of the energy the server consumes to execute a request; the second, c^O_{v_i}, is the cost of the overhead energy needed to serve the virtual machine v_i:

c_E = Σ_{v_i ∈ V} (c^S_{v_i} + c^O_{v_i}).    (5)

Note that the cost of the carbon tax is calculated separately. Also, three different energy sources are used in geographically dispersed areas, each governed by its own pricing policy, so the costs incurred to serve a request can vary dramatically. Depending on which of the three energy sources e ∈ {br, gr, off} is used, the decision variable y^e_{v_i} takes different values from the vector Y^e; if y^e_{v_i} = 1, the corresponding energy source is allocated to the virtual machine v_i. The energy used by the server at time t to execute the request of the virtual machine v_i follows from the power model of Eq. (2).
We now calculate the second term of Eq. (5), i.e., the overhead energy cost, which also varies with the energy source used. Similar to Eq. (7), we use the matrix mapping energy resources to virtual machines to compute the overhead energy, and to calculate the overhead energy E^O_{v_i} we follow a procedure similar to Eq. (8). To evaluate Eq. (13), we need the overhead power P^O. From the definition of the PUE, the overhead power is the portion of the total power not consumed by the IT equipment:

P^O(t) = (PUE − 1) · P^S(t).    (14)

Substituting Eq. (14) into Eq. (13) yields the overhead energy.

(b) Cost of carbon dioxide production

The cost of carbon production and the carbon tax in Eqs. (3) to (4) are formulated using the same resource-mapping vectors as in Eqs. (7) and (12). We have already mentioned that the carbon tax for renewable energy sources is zero. If the other two types of energy are unavailable, brown energy can be used when necessary; Eq. (16) can be rewritten accordingly.

Constraints
The constraints to be considered for the objective function are as follows:
• Renewable energy sources have the highest priority; if green energy is limited or unavailable, data centers may use external grid energy.
• The total processor, RAM, and storage demand of the virtual machines running on a server h_j must not exceed its capacity. The decision variable z^{h_j}_{v_i} is a binary element of the matrix Z_{V×H}; a value of 1 means that the virtual machine v_i is located on the server h_j, and 0 otherwise.
• Every virtual machine v_i must use only one power source at a time.
• Each element of the virtual machine mapping matrix must be non-negative.
• The total local energy consumed to run the virtual machines on the servers must not exceed the total green energy capacity E^{T,gr}_{d_j} and the total brown energy capacity E^{T,br}_{d_j}.
• Each virtual machine must be placed on exactly one server.

Based on the above constraints and the objective function defined in Eqs. (3) and (4), the optimization problem is:

min c_T    (24)
s.t. Eqs. (19) to (23).    (25)

This sub-problem was first solved by Khosravi et al. (2017) using a heuristic method; Rashida et al. (2019) later solved it with another heuristic, and Xu and Buyya (2020) solved it for energy resource management under different assumptions. Given that VMP involves multiple data centers, each containing several servers with various capacities, it falls into the class of bin-packing problems, which are known to be NP-hard (Burke et al. 2006). As the number of requests grows, the solution space becomes very large.
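The capacity and exclusivity constraints amount to a simple feasibility check on a candidate placement. The sketch below uses hypothetical host and VM names and (CPU, RAM, storage) triples; it is a minimal illustration, not the paper's formulation:

```python
def is_feasible(placement, vm_demand, host_capacity):
    """placement: vm -> host (each VM on exactly one host by construction).
    vm_demand / host_capacity: (cpu, ram, storage) triples per VM / host."""
    used = {h: [0, 0, 0] for h in host_capacity}
    for vm, host in placement.items():
        if host not in used:
            return False              # placed on an unknown host
        for k in range(3):
            used[host][k] += vm_demand[vm][k]
    # No resource dimension may exceed the host's capacity.
    return all(used[h][k] <= host_capacity[h][k]
               for h in host_capacity for k in range(3))

hosts = {"h1": (8, 32, 500)}
vms = {"v1": (4, 16, 200), "v2": (4, 16, 200), "v3": (2, 8, 200)}
print(is_feasible({"v1": "h1", "v2": "h1"}, vms, hosts))              # True
print(is_feasible({"v1": "h1", "v2": "h1", "v3": "h1"}, vms, hosts))  # False
```

The second placement fails because the three VMs together demand 10 CPU units against h1's capacity of 8.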
The First-fit Decreasing (FFD) algorithm is one of the simplest heuristics for the bin-packing problem; it provides a quick but often non-optimal solution (Chekuri 1998). The pseudo-code of the FFD algorithm is shown in Algorithm 1, and the greedy_Allocation procedure it uses is shown in Algorithm 2. Suppose several requests for virtual machines arrive at the cloud broker. A buffer is used to store these requests, and the buffer list is initially sorted in descending order of priority based on several parameters: VM requests are sorted by the required processor cycles; ties are broken by processor speed, then by required memory, and finally by the hard drive capacity required to execute the VM request. After sorting the list, the first data center is searched for appropriate hosts for the VM requests. If no suitable host is found in the current data center, the next one is explored. In this way, the most suitable data center and its hosts are selected.
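A minimal, one-dimensional sketch of the FFD idea (sorting on a single resource rather than the multi-key ordering described above) might look like this:

```python
def first_fit_decreasing(vm_demands, host_capacities):
    """Sort VM demands in descending order, then place each VM on the
    first host with enough remaining capacity. One-dimensional (CPU only)."""
    placement = {}
    free = list(host_capacities)               # remaining capacity per host
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for j, cap in enumerate(free):
            if demand <= cap:
                free[j] -= demand
                placement[vm] = j
                break
        else:
            placement[vm] = None               # no host can fit this VM
    return placement

print(first_fit_decreasing({"v1": 3, "v2": 5, "v3": 2}, [6, 4]))
```

Here v2 (the largest request) lands on host 0, v1 overflows to host 1, and v3 cannot be placed at all, illustrating why FFD is fast but not optimal.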
The FFD method may require a lot of execution time for longer lists. Therefore, for large-scale problems, it is better to use evolutionary algorithms.

Scheduling optimization sub-problem
It is important to note that each IoT request is executed on a VM instance; therefore, we use the terms "IoT request" and "VM instance" interchangeably in the following. The response time of a request i sent by an IoT object is the sum of (a) the transmission latency to offload the request from the IoT instance i to the relevant fog instance, t^{offload}_{v_i}, and (b) the processing latency of the request inside a fog instance, t^{prc}_{i,j}. The total response time on the physical host h_j ∈ H for the request v_i ∈ V is therefore:

t^{resp}_{i,j} = t^{offload}_{v_i} + t^{prc}_{i,j}.    (26)

We first calculate t^{offload}_{v_i} in Eq. (26):

t^{offload}_{v_i} = L_{v_i} · t_trans,    (27)

where t_trans denotes the transmission delay of each byte of a request from an IoT device to the physical server. Note that we ignore the propagation delay here. We next obtain t^{prc}_{i,j} in Eq. (26):

t^{prc}_{i,j} = L_{v_i} · (s_{h_j} / h_{h_j}),    (28)

where s_{h_j} is the time duration of each instruction inside a physical server (in seconds) and h_{h_j} is the size of each instruction on a physical server (in bytes). Therefore, s_{h_j}/h_{h_j} is the execution time of each byte of the IoT request on the physical server; multiplying it by the request size L_{v_i} gives the processing time t^{prc}_{i,j} of the IoT request v_i on the physical server h_j. As mentioned earlier, a value of 1 for the binary decision variable z^{h_j}_{v_i} indicates that the request is placed on h_j. According to Eq. (26), the response time of an IoT request running on a virtual machine v_i ∈ V is:

t^{resp}_{v_i} = Σ_{h_j ∈ H} z^{h_j}_{v_i} · t^{resp}_{i,j},    (29)

and the aggregate response time of all requests is:

t_R = Σ_{v_i ∈ V} t^{resp}_{v_i}.    (30)

The response time of each IoT request running on a virtual machine v_i ∈ V must not exceed its deadline:

t^{resp}_{v_i} ≤ t^{d}_{v_i}.    (31)

The scheduling optimization sub-problem is then to minimize the overall response time t_R of the system, taking into account the deadline of each task.
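The response-time model above (offload latency plus processing latency, checked against a deadline) can be sketched as follows; all numeric values are hypothetical:

```python
def response_time(req_bytes: int, t_trans: float,
                  s_host: float, h_host: float) -> float:
    """t_resp = offload + processing = L * t_trans + L * (s / h)."""
    t_offload = req_bytes * t_trans           # per-byte transmission delay
    t_proc = req_bytes * (s_host / h_host)    # per-byte execution time
    return t_offload + t_proc

def meets_deadline(req_bytes, t_trans, s_host, h_host, deadline) -> bool:
    return response_time(req_bytes, t_trans, s_host, h_host) <= deadline

# Hypothetical numbers: a 1 MB request, 1 us per byte on the wire,
# 4-byte instructions each executed in 2 ns.
L = 1_000_000
print(response_time(L, 1e-6, 2e-9, 4))            # 1.0005 (seconds)
print(meets_deadline(L, 1e-6, 2e-9, 4, deadline=2.0))  # True
```

With these numbers the offload term (1.0 s) dominates the processing term (0.5 ms), which is typical for delay-sensitive IoT workloads where the network, not the server, is the bottleneck.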

Joint cost and scheduling optimization problem
As mentioned earlier, our goal is to find the best candidate physical hosts on which to place the IoT requests. The placement must yield a trade-off between response time and energy cost in the system. The joint optimization problem is therefore formulated as follows:

min (c_T, t_R)    (34)
s.t. Eqs. (19)–(23) and (31).    (35)

The above multi-objective problem can be converted into a single-objective form by taking a weighted sum of the logarithms of the two objectives (Jafari and Rezvani 2021). The use of the convex logarithm function in Eq. (36) guarantees the existence of an optimal solution (Parvizi and Rezvani 2020). Unfortunately, as mentioned earlier, both sub-problems, task scheduling and energy efficiency, are NP-hard, so deterministic methods fail to find a solution at scale. In the next section, we solve this problem using a metaheuristic method.
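A weighted sum of the logarithms of the two objectives can be used to compare candidate placements. The sketch below only illustrates this scalarization; the weight w = 0.5 and the sample values are assumptions, not the paper's settings:

```python
import math

def joint_objective(c_total: float, t_total: float, w: float = 0.5) -> float:
    """Scalarized objective: w * log(c_T) + (1 - w) * log(t_R).
    Lower is better; both inputs must be positive."""
    return w * math.log(c_total) + (1.0 - w) * math.log(t_total)

# Comparing two candidate placements:
# A is cheaper but slower, B is pricier but faster.
a = joint_objective(c_total=100.0, t_total=8.0)
b = joint_objective(c_total=140.0, t_total=4.0)
print("prefer A" if a < b else "prefer B")  # prefer B
```

With equal weights, halving the response time outweighs the 40% cost increase here; shifting w toward 1 would instead favor the cheaper placement.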
In this research, we solve the above problem for the first time using two meta-heuristic methods: the Genetic Algorithm (GA) and the Memetic Algorithm (MA).

Proposed metaheuristic methods
We consider the method presented in Khosravi et al. (2017) as the baseline method to solve the sub-problem of Eq. (24). We call this method ''Cost and Renewable Energy-Aware Dynamic PUE'' or in abbreviation CRA-DP. Then, we solve the joint cost and scheduling optimization problem and propose two metaheuristic algorithms to overcome the NP-hardness and time complexity of the problem.

Genetic algorithm
In this section, we present a genetic-based approach to the VMP problem of Eq. (24). We call this method ''Cost and Renewable Energy-Aware Dynamic PUE with Genetic Algorithm'' or in abbreviation CRA-DP-GA. In this method, there is a population that consists of several chromosomes. Each chromosome is in the form of an array. The number of cells in this array is equal to the number of VMs, N V . The value stored inside each cell is the identifier of a physical machine. Each cell of the chromosome is called a gene. Figure 1 shows a possible solution to the problem of Eq. (24).
The CRA-DP-GA flowchart is shown in Fig. 2. After collecting user needs in the form of VM requests, they are placed on physical machines using a greedy algorithm, and the initial population is created. Then, a new population is created by applying the crossover operator. This operator selects several chromosomes from the population for reproduction; fitter chromosomes are more likely to be chosen. During crossover, parts of the chromosomes are randomly exchanged, so the children are not exact copies of their parents. Next, the mutation operator is applied to the chromosomes: it randomly selects a gene from a chromosome and changes its content. This extends the search to untouched, newer regions of the solution space; the most crucial benefit of the mutation operator is avoiding local optima. Calculating the fitness of each chromosome closes one loop of the genetic algorithm, called a "generation." This cycle continues until the best fitness stops changing or the maximum number of iterations is reached, at which point the best solution is reported.
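The chromosome encoding described above (an array of host ids, one per VM) can be sketched together with a random initial population; population and problem sizes are illustrative:

```python
import random

def random_chromosome(n_vms: int, n_hosts: int, rng: random.Random) -> list:
    """A chromosome is an array of length N_V; gene i holds the id of the
    physical machine hosting VM i."""
    return [rng.randrange(n_hosts) for _ in range(n_vms)]

def initial_population(pop_size: int, n_vms: int, n_hosts: int,
                       seed: int = 42) -> list:
    rng = random.Random(seed)
    return [random_chromosome(n_vms, n_hosts, rng) for _ in range(pop_size)]

pop = initial_population(pop_size=4, n_vms=6, n_hosts=3)
print(len(pop), len(pop[0]))                    # 4 chromosomes, 6 genes each
print(all(0 <= g < 3 for c in pop for g in c))  # every gene is a valid host id
```

Any permutation-free integer array of this shape is a syntactically valid placement; the fitness function then penalizes infeasible or costly ones.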

Initial population
The method of selecting the initial population may significantly affect the speed of convergence (Esfandiari and Rezvani 2020). A random method is used when no prior information is available (Rashida et al. 2019); accordingly, we generate the initial population randomly in this study. Pseudocode for the initial population is provided in Algorithm 5.

Selection operator
The selection process drives the population toward fitter chromosomes. Different methods can serve as the selection operator, such as the roulette wheel or the binary tournament. In the roulette wheel method, the whole population is divided into sections, each representing an individual; the probability of choosing an individual for the next generation is the ratio of its fitness value to the total fitness of the entire population. In the binary tournament method, by contrast, the individual with the higher fitness value in each pairing is selected for the next generation; there are no arithmetic calculations based on proportional fitness, and selection rests solely on a series of fitness comparisons. The tournament strategy usually converges much faster than the roulette wheel. Algorithm 6 provides the pseudo-code for tournament selection: first, a subset of individuals is drawn from the population; these individuals then compete, and only the winner of each subgroup is selected. The tournament method gives all solutions a fair chance of selection and thus preserves diversity.
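A minimal binary-tournament sketch, with costs to be minimized (lower cost = fitter), might look like this; the toy population and costs are hypothetical:

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Binary (k=2) tournament: sample k distinct individuals and keep the
    best. Here fitness values are costs, so lower is better."""
    contenders = rng.sample(range(len(population)), k)
    best = min(contenders, key=lambda i: fitness[i])
    return population[best]

pop = [[0, 1], [1, 1], [2, 0]]
cost = [5.0, 2.0, 9.0]
rng = random.Random(0)
winners = [tournament_select(pop, cost, rng=rng) for _ in range(100)]
# The cheapest chromosome [1, 1] wins every tournament it enters,
# while the costliest [2, 0] can never win against any opponent.
print(winners.count([1, 1]) > 0, winners.count([2, 0]) == 0)
```

Because only pairwise comparisons are used, selection pressure is independent of the absolute scale of the cost values, which is one reason the tournament converges faster than the roulette wheel.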

Cross-over operator
During the cross-over process, parts of the parent chromosomes are randomly swapped, so each offspring (child) inherits a combination of its parents' attributes. The chromosomes selected for the cross-over operation and the resulting chromosomes are called "parents" and "children," respectively. Based on the experimental observations shown in Fig. 3, we set the cross-over rate to 0.7. In this study, we use a two-point cross-over: the operator randomly selects two points on the parent chromosomes and swaps the genes between those points. The pseudo-code of the cross-over operation is shown in Algorithm 9.
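A two-point cross-over on the array encoding can be sketched as follows:

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Pick two cut points and swap the middle segment between the parents."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

rng = random.Random(1)
a, b = [0] * 6, [1] * 6
c1, c2 = two_point_crossover(a, b, rng)
print(c1, c2)                            # complementary mixtures of 0s and 1s
print(sorted(c1 + c2) == sorted(a + b))  # genes are conserved across children
```

Using all-0 and all-1 parents makes the exchanged segment visible at a glance; with real host-id chromosomes the operator works identically.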

Mutation operator
A mutation operator is used to increase genetic diversity in a population and, consequently, the chance of reaching an optimal solution in a shorter time. It also helps the algorithm avoid getting stuck in local optima. The operator modifies one or more genes in a chromosome with a small mutation probability. As with the crossover rate, we set the mutation probability to 0.1 based on the experimental observations shown in Fig. 3; this value produced a more stable response and fewer fluctuations in the cost function than other values. The pseudo-code of the mutation routine is given in Algorithms 10 and 11. As shown in Algorithm 12, the crossover operation is performed with probability P_crossover and the mutation operation with probability P_mutation, until the population reaches populationSize.
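One common way to realize such a mutation is a per-gene probability test; this is a sketch of that variant, not necessarily the exact routine of Algorithms 10-11:

```python
import random

def mutate(chromosome, n_hosts, p_mutation=0.1, rng=random):
    """With probability p_mutation per gene, reassign the VM to a random host."""
    return [rng.randrange(n_hosts) if rng.random() < p_mutation else g
            for g in chromosome]

rng = random.Random(3)
child = mutate([0] * 10, n_hosts=4, p_mutation=0.1, rng=rng)
# The child keeps the original length and only contains valid host ids.
print(len(child) == 10 and all(0 <= g < 4 for g in child))
```

With p = 0.1 and 10 genes, about one gene changes per application on average, which matches the "one or more genes" behavior described above.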

The fitness function
In the GA, the idea is to remove the N worst candidate solutions (chromosomes) and keep the most suitable ones. To measure the quality of solutions, we use a fitness function. The fitness function corresponds to the objective of Eq. (24), in which our goal is to minimize the cost; hence, the lower the cost of a solution, the fitter it is. The pseudo-code for calculating chromosome fitness is shown in Algorithms 13 and 14.

Fig. 3 Impact of cross-over rate and mutation rate on overall costs
As shown in Algorithm 14, the server utilization at the upcoming round is the sum of the current server utilization and the required CPU. We have already obtained server power consumption in Eq. (2) as a function of CPU utilization. Also, according to Eq. (5), the total consumed energy is equal to the energy consumed by the servers plus the energy consumption overhead; the PUE metric of Eq. (1) is used to calculate this overhead. In the final rows of Algorithm 14, the amount of carbon dioxide is checked, and the cost imposed on the system is then calculated according to the type and amount of consumed energy and the tax costs. If the available solar energy is less than the energy required by the VM, the server uses all available solar energy and draws the rest of its energy needs from an offsite source. As stated in Eqs. (3) to (4), the energy cost of each data center includes two parts: the cost of consumed energy and the cost incurred due to carbon production. The total cost of the data center is obtained by adding the cost of the total consumed energy, the operational costs, and the carbon tax on the emitted carbon dioxide. For each chromosome, this value is added to the fitness value. The above calculations are performed for all genes.
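The solar-first cost accounting described above can be sketched as follows; the function name, argument names, and units are our assumptions, not the paper's Algorithm 14.

```python
def energy_cost(required_kwh, solar_kwh, brown_price, carbon_rate, carbon_tax):
    """Illustrative per-server cost step: solar energy is consumed first
    at zero cost; the remainder is drawn from an offsite (brown) source,
    which also incurs a carbon tax.

    brown_price is in dollars per kWh, carbon_rate in tons of CO2 per
    kWh, and carbon_tax in dollars per ton."""
    brown_kwh = max(0.0, required_kwh - solar_kwh)
    co2_tons = brown_kwh * carbon_rate
    return brown_kwh * brown_price + co2_tons * carbon_tax
```

When solar supply covers the whole demand, both the energy cost and the carbon cost are zero, which is why the fitness function favors solar-rich placements.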

Main algorithm
The GA requires a lot of time to process different generations to achieve the desired result (Rashida et al. 2019).
We present a GA-inspired algorithm called the Memetic Algorithm (MA) to overcome this shortcoming. We call this method ''Cost and Renewable Energy-Aware Dynamic PUE with Memetic Algorithm,'' or CRA-DP-MA for short. The structure of each possible solution is similar to what was shown previously in Fig. 1. Unlike the GA, each chromosome in the MA improves its fitness by using a local search method, such as hill climbing (Russell and Norvig 2002). The MA flowchart is similar to what we showed earlier in Fig. 2, except that in the final stage, each chromosome (a possible solution) finds its neighboring solutions through minor modifications of its current state. It then replaces the worst solution in the population with the best neighboring chromosome (Rashida et al. 2019). Our goal is to minimize the cost function in Eq. (24). Since the MA's selection, crossover, and mutation routines are the same as the GA's, we do not show them again. Also, we set the climbing rate to 0.8 based on the experiments shown in Fig. 4. The pseudo-code of the CRA-DP-MA is shown in Algorithm 15, and its flowchart is depicted in Fig. 5. Note that the probabilities of crossover, mutation, and local search are denoted by P_crossover, P_mutation, and P_localSearch, respectively.
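One generation of a memetic algorithm can be sketched as below; the operator callbacks, elitist replacement, and function names are illustrative assumptions rather than a transcription of Algorithm 15.

```python
import random

def memetic_step(population, fitness, crossover, mutate, local_search,
                 p_crossover=0.7, p_mutation=0.1, p_local_search=0.8):
    """One generation of a memetic algorithm (illustrative skeleton).

    Like a GA generation, but each offspring may additionally be
    improved by a local search (e.g. hill climbing) before it
    competes for a place in the next population."""
    size = len(population)
    next_gen = list(population)
    while len(next_gen) < 2 * size:
        a, b = random.sample(population, 2)
        if random.random() < p_crossover:
            a, b = crossover(a, b)
        if random.random() < p_mutation:
            a = mutate(a)
        if random.random() < p_local_search:
            a = local_search(a)          # the memetic refinement step
        next_gen.extend([a, b])
    next_gen.sort(key=fitness)           # minimizing the cost function
    return next_gen[:size]
```

Because the parents survive into the candidate pool, the best cost found so far can never get worse from one generation to the next.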

Local search
As explained in Sect. 4.2.1, the significant difference between the MA and the GA is the local search, for example, a hill-climbing method. As shown in Fig. 6, hill climbing is an iterative technique that starts with a candidate solution and then moves to neighboring solutions in search of a better one (Russell and Norvig 2002). If a move leads to a better solution, another move is made from this new solution. This process continues until no further improvement in the solution space is possible. In our setting, the hill-climbing method tries to minimize the objective function of Eq. (24), namely c_total. In each iteration, it selects an element of the state space, i.e., the placement matrix Z_{N_V x N_H}, and examines whether changing it improves the value of the objective function. This continues until no change leads to an improvement in c_total; at this point, Z_{N_V x N_H} is a ''local minimum'' and the algorithm's output. The most common reasons for terminating the algorithm are local minima, plateaus, and ridges. Since each candidate solution can have multiple neighbors, the heuristic by which neighbors are selected is crucial. We propose Host-based Hill-climbing, with the pseudo-code shown in Algorithm 16. The main factor limiting the performance of the hill-climbing algorithm is the number of neighbors: if it is large, the cost of this method may be very high (Rashida et al. 2019). To address this problem, we define a method called Host-based Neighborhood (HbN) in Algorithm 17. In Algorithm 16, Algorithm 17 is called for each gene, and all candidate neighbors are then generated with the FFD algorithm without replacement. Simply speaking, when finding neighbors, if a host has already been selected for a gene, it cannot be selected again. Eventually, the best neighboring chromosome replaces the current chromosome.
We now proceed to describe the decisions made at each step of Algorithm 17.
Step 1: As shown in Fig. 7, the algorithm must first find all available hosts in the same data center for each gene. For example, if gene #1 on chromosome #X is located inside data center #4, we need to inspect all the hosts inside data center #4. We check only the hosts in the current data center because the locality of the solution must be preserved: using hosts in other data centers would introduce different temperature conditions and, consequently, different renewable energy availability.
Step 2: Once the permissible hosts for each gene have been found, those whose utilization is similar to that of the current host should be selected. Based on the experimental results shown in Fig. 8, we set the threshold for utilization similarity to 30%: a host with at least 30% utilization similarity to the current host is kept; otherwise, the host is removed from the neighbors' list. This decision balances the load and prevents allocations from being biased towards empty or overloaded hosts.
Step 3: In this step, several candidate hosts are obtained for each gene, each of which can identify candidate neighbors. We then randomly select a maximum of 10 different neighbors using the FFD algorithm without replacement. Experimental results show that the optimal number of neighbors in the local search routine is 10: with more than 10 neighbors, the algorithm's runtime grows without any significant improvement in the solution.
Step 4: In this step, the local search algorithm is applied to 10 neighbors obtained from the previous step, and the best solution is selected.
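Steps 1-3 of the Host-based Neighborhood heuristic can be sketched as follows; the data structures (host lists per data center, a utilization map) and names are hypothetical, chosen only to make the locality and similarity filters concrete.

```python
import random

def host_based_neighbors(chromosome, hosts_of_dc, dc_of_host, utilization,
                         similarity=0.3, max_neighbors=10):
    """Sketch of the Host-based Neighborhood (HbN) heuristic.

    For each gene (VM), candidate hosts are restricted to the same data
    center as the current host (Step 1) and filtered by utilization
    similarity (Step 2); at most `max_neighbors` neighbor chromosomes
    are then sampled without replacement (Step 3)."""
    neighbors = []
    for gene, host in enumerate(chromosome):
        for candidate in hosts_of_dc[dc_of_host[host]]:   # Step 1: same DC only
            if candidate == host:
                continue
            if abs(utilization[candidate] - utilization[host]) > similarity:
                continue                                  # Step 2: similar load
            neighbor = chromosome[:]
            neighbor[gene] = candidate
            neighbors.append(neighbor)
    k = min(max_neighbors, len(neighbors))                # Step 3: cap at 10
    return random.sample(neighbors, k)
```

The hill climber (Step 4) would then evaluate these candidates and keep the one with the lowest cost.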

CRA-DPSCH-DDMPEA method
We compare our proposed method with one of the state-of-the-art methods by Chauhan et al. (2021c). The method is called the diversity-driven multi-parent evolutionary algorithm (DDMPEA) with adaptive non-uniform mutation (ANUM). In the diagrams and algorithms, the implementation of the joint cost and scheduling optimization problem (Eq. (36)) using this method is abbreviated as CRA-DPSCH-DDMPEA. For a fair comparison with state-of-the-art techniques, we will use the DDMPEA later in Sect. 5 for the sub-problem of Eq. (24). In the CRA-DPSCH-DDMPEA, non-uniform mutations are used to maintain diversity in solutions (Chauhan et al. 2021d). Here, fitness variance is used to detect premature convergence of the population to local optima. Also, more than two parents are used for cross-over operations. After selecting several parents and performing cross-over, a non-uniform adaptive mutation occurs, resulting in a reduction in the variance of the candidate solutions. Hence, the algorithm is driven by population diversity and does not get stuck in local optima. The steps of CRA-DPSCH-DDMPEA are described in detail as follows:

Population initialization
Initially, N_P members are randomly generated according to the lower and upper bounds of the search space using a uniform distribution as follows:

x_ij = x_j^min + r_ij (x_j^max - x_j^min),  i = 1, 2, ..., N_P;  j = 1, 2, ..., D

where r_ij is a random number with uniform distribution for member i and dimension j. Here, N_P is the population size and D is the search space dimension. Also, x_j^min and x_j^max are the minimum and maximum of variable j.
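This bounded uniform initialization can be sketched directly (names are ours):

```python
import random

def init_population(n_p, d, x_min, x_max):
    """Generate n_p members, each with d dimensions drawn uniformly
    between the per-dimension bounds x_min[j] and x_max[j]."""
    return [[x_min[j] + random.random() * (x_max[j] - x_min[j]) for j in range(d)]
            for _ in range(n_p)]
```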

Multi-parent cross-over
Unlike the GA, here three parents are used for cross-over. For this purpose, parents are first selected from the best p% of the total population. After selecting three random members x_r1, x_r2, and x_r3 from this set, an offspring O^t_ij is created with the following formula:

O^t_ij = x_{r1,j} + a_ij (x_{r2,j} - x_{r3,j})

where a_ij is a random weight that follows a normal distribution with a mean of 0.7 and a variance of 0.1, and t denotes the number of generations produced so far. This scheme leverages Differential Evolution (DE) capabilities (Jafari and Rezvani 2021). A control operation is then performed on the generated offspring to ensure that it lies within the allowable range:

O^t_ij = min(max(O^t_ij, x_j^min), x_j^max)

The fitness of the created offspring is then assessed by substituting the solution into Eq. (43).
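A sketch of this DE-style three-parent recombination, under our reconstruction of the garbled formula (the function name and the elite-fraction parameter are assumptions):

```python
import random

def multi_parent_crossover(population, fitness, p_best=0.3, mean=0.7, variance=0.1):
    """Three-parent DE-style crossover (our reconstruction).

    Parents are drawn from the best p_best fraction of the population;
    offspring[j] = x_r1[j] + a_j * (x_r2[j] - x_r3[j]), where the
    per-dimension weight a_j ~ N(mean, variance)."""
    n_elite = max(3, int(p_best * len(population)))
    elite = sorted(population, key=fitness)[:n_elite]
    x1, x2, x3 = random.sample(elite, 3)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    return [x1[j] + random.gauss(mean, sigma) * (x2[j] - x3[j])
            for j in range(len(x1))]
```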

Adaptive mutation strategy
As is common in evolutionary computing, mutations maintain diversity from one generation to the next (Besharati et al. 2021). If the algorithm gets stuck in a local optimum, the variance of the population fitness approaches zero. To cope with this situation and to achieve faster convergence, we use an adaptive non-uniform mutation operator as follows:

Om^t_ij = O^t_ij (1 + 0.5 eta),  i = 1, 2, ..., N_P

where eta denotes a weight coefficient drawn from a normal distribution with mean 0 and variance 1. The mutation occurs when r_i(0,1) < p_m, where r_i(0,1) in U(0,1) is a uniform random number in the interval (0, 1) for the i-th offspring. Also, p_m is the probability of mutation, which is driven by the variance threshold sigma_1 (calculated from the range of the variables) and by sigma^2, the variance of the population fitness. The degree of diversity theta in iteration t is obtained from the distance of the individuals to x^t_best, the position with the best fitness value in iteration t. When the value of theta is high, individuals are scattered in the search space, so only a small amount of mutation is needed. Conversely, when theta is small, individuals are crowded in the search space, and a large amount of mutation is required. Roughly speaking, when the CRA-DPSCH-DDMPEA gets stuck in a local minimum, it tries to escape this undesired situation by leveraging the fitness variance. This parameter is calculated as follows:

sigma^2 = sum_{i=1}^{N_P} ((f_i - f_bar) / f)^2

where f_i is the fitness of the i-th individual and f_bar is the mean of the fitness values. Also, f is a scaling factor used to control the individuals' fitness variance, calculated as follows:

f = max{1, max_i |f_i - f_bar|}

The fitness variance sigma^2 indicates population density: the smaller it is, the closer the individuals are assembled, while large values indicate that individuals are randomly dispersed. Once the mutation is done, the new individual is added to the population.
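The two recoverable pieces above, the normalized fitness variance and the multiplicative non-uniform mutation, can be sketched as follows; both are our reconstructions of the garbled formulas, with hypothetical names.

```python
import random

def fitness_variance(fitnesses):
    """Normalized population-fitness variance used to detect premature
    convergence: sum of ((f_i - mean) / f)^2 with a scaling factor
    f = max(1, max |f_i - mean|)."""
    mean = sum(fitnesses) / len(fitnesses)
    f = max(1.0, max(abs(fi - mean) for fi in fitnesses))
    return sum(((fi - mean) / f) ** 2 for fi in fitnesses)

def adaptive_mutation(offspring, p_m):
    """Non-uniform mutation: with probability p_m, each dimension is
    perturbed multiplicatively by (1 + 0.5 * eta), eta ~ N(0, 1)."""
    if random.random() >= p_m:
        return offspring
    return [x * (1.0 + 0.5 * random.gauss(0.0, 1.0)) for x in offspring]
```

A variance near zero signals that all individuals have collapsed onto one fitness value, which is exactly when a larger p_m is warranted.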
The population is then ranked according to the fitness values of the individuals. Finally, the N_P best individuals are selected to form the next-generation population. Before proceeding, it should be noted that researchers have used various methods to achieve diversity; interested readers can refer to Chauhan et al. (2021e) for further study.
As previously stated in Eqs. (36) to (40), the optimization problem in this study has both inequality constraints (denoted by g_i(X)) and equality constraints (denoted by h_i(X)). Let us denote the number of inequality constraints and the total number of constraints by G and M, respectively. Thus, they can be shown as follows:

g_i(X) <= 0,  i = 1, 2, ..., G
h_i(X) = 0,  i = G + 1, ..., M

For the sake of convenience, we can combine the above two kinds of constraints and rewrite them as violation terms:

c_i(X) = max{0, g_i(X)},  i = 1, 2, ..., G
c_i(X) = |h_i(X)|,  i = G + 1, ..., M

To obtain the total constraint violation V(X) for an infeasible solution, we calculate the weighted average of all constraint violations:

V(X) = (sum_{i=1}^{M} w_i c_i(X)) / (sum_{i=1}^{M} w_i)

Decision theory is used to manage the constraints in the optimization problem (Naghdehforoushha et al. 2022). Among the many available methods, one of the most common is fuzzy decision-making. In this method, the input variables are fuzzified with a membership function mu: each input variable X is assigned a membership value in the range [0, 1]. A value of mu = 0 for an input variable indicates that it is deterministic, not fuzzy; simply speaking, the larger the membership value mu for an input variable X, the fuzzier it is. The fuzzy membership function mu is plotted in Fig. 9. It is important to note that here we have two objective functions, F_1 and F_2. The objective function F_1 is our principal objective, as stated in Eq. (36). The objective function F_2, imposed by fuzzy decision-making, indicates the degree of constraint violation. The value of the membership function indicates the level of satisfaction of the objective function F_i. Let us denote the number of all non-dominated solutions by K. We now rank the achievement of a non-dominated solution k with respect to all solutions: we divide the achievement of this solution by the sum of the achievements of all non-dominated solutions.
By performing such normalization, the function mu^k_D can be calculated as a membership value for the non-dominated solutions as follows:

mu^k_D = (sum_{i=1}^{2} mu_i(F_i^k)) / (sum_{j=1}^{K} sum_{i=1}^{2} mu_i(F_i^j))

The best solution O_best is the offspring that gets the maximum membership score mu^k_D:

O_best = argmax_{k in {1, ..., K}} mu^k_D

The pseudo-code of the CRA-DPSCH-DDMPEA is shown in Algorithm 18.
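As a minimal illustration of this normalization-and-selection step (the per-objective membership values mu_i(F_i) are assumed to be precomputed; names are ours):

```python
def select_best(memberships):
    """Pick the non-dominated solution with the highest normalized
    membership score.

    memberships: one row [mu(F1), mu(F2)] per non-dominated solution.
    Returns the index of the best solution."""
    achievements = [sum(row) for row in memberships]
    total = sum(achievements)
    scores = [a / total for a in achievements]  # mu_D per solution
    return max(range(len(scores)), key=scores.__getitem__)
```

Since the denominator is shared, the argmax over the normalized scores equals the argmax over the raw achievements; the normalization only makes the scores comparable across runs.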

Performance evaluation
In this section, we first describe the settings used in the simulation. Then, we compare the performance of the modified MA and the GA with the baseline and state-of-the-art methods.

Experimental setting
We used the CloudSim toolkit (version 3.03) to simulate the proposed method (Calheiros et al. 2011). CloudSim was developed at the University of Melbourne, Australia. Also, parts of the analysis of the results were done with SPSS software. The experiments were run on a 64-bit Intel® Core™ i5-8269U processor with 6 MB cache, 4 cores, a 4.20 GHz CPU frequency, and 8 GB RAM.
To locate data centers, we have selected four cities in the United States, with different time zones and the same network infrastructure: Hampton (in Virginia state), Houston (in Texas state), Salem (in Oregon state), and San Francisco (in California state). The number of hops for each packet to reach the destination is estimated between 12 to 14. So, the side effects of different distances in the communication network can be ignored (Khosravi et al. 2017). Each data center has 130 physical machines that are configured in 5 different types, according to Table 4.

Solar energy
We used the data reported by the PVGIS-NSRDB service on the European Commission portal (Commission 2014) concerning the solar energy of the four cities mentioned above. These data result from a joint effort between PVGIS and the National Solar Radiation Database (NSRDB) of NREL. In this study, each PhotoVoltaic (PV) solar panel, obtained from a solar cell assembly, has its modules installed in a fixed position that does not change during the day or year. Using these data, we have the amount of solar energy per day in terms of W/m². Also, we have considered the area of each flat solar absorber plate to be 2684 m²; this number is based on the configuration used in Khosravi et al. (2017). Therefore, given the energy produced by each solar panel, we can calculate the total power of a solar power plant consisting of 2648 solar panels. The solar energy specifications are shown in Fig. 10.

Outdoor temperature
We used the European Commission portal data (Commission 2014) to obtain the ambient temperature per hour. For this purpose, ambient temperature data for the above four cities were collected from May 20, 2014, to May 30, 2014. Figure 11 shows these data.

PUE and server power consumption
We use Eq. (1), described earlier, to calculate the PUE. Also, it was previously stated in Eq. (2) that power has a linear relationship with CPU utilization (Gao et al. 2013). We consider the values of P^idle_{h_l} and P^peak_{h_l} to be 162 and 215 watts, respectively. Therefore, Eq. (2) can be simplified as follows:

P_{h_l}(u, t) = 162 + 53 u_{h_l}(t)   (57)
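Equation (57) translates directly into code; only the function and constant names are our own.

```python
P_IDLE = 162.0  # watts: power drawn by an idle server
P_PEAK = 215.0  # watts: power drawn at full utilization

def server_power(utilization):
    """Linear power model of Eq. (57):
    P(u) = P_idle + (P_peak - P_idle) * u, with utilization u in [0, 1]."""
    return P_IDLE + (P_PEAK - P_IDLE) * utilization
```

The slope is P_peak - P_idle = 53 watts, so a fully loaded server draws only about a third more power than an idle one, which is why consolidating VMs onto fewer hosts saves energy.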

Carbon dioxide production rate and carbon tax
The rate of carbon dioxide emission in each data center is obtained from the website of the US Department of Energy (Administration https://www.eia.gov/electricity/state/archive/2014/) in Tons/MWh. We have also calculated the tax on carbon dioxide emissions, which is applied in some countries (Khosravi et al. 2017). The carbon dioxide tax is calculated in dollars per tonne (Dollars/Tonne). These values are shown in Table 5.

Energy cost
The cost of consumed electricity for the four cities is obtained from the US Energy Information Administration (EIA) (Administration https://www.eia.gov/electricity/state/archive/2014/). The information is shown in Table 5. We estimate the cost of solar energy to be zero. In this regard, note that installation and maintenance costs are incurred only once and are not a function of the consumed solar energy. Thus, even if more solar energy is consumed, it will not impose more costs on a cloud data center.

Workload data
The workloads produced by users arrive at VMs. We select the VMs' specifications based on what Amazon EC2 offers (Amazon https://aws.amazon.com/ec2/instance-types/). The characteristics of the applications are shown in Table 6. The workload is generated using the Lublin-Feitelson model (Workload https://www.cs.huji.ac.il/labs/parallel/workload/) and includes two types: bag-of-tasks requests (in which there is no interaction with the user) and web requests. These two types of workloads have different distributions; the only difference between the two groups is the holding time, as web requests generally take longer than bag-of-tasks requests. By modifying a few parameters of this model, we created a proportional distribution of requests to generate the two kinds of user requests. Following the approach of Khosravi et al. (2017), to make the bag-of-tasks requests, we changed the first parameter of the gamma distribution to 20.4, generating requests with a longer holding time. We also changed the holding-time distribution to Hypergamma (with a mean of 73 and a variance of 165) to generate web requests.
In this research, 30 different workloads are produced, each of which is tested separately. In this way, we can have more confidence in the accuracy of the test results.

Execution time of algorithms
For the sake of fairness, we compare our proposed method with one of the state-of-the-art methods called DDMPEA. The details of this method have already been described in Sect. 4.3. We first use this method to solve the cost optimization sub-problem of Eq. (24) and call it CRA-DP-DDMPEA. Then, we use it to solve the joint cost and scheduling optimization of Eq. (36) and name it CRA-DPSCH-DDMPEA. So far, none of the previous studies have addressed the issue of joint optimization. Similarly, we abbreviate the solution of the cost optimization subproblem by the MA with the symbol CRA-DP-MA. Also, solving the problem of joint cost and scheduling optimization is abbreviated with CRA-DPSCH-MA.
We now proceed to analyze the execution time of all the algorithms. Roughly speaking, we want to examine the extent to which the different metaheuristic methods differ significantly from the baseline algorithm, CRA-DP.
One of the prerequisites for most statistical analyses of random variables is the normality of the data. As a result, checking the normality of the data is crucial, especially when the distribution is not transparent and the sample size is small. One visual criterion for data normality is the Quantile-Quantile (Q-Q) plot. There are also many statistical tests, such as Shapiro-Wilk, D'Agostino, Anderson-Darling, and Kolmogorov-Smirnov, that can check the normality of the data. In this study, we use the Shapiro-Wilk and Kolmogorov-Smirnov methods to inspect the hypothesis of data normality (H0). Because we assume a 95% confidence level, the significance level is 0.05, and hypothesis H0 is rejected if the Sig (significance) value is less than 0.05. Table 7 shows the result of the normality test for the execution time of the algorithms. Since the Sig value in both the Shapiro-Wilk and Kolmogorov-Smirnov tests is less than 0.05 for all VMP algorithms, hypothesis H0 is rejected; in other words, the execution time data are not normal. There are two ways to continue: (1) use non-parametric tests to compare the results, or (2) try to normalize the data so that parametric methods can be used. Since parametric methods are more accurate than non-parametric ones, we chose the second way. Due to space limitations, we do not provide details on the data normalization. Table 8 shows the execution times after normalization.
The Q-Q plots for the execution time of the algorithms are shown in Figs. 12, 13, and 14. As mentioned earlier, the Q-Q plot shows how close the runtime values are to the normal distribution. Now, according to the results obtained from Table 8 and the value of the Sig parameter for each algorithm, hypothesis H0 (data normality) is accepted. Therefore, to continue the analyses, we can use any parametric method, for example, the Analysis of Variance (ANOVA) (Babazadeh Nanehkaran and Rezvani 2021). The ANOVA test hypotheses are as follows: H0: the average convergence time of all algorithms is equal (no significant difference); H1: the average convergence time is not equal for all algorithms (a significant difference). Table 9 shows the results of the ANOVA test. Since the value of the Sig is less than 0.05, hypothesis H0 is rejected; in other words, there is a significant difference among the algorithms (groups) in terms of convergence time. Apart from normality, another prerequisite for the ANOVA test is that the data variance is homogeneous, which is checked by Levene's test. Due to space limitations, we do not provide details of this test. Because the data variance is not homogeneous, the Games-Howell post hoc test can be used for the pairwise comparison of algorithms. The results of this test are shown in Table 10.
The value of Sig in all rows of the table is less than 0.05, which indicates a significant pairwise difference among algorithms. The analysis of this table is based on the upper and lower limit values with the following rules: (1) If both the upper limit and the lower limit are positive, the difference between the means of the two algorithms is greater than zero, and the mean of the first algorithm is greater than that of the second algorithm.
(2) If both the upper limit and the lower limit are negative, the difference between the means of the two algorithms is less than zero, and the mean of the first algorithm is less than that of the second algorithm.
(3) If the upper limit is positive and the lower limit is negative, the difference between the means of the two algorithms is not significant; hence, the equality of the means of the two algorithms is not rejected.
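The three interpretation rules above amount to a sign check on the confidence interval of the mean difference, which can be encoded directly (the function name is ours):

```python
def interpret_ci(lower, upper):
    """Interpret a Games-Howell confidence interval for the difference
    of two algorithm means, following rules (1)-(3) in the text."""
    if lower > 0 and upper > 0:
        return "first mean greater"      # rule (1): CI entirely above zero
    if lower < 0 and upper < 0:
        return "first mean smaller"      # rule (2): CI entirely below zero
    return "no significant difference"   # rule (3): CI straddles zero
```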
The Mean Difference column in Table 10 shows that the means of the CRA-DP-GA and CRA-DP-MA differ by 0.0256 and 0.0376, respectively, from that of the CRA-DP algorithm. Given the random nature of metaheuristic algorithms, this amount of discrepancy seems plausible.
Also, the average time to find the optimal solution is shown in Fig. 15. As can be seen in the figure, there is a significant difference between the average time of the CRA-DP and the other algorithms. The CRA-DP has the highest average execution time, followed by the CRA-DP-GA and CRA-DP-DDMPEA, respectively. We expect that as the number of data centers and physical machines increases, the time to find the optimal solution with the CRA-DP algorithm will grow significantly. The reason is that this algorithm searches most of the data centers and physical machines, which in turn makes the VMP longer, while the search space of the two metaheuristic algorithms is much more limited.
As shown in Fig. 15, the joint optimization methods (CRA-DPSCH-DDMPEA and CRA-DPSCH-MA) have a longer convergence time than the methods in which only cost optimization is performed. Note that the multi-objective optimization problem presented in Eq. (36) has more constraints than the single-objective case of Eq. (24). Also, the values obtained for each objective function in multi-objective problems are usually worse than those obtained in single-objective cases because of the tradeoff made between potentially conflicting goals. This makes exploration of the search space slower, leading to longer execution times. Although this time overhead is small, it yields impressive achievements, as explained later for Figs. 17 and 22. Also, Fig. 15 shows that the convergence time of the multi-objective methods never increases by more than 4% compared to the single-objective cases.

Energy consumption
As shown in Fig. 16, the FFD algorithm consumes more fossil fuel for VM placement than the other algorithms. This is because the FFD method places the VMs based only on the remaining capacity of the physical servers, and no other parameter is involved in this decision. In contrast, the CRA-DP-MA algorithm has the lowest brown energy consumption because it involves significant parameters such as the PUE and data center availability. One of the most critical parameters affecting fossil energy consumption is the PUE, which influences the VMP by considering both the outside temperature and the workload. The availability of renewable energy sources and the prioritization of data center power supplies also significantly impact the metaheuristic VMP algorithms. As shown in Fig. 16, there is no significant difference among the metaheuristic algorithms; in other words, the brown energy consumption of the proposed metaheuristic algorithms is not much higher than that of the baseline. The ANOVA test in Table 11 also confirms this claim: since the value of Sig = 0.586 is greater than 0.05, hypothesis H0 is accepted, and the brown energy consumption of the above algorithms is not significantly different. The figure also shows that the energy consumption of the methods that use joint cost and scheduling optimization is slightly higher than that of the single-objective methods; the general reason behind this behavior was already stated for Fig. 15. The brown energy consumption in the multi-objective scenarios never increases by more than 3% compared to the single-objective cases. As shown in Fig. 17, the FFD algorithm has the lowest energy consumption regarding renewable sources. Given that the order and availability of physical resources are the only parameters for VMP in this algorithm, such a result is reasonably expected. The interesting point in Fig. 17 is that there is very little difference between the solar energy consumption of the CRA-DP-GA and CRA-DP-MA algorithms.
The reason for this behavior lies in the neighborhood selection method of these algorithms. The proposed neighborhood algorithm has a severe limitation on choosing a neighbor for a solution; we impose this limitation to generate neighboring solutions with a higher chance. The fitness function in the CRA-DP-MA assigns the highest importance to solar energy, so if we did not apply the data center constraints, we would obtain solutions with entirely different fitness. Figure 17 shows that the green energy consumption in the multi-objective methods never increases by more than 4% compared to the single-objective cases. Also, the ANOVA test result is shown in Table 12.

Carbon footprint emission
As shown in Fig. 18, the FFD algorithm has the highest carbon dioxide production. According to the explanations provided in Sect. 5.3, and considering that fossil fuel consumption is directly related to carbon dioxide emissions, such poor performance is entirely predictable for the FFD algorithm. Similar to Fig. 16, Fig. 18 does not show a significant difference among the metaheuristic methods. Figure 18 also shows that the carbon footprint in the multi-objective methods never increases by more than 3% compared to the single-objective cases. The ANOVA test is shown in Table 13: the value of Sig = 0.543 indicates the acceptance of hypothesis H0, i.e., the amount of carbon dioxide produced by the metaheuristic algorithms is not significantly different.

Cost
According to Eqs. (3) to (4), the energy cost is the sum of the green energy (renewable) cost and the brown energy (fossil fuel) cost. As mentioned earlier, since the cost of renewable energy is low in the long run, we set it to zero. As a result, the energy cost is only a function of the consumed fossil fuel. In Fig. 19, the CRA-DP-MA, with a negligible difference, consumes less energy than the CRA-DP-GA, CRA-DP-DDMPEA, and CRA-DP algorithms; the reason for this behavior was previously described for Fig. 16. Figures 20 and 21 show the carbon cost and the total cost, respectively. According to Eq. (18), the carbon cost is a function of the rate of produced carbon dioxide and the associated tax. As shown in Fig. 20, the carbon cost incurred by the metaheuristic methods is never significantly worse than that of the baseline method, CRA-DP. Figure 20 also shows that the carbon cost in the multi-objective plans never increases by more than 4% compared to the single-objective cases, and Fig. 21 indicates that the total cost in the multi-objective methods never increases by more than 5%.

SLA violations
By an SLA violation, we mean a VM request rejected due to the lack of physical resources in the data center. We conducted a stress test on the system and varied the number of VM requests from 1000 to 1700. Figure 22 shows the percentage of SLA violations for the different algorithms. As seen in the figure, the higher the traffic intensity exerted on the system, the higher the SLA violations, which in turn degrades the QoE of users. However, the amount of degradation in the methods that use joint cost and scheduling optimization is far less than in the single-objective ones. The results show that the proposed method can reduce SLA violations by almost a factor of two under heavy traffic. This confirms our primary claim in this study about the need to pay attention to the scheduling of tasks.

Conclusion and future trends
In this paper, we targeted one of the most practical issues of cloud computing, namely carbon-efficient VM placement. To meet real-world requirements, we incorporated the delay-sensitive nature of IoT requests into modeling.
After formulating each sub-problem, we provided a joint energy-cost and scheduling optimization for the problem.
Since the problem is NP-hard, we solved it at a large scale using a modified Memetic Algorithm (MA). Then, we evaluated the performance of the proposed method against baseline methods, such as the Genetic Algorithm (GA), and state-of-the-art techniques. Solar energy was also considered as one of the energy supply sources to reduce the energy and carbon costs. Our solution minimizes the energy consumption of cloud service providers while considering the users' QoE and SLA requirements. The simulation results revealed the superiority of the proposed approaches in terms of significant criteria such as VM allocation time, energy cost, energy consumption, Power Usage Effectiveness (PUE), and SLA violations. The results showed that the proposed metaheuristic algorithms can significantly decrease the convergence time compared to the GA-based and heuristic methods without causing a significant increase in energy consumption and cost.
For the sake of fairness, a comparison was made between the multi-objective and single-objective optimization modes. As is common in optimization theory, we expect the per-objective values obtained in multi-objective optimization to be worse than those of the single-objective case. However, the amount of degradation in the various criteria was never more than 5%, which is quite promising. Also, the results showed that in heavy traffic, the proposed method can reduce SLA violations by almost a factor of two. When most tasks do not have a firm deadline, the competitive advantage of the proposed method over the base method may not be as significant; however, it will be as efficient as the base method in the worst case.
There are several possible lines of research for the future. In the MA's neighborhood search, both the outside temperature and renewable energy availability significantly impact data center utilization. This study assumed that hosts in the same geographical location are likely to have similar temperatures and solar energy. As one of the future works, the effect of these two parameters can be examined regardless of the geographical location. We also ignored the impact of communication network components; future research could consider their impact on energy consumption and the VMP cost. Machine learning methods, especially reinforcement learning, could also be used to predict future carbon-efficient placements.
Funding The authors did not receive support from any organization for the submitted work.
Data availability The datasets generated and analyzed during the current study are available in the Mendeley repository, http://data.mendeley.com/datasets/2g7dy8bnfj/1.

Declarations
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.