A novel multi-level hybrid load balancing and task scheduling algorithm for cloud computing environments

Today, cloud computing is at the heart of all information technologies. This prodigious technological paradigm relies on a very simple concept: the ability to deliver hardware and software resources as services directly over the internet. A set of mechanisms cooperate to maintain the cloud's reliability and to allow the continuous delivery of these services while guaranteeing the same quality of service (QoS) and respecting the service-level agreement (SLA) of each client. Load balancing is one of those mechanisms and provides a crucial service: it can be defined as the ability of the system to ensure fairness in the distribution of the workload over all servers. The most recent load balancing techniques are hybrid methods, in most cases combining static and dynamic approaches; in other cases they go further by integrating other mechanisms in order to improve the overall efficiency of the system. Performance is evaluated by parameters which generally refer to the degree of compliance with the SLA and QoS. In order to enhance load balancing and task scheduling in cloud environments, we propose in this paper a different hybrid approach which decomposes the problem and operates on two levels, going through two stages: (i) first, clusters are built for each datacenter, grouping together sub-sets of servers that have close utilization rates; (ii) then, task scheduling and load balancing operate at the datacenter level to deal with the distribution over clusters, and at the cluster level to ensure fairness between the servers of the same cluster. Our method allows hot deployment in already operating cloud environments and excellent scalability. It also offers a decoupling of missions and strong interoperability between the different mechanisms.
To prove its validity, we implemented it on the standard CloudSim Plus simulator before carrying out a comparative study, which shows better results than existing approaches in terms of makespan, reaction time, number of required migrations and SLA violations.


Introduction
Cloud computing has been a game-changing technology since its emergence nearly a decade ago; the key feature behind this revolutionary paradigm is its ability to provide resources such as hardware or software as services over the internet to individuals and companies. It offers a number of advantages such as elasticity, pay-as-you-go, multi-tenancy and so on [1]. It is worth pointing out that it must sustainably guarantee users' trust by maintaining availability, reliability and scalability. If we take a closer look at how this technology works, we realize that behind the simplicity of the use and service delivery models lies a set of extremely complex mechanisms inter-operating to ensure optimal functioning [2].
Our work focuses on the load balancing (LB) module, one of these mechanisms, which is in charge of maintaining a fair workload distribution between servers and virtual machines (VMs). The role of this component is crucial to the operation of cloud services and to meeting the service-level agreement (SLA). Guaranteeing an optimal use of hardware resources and a fair distribution of the workload helps increase the global performance of the system by reducing the makespan of useful jobs [3]. Many propositions have been made to meet the expectations of cloud service providers in terms of load balancing. When studying the literature, we understand that these solutions can be categorized according to two criteria impacting the manner in which the workload distribution is made: (i) the information available on the environment, tasks and resources, and (ii) the step during which the balancing occurs. Indeed, a first category, to which we refer as static, acts only at the reception of new tasks and decides how to assign them according to a set of non-evolving information such as task lengths and servers' physical capabilities. A second category regroups dynamic approaches, which are continuously operational over time, incorporating information on the current workload of each server and the individual makespan of virtual machines, and using primitives such as task or VM migration to maintain an optimal utilization rate of all servers. A last category regroups all approaches that combine static and/or dynamic approaches together, or with other complementary mechanisms, to enhance the performance of load balancing and mitigate common shortcomings [4] [5]. This is the way we build our proposal, by integrating complementary algorithms hybridizing load balancing and task scheduling.
There are several ways to approach the load balancing and task scheduling problem. We commonly find solutions formulating it as a bin packing problem, a clustering problem, or even a path finding problem. Independently of the formal modeling, we can describe the elements that constitute the problem we are addressing as follows: given a set of tasks and a set of resources grouped into virtual machines and physical hosts, decide for each task to which virtual machine it should be assigned, and which physical server should host which virtual machine. It is important to keep in mind that each server is resource-limited in terms of CPU, RAM, storage and bandwidth [6]. Once assignment has been achieved, we need to monitor the evolution of the use of these same resources and the makespan of servers and VMs to ensure fairness in the distribution of workloads and enable an increase in datacenter performance by reducing metrics such as waiting, execution, control and migration times.
The aim of this paper is to present a novel hybrid algorithm to ensure task scheduling and load balancing in cloud environments. First of all, the choice of this combination comes from the fact that a good workload balance starts with the optimization of the task assignment procedure. Our algorithm goes through three main stages: (i) server clustering: a k-means based procedure is first triggered to partition servers with similar occupation characteristics into a set of clusters of bounded size. (ii) Task assignment: realized in two phases, first using a round-robin algorithm to choose the cluster to which a set of tasks will be assigned, then using a genetic-based algorithm to select, inside this cluster, the servers on which they will be scheduled. (iii) Load balancing: also requiring two steps, the algorithm decides which cluster to unload and which servers exactly. Once done, the cloudlets are retrieved and sent (migrated) to the global task scheduler module to be planned again.
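The clustering stage can be sketched as follows. This is a minimal, illustrative sketch only: it assumes one-dimensional utilization rates and a fixed k, uses randomly generated server loads, and omits the bounded cluster size enforced by the full procedure.

```python
import random

random.seed(42)

def kmeans_1d(loads, k, iters=20):
    """Minimal 1-D k-means: partition servers into k clusters by
    utilization rate. `loads` maps server id -> utilization in [0, 1]."""
    ids = list(loads)
    centers = [loads[s] for s in random.sample(ids, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in ids:
            # assign each server to the nearest cluster center
            c = min(range(k), key=lambda j: abs(loads[s] - centers[j]))
            clusters[c].append(s)
        for j, members in enumerate(clusters):
            if members:  # recompute each center as the mean utilization
                centers[j] = sum(loads[s] for s in members) / len(members)
    return clusters

# hypothetical utilization rates for 12 servers
servers = {f"s{i}": random.random() for i in range(12)}
clusters = kmeans_1d(servers, k=3)
```

Servers with close utilization rates end up in the same cluster, which is what allows the later stages to treat each cluster as a roughly homogeneous pool.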
To realize this, our architectural model embeds the following components. First, we introduce a new module called the cluster manager, which ensures the primitives related to creating and updating clusters. Then we use a classical task scheduler at the datacenter level, qualified as global, to which we add a local monitor in each cluster. Finally, we propose the same organization for the load balancer, with a central module and a local probe in each cluster.
The method we propose is based on realistic assumptions, using particular configurations of already existing architectural components. The main contributions of our work are: • A hot-deployment enabled algorithm, which can be deployed in already operational cloud environments.
• A highly scalable algorithm: thanks to the strategy of grouping the servers into clusters and operating at two levels, it keeps the same performance even when the number of servers and cloudlets in the datacenter is considerably increased.
• A very strong interoperability and complementarity that avoids interference between the missions of the cluster management, load balancing and task scheduling mechanisms, as well as redundancy of actions between them. This decreases the non-useful delays related to cloud management and allows a considerable reduction in the number of SLA violations.
To prove the validity of our approach, we implemented it using the standard cloud simulator CloudSim Plus. The results are very promising. Indeed, our approach outperforms the most recent and relevant works. For example, we achieved an SLA violation rate close to 8%, compared to an average of 18% for [7]. We also achieved a reduction in the makespan per server, and reduced the proportion of required migrations to 12%, compared with ratios varying from 19% to 35% depending on the proposal.
The rest of this paper is organized as follows: core concepts of cloud computing and a literature review of existing load balancing solutions are given in section 2. Section 3 is dedicated to our proposition: we start by formulating the problem statement, then introduce our architectural model with its related assumptions, and finally depict our method and give the corresponding algorithms. Results of the realized simulations are given and discussed in section 4. Section 5 concludes our work with a last overview and opens perspectives for future works.

Related works
In this section we first review preliminary elements of cloud computing, its core concepts and service delivery models. We then summarize, by category, the works on load balancing that we consider the most pertinent and recent.
Cloud computing has been a game-changing technology since its emergence nearly a decade ago; the key feature behind this revolutionary paradigm is its ability to provide resources as services directly over the internet to individuals and companies. The resource can be hardware, software, or even take another form, and the model offers a number of advantages such as elasticity, pay-as-you-go, multi-tenancy and so on [1] [8].
Cloud providers deliver services in several standardized manners, among which we retain [9]: • IaaS (Infrastructure-as-a-Service): hardware is delivered to the client, who is in charge of installing all the stack components over the material: operating systems, middlewares, runtime environments and applications. The cloud provider is only responsible for the management of the hardware part.
• PaaS (Platform-as-a-Service): the responsibility of the supplier is shifted a little higher in the stack; the supplier is in charge of the installation and management of operating systems, middlewares and all required execution environments.
• SaaS (Software-as-a-Service): in this pattern the client interacts with cloud services via a GUI (graphical user interface); all required services are delivered as a ready-to-use application and the provider is responsible for the entire stack.
After reviewing the different possible architectures and organizations for the operation of the cloud, we came to the conclusion that, regardless of the model, the architectural elements can be placed in one of the three levels shown in Figure 1: • Requests handler: covers a set of components in charge of collecting requests from clients and retrieving related information such as task length, priority, deadline, required data and so on.
• Data-center controller: plays an orchestration role. On the one hand it receives information from the requests handler on the tasks to schedule; on the other hand the resources manager provisions it with the available resources. The controller then applies a scheduling strategy to decide which tasks to assign to which VM, and which host will receive which VM.
• Resources manager: in charge of monitoring the states and utilization rates of hosts and virtual machines; it provides crucial information on which the controller relies to schedule tasks and to perform load balancing.

Load balancing (LB) is a critical module for the good functioning of a cloud environment; the term describes the techniques in charge of distributing the workload over the datacenter servers. In other words, it is the method used to keep the resource utilization of servers in equilibrium and to avoid over-loading or under-loading some of them. Load balancing can be achieved at one of two levels: (i) the virtual machine (VM) level or (ii) the host level. In the first case, the load balancing algorithm deals with the workload on virtual machines, managing task distribution and migration over virtual machines to maintain good workload partition conditions. In the second case, it manages the distribution of virtual machines over physical servers.
Many approaches have been proposed in the literature to perform load balancing in cloud environments; we give insights into the most important ones in subsections 2.1 to 2.3. However, it is critical to first understand the categorization used. Indeed, a frequently encountered classification of algorithms involves three main classes, depicted as follows [4]: • Static approaches: this category relies on prior information on jobs and on servers'/virtual machines' capabilities to decide on the task assignment policy.
• Dynamic approaches: in contrast to static methods, dynamic algorithms integrate real-time information such as the workload on resources and the utilization rate to decide on the task assignment strategy.
• Hybrid approaches: hybrid approaches are obtained by mixing static and dynamic approaches in order to overcome the shortcomings of each. Moreover, many researchers go further and hybridize load balancing techniques with fault tolerance or task scheduling mechanisms.
Other modules involved in the smooth running of jobs on the cloud are intrinsically linked to load balancing and cannot operate without close cooperation with it.
The first mechanism is task scheduling, which, considering a set of constraints, decides for a set of tasks on which resource they should ideally be executed. This can be achieved in a static or dynamic way, depending on whether the scheduler relies on prior information on tasks and resources to take its decisions, or continuously monitors node behavior to decide which node is more suitable for a specific kind of job. It can be preemptive, so that tasks can be interrupted at run time, or not; it can operate in an online or offline manner, according to whether tasks are directly planned on resources or first grouped in batches [10].
The second mechanism is fault tolerance, which measures the capacity of a system to recover after a failure. A failure can be explained as a succession of undesirable actions which lead the system to an unsuitable or non-specification-conforming state. Two main families of fault tolerance approaches can be introduced: (i) proactive: the technical effort is focused on ways to anticipate the failure and reduce its impact on the overall system; (ii) reactive: means are concentrated on methods that make it possible to recover quickly after the fault occurs and to bring the system back to the last known coherent state. Many well-known approaches such as hardware redundancy, job replication, checkpoint-and-restart and so on are utilized to ensure reliability, availability and integrity in cloud environments [11].

Static load balancing methods
Static load balancing is a class of algorithms that allocate tasks to the different resources without considering their current states. Indeed, depending on the applied policy, the algorithm distributes new jobs equally or randomly over all processing units regardless of their actual workload [12].
Among the static load balancing methods we can cite min-min algorithms, whose principle is to evaluate beforehand the execution time of each task and to find the one with the shortest duration. Once done, the algorithm locates the resource with the minimum completion time for this task and assigns it. Min-min approaches repeat these steps until all jobs are scheduled. The major shortcoming these methods suffer from is that when the number of short tasks exceeds the number of long ones, the allocation of resources is not optimal. Another category is max-min, which overcomes this shortage by scheduling larger tasks first, but it penalizes short ones and increases their waiting time [4].
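The min-min heuristic can be sketched as follows; the task lengths (in millions of instructions) and VM capacities (in MIPS) are hypothetical values chosen for illustration.

```python
def min_min(task_lengths, vm_mips):
    """Min-min scheduling: repeatedly pick the pending task with the
    smallest achievable completion time and assign it to the VM that
    finishes it earliest."""
    ready = {vm: 0.0 for vm in vm_mips}   # current finish time per VM
    schedule = {}                         # task -> assigned VM
    pending = dict(task_lengths)
    while pending:
        best_task, best_vm, best_ct = None, None, float("inf")
        for t, length in pending.items():
            for vm, mips in vm_mips.items():
                ct = ready[vm] + length / mips  # completion time on this VM
                if ct < best_ct:
                    best_task, best_vm, best_ct = t, vm, ct
        schedule[best_task] = best_vm
        ready[best_vm] = best_ct
        del pending[best_task]
    return schedule, max(ready.values())  # assignment and makespan

sched, makespan = min_min({"t1": 4000, "t2": 1000, "t3": 6000},
                          {"vm1": 1000, "vm2": 500})
```

On this toy input every task ends up on the faster vm1 while vm2 stays idle, which illustrates exactly the sub-optimal allocation described above when fast resources monopolize the short tasks.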
Many improvements have been proposed to make these approaches perform better load balancing and reduce the makespan. For example, in [13] Kokilavani et al. proposed to start their algorithm with a min-min phase, which provides a fast start by sending the shortest tasks to be executed on the most efficient resources. In a second step, the algorithm checks each resource's makespan, retrieves tasks from those with a heavy load and reassigns them to resources with a short makespan. The principle of the LBMM approach is simple, but they showed that it allows a reduction of the global execution time and a better distribution of jobs.
Another group of researchers goes further in improving the min-min approach by proposing a new algorithm considering three key constraints of the cloud environment: quality of service, task priority and cost of service. Their solution also begins with a min-min step in which short tasks have a higher initial priority; they then rebalance the load by weighting these priorities with the three constraints (expressed as numerical values) to produce dynamic priorities that order all the jobs over the resources [14].
We can also find in the literature approaches oriented towards particular use cases, or based on meta-heuristics inspired by nature. An example that embodies both is the proposition made by Zhan et al. [15], who use discrete particle swarm optimization (PSO) to build a static load balancing algorithm which ensures task distribution in a cloud environment. They proposed an adaptation of the functions updating the personal and global bests on the one hand and the velocity on the other hand; this makes PSO perform better on this particular discrete problem by avoiding being trapped in local optima.
As powerful as they may be, these techniques have the disadvantage of not being able to adapt to increasingly dynamic cloud environments. Static approaches schedule tasks at reception and rely on a logic independent of the real-time workload distribution; this does not allow them to revise the load distribution fluidly when the utilization of the exploited resources evolves. In other words, static load balancing performs well in cloud environments with reduced workload variability, which is not always the case given that there are peak periods.

Dynamic load balancing approaches
Dynamic load balancing techniques are the set of algorithms that consider real-time information about the utilization rate and the remaining makespan of each server before deciding how to assign new jobs and how to migrate already scheduled ones. These approaches can be separated into two main categories according to their calculation mode: on-line or off-line. In the first family of algorithms, tasks are assigned as they arrive in the system, unlike the off-line mode, which works by batches, since tasks are grouped and then taken at predefined times [16].
A category of methods stands out as an ideal candidate to face the dynamic load balancing challenge: nature-inspired meta-heuristics. By going through several stages of adaptation, many researchers have succeeded in making them a dominant class of approaches for dynamic LB. It is worth pointing out that before proceeding to any improvement of this kind of algorithm, it is necessary to define an adequate mapping between the parameters of the algorithm and the cloud environment on the one hand, and to find a way to define novel search functions on the other hand [17]. For example, the authors of [18] proposed an enhancement of the bee colony optimization algorithm to realize dynamic load balancing. This approach incorporates as constraints simultaneously avoiding the overloading and under-loading of virtual machines, and reducing the makespan and the number of migration operations. The key idea behind the proposed improvement is to use the standard deviation of the processing time on each VM as input to the load balancing model, where a threshold separates the VMs into two groups: overloaded VMs modeled as honeybees and under-loaded VMs modeled as food sources. As the approach is dynamic, the deviation values are updated each time a new task is received.
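The threshold-based split used in [18] can be illustrated by the following sketch; the mean-plus-one-standard-deviation threshold and the sample processing times are our own illustrative assumptions, not the exact formulation of the original paper.

```python
from statistics import mean, pstdev

def split_vms(processing_times, k=1.0):
    """Split VMs into overloaded ('honeybee') and under-loaded ('food')
    groups, using mean + k * standard deviation of per-VM processing
    time as the threshold (illustrative choice of threshold)."""
    values = processing_times.values()
    threshold = mean(values) + k * pstdev(values)
    over = [v for v, t in processing_times.items() if t > threshold]
    under = [v for v, t in processing_times.items() if t <= threshold]
    return over, under

# hypothetical per-VM processing times (seconds)
over, under = split_vms({"vm1": 2.0, "vm2": 2.5, "vm3": 9.0, "vm4": 2.2})
```

In a dynamic setting the two lists would be recomputed whenever a new task arrives, as the deviation values change.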
For their part, Seyedeh et al. [7] combined two meta-heuristic approaches to better fit cloud SLA requirements. They first use a firefly-based algorithm to generate an initial population of possible task/resource assignments, then optimize it using an imperialist competitive algorithm (ICA). First, two instances of the firefly algorithm are executed separately to find two assignments: one achieving the best makespan and the other the best load balance. The outputs of these heuristics are aggregated into a multi-objective function for the ICA algorithm, which incorporates the two constraints and attempts to produce a workload balance ensuring a makespan as small as possible.
Another approach using a bio-inspired meta-heuristic is proposed in [19], where the authors treat load balancing as a clustering problem: regrouping sets of virtual machines on physical servers with specified CPU and memory capacities. Clusters are first built by randomly placing VMs according to a feasible distribution, then updated based on their respective workloads to maximize the remaining resources on each server and to remove the workload from weakly utilized ones. The authors applied the bat algorithm to optimize the global and local searches for new cluster centers and to speed up convergence. This means reviewing the set of VMs and physical servers that make up each cluster in order to dynamically maintain equity.
Many other works have tried to create variants that improve a subset of the performance criteria. This objective can be achieved by eliminating or adding constraints depending on the service-level agreement and the type of service provided, by modifying the hyper-parameters of known models, or by proposing new fitness functions for the meta-heuristics. Thus, Dalia et al. [20] add constraints not commonly considered by other researchers to their algorithm. They add complexity by dealing with the simultaneous arrival of requests; the tasks they consider have priorities, and a deadline is assumed for each job according to the relevant service-level agreement. The distinguishing point of this approach is that if the workload on a server does not meet the requirements for the correct execution of a given task, the task is migrated to another server. Therefore, in order to realize an efficient workload distribution, the authors integrated load balancing and task scheduling within the same algorithm. Another example is given in [21], where the authors provide a variant of the genetic algorithm which deals with the performance degradation of VMs during migration and its impact on task execution.
The authors of [22] dived deeper into key constraint integration by adding elasticity to their model. Their architecture supports proactive horizontal hardware scale-up. Indeed, after assigning tasks, a component of the resource broker monitors activity on the servers, estimates whether there are scheduled tasks that will exceed their deadlines, and decides to create new virtual machines to balance the workload.
It is worth pointing out that even if these meta-heuristics form the backbone around which most cloud load balancing algorithms are built, other methods exist that approach this problem through a different modeling or statement and take advantage of the power of several different mathematical techniques.
We often meet in the literature fuzzy-based approaches to achieve task scheduling and load balancing. For instance, the authors of [23] proposed a fuzzy-based algorithm for multidimensional resource planning focusing on a file sharing service in a cloud environment. The approach operates in three stages: (i) collecting requests from users; (ii) using trapezoidal fuzzification and fuzzy square inference to achieve multidimensional resource scheduling; (iii) designing a queuing network for the assigned tasks and resources.
We can also find methods relying on machine and deep learning, such as the one proposed by Zhao et al. [24], in which a Q-learning approach is combined with a neural network. They first represent the scheduling plan by a directed acyclic graph in which each node is described by a quaternion composed of a specific task, its execution cost and its communication cost, with edges representing the relations to successor tasks. Before planning a workflow by distributing its jobs on virtual machines, the dynamic scheduler calls the algorithm to evaluate the execution scenario, modelled in the form of a graph, and applies a reward function which helps decision making by emitting an action to be applied by the scheduler. Another hybridization with a meta-heuristic method is proposed by Jena et al. [25], where a particle swarm algorithm is combined with a Q-learning approach used to adjust the velocity of the particles and the global bests to achieve quicker convergence toward an optimal load balancing solution.
Dynamic approaches are responsive to the evolution of constraints in real time; they maintain an equitable distribution of the workload over time and balance the load flow as it occurs by taking into consideration the resource usage on each host. Their main disadvantage compared to static methods is that they generate some latency at the start of the LB process and impose a lower bound on reaction times, even with few and lightweight tasks.

Hybrid load balancing algorithms
Hybrid load balancing is a combination of static and dynamic approaches. It leverages the strengths of each category to cover the gaps of the other: static methods allow an initial quick distribution of tasks, while dynamic algorithms maintain optimal workload balancing over time. Hybridization is not limited to this combination and can also be achieved by integration with other mechanisms such as fault tolerance or dynamic task scheduling.
Bio-inspired meta-heuristics also form a cornerstone in the edifice of hybrid approaches. Indeed, the hybrid LB literature is rich in examples, such as the proposition made by Marwa et al. [26], where the authors combined the swarm intelligence of bee and ant colonies to build an osmotic hybrid optimization load balancing algorithm. After an initial random distribution of jobs, an artificial bee colony is used to quickly find over-loaded and under-loaded servers; then an ant colony is used to find the optimal VM migration scheme among the osmotic servers.
In another proposal [27], a group of researchers opted for the combination of ant colonies with fuzzy models. While they also use ACO to find the suitable migration pattern, they integrate a fuzzy module to evaluate the quality of the returned solution. More precisely, the fuzzy part is used to update the pheromone traces and thus accelerate the convergence towards an optimal solution.
In [28] the authors proposed to combine a genetic algorithm and the gravitational search method to enhance the searching procedure and reduce the computing cost. The improvement comes from a hybrid method for calculating the particles' positions at each step, made by using the crossover technique combined with the gravitational constant function. In [29] the authors combined a queuing model for managing virtual machines with a crow search based approach to improve task placement and reduce both time wastage and energy consumption.
Other researchers go further by hybridizing load balancing with proactive fault tolerance mechanisms. In [30] the authors proposed to combine several reactive and proactive fault tolerance techniques with an accelerated decision-making procedure to ensure quick recovery. The heart of the proposed method is a dynamic scheduling approach which inserts replication as a fundamental constraint. Haoran et al. [31] proposed a similar combined approach by embedding fault tolerance and task scheduling into their model, which improves load balancing. The authors focus on hybrid real-time tasks, which they divided into data-intensive, process-intensive and balanced tasks; doing the same with the virtual machines increases the chances of improving system performance and facilitates the work of the scheduler. They went further by mixing checkpoint and primary-backup techniques to generate the recovery policy and the corresponding task description. Finally, they added the resulting task list (including redundancy) as constraints for the scheduler.
Huaiying et al. [32] also proposed an approach to ensure quality of service in the edge-cloud by combining fault tolerance with task planning in order to maintain an equitable load balance. The authors enhance the classical primary/backup fault tolerance approach by incorporating QoS constraints such as time-based ones; the primary and copy tasks are then scheduled with a dedicated method integrating an adjustment procedure that guarantees the placement of copies in such a manner as to reduce both the recovery time in case of failure and the overlapping during normal operation. A last example of this kind of hybridization is given in [33]: apart from the planning aspect, the authors proposed a solution for monitoring activities on the virtual machines, which form logical clusters over the physical hosts. They used metrics built from the previous performance of each server that allow the system to anticipate, early on, deviations and behaviors that do not conform to specifications, enabling it to quickly resume from the last consistent checkpoint.
We have not provided an exhaustive list of the existing hybrid proposals, and it must be taken into account that many other works exist. A last example is given in [34], where the authors rely on a machine learning technique to enhance resource utilization; they deal with horizontal and vertical load balancing. An agent is trained using a custom reinforcement learning approach and is rewarded according to the desirability of the selected action, which can be assigning a task to a specific virtual machine, migrating to another host, and so on. Other works deviate from mainstream applications and focus on specific contexts, such as [35], which ensures dynamic resource provisioning for a specific usage in the treatment of meteorological intensive data flows.
As we have shown, hybrid approaches cover the shortcomings of isolated static or dynamic techniques; they have the advantage of being more reactive, managing more constraints and being faster in the distribution of tasks. They nevertheless suffer from the drawback of being complex to implement.

Synthesis
Load balancing plays a crucial role in keeping cloud environments running smoothly. Its performance has a direct impact on quality of service (QoS) and the degree of compliance with the service-level agreements (SLA) signed with customers. Numerous approaches have been proposed since the advent of the cloud to meet this need. Firstly, static approaches were proposed, offering a rapid search for a balancing solution based on invariable information. Unfortunately, as they were unable to adapt to changes in the environment and to load variability, they were gradually replaced by dynamic approaches. The latter rely on heuristics and attempt to find optimal solutions by integrating variable constraints that were previously ignored. Although dynamic approaches are effective, they are slow to execute. This has led to the emergence of hybrid approaches, combining static methods for speed in finding a feasible solution and dynamic methods for optimizing these solutions. Hybrid approaches are widely used today, but suffer from complexity in the implementation phase.

Our proposal
We discussed in the previous section the existing solutions for achieving load balancing in cloud environments. We now introduce our novel method: we first formalize the problem statement in 3.1, then expose the architectural model on which our proposal relies and its related assumptions in 3.2 and 3.3 respectively. Finally, we give an overview of the solution in 3.4 before diving deeper and depicting it in the algorithms within 3.5.

Problem statement
The main purpose of a hybrid tasks scheduling and load balancing algorithm is first to decide how to assign a task to a specific virtual machine running on a particular server, then to maintain a workload distribution that eliminates overloaded and under-loaded servers and tries to bring them all within a moderate and fair usage level. In this section we formalize the problem we face and introduce the formulas used.
Let us assume that each datacenter is composed of a set $S$ of $N$ servers, $S = \{S_1, S_2, \dots, S_N\}$, to which corresponds a set of resources such that:

$R_i(t) = \{R_i^{cpu}(t), R_i^{ram}(t), R_i^{bw}(t), R_i^{str}(t)\}$ (1)

where $R_i^{resource}(t)$ gives the remaining level of a resource of server $i$ at instant $t$. We also assume that on each server evolves a set $VM$ of $M$ virtual machines such that $VM = \{vm_1, vm_2, \dots, vm_M\}$. Each virtual machine lives on a physical server and has dedicated resources; for example, the first VM assigned to the first host, noted $vm_{11}$, has as remaining resources at instant $t$:

$R_{11}(t) = \{R_{11}^{cpu}(t), R_{11}^{ram}(t), R_{11}^{bw}(t), R_{11}^{str}(t)\}$

In order to formally express the problem, we now introduce the most important parameters used in our method and their related equations. They can be grouped into two main categories: (i) time-based parameters, given in equations 2 to 8, and (ii) load-based parameters, given in equations 9 to 14. Equation 2 gives the computation power of a host or a virtual machine $i$:

$C_i = \sum_{p.u \in P.U} sizeof(p.u)$ (2)

where $P.U$ is the set of allocated processing units, $|P.U|$ is its cardinality and $sizeof(p.u)$ is the individual capacity of a processing unit in millions of instructions per second (MIPS).

The execution time of a task $j$ on a specific virtual machine $i$ is given by equation 3:

$ET_{ji} = length(task_j) / C_i$ (3)

Equation 4 gives the full completion time of a task on a virtual machine:

$CT_{ji} = ET_{ji} + WT_{ji}$ (4)

where $WT_{ji}$ is the waiting time of task $j$ on virtual machine $i$ and is given by equation 5:

$WT_{ji} = \sum_{k=1}^{j-1} ET_{ki}$ (5)

i.e., the waiting time of a task on a specific virtual machine is the cumulated execution time of all preceding tasks (equivalently, the completion time of the task immediately preceding it). We can now define the makespan as the main parameter measuring the entire completion time of all tasks. It is given at the virtual machine, host and cluster levels by equations 6, 7 and 8 respectively:

$makespan(vm_{ij}) = \max_k\{CT_{kij}\}$ (6)

$makespan(server_j) = \max_i\{makespan(vm_{ij})\}$ (7)

$makespan(cluster_i) = \max_j\{makespan(server_{ji})\}$ (8)

Having covered the time-based parameters, we move on to define the most important load-related parameters. The first estimates the load on the processing units and is given by equation 9:

$L^{cpu}_{vm}(t) = L^{cpu}_{vm}(t-1) + \sum_{j \in new(t)} l^{cpu}_j - \sum_{j \in finished(t)} l^{cpu}_j$ (9)

In the same manner we obtain the loads on the RAM and the storage at a particular timestamp by equations 10 and 11 respectively:

$L^{ram}_{vm}(t) = L^{ram}_{vm}(t-1) + \sum_{j \in new(t)} l^{ram}_j - \sum_{j \in finished(t)} l^{ram}_j$ (10)

$L^{str}_{vm}(t) = L^{str}_{vm}(t-1) + \sum_{j \in new(t)} l^{str}_j - \sum_{j \in finished(t)} l^{str}_j$ (11)

This estimation is obvious: on a particular virtual machine, at a specific timestamp, the load corresponds to the already present workload, to which we add the load induced by newly assigned tasks and from which we deduct the load of finished ones. The utilization of bandwidth on a virtual machine is obtained by summing the amounts of the data streams generated by the active tasks:

$L^{bw}_{vm}(t) = \sum_{j \in active(t)} stream_j$ (12)

The global load score on a virtual machine is then given by equation 13:

$L_{vm}(t) = \alpha L^{cpu}_{vm}(t) + \beta L^{ram}_{vm}(t) + \gamma L^{bw}_{vm}(t) + \sigma L^{str}_{vm}(t)$ (13)

where $\alpha$, $\beta$, $\gamma$ and $\sigma$ are pondering and normalizing factors. We can then calculate the load of a particular server $i$ by equation 14:

$L_{S_i}(t) = \sum_{vm \in S_i} L_{vm}(t)$ (14)

These formulas are common ones, and we will make use of them in our method to build clusters, to define a task scheduling strategy based on a genetic algorithm and to improve load balancing within the cloud environment.
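To make these definitions concrete, the following is a small illustrative sketch (not the authors' code) that evaluates the time-based and load-based parameters; the function names and the weight values are our own assumptions.

```python
# Illustrative sketch of the parameters of Section 3.1.
# Names (execution_time, makespan, load_score, ...) are our own.

def execution_time(task_length_mi, vm_mips):
    """Eq. 3: ET = task length (MI) / VM computing capacity (MIPS)."""
    return task_length_mi / vm_mips

def makespan(queues):
    """Eqs. 6-8: the makespan is the largest total completion time over
    the queues. `queues` is a list of (vm_mips, [task lengths]) pairs."""
    return max(sum(execution_time(t, mips) for t in tasks)
               for mips, tasks in queues)

def load_score(cpu, ram, bw, storage,
               alpha=0.4, beta=0.3, gamma=0.2, sigma=0.1):
    """Eq. 13: weighted sum of the normalized per-resource loads.
    The weight values here are arbitrary examples."""
    return alpha * cpu + beta * ram + gamma * bw + sigma * storage

# A VM with 2 cores x 5000 MIPS running tasks of 10000 and 20000 MI:
vm = [(10000, [10000, 20000])]
print(makespan(vm))                              # 3.0 (seconds)
print(round(load_score(0.5, 0.5, 0.5, 0.5), 3))  # 0.5
```

With homogeneous weights summing to one, the score stays within [0, 1], which makes the per-server aggregation of equation 14 directly comparable across clusters.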

Architectural model
Our method is built upon the architectural organization shown in Figure 2, which is similar to standard ones (see Figure 1). Our method relies on hybridizing tasks scheduling and load balancing, and it incorporates a new module called the clusters manager, which is crucial to the purpose of this paper. We deliberately omit other components, such as energy efficiency modules.
We will refer to mechanisms operating at datacenter level as level-2 and to those operating at cluster level as level-1. The major components/modules involved in our solution can be described as follows:
• Requests handler (RH): builds the interaction interface with end users. It retrieves important information relative to each request, like length and deadline. Each request is analyzed, modeled and transmitted to the datacenter broker in the form of a set of tasks.
• Datacenter broker (Broker): the central module responsible for coordinating all other functional components; it receives information, controls consistency, synchronizes and transmits commands to other modules.
• Clusters manager (CM): in charge of organizing clusters. In other words, it builds and maintains clusters of servers inside a datacenter while relying on the following criteria and primitives:
-Cluster size: a dynamic parameter; for the first iteration we fix it to the ratio N/50, where N is the total number of servers in the datacenter. It can take any integer value, and experiments are performed to determine its optimal value.
-Server utilization information: to be able to set up clusters, the module is fed with information on the utilization rates of the resources of each server, mainly: CPU, RAM, bandwidth and storage.
-Fusion and fission primitives: dynamic thresholds determine when a subset of a cluster should leave it (fission) and create another autonomous cluster. The same mechanism determines when and which clusters should fuse to build a larger cluster.
• Cluster monitor: within its assigned cluster, it is responsible for monitoring the evolution of the resource utilization of the servers. It is recommended to deploy it as a virtual machine in each cluster (the same recommendation holds for the other local components), since clusters are dynamic and change over time. It continuously collects data from the servers, aggregates them into statistical metrics and transmits the deciding information on which the clusters manager relies to trigger the fusion and fission primitives.
• Tasks scheduler: corresponds to the classic module in charge of assigning tasks to the VMs hosted by the servers; however, in our model it acts on two levels.
-Level-2 global tasks scheduler (GTS): decides which of the least loaded clusters will receive the incoming tasks.
-Level-1 local tasks scheduler (LTS): is responsible within a cluster for determining which server and virtual machines will run the tasks assigned by the GTS.
• Load balancer: responsible for keeping a fair workload distribution among the servers of the cloud environment; it is also updated so that it operates on two levels for our purpose:
-Level-2 global load balancer (GLB): decides at datacenter level which cluster (among overloaded ones) should be relieved of workload and to which cluster (among under-loaded ones) the tasks should be migrated.
-Level-1 local load balancer (LLB): selects source servers from the origin cluster for the cloudlets migration process. Sink servers within the destination cluster are selected by the local tasks scheduler.
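As a structural illustration of this two-level split, here is a minimal sketch of our own (not the paper's implementation): a global scheduler picks a cluster and delegates the intra-cluster placement to a local scheduler. The genetic algorithm of the local scheduler is replaced here by a trivial shortest-queue rule for brevity.

```python
# Hypothetical sketch of the level-2 / level-1 module split.

class LocalTaskScheduler:
    """Level-1: places tasks on servers inside one cluster."""
    def schedule(self, cluster, tasks):
        # in the paper this is a genetic algorithm; here: shortest queue
        for t in tasks:
            server = min(cluster, key=lambda s: s["queue"])
            server["queue"] += t

class GlobalTaskScheduler:
    """Level-2: round-robin over clusters, then delegates locally."""
    def __init__(self, clusters):
        self.clusters, self.next = clusters, 0
        self.local = LocalTaskScheduler()
    def dispatch(self, batch):
        cluster = self.clusters[self.next % len(self.clusters)]
        self.next += 1
        self.local.schedule(cluster, batch)

clusters = [[{"queue": 0}, {"queue": 0}], [{"queue": 0}]]
gts = GlobalTaskScheduler(clusters)
gts.dispatch([5, 3]); gts.dispatch([4])
print([sorted(s["queue"] for s in c) for c in clusters])  # [[3, 5], [4]]
```

The point of the split is that the global module never inspects individual servers: it only ranks clusters, which keeps the level-2 decision cheap even in large datacenters.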

Assumptions
In order to avoid confusion, it is important to state the assumptions upon which our approach is built:
• Load balancing is realized at cloudlet level; therefore, once a cloudlet is selected for migration it is forwarded to the global tasks scheduler to be planned again.
• A workload is already present in the datacenter before the deployment of our algorithm and is randomly distributed over the servers. A major advantage of our approach is that it ensures hot-deployment in already operational datacenters. However, our framework can also be deployed in new datacenters without workload.
• A task is executed to completion without interruption and randomly utilizes virtual machine resources according to a particular model (selected per simulation scenario).
• A virtual machine is destroyed if it finishes executing the tasks in its queue before the scheduler has assigned it other tasks (there are no fully unoccupied virtual machines).

Overview
Our approach is designed in such a manner as to reduce complexity and delays in tasks scheduling and load balancing operations. To meet this objective, we propose to divide the datacenter into a set of clusters partitioned into four categories according to the makespan and utilization rate of the servers composing them.
There are two ways to give an overall view of our solution. The first focuses on the functional model and can be broken down as follows:
1. Clustering: in a first phase, we use a k-means-based clustering procedure to divide the servers into four major categories according to their respective utilization rates and makespans. Once this is done, we divide the categories into clusters of bounded sizes. Primitives we have called fission and fusion allow clusters to evolve in a quasi-cellular fashion, so that certain sub-groups of servers can leave a cluster if they approach the centroid of another category. Two clusters can also be merged if they are in the same category and meet particular size constraints.

2. Tasks scheduling: the second stage is dedicated to job scheduling, with the module acting on two levels: (i) at datacenter level, a round-robin procedure decides which cluster a group of jobs will be assigned to; then (ii) at cluster level, a genetic algorithm assigns the tasks to servers.
3. Load balancing: the load balancing stage begins as soon as the tasks are scheduled. We have designed our solution so that this mechanism focuses on two tasks: (i) identifying the clusters to be lightened, and (ii) locating the servers to be freed. Once done, the reallocation of the released tasks is left to the scheduling module.
The second perspective focuses on the architectural levels on which the mechanisms act and can be depicted as follows:
1. At datacenter level: the mechanisms at this level deal with clusters and are responsible for realizing their respective missions in an independent way while relying on their local modules.
(a) Cluster manager: responsible for cluster creation and development. During hot deployment, it initiates the k-means-based clustering procedure. It then supervises cluster evolution by gathering information from the local monitors, and decides on fission and fusion operations.
(b) Global tasks scheduler: in charge of executing a round-robin procedure between clusters to decide which will receive the next jobs. It groups tasks into batches of size equal to the standard number of servers in a cluster, then decides to which cluster to send them.
(c) Global load balancer: determines the overloaded clusters to be relieved by applying a round-robin algorithm between the clusters of the fourth category, if any, or those of the third category if none exist.
2. At cluster level: the modules here act inside clusters and are in charge of realizing their missions on servers while following instructions from the global managers and feeding them with local information.
(a) Cluster monitor: on the lookout for evolving information about its own cluster, like server loads and cluster size. It continuously observes the movement of the cluster center and its proximity to the main category centroids. It is responsible for alerting the cluster manager, based on the Euclidean distance, when a fission or fusion procedure needs to be triggered.
(b) Local tasks scheduler: receives groups of jobs to schedule from the global module and applies a genetic algorithm to decide on task assignment over the servers of the same cluster.
(c) Local load balancer: when called upon by the global load balancer, it runs a particular function to calculate a score based on makespan and utilization rate per server. It then relies on these scores to decide which cloudlets should be retrieved from servers and migrated to another cluster.
This is just an overview of how the load balancing and tasks scheduling mechanisms work, depending on the architectural organization of the environment. The following sub-section is dedicated to the details of each step.

Our method
Our method goes through a succession of steps and involves multiple algorithms; we detail each of them in this section.

Servers clustering
The power of our method lies in the fact that it reduces the amount of information and constraints the task scheduling and load balancing modules have to deal with. To do this, it starts by decomposing the entire datacenter into clusters, making the above mechanisms act on size-reduced sets of hosts. We first focus on how clusters are built at the datacenter level, then zoom in to the cluster level and explain the fusion and fission primitives.
Intuitively, we can conceive a classification of servers according to the criteria of makespan length and resource utilization rate score; those criteria are calculated by formulas 7 and 14 and yield the following categories, as shown in Figure 3:
• Category 1: grouping servers with short makespan and low resource utilization rate.
• Category 2: including servers with short makespan and high resource utilization rate.
• Category 3: including servers with long makespan and low resource utilization rate.
• Category 4: grouping servers with long makespan and high resource utilization rate.
Category 1 represents under-loaded servers, while category 3 is the worst category since it regroups badly exploited servers, and category 4 contains overloaded ones. The category considered ideal is category 2, which is made up of servers that use their resources fairly and whose makespan is short. In the load balancing algorithm proposed in 3.5.3, the migration operations are made in such a manner as to bring the maximum number of servers into category 2.
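The four categories can be sketched as a simple decision on the two features. The thresholds below are illustrative only: in the paper the boundaries emerge from the k-means centroids, not from fixed cut-offs.

```python
# Hedged sketch of the Figure 3 categorization; thresholds are our own.

def category(makespan, utilization, ms_threshold=1.0, util_threshold=0.5):
    short = makespan <= ms_threshold
    low = utilization <= util_threshold
    if short and low:       return 1  # under-loaded
    if short and not low:   return 2  # ideal: fair use, short makespan
    if not short and low:   return 3  # badly exploited
    return 4                          # overloaded

print(category(0.4, 0.3))  # 1
print(category(0.4, 0.9))  # 2
print(category(2.0, 0.3))  # 3
print(category(2.0, 0.9))  # 4
```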
1. Cluster manager: in order to perform this clustering, we propose in Algorithm 1 to use the k-means method:
• First, Algorithm 1 takes as input the list of servers and is expected to return a set of clusters with their corresponding servers.
• In order to realize k-means clustering, informative features on the servers must be modeled. We propose the feature representation given in lines 2 and 3: equation 7 gives the first feature, which is the makespan of the server, while combining equations 13 and 14 gives what we call the resource utilization rate score, which is the second feature.
• Line 11 calls a standard k-means clustering procedure. It takes as arguments a list of four main centroids generated randomly, which are updated through several iterations using the Euclidean distance, and the list of concerned servers. It returns four clusters grouping servers with similar characteristics. This classification around the four main centroids will serve as the basis for our next primitives and algorithms.
• Finally, line 12 cuts the clusters into smaller server pools to make them easier to manage. It takes as input the list of clusters and the desired size, here equal to 50, and returns a list of clusters of limited size. Local centroids must be recalculated for each cluster, and the CM (cluster manager) keeps track of the initial four main centroids.
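The clustering step can be sketched as follows. This is our own minimal illustration of Algorithm 1, not the authors' code: a plain k-means on the (makespan, utilization score) features, followed by a cut of each category into pools of at most `max_size` servers.

```python
# Hypothetical sketch of the k-means clustering and pool-splitting steps.
import random

def kmeans(points, k=4, iters=20, seed=0):
    """Plain k-means on 2-D points (makespan, utilization score)."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)   # random initial main centroids
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                          + (p[1] - centroids[c][1]) ** 2)
            groups[i].append(p)
        # recompute centroids; keep the old one if a group went empty
        centroids = [(sum(x for x, _ in g) / len(g),
                      sum(y for _, y in g) / len(g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return groups, centroids

def split(cluster, max_size=50):
    """Cut one category into pools of bounded size (line 12 of Algorithm 1)."""
    return [cluster[i:i + max_size] for i in range(0, len(cluster), max_size)]

random.seed(1)
servers = [(random.random() * 4, random.random()) for _ in range(200)]
groups, centroids = kmeans(servers)
pools = [p for g in groups for p in split(g, max_size=50)]
print(all(len(p) <= 50 for p in pools))  # True
```

Each pool then recomputes its own local centroid, while the four main centroids returned by `kmeans` are the reference points kept by the cluster manager.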
2. Local cluster monitor: once the clusters are created, the CM (cluster manager) sets up on each cluster a cluster monitor, deployed as a virtual machine, which is responsible for gathering information on the servers and transmitting it to the CM. The coordinates of the four main centroids obtained from the k-means algorithm are transmitted to each cluster monitor so that it can support the fusion and fission primitives.
• Fission: a function that allows clusters to be updated so that they stay consistent: a sub-group of servers that drifts closer to the centroid of another category leaves its cluster to form a new autonomous one.
At this stage we have decomposed the datacenter into a set of clusters in such a way that we are now able to perform the desired operations at two levels: the datacenter level and the cluster level.
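The fission and fusion triggers can be sketched as distance tests against the four main centroids. This is our own reading of the primitives, with illustrative names; the actual thresholds are dynamic in the paper.

```python
# Hedged sketch of the fission/fusion triggers; names are our own.
import math

def nearest_category(point, main_centroids):
    """Index of the closest main centroid (Euclidean distance)."""
    dists = [math.dist(point, c) for c in main_centroids]
    return dists.index(min(dists))

def fission(servers, main_centroids, own_category):
    """Fission: servers now closer to a foreign centroid leave the cluster."""
    leavers = [s for s in servers
               if nearest_category(s, main_centroids) != own_category]
    stay = [s for s in servers if s not in leavers]
    return leavers, stay

def can_fuse(cluster_a, cluster_b, max_size=50):
    """Fusion: same category and the merged size stays under the bound."""
    return (cluster_a["category"] == cluster_b["category"]
            and len(cluster_a["servers"]) + len(cluster_b["servers"]) <= max_size)

mains = [(0.5, 0.25), (0.5, 0.75), (2.0, 0.25), (2.0, 0.75)]
leavers, stay = fission([(0.4, 0.2), (2.1, 0.8)], mains, own_category=0)
print(leavers)  # [(2.1, 0.8)]
```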
1. At datacenter level: we focus on the method by which tasks are assigned to clusters. The global scheduler decides on which cluster to plan some arriving tasks, regardless of how they will be managed locally by the local tasks scheduler module. To realize this, the global scheduler acts as follows:
• First, it decides on the targeted category of clusters. It obviously favors clusters of category 1, since they contain under-loaded servers; if there is no cluster in this category, the GTS (global tasks scheduler) explores the possibilities in category 3. The idea here is to be optimistic: as this category underutilizes its resources, it could be possible to increase this utilization without significantly impacting the makespan. As a last resort it chooses category 2, which represents the perfect exploitation model and should not be disturbed.
• Then it decides on the targeted cluster using a round-robin algorithm. Once the category is chosen, the global scheduler first lists all the corresponding clusters, then redirects the tasks it receives to these clusters in turn. Depending on the parameter given to it, which is equal to a certain number of tasks, at the end of the scheduling of this number of tasks on the clusters it repeats the verification of the first step.
We propose to start by defining an algorithm for finding the clusters corresponding to a specific category relative to the four main centroids; Algorithm 2 explains how we do it. Algorithm 3 relies on Algorithm 2 and selects the clusters concerned by the round-robin task assignment procedure.
A last important procedure must be defined for the round-robin algorithm. Algorithm 4 takes a list of selected clusters and a single task; it assigns the task to one of these clusters and outputs the list of remaining ones.
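The category filtering and round-robin hand-off described above (Algorithms 2 to 4) can be sketched as follows. This is our own illustration, not the paper's listings; the batched loop stands in for the one-task-at-a-time formulation of Algorithm 4.

```python
# Hedged sketch of Algorithms 2-4; names are illustrative.
from itertools import cycle

def find_clusters(clusters, category):           # cf. Algorithm 2
    return [c for c in clusters if c["category"] == category]

def select_clusters(clusters):                   # cf. Algorithm 3
    for cat in (1, 3, 2):                        # preference order from the text
        found = find_clusters(clusters, cat)
        if found:
            return found
    return []

def round_robin_assign(selected, tasks):         # cf. Algorithm 4, batched
    for cluster, task in zip(cycle(selected), tasks):
        cluster["tasks"].append(task)

clusters = [{"category": 1, "tasks": []},
            {"category": 1, "tasks": []},
            {"category": 4, "tasks": []}]
round_robin_assign(select_clusters(clusters), ["t1", "t2", "t3"])
print([len(c["tasks"]) for c in clusters])  # [2, 1, 0]
```

Note how the category-4 cluster receives nothing: overloaded clusters are never candidates for new work and are instead handled by the load balancer.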
To carry out tasks scheduling at cluster level, we opted for a variant of genetic algorithms, given their adequacy with this type of problem and the conclusive results obtained by the works based on them. In order to design a robust and efficient genetic algorithm, we need to look closely at three elements: (i) the generation of the individuals and the population, (ii) the operators for the evolution of the populations, and (iii) the procedure for performing these operations on the individuals based on the fitness function.
For the first generation, we have to create a random population of feasible solutions. We create one feasible solution at a time by generating an individual with Algorithm 6, which incorporates as constraints the number of available servers and arriving tasks and generates a random realizable solution. We then repeat the individual creation procedure a desired number of times to create the first-generation population, as explained in Algorithm 7.
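The first-generation step (Algorithms 6 and 7) can be sketched as below. The representation is our own assumption: an individual is a vector mapping each task index to a server index, which is feasible by construction since every drawn index is a valid server.

```python
# Hedged sketch of random individual/population generation (Algorithms 6-7).
import random

def random_individual(n_tasks, n_servers, rnd=random):
    """individual[j] = index of the server chosen for task j."""
    return [rnd.randrange(n_servers) for _ in range(n_tasks)]

def initial_population(pop_size, n_tasks, n_servers, seed=0):
    rnd = random.Random(seed)
    return [random_individual(n_tasks, n_servers, rnd)
            for _ in range(pop_size)]

pop = initial_population(pop_size=20, n_tasks=10, n_servers=5)
print(len(pop), len(pop[0]))  # 20 10
```

The fitness of such an individual would then be evaluated from the resulting per-server makespan, which the evolution operators seek to minimize.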
The first element is that the load balancing modules act on reduced sets of entities, whether clusters at datacenter level or servers at cluster level. The second element is the high decoupling of the missions of these modules. As we previously said, in the load balancing step we are only interested in one question on two levels: which clusters to free up, and which servers in particular to lighten. This is because the load balancer is no longer in charge of migrating cloudlets, as the scheduler will reassign them to other servers on category-one clusters.
1. At datacenter level: the first step in load balancing is to locate the clusters of the fourth category, those with a high resource utilization rate and a long makespan. If category four is empty, we try to find clusters of category three, which have a bad utilization rate but still a long makespan. This is achieved using Algorithm 11.
The very last step of our method is realized thanks to Algorithm 13, which is used by the local load balancer to decide which cloudlets must be removed from servers and migrated to another cluster; the latter job is done by the global tasks scheduler and is out of the scope of this algorithm. Now that we have explained our method in detail by depicting each step, the role of each architectural component and the strategies used to trigger the primitives, we move in the next section to explain our validation method, show implementation details, discuss the obtained results and compare them to the best ones found in the field literature.
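The selection performed by Algorithm 13 reduces to a threshold test on the planned completion time of each cloudlet. The sketch below is our own paraphrase of the printed listing; `mms` stands for the threshold noted MMS there, and the pair representation is an assumption.

```python
# Hedged sketch of Algorithm 13: flag cloudlets whose planned completion
# time exceeds the MMS threshold; rescheduling them is left to the GTS.

def cloudlets_to_migrate(cloudlets, mms):
    """cloudlets: list of (id, planned_completion_time) pairs."""
    return [cid for cid, planned in cloudlets if planned > mms]

queue = [("c1", 0.8), ("c2", 1.9), ("c3", 1.2)]
print(cloudlets_to_migrate(queue, mms=1.5))  # ['c2']
```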

Experiments
In order to validate our method, we implemented it with the standard cloud simulator CloudSim Plus. Table 3 shows what needs to be implemented in terms of objects and corresponding parameters.
CloudSim is an open-source simulation library widely used for modeling and evaluating cloud computing systems. It enables researchers and developers to simulate cloud datacenters and applications in order to assess their performance in terms of energy consumption, makespan and so on. CloudSim offers a simple, flexible interface for creating customized simulation scenarios. It allows researchers to focus on the implementation of their own algorithms, such as virtual machine placement, tasks scheduling or load balancing, and to assess them according to a certain set of criteria, like the number of cloudlet migrations [36]. CloudSim Plus comes into play as an extension of the CloudSim simulation library, offering advanced features and performance enhancements to speed up the modeling and simulation of cloud computing environments [37]. The simulation was conducted on a laptop with the following characteristics:
• Processor: Intel® Core™ i7-10510U CPU @ 1.80 GHz.
In order to verify the veracity of the obtained results, we ran our algorithms in several scenarios, varying key parameters such as the number of servers, the number of cloudlets and the cluster sizes. We collected key indicators such as the duration of the various stages and the number of operations, before carrying out a comparative and analytical study against the methods we considered the most relevant in the literature.
The collected metrics are obtained by calculating the mean values over one hundred repetitions of each scenario. Cloudlets and servers are generated with random parameters, as previously shown in Table 1, to make the scenarios more realistic.

Results and discussion
Table 4 introduces the main performance metrics we used to evaluate the efficiency of our method and to compare it with other approaches. Since the time involved in the load balancing processes is negligible, we are only interested in three metrics: response time, number of migrations and number of SLA violations. We estimated the clustering time according to the number of servers within a datacenter and for a variable cluster size; the obtained results are shown in Table 5. In comparison, the approach of [38] requires 4 seconds for clustering one thousand servers. With regard to the parameters used with k-means, our approach performs the clustering operation within a negligible time. Evaluating task scheduling means taking two durations into consideration: (i) the duration required for the global scheduler to group tasks into batches and to run the round-robin procedure designating the target cluster, then (ii) the duration required for the local scheduler to run the genetic algorithm and assign the tasks to servers. The obtained results are detailed in Table 6. When we focus on overall times, we see that smaller clusters deliver better performance, due to the fact that the round-robin time is negligible even when the number of clusters is large, and that the runtime of the genetic algorithm increases considerably as the number of servers increases. The results of Table 6 are represented in the graphs of Figure 4. The graphs show a certain irregularity: the regression is not totally linear between cluster sizes and planning times. This easily observable phenomenon is due to the fact that task characteristics and server capacities are generated randomly, and results are obtained by aggregating the outputs of the various scenario repetitions into average values.

Table 7 summarizes the most relevant performance criteria of our approach. We measured those metrics for several possible cluster sizes. The major observations can be summarized as follows:
• The number of cloudlets to migrate logically increases with the cluster size; this is due to the size of the task batches transferred by the global scheduler to the cluster, which is equal to the number of servers it contains. The more the cluster receives tasks of variable sizes, the more the load balancer decides to migrate a greater number of them, which would otherwise risk falling into SLA violations. It is important to note that the proportion of cloudlets to migrate remains near a ratio of 12% of the size of the cluster (batch of tasks).
• The given makespan concerns one batch of tasks of random sizes. We can see that it remains within a very interesting interval of values. Larger clusters may have a longer makespan due to the greater variety in the characteristics of the cloudlets they receive.
• The number of SLA violations increases with the cluster size: the greater the number of tasks received, the greater the probability that some of them will violate the SLA. It is also important to note that these violations remain below an acceptable threshold of around 8%.
• The last row of the table shows the results of the load balancing module response time evaluation. This time includes the designation of the cluster to be lightened and the selection of the servers to be freed. In other words, it encompasses the delay from detecting an imbalance to determining the list of cloudlets to migrate. It is easy to notice that these delays are negligible; in comparison, the solution of [23] requires 0.008 seconds to determine the tasks to migrate among a total of just 30.
We will now move to the comparative study. We start by comparing the main metrics obtained for our method with some of the most relevant ones in the literature on hybrid approaches. Table 8 gives a comparison between our method and the selected ones according to the standard parameters. The results were estimated for a datacenter of 2000 servers and with 2000 cloudlets. Our method presents the best average makespan, a lower migration ratio and a considerably reduced number of SLA violations.
Table 8:
Method        Makespan   Migrations   SLA violations
Our method    1.18       12%          8%
[25]          3.5        19%          unknown
[18]          8          35%          unknown
[22]          3.8        unknown      18%

Figure 5 represents the results given in Table 8. The left plot allows a visual comparison of our method with the ones given in [25] and [18] according to the number of performed cloudlet migrations, while the right graph compares our method with [22] according to SLA violations. The red graphs are plotted to serve as marks. The performance evaluation of our method has produced very promising results: the approach allows excellent scalability and a reduction in delays, migrations and SLA violations.

Conclusion and perspectives
In this paper we proposed a new hybrid approach to job scheduling and load balancing in cloud environments. This approach offers several advantages, such as hot-deployability, high scalability, decoupling and strong interoperability between the mechanisms that manage the cloud ecosystem. The power of this approach lies in its modus operandi: in the first stage, servers are clustered using an algorithm based on k-means and on criteria such as utilization rate and makespan, enabling us to approach the problems we face on two scales: at the datacenter level and inside a cluster. The tasks scheduler integrates two modules: one is global and relies on round-robin to transfer task batches to a particular cluster; the other is local and calls a genetic algorithm to assign tasks to servers. The load balancer also operates on these two levels: a global module uses round-robin to designate the cluster to be relieved, then a particular algorithm uses an individual score to designate the servers to be unloaded. The clustering mechanism also incorporates probes within each cluster and forecasts their evolution to perform the fission and fusion actions designed to maintain cluster coherence.
We were able to validate the performance of our approach by implementing it with CloudSim Plus, which produced very conclusive results in terms of makespan, response times, SLA (service-level agreement) violations and cloudlet migrations. The comparative study showed a clear improvement over the most recent and relevant works in the literature. We will now be looking at future improvements, taking advantage of the power of machine and deep learning to optimize the cloudlet migration processes and to define more optimally the different thresholds used to control the actions of our algorithms.

Declarations
Ethical Approval This declaration is not applicable for the purpose of our work.

Competing interests
The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Figure 3 :
Figure 3: Servers categories according to makespan and resource utilization score

Algorithm 2: Find clusters
Data: List of clusters, category
Result: All clusters of specific category
1 found = Null;
2 i = 0;
3 foreach cluster ∈ clusters do
4   if cluster.category = category then
5     found[i] = cluster.id;
6     i = i + 1;
7   end
8 end
9 return found;

Algorithm 13: Determine cloudlets to migrate
Data: List of cloudlets within the cluster
Result: List of cloudlets to migrate
1 foreach cloudlet ∈ cloudlets do
2   Planned_time = cloudlet.getPlannedOnTime();
3   if Planned_time > MMS then
4     migration_list = migration_list + cloudlet;
5   end
6 end
7 return migration_list;
Time required by the load balancer to detect an unbalanced situation and to determine the cloudlets to migrate.

Figure 4 :
Figure 4: Tasks scheduling duration according to cluster size

Table 1: Hosts and cloudlets configuration
We assume that all virtual machines initially have the same fixed amount of resources: 2 CPU cores of 5000 MIPS each and 4 GB of RAM. Each task has two main parameters, its length and its resource utilization model; together we commonly call this the cloudlet model. According to its type, a cloudlet has a determined size (quantity in millions of instructions) and a variable percentage (from 0.2 to 0.8) of available resource utilization, such as RAM. Table 1 gives an overview of the available server types with their corresponding resources (number of processing cores, RAM and per-core calculation capacity) and the possible cloudlet sizes.

Table 2 :
Configuration of the genetic algorithm for local tasks assignment

Table 3 :
Cloudsim simulation model and elements

Table 7: Global performance evaluation