Dynamic Migration of Microservices for End-to-End Latency Control in 5G/6G Networks

With the adoption of the microservice paradigm by the telecom industry in the design of 5G/6G networks, complex network functions are decomposed into sets of chained sub-functions, which are deployed using container technologies over geographically distributed cloud clusters. Latency-sensitive applications require careful orchestration of the allocation and re-arrangement of (micro)services to prevent a highly fragmented placement of microservices. To address this issue in the context of network functions, we introduce a novel placement and migration strategy that chooses the specific microservice(s) to migrate and the optimal destination (data center) while considering the impact of the migration on other microservices. We devise fast and effective heuristics that are extensively studied via simulation experiments, which show that our proposed approach significantly reduces service latency.


Introduction
The decomposition of complex services into microservices, which can easily be instantiated, modified and deleted, is instrumental in the design of 5G/6G networks and widely adopted by the telecom industry to implement Network Function Virtualization (NFV). Network Functions (NFs) decomposed into microservices are hosted in containers, giving rise to Cloud-Native Network Functions (CNFs). Thanks to the flexibility of the cloud-native approach, a plethora of cloud providers, content providers and telecom network operators are today adopting the container-based microservice paradigm.
Numerous research works (refer to, e.g., [9] for a survey) have addressed the problem of placing Virtualized Network Functions (VNFs) represented as Service Function Chains (SFCs) on geographically distant data centers. Several optimization criteria can be envisaged (e.g., load balancing, latency control). Most of these studies, however, consider static situations, where a set of SFCs has to be placed on a distributed and virtualized architecture composed of data centers interconnected by transmission links. Optimization problems considering cloud and transport resources are then formulated based on objective criteria and solved via heuristics or Machine Learning techniques. Only a few works (see [4]) consider dynamic situations where SFCs join and leave the network. For instance, in [4], the problem of placing SFCs (namely, network slices) has been addressed in a dynamic context, wherein SFC requests arrive following a non-stationary Poisson process; Deep Reinforcement Learning techniques then prove very effective in coping with this kind of placement problem.
Beyond the placement of SFCs, the decomposition of NFs into microservices, which can easily be migrated [9], introduces an additional degree of freedom in the placement of SFCs. While classical placement algorithms place the different components of an SFC on data centers for the whole lifetime of the SFC, migration makes it possible to continually modify the placement of microservices in order to improve some performance criteria or remedy an impairment.
Container migration is mostly addressed in the technical literature within the framework of service migration in connection with Mobile Edge Computing (MEC), see notably [2]. In that context, the migration of containers is triggered by the movement of users and is intended to guarantee Service Level Agreements (SLAs) expressed in terms of latency, bit rate, etc. Our motivation in the present paper is different, as we consider the placement of network functions (rather than user applications) decomposed into microservices embedded in containers. The corresponding SFCs are placed on a hierarchy of clouds (edge, fog, and central clouds) and are then migrated in order to control the latency of all the placed VNFs, not only the latency of an individual service. Contrary to [2], which is relevant to the Edge Multi Cloud Orchestrator (EMCO), the framework considered in this article addresses the network orchestration problem, in which the orchestration platform optimizes the placement of SFCs; container migration is an additional feature with respect to current orchestration platforms, which so far only place SFCs (see for instance the Open Network Automation Platform (ONAP)).
Compared to many works on SFC placement [9], we consider in this paper VNFs comprising virtual Radio Access Network (RAN) functions, which shall be placed near a predefined geographical area. SFCs are hence rooted, in the sense that microservices are placed near the virtual end user, which is static, contrary to [2], which allows the end user to move. Finally, this setting differs from virtual network embedding, as one objective of the placement, and then of the migration, is to collocate microservices in order to contain latency. The frameworks of virtual network reconfiguration, or even circuit repacking in circuit-switched networks, present some similarities with the problem addressed in this paper. However, the concept of latency, which is central in our analysis, cannot be easily handled in those frameworks, as this metric depends on the number of messages exchanged between the microservices.
Through this work, we identify the factors that significantly influence the placement of SFCs composed of chained microservices. First, the proposed strategy is dynamic in nature: after proposing the initial placement, the system continues to improve the placement by migrating or re-allocating microservices. Second, our solution is user-centric, as it aims at reducing the end-to-end latency while considering resource load balancing. Finally, the design rationale supports migration of microservices across geo-distributed cloud nodes by performing both vertical migration (moving microservices from the bottom edge layer to the top layer of the cloud and vice versa) and horizontal migration (moving microservices from one node to another in the same layer).
We specifically answer the following questions: (1) Which microservice needs to migrate, and when? (2) Which factors have to be considered when choosing an optimal data center to host the migrated microservice? (3) If no resources are available on the selected optimal data center, which of the already placed microservices can be moved to another node without impairing their current communication delay? Taking into account all the above-mentioned design criteria, the contributions of the present paper can be summarized as follows:
• First, we formalize the model for the migration of microservices distributed across several data centers, considering a heterogeneous cloud architecture. In particular, the goal is to solve this ever-demanding migration problem by ensuring that the fewest possible microservices are moved while keeping the placement optimal.
• Second, we introduce an approximate problem-solving solution with three heuristics that considerably reduce the run time of the migration algorithm. The first two heuristics focus on the placement of newly arrived services considering the current system state, while the third heuristic aims at improving placement optimality in terms of end-to-end latency by performing run-time migration upon service departures.
This paper is organized as follows: in Sect. 2, we review existing work on microservice migration. The model considered in this paper (in particular the underlying cloud infrastructure) is presented in Sect. 3. The dynamic system as well as the metrics considered as quality indicators are presented in Sect. 4. The placement and migration algorithms are described in Sect. 5. Simulation results are reported in Sect. 6. Concluding remarks are presented in Sect. 7.

Related Work
The difference between the framework considered in this paper and other frameworks relative to virtual network embedding and reconfiguration, as well as circuit repacking, has been noted in the Introduction; we review in this section the existing literature on microservice migration, which is instrumental in solving the problem of placing VNFs or CNFs. We further summarise in Table 1 the main characteristics of these research works. In particular, Table 1 compares the various approaches in terms of their ability to handle dynamicity, network distribution and NF chaining, as well as their accounting of user location, resource and latency constraints, and application type (stateless or stateful). The management of "stateless" or "stateful" applications has a significant impact on the way migration is carried out. With a stateless application, the container is typically migrated (i.e., re-allocated and restarted from scratch) without conserving the application state. On the other hand, stateful container migration involves transferring the application state. In particular, the migration of an inactive application (cold migration) is straightforward, as it involves shutting down the running container before initiating the migration, which eliminates the need to handle the memory state. Alternatively, migrating a container from one node to another while it is running (live migration) and without service interruption necessitates maintaining state consistency during the whole migration.

Optimization Based Approaches
The authors of [7] propose a Mixed Integer Linear Programming (MILP) model for the VNF migration problem to reduce SFC delays while considering resource constraints (such as CPU and memory), network delay, affinity and anti-affinity factors, and migration delay (the time required to discover a service and to propose a new placement). They use a greedy algorithm to place VNFs and analyse the impact of VNF migration or re-instantiation using their proposed model.
In [1], an edge-based migration strategy is proposed for containerized applications. An Integer Linear Programming (ILP) model aims at minimizing the service downtime and the latency incurred while performing the migration across edge nodes. Further, the authors implement a heuristic approach to overcome the limitations of mathematical models, such as lack of scalability and high computation time, and compare it with greedy approaches.
In [19], a multi-dimensional Markov Decision Process (MDP) migration strategy is used for fog networks and is solved by combining two algorithms, namely Deep Q-learning and Deep Neural Networks (DNNs). The considered states of the system are delay, power consumption and migration cost. The action contains the selection policy, based on greedy methods, that chooses the containers to migrate. In particular, containers hosted at under-utilized nodes are migrated to other nodes to minimize power consumption, whereas at over-utilized nodes, the containers involving the least migration costs are migrated. The allocation policy selects the target node for each migrated container. The empirical evaluation shows that the proposed solution performs better than existing baseline strategies.
The approach in [5] formulates two different optimization problems. The first one aims at mitigating QoE degradation during user handover. The second one is intended to control the cost of service replicas. To solve the migration problem, the authors exploit a replication mechanism while managing the creation of replicas for each user.

Algorithmic Approaches
In [12], a re-assignment strategy is introduced in which containers belonging to the same type of service are required to be placed close to each other. The placement of new containers is intended to reduce the load and communication cost and is performed by a customized version of the Worst Fit Decreasing (WFD) algorithm. Then, the re-assignment of initially placed containers is performed by a Sweep & Search algorithm to minimize the total cost. An online container-based placement strategy for managing inter-container traffic is presented in [21]. An offline ILP model with quadratic constraints is formulated to capture the traffic flow. The online scheme follows the primal-dual method and proposes a placement at the arrival of each new container request.
The scheduling mechanism proposed in [16] migrates only long-lived containers, as they occupy resources for a long time. First, long-lived containers are ranked with respect to the CPU resources they consume. Then, the containers on highly occupied hosts are swapped with those hosted by less occupied hosts. The proposed algorithm is based on a random-first-fit scheme which is executed repeatedly until the load is uniform.
In [3], the proposed migration algorithm handles the migration of shared VNFs deployed on a multi-domain federated network. The algorithm coordinates with each domain orchestrator and, in case of failure, migrates the shared and chained VNFs using the information provided by each orchestrator.

Migration in the Context of MEC
The authors of [15, 20] reviewed various container-based placement and migration strategies. They investigated a set of previously proposed frameworks and algorithms used to build scheduling models for edge computing, notably MEC.
The dynamic container migration strategy introduced in [14] for MEC focuses on minimizing the workload and migration time, and handles user mobility using a heuristic method. The proposed method first shortlists the containers at source nodes based on their total latency. Then, for each container selected for migration, the node that is (i) geographically closer to the end-user and (ii) less utilized is selected as the destination node. Likewise, the MEC-enabled approach in [18] aims at providing flexible placement and migration of VNFs. The orchestrator dynamically manages the resources on the fly in order to handle the requirements of an application across a heterogeneous network that spans the core and edge networks.
In [8], a Kubernetes-based container migration approach is presented to migrate stateful services over a fog network while minimizing downtime. For stateful services, the transmission of disk state is very time consuming. Hence, the authors focus on managing the transmission of container layers from the source to the destination node.
The work in [2] addresses the migration problem for distributed data centers hosting latency-sensitive applications. Taking into account various parameters related to container usage (e.g., resource allocation and load), the authors propose three algorithms. First, containers characterised by a high total latency are shortlisted. Then, suitable neighbours of each container are selected based on the utilization level of the region and its location with respect to the user. At the final stage, containers are placed near their neighbors; the final destination is chosen based on the number of migrations from the source to the destination location, taking the inter-edge bandwidth and memory load into account. The authors of [11] implemented a Kubernetes-based testbed that redeploys pods near the end-user. The design rationale is to relocate services while performing adaptive handoff in a MEC architecture. Another container-oriented approach [13] exploits the layered structure of the Docker storage system to reduce migration time. The top storage layers may be modified at any time while the underlying layers remain unchanged. Thus, the underlying layers may be transmitted in advance, before commencing the migration process; the top layers are transmitted later. The end-user's location is also taken into account so as to place the service near the user.
In [17], a live migration mechanism for mobile networks overcomes the limitations of the current CRIU (Checkpoint/Restore In Userspace) tool, which is frequently used for process checkpointing during cold and live Linux container migration. The experimental platform includes the SCTP protocol to ensure message delivery between the MME and the CU in LTE networks, and a tool to manage the GTP (GPRS Tunnelling Protocol) device-specific information. A detailed evaluation of the migration of core network functions such as the Home Subscriber Server (HSS), Mobility Management Entity (MME), and Serving and Packet Gateway (SPGW) for VM-enabled and container-enabled live migration is also provided.
The above-mentioned research works solve the migration problem by taking into account various critical factors, such as latency or computing resources, but mostly focus on the network edge with MEC and on the migration of user applications. In this paper, we rather consider network operator applications (mainly CNFs), which can spread over the complete network infrastructure. Contrary to previous works, we pay attention to the communication between the chained NFs/microservices and the end-user while optimizing the end-to-end latency. We tackle the placement and migration problem in such a way that the chained microservices of the same service are co-located on an optimal target node while satisfying the resource load, latency and end-user location constraints.

Cloud Infrastructure
The model considered for simulation experiments and depicted in Fig. 1 reasonably represents a national telecommunications cloud infrastructure with several interconnected data centers organized in a three-layer tree structure. The lowest layer consists of edge nodes, corresponding to the MEC level, that have limited resource capacity and are geographically close to end-users, thereby ensuring low communication latency. The next layer is composed of regional nodes with intermediate resource capabilities. The top layer refers to a centralized cloud that acts as a national cloud with enormous capacity compared to the others, but operates at the expense of high latency.
Communications between the user and the microservices of an application (or NF in the context of this paper) hosted on different data centers induce latency. Obviously, the user experiences the lowest latency if all the components of an application (e.g., cloud gaming, AR/VR) are hosted in the edge node. In turn, latency increases if the application is hosted in a regional or the centralized cloud. Data centers are geographically distributed. In most European countries, edge clouds are located within a distance of 50-100 km. The distance between edge clouds and core clouds is between 100 and 200 km, and between 300 and 1000 km between core and centralized clouds. In the following, we take as time unit the propagation time between an edge cloud and a core cloud distant of 100 km. This time may slightly vary in practice, depending on the number of routers, switches and link capacities between the two clouds. For our experiments we assume that the transmission time between an edge and a regional data center is set to 1, the transmission delay between fog nodes is equal to 2, and the one between fog nodes and the central cloud is equal to 3. Within a node, bandwidth is assumed to be infinite because network operators typically over-dimension their transmission links to avoid bottlenecks.
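To make the delay model concrete, the following sketch builds the inter-data-center delay matrix $d_{n,m}$ for the topology of Fig. 1. The node numbering and the shortest-path rule for multi-hop pairs are our own assumptions; the paper only specifies the per-hop values.

```python
import itertools
import numpy as np

# Hypothetical numbering following Fig. 1: edge nodes 0..8, fog nodes 9..11
# (each fog node parenting three edge nodes), central cloud node 12.
N = 13
PARENT = {e: 9 + e // 3 for e in range(9)}   # edge -> parent fog node
INF = float("inf")

d = np.full((N, N), INF)
np.fill_diagonal(d, 0.0)                     # d_{n,n} = 0 (intra-DC delay neglected)

def link(a, b, w):
    d[a, b] = d[b, a] = w

for e, f in PARENT.items():
    link(e, f, 1.0)                          # edge <-> parent fog: 1 time unit
for f1, f2 in itertools.combinations(range(9, 12), 2):
    link(f1, f2, 2.0)                        # fog <-> fog: 2 time units
for f in range(9, 12):
    link(f, 12, 3.0)                         # fog <-> central cloud: 3 time units

# Delays between the remaining pairs as shortest paths (Floyd-Warshall).
for k in range(N):
    d = np.minimum(d, d[:, [k]] + d[[k], :])

print(d[0, 1], d[0, 12])   # 2.0 (edge-fog-edge) and 4.0 (edge-fog-cloud)
```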
The set of data centers is given by $\mathcal{D} = \mathcal{D}_e \cup \mathcal{D}_f \cup \mathcal{D}_c$, where $\mathcal{D}_e$ is the set of data centers at the edge, $\mathcal{D}_f$ that of fog data centers and $\mathcal{D}_c$ the (centralized) cloud data center. Given the hierarchical structure of the network topology considered, we associate with each $D \in \mathcal{D}_e$ the data center $\mathrm{parent}(D) \in \mathcal{D}_f$, which is the fog data center that $D$ is connected to, and for $D \in \mathcal{D}_f$, we set $\mathrm{parent}(D) = D_c$.

Placement of Services
In the following, we use the notation summarized in Table 2 for describing the placement of services and the related metrics. We consider the problem of placing a set of services on a cloud infrastructure composed of the set of data centers $\mathcal{D} = \{D_1, \ldots, D_N\}$, where $N$ is the total number of data centers. Each service $S$ is composed of $J_S$ microservices denoted by $\sigma_1, \ldots, \sigma_{J_S}$; each microservice $\sigma_j$ (with $j = 1, \ldots, J_S$) requires a certain amount of CPU, disk and RAM. In practice, RAM and CPU are the most scarce resources of cloud infrastructures. We denote by $c(\sigma)$ and $r(\sigma)$ the CPU and RAM requirements of microservice $\sigma$, respectively.

The placement problem consists of finding a mapping function $h$ from the set $\mathcal{S}$ of services to the set $\mathcal{D}$ of data centers. More precisely, we consider the mapping

$$h : S \mapsto \big(h(\sigma_1), \ldots, h(\sigma_{J_S})\big) \in (\{0\} \cup \mathcal{D})^{J_S},$$

where $(\{0\} \cup \mathcal{D})^{J_S}$ is the multiset with elements in $\{0\} \cup \mathcal{D}$ and $h(\sigma_j) = D_n$ if microservice $\sigma_j$ is placed on data center $D_n$. If microservice $\sigma_j$ cannot be placed because of resource exhaustion, then we set $h(\sigma_j) = 0$. In that case, no microservice of $S$ is placed and $h(S) = \mathbf{0} \overset{\text{def}}{=} (0, \ldots, 0)$.

Let $\mathcal{M}^{(h)}_n$ denote the set of microservices placed on data center $D_n$ under placement policy $h$, and let $C_n$ and $R_n$ denote the CPU and RAM capacities of data center $D_n$, respectively. The following constraints shall apply:

$$\sum_{\sigma \in \mathcal{M}^{(h)}_n} c(\sigma) \le C_n, \quad n = 1, \ldots, N, \tag{1}$$

$$\sum_{\sigma \in \mathcal{M}^{(h)}_n} r(\sigma) \le R_n, \quad n = 1, \ldots, N. \tag{2}$$

The set of services having a microservice hosted by data center $D_n$ is denoted by $\mathcal{S}^{(h)}_n$ and is defined by

$$\mathcal{S}^{(h)}_n = \big\{S \in \mathcal{S} : \exists j \in \{1, \ldots, J_S\},\ h(\sigma_j) = D_n\big\}.$$

The set of services (resp. microservices) that can be placed is

$$\mathcal{S}^{(h)} = \{S \in \mathcal{S} : h(S) \neq \mathbf{0}\} \quad \Big(\text{resp. } \mathcal{M}^{(h)} = \bigcup_{n=1}^{N} \mathcal{M}^{(h)}_n\Big).$$

The mapping $h$ has to satisfy constraints (1)-(2), while additional criteria can be considered, e.g., load balancing between data centers, maximization of the number of placed services, etc. For instance, the maximization of the utilization of the CPU of the cloud infrastructure reads

$$\max_h \sum_{n=1}^{N} \sum_{\sigma \in \mathcal{M}^{(h)}_n} c(\sigma), \tag{3}$$

while the maximization of the fraction of services which can be accepted in the system reads

$$\max_h \frac{|\mathcal{S}^{(h)}|}{|\mathcal{S}|}. \tag{4}$$

Finally, anti-affinity rules can be introduced to prevent two microservices from being placed on the same data center (for instance for security or resilience reasons).
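As an illustration of constraints (1)-(2), a minimal feasibility check for a candidate placement could read as follows; this is a sketch with hypothetical data structures, not the paper's implementation.

```python
from collections import defaultdict

def is_feasible(placement, cpu_req, ram_req, cpu_cap, ram_cap):
    """Check constraints (1)-(2) for one service.
    placement: dict microservice -> data center id (0 meaning 'not placed');
    cpu_req/ram_req: per-microservice demands c(.) and r(.);
    cpu_cap/ram_cap: residual capacities C_n and R_n per data center."""
    used_cpu, used_ram = defaultdict(float), defaultdict(float)
    for ms, dc in placement.items():
        if dc == 0:      # one unplaced microservice rejects the whole service
            return False
        used_cpu[dc] += cpu_req[ms]
        used_ram[dc] += ram_req[ms]
    return all(used_cpu[dc] <= cpu_cap[dc] for dc in used_cpu) and \
           all(used_ram[dc] <= ram_cap[dc] for dc in used_ram)
```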

Latency of Services
In the following, we are interested in the latency experienced by a service $S$ composed of microservices $\sigma_1, \ldots, \sigma_{J_S}$. A dummy microservice $\sigma_0$ with no resource requirements is added to represent the location of the user of the service, who is attached to an edge node of the cloud infrastructure (see Fig. 1). Microservices $\sigma_j$, $j = 0, \ldots, J_S$, exchange messages to execute the application they support. In the following, we define the message exchange matrix $B_S = (\beta_S(\sigma_i, \sigma_j))$ for service $S$, where $\beta_S(\sigma_i, \sigma_j)$, for $i, j = 0, \ldots, J_S$, is the number of messages exchanged between microservices $\sigma_i$ and $\sigma_j$ of service $S$. Even if the exchange of messages between two microservices is asymmetric, the latency only depends on the number of messages exchanged, regardless of their direction. Hence, we can make the assumption that $\beta_S(\sigma_i, \sigma_j) = \beta_S(\sigma_j, \sigma_i)$ and, in addition, $\beta_S(\sigma_i, \sigma_i) = 0$. The $(J_S + 1) \times (J_S + 1)$ matrix $B_S$ is then symmetric with zeros on the diagonal.
If microservices $\sigma_i$ and $\sigma_j$ are not placed on the same data center, then the transmission across the links connecting the two data centers introduces latency in the execution of the service. Let $d_{n,m}$ denote the delay between data centers $D_n$ and $D_m$. In the following, we neglect the delay inside a data center (i.e., $d_{n,n} = 0$), as this delay is low compared to transmission delays between remote data centers.
For a given placement $h$, let us define the $(J_S + 1) \times (J_S + 1)$ delay matrix $\Delta^{(h)}_S = (d_{h(\sigma_i), h(\sigma_j)})$ for service $S$ under placement $h$. Then, the latency affecting service $S$ is

$$\ell^{(h)}_S = \sum_{0 \le i < j \le J_S} \beta_S(\sigma_i, \sigma_j)\, d_{h(\sigma_i), h(\sigma_j)}. \tag{5}$$

Owing to the symmetry of the matrices $B_S$ and $\Delta^{(h)}_S$,

$$\ell^{(h)}_S = \frac{1}{2} \mathrm{Tr}\big(B_S \Delta^{(h)}_S\big),$$

where $\mathrm{Tr}$ is the trace operator.
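Concretely, Eq. (5) can be evaluated directly from the message matrix and the delay matrix; here is a minimal sketch (function and variable names are ours):

```python
import numpy as np

def service_latency(B, d, h):
    """Latency of a service per Eq. (5).
    B: (J_S+1) x (J_S+1) symmetric message matrix (index 0 = dummy user);
    d: inter-data-center delay matrix;
    h: array mapping microservice index -> hosting data center index."""
    Delta = d[np.ix_(h, h)]            # delay matrix Delta_S^(h) under placement h
    return 0.5 * np.trace(B @ Delta)   # = sum_{i<j} beta(i,j) * d_{h(i),h(j)}
```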
The global latency of the system under placement $h$ is defined as

$$L^{(h)} = \sum_{S \in \mathcal{S}^{(h)}} \ell^{(h)}_S,$$

and the average latency as

$$\bar{L}^{(h)} = \frac{1}{|\mathcal{S}^{(h)}|} \sum_{S \in \mathcal{S}^{(h)}} \ell^{(h)}_S.$$

With regard to placement, we can introduce the following optimization problems: minimizing the global (resp. average) latency, $\min_h L^{(h)}$ (resp. $\min_h \bar{L}^{(h)}$), or minimizing the maximum latency of services, $\min_h \max_{S \in \mathcal{S}^{(h)}} \ell^{(h)}_S$, with $h$ achieving the maximum cloud occupancy (criterion (3)) or acceptance rate (criterion (4)).

Dynamical Setting
While many studies on service placement (VNFs or network slices) assume a static setting, where the global set of services to be placed is known in advance and fixed, we consider a dynamic system where services join and leave the system. In that case, the placement strategy should take into account the service dynamics in the sense that:
• Each arriving service has to be placed by taking into account the current state of the system, possibly by migrating some microservices in order to control the latency of the new service while also controlling that of the services with migrated microservices;
• At each departure of a service, resources are released and can be used for microservice migration so as to reduce the latency of the services in the system, e.g., according to the optimization problems introduced above (see Sect. 3.3).
In addition, we assume that services are anchored, in the sense that the service user is attached to an edge data center. Contrary to studies on MEC in which users are moving, we focus on network functions instantiated for fixed groups of users (e.g., a RAN area, a company, ephemeral groups of users willing to have connectivity to the network, etc.). For this purpose, a dummy microservice with zero capacity requirements is located at an edge data center. If services are accepted on a capacity basis only, then we have a blocking system. As long as the service can be placed, the service is accepted regardless of the incurred latency.
In the following, we assume that there are $K$ classes of services. Services of class $k$ ($k = 1, \ldots, K$) arrive according to a Poisson process with rate $\lambda_k$. A service of class $k$, if accepted, stays for a random amount of time with mean $1/\mu_k$. A service $S_k$ of class $k$ has a global resource requirement $A_k = \sum_{\sigma \in S_k} c(\sigma)$. The global capacity of the system is $C = \sum_{n=1}^{N} C(D_n)$. If the global capacity $C$ is finite, then we have a multirate loss network (see the seminal paper [10]). This kind of model has been used to dimension multiservice circuit-switched networks, and the blocking probability can be derived in various load regimes (see for instance [6]).

Metrics
When dealing with a QoS requirement like latency, we could impose that, when a service joins the system and the QoS objectives for this service cannot be met, the service is rejected. This may however lead to under-utilization of the system. Instead, we propose to accept all services and use the capability of migrating microservices to keep the latency under control. An issue is then to determine suitable control metrics.
So far, we have defined in Sect. 3.3 the latency of services in a static situation. We can nevertheless define a random variable $\ell^{(h)}$ taking values in the set $\{\ell^{(h)}_S, S \in \mathcal{S}^{(h)}\}$. When dealing with a dynamic system, we compute the latency of those services that are in the system. Contrary to the static case, the service latency can vary in time due to migration. If a service $S$ has a holding time $\tau_S$, then we define its mean latency under a migration strategy $m$ as

$$\bar{\ell}^{(m)}_S = \frac{1}{\tau_S} \int_{t_S}^{t_S + \tau_S} \ell^{(m)}_S(t)\, \mathrm{d}t,$$

where $t_S$ is the arrival date of service $S$ and $\ell^{(m)}_S(t)$ is the latency experienced by service $S$ at time $t$.
When considering a population of services under service migration policy $m$ and placement $h$, we define the mean latency as

$$\bar{L}^{(m)} = \frac{1}{|\mathcal{S}^{(m)}|} \sum_{S \in \mathcal{S}^{(m)}} \bar{\ell}^{(m)}_S.$$

This is a global metric reflecting the efficiency of a migration policy $m$ in terms of latency.
Latency is due to the placement of the microservices (including the dummy microservice) on distant data centers, which reflects the fragmentation of a service. More precisely, for a given placement $h$, the fragmentation index of a service $S$ is set equal to $\phi^{(h)}_S = |h(S)|$, i.e., the number of distinct data centers hosting service $S$. The set of fragmented services under placement $h$ is denoted by $\mathcal{S}^{(h)}_f$.
With migration, the placement of a service may vary, impacting its fragmentation. The fragmentation index of a placed service is denoted by $\phi^{(m)}_S$ when the migration strategy $m$ is applied. The objective of a migration strategy $m$ is to decrease the initial fragmentation indices $\phi^{(h)}_S$ of services $S$ under placement $h$. The set of fragmented services after applying migration strategy $m$ is denoted by $\mathcal{S}^{(m)}_f$.
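In code, the fragmentation index reduces to counting distinct hosting data centers; a one-line sketch (excluding the dummy user microservice is our assumption):

```python
def fragmentation_index(h):
    """phi_S = number of distinct data centers hosting the microservices of S.
    h: list of hosting data centers, h[0] being the dummy user (excluded here)."""
    return len(set(h[1:]))
```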

Algorithms for Placement and Migration of Services
The proposed heuristic algorithms place newly arrived services (Sect. 5.1) and further reassign highly fragmented services (Sect. 5.2).

Placement of New services
For the placement of arriving services, we consider two greedy algorithms: the Greedy First Fit algorithm and the Greedy Best Fit algorithm.

Greedy First Fit Algorithm (GFF)
This algorithm places the microservice chain on the first available data center; it is commonly used for bin-packing problems as it is very fast in searching for the first available block. In this way, the nodes closest to the end user are acquired first over the nodes located at a distance (the edge node, followed by the fog nodes, and then the cloud node in the final stage). Our approach (Algorithm 1) involves the following steps:
1. Initialize by allocating the user of the service. For this purpose, a random location is selected at an edge node $n$ (line 2). Note that the end user does not consume or occupy resources; this user is introduced for the latency computation.
2. Place the chained microservices of the service on the selected edge node (closest to the end-user location) until the resource capacity is met (lines 11-14).
3. Then, move to the nearest regional node (i.e., the attached parent node, lines 16-17) to place the remaining microservices in case not all the microservices could be placed.
4. Proceed to the cloud node until all the microservices are placed.
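A minimal sketch of the GFF placement under the assumptions above (`parent` maps each node to the next node up the hierarchy; all names are ours, not the paper's Algorithm 1 verbatim):

```python
def greedy_first_fit(service, user_edge, free_cpu, parent, cloud_id):
    """Place the microservices of `service` bottom-up from the user's edge node.
    service: list of (microservice id, cpu demand); free_cpu: dict node -> free CPU.
    Returns the placement h restricted to this service, or None on failure."""
    placement, node = {}, user_edge
    for ms, cpu in service:
        # Climb the hierarchy until a node with enough free capacity is found;
        # once we move up, later microservices never go back down.
        while free_cpu[node] < cpu:
            if node == cloud_id:
                return None            # cannot happen with an infinite cloud
            node = parent[node]
        free_cpu[node] -= cpu
        placement[ms] = node
    return placement
```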

Greedy Best Fit Algorithm (GBF)
This algorithm is a greedy method that aims at reducing the fragmentation of the microservices composing a given service by (1) keeping to a minimum the number of data centers occupied by the microservices of a given service and (2) allocating all the co-joined microservices, as much as possible, on the same data center, selected in a greedy manner. In order to place the microservices in a best-fit manner, the whole service must be placed on a single data center; otherwise, the whole service moves to the next available data center in a greedy manner. This strategy tends to reduce the latency caused by communications between microservices (except with the end user).
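A corresponding sketch of GBF, which scans candidate data centers from the closest to the user outwards and hosts the whole service on the first one with enough room (again, the helper names are assumptions):

```python
def greedy_best_fit(service, user_edge, free_cpu, candidates_by_distance):
    """Host the whole service on a single data center if possible.
    candidates_by_distance[user_edge]: data centers ordered by increasing
    delay from the user's edge node (edge, then fog nodes, then cloud)."""
    total = sum(cpu for _, cpu in service)
    for node in candidates_by_distance[user_edge]:
        if free_cpu[node] >= total:
            free_cpu[node] -= total
            return {ms: node for ms, _ in service}
    return None    # with an infinite central cloud this is never reached
```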
The two algorithms presented have small complexity. They are used to place the microservices initially and at each service arrival, while the migration of microservices is executed only when services leave the system, in order to fully exploit the released resources. The computational complexity of GFF (Algorithm 1) for a total of $S$ services to be placed on $D$ data centers is $O(S)$. Likewise, the time complexity of the GBF approach to map a whole service is $O(S)$. On the other hand, the migration algorithm (Algorithm 2) includes sorting the fragmented services and then placing the most highly fragmented ones. Our implementation uses the built-in sort() function of Matlab, which is based on Quicksort (widely regarded as one of the fastest sorting algorithms); this yields a time complexity of $O(n \log n)$ for a set of fragmented services of size $|\mathcal{S}^*_f| = n$.

Migration Strategy
Once placed, microservices can be migrated in order to improve the latency of services. The step-by-step execution (see Fig. 2) proceeds as follows:
i. The migration of microservices starts with the departure of service(s). Given that departed service(s) release resources from their respective data center(s), it is pivotal to make use of these available resources to improve the latency of other services.
ii. The migration is triggered only if the number of departures is higher than a given threshold value, to avoid triggering a migration at each service departure.
iii. Based on the ordered list of fragmented services (set $\mathcal{S}^{(h)}_f$), the most fragmented service (composed of chained microservices) is selected to proceed with the process.
iv. The microservices which (i) experience high latency because they are located on distant data centers, and/or (ii) exchange many messages with end-users are chosen from the selected fragmented service. As a consequence, only highly communicating, paired microservices are privileged for migration, rather than all the microservices (even though there may be enough resources available), in order to minimize the global latency.
v. As the optimal target data center to which to migrate the microservice, it is suitable to select the data center where the end user is located, which minimizes the latency.
vi. Note that the migration takes place only if there are sufficient available resources to host the microservices that need to be migrated and if the migration results in a latency gain. Otherwise, it is necessary to free some space by re-allocating some microservices composing service(s) experiencing no fragmentation and hence small latency: such microservices are typically placed on the same data center (or on a nearby data center) and exchange the least number of messages with the other microservices.
vii. The candidate microservice is moved to the nearest data center able to host it, as per the greedy approach.
viii. At the final stage, the highly active microservices are migrated after verifying that the latency gain, i.e., the difference between the latency reduction achieved by migrating the microservice of $S$ and the latency increase due to the migration of the candidate microservices, is positive.
More precisely, the migration strategy, detailed in Algorithm 2, involves the following steps:
1. Sort the set of placed services in decreasing order of fragmentation and create the ordered set $\mathcal{S}^*_f$ of fragmented services (line 3).
2. If $\mathcal{S}^*_f$ is not empty, select the service $S_i$ in $\mathcal{S}^*_f$ with the greatest fragmentation index (line 5).
3. Identify two microservices $\sigma_i$ and $\sigma_j$ (with $i \ge 0$, $j > 0$, $i \ne j$) of this service that induce the highest latency (line 6).
4. The migration takes place if the required capacity $c(\sigma_j)$ is available on the target data center $D(\sigma_i)$ and the migration yields a latency gain (lines 13-16). If the necessary capacity is not available, the service with the least fragmentation index among the services hosted on data center $D(\sigma_i)$ is considered in order to free a capacity larger than or equal to $c(\sigma_j)$. If this is not possible, then the migration cannot take place. Otherwise, the selected microservices are placed on other data centers by using the greedy algorithm and the migration of $\sigma_j$ takes place.
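To fix ideas, here is a condensed sketch of one departure-triggered migration round. It follows steps 1-4 above but, for brevity, omits the capacity-freeing step and assumes unit CPU demands per microservice as in the simulation setting; all data structures and names are our own, not the paper's Algorithm 2.

```python
def migration_round(services, h, B, d, free_cpu, departures, threshold):
    """services: dict sid -> list of microservice ids (index 0 = dummy user);
    h: dict microservice id -> hosting node; B: dict sid -> message matrix;
    d: numpy delay matrix; free_cpu: dict node -> free CPU units."""
    if departures < threshold:
        return
    # Step 1: sort services by decreasing fragmentation index.
    order = sorted(services,
                   key=lambda s: len({h[m] for m in services[s]}), reverse=True)
    for sid in order:
        ms = services[sid]
        # Steps 2-3: pick the pair contributing the most latency (since j > i >= 0,
        # the dummy user at index 0 is never the microservice that moves).
        _, i, j = max((B[sid][i][j] * d[h[ms[i]], h[ms[j]]], i, j)
                      for i in range(len(ms)) for j in range(i + 1, len(ms)))
        src, dst = h[ms[j]], h[ms[i]]          # co-locate ms[j] with ms[i]
        if src == dst or free_cpu[dst] < 1:
            continue                            # capacity-freeing step omitted
        # Step 4: migrate only if the net latency gain is positive.
        gain = sum(B[sid][j][k] * (d[h[ms[k]], src] - d[h[ms[k]], dst])
                   for k in range(len(ms)) if k != j)
        if gain > 0:
            free_cpu[src] += 1
            free_cpu[dst] -= 1
            h[ms[j]] = dst
```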

Simulation Setting
For the implementation of our proposed algorithms, we consider the cloud infrastructure depicted in Fig. 1. For our experiments, we assume that the centralized cloud has infinite capacity, fog nodes (labeled from 10 to 12) have capacity 100, and edge nodes (labeled from 1 to 9 in Fig. 1) have capacity 20 (see Table 3).
Concerning the latency between nodes, the assumed latency is 1 unit between edge and fog nodes, 2 units between fog nodes, and 3 units between fog nodes and the centralized cloud. Further, we have two types of services differing in the number of microservices:
• A small service corresponds to a lightweight application (e.g., a firewall) and consists of a small number of microservices, namely 3 microservices exchanging messages with $\beta(\sigma^1_1, \sigma^1_2) = 2$ and $\beta(\sigma^1_2, \sigma^1_3) = 4$; the number $\beta(\sigma^1_0, \sigma^1_1)$ of messages between the end user and the first microservice may change.
The required capacity of each microservice is set equal to 1. We assume that the two types of services (i.e., $K = 2$) arrive according to Poisson processes with rates $\lambda_k$; a service of type $k$ (with $k \le K$) stays in the system for an exponentially distributed period of time with mean $1/\mu_k$ equal to 1. Under the assumption that the resource requirement of each microservice is equal to 1, the resource requirement of a service of type $k$ is equal to $A_k$, the number of microservices composing the service.
Since we assume that the capacity of the centralized cloud is infinite, we define the load of the system by considering edge and fog data centers only. The load offered by services of type $k$ is

$$\rho_k = \frac{a_k A_k}{C_0},$$

where $a_k = \lambda_k / \mu_k$ and the quantity $C_0 = \sum_{D \in \mathcal{D}_e \cup \mathcal{D}_f} C(D)$ is the capacity of the edge and fog data centers. The total load is equal to $\rho_1 + \rho_2$.
Since we deal with a system with no blocking, the number of services in the system has a Poisson probability mass function, as stated in [10]. The mean and the variance of the number of microservices of type $k$ in the system are then

$$A_k a_k \quad \text{and} \quad A_k^2 a_k, \tag{9}$$

respectively. While the total numbers of services of types 1 and 2 in the system have Poisson distributions, it is difficult to compute the number of microservices hosted by a given data center for a given placement strategy (GFF, GBF, or migration).
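Equation (9) follows from the Poisson law of the number of in-service type-$k$ services in a non-blocking system; as a short sanity check (our own derivation):

$$N_k \sim \mathrm{Poisson}(a_k), \quad a_k = \lambda_k/\mu_k, \qquad M_k = A_k N_k \;\Longrightarrow\; \mathbb{E}[M_k] = A_k a_k, \quad \operatorname{Var}(M_k) = A_k^2 \operatorname{Var}(N_k) = A_k^2 a_k.$$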

Numerical Results
For the simulation experiments, we have taken in a first step $\mu_1 = \mu_2 = 1$ and the loads for type 1 (resp. type 2) services $\rho_1 = 1.5$ (resp. $\rho_2 = 2.5$), with $A_1 = 3$ and $A_2 = 10$ as stated in the previous section. The fact that $\mu_1 = \mu_2$ entails that large and small services stay in the system for the same duration in distribution. Since $C_0 = 480$ (see Table 3), we can fix the arrival rates $\lambda_1$ and $\lambda_2$ of the Poisson processes describing the arrivals of services at the system if we assume that $\lambda_2 = \lambda_1/2$, indicating that there are fewer large services but they offer a larger load.
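These parameter choices are mutually consistent; indeed, inverting the load formula of Sect. 6.1 gives (our own arithmetic):

$$a_k = \frac{\rho_k C_0}{A_k}: \qquad a_1 = \frac{1.5 \times 480}{3} = 240, \qquad a_2 = \frac{2.5 \times 480}{10} = 120,$$

so that, with $\mu_1 = \mu_2 = 1$, $\lambda_1 = 240$ and $\lambda_2 = 120$, in agreement with the assumption $\lambda_2 = \lambda_1/2$.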
To compute the probability mass functions of the quantities of interest, we use the "Poisson Arrivals See Time Averages" (PASTA) property: we record, at each service arrival, the values of the variable that we want to observe. Then, we compute the normalized histogram of the successive observations. Thanks to PASTA, this yields the stationary distribution of the random variable under consideration.
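In practice, the estimator is a normalized histogram over the arrival-epoch samples; a minimal sketch (names are ours):

```python
import numpy as np

def pasta_pmf(samples_at_arrivals):
    """Normalized histogram of a quantity recorded at Poisson arrival epochs.
    By the PASTA property, this estimates its stationary distribution."""
    values, counts = np.unique(samples_at_arrivals, return_counts=True)
    return dict(zip(values, counts / counts.sum()))
```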
The majority of the research works studied above follow greedy or first-fit algorithms when implementing or benchmarking their solutions, such as [1, 7, 16, 19]. Therefore, in order to demonstrate the potential of our proposed migration strategy, we compare it with GFF and GBF. We consider the migration strategy triggered by service departures and relying on GBF for placement, and we analyze the latency and fragmentation experienced by small and large services. We use Eq. (5) for the latency, which accounts for: (i) the number of messages exchanged between the microservices themselves and with the respective end-user of a service, and (ii) the distance between the data centers for microservices placed on distinct nodes or layers.
In a first step, to analyse the impact of the placement and of the decision to move microservices near the end-user as a function of the number of messages, we vary the number of messages $\beta(\sigma_0, \sigma_1)$ exchanged between the end user and the first microservice. In Table 4, our migration algorithm yields better performance by minimizing the average global latency for small and large services; the improvement is more significant for small services. As expected, the global latency increases with the number of messages exchanged between the microservices and the end-user, since the first microservices may be placed in fog and cloud data centers. When $\beta(\sigma_0, \sigma_1)$ is equal to 2 or 5, we cannot observe much decrease in average latency after migration. As the number $\beta(\sigma_0, \sigma_1)$ increases, the global latency improves more markedly after migration. Therefore, migration is relevant when there is a highly active exchange between microservices, notably between the user and the first microservice.
Figure 3 compares the strategies in terms of the probability mass function (pmf) of the latency $\ell^{(h)}_S$ for small and large services, considering message exchanges between the user and the first microservice of $\beta(\sigma^1_0, \sigma^1_1) = 50$ and $\beta(\sigma^2_0, \sigma^2_1) = 2$, respectively. We observe in Fig. 3c that the migration strategy globally minimizes the service latency in comparison with the GFF and GBF strategies (Fig. 3a and b), even if the latency of large services is slightly increased compared to GBF.
Likewise, we compare the strategies from the perspective of the probability mass function (pmf) of the fragmentation for small and large services in Fig. 4. As expected, Fig. 4b shows that for the GBF strategy, at most two data centers are used to place a service, instead of the whole service being hosted on a single data center. This is due to the fact that (large-sized) services segregate, or are allocated away from the user's node, when the resources at the edge are full. The fragmentation index is larger than 1 for the large-sized services that are far from the end-user and in search of a (fog/cloud) data center with enough resource availability.
The migration strategy in Fig. 4c still shows a far better outcome than GFF, but brings only a modest improvement over the GBF algorithm.
We finally study the resulting placement of the services at the different network layers (edge, fog and cloud) considering the GFF, GBF and "GBF along with migration" approaches, for small services (Fig. 5) and large services (Fig. 6). In particular, we consider the number of microservices placed on each layer, which reflects the CPU resources consumed by the services. As expected, GFF (Fig. 5a) first consumes the edge resources and then moves to the cloud layer. GBF consumes more of the edge and fog for small services (Fig. 5b) compared to large services (Fig. 6b), which are mostly placed on cloud data centers. Likewise, the migration strategy (Figs. 5c and 6c) shows a similar trend but performs better than the GBF strategy: more microservices migrate near the end user, while less active microservices move to upper layers to make space for actively communicating microservices.
Table 5 reports the mean values and the variances of the occupancy at each layer for the various placement strategies. Note that the sum of the mean values on each line is roughly equal to $A_k a_k$, as stated in Eq. (9). The small difference is due to the limited number of simulated events (1 million service arrivals). For the variance, the sum on each line is significantly different from the value in Eq. (9). This is due to the fact that the numbers of microservices hosted by the various data centers are highly correlated. The correlation does not impact the mean but greatly affects the higher moments. This correlation seems impossible to model. Results (Table 6) are roughly the same when large services last much longer than small services (with $\mu_2 = 1/10$), considering unchanged loads.
In order to study the impact of link transmission capacity, we have doubled the transmission delay between one fog node (node 11 in Fig. 1) and the three attached edge nodes (nodes 4, 5, 6 in Fig. 1). We compare in Table 7 the values of the mean global latency for the original and modified cloud topologies. For both topologies, migration is efficient for small services (latency gain of about 20% compared with GBF), less so for large services. Due to lack of space, we do not report the results for the latency, the fragmentation and the occupancy of nodes; they are roughly the same as in Figs. 3, 4, 5 and 6.

Conclusion
We have introduced a migration approach that improves the placement of chains of microservices in terms of latency. The proposed heuristic considers data centers distributed over a three-tier architecture along with the ephemeral nature of containerized services. The heuristic first chooses the highly active microservices that are fragmented, in order to dynamically place them near the end-user. At the same time, the heuristic analyses the possible re-allocation of already placed microservices when the available resources at the desired data center are insufficient. The simulation-based evaluation shows that migration performs better than static placement (e.g., the GBF and GFF strategies considered in this paper) and reduces the latency. In future work, this approach may be extended using deep learning techniques.

Fig. 1 Cloud infrastructure of a network

Table 2 Notation for the cloud infrastructure, the placement of services, and related metrics

$\mathcal{D} = \{D_1, \ldots, D_N\}$: set of data centers in the system
$C(D_n)$: CPU capacity of data center $D_n$
$\mathcal{D}_e$ (resp. $\mathcal{D}_f$, resp. $\mathcal{D}_c$): set of edge (resp. fog, resp. centralized) data centers ($\mathcal{D} = \mathcal{D}_e \cup \mathcal{D}_f \cup \mathcal{D}_c$)
$R(D_n)$: RAM capacity of data center $D_n$
$S = \{\sigma_1, \ldots, \sigma_{J_S}\}$: service $S$ composed of microservices $\sigma_j$ for $j = 1, \ldots, J_S$
$C = (C(D_n), n = 1, \ldots, N)$: capacity vector of the system
$c(\sigma)$: CPU requirement of microservice $\sigma$
$r(\sigma)$: RAM requirement of microservice $\sigma$
$\mathcal{S}$: set of services to be placed
$h : S \mapsto (h(\sigma_1), \ldots, h(\sigma_{J_S}))$: placement of service $S$ on the set of data centers ($h(\sigma_j) \in \mathcal{D}$)
$\mathcal{S}^{(h)}$: set of services placed under placement $h$
$\mathcal{S}^{(h)}_f$: set of fragmented placed services under placement $h$
$\mathcal{M}^{(h)}_n$: set of microservices placed under placement $h$ on data center $D_n$
$\mathcal{M}^{(h)}$: set of microservices placed under placement $h$ in the system
$\beta_S(\sigma, \sigma')$: number of messages exchanged between microservices $\sigma$ and $\sigma'$ of service $S$
$d_{n,m}$: delay between data centers $D_n$ and $D_m$
$\Delta^{(h)}_S$: the $(1 + J_S) \times (1 + J_S)$ delay matrix of service $S$, whose $(i, j)$ entry equals $d_{h(\sigma_i), h(\sigma_j)}$
$\ell^{(h)}_S$: latency experienced by service $S$ under placement $h$

Fig. 5 Placement of small microservices on different layers

Fig. 6 Placement of large microservices on different layers

Table 1 Characteristics of various migration strategies

Table 4 Mean of global latency

Table 5 Occupancy mean and variance of microservices at each layer (for $\mu_1 = \mu_2$)