Edge Computing [Shi16, Sat17] is a multi-faceted concept that has been applied in different fields, from telecommunications to Cloud Computing or the Internet of Things (IoT). Before it was widely adopted, Edge Computing received different names, such as Micro Data Centers (MDCs) [Aaz15], Cloudlets [Sat09], Multi-access Edge Computing (MEC) or Fog Computing [Bon12]. Today, the consensus is that there are actually two types of edge: the far edge and the near edge. The far edge is computing infrastructure deployed farthest from the cloud data centers and closest to the users, for example, at 5G cell towers or in IoT appliances, while the near edge is computing infrastructure deployed between the far edge and the cloud data centers, for example, in telecom central offices or in Internet Service Providers' (ISPs') Points of Presence (PoPs).
Most Edge Computing platforms are based on a distributed architecture, in which each edge node is independently managed and an upper-level edge gateway is responsible for interacting with clients and routing their requests to the appropriate underlying edge nodes. This approach is highly scalable; however, it delegates most complex management tasks to the edge nodes, which can consume a significant share of their resources. Furthermore, installing, configuring, and maintaining the edge nodes can be an extremely complex and time-consuming task.
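The gateway's role in this distributed architecture can be illustrated with a minimal sketch. All names and the routing policy (region affinity, then least-loaded node) are illustrative assumptions, not taken from any particular platform:

```python
# Hypothetical sketch of an upper-level edge gateway that keeps a
# registry of independently managed edge nodes and routes each client
# request to an appropriate one.

from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    region: str
    free_cpus: int


class EdgeGateway:
    def __init__(self, nodes):
        self.nodes = nodes

    def route(self, client_region):
        # Prefer nodes in the client's region; fall back to any node.
        candidates = [n for n in self.nodes if n.region == client_region]
        if not candidates:
            candidates = self.nodes
        # Pick the least-loaded candidate.
        return max(candidates, key=lambda n: n.free_cpus)


gateway = EdgeGateway([
    EdgeNode("edge-eu-1", "eu", free_cpus=4),
    EdgeNode("edge-eu-2", "eu", free_cpus=8),
    EdgeNode("edge-us-1", "us", free_cpus=2),
])
print(gateway.route("eu").name)  # edge-eu-2
```

Note that even in this toy version the gateway must track per-node state (here, free CPUs), which hints at why real platforms push the heavier management tasks down to the nodes themselves.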
In contrast, we proposed a disaggregated architecture based on centralized management, which frees edge nodes from complex management tasks [Mor19]. This approach offers a good tradeoff between performance and design complexity, and provides a uniform view of the whole distributed edge infrastructure. Its main disadvantage is its limited scalability in the number of edge nodes that can be monitored and managed. In a previous work, we presented a disaggregated edge cloud platform based on OpenNebula to create a cloud infrastructure using resources from cloud and edge providers [Hue21].
There is interesting research on the management of distributed edge computing infrastructures. However, most works present simulated results, and few of them present real platforms tested with real applications. For instance, ENORM is an edge resource management framework that provides provisioning and auto-scaling of edge node resources, tested using AWS and a gaming application [Wan20]. There are also MEC platforms that virtualize the access network and provide cloud-computing capabilities at the network edge [Jar16, Mor17].
Regarding technologies, OpenStack StarlingX provides a container-based platform to build mission-critical edge clouds, and its Distributed Cloud subproject supports an edge computing solution by providing central management and orchestration for a geographically distributed network of StarlingX Kubernetes edge clusters, which are centrally managed and synchronized over L3 networks from a central cloud. Also, KubeEdge enables the orchestration and management of edge clusters similar to how Kubernetes manages applications in the cloud.
Finally, cloud providers offer solutions to extend their services to the client premises. For example, AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to customer premises, using the same programming interfaces as in AWS Regions, while using local compute and storage resources for lower latency and local data processing needs. Similarly, Google Anthos is an application management platform based on Kubernetes that provides a consistent development and operations experience for cloud (Google, AWS and Azure) and on-premises environments.
From the point of view of FL, most initial works were based on a cloud model, with a single central server that collects and aggregates the ML models trained by the different clients. More recently, however, edge computing has emerged as a naturally suitable environment to deploy and implement FL solutions [Lim20, Xia21, Abr22].
For example, the authors of [Wng21] propose a cluster-based FL mechanism for Mobile Edge Computing (MEC) environments, where the edge nodes, which act as FL clients, are divided into several clusters by balanced clustering. In each cluster, a leader node is chosen, which is responsible for aggregating all the local models of that cluster. The edge nodes within a cluster implement a synchronous aggregation mechanism, while the leader nodes communicate with the central server for global aggregation following an asynchronous mechanism.
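The interplay of the two aggregation modes can be sketched as follows. This is a hedged illustration of the general idea, not the actual algorithm of [Wng21]: models are plain lists of floats, the cluster assignment is hard-coded, and the asynchronous blending factor is an assumption:

```python
# Cluster-based FL sketch: leaders synchronously average their
# cluster's local models; the central server merges each leader's
# model asynchronously, as soon as it arrives.

def average(models):
    """Synchronous aggregation: element-wise mean of local models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def async_merge(global_model, leader_model, alpha=0.5):
    """Asynchronous aggregation: blend one leader update into the
    global model without waiting for the other leaders."""
    return [(1 - alpha) * g + alpha * l
            for g, l in zip(global_model, leader_model)]


# Two clusters of edge nodes (FL clients), each with a leader.
cluster_a = [[1.0, 2.0], [3.0, 4.0]]   # local models in cluster A
cluster_b = [[5.0, 6.0]]               # local models in cluster B

leader_a = average(cluster_a)          # [2.0, 3.0]
leader_b = average(cluster_b)          # [5.0, 6.0]

global_model = [0.0, 0.0]
global_model = async_merge(global_model, leader_a)  # A arrives first
global_model = async_merge(global_model, leader_b)  # B arrives later
print(global_model)  # [3.0, 3.75]
```

The point of the asynchronous step is that the central server never blocks on a slow cluster, at the cost of the global model depending on the arrival order of the leader updates.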
Another interesting proposal is EdgeFed [Ye20], which introduces an additional edge layer in the FL architecture, resulting in a 'client-edge-cloud' model. This three-layer model divides the process of updating the model parameters into two parts: local updates of the model parameters are performed on the 'client-edge' side, while global aggregation takes place between 'edge' and 'cloud'. This model reduces both the computational cost on the mobile devices and the global communication expense.
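The communication saving of this split can be made concrete with a back-of-the-envelope sketch (the function name and the numbers are illustrative, not taken from [Ye20]): local updates stay on the client-edge links, and only one aggregated model per edge server crosses the costly edge-cloud link per round.

```python
# Count model uploads reaching the cloud in one global FL round,
# with and without the intermediate edge layer.

def cloud_uploads_per_round(edge_servers, clients_per_edge, hierarchical):
    """Number of model uploads that reach the cloud in one round."""
    if hierarchical:
        # Each edge server aggregates its clients locally and sends
        # a single partial model to the cloud.
        return edge_servers
    # Cloud-only FL: every client uploads directly to the cloud.
    return edge_servers * clients_per_edge


flat = cloud_uploads_per_round(10, 100, hierarchical=False)
hier = cloud_uploads_per_round(10, 100, hierarchical=True)
print(flat, hier)  # 1000 10
```

With these assumed numbers, the edge layer cuts cloud-bound uploads per round by a factor equal to the number of clients per edge server.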
This three-layer architecture is also analyzed in [Liu20], which compares three different FL models: cloud-based, edge-based and client-edge-cloud hierarchical FL. Cloud-based FL can involve several millions of clients, providing massive datasets, but communications with the cloud server can be slow and unpredictable. In edge-based FL, the server is placed in an edge infrastructure, closer to the clients, thus reducing communication latency. Nevertheless, the main disadvantage is the limited number of clients each server can access, which can result in a training performance loss. Finally, client-edge-cloud hierarchical FL systems combine the best of the previous two models: they reduce the costly communication with the cloud, complemented by efficient client-edge updates, and allow managing a large number of clients, so the cloud server can access more data, which improves the model training performance.
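The two-level aggregation at the heart of the hierarchical model can be sketched as follows. This assumes FedAvg-style weighting by sample counts, which is a common choice but not taken verbatim from [Liu20]; the client data is made up for illustration:

```python
# Client-edge-cloud hierarchical aggregation sketch: each edge server
# averages its clients' models weighted by sample count, then the
# cloud averages the edge models weighted by their total sample count.

def weighted_avg(models, weights):
    """Element-wise weighted average of a list of models."""
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(len(models[0]))]


# Clients under two edge servers: (model, number of local samples).
edge1_clients = [([1.0], 10), ([3.0], 30)]
edge2_clients = [([5.0], 20)]

# Stage 1: client-edge aggregation at each edge server.
edge_models, edge_sizes = [], []
for clients in (edge1_clients, edge2_clients):
    models = [m for m, _ in clients]
    sizes = [s for _, s in clients]
    edge_models.append(weighted_avg(models, sizes))
    edge_sizes.append(sum(sizes))

# Stage 2: edge-cloud aggregation at the central server.
global_model = weighted_avg(edge_models, edge_sizes)
print(edge_models, global_model)  # [[2.5], [5.0]] [3.333...]
```

With sample-count weighting, the two-stage result equals the flat weighted average over all clients, so the edge layer changes the communication pattern without changing the aggregated model.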
The main limitation of all the above-mentioned works is that they are mostly theoretical, and their results are based on mathematical models or simulations; the proposed FL solutions are neither tested nor validated on a real edge computing infrastructure. In this work, we propose and analyze a three-layer edge-based cross-silo FL architecture, and we present a real deployment of an FL use case on top of a real edge infrastructure made of several geographically distributed edge nodes from a commercial edge provider (Equinix Metal), using the OpenNebula platform for the provisioning and deployment of the edge infrastructure.