Edge Computing [Shi16, Sat17] is a multi-faceted concept that has been applied in different fields, from telecommunications to Cloud Computing or the Internet of Things (IoT). Before it was widely adopted, Edge Computing received different names, such as Micro Data Centers (MDCs) [Aaz15], Cloudlets [Sat09], Multi-access Edge Computing (MEC) or Fog Computing [Bon12]. Today, the consensus is that there are actually two types of edge: the far edge and the near edge. The far edge is computing infrastructure deployed farthest from the cloud data centers and closest to the users, for example, at 5G cell towers or in IoT appliances, while the near edge is computing infrastructure deployed between the far edge and the cloud data centers, for example, in telecom central offices or in Internet Service Providers' (ISPs') Points of Presence (PoPs).
Most Edge Computing platforms are based on a distributed architecture, in which each edge node is independently managed and an upper-level edge gateway is responsible for interacting with clients and routing their requests to the appropriate underlying edge nodes. This approach is highly scalable; however, it delegates most complex management tasks to the edge nodes, which can consume a significant share of their resources. Furthermore, installing, configuring, and maintaining the edge nodes can be an extremely complex and time-consuming task.
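The gateway's role in this distributed architecture can be illustrated with a minimal sketch. All names and the routing policy (region affinity, then least-loaded node) are illustrative assumptions, not taken from any particular platform:

```python
# Hypothetical sketch of an upper-level edge gateway that keeps a
# registry of independently managed edge nodes and routes each client
# request to an appropriate one.

from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    region: str
    free_cpus: int


class EdgeGateway:
    def __init__(self, nodes):
        self.nodes = nodes

    def route(self, client_region):
        # Prefer nodes in the client's region; fall back to any node.
        candidates = [n for n in self.nodes if n.region == client_region]
        if not candidates:
            candidates = self.nodes
        # Pick the least-loaded candidate.
        return max(candidates, key=lambda n: n.free_cpus)


gateway = EdgeGateway([
    EdgeNode("edge-eu-1", "eu", free_cpus=4),
    EdgeNode("edge-eu-2", "eu", free_cpus=8),
    EdgeNode("edge-us-1", "us", free_cpus=2),
])
print(gateway.route("eu").name)  # edge-eu-2
```

Note that even in this toy version the gateway must track per-node state (here, free CPUs), which hints at why real platforms push the heavier management tasks down to the nodes themselves.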
In contrast, we proposed a disaggregated architecture based on centralized management, which frees edge nodes from complex management tasks [Mor19]. This approach offers a good tradeoff between performance and design complexity, and provides a uniform view of the whole distributed edge infrastructure. Its main disadvantage is its limited scalability in the number of edge nodes that can be monitored and managed. In a previous work, we presented a disaggregated edge cloud platform based on OpenNebula to create a cloud infrastructure using resources from cloud and edge providers [Hue21].
There is interesting research on the management of distributed edge computing infrastructures. However, most works present simulated results, and few of them present real platforms tested with real applications. For instance, ENORM is an edge resource management framework that provides provisioning and auto-scaling of edge node resources, tested using AWS and a gaming application [Wan20]. There are also MEC platforms that virtualize the access network and provide cloud-computing capabilities at the network edge [Jar16, Mor17].
Regarding technologies, OpenStack StarlingX provides a container-based platform to build mission-critical edge clouds, and its Distributed Cloud subproject supports an edge computing solution by providing central management and orchestration for a geographically distributed network of StarlingX Kubernetes edge clusters, which are centrally managed and synchronized over L3 networks from a central cloud. Also, KubeEdge enables the orchestration and management of edge clusters similar to how Kubernetes manages applications in the cloud.
Finally, cloud providers offer solutions to extend their services to the client premises. For example, AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to customer premises, using the same programming interfaces as in AWS Regions, while using local compute and storage resources for lower latency and local data processing needs. Similarly, Google Anthos is an application management platform based on Kubernetes that provides a consistent development and operations experience for cloud (Google, AWS and Azure) and on-premises environments.
From the point of view of FL, most initial works were based on a cloud model, with a single central server that collects and aggregates the ML models trained by the different clients. More recently, however, edge computing has emerged as a naturally suitable environment to deploy and implement FL solutions [Lim20, Xia21, Abr22].
For example, the authors of [Wng21] propose a cluster-based FL mechanism for Mobile Edge Computing (MEC) environments, where the edge nodes, which act as FL clients, are divided into several clusters by balanced clustering. In each cluster, a leader node is chosen, which is responsible for aggregating all the local models of that cluster. The edge nodes within a cluster implement a synchronous aggregation mechanism, while the leader nodes communicate with the central server for global aggregation following an asynchronous mechanism.
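The interplay of the two aggregation modes can be sketched as follows. This is a hedged illustration of the general idea, not the actual algorithm of [Wng21]: models are plain lists of floats, the cluster assignment is hard-coded, and the asynchronous blending factor is an assumption:

```python
# Cluster-based FL sketch: leaders synchronously average their
# cluster's local models; the central server merges each leader's
# model asynchronously, as soon as it arrives.

def average(models):
    """Synchronous aggregation: element-wise mean of local models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def async_merge(global_model, leader_model, alpha=0.5):
    """Asynchronous aggregation: blend one leader update into the
    global model without waiting for the other leaders."""
    return [(1 - alpha) * g + alpha * l
            for g, l in zip(global_model, leader_model)]


# Two clusters of edge nodes (FL clients), each with a leader.
cluster_a = [[1.0, 2.0], [3.0, 4.0]]   # local models in cluster A
cluster_b = [[5.0, 6.0]]               # local models in cluster B

leader_a = average(cluster_a)          # [2.0, 3.0]
leader_b = average(cluster_b)          # [5.0, 6.0]

global_model = [0.0, 0.0]
global_model = async_merge(global_model, leader_a)  # A arrives first
global_model = async_merge(global_model, leader_b)  # B arrives later
print(global_model)  # [3.0, 3.75]
```

The point of the asynchronous step is that the central server never blocks on a slow cluster, at the cost of the global model depending on the arrival order of the leader updates.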
Another interesting proposal is EdgeFed [Ye20], which introduces an additional edge layer in the FL architecture, resulting in a 'client-edge-cloud' model. This three-layer model divides the process of updating the model parameters into two parts: local updates of the model parameters are performed on the 'client-edge' side, while global aggregation takes place between 'edge' and 'cloud'. This model reduces both the computational cost on the mobile devices and the global communication expense.
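The communication saving of this split can be made concrete with a back-of-the-envelope sketch (the function name and the numbers are illustrative, not taken from [Ye20]): local updates stay on the client-edge links, and only one aggregated model per edge server crosses the costly edge-cloud link per round.

```python
# Count model uploads reaching the cloud in one global FL round,
# with and without the intermediate edge layer.

def cloud_uploads_per_round(edge_servers, clients_per_edge, hierarchical):
    """Number of model uploads that reach the cloud in one round."""
    if hierarchical:
        # Each edge server aggregates its clients locally and sends
        # a single partial model to the cloud.
        return edge_servers
    # Cloud-only FL: every client uploads directly to the cloud.
    return edge_servers * clients_per_edge


flat = cloud_uploads_per_round(10, 100, hierarchical=False)
hier = cloud_uploads_per_round(10, 100, hierarchical=True)
print(flat, hier)  # 1000 10
```

With these assumed numbers, the edge layer cuts cloud-bound uploads per round by a factor equal to the number of clients per edge server.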
This three-layer architecture is also analyzed in [Liu20], which compares three different FL models: cloud-based, edge-based and client-edge-cloud hierarchical FL. Cloud-based FL can involve several millions of clients, providing massive datasets, but communications with the cloud server can be slow and unpredictable. In edge-based FL, the server is placed in an edge infrastructure, closer to the clients, thus reducing communication latency. Nevertheless, the main disadvantage is the limited number of clients each server can access, which can result in a training performance loss. Finally, client-edge-cloud hierarchical FL systems combine the best of the previous two models: they reduce the costly communication with the cloud, complemented by efficient client-edge updates, and allow managing a large number of clients, so the cloud server can access more data, which improves the model training performance.
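The two-level aggregation at the heart of the hierarchical model can be sketched as follows. This assumes FedAvg-style weighting by sample counts, which is a common choice but not taken verbatim from [Liu20]; the client data is made up for illustration:

```python
# Client-edge-cloud hierarchical aggregation sketch: each edge server
# averages its clients' models weighted by sample count, then the
# cloud averages the edge models weighted by their total sample count.

def weighted_avg(models, weights):
    """Element-wise weighted average of a list of models."""
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(len(models[0]))]


# Clients under two edge servers: (model, number of local samples).
edge1_clients = [([1.0], 10), ([3.0], 30)]
edge2_clients = [([5.0], 20)]

# Stage 1: client-edge aggregation at each edge server.
edge_models, edge_sizes = [], []
for clients in (edge1_clients, edge2_clients):
    models = [m for m, _ in clients]
    sizes = [s for _, s in clients]
    edge_models.append(weighted_avg(models, sizes))
    edge_sizes.append(sum(sizes))

# Stage 2: edge-cloud aggregation at the central server.
global_model = weighted_avg(edge_models, edge_sizes)
print(edge_models, global_model)  # [[2.5], [5.0]] [3.333...]
```

With sample-count weighting, the two-stage result equals the flat weighted average over all clients, so the edge layer changes the communication pattern without changing the aggregated model.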
The main limitation of all the above-mentioned works is that they are mostly theoretical, and their results are based on mathematical models or simulations; the proposed FL solutions are neither tested nor validated on a real edge computing infrastructure. In this work, we propose and analyze a three-layer edge-based cross-silo FL architecture, and we present a real deployment of an FL use case on top of a real edge infrastructure made of several geographically distributed edge nodes from a commercial edge provider (Equinix Metal), using the OpenNebula platform for the provisioning and deployment of the edge infrastructure.