3.1 Overview
This paper presents an Intelligent Mobile Data Collection (MDC) framework tailored for Internet of Things (IoT) based sensor networks. Central to this framework is the use of Federated Reinforcement Learning (FRL) to learn and adapt to data generation patterns, including the time intervals between transmissions, the number of packets generated, and the packet types. Within the FRL paradigm, each IoT sensor or device autonomously trains a local model using Reinforcement Learning (RL), defined in terms of states, actions, and rewards. This localized learning enables sensors to efficiently capture and adapt to the characteristics of their operational environment. Following the training phase, IoT sensors transmit their locally trained model parameters to a central gateway for aggregation. At the gateway, these individual parameters are combined into a comprehensive global model that leverages collective insights from the entire sensor network. The aggregated global model is then disseminated back to the IoT sensors, empowering them with refined decision-making capabilities informed by network-wide data.
Building upon the insights gleaned from the global model, the framework dynamically adjusts key parameters such as Time Division Multiple Access (TDMA) slots, sleep durations for sensors, and the visiting schedule of the MDC. These adjustments are contingent upon the categorization of sensor clusters, ensuring optimized resource allocation and data collection efficiency tailored to the specific needs and dynamics of each cluster. By integrating FRL-based learning, distributed RL techniques, and adaptive scheduling mechanisms, the proposed framework offers a comprehensive solution to the challenges inherent in IoT-based sensor networks. This approach not only enhances data collection efficiency but also fosters adaptability and resilience in the face of evolving network conditions and requirements. Ultimately, the framework holds promise for facilitating more intelligent and responsive IoT deployments, with implications spanning various domains, from smart cities to industrial automation.
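To make the cluster-driven scheduling concrete, the following is a minimal sketch of how cluster categories could be mapped to TDMA slots, sleep durations, and MDC visiting intervals. The category names come from the framework; the function name and all numeric values are illustrative assumptions, not values taken from this paper.

```python
# Hypothetical mapping from a cluster's learned category to scheduling
# parameters.  The numeric values (slot counts, sleep and visit times
# in seconds) are illustrative assumptions only.
SCHEDULE = {
    "Frequent":      {"tdma_slots": 8, "sleep_s": 1,   "mdc_visit_s": 60},
    "Less Frequent": {"tdma_slots": 4, "sleep_s": 5,   "mdc_visit_s": 300},
    "Rare":          {"tdma_slots": 2, "sleep_s": 30,  "mdc_visit_s": 1800},
    "Very Rare":     {"tdma_slots": 1, "sleep_s": 120, "mdc_visit_s": 7200},
}

def schedule_for(category: str) -> dict:
    """Return the scheduling parameters assigned to a cluster category."""
    return SCHEDULE[category]
```

In this sketch, busier clusters receive more TDMA slots, shorter sleep periods, and more frequent MDC visits, while quieter clusters conserve energy with longer sleep and sparser visits.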
3.2 System Model
Figure 1 illustrates the system model of the proposed framework, delineating the interplay between IoT sensors, gateways, and the Mobile Data Collector (MDC). Central to this model is clustering: IoT devices and sensors are organized into cohesive groups based on their geographical proximity or regional affiliation. Each cluster is headed by a designated gateway node responsible for collecting, aggregating, and transmitting data from its members to the MDC. Cluster formation is driven by geographical location, ensuring that each cluster encapsulates sensors in close physical proximity to optimize data collection efficiency. To facilitate this, each cluster is assigned a unique identifier (ID), enabling seamless communication and management within the network architecture. Crucially, the selection of gateway nodes within each cluster is guided by two factors: residual energy and node degree. Residual energy serves as a vital metric for gauging the operational capacity and longevity of potential gateway candidates, ensuring sustainable data transmission, while node degree, reflecting the connectivity and centrality of each sensor within the cluster, ensures robust network coverage and resilience. By integrating geographical clustering, cluster identification, and gateway selection based on energy and connectivity metrics, the framework lays the groundwork for a resilient and efficient data collection ecosystem. This holistic approach optimizes resource utilization and enhances the scalability and adaptability of the system to dynamic environmental conditions. Fig. 1 thus offers a visual representation of how these components interact to drive the efficacy and functionality of the proposed architecture.
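The gateway selection described above can be sketched as a simple scoring rule that combines normalized residual energy and node degree. The weighting parameter `alpha` and its default value are illustrative assumptions; the paper specifies the two criteria but not a particular combining formula.

```python
def select_gateway(nodes, alpha=0.6):
    """Pick the cluster member with the best weighted combination of
    normalized residual energy and node degree.

    `nodes` is a list of dicts with keys 'id', 'energy', 'degree'.
    `alpha` weights energy against connectivity; 0.6 is an
    illustrative choice, not a value from the paper."""
    max_e = max(n["energy"] for n in nodes) or 1  # avoid division by zero
    max_d = max(n["degree"] for n in nodes) or 1

    def score(n):
        return alpha * n["energy"] / max_e + (1 - alpha) * n["degree"] / max_d

    return max(nodes, key=score)["id"]
```

A node with both high remaining energy and many neighbors scores highest, matching the stated goals of sustainable transmission and robust coverage.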
3.3 Basics of FL
A consortium of users leveraging IoT devices such as smartphones, laptops, or tablets collaboratively implements Federated Learning (FL) algorithms to execute IoT tasks. FL stands as a cornerstone in the evolution of next-generation IoT networks, where its significance is paramount for unlocking the full potential of intelligence at the network edge. This is especially crucial as a centralized Base Station (BS) often lacks the capability to gather all data generated by distributed IoT devices for training Artificial Intelligence/Machine Learning (AI/ML) models. FL revolutionizes the conventional approach by enabling IoT users and the BS to jointly train a global model while preserving raw data privacy at the users' devices. Through FL, each IoT user actively contributes to the training process by leveraging their individual datasets to train a localized ML model. Subsequently, these locally trained models are uploaded to the BS, which orchestrates the aggregation process to construct a comprehensive global model. This collaborative FL process ensures that the intelligence gleaned from IoT data remains decentralized and distributed, reflecting the diverse contexts and environments in which IoT devices operate. By allowing users to retain control over their data while still contributing to the collective intelligence, FL strikes a delicate balance between privacy preservation and model accuracy. Furthermore, FL facilitates continual model refinement and adaptation to evolving data distributions without necessitating data centralization. This distributed approach not only mitigates privacy concerns but also enhances scalability and robustness, as the global model reflects the collective insights from a diverse array of IoT devices. In essence, FL empowers IoT ecosystems to harness the collective intelligence of distributed devices while respecting data privacy and security. 
As IoT applications continue to proliferate across various domains, FL emerges as a pivotal enabler for unlocking the full potential of intelligence at the network edge, fostering innovation and efficiency in IoT-driven endeavors.
A standard Federated Learning (FL) system comprises an FL server (S) and a cohort of participating clients C, where each client c ∈ C possesses a private dataset dc. Each client leverages its local dataset to train a local model (mc) and subsequently transmits the local model parameters as an update to the FL server (S). The FL server (S) then aggregates all received local models to derive the global model (MG) according to a specified aggregation protocol. It is important to note that this approach differs from conventional cloud-centric training, where the model is trained by aggregating and processing data centrally from all clients.
The training process of FL, as illustrated in Fig. 1, entails the following three steps:
Step 1 (Initialization and Model Distribution): During the initial round (Round 0), the FL server (S) defines the training task, specifying the target model, data requirements, and hyperparameters (e.g., batch size). Subsequently, it broadcasts the initial global model and task settings to all participating clients.
Step 2 (Local Model Training and Update): In subsequent rounds (Round t), each client (c) updates its local model parameters based on the global model received from the FL server (S). The objective is to optimize local parameters to minimize the loss function associated with the training data. Upon completion, the updated local parameters are uploaded to the FL server (S).
Step 3 (Global Model Aggregation and Update): In the same round (Round t), the FL server (S) aggregates all received local models with the aim of minimizing the global loss function. The aggregation process combines insights from all participating clients to refine the global model (MG).
The FL server (S) then broadcasts the updated global model to all clients for training in the subsequent round (t + 1). This iterative process continues until convergence of the global model (MG) or until a desired level of accuracy is achieved.
In summary, the FL framework facilitates collaborative model training across distributed clients while preserving data privacy. By leveraging local datasets and iterative model updates, FL enables the development of robust and accurate global models without centralized data aggregation.
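The aggregation in Step 3 is commonly realized as a weighted average of the clients' parameters (FedAvg-style). As a minimal sketch under that assumption, with each local model represented as a flat parameter vector:

```python
import numpy as np

def fedavg(local_models, sizes):
    """Weighted average of local model parameter vectors (FedAvg-style).

    `local_models` is a list of 1-D numpy arrays w_c, one per client;
    `sizes` holds the number of local training samples |d_c| used as
    aggregation weights, so larger datasets contribute more."""
    total = sum(sizes)
    return sum((n / total) * w for w, n in zip(local_models, sizes))
```

The server would broadcast the returned global vector back to the clients for round t + 1, repeating until convergence.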
3.4 Deep Reinforcement Learning (DRL)
Reinforcement Learning (RL) serves as a powerful mathematical framework that empowers computing devices to autonomously learn and make decisions based on experiences garnered from interacting with their environment. At the heart of RL lies the concept of learning through interactions, where an agent navigates through a dynamic environment by selecting actions, observing outcomes, and receiving rewards in return. In RL, the agent's decision-making process revolves around selecting actions according to a predefined policy and executing them within the environment. Subsequently, the agent receives feedback in the form of rewards, which reflect the outcomes of its actions within the evolving environment. Through iterative cycles of action-selection, observation, and reward-feedback, the agent continually refines its policy to optimize its decision-making strategy and maximize cumulative rewards. The ultimate goal of the agent is to learn an optimal policy that guides it towards actions yielding the highest expected rewards. This entails discerning the most favorable sequence of actions based on the rewards provided by the environment. The methodology or algorithm employed by the agent to learn and update its policy varies depending on the specific RL method utilized. Deep Reinforcement Learning (DRL) represents a significant advancement in RL by integrating deep neural networks into the learning process. By leveraging the expressive power of deep learning architectures, DRL enables agents to learn complex decision-making strategies directly from raw sensory inputs. The DRL framework trains neural networks to map environmental states to optimal actions, leveraging rich representations of state-action spaces learned through layers of abstraction. In recent years, the field of DRL has witnessed an explosion of research activity, resulting in the development of a diverse array of algorithms and techniques. 
These advancements have led to significant breakthroughs in a wide range of domains, including robotics, gaming, finance, and healthcare, among others. In summary, RL and its variant, DRL, represent cutting-edge approaches to autonomous learning and decision-making in dynamic environments. By enabling agents to learn optimal strategies through interaction and experience, these frameworks hold immense potential for advancing the capabilities of intelligent systems across various domains.
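The iterative cycle of action, observation, and reward-feedback described above is captured most simply by the tabular Q-learning update, the classical precursor to DRL (where a neural network replaces the table). A minimal sketch, with illustrative learning-rate and discount values:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    `Q` is a (num_states, num_actions) array; `alpha` is the learning
    rate and `gamma` the discount factor (values here are illustrative)."""
    td_target = r + gamma * np.max(Q[s_next])   # best estimated future value
    Q[s, a] += alpha * (td_target - Q[s, a])    # move toward the TD target
    return Q
```

In DRL the same temporal-difference target is used, but as a loss for training a network that maps raw state inputs to action values.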
3.5 Federated Reinforcement Learning (FRL) Process
Federated Reinforcement Learning (FRL) represents a fusion of Federated Learning (FL) and Reinforcement Learning (RL) techniques, leveraging the strengths of both approaches [13]. FRL offers a unique advantage by enabling the aggregation of observations from diverse environments, enhancing learning compared to traditional Deep Reinforcement Learning (DRL) methods that rely solely on partial observations from a single environment. Integrating FL principles into RL frameworks empowers FRL to harness collective intelligence from distributed sources, facilitating more robust and comprehensive learning. By pooling observations from various environments, FRL can overcome the limitations of individual datasets and extract valuable insights from a broader spectrum of experiences. One notable advantage of FRL lies in its ability to outperform standard DRL approaches when confronted with scenarios characterized by partial observations: its capacity to integrate observations from multiple environments enables it to derive more accurate and generalized models, enhancing performance across a range of tasks and environments. In the context of IoT data collection, FRL emerges as a potent tool for training on and classifying data generation patterns among IoT devices. By leveraging insights from diverse clusters, FRL enables the classification of data generation patterns into categories such as Frequent, Less Frequent, Rare, and Very Rare. This categorization facilitates more nuanced and effective resource allocation strategies, tailored to the specific characteristics and requirements of each cluster. In summary, FRL bridges the gap between FL and RL techniques, offering enhanced learning capabilities by leveraging observations from multiple environments.
In the realm of IoT data collection, FRL holds promise for optimizing resource allocation, improving classification accuracy, and ultimately advancing the efficiency and effectiveness of data-driven decision-making processes.
Let K = {1, 2, ..., K} denote the set of participants who use IoT devices to collaboratively implement an FRL algorithm for performing an IoT task.
The key processes involved are as follows:
- In this framework, data generation patterns such as the time interval between transmissions, the number of packets generated, and the type of packets are learned using FRL.
- This learning involves two kinds of participants:
  - Data clients: the IoT devices.
  - Aggregation server: located at the base station (BS) or access point.
- FRL allows IoT users and the BS to train a shared global model while the raw data remain at the users' devices.
- Each IoT user k participates in training the shared model using its own dataset Dk, k ∈ K. Hereinafter, the FL model trained at the IoT device is called the local model wk.
- After local training, IoT users upload their local model updates to the BS, which aggregates them to build a shared model, called the global model wG.
- By relying on distributed data training at the IoT devices, the aggregation server at the BS can enrich the training performance without significantly compromising user data privacy.
- After learning, the clusters are classified into four categories:
  - Frequent,
  - Less Frequent,
  - Rare, and
  - Very Rare.
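As a minimal sketch, the mapping from a cluster's learned transmission pattern to one of the four categories might look like the following. The threshold values are purely illustrative assumptions; in the framework they would emerge from the FRL training rather than being fixed by hand.

```python
def classify_cluster(mean_interval_s: float) -> str:
    """Map a cluster's learned mean time between transmissions (seconds)
    to one of the four categories.  The thresholds below are
    illustrative assumptions, not values from the framework."""
    if mean_interval_s < 60:
        return "Frequent"
    if mean_interval_s < 600:
        return "Less Frequent"
    if mean_interval_s < 3600:
        return "Rare"
    return "Very Rare"
```

The resulting label is what drives the per-cluster TDMA, sleep, and MDC-visit adjustments described earlier.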
3.5.1 FRL Algorithm
The Federated Reinforcement Learning (FRL) system entails several pivotal steps to effectively train models and optimize performance. These steps are as follows:
1. **Initial Global Model Distribution**:
The gateway initiates the training process by disseminating the initial global model to all devices within the network. This serves as the foundation upon which local models will be built and refined.
2. **Local Model Training**:
Each device engages in training its own local model using locally available information, encompassing states, actions, and rewards.
- **States**:
To enable optimal decision-making, the state representation includes pertinent information tailored to the context. The initial value set incorporates cluster information, providing crucial insights into the network topology.
- **Actions**:
Devices are endowed with the ability to execute movements in four directions on the traffic map, namely left, right, down, and up. These actions facilitate dynamic navigation and adaptation to changing traffic conditions.
- **Rewards**:
Proper incentivization is essential for effective learning. Rewards are bestowed based on fluctuations in traffic volume: a positive reward is conferred upon traffic reduction, while an increase in traffic warrants a negative reward. Additionally, rewards are allocated for the efficient utilization of network service capability, promoting optimal resource management.
3. **Transmission of Local Model Parameters**:
Following local model training, devices transmit their respective local model parameters (W1, ..., Wn) back to the gateway. This exchange facilitates the aggregation of individual insights into a unified global model.
4. **Aggregation of Model Parameters**:
The gateway aggregates received local model parameters into the global model, consolidating diverse insights and refinements contributed by individual devices.
5. **Global Model Distribution**:
Subsequently, the parameters of the aggregated global model (WG) are disseminated to all devices once again. This iterative process continues until the global model reaches a satisfactory level of training, ensuring continual refinement and optimization of performance.
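The steps above can be sketched as a single training round, assuming tabular Q-learning for the local models and simple parameter averaging at the gateway. Both choices, along with the learning-rate and discount values, are illustrative assumptions rather than specifics given in this section.

```python
import numpy as np

def frl_round(global_Q, local_experience, alpha=0.1, gamma=0.9):
    """One FRL round: each device copies the global Q-table (steps 1/5),
    applies its own (s, a, r, s') transitions locally (step 2), and the
    gateway averages the resulting tables into the next global model
    (steps 3/4).  Q-tables and averaging are illustrative assumptions."""
    local_models = []
    for transitions in local_experience:        # one entry per device
        Q = global_Q.copy()                     # start from the global model
        for s, a, r, s_next in transitions:     # local Q-learning updates
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        local_models.append(Q)                  # W1, ..., Wn
    return np.mean(local_models, axis=0)        # gateway aggregation -> WG
```

Repeating `frl_round` until the global table stabilizes corresponds to the iterative refinement described in step 5.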
Through these coordinated steps, the FRL system harnesses the collective intelligence of distributed devices, enabling adaptive decision-making and optimization in dynamic environments. By iteratively refining the global model based on local insights, the system achieves enhanced performance and adaptability, ultimately driving efficiency and effectiveness in managing traffic congestion.