Optimizing Task Offloading and Resource Allocation in Edge-Cloud Networks: A DRL Approach

doi:10.21203/rs.3.rs-2522525/v1

Download PDF

Research Article

Optimizing Task Offloading and Resource Allocation in Edge-Cloud Networks: A DRL Approach

https://doi.org/10.21203/rs.3.rs-2522525/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 26 Jul, 2023

Read the published version in Journal of Cloud Computing →

You are reading this latest preprint version

Edge-cloud computing is an emerging approach in which tasks are offloaded from mobile devices to edge or cloud servers. However, Task offloading may result in increased energy consumption and delays, and the decision to offload the task is dependent on various factors such as time-varying radio channels, available computation resources, and the location of devices. As edge-cloud computing is a dynamic and resource-constrained environment, making optimal offloading decisions is a challenging task. This paper aims to optimize offloading and resource allocation to minimize delay and meet computation and communication needs in edge-cloud computing. Theproblem of optimizing task offloading in the edge-cloud computing environment is a multi-objective problem, for which we employ deep reinforcement learning to find the optimal solution. To accomplish this, we formulate the problem as a Markov decision process and use a Double Deep Q-Network (DDQN) algorithm. Our DDQN-edge-cloud (DDQNEC) scheme dynamically makes offloading decisions by analyzing resource utilization, task constraints, and curret status of the edge-cloud network.. Simulation results demonstrate that DDQNEC outperforms heuristic approaches in terms of resource utilization, task offloading, and task rejection.

Edge-cloud computing

Task offloading

Resource allocation

Deep Reinforcement learning

Markov decision process (MDP)

The advent of IoT and 5G has enabled the development of new applications in areas such as surveillance, augmented/virtual reality, and facial recognition, which heavily depend on both computational resources and data storage for optimal performance. IoT and mobile devices have limited resources so it is difficult for them to support such intelligent, delay-sensitive applications [1]. Hence, edge computing can help these devices by offloading computation-intensive tasks to the cloud. The cloud can cause delays in communication and data transfer as a result of network congestion and high usage, this can limit their use in real-time applications such as autonomous driving, advanced navigation, augmented reality (AR) and virtual reality (VR). Overall, edge-cloud computing and 5G can work together to enable new types of applications and services that require low-latency, real-time processing. The use of edge computing can reduce latency, enhance performance, and reduce costs associated with cloud computing. The edge servers can process computation-intensive tasks instead of sending them to the cloud [2].

Edge-cloud computing is a distributed computing model that utilizes both cloud computing resources and edge devices, such as servers and IoT devices, to process and transmit data. This model aims to improve processing speed, decision-making and reduce latency and bandwidth usage by bringing the cloud's processing power closer to the edge where data is generated and collected. Edge and cloud computing can coexist and work together to provide the necessary resources for task execution. The integration of AI with edge-cloud computing allows for the deployment of machine learning algorithms on edge devices, providing a high level of intelligence to the edge-cloud network and enabling it to understand its environment [3]–[5]. Before offloading tasks to the cloud or edge, it is essential to carefully consider the requirements and available resources. However, edge-cloud computing is a dynamic and resource-constrained environment. Hence making an optimal decision for task offloading based on the available resource is a critical issue.

Task offloading is the process of transferring a task or workload from a local device to a remote device, such as a server or cloud resource, in order to improve the local device's performance and efficiency. Offloading tasks can result in increased delay and energy consumption while edge servers may have limited computational capacity, which can lead to increased computing latency. It is important to consider these trade-offs before making a decision. 5G networks with high densities may also experience higher transmission delays. Cooperative task offloading is a technique used in edge-cloud networks to improve the performance of distributed systems. In distributed approach, tasks are splited among devices in the network, such as edge devices and cloud servers, to optimize resource usage and reduce workload on individual devices. In an edge-cloud computing environment, it can be challenging to determine the optimal location for task offloading, as there are many factors to consider, including the computational capacity of edge servers, the transmission delays of networks, and the diverse requirements of end devices. Numerous research has been conducted on the topic of computation offloading in edge-cloud networks [6]–[10]. However, due to the diverse requirements of end devices and the limited information available about wireless channels, bandwidth, and computing resources in edge-cloud networks, it is challenging to design an optimal offloading strategy.

Deep Reinfocement Learning is a subset of machine learning that combines reinforcement learning and deep learning techniques to handle high-dimensional state-action spaces and accelerate training for complex decision-making tasks. Many successful applications of DRL have been demonstrated in a wide range of fields, including gaming [7], robotics [8], and networking. The application of DRL to edge-cloud-assisted networks can optimize system performance by perceiving users' mobility [15], [16]. Task offloading problems have been addressed with DRL in [9], [10], [11], and [12]. It can be able to find the best solutions to the optimization problems of time-varying and dynamic network environments. In the context of task offloading at the edge, Dueling-DQN could potentially be used to learn the optimal decision-making policy for offloading tasks to different locations in a dynamic and resource-constrained edge-cloud environment. The DDQN architecture separates the estimator into two streams, value and advantage, and then aggregates them to make the final estimation of the Q-value. This allows for better generalization of the Q-value estimation by decoupling the estimation of the value of a state from the estimation of the advantage of taking a specific action in that state. This could involve training the DDQN to learn the optimal trade-offs between different factors, such as the task requirements, available resources, latencies, and costs, to make informed decisions about the best location for task offloading.

In this research, we propose an advanced edge-cloud computing scheme that leverages the power of Double Deep Q-Network (DDQN) to optimize the offloading of computations and the allocation of resources for the offloaded tasks. Our proposed scheme is an extension of our previous work that employed DQN [10], but it goes one step further by utilizing the DDQN algorithm to improve the decision-making process and achieve a more efficient and optimized solution for the edge-cloud computing task offloading and resource allocation problem. The goal of the proposed scheme is to minimize energy consumption while satisfying the computation and communication requirements of the offloading tasks at edge-cloud computing. DDQN enables efficient and effective offloading decisions in dynamic and resource-constrained environments, such as edge-cloud, by adapting to changes and learning from past experiences. In terms of training, DDQN separates the Q-value estimator into two streams, value and advantage, and then aggregates which improve the generalization. We present the task offloading and resource allocation problem as an MDP and use the DDQN algorithm to identify the optimal policy. Our simulation results show that the proposed method outperforms a heuristic schemes and DQNEC in terms of resource utilization, maximum task offloading, and task rejection. The main contributions of our work are:

To optimize resource allocation for compute-intensive and delay-sensitive tasks in the edge cloud computing environment, we developed a scheme based on the double DQN. Our scheme determine the best actions to take in the current system state, ultimately improving the overall performance of the edge cloud
Considering the dynamics and unpredictability of edge device environments, we modeled the task offloading problem as a Markov Decision Process (MDP) and applied Double-DQN to maximize the long-term cumulative discounted reward. Our optimization objective was to maximize resource utilization, minimize task rejection, and minimize the round-trip time to complete the task within the deadline
By using the proposed Double-DQNEC scheme, we can reduce idle time and balance the workload of the edge-cloud computing system. The simulation results show that the DDQN-based algorithm significantly outperforms the heuristic algorithm.

The rest of the paper is organized as follows: In Section 2 the work-related are reviewed, especially relative to task offloading and resources allocation in the edge-cloud computing environment; Section 3 provides the system model and formulation of the proposed DDQNEC scheme in detail; in Section 4 the simulation result and comparison are discussed that validate our approach; finally, the conclusion is made in Section 5.

DRL has been widely studied for its use in resource allocation and solving complex optimization problems in edge-cloud computing environments [11]–[13].

Liu et al [14] proposed a DRL algorithm for allocating resources using vehicles as edge devices, where the vehicles act as mobile edge servers and offer computational services to nearby users. The authors in [15] proposed an intelligent DRL-based resource allocation scheme for wireless networks to minimize service time and balance resources. Dinh et al. [16] proposed a computational offloading framework to minimize the joint cost of energy and delay. Yan et al. [17] investigated the optimization of task offloading and resource allocation to minimize the joint cost of energy consumption and execution time.

In [18], the authors presented a DQN-based algorithm to address the complex problem of jointly optimizing task offloading and bandwidth allocation in mobile edge computing (MEC) networks. They proposed a solution that effectively balances the trade-offs between the quality of service, energy efficiency, and network congestion. [19] provided a comprehensive review of the state-of-the-art techniques and computational resources used for partitioning and offloading in MEC networks. They discussed the challenges and opportunities of various approaches and highlighted their pros and cons. Huang et al. [20] introduced a novel deep learning-based online offloading algorithm named DROO for wireless-powered MEC networks. The algorithm separates the optimization problem into two sub-problems: allocating resources and making offloading decisions.. The algorithm splits the optimization problem into two sub-problems: resource allocation and offloading decision. The authors conducted extensive evaluations and simulations to demonstrate the performance of DROO, which showed that it outperforms other existing methods in terms of energy efficiency, task completion time, and network congestion.In [21], a DQN-based task offloading scheme was proposed to select the optimal edge server and the optimal transmission mode for maximizing task offloading utility. The authors showed that the proposed scheme can achieve a high level of performance in terms of task completion time, energy consumption, and network congestion.

In [22], an intelligent task scheduling framework was proposed which utilizes a reinforcement algorithm to address the scheduling problem in allocating heterogeneous VM resources. In [23], the authers presented a migration algorithm based on multi-agent reinforcement learning to optimize task migration. In [24], a DQN-based computation offloading scheme was proposed for minimizing the long-term cost of MEC networks. The authors showed that the proposed scheme can effectively reduce the cost of computation and communication while maintaining the quality of service. [25]proposed a Double Deep Q-Network (DDQN) based backscatter-aided hybrid data offloading scheme to reduce power consumption in data transmission. The authors showed that the proposed scheme can significantly improve the energy efficiency of MEC networks while maintaining the transmission rate and reliability. In [26], an approach was proposesd to make offloading decisions by addressing the optimization problem of both CPU frequencies and transmit powers in a MEC environment. A RL-SARSA algorithm is proposed in [27] to resolve resource management issues and make optimal offloading decisions in Mobile Edge Computing Network (MECN) environments to minimize costs such as energy consumption and computation delay.

In [28], a DRL-based is algorithm is proposed for collaborative computation offloading in heterogeneous edge computing. In [29], an improved DQN-based resource allocation policy is proposed for IoT edge computing to improve resource utilization and minimize task completion delay. [30] proposed a distributed data offloading and sharing scheme using federated learning for secure networks. Chen et al. [31] study the problem of minimizing latency and energy consumption in a multi-eNB environment through task offloading. Li et al. investigate the issue of multi-user offloading and present a Lyapunov-based optimization algorithm as a solution for attaining optimal bandwidth allocation.

Lu et al. [32] proposed a DRL-based task offloading strategy for large-scale heterogeneous MEC that uses an LSTM network in conjunction with a deep Q-network to improve learning performance, while also being lightweight. In [33], Li et al. investigated the task offloading service provision problem in a UAV-based MEC environment and suggest a DRL solution to enhance system throughput. Li et al. [34], presented the OPPOCO method to optimize the balance between using a cloudlet-assisted computation service (CCS) and a remote cloud (RC) model for computation offloading. Although these studies aim to minimize energy consumption and delay, they do not consider data security as a crucial aspect.

In this section, we present the system model and the problem formulation for task offloading and resource allocation. The network model with the edge-cloud system of the DDQNEC scheme is shown in Fig. 1. Our scheme involves connecting end devices such as sensors, mobile devices, and IoT devices to base stations through wireless links. The edge computing system is connected to the core cloud via the backbone network, allowing for the offloading of tasks and the utilization of available resources in the public cloud. This batch processing approach waits for a predefined number $\left(N\right)$ of task requests before determining the optimal location for each task, whether it be the edge or the cloud, taking into consideration the availability of resources and the deadline. By evaluating a batch of task requests at a time, this approach allows for better resource utilization and decision-making. Both bandwidth and computing resources are considered when making offloading decisions, with the goal of optimizing resource usage, minimizing delay, and reducing energy consumption. In the following section, we provide a detailed description of the system model, including the task, communication, and computation offloading models. Table 1 provides a list of the notations used in our models.

A. Task Model

A task ${t}_{n}$ is represented as a tuple of four variables, $\left({\mathcal{z}}_{n},{\mathcal{y}}_{n}, {c}_{n}, {\tau }_{n}\right), (1\le n\le N)$ where ${\mathcal{z}}_{n}$ is the input data size in bytes, ${\mathcal{y}}_{n}$ is the resultant data size, ${c}_{n}$ is the required computational resource in CPU units, and ${\tau }_{n}$is the task latency requirement. The variable ${x}_{n}$ is a binary value (0 or 1) indicating the decision of whether a task should be assigned to the edge or the cloud.

$${x}_{n}=\left\{\begin{array}{c}0 task {t}_{n} is executed at edge,\\ 1 task {t}_{n} is executed at cloud,\end{array}\right.$$

Typically, multiple resources are required for offloading tasks; however, our scheme considers only CPU resources required for the task [35]–[37].

$${c}_{n}={\mathcal{z}}_{n}\times \varsigma$$

where ${c}_{n}$ represent the total CPU units required to process the task ${t}_{n}$, ${\mathcal{z}}_{n}$ indicate the total size of input data, and $\varsigma$ represents the amount of computational resources required to rocess a single unit of data, typically represented in bytes.

Table 1

Notation and description
Symbol	Definitions
$\mathcal{T}$	A set of tasks generated by edge devices
${t}_{n}$	Each task generated by the end device$n$
${\mathcal{z}}_{n}$	The input data size of the task${t}_{n}$
${c}_{n}$	Computation resource size required for the task${t}_{n}$
${\tau }_{n}$	Maximum tolerable latency of the task${t}_{n}$
${\mathcal{y}}_{n}$	The resultant data of the task${t}_{n}$
${x}_{n}$	A binary variable to indicate whether the task ${t}_{n}$ is assigned to edge or cloud, (0 indicates edge and 1 indicates cloud).
$\varsigma$	CPU unit to process the one byte of data
$\mathcal{B}$	Set of base stations (BS),$\mathcal{B}=\{{b}_{1}, {b}_{2},\dots ,{b}_{W}\}$
${\mathcal{U}}_{B}\left(t\right)$	The bandwidth utilization of all BSs at time step$t$
${H}_{w}$	Set of wireless channels related to the BS${b}_{w}$
${\beta }_{h}^{w}$	The bandwidth of each channel $h$ ${b}_{w}$
${\sigma }_{h}^{w}$	Remaining bandwidth of the BS ${b}_{w}$
$\mathcal{P}$	The set of computing servers at the edge
$p$	Computing server at the edge$(p\in \mathcal{P})$
${c}_{p}$	The available computing capacity of server$p$
${T}_{n}^{{proc}_{e}}$	The processing time for the task ${t}_{n}$ at the edge
$\mathcal{M}$	The set of computing servers in the cloud
$m$	Computing server at the edge$(m\in \mathcal{M})$
${T}_{n}^{{proc}_{c}}$	The processing time for the task ${t}_{n}$ at the cloud server
${c}_{m}$	$\text{T}$he computing capacity of a cloud server $m$
${C}_{c}$	$\text{T}$he total computing capacity of cloud servers
${T}_{n}^{{proc}_{c}}$	The total computation time for the task ${t}_{n}$ at cloud
${\mathcal{U}}_{M}\left(t\right)$	The computational resources utilization of cloud servers at time step$t$
${T}_{n}^{{trans}_{e}}$	Transmission time for the task ${t}_{n}$ data sent to the edge server
${T}_{n}^{{trans}_{c}}$	Transmission time for the task ${t}_{n}$ data to the cloud server
${T}_{n}^{{prop}_{e}}$	The propagation time of the link between the nodes and edge servers
${T}_{n}^{{prop}_{c}}$	The propagation delay of the link between the edge and cloud
${rtt}_{n}^{e}$	The total round-trip time to the edge for a task ${t}_{n}$
${rtt}_{n}^{c}$	The total round-trip time to the cloud for a task ${t}_{n}$

B. Wireless Bandwidth Model

To offload the task from the end device to the edge or cloud, the device must be connected to the nearest base station by a wireless channel. Let's $\mathcal{B}$ the set of all base stations $\mathcal{B}=\{{b}_{1}, {b}_{2},\dots ,{b}_{W}\}$, and each base station ${b}_{w}$ has a set of wireless channels that provides different data rates as ${\beta }_{h}^{w}\in \left\{{\beta }_{1}^{w},{\beta }_{2}^{w},{\beta }_{3}^{w},\dots ,{\beta }_{{H}_{w}}^{w}\right\}.$ Each channel serves different tasks and ${{\sigma }}_{h}^{\text{w}}$ represents the remaining bandwidth of each channel as ${\{\sigma }_{1}^{w},{\sigma }_{2}^{w},{\sigma }_{3}^{w},\dots ,{\sigma }_{{H}_{w}}^{w}\}$. Then at time step $t$, the bandwidth utilization ${\mathcal{U}}_{W}\left(t\right)$ of all the base stations can be formulated as

$${\mathcal{U}}_{W}\left(t\right)=\frac{\sum _{w=1}^{W}\left(\sum _{h=1}^{H}{\beta }_{h}^{w}\right)}{B}$$

where $B$ represents the bandwidth of all base stations

C. Computational Model

i. Edge computing:

In our scheme, the set of edge servers is denoted as $\mathcal{P}=\{\text{1,2},3\dots P\}$, and ${c}_{p}$ denote the available computational capacity of edge server $p, (p\in \mathcal{P})$. The computation time ${T}_{n}^{{Proc}_{e}}$ for task ${t}_{n}$ to compute at edge server $p$ is given by

$${T}_{n}^{{proc}_{e}}= \frac{{c}_{n}}{{c}_{p}}$$

The utilization of the computational resources of the edge server at time $t$ is represented as

$${\mathcal{U}}_{P}\left(t\right)=\frac{\sum _{p=1}^{P}\left({c}_{p}\right(t\left)\right)}{{C}_{e}}$$

where ${C}_{e}$ denotes the total available computing capacity of all servers at the edge.

ii. Cloud computing:

The set of cloud servers is denoted as $\mathcal{M}=\{\text{1,2},3\dots M\}$, and ${c}_{m}$ denote the available computational capacity of edge server $m,(m\in \mathcal{M})$. The processing time ${T}_{n}^{{proc}_{c}}$ for task ${t}_{n}$ to compute it at the cloud server $m$ is given by

$${T}_{n}^{{proc}_{c}}= \frac{{c}_{n}}{{c}_{m}}$$

The utilization of the computational resources of the cloud server at time $t$ is represented as

$${\mathcal{U}}_{M}\left(t\right)=\frac{\sum _{m=1}^{M}\left({c}_{m}\right(t\left)\right)}{{C}_{c}}$$

where ${C}_{c}$ denotes the total available computing capacity of all servers at the cloud.

D. Delay model

In computation offloading, tasks are sent to an edge or cloud server for processing. The process involves three types of delays: transmission delay, propagation delay, and processing delay.

i. Transmission Time

For task ${t}_{n}$, data transmission is required in both directions: from the end device to the edge/cloud server with a data size of ${\mathcal{z}}_{n}$, and from the edge/cloud server back to the end device with a resultant data size of ${\mathcal{y}}_{n}$.

Hence, a specific amount of bandwidth ${\beta }_{h}^{w}\left(edge\right)$ or $\beta \left(cloud\right)$ is required to fulfill the minimum latency ${\tau }_{n}$ of task ${t}_{n}$. Transmission time which needs to send data of task ${t}_{n}$ to the edge ${T}_{n}^{{trans}_{e}}$ and cloud ${T}_{n}^{{trans}_{c}}$ can be formulated as

$${T}_{n}^{{trans}_{e}}= \frac{{\mathcal{z}}_{n}}{{\beta }_{h}^{w}}+\frac{{\mathcal{y}}_{n}}{{\beta }_{h}^{w}}$$

$${T}_{n}^{{trans}_{c}}={T}_{n}^{{trans}_{e}}+ \frac{{\mathcal{z}}_{n}}{\beta }+\frac{{\mathcal{y}}_{n}}{\beta }$$

ii. Propagation Time

In the given model, the propagation delay is assumed to be constant, with a value of ${T}_{n}^{{prop}_{e}}= 5ms$ for edge server and ${T}_{n}^{{prop}_{c}}= 50ms$ for cloud server. This simplifying assumption is made for the ease of calculation and analysis. The actual propagation delay may vary depending on the location of the resource.

iii. Processing delay:

Processing delay for the task ${t}_{n}$ edge server ${T}_{n}^{{proc}_{c}}$ and cloud server ${T}_{n}^{{proc}_{c}}$ can be obtained from Eqs. (4) and (6).

Therefore, the overall time for a task to be completed by edge ${rtt}_{n}^{e}$ or cloud ${rtt}_{n}^{c}$ is the sum of the delay caused by data transmission, propagation and processing which is represented as

$${rtt}_{n}^{e}={T}_{n}^{{trans}_{e}}+{T}_{n}^{{prop}_{e}}+{T}_{n}^{{proc}_{e}}$$

$${rtt}_{n}^{c}={T}_{n}^{{trans}_{c}}+{T}_{n}^{{prop}_{c}}+{T}_{n}^{{proc}_{c}}$$

The total resources cost ${CO}_{total}$ can be obtained by adding the total utilization of bandwidth ${CO}_{W}$, edge server CPU ${CO}_{P}$, and cloud server CPU ${CO}_{M}$ for total task offloading as follows:

$${CO}_{W}= {W}_{W}\bullet \mathcal{U}{}_{W}$$

$${CO}_{P}= {W}_{P}\bullet \mathcal{U}{}_{P}$$

$${CO}_{M}= {W}_{M}\bullet \mathcal{U}{}_{M}$$

$${CO}_{total}={CO}_{W}+ {CO}_{P}+ {CO}_{M}$$

Where each resource (bandwidth, edge and cloud computational resources) has been assigned a cost weight, with ${W}_{w}=1$ being assigned to bandwidth, ${W}_{p}=5$ for edge resources and ${W}_{M}=10$ for cloud computational resources. These weight values are used in determining the cost of utilizing each resource.

E. Formal Problem Formulation

The multi-objective problem solved in this paper is described formally as follows:

Optimization:

$$maximize \left({\mathcal{U}}_{W}+{\mathcal{U}}_{P}+{\mathcal{U}}_{M}\right)$$

$$minimize \left({CO}_{W}+ {CO}_{P}+ {CO}_{M}\right)$$

Subject to the constraints:

$\sum _{w=1}^{W}\sum _{h=1}^{H}{ch}_{h}^{w}\bullet {\mu }_{h}^{w}\le B$	(15)
$\sum _{n=1}^{N}{c}_{n}\bullet \left(1-{x}_{n}\right)\le {C}_{e}$	(16)
$\sum _{n=1}^{N}{c}_{n}{\bullet x}_{n}\le {C}_{c}$	(17)
${rtt}_{n}^{e}\bullet \left(1-{x}_{n}\right)+{rtt}_{n}^{c}{\bullet x}_{n}\le {\tau }_{n}$	(18)

In this section, we introduce DQNEC, a proposed scheme that utilizes the DDQN algorithm to make optimal decisions and select the best location for task execution by analyzing the current state of the edge-cloud environment. It aims improve resource utilization and balance the trade-off between delay and resources cost, in order to maximize the performance of edge-cloud computing systems. This is achieved by maximizing task offloading while minimizing delay and cost as defined in Eq. (12), (13), and (14). We formulate this multi-objective problem using a Markov Decision Process (MDP).

A. Markov Decision Process

A Markov decision process (MDP) models sequential decision-making problems where an agent makes decisions to maximize reward. It includes elements such as agent, state, action, policy, and reward. We formulate task offloading and resource optimization problems as a MDP to find the optimal policy ${\pi }^{\text{*}}$. Policy is a mapping of states to action probabilities, represented by $\pi \left(a\right|s)$ for all possible actions $a$ for each state $s$. RL algorithms are often used to solve MDPs, as they allow an agent to learn the optimal behavior for a given MDP through trial and error. In the DDQN-based framework, the agent observes the state ${s}_{t}$ by attracting to the edge-cloud environment and taking an action ${a}_{t}$ as computing server selection via a deterministic policy and receives an immediate reward ${r}_{t}$. The agent uses the action-value function $Q({s}_{t},{a}_{t})$ to update the agent policy. The goal of the agent is to maximize the long-term reward by finding an optimal resource allocation policy. In the following section, the state, action, and reward of the proposed scheme are explained in more detail.

State: The state includes full information on the edge-cloud network. It includes the number of remaining tasks (), is from to (that is ), which specifies the task which should be currently determined by the agent, the total remaining computational capacity of edge servers and cloud servers (), total remaining bandwidth at edge and cloud (), the number of cloud-server (), the remaining CPU of each server. In addition, information on edge such as the number of edge servers (), and the remaining total CPU of the edge server exists. CPU (, ) and bandwidth information allocated to each cloud and edge is added. Finally, each task's information is included. State can be defined as

$${s}_{t}=\{{N}^{t},{I}^{t}, {C}_{c}+{C}_{e}, {B}_{c}+{B}_{e},$$

$${N}^{c},{C}_{c}, {B}_{c}, {\alpha }_{1},{\alpha }_{2},\dots ,{\alpha }_{m},\dots , {\alpha }_{M}, {\mathcal{U}}_{1},{\mathcal{U}}_{2},\dots ,{\mathcal{U}}_{m},\dots {\mathcal{U}}_{M},$$

$${N}^{e},{C}_{e}, {B}_{e}, {\alpha }_{1},{\alpha }_{2},\dots , {\alpha }_{p},\dots ,{\alpha }_{P}, {\mathcal{U}}_{1},{\mathcal{U}}_{2},\dots ,{\mathcal{U}}_{p},\dots {\mathcal{U}}_{P},$$

$${t}_{1}, {t}_{2},\dots ,{t}_{N}\}$$

Action : In our model, the agent takes action by observing the current state of the environment. The goal of the agent is to make the optimal decision to maximize resource utilization and minimize the overall average service delay with the minimum rejection of tasks. Action at each time step can be defined as the action to offload the -th task () and allocate the resources (Bandwidth and CPU) to the task for execution within the task deadline. Action can be defined as

$${a}_{t}=\{\eta , {x}_{n}\}$$

where $\eta$ represents the computation server, and ${x}_{n}$ selects the edge or cloud location for task ${s}_{n}$, with $\eta$ belonging to {1,2,…,P} (edge server) when ${x}_{n}=0,$ and $\eta \in \left\{\text{1,2},\dots ,M\right\}$ (cloud server) ${x}_{n}=1$. The agent will take actions based on the task offloading strategy in each time step and receive rewards from the environment in the following time step.

Reward : In RL, the agent's objective is to maximize the sum of rewards from good actions. Our reward function is designed to optimize resource utilization, minimize cost, and satisfy delay constraints. The reward can be calculated by the total resource utilization at time step in Eq. (20), the total cost at time step in Eq. (21) and delay constraint satisfaction for the task at time step in Eq. (22).

$\rho \left(t\right)={\mathcal{U}}_{W}\left(t\right)+{\mathcal{U}}_{P}\left(t\right)+{\mathcal{U}}_{M}\left(t\right)$	(20)
$\sigma \left(t\right)={CO}_{W}\left(t\right)+ {CO}_{P}\left(t\right)+ {CO}_{M}\left(t\right)$	(21)
${r}_{t}\left({s}_{t},{a}_{t}\right)=\frac{\rho \left(t\right)}{\sigma \left(t\right)}\left[{\tau }_{t}-(rt{t}_{t}^{e}\bullet \left(1-{x}_{t}\right)+rt{t}_{t}^{c}\bullet {x}_{t})\right]$	(22)

In our model, we used the DDQN algorithm for the learning process. The DDQN algorithm is an off-policy algorithm and is applied to environments with discrete action spaces. The learning process for DDQN is described in Algorithm 1 and also depicted in Fig. 2.

As shown in Fig. 2, the proposed learning process based on DDQN applies replay memory $R$, which can store a set of recent experience $\left({s}_{i},{a}_{i}, {r}_{i}, {s}_{i+1}\right)$ which an agent gathers by interacting with the environment, and then uses for DDQN learning. In particular, the system records the experience for every step. During the network training, a mini-batch (size: $b$) is extracted from the replay memory $R$, and the Q network can learn from the previous experience. DDQN uses two neural networks, i) the prediction network ${Q}_{\pi }(s,a|\theta )$ as a function approximator to estimate the action-value function Eq. (15), where $\theta$ is the weight of the neural network, ii) the target network ${\stackrel{-}{Q}}_{\pi }(s,a;\stackrel{-}{\theta })$ to estimate the target value ${y}_{i}$. The target network has the same structure as the prediction network. However, its weights $\stackrel{-}{\theta }$ are copied from $\theta$ every fixed number of iterations (K) instead of every training epoch. The following Eq. (23), (24), and (25) are the main equations for calculating the loss value.

${q}_{i}\approx {Q}_{\pi }({s}_{i},{a}_{i}\|\theta )$	(23)
${y}_{t}=r+\gamma \times \underset{\text{a}}{\text{max}}{\stackrel{-}{Q}}_{\pi }({s}_{t+1},a\|\stackrel{-}{\theta })$	(24)
$loss={\left({y}_{t}-{Q}_{\pi }(s,a)\right)}^{2}$	(25)

DDQN updates the Q-function network's parameters, $\theta$, using the loss value and stochastic gradient descent (SGD) with each mini-batch.

$$\theta = \theta -{\alpha }{\varDelta }_{\theta }\sum _{i=1}^{b}\frac{{\left({q}_{i}-{y}_{i} \right)}^{2}}{b}$$

where ${\alpha }$ is the learning rate.

Algorithm 1 Training Stage of the DDQNEC algorithm
1:	Input: the full information of the edge-cloud network the tasks to be offloaded.
2:	Output: Selecting a server at the edge or cloud for offloading tasks, taking into consideration the constraints on resources and the requirements of the tasks.
3:	Initialize two neural networks as Q-networks ${Q}_{\pi }$ and ${\stackrel{-}{Q}}_{\pi }$, with random weight (and bias) or parameters $\theta$ and $\stackrel{-}{\theta }$
4:	Initialize replay memory$R$
5:
6:	for episode = 1 to $e$ do
7:	Initialize the first state${s}_{0}$
	for $t=1 to T$ do
8:	Gather the state ${s}_{t}$ from the edge-cloud environment
9:	if random value $<ϵ$ than
10:	select a random action${a}_{t}$
11:	else
12:	select${a}_{t}=\underset{a}{\text{argmax}}Q({s}_{t},a\|\theta )$
13:	Execute action ${a}_{t}$, received reward ${r}_{t}$ and next state ${s}_{t+1}$
14:	Store $\left({s}_{t},{a}_{t}, {r}_{t}, {s}_{t+1}\right)$ into$M$
15:	Get the mini-batch of (size: $b)$ $\left({s}_{i},{a}_{i}, {r}_{i}, {s}_{\text{i}+1}\right)$ from R
16:	for $i=1$to $b$do
17:	if${s}_{i+1}=ternimal$
18:	${y}_{i}= {r}_{i}$
19:	else
20:	${y}_{i}={r}_{i}+\gamma \times \underset{a}{\text{max}}{\stackrel{-}{Q}}_{\pi }({s}_{i+1},a\|\theta )$
21:	${q}_{i}{=Q}_{\pi }({s}_{i},{a}_{i}\|\theta )$
22:	end for
23:	$\theta = \theta -{\alpha }{\varDelta }_{\theta }\sum _{i=1}^{b}\frac{{\left({q}_{i}-{y}_{i} \right)}^{2}}{b}$ //Gradient descent
24:	Every $K$episode, copy $\theta$ to$\stackrel{-}{\theta }$
25:	end for
26:	end for

This section evaluates the DDQNEC scheme's performance through computer simulation. The focus is on resource utilization, task acceptance ratio, task rejection ratio, and cost ratio using a simulation environment based on i9-10900K CPU, 64GB RAM, RTX 3090 GPU, Linux Ubuntu 20.04.02 LTS, Python 3.8, and PyTorch 1.9.0 to reflect real-world edge-cloud computing environments and analyze and compare the results to existing methods. The DDQNEC is evaluated and compared to three heuristic methods (heuristic1, heuristic2, heuristic3) and DQNEC [10] using a simulated edge-cloud environment (as shown in Fig. 2) to measure its efficiency. Heuristic1 uses FIFO for tasks and prefers edge resources if available, Heuristic2 prioritizes tasks with high resource demands, Heuristic3 uses the 0/1 knapsack algorithm to maximize utilization as profit. We conducted tests in two different types of environments, (small and large). In both the small and large environments, the number of tasks are distributed as: 50, 100, 150, 200, 250, and 300. However, the available resources at the edge and cloud, and task requirements are different in both environments. The small environment has fewer resources, whereas the large environment has more resources.

In the small environments, the task parameters are as CPU requirement:10 ~ 20, data size:10 ~ 20, bandwidth:100*15, deadline:5 ~ 10 ms, and the available resources in the small environments are as the number of edge servers:30 with CPU capacity of 40 ~ 60, cloud servers:20 with CPU capacity of 60 ~ 80, and bandwidth: 100 Mbps.
In large environments, the task requirements are higher than the small environments as CPU:20 ~ 30, data size:20 ~ 30, bandwidth:100*30, and deadline:10 ~ 15 ms, and the edge-cloud resources are, edge-servers:50 with CPU capacity of 50 ~ 80, cloud servers:30 with CPU capacity 60 ~ 100 and available bandwidth: 100 Mbps.

Tables II and III show the simulation parameters and environment configuration, respectively. DDQN is used to determine whether to offload or reject the current task based on the edge-cloud available resource and all waiting tasks for resources. To maximize current rewards and expected rewards, the proposed scheme considers the current overall network information when selecting the computing server for task offloading. We provide detailed performance comparisons of the proposed scheme with heuristic algorithms for both environments in the following section.

Table II. Simulation Enviroenemt Configuration

Small environments
Number of cloud server	20
Cloud server CPU capacity	60 ~ 80 (uniform distribution)
Edge bandwidth	100 (Mbps)
Cloud bandwidth	200 (Mbps)
Number of the edge server	30
Edge server CPU capacity	40 ~ 60 (uniform distribution)
Task CPU request	10 ~ 20
Task data size	10 ~ 20 MB
Task tolerable delay	10 ~ 15 ms
Number of offloading tasks	50, 100, 150, 200, 250, 300
Large environments
Number of cloud server	30
Cloud server CPU capacity	60 ~ 100 (uniform distribution)
Edge bandwidth	200 (Mbps)
Edge bandwidth	300 (Mbps)
Number of the edge server	50
Edge server CPU capacity	50 ~ 80 (uniform distribution)
Task CPU request	20 ~ 30
Task data size	20 ~ 30 MB
Task tolerable delay	5 ~ 10 ms
Number of offloading tasks	50, 100, 150, 200, 250, 300

Table III. Learning parameters of the DDQNEC algorithm

Definitions and description	Values
Dense-layer setup (Hidden)	256
N-step for Q-learning	1
Replay Buffer Capacity ( Size of the replay buffer)	10000
The target network smoothly copy parameter	0.005
Initial epsilon (Exploration)	1.0
Final epsilon (Exploration)	0.1
Target synchronization interval training steps	1000

Figure 3 presents a comprehensive comparison of the task rejection ratios of five different schemes. As the number of tasks increases, it can be observed that the rejection ratio for all five schemes also increases. However, the DDQNEC scheme exhibits a significantly lower increase when compared to the other four schemes, thereby indicating a superior performance in terms of task acceptance ratio and a lower rejection rate. The ability of DDQNEC to accept more tasks is a result of its ability to intelligently assign tasks to servers that are optimally matched in terms of resource requirements and availability. This not only saves and preserves resources for future utilization but also allows for more tasks to be accepted. A high acceptance rate is beneficial as it leads to higher resource utilization and reduces idle time in the system. As a result, DDQNEC outperforms other methods in terms of resource utilization, indicating its effectiveness in improving the proposed edge-cloud system.

Figure 4 shows the comparison of the average utilization of the proposed scheme DDQNEC with four other heuristic schemes: heuristic1, heuristic2, heuristic3, and DQNEC. As the number of tasks increases, the average resource utilization for all five schemes also increases. However, it is observed that DDQNEC consistently demonstrates a higher utilization rate when compared to the other four algorithms in both environments, as depicted in Figures 5(a) and (b). The task rejection ratio is a crucial metric that has a direct impact on resource utilization. A low task rejection ratio implies high resource utilization. DDQNEC employs a robust mechanism for selecting the best servers based on task requirements, thereby improving the efficiency of the edge-cloud system. Additionally, DDQNEC makes use of intelligent resource allocation strategies, resulting in an increased acceptance rate of tasks while maintaining resource utilization. A high acceptance rate generally leads to a higher average utilization compared to cost. The results demonstrate that DDQNEC achieves a higher utilization rate than the other algorithms, thus highlighting the effectiveness of the DDQN approach in enhancing the performance of the edge-cloud system.

Figure 5 presents a comprehensive comparison of the cost ratios of five different schemes as the number of tasks increases. As the number of tasks increases, it can be observed that the cost ratio for all five schemes also increases. However, the proposed scheme DDQNEC exhibits a significantly lower increase in comparison to the other four schemes, both in small and large environments, thereby indicating a superior performance in terms of cost ratio. Additionally, DDQNEC has a significantly lower task rejection rate when compared to the three heuristics and DQNEC, which implies that it accepts more tasks for offloading and increases the utilization of the edge-cloud system. The key factor that enables DDQNEC to achieve this is its ability to intelligently assign tasks to servers that are optimally matched in terms of resource requirements and availability, thus minimizing the overall cost and maximizing the utilization of the edge-cloud system.

The task offloading and resource allocation in edge-cloud dynamic environments is a difficult problem. A solution is proposed by formulating it as an MDP optimization problem and using the DDQN algorithm to find an optimal solution for task offloading. The DDQNEC model uses an agent to make better decisions for end devices and offload their computation-intensive and low-latency task to the edge or cloud server. This improves the performance in terms of average cost, average utilization and task rejection rate, and also improves resource utilization compared to other algorithms.

In the future, We aim to improve the DDQNEC scheme for task offloading and resource allocation by using advanced machine learning and AI algorithms. We will analyze the edge-cloud network considering factors such as the characteristics and capabilities of end devices to optimize task offloading. Also, we will research on using reinforcement learning techniques for managing a large number of IoT devices with diverse task requirements, focusing on techniques suitable for continuous action spaces.

Acknowledgment

This research was supported by Basic Science Research Programs through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-
2020R1I1A3065610 and NRF-2018R1A6A1A03025526).

Authors’ contributions

Methodology: Ihsan Ullah; Resources: Ihsan Ullah and Hyun-Kyo Lim; Software: Hyun-Kyo Lim, Yeong-Jun Seok; Supervision: Youn-Hee Han; Writing original draft: Ihsan Ullah; Writing review editing: Ihsan Ullah, Hyun-Kyo Lim; All authors read and approved the final manuscript.

Funding

This study was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant No. NRF-2020R1I1A3065610 and NRF-2018R1A6A1A03025526.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests

The authors declare that they have no competing interests.

A. Singh, S. C. Satapathy, A. Roy, and A. Gutub, “AI-Based Mobile Edge Computing for IoT: Applications, Challenges, and Future Scope,” Arab. J. Sci. Eng., pp. 1–31, 2022.
B. Dai, J. Niu, T. Ren, and M. Atiquzzaman, “Towards Mobility-Aware Computation Offloading and Resource Allocation in End-Edge-Cloud Orchestrated Computing,” IEEE Internet Things J., 2022.
H. Lu, X. He, M. Du, X. Ruan, Y. Sun, and K. Wang, “Edge QoE: Computation offloading with deep reinforcement learning for Internet of Things,” IEEE Internet Things J., vol. 7, no. 10, pp. 9255–9265, 2020.
Y. Dai, D. Xu, S. Maharjan, G. Qiao, and Y. Zhang, “Artificial intelligence empowered edge computing and caching for internet of vehicles,” IEEE Wirel. Commun., vol. 26, no. 3, pp. 12–18, 2019.
T. K. Rodrigues, K. Suto, H. Nishiyama, J. Liu, and N. Kato, “Machine learning meets computation and communication control in evolving edge and cloud: Challenges and future perspective,” IEEE Commun. Surv. Tutor., vol. 22, no. 1, pp. 38–67, 2019.
T. G. Rodrigues, K. Suto, H. Nishiyama, N. Kato, and K. Temma, “Cloudlets activation scheme for scalable mobile edge computing with transmission power control and virtual machine migration,” IEEE Trans. Comput., vol. 67, no. 9, pp. 1287–1300, 2018.
J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944–7956, 2019.
T. T. Nguyen, L. B. Le, and Q. Le-Trung, “Computation offloading in MIMO based mobile edge computing systems under perfect and imperfect CSI estimation,” IEEE Trans. Serv. Comput., vol. 14, no. 6, pp. 2011–2025, 2019.
Y. Dai, D. Xu, S. Maharjan, and Y. Zhang, “Joint computation offloading and user association in multi-task mobile edge computing,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12313–12325, 2018.
I. Ullah, H.-K. Lim, Y.-J. Seok, and Y.-H. Han, “Optimal Task Offloading with Deep Q-Network for Edge-Cloud Computing Environment,” presented at the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), 2022, pp. 406–411.
K. I. Ahmed, H. Tabassum, and E. Hossain, “Deep learning for radio resource allocation in multi-cell networks,” IEEE Netw., vol. 33, no. 6, pp. 188–195, 2019.
Y. Dai, D. Xu, K. Zhang, S. Maharjan, and Y. Zhang, “Deep reinforcement learning and permissioned blockchain for content caching in vehicular edge computing and networks,” IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4312–4324, 2020.
Y. Lu, X. Huang, K. Zhang, S. Maharjan, and Y. Zhang, “Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles,” IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4298–4311, 2020.
Y. Liu, H. Yu, S. Xie, and Y. Zhang, “Deep reinforcement learning for offloading and resource allocation in vehicle edge computing and networks,” IEEE Trans. Veh. Technol., vol. 68, no. 11, pp. 11158–11168, 2019.
J. Wang, L. Zhao, J. Liu, and N. Kato, “Smart resource allocation for mobile edge computing: A deep reinforcement learning approach,” IEEE Trans. Emerg. Top. Comput., vol. 9, no. 3, pp. 1529–1541, 2019.
T. Q. Dinh, J. Tang, Q. D. La, and T. Q. Quek, “Offloading in mobile edge computing: Task allocation and computational frequency scaling,” IEEE Trans. Commun., vol. 65, no. 8, pp. 3571–3584, 2017.
J. Yan, S. Bi, Y. J. Zhang, and M. Tao, “Optimal task offloading and resource allocation in mobile-edge computing with inter-user task dependency,” IEEE Trans. Wirel. Commun., vol. 19, no. 1, pp. 235–250, 2019.
L. Huang, X. Feng, C. Zhang, L. Qian, and Y. Wu, “Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing,” Digit. Commun. Netw., vol. 5, no. 1, pp. 10–17, 2019.
F. Gu, J. Niu, Z. Qi, and M. Atiquzzaman, “Partitioning and offloading in smart mobile devices for mobile cloud computing: State of the art and future directions,” J. Netw. Comput. Appl., vol. 119, pp. 83–96, 2018.
L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Trans. Mob. Comput., vol. 19, no. 11, pp. 2581–2593, 2019.
K. Zhang, Y. Zhu, S. Leng, Y. He, S. Maharjan, and Y. Zhang, “Deep learning empowered task offloading for mobile edge computing in urban informatics,” IEEE Internet Things J., vol. 6, no. 5, pp. 7635–7647, 2019.
S. Sheng, P. Chen, Z. Chen, L. Wu, and Y. Yao, “Deep reinforcement learning-based task scheduling in iot edge computing,” Sensors, vol. 21, no. 5, p. 1666, 2021.
C. Liu, F. Tang, Y. Hu, K. Li, Z. Tang, and K. Li, “Distributed task migration optimization in MEC by extending multi-agent deep reinforcement learning approach,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 7, pp. 1603–1614, 2020.
X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji, and M. Bennis, “Performance optimization in mobile-edge computing via deep reinforcement learning,” presented at the 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), 2018, pp. 1–6.
Y. Xie, Z. Xu, J. Xu, S. Gong, and Y. Wang, “Backscatter-aided hybrid data offloading for mobile edge computing via deep reinforcement learning,” presented at the International Conference on Machine Learning and Intelligent Communications, 2019, pp. 525–537.
K. Tian, H. Chai, Y. Liu, and B. Liu, “Edge Intelligence Empowered Dynamic Offloading and Resource Management of MEC for Smart City Internet of Things,” Electronics, vol. 11, no. 6, p. 879, 2022.
T. Alfakih, M. M. Hassan, A. Gumaei, C. Savaglio, and G. Fortino, “Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA,” IEEE Access, vol. 8, pp. 54074–54084, 2020.
Y. Li, F. Qi, Z. Wang, X. Yu, and S. Shao, “Distributed edge computing offloading algorithm based on deep reinforcement learning,” IEEE Access, vol. 8, pp. 85204–85215, 2020.
X. Xiong, K. Zheng, L. Lei, and L. Hou, “Resource allocation based on deep reinforcement learning in IoT edge computing,” IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 1133–1146, 2020.
Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, “Blockchain and federated learning for privacy-preserved data sharing in industrial IoT,” IEEE Trans. Ind. Inform., vol. 16, no. 6, pp. 4177–4186, 2019.
M. Chen and Y. Hao, “Task offloading for mobile edge computing in software defined ultra-dense network,” IEEE J. Sel. Areas Commun., vol. 36, no. 3, pp. 587–597, 2018.
H. Lu, C. Gu, F. Luo, W. Ding, and X. Liu, “Optimization of lightweight task offloading strategy for mobile edge computing based on deep reinforcement learning,” Future Gener. Comput. Syst., vol. 102, pp. 847–861, 2020.
J. Li, Q. Liu, P. Wu, F. Shu, and S. Jin, “Task offloading for UAV-based mobile edge computing via deep reinforcement learning,” presented at the 2018 IEEE/CIC International Conference on Communications in China (ICCC), 2018, pp. 798–802.
W. Li, X. You, Y. Jiang, J. Yang, and L. Hu, “Opportunistic computing offloading in edge clouds,” J. Parallel Distrib. Comput., vol. 123, pp. 69–76, 2019.
M. Cheng, J. Li, and S. Nazarian, “DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers,” presented at the 2018 23rd Asia and South pacific design automation conference (ASP-DAC), 2018, pp. 129–134.
S. Nath and J. Wu, “Dynamic Computation Offloading and Resource Allocation for Multi-user Mobile Edge Computing,” presented at the GLOBECOM 2020-2020 IEEE Global Communications Conference, 2020, pp. 1–6.
J. Chen, H. Xing, Z. Xiao, L. Xu, and T. Tao, “A DRL agent for jointly optimizing computation offloading and resource allocation in MEC,” IEEE Internet Things J., vol. 8, no. 24, pp. 17508–17524, 2021.

No competing interests reported.

Download PDF

Journal Publication

published 26 Jul, 2023

Read the published version in Journal of Cloud Computing →

Editorial decision: Major revision
07 Mar, 2023
Reviews received at journal
07 Feb, 2023
Reviewers agreed at journal
05 Feb, 2023
Reviewers invited by journal
05 Feb, 2023
Editor assigned by journal
31 Jan, 2023
Submission checks completed at journal
29 Jan, 2023
First submitted to journal
27 Jan, 2023

You are reading this latest preprint version

Symbol	Definitions
\(\mathcal{T}\)	A set of tasks generated by edge devices
\({t}_{n}\)	Each task generated by the end device\(n\)
\({\mathcal{z}}_{n}\)	The input data size of the task\({t}_{n}\)
\({c}_{n}\)	Computation resource size required for the task\({t}_{n}\)
\({\tau }_{n}\)	Maximum tolerable latency of the task\({t}_{n}\)
\({\mathcal{y}}_{n}\)	The resultant data of the task\({t}_{n}\)
\({x}_{n}\)	A binary variable to indicate whether the task \({t}_{n}\) is assigned to edge or cloud, (0 indicates edge and 1 indicates cloud).
\(\varsigma\)	CPU unit to process the one byte of data
\(\mathcal{B}\)	Set of base stations (BS),\(\mathcal{B}=\{{b}_{1}, {b}_{2},\dots ,{b}_{W}\}\)
\({\mathcal{U}}_{B}\left(t\right)\)	The bandwidth utilization of all BSs at time step\(t\)
\({H}_{w}\)	Set of wireless channels related to the BS\({b}_{w}\)
\({\beta }_{h}^{w}\)	The bandwidth of each channel \(h\) \({b}_{w}\)
\({\sigma }_{h}^{w}\)	Remaining bandwidth of the BS \({b}_{w}\)
\(\mathcal{P}\)	The set of computing servers at the edge
\(p\)	Computing server at the edge\((p\in \mathcal{P})\)
\({c}_{p}\)	The available computing capacity of server\(p\)
\({T}_{n}^{{proc}_{e}}\)	The processing time for the task \({t}_{n}\) at the edge
\(\mathcal{M}\)	The set of computing servers in the cloud
\(m\)	Computing server at the edge\((m\in \mathcal{M})\)
\({T}_{n}^{{proc}_{c}}\)	The processing time for the task \({t}_{n}\) at the cloud server
\({c}_{m}\)	\(\text{T}\)he computing capacity of a cloud server \(m\)
\({C}_{c}\)	\(\text{T}\)he total computing capacity of cloud servers
\({T}_{n}^{{proc}_{c}}\)	The total computation time for the task \({t}_{n}\) at cloud
\({\mathcal{U}}_{M}\left(t\right)\)	The computational resources utilization of cloud servers at time step\(t\)
\({T}_{n}^{{trans}_{e}}\)	Transmission time for the task \({t}_{n}\) data sent to the edge server
\({T}_{n}^{{trans}_{c}}\)	Transmission time for the task \({t}_{n}\) data to the cloud server
\({T}_{n}^{{prop}_{e}}\)	The propagation time of the link between the nodes and edge servers
\({T}_{n}^{{prop}_{c}}\)	The propagation delay of the link between the edge and cloud
\({rtt}_{n}^{e}\)	The total round-trip time to the edge for a task \({t}_{n}\)
\({rtt}_{n}^{c}\)	The total round-trip time to the cloud for a task \({t}_{n}\)

Optimizing Task Offloading and Resource Allocation in Edge-Cloud Networks: A DRL Approach

Status:

Journal Publication

Version 1

Abstract

Figures

Introduction

Related Work

System Model

A. Task Model

B. Wireless Bandwidth Model

C. Computational Model

i. Edge computing:

ii. Cloud computing:

D. Delay model

i. Transmission Time

ii. Propagation Time

iii. Processing delay:

E. Formal Problem Formulation

Ddqn-based Task Offloading And Resource Allocation

A. Markov Decision Process

Ddqn Framework For Task Offloading

Performance Evaluation

Conclusion

Declarations

References

Additional Declarations

Status:

Journal Publication

Version 1

\(\sum _{w=1}^{W}\sum _{h=1}^{H}{ch}_{h}^{w}\bullet {\mu }_{h}^{w}\le B\)	(15)
\(\sum _{n=1}^{N}{c}_{n}\bullet \left(1-{x}_{n}\right)\le {C}_{e}\)	(16)
\(\sum _{n=1}^{N}{c}_{n}{\bullet x}_{n}\le {C}_{c}\)	(17)
\({rtt}_{n}^{e}\bullet \left(1-{x}_{n}\right)+{rtt}_{n}^{c}{\bullet x}_{n}\le {\tau }_{n}\)	(18)

\(\rho \left(t\right)={\mathcal{U}}_{W}\left(t\right)+{\mathcal{U}}_{P}\left(t\right)+{\mathcal{U}}_{M}\left(t\right)\)	(20)
\(\sigma \left(t\right)={CO}_{W}\left(t\right)+ {CO}_{P}\left(t\right)+ {CO}_{M}\left(t\right)\)	(21)
\({r}_{t}\left({s}_{t},{a}_{t}\right)=\frac{\rho \left(t\right)}{\sigma \left(t\right)}\left[{\tau }_{t}-(rt{t}_{t}^{e}\bullet \left(1-{x}_{t}\right)+rt{t}_{t}^{c}\bullet {x}_{t})\right]\)	(22)

\({q}_{i}\approx {Q}_{\pi }({s}_{i},{a}_{i}\|\theta )\)	(23)
\({y}_{t}=r+\gamma \times \underset{\text{a}}{\text{max}}{\stackrel{-}{Q}}_{\pi }({s}_{t+1},a\|\stackrel{-}{\theta })\)	(24)
\(loss={\left({y}_{t}-{Q}_{\pi }(s,a)\right)}^{2}\)	(25)