Unsupervised Deep Learning for Binary Offloading in Mobile Edge Computation Network

Mobile edge computation (MEC) is a promising technology for reducing the energy consumption and task execution delay of computation-intensive tasks on mobile devices (MDs). Resource allocation in MEC is an optimization problem, but the large amount of computation required to solve it may hinder practical application. In this work, we propose a multiuser MEC framework based on unsupervised deep learning to reduce energy consumption and computation by offloading tasks to edge servers. The binary offloading decision and resource allocation are jointly optimized to minimize the energy consumption of MDs under a latency constraint and a transmit power constraint. This joint optimization problem is a mixed integer nonconvex problem, which results in the gradient vanishing problem in backpropagation. To address this, we propose a novel binary computation offloading scheme (BCOS), in which a deep neural network (DNN) with an auxiliary network is designed. By using the auxiliary network as a teacher network, the student network can obtain lossless gradient information in the joint training phase. As a result, a sub-optimal solution of the optimization problem can be acquired by the learning-based BCOS. Simulation results demonstrate that BCOS solves the binary offloading problem effectively, using the trained network with low complexity.


Introduction
With the dramatic growth of Internet of Things (IoT) devices, various computation-intensive mobile applications, such as speech recognition, language processing, online gaming and augmented reality, are emerging. Executing such applications poses great challenges for mobile devices (MDs) because of their limited battery energy and low computing power. A potential solution is mobile cloud computation (MCC) [1], in which the devices offload computation tasks over a wireless link to remote cloud servers for execution. Nevertheless, the execution latency may be very large due to the long distance and the large additional transmission load between MDs and cloud servers [2]. To handle this problem, mobile edge computing (MEC) has been proposed, which reduces network latency by placing small edge servers near end users [3]. MEC servers can therefore execute and deliver computation services rapidly, reducing delay and saving energy, both of which are pivotal challenges for future radio networks.
It is widely recognized in academia and industry that the efficiency of MEC is largely determined by the offloading decision [4]. Reasonable resource allocation is also important for improving the performance of MEC [5,6]. It is therefore necessary to jointly optimize the offloading decision and the resource allocation to obtain the optimal solution for delay-sensitive tasks. Most of these joint optimization problems are NP-hard [7]. Owing to their non-convexity, the conventional methods used in the literature are either exhaustive search or iterative optimization of an approximate problem, and the complexity and convergence issues of these methods may hinder their practical application. Worse, the computational complexity of these problems grows exponentially with the number of MDs, which can make them intractable. In recent years, deep learning (DL) has achieved great success in natural language processing, speech recognition and other fields. Some studies show that it can also be used to handle hard communication problems, for instance channel precoding [8][9][10], power control [11] and channel estimation [12]. We therefore tackle the MEC optimization problem with a deep-learning-based method of lower complexity. Computation offloading problems are generally divided into two categories: partial offloading, which offloads only a subset of the task components to edge servers, and binary offloading, which hands over the whole task to edge servers. Partial offloading requires calculating the computational cost of each task component, which puts additional load on computation resources and energy reserves [13]. Compared with partial offloading, binary offloading is better suited to atomic tasks that cannot be partitioned, and it is easier to implement in practice.
Hence, it makes sense to tackle the binary offloading and resource allocation problem in MEC networks with a low-complexity deep-learning-based approach.
In this work, we consider a multiuser mobile-edge computation offloading (MECO) network based on Time Division Multiple Access (TDMA). To minimize the energy consumption of MDs, we model the MECO issue as a joint optimization problem over the offloading decision and the resource allocation. Then, a novel unsupervised deep-learning-based BCOS is proposed to find a sub-optimal solution. Unlike supervised deep learning, the unsupervised method does not need a training dataset with labels, which is difficult to obtain for this joint optimization problem due to its nonconvex nature. Moreover, the optimization problem is a mixed integer programming problem, which causes the gradient vanishing issue in the DL network. To tackle this issue, we design a DNN with an auxiliary teacher network, so that the student network acquires lossless gradient information in the joint training phase. The main contributions are summarized as follows:
1. By taking the latency constraint, the transmit power constraint and the energy consumption into account, we model the multiuser binary MECO process under a TDMA system as a mixed integer programming (MIP) problem. The optimization policies, including the offloading decision, transmit time slot and transmit power, are jointly optimized to minimize the sum energy consumption of MDs.
2. To tackle the binary offloading issue, we propose the BCOS, in which the original optimization problem is transformed into an unsupervised deep-learning problem to reduce the computational complexity. Compared with supervised DL, unsupervised DL does not require a training dataset with labels, which is generally difficult to obtain by solving the optimization problem with conventional mathematical methods.
3. To address the gradient vanishing problem caused by the binary offloading decision, we design a DNN with an auxiliary network. With the aid of the auxiliary teacher network, the student network can acquire lossless gradient information directly. The binary offloading problem can therefore be solved effectively by the designed DNN, and a sub-optimal solution is obtained from the trained DNN with low complexity.
The remainder of this paper is organized as follows. In Section 2, related work on deep learning and MECO is introduced. The system model and problem formulation are presented in Section 3. In Section 4, the binary computation offloading algorithm is presented, including the deep learning framework, the proposed deep-learning-based problem formulation and the joint training mechanism with the auxiliary network. Section 5 gives the simulation results and discussion. Finally, we conclude the work in the last section.

Related Works
A large body of work aims to improve the performance of MD computation offloading by exploiting the advantages of edge servers [14][15][16]. To support real-time computation offloading, several low-complexity algorithms have been proposed for the binary computation offloading problem. However, most of these works tackle computation offloading with mathematical analysis [2,17,18,19]. Zhao et al. have proposed a minimum incremental task allocation algorithm that minimizes the system cost of industrial vehicles, including energy consumption and execution time, by optimizing the partial offloading decision [6]. Liu et al. have presented a novel one-dimensional search algorithm for the power-constrained delay minimization problem, which obtains the optimal offloading decision according to the queuing state of the application buffer, the available power at the local processing unit and the remote transmission unit, and the channel state information (CSI) between the MDs and the MEC servers [20]. However, to make the offloading decision in this algorithm, the MDs require feedback from the MEC servers, which adds signaling overhead. In [2], a distributed computation offloading algorithm based on game theory is designed to obtain an efficient computation offloading decision, which requires multiple communication iterations between MDs and MEC servers. Similarly, references [21] and [22] update the binary offloading decision iteratively to solve the joint task offloading and resource allocation problem. Reference [23] proposes a constrained stochastic successive convex approximation (C-SSCA) algorithm that minimizes the sum energy consumption of MDs with low complexity by jointly optimizing the transmit power, the offloading decision and the assignment of computation resources.
However, none of those algorithms is well suited to real-time computation offloading in MEC networks, because of the trade-off between computational complexity and optimality. Machine learning has replaced traditional methods in many fields, such as computer vision, natural language processing, and face recognition. The performance of DL has surpassed that of traditional machine learning methods, and DL has been widely exploited in communications with excellent results [24][25][26][27]. The optimization framework proposed in [28] uses deep reinforcement learning to tackle resource allocation in wireless MEC. In [29], a smart energy-efficient partial computation offloading scheme based on DL is proposed, which minimizes the cost function by selecting an offloading set that depends on the mobiles' remaining energy, the energy consumption of application components, channel conditions, data size for transmission, computational load, and communication latency. Furthermore, Gong et al. use a DL method to solve the MEC offloading problem, minimizing the cost function by optimizing the state of the mobile environment [30]. Nevertheless, most of the aforementioned deep-learning-based methods are supervised, and it is difficult to acquire an appropriate training dataset with labels. Researchers have rarely adopted unsupervised deep-learning-based methods, which only need training data without labels, to solve the binary computation offloading problem, and there is little literature on tackling the MIP problem with deep-learning-based methods. This motivates us to design an unsupervised deep-learning-based method for intelligent offloading decision-making in multiuser MEC networks.
In addition, a comparison of the conventional mathematical method, supervised DL and unsupervised DL is shown in Table 1, which indicates that the unsupervised DL model can be trained using a dataset without labels and that the trained model can tackle the offloading problem with extremely high calculation speed.

Network Model
We consider a multiuser MECO system that contains K single-antenna MDs with computation-intensive tasks and a single-antenna base station (BS) equipped with an edge cloud server, as shown in Fig. 1. Time is divided into slots of T seconds shared by the K mobiles, and each slot contains two phases: 1) local computing or offloading and 2) cloud computing and fetching the computational results from the edge cloud server to the MDs. During a time slot, the computation-intensive tasks of the K MDs can be executed locally by the CPU of each MD or executed remotely by offloading to the edge servers based on TDMA. We assume that the tasks are atomic and cannot be split further because of the strong dependence among their components. In other words, each task is either executed locally or completely offloaded to the edge servers. To enable the BS to select the offloading MDs and to allocate time slots and transmit power to them, we also suppose that the BS knows the channel state information (CSI), the energy consumption of computing per bit and the size of the task data for all users. Moreover, channels are assumed to remain unchanged within a slot.

Local Execution Model
The local execution model is described as follows. We first model the computation power consumption as $p = \kappa f^3$, where $\kappa$ is a coefficient determined by the structure of the chip, and $f$ denotes the computational speed of the CPU measured in cycles per second [31]. $B_k$ and $C_k$ denote the size of the task data and the number of CPU cycles required to process 1 bit of data for MD $k$, respectively, and $f_k$ denotes the computational speed of MD $k$. We assume that $C_k$ and $f_k$ are fixed for each mobile but may vary across MDs. Since all tasks are assumed to be atomic, we denote the offloading strategy by $a_k \in \{0, 1\}$, with $a_k = 1$ if MD $k$ offloads its whole task to the edge server and $a_k = 0$ otherwise. The number of CPU cycles required for MD $k$ to process a $B_k$ Kilobyte task locally is then $(1 - a_k) C_k B_k$, and the time consumption of local computation for MD $k$ is

$$t_{loc,k} = \frac{(1 - a_k) C_k B_k}{f_k}.$$

The energy consumption of local computation for MD $k$, denoted $E_{loc,k}$, is given by

$$E_{loc,k} = p_{k,l}\, t_{loc,k} = \kappa (1 - a_k) C_k B_k f_k^2,$$

where $p_{k,l} = \kappa f_k^3$ denotes the local computation power consumption of MD $k$.
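The local execution model can be illustrated with a short sketch. It assumes the cubic power model stated above, with the chip-dependent coefficient named `kappa` (a name chosen for this sketch), the task size `b_k` expressed in bits so that it matches the per-bit cycle count, and purely hypothetical numerical values.

```python
def local_time(a_k, c_k, b_k, f_k):
    """Seconds the device CPU needs for the part of the task kept locally."""
    return (1 - a_k) * c_k * b_k / f_k


def local_energy(a_k, c_k, b_k, f_k, kappa):
    """Energy = power * time, with the power modelled as kappa * f^3."""
    power = kappa * f_k ** 3
    return power * local_time(a_k, c_k, b_k, f_k)


# Hypothetical device: 1000 cycles/bit, 1e6 bits, 1 GHz CPU, kappa = 1e-28.
t_loc = local_time(a_k=0, c_k=1000, b_k=1e6, f_k=1e9)
e_loc = local_energy(a_k=0, c_k=1000, b_k=1e6, f_k=1e9, kappa=1e-28)
```

With `a_k = 1` both quantities are zero, since the whole task leaves the device.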

Offloading Model
The energy consumption of computation offloading is modeled in this subsection. Computation offloading comprises three phases: 1) uplink task transmission (i.e., wireless access to edge servers via TDMA), 2) edge execution and 3) result fetching. We assume that the cloud server has infinite capacity and that the tasks can run in parallel. Consequently, the latency of cloud execution is very small, and result fetching is much faster than uplink task transmission because the computation result is relatively small. For this reason, the latency and energy consumption of edge execution and result fetching are assumed to be negligible compared with those of uplink task transmission, and the resource allocation of these two phases is not considered. The uplink transmission rate $r_k$ of MD $k$ can then be expressed as

$$r_k = W \log_2\left(1 + \frac{p_k h_k}{N_0}\right),$$

where $p_k$ and $h_k$ refer to the transmission power and channel gain of MD $k$, $N_0$ is the variance of the complex white Gaussian channel noise, and $W$ is the bandwidth. The time required for offloading the task of MD $k$ can be written as

$$t'_k = \frac{a_k B_k}{r_k}.$$

The portion of the time slot allocated to mobile $k$ is denoted $t_k$, so $t'_k$ must not be larger than $t_k$ to ensure that the complete task data is transmitted. Based on the description above, the energy consumption of offloading for MD $k$ can be expressed as

$$E_{off,k} = p_k t'_k = \frac{a_k p_k B_k}{r_k}.$$
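The offloading model can be sketched in the same way. The sketch assumes a Shannon-type rate with signal-to-noise ratio `p_k * h_k / n0`, which is one reading of the definitions above rather than a confirmed formula, and takes the offloading energy to be transmit power multiplied by transmission time. All numbers are hypothetical.

```python
import math


def uplink_rate(p_k, h_k, n0, w):
    """Achievable uplink rate in bit/s under the assumed Shannon-type model."""
    return w * math.log2(1.0 + p_k * h_k / n0)


def offload_time(a_k, b_k, rate):
    """Seconds needed to transmit the offloaded task (zero when a_k = 0)."""
    return a_k * b_k / rate


def offload_energy(p_k, t_off):
    """Transmit energy: power multiplied by the time actually spent sending."""
    return p_k * t_off


# Hypothetical link with an SNR of 3, i.e. 2 bit/s/Hz over 1 MHz.
rate = uplink_rate(p_k=3.0, h_k=1.0, n0=1.0, w=1e6)
t_off = offload_time(a_k=1, b_k=1e6, rate=rate)
e_off = offload_energy(3.0, t_off)
```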

Problem Formulation
Our objective is to minimize the weighted sum mobile energy consumption of the K MDs by adjusting the binary offloading decision $a$, the transmit time slot $t$ and the transmit power $p$. According to the local execution model and the offloading model, the corresponding optimization problem can be formulated as follows:

$$
\begin{aligned}
\mathrm{P1}:\ \min_{a, t, p}\ & \sum_{k=1}^{K} \beta_k \left( E_{loc,k} + E_{off,k} \right) \\
\text{s.t.}\ & \mathrm{C1}: \sum_{k=1}^{K} t_k \le T, \\
& \mathrm{C2}: \frac{(1 - a_k) C_k B_k}{f_k} \le T, \ \forall k, \\
& \mathrm{C3}: 0 \le p_k \le p_{max}, \ \forall k, \\
& \mathrm{C4}: a_k \in \{0, 1\}, \ \forall k, \\
& \mathrm{C5}: t'_k \le t_k, \ \forall k,
\end{aligned}
$$

where $\beta_k$ denotes the positive weight factor accounting for the fairness of MD $k$ and $p_{max}$ denotes the maximum transmission power of the MDs. Here, C1 is the time allocation constraint for the K MDs. C2 denotes the latency constraint of local computation. C3 specifies the maximum transmit power constraint for each MD. C4 indicates that a task is either executed locally or offloaded to the edge server for remote processing. Last, C5 ensures that the task of MD $k$ can be offloaded completely within the allocated time. Problem P1 is a mixed integer nonconvex problem, which is NP-hard, and it is challenging to solve directly with conventional mathematical analysis [32].
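To make the roles of the constraints concrete, the following sketch evaluates the weighted sum energy of one candidate policy and checks C1 to C5. It is only an illustration under assumptions: the rate uses a Shannon-type form with SNR `p * h / N0`, the chip coefficient is called `kappa`, the weights are called `beta`, and the function name `evaluate_policy` is hypothetical.

```python
import math


def evaluate_policy(a, t, p, h, B, C, f, beta, T, p_max, kappa, W, N0):
    """Return (weighted sum energy, feasible flag) for one candidate policy."""
    feasible = sum(t) <= T                                  # C1
    energy = 0.0
    for k in range(len(a)):
        feasible = feasible and a[k] in (0, 1)              # C4
        feasible = feasible and 0.0 <= p[k] <= p_max        # C3
        t_loc = (1 - a[k]) * C[k] * B[k] / f[k]
        feasible = feasible and t_loc <= T                  # C2
        e_loc = kappa * f[k] ** 2 * (1 - a[k]) * C[k] * B[k]
        e_off = 0.0
        if a[k] == 1:
            if p[k] <= 0.0 or h[k] <= 0.0:
                return math.inf, False                      # nothing can be sent
            rate = W * math.log2(1.0 + p[k] * h[k] / N0)
            t_off = B[k] / rate
            feasible = feasible and t_off <= t[k]           # C5
            e_off = p[k] * t_off
        energy += beta[k] * (e_loc + e_off)
    return energy, feasible


# Hypothetical two-device system: device 0 computes locally, device 1 offloads.
system = dict(h=[1.0, 1.0], B=[1e5, 1e6], C=[1000, 1000], f=[1e9, 1e9],
              beta=[1.0, 1.0], T=1.0, p_max=5.0, kappa=1e-28, W=1e6, N0=1.0)
energy, ok = evaluate_policy([0, 1], [0.0, 0.5], [0.0, 3.0], **system)
```

Enumerating all $2^K$ binary decisions with such a checker is what becomes intractable as $K$ grows, which motivates the learned mapping.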

Proposed Binary Computation Offloading Algorithm
To solve the NP-hard problem P1, an unsupervised deep-learning-based DNN model is used to implement the mapping from the channel gain to the offloading decision and resource allocation. In this section, we describe the binary computation offloading scheme, including the basic operation of the fully connected DNN, the proposed deep-learning-based problem formulation and the joint training mechanism with the auxiliary network.

Deep Learning Framework
First, let us briefly introduce the operation of the fully connected network (FCN) in Fig. 2. The FCN used in this paper is composed of one input layer, two hidden layers and one output layer. The channel gain vector $h$ of the K MDs is the input, and the optimization policies $a$, $p$, $t$ are the outputs. The number of nodes of the $i$-th layer is denoted $l_i$, $i = 1, 2$. The output of the $i$-th hidden layer is computed as

$$x_i = \mathrm{ReLU}\left(\mathrm{BN}\left(W_i x_{i-1} + b_i\right)\right),$$

where $x_i$ and $x_{i-1}$ are the output vectors of the current and previous layers, with dimensions $l_i \times 1$ and $l_{i-1} \times 1$ respectively; $W_i$ is the weight matrix with dimensions $l_i \times l_{i-1}$; $b_i$ is the bias vector with dimensions $l_i \times 1$; ReLU is the Rectified Linear Unit function $\max(x, 0)$; and BN denotes batch normalization. The sigmoid function is chosen as the last activation function, and the output is computed as

$$x_o = \mathrm{sigmoid}\left(W_o x_2 + b_o\right),$$

where $W_o$ and $b_o$ are the weight matrix and bias vector of the output layer, and the sigmoid function is given as

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}.$$
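A forward pass through such an FCN can be sketched in plain Python. The sketch assumes that batch normalization is applied before ReLU in each hidden layer and that inference-mode statistics (`mean`, `var`) are stored per node without learned scale and shift; both are simplifications for illustration, and the helper names are hypothetical.

```python
import math


def dense(weights, bias, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm(z, mean, var, eps=1e-5):
    """Inference-mode batch normalization with stored per-node statistics."""
    return [(v - m) / math.sqrt(s + eps) for v, m, s in zip(z, mean, var)]


def relu(z):
    return [max(v, 0.0) for v in z]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def fcn_forward(h, hidden_layers, output_layer):
    """hidden_layers: list of (W, b, mean, var); output_layer: (W, b)."""
    x = h
    for W, b, mean, var in hidden_layers:
        x = relu(batch_norm(dense(W, b, x), mean, var))
    W_out, b_out = output_layer
    return sigmoid(dense(W_out, b_out, x))


# Toy network with two inputs, two hidden layers of two nodes and two outputs.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros, ones = [0.0, 0.0], [1.0, 1.0]
hidden = [(identity, zeros, zeros, ones), (identity, zeros, zeros, ones)]
x_o = fcn_forward([0.3, -0.2], hidden, (identity, zeros))
```

Every entry of `x_o` lies strictly between 0 and 1, which is what the later self-defined output layers rely on.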

Proposed Deep-Learning-Based Problem Formulation
Because standard DL problems are unconstrained, the approaches used to address them cannot be applied directly to the complex MECO problem with constraints [34]. The common ways to eliminate the constraints are to append a self-defined activation layer as the output layer, or to add terms to the objective function so that the DNN loss function penalizes constraint violations. In this subsection, both techniques are used to transform the original optimization problem P1 into an unconstrained DL problem. First, the DNN used to address problem P1 comprises three parallel FCNs. For this DNN, the input is the channel gain $h$ and the output is the solution of problem P1, namely the offloading decision $a$, the time allocation $t$ and the transmit power allocation $p$. According to Eq. (10), we employ the near-universal parametrization method to parameterize the offloading decision and the resource allocation functions as $a = \phi_a(h, \theta)$, $p = \phi_p(h, \theta)$ and $t = \phi_t(h, \theta)$, where $h$ is the input channel gain and $\theta$ denotes the network parameters [33]. Problem P1 can then be rewritten as problem P2, in which the minimization is carried out over the network parameters $\theta$ and $a_k = \phi_a^k(h, \theta)$, $p_k = \phi_p^k(h, \theta)$ and $t_k = \phi_t^k(h, \theta)$ denote the sub-optimal policies for MD $k$. Although problem P2 is non-convex, the loss of optimality is small for near-universal parameterizations according to Theorem 1 in [33].
To meet the constraints on the offloading decision $a$ and the resource allocations $p$ and $t$, we concatenate different self-defined activation layers at the end of each FCN, where $x_o$ in Eq. 10 is the input of the self-defined layers. The self-defined layers are as follows:
1. To satisfy the transmit power constraint C3 in problem P2, the activation function of the last layer of the power FCN maps $x_o$ to the interval $[0, p_{max}]$.
2. To satisfy the time allocation constraint C1 in problem P2, the last layer of the time FCN is a normalization activation function.
3. The binary offloading decision needs to satisfy constraints C2 and C4. Namely, if the whole task cannot be completed locally within the specified time, it must be offloaded to the edge servers for execution. In light of constraint C2, we obtain $a_k \ge m_k^+$ with $m_k = 1 - \frac{f_k T}{C_k B_k}$ and $x^+ = \max\{x, 0\}$ for MD $k$, where $m_k^+$ denotes the ratio of the task beyond the local computing capacity. Once this ratio is greater than 0, then $a_k = 1$. Accordingly, the activation function of the last layer of the offloading FCN combines $x_o$ with the task ratio vector $m^+$ through a sign function, which yields the binary decision.

As a result, constraints C1, C2, C3 and C4 are enforced by the activation layers at the end of the FCNs, and problem P2 is transformed into problem P3, in which only constraint C5 remains. To remove constraint C5, a penalty term is introduced in the loss function of the DNN. Let $\phi_t^k(h, \theta)$ denote the time allocated to MD $k$ by the network with parameters $\theta$. Since we merely need to eliminate the states that violate the constraint, we can neglect the value of $t'_k - \phi_t^k(h, \theta)$ when constraint C5 is satisfied. Consequently, the Hinge function is defined as

$$H(x) = \max\{x, 0\}.$$

Constraint C5 can then be replaced equivalently by $H(t'_k - \phi_t^k(h, \theta)) = 0, \forall k$, without changing the original formulation. Specifically, $H(t'_k - \phi_t^k(h, \theta))$ can be regarded as the loss incurred when constraint C5 is violated.
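One possible set of output layers that enforce C1 to C4 by construction is sketched below. The exact activation functions are not recoverable from the text here, so the scaling by `p_max`, the normalization to the slot length, and the 0.5 threshold in `offload_layer` are assumptions made for illustration only; the task ratio follows the definition of $m_k$ given above.

```python
def power_layer(x, p_max):
    """Scale sigmoid outputs in (0, 1) to transmit powers in (0, p_max) (C3)."""
    return [p_max * v for v in x]


def time_layer(x, slot):
    """Normalize positive outputs so the allocations sum to the slot length (C1)."""
    total = sum(x)
    return [slot * v / total for v in x]


def task_ratio(f_k, slot, c_k, b_k):
    """m_k^+ = max(1 - f_k T / (C_k B_k), 0): task share beyond local capacity."""
    return max(1.0 - f_k * slot / (c_k * b_k), 0.0)


def offload_layer(x, m_plus, threshold=0.5):
    """Binary decision: forced to 1 when m_k^+ > 0, else thresholded (C2, C4)."""
    return [1 if (m > 0.0 or v > threshold) else 0 for v, m in zip(x, m_plus)]


# Hypothetical sigmoid outputs for three devices; the third cannot finish locally.
x_o = [0.2, 0.8, 0.1]
m_plus = [task_ratio(1e9, 1.0, 1000, 1e5),
          task_ratio(1e9, 1.0, 1000, 1e5),
          task_ratio(1e9, 0.5, 1000, 1e6)]
decisions = offload_layer(x_o, m_plus)
```

The hard threshold in `offload_layer` has zero derivative almost everywhere, which is the source of the training difficulty addressed by the auxiliary network.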
If the constraint is satisfied, the loss is zero. Hence, the optimization problem P3 can be transformed into the minimization of a new DNN loss function that penalizes constraint violation. The resulting overall learning problem, P4, is an unconstrained minimization over the network parameter set $\theta$, in which $B$ denotes the scaling factor that balances the terms of the loss function. The loss function (20) consists of the weighted sum energy consumption plus a penalty term, scaled by $B$, built from the Hinge losses $H(t'_k - \phi_t^k(h, \theta))$; the penalty term is introduced to make the output satisfy the offloading time constraints. If $t'_k - \phi_t^k(h, \theta) > 0$, then $H(t'_k - \phi_t^k(h, \theta)) > 0$, namely, the offloading time constraint is not satisfied. To minimize the loss function, the penalty term then compels the DNN to update in the direction of constraint satisfaction. Conversely, if $t'_k - \phi_t^k(h, \theta) \le 0$, then $H(t'_k - \phi_t^k(h, \theta)) = 0$ and the penalty term has no effect on the loss function. In this case, the training process concentrates on the constraint satisfaction of the other MDs and on minimizing the energy consumption of the K MDs. Because $B$ balances the gap between the different terms of the loss function, it needs to be adjusted carefully as a hyperparameter: if it is too small, the DNN may mainly minimize the energy consumption and output an infeasible solution that violates the constraint; if it is too large, the DNN may concentrate on satisfying the constraints while neglecting the minimization of the sum energy consumption. It is generally difficult to select a suitable hyperparameter in DL and in many optimization problems. In this work, we use sub-gradients to handle the non-differentiability caused by the Hinge function, and the scaling factor $B$ is updated iteratively with a sub-gradient step governed by a step size.
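The penalty mechanism can be sketched as follows. The exact loss and the update rule for the scaling factor are not recoverable from the text here, so this sketch assumes a single shared scaling factor, a sum of Hinge losses as the penalty, and a standard projected sub-gradient ascent step for the scaling factor. The function names are hypothetical.

```python
def hinge(x):
    """Zero when the constraint holds (x <= 0), linear in the violation otherwise."""
    return max(x, 0.0)


def penalized_loss(weighted_energy, t_required, t_allocated, scale):
    """Weighted sum energy plus the scaled total offloading-time violation."""
    penalty = sum(hinge(tr - ta) for tr, ta in zip(t_required, t_allocated))
    return weighted_energy + scale * penalty


def update_scale(scale, t_required, t_allocated, step):
    """Assumed sub-gradient step: raise the scale while violations persist."""
    violation = sum(hinge(tr - ta) for tr, ta in zip(t_required, t_allocated))
    return max(scale + step * violation, 0.0)


# Hypothetical sample: device 0 needs 0.5 s but was given 0.4 s; device 1 is fine.
loss = penalized_loss(3.0, [0.5, 0.2], [0.4, 0.3], scale=10.0)
new_scale = update_scale(10.0, [0.5, 0.2], [0.4, 0.3], step=0.5)
```

In this sample only the first device contributes to the penalty, so the loss exceeds the energy term by the scale multiplied by the 0.1 s shortfall, and the scale grows slightly for the next iteration.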

Joint Training Mechanism With Auxiliary Network
Although problem P4 is an unconstrained optimization problem, the binary offloading decision considered in the MECO network results in the gradient vanishing issue during training. In other words, when non-differentiable operators are adopted, back-propagation cannot effectively update the parameters of the neural layers that precede the binary layer. To address this issue, a DNN with an auxiliary network is designed, as shown in Fig. 3. In this subsection, we introduce the training process of the proposed DNN in detail.
To map the channel gain to a binary offloading decision of 0 or 1, a common binarization operator (e.g., sign()) may be applied in the activation function of the DNN. However, the derivative of the output of a binarization neuron is zero everywhere except at the origin, where the function is non-differentiable, which may result in the vanishing gradient problem during backpropagation. The most common practice to overcome this problem is to approximate the activation function of the binarization layer with a smooth differentiable function during backpropagation. Such an approximation is known as the straight-through estimator (STE), first proposed by G. E. Hinton [35,36]. However, the approximation may introduce noise when updating the parameters of the DNN because of the incorrect updating direction [37,38]. To mitigate this, we design a DNN containing an auxiliary teacher network and a student network, as shown in Fig. 3. Specifically, the teacher network (red dotted line in Fig. 3) is an auxiliary network used to guide the student network in updating the gradient information. In the teacher network, the constraint on the offloading decision $a^{auxi}$ is relaxed to a continuous constraint, so the gradient information can be updated in backpropagation and transmitted to the student network effectively. The student network (green dotted line in Fig. 3) is the binary offloading network, which shares with the teacher network the FCN that precedes the last activation layer for the offloading decision, so the student network can obtain lossless gradient information from the teacher network in the alternating update process. Moreover, for the auxiliary network we suppose that the tasks can be partitioned, and the offloading decision constraint C4 is relaxed to $0 \le a_k^{auxi} \le 1, \forall k$. Thus, $a_k^{auxi}$ denotes the offloaded ratio of the task on mobile device $k$.
To satisfy this constraint, the last-layer activation function of the auxiliary network for the offloading decision is a continuous function whose output lies in $[0, 1]$. Therefore, the student network and the auxiliary teacher network have the same structure except for the activation function layer of the offloading decision. The identical structure is effective for transferring information from the auxiliary network to the student network, and it reduces the information loss caused by a structure mismatch between the two networks [9]. Additionally, the network that precedes the last activation layer for the offloading decision is shared by the auxiliary network (in the red line box) and the student network (in the green line box), as seen in Fig. 3. Thus, the student network can directly acquire the lossless gradient from the auxiliary network.
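The difference between the binary and the relaxed output layer can be checked numerically. The sketch below compares a central finite-difference estimate of the derivative of a hard step with that of a sigmoid, which is used here only as a stand-in for a continuous relaxation; the actual activation of the auxiliary network is not specified in this passage.

```python
import math


def step(x):
    """Hard binarization: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Smooth relaxation whose output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def central_diff(fn, x, eps=1e-4):
    """Finite-difference estimate of the derivative of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


grad_binary = central_diff(step, 0.7)      # no signal reaches earlier layers
grad_relaxed = central_diff(sigmoid, 0.7)  # non-zero, usable for training
```

Because the layers that precede the output activation are shared, a gradient obtained through the relaxed output updates the same weights that the binary student network uses.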
Similar to the learning problem P4, the learning problem of the auxiliary network is formulated with a scaling factor $A$ and a network parameter set $\theta_{auxi}$, and its loss function (25) is constructed in the same way as the loss function (20), with the relaxed offloading ratio $a^{auxi}$ in place of the binary decision $a$. To utilize the auxiliary network in the training phase, we exploit a joint training approach that alternately trains the auxiliary network and the student network [33,34]. The detailed procedure is presented in Algorithm 1. The auxiliary teacher network and the student network are updated sequentially in each iteration, as shown in the flowchart of the proposed joint training process in Fig. 4. Therefore, the teacher network can effectively guide the student network to obtain lossless information for generalization, and the joint training prevents the student network from being trapped at a poor local minimum [33][34][35].

Fig. 3 Schematic of the proposed joint training with auxiliary DNN. The FCN of the offloading decision is shared by the auxiliary teacher network and the student network. In the training phase, the parameters of the auxiliary teacher network (red dotted line) and the student network (green dotted line) are updated alternately, so the shared FCN can obtain lossless gradient information during backpropagation.

Fig. 4 Flowchart of the proposed joint training process for teacher network and student network

Numerical Simulation
In this section, numerical simulations are carried out to evaluate the efficiency of BCOS. We consider a multiuser MEC system comprising a single-antenna BS equipped with edge servers and K MDs. We model the channel as independent Rayleigh fading and set the average power loss of large-scale fading to $10^{-6}$. Other simulation parameters are shown in Table 2 unless specified otherwise.
To evaluate the performance of the proposed BCOS, we take three offloading schemes into account, described as follows:
1. Minimum computation offloading scheme (MCOS): In this approach, priority is given to local execution. If the local computation capacity is insufficient, the remainder of the task is offloaded to the edge servers with maximum transmit power, and the weighted sum energy consumption of the K MDs is computed.
2. Partial computation offloading scheme (PCOS): Assuming that the tasks can be partitioned arbitrarily, we use the single teacher network to optimize the offloading ratio, transmit [...]

The weighted sum energy consumption of the three schemes for K MDs is compared in Fig. 5 as the number of MDs increases from 4 to 20, where T = 0.5 s. For all three schemes, the weighted sum energy consumption increases with the number of MDs. The energy consumption of MCOS is always higher than that of the other two schemes, indicating that computation offloading is effective in saving the energy of MDs.
In addition, the energy consumption of PCOS and BCOS is relatively close. When the number of MDs is less than 8, the energy consumption of the two schemes is approximately the same. When the number of MDs exceeds 8, the gap between PCOS and BCOS begins to increase. This is because, when the number of MDs is small, the time deadline is relatively loose and the devices can be allocated enough time to offload all their tasks. As the number of mobile devices increases, the time deadline tightens and the tasks may need to be partitioned. Since BCOS must preserve task integrity, it cannot offload as much as PCOS does.
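The MCOS baseline described earlier in this section can be sketched in a few lines. The sketch assumes the cubic local power model with a coefficient named `kappa`, a Shannon-type uplink rate with SNR `p_max * h / N0`, task sizes in bits, and strictly positive channel gains; the function name `mcos_energy` and all numbers are hypothetical.

```python
import math


def mcos_energy(B, C, f, h, beta, slot, p_max, kappa, W, N0):
    """Weighted sum energy when each device computes as much as it can locally
    within the slot and offloads only the remainder at maximum power."""
    total = 0.0
    for k in range(len(B)):
        local_bits = min(B[k], f[k] * slot / C[k])   # bits finished locally
        remainder = B[k] - local_bits                # bits that must be sent
        e_loc = kappa * f[k] ** 2 * C[k] * local_bits
        e_off = 0.0
        if remainder > 0.0:
            # Assumes h[k] > 0 so that the rate is strictly positive.
            rate = W * math.log2(1.0 + p_max * h[k] / N0)
            e_off = p_max * remainder / rate
        total += beta[k] * (e_loc + e_off)
    return total


# Hypothetical single device that can finish only half of its task locally.
energy = mcos_energy(B=[1e6], C=[1000], f=[1e9], h=[1.0], beta=[1.0],
                     slot=0.5, p_max=3.0, kappa=1e-28, W=1e6, N0=1.0)
```

Because the remainder is always sent at `p_max`, this baseline has no freedom to trade transmit power against time.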
We further investigate the effect of the communication resource on the weighted sum energy consumption. The comparison of the weighted sum mobile energy consumption of MCOS, PCOS and BCOS as the time slot T increases from 0.3 s to 1.6 s is shown in Fig. 6, where K = 14. As the time slot T grows, the energy consumption of PCOS and BCOS first decreases and then stabilizes when T ≥ 1.5 s, while the energy consumption of MCOS remains almost unchanged. This is because a longer time slot allows more MDs to be selected to offload their tasks. When the time slot T is greater than 1.5 s, the energy consumption of PCOS and BCOS no longer decreases, which indicates that the allocated time has reached saturation and meets the requirement of offloading all the tasks. A further increase of T therefore has no effect on the energy consumption. MCOS gives priority to local computation, and the assigned time always meets the requirement of offloading the remainder of the task.
Furthermore, the energy consumption of PCOS is slightly lower than that of BCOS at the beginning, and the gap becomes smaller as T increases. Because PCOS uses a partial offloading decision, the tasks on an MD can be partitioned flexibly and offloaded as much as possible. However, PCOS violates the integrity of the task. A relaxed latency constraint therefore helps to reduce the difference between binary offloading and partial offloading, since the tasks can be offloaded as a whole without compromising their integrity.
Finally, we depict the cumulative distribution function (CDF) of the sum of the actual task offloading time for K MDs in Fig. 7, where T = 1 s and K = 14. For MCOS, the actual task execution delay is concentrated around 0.5 s with high probability, which is much lower than the prescribed delay constraint of 1 s. Because MCOS gives priority to local execution, only a small number of tasks are offloaded to the edge servers, and the required time is relatively short. However, both PCOS and BCOS have a few samples whose actual offloading time exceeds the prescribed time constraint. This is because the number of samples that violate the delay constraint is affected by the hyperparameters B and A of the penalty terms in loss functions (20) and (25). The hyperparameters are used to balance the objective function and the offloading time constraint in the loss function. If they are too large, the DNN concentrates on meeting the offloading time constraint, and the number of samples that violate the delay constraint is further reduced or even becomes 0, at the cost of a higher sum energy consumption. If they are too small, the DNN mainly minimizes the weighted sum energy consumption and outputs infeasible solutions, with more samples violating the delay constraint. Hence, tolerating a few samples that violate the delay constraint allows the networks to obtain sub-optimal offloading decisions with the lowest energy consumption. On the whole, both PCOS and BCOS complete the offloading tasks within the prescribed delay with very high probability.
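The quantities read off such a CDF plot can be computed from simulated samples as sketched below. The sample values are invented purely for illustration and are not results from this work; the function names are hypothetical.

```python
def empirical_cdf(samples):
    """Return sorted (value, cumulative probability) pairs for the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def violation_rate(samples, deadline):
    """Fraction of samples whose offloading time exceeds the deadline."""
    return sum(1 for s in samples if s > deadline) / len(samples)


# Invented total offloading times (seconds) for four test channel realizations.
times = [0.5, 0.9, 1.1, 0.7]
cdf = empirical_cdf(times)
rate = violation_rate(times, deadline=1.0)
```

Tracking `violation_rate` while sweeping the penalty scaling factors is one way to examine the trade-off between feasibility and energy consumption discussed above.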

Conclusion
In this study, we have proposed an unsupervised DL binary offloading scheme for computation-intensive tasks on MDs in a multiuser MEC network to save MD energy. The problem is formulated as an optimization problem that jointly optimizes the binary offloading decision, transmit time and transmit power to minimize the energy consumption of K MDs under latency and transmit power constraints. Because of the binary offloading decision, the optimization problem is a MIP problem, and the gradient vanishing problem occurs in backpropagation. To tackle this, we have proposed the BCOS, in which we design a DNN with an auxiliary network. With the assistance of the auxiliary network, the student network acquires the gradient information without loss. Finally, a sub-optimal solution of the optimization problem is obtained by the unsupervised DL-based BCOS. The simulation results indicate that the mobile edge computation offloading approach is effective in reducing the energy consumption of MDs. Moreover, both BCOS and PCOS can solve the optimization problem effectively, with a binary offloading decision and a partial offloading decision respectively, using a trained DNN with low complexity. In particular, the binary offloading decision is suitable for simple tasks that cannot be partitioned.
In the future, we will extend the proposed MEC network model to consider the computation resource allocation of edge servers and non-orthogonal multiple access (NOMA) technology to improve the performance further. Meanwhile, considering privacy and the limited communication resources of MDs, we will attempt to tackle the MEC problem with a federated learning method.