Deep Cooperative Spectrum Sensing Utilizing Recurrent Convolutional Neural Networks

Cognitive Radio (CR) networks were introduced as a promising approach to utilizing spectrum holes. Spectrum sensing is the first stage of this utilization, and it can be improved through cooperation, namely Cooperative Spectrum Sensing (CSS), in which several Secondary Users (SUs) collaborate to detect the presence of the Primary User (PU). In this paper, Deep Learning (DL) is used to improve the detection accuracy. To make the approach more practical, a Recurrent Neural Network (RNN) is employed, since both the channel and the states of the PUs in the network carry memory. The proposed RNN is therefore compared with a Convolutional Neural Network (CNN) and shows clear advantages over it, as demonstrated by simulation.


Introduction
Cognitive radio networks (CRNs) are a special class of networks designed to utilize frequency bands dynamically and efficiently. The two leading players in these networks are primary users (PUs), who pay for the frequency bands, and secondary users (SUs), who utilize the vacant bands dynamically. Determining the unoccupied bands is therefore crucial for the SUs so as not to interrupt the PUs. To detect the empty bands, SUs need to recognize the activity of the PUs, which is called spectrum sensing in this context [1][2][3][4][5].
Although spectrum sensing is essential in each SU, individual sensing results are highly susceptible to fading channel conditions and other destructive effects. Consequently, to improve the sensing quality, Cooperative Spectrum Sensing (CSS) was introduced in the literature. CSS combines the results of several cooperating SUs to enhance the quality of sensing and improve the accuracy. However, the channel conditions affect the optimal cooperation strategy in CSS [1]. For example, an SU close to the PU will detect the presence of the PU, and sharing this information with remote SUs is very important in helping them detect the PU as well [3,4]. Therefore, a sound approach to combining the individual sensing results is crucial to the accuracy of CSS.
Accordingly, combining the individual results into an optimal decision has become very popular among researchers. First, some simple procedures were introduced based on hard decisions, such as the AND rule, OR rule, counting rule, and linear quadratic combining rule [6][7][8]; these are the most common cooperation approaches. There are also soft-decision techniques that utilize the detected energy levels in the SUs to improve the accuracy [9]. Relay-based cooperation has been studied in [10,11] as well. Moreover, efficient approaches based on the K-out-of-N scheme, which declare the presence of the PU upon positive responses from K of the N SUs, were introduced in [1,2] under the assumption that the individual sensing results are independent. Multi-dimensional correlation in individual sensing was considered for CSS in [3,4]. In addition, machine-learning-based approaches that improve on the well-known procedures at the fusion center (FC) were introduced in [5]. These techniques are categorized as unsupervised or supervised based on the training method: the authors of [5] studied K-means clustering and Gaussian Mixture Models (GMM) as unsupervised approaches, and Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) as supervised approaches. Further, Deep Learning (DL) was utilized in [12] to design a CSS algorithm, which requires updating all the measurements of all SUs in each time slot. The main drawback of these approaches is that the classification process ignores historical data. It has been shown in [13,14] that exploiting the temporal behavior of PU states can improve spectrum sensing. The Hidden Markov Model (HMM) was the first to employ this temporal relation in [15], but it depends heavily on the accuracy of the statistical models.
In this paper, we utilize DL [16] to improve the classification performance of CSS. To exploit the previous behavior of the PU, we propose to employ a Long Short-Term Memory (LSTM) layer, since utilizing temporal relations can improve the classification accuracy. The proposed neural network is composed of three consecutive convolutional blocks that exploit the spatial features of the different sensing nodes. Each convolutional block consists of a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU), and is succeeded by a max-pooling layer that decreases the number of sensing samples. The LSTM layer then exploits the temporal features. Finally, a fully connected layer maps the resulting samples into two categories, which are processed by the softmax and classification layers. The most significant novelty of the paper is employing a CNN and an LSTM together to exploit spatio-temporal information and improve the classification accuracy. Moreover, the proposed network can learn the PU activity pattern without any prior information about the PU's activity distribution, such as the dwell time and the probability of dwelling. Additionally, the proposed approach is entirely model-free, so its accuracy is not endangered by inaccurate model assumptions. Finally, at low Signal-to-Noise Ratios (SNRs) it is more robust than other state-of-the-art approaches, and its performance is less affected than theirs.
The remainder of the paper is organized as follows. After the system model in Section II, the DL-based CSS algorithm is presented in Section III. Subsequently, the simulation results and concluding remarks, which demonstrate the applicability and performance merit of the proposed approach, are given in Sections IV and V.

System Model
The PUs can transmit in any of the N_P bands arbitrarily, and none of the N_SU SUs are aware of the utilized bands. The transmit power is fixed at P. Additive White Gaussian Noise (AWGN) is assumed, where the power spectral density is N_0 and z_i^j(n) is the noise of the i-th SU in band j at time n. An SU broadcasts a request for cooperative sensing, and all of its one-hop neighbors respond to this appeal with their local sensing results, intermittently. The initiating SU, which requests the cooperative sensing, is called the Fusion Center (FC) in this phase. The FC combines the local sensing results of the network.
As shown in Fig. 1, the FC begins the DL-based cooperative spectrum sensing and detects the presence or absence of the PU based on its surrounding one-hop neighbors. The number of one-hop neighbors surrounding the agent is N. In each stage of cooperative sensing, only C < N nodes are selected, based on the reliability of the cooperation, and this selection changes with the channel conditions and the hidden PUs. To broadcast the fusion decision, the FC uses a reliable channel to transmit the cooperation outcome and inform the other one-hop nodes about the latest result. In addition, the channel is assumed to be unavailable if all of the available bands are detected to be occupied by the PU. As in other references such as [5], a birth-death process characterized by the birth rate r_b and the death rate r_d is assumed for the simulation of the PU activity. Hence, the average probability that the PU is active is r_b/(r_b + r_d).
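The birth-death activity model above can be sketched as a two-state chain whose long-run fraction of active slots approaches r_b/(r_b + r_d). The discrete-time slotting and the specific rate values below are illustrative assumptions, not taken from the paper:

```python
import random

def simulate_pu_activity(r_b, r_d, n_slots, seed=0):
    """Simulate PU on/off activity as a two-state birth-death chain.

    In each slot an idle PU becomes active with probability r_b and an
    active PU turns off with probability r_d (a discrete-time sketch of
    the birth-death model; slot length and rates are assumptions).
    """
    rng = random.Random(seed)
    state = 0  # 0: idle, 1: active
    states = []
    for _ in range(n_slots):
        if state == 0 and rng.random() < r_b:
            state = 1
        elif state == 1 and rng.random() < r_d:
            state = 0
        states.append(state)
    return states

# Empirical activity converges to r_b / (r_b + r_d) = 0.4 here.
states = simulate_pu_activity(r_b=0.2, r_d=0.3, n_slots=200000)
activity = sum(states) / len(states)
```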
Moreover, the fading and shadowing effects, together with the hidden-PU problem, make some SUs unable to detect the presence of the PUs by local sensing. The received signal at each SU can be written as

x_i^j(n) = g_i^j(n) s^j(n) + z_i^j(n),

where g_i^j(n) is the fading factor of the channel between the PU and the i-th SU in frequency band j at time instant n, and s^j(n) is the transmitted sample in the j-th band. Furthermore, let y_{i,j} be the received energy sample of SU i in the j-th band,

y_{i,j} = Σ_{n=1}^{N_ED} |x_i^j(n)|^2,

where N_ED is the number of samples used to calculate the received energy level. The SUs transmit their calculated energies to the FC, where the received samples can be formulated as

ŷ_{i,j}(n) = h_i^j(n) y_{i,j}(n),

where h_i^j(n) is the channel gain between the i-th SU and the FC in the j-th frequency band at time instant n. The FC collects all the measurements in the matrix Y(n) = {ŷ_{i,j}(n)} ∈ R^{N_SU × N_B}. After collecting all the measurements, it determines the value function using DL. Finally, the result of the cooperation is broadcast to all the neighbors. The proposed RCNN-based DL network for detecting the availability of the channel is expressed in the following section.
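The energy-detection step above can be sketched numerically: each SU sums the squared magnitudes of N_ED received samples per band, producing the matrix that the FC later classifies. The function and parameter names here are illustrative:

```python
import numpy as np

def energy_matrix(g, s, noise_std, rng):
    """Compute the N_SU x N_B matrix of received-energy samples.

    g: (N_SU, N_B) channel gains between the PU and each SU per band
    s: (N_ED, N_B) transmitted PU samples per band (zeros where idle)
    Each entry is y_{i,j} = sum_n |g_{i,j} s_j(n) + z_{i,j}(n)|^2,
    matching the energy-detector model above.
    """
    n_su, n_b = g.shape
    n_ed = s.shape[0]
    y = np.zeros((n_su, n_b))
    for i in range(n_su):
        for j in range(n_b):
            z = noise_std * rng.standard_normal(n_ed)  # AWGN samples
            x = g[i, j] * s[:, j] + z                  # received signal
            y[i, j] = np.sum(np.abs(x) ** 2)           # detected energy
    return y

rng = np.random.default_rng(0)
g = rng.uniform(0.5, 1.0, size=(4, 3))   # toy fading gains, 4 SUs x 3 bands
s = rng.standard_normal((128, 3))        # N_ED = 128 samples per band
Y = energy_matrix(g, s, noise_std=0.1, rng=rng)
```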

Proposed Recurrent Convolutional Neural Network
The proposed Recurrent CNN is depicted in Fig. 2. The network consists of three convolutional blocks, each followed by a max-pooling layer to reduce the amount of data entering the LSTM layer. A fully connected layer then generates concise samples for the softmax and classification layers. Each convolutional block contains three layers: a convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU). In the convolutional layer, spatial convolution is performed in two dimensions to extract the spatial features of the data. According to [17], the size of the spatial filter is set to 3 × 3, which is adequate for spatial feature extraction. The output of the convolutional layer can be written as

Y_conv(i, j) = Σ_{p=0}^{2} Σ_{q=0}^{2} X_conv(i + p − 1, j + q − 1) W_conv(p, q),

where W_conv represents the weights and X_conv denotes the input of the convolutional layer. The input to the first convolutional block is the matrix of detected energies Y(n). Padding is used to preserve the input size at the output of this layer, and the stride is set to 1. Each batch is then normalized by the batch normalization layer. At the end of the convolutional block, non-linearity is introduced by the ReLU activation function, which enables the proposed RCNN to classify non-linear behavior [17]; it rejects negative values and passes positive values without any manipulation.
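The 3 × 3 convolution above, with zero padding and stride 1, can be verified with a direct numpy sketch (a didactic reimplementation, not the trained network):

```python
import numpy as np

def conv2d_3x3(x, w):
    """2-D spatial convolution with a 3x3 filter, zero padding, stride 1.

    Implements Y(i, j) = sum_{p=0}^{2} sum_{q=0}^{2}
    X(i + p - 1, j + q - 1) * W(p, q), so the output keeps the
    input size, as in the convolutional blocks described above.
    """
    h, wdt = x.shape
    xp = np.pad(x, 1)                  # zero padding preserves the size
    y = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wdt):
            y[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return y

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.zeros((3, 3)); w[1, 1] = 1.0    # identity filter
y = conv2d_3x3(x, w)                   # reproduces x exactly
```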
At the end of each convolutional block, there is a max-pooling layer with a pool size of 2 × 2, which keeps the maximum of each block. This layer decreases the computational complexity without remarkable performance degradation.
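The 2 × 2 pooling can be sketched in a few lines; it quarters the number of samples passed onward (the even-dimension assumption is for brevity):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keeps the maximum of each
    non-overlapping block, quartering the sample count (assumes even
    input dimensions for brevity)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 0.],
              [3., 4., 1., 2.],
              [0., 1., 8., 7.],
              [2., 2., 6., 9.]])
pooled = max_pool_2x2(x)   # [[4., 5.], [2., 9.]]
```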
The reduced samples are then fed into an LSTM layer with 200 neurons. The number of neurons in the LSTM layer was selected experimentally: starting from 50 neurons and increasing in steps of 20, dramatic improvements were observed up to 200 neurons, while adding more than 200 neurons was not much use. In the LSTM layer, the hidden sequence and output sequence are calculated by

h_t = H(W_xh x_t + W_hh h_{t−1} + b_h),
y_t = W_hy h_t + b_y,

where x = (x_1, x_2, ..., x_t) is the input sequence, and h = (h_1, h_2, ..., h_t) and y = (y_1, y_2, ..., y_t) are the hidden sequence and output sequence, respectively. W is the weight matrix and b is the bias, where the subscripts determine the position of the weight and bias. Moreover, H is the hidden-layer function, which is responsible for the temporal feature extraction. The hidden function H is the well-known LSTM function, which exploits memory cells to remember the temporal information. The implementation of H is organized as follows:

i_t = σ(W_xi x_t + W_hi h_{t−1} + W_ci c_{t−1} + b_i),
f_t = σ(W_xf x_t + W_hf h_{t−1} + W_cf c_{t−1} + b_f),
c_t = f_t c_{t−1} + i_t tanh(W_xc x_t + W_hc h_{t−1} + b_c),
o_t = σ(W_xo x_t + W_ho h_{t−1} + W_co c_t + b_o),
h_t = o_t tanh(c_t),

where i, f, o, and c are the input, forget, output, and cell gate activation vectors, all of the same size as the hidden vector h, and σ is the logistic sigmoid function. The matrices W_ci, W_cf, and W_co (the matrices from the cell to the gates) are diagonal. A fully connected layer is then used to introduce the abstract features to the softmax and classification units. In this layer, the weights are multiplied by the input data and the biases are added to the result; specifically, the output is obtained as W_FC X_FC + b_FC, where W_FC and b_FC are the weights and biases, respectively, and X_FC is the input to the fully connected layer.
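A single step of the LSTM gate equations above can be sketched directly in numpy. The diagonal cell-to-gate matrices are represented as vectors; all parameter names and the random initialization are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations in the text.

    p holds the parameters (W_xi, W_hi, w_ci, b_i, ...); the peephole
    weights w_ci, w_cf, w_co are vectors, i.e. diagonal cell-to-gate
    matrices, as noted above.
    """
    i = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["w_ci"] * c_prev + p["b_i"])
    f = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["w_cf"] * c_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    o = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["w_co"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 200                   # 200 hidden units, as in the text
p = {k: 0.1 * rng.standard_normal((n_hid, n_in)) for k in ("W_xi", "W_xf", "W_xc", "W_xo")}
p.update({k: 0.1 * rng.standard_normal((n_hid, n_hid)) for k in ("W_hi", "W_hf", "W_hc", "W_ho")})
p.update({k: np.zeros(n_hid) for k in ("w_ci", "w_cf", "w_co", "b_i", "b_f", "b_c", "b_o")})
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), p)
```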
The softmax layer generates the classification input using the exponential function exp(x_r)/Σ_i exp(x_i); thus, each output lies between 0 and 1. The classification layer then determines the existence or absence of the PU in the network according to the abstract spatio-temporal features.

Training of the Proposed Network
To train the proposed network, a set of labeled data is generated. In each data ensemble, the channel state is changed until averaging over the different channel realizations is ascertained. Using these labeled data, the weights and biases of the proposed network are estimated so as to minimize the cross-entropy loss function. The cross-entropy is minimized using Stochastic Gradient Descent with Momentum (SGDM). To generate the labeled data, we stochastically vary three main parameters: the channel conditions, the positions of the cooperating nodes, and the node-wise permutation of the gathered data at the input of the proposed network. Each of these parameters has a significant effect on the performance. The position permutation affects the fading coefficients encountered by the SUs and the fusion center, which in turn affects the reported energy measurements, and the channel conditions affect the fading factors as well. Additionally, the node-wise permutation of the gathered data determines the order in which near and far nodes impact the proposed network. Consequently, we consider all the sources of stochastic behavior of the problem in the training phase to decrease the error of the classification network.
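The SGDM update used for training can be sketched on a toy quadratic objective; the learning rate and momentum values are illustrative assumptions, not reported hyperparameters:

```python
import numpy as np

def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One stochastic-gradient-descent-with-momentum (SGDM) update:
    the velocity accumulates past gradients, smoothing the descent."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([4.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = sgdm_step(w, grad=w, velocity=v)
# w converges toward the minimizer at the origin.
```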

Simulation Results
To evaluate the performance of the proposed algorithm, we compare the proposed approach with other machine-learning-based cooperative spectrum sensing schemes, namely SVM, the DL-based CSS of [12], and an HMM-based CSS approach, as the state-of-the-art schemes. The comparison metrics are the probability of miss detection (P_m) versus the probability of false alarm (P_f), the probability of detection at different SNRs, and the impact of the number of SUs on the sensing error, where the sensing error is the sum of the probability of false alarm (P_f) and the probability of miss detection (P_m). The three probability metrics are defined as

P_d = Pr(decide PU present | PU present),
P_m = 1 − P_d,
P_f = Pr(decide PU present | PU absent).

In Fig. 3, the comparison of P_m to the corresponding P_f for the four mentioned schemes is provided. In this simulation, the number of SUs is 30, and the number of samples in each SU is N_ED = 128. The proposed approach outperforms the others at higher probabilities of false alarm, while at lower probabilities of false alarm the result of the DL-based approach is comparable to that of the proposed approach. Furthermore, the HMM-based approach utilizes the history of the PU activity, while the decisions of the other two are independent of the history.
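The three metrics above can be estimated empirically from binary decisions against ground truth, as in the following sketch (toy data, illustrative names):

```python
import numpy as np

def sensing_metrics(decisions, truth):
    """Empirical detection metrics from binary decisions vs. ground truth.

    P_d = Pr(decide PU present | PU present), P_m = 1 - P_d,
    P_f = Pr(decide PU present | PU absent); the sensing error is
    then P_f + P_m, as used in the comparisons above.
    """
    decisions = np.asarray(decisions, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    p_d = np.mean(decisions[truth])    # detections among active slots
    p_f = np.mean(decisions[~truth])   # false alarms among idle slots
    p_m = 1.0 - p_d
    return p_d, p_m, p_f

truth = [1, 1, 1, 1, 0, 0, 0, 0]       # PU active in the first four slots
decisions = [1, 1, 1, 0, 1, 0, 0, 0]   # one miss, one false alarm
p_d, p_m, p_f = sensing_metrics(decisions, truth)   # 0.75, 0.25, 0.25
```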
The impact of different SNRs on the probability of detection is depicted in Fig. 4. The proposed approach is superior to the others at lower measured SNRs, while increasing the SNR increases P_d and brings the different sensing approaches closer together at higher SNRs. The results of the simulation are averaged over 1000 independent runs.
Finally, the sensing error is shown in Fig. 5, where the number of cooperating users and the number of samples are varied. As depicted, in both cases the sensing error of the proposed approach is more reliable than that of the others; specifically, the proposed approach attains both the lowest sensing error and the smallest fluctuations in the sensing error as the number of samples and users changes.