Deep Learning-Based Beamforming for Millimeter-Wave Systems Using Parametric ReLU Activation Function

Beamforming design is a crucial stage in millimeter-wave systems with massive antenna arrays. We propose a deep learning network for the design of the precoder and combiner in hybrid architectures. The proposed network employs a parametric rectified linear unit (PReLU) activation function, which improves model accuracy at almost no complexity cost compared to other functions. The proposed network accepts practical channel estimates as input and can be trained to enhance spectral efficiency under the hardware limitations of the hybrid design. Simulations show that the proposed network achieves a small performance improvement over the same network with the ReLU activation function.


Introduction
5G wireless systems operate in mm-Wave bands to utilize their wider bandwidth. Large-scale antenna arrays are used to counteract the propagation loss associated with mm-Wave bands. This configuration enables higher data rates and system capacity, both of which are in growing demand. Mm-Waves have a short wavelength, which allows more antennas to be packed into an array without increasing its dimensions (enabling massive MIMO), thereby increasing the number of data streams in a cell. However, it is expensive and power-consuming to implement a dedicated RF chain for each antenna. Recently, the hybrid architecture has been receiving growing interest as a practical, cost-effective solution to this problem. By splitting the beamforming process into analog and digital parts, hybrid beamforming allows the number of RF chains to be smaller than the number of antennas and thus solves the power and cost problem [1].
However, hybrid beamforming is a complex design problem with prominent hardware limitations and challenges. The analog beamformer architecture using phase shifters is limited by the constant modulus constraint [2][3][4]. Imperfect CSI requires a channel estimation technique, and the joint optimization of multiple variables yields a non-convex problem. In this paper, we propose using a parametric rectified linear unit (PReLU) activation function in a deep learning technique that produces the optimal analog beamforming parameters while taking SNR and channel estimates as input. Our results show an enhancement over the same technique when other activation functions are used. In Sect. 2, we discuss existing approaches and their limitations in the proposed scenario, followed by the problem statement and system model in Sect. 3. Section 4 describes the proposed technique and algorithm, and its simulation results are discussed in Sect. 5. Section 6 concludes the paper.

Related Studies
Several studies have attempted to overcome the hybrid beamforming design challenges [2][3][4][5][6]. An orthogonal matching pursuit (OMP), greedy-based method was proposed in [5], where a dictionary containing the array responses is constructed and used to select the precoders and combiners. However, constructing the dictionary requires identifying the departure and arrival angles, which restricts the analog beamformer to a pre-defined codebook. A manifold optimization technique was proposed for the analog beamformer optimization in [2,3], where alternating minimization and phase extraction methods are applied to determine the analog beamformers. In [4], the authors considered a hybrid beamforming architecture in which a digital (baseband) beamformer is concatenated with an RF (analog) beamformer implemented using phase shifters. The preceding studies rely on greedy or optimization-based techniques, which raises the computational complexity and time. Moreover, perfect CSI is assumed in all of these algorithms.
Deep learning (DL) is extensively studied in massive MIMO and mm-Wave systems due to its ability to learn different models and solve optimization problems with less complexity and time. Deep learning is an effective approach to solving intractable problems [7] and has recently proved its capability to learn the complex features of wireless channels [8]. Deep learning is more robust to practical imperfect CSI than the methods assuming perfect CSI, as in [2][3][4][5]. Most conventional beamforming algorithms require time-consuming, iterative, high-complexity computations, as in [3,4]. On the other hand, deep neural networks (DNNs) offer less complexity when solving the same optimization problems and low computation time when deployed online after offline training [9].
DL-based techniques have received a great deal of attention in the communications community as a solution to several popular challenges, such as coordinated beamforming for highly mobile mm-Wave systems, estimating channel parameters and the direction of arrival (DOA), antenna selection, and analog beam selection [10][11][12][13][14]. DL-based techniques have also been considered for the hybrid beamforming design problem in [15][16][17][18].
The work in [15] introduced a convolutional neural network (CNN) for the hybrid joint design problem in mm-Wave and massive MIMO systems, considering both the precoding and combining stages. The channel parameters are the input to the network, and the optimal analog and digital beamformers are the output; however, the network was developed only for the single-user scenario. In [16], the authors proposed a DL-based beamforming technique: a beamforming neural network (BFNN) which, after training, learns how to maximize the spectral efficiency considering practical CSI. That network is deployed for a single-user setting, considers the precoder design only, and relies strongly on the accuracy of the channel matrix. In [17], the authors also devised a DL-enabled technique for the hybrid beamforming design problem, in which the hybrid precoder is selected through neural network training to optimize the precoding process of the mm-Wave massive MIMO system; such dense, multiple fully connected layers can increase complexity and time, and the study optimizes only the precoder with fixed combiners. In [18], a deep learning framework was provided for hybrid precoding, showing enhancements in spectral efficiency and bit error rate, but the design of the combiner was not considered, perfect CSI was assumed, and the constant modulus constraint was not met, as shown in Fig. 1. Moreover, some of these works also adopted the separation of channel estimation and precoding, which does not suit practical settings.

Problem Definition and System Model
In this paper, we examine a single-user downlink in a mm-Wave communication system with multiple antennas. We investigate the analog beamformer design with one RF chain. The optimal baseband beamformer has a closed-form solution and can easily be obtained [2,4]. To investigate the hybrid beamforming problem with multiple RF chains, the output dimension can be increased from N_T to N_RF·N_T for an N_T × N_RF analog beamformer matrix.
Let s be the symbol transmitted from the base station (BS) equipped with N_T antennas to a mobile station (MS) equipped with one receive antenna, and let E{|s|²} = 1. Because only one RF chain is considered, the baseband precoder f_BB is a scalar. At the transmitter, the symbol is multiplied by the baseband precoder f_BB and then by an N_T × 1 analog precoder F_RF, so the final transmitted signal can be written as x = F_RF f_BB s (Eq. 1). At the receiver, the received signal is y = h^H F_RF f_BB s + n (Eq. 2),
where n denotes the additive white Gaussian noise (AWGN) and h^H ∈ ℂ^(1×N_T) is the channel vector between the base station and the user. The widely used Saleh–Valenzuela mm-Wave model is considered in this architecture, where N_c clusters of N_ray paths are assumed in the channel model [19][20][21]. For h^H with one LoS path and (L − 1) NLoS paths, the channel is given by Eq. 3:

h^H = √(N_T / L) · Σ_{l=1}^{L} α_l a_t^H(θ_l^t),
where α_l denotes the complex channel gain of the lth path, the N_T × 1 response vector of the antenna array at the BS is denoted by a_t(θ_l^t), and θ_l^t is the departure angle of the lth path. The optimization objective is the spectral efficiency, which for the proposed system is calculated by Eq. 4:

R = log₂(1 + γ |h^H F_RF f_BB|²).

Under the constant modulus constraint |[F_RF]_i| = 1/√N_T, the beamforming optimization problem for F_RF is given by Eq. 5:

max_{F_RF} R   s.t. |[F_RF]_i| = 1/√N_T, i = 1, …, N_T,
where γ = P/σ² expresses the signal-to-noise ratio (SNR), with P the transmit power and σ² the noise power. It is assumed that γ_est = γ, where γ_est expresses the estimated SNR, since the SNR can be estimated much more accurately than the CSI.
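To make the channel and spectral efficiency expressions concrete, here is a minimal NumPy sketch; the uniform linear array with half-wavelength spacing and the i.i.d. complex Gaussian path gains are illustrative assumptions, not the paper's exact channel parameters:

```python
import numpy as np

def array_response(theta, n_t):
    # ULA steering vector, half-wavelength spacing (assumed geometry)
    k = np.arange(n_t)
    return np.exp(1j * np.pi * k * np.sin(theta)) / np.sqrt(n_t)

def sv_channel(n_t, n_paths, rng):
    # Eq. 3 style: h^H = sqrt(N_T / L) * sum_l alpha_l * a_t(theta_l)^H
    thetas = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)
    alphas = (rng.standard_normal(n_paths)
              + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    h = sum(a * array_response(t, n_t) for a, t in zip(alphas, thetas))
    return np.sqrt(n_t / n_paths) * h.conj()

def spectral_efficiency(hH, f_rf, snr):
    # Eq. 4 with a single RF chain and f_BB = 1:
    # R = log2(1 + SNR * |h^H F_RF|^2)
    return np.log2(1.0 + snr * np.abs(hH @ f_rf) ** 2)
```

A phase-matched analog beamformer, `np.exp(-1j * np.angle(hH)) / np.sqrt(n_t)`, satisfies the constant modulus constraint by construction and serves as a quick sanity check of the model.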

Deep Learning-Based Beamforming with Parametric ReLU
The conventional method of replacing the analog beamformer with a multi-layer neural network is not suitable when considering the analog phase shifter architecture [11,22,23]. As specified in Eq. 2, the received signal depends on the analog precoder, so it cannot be used as the input, as some existing studies did [10,11,22]. Moreover, since most deep learning frameworks do not support complex-valued output, it is difficult to guarantee that the output F_RF meets the constant modulus constraint. To this end, we introduce a deep learning-based network that takes the above-mentioned issues into consideration. The proposed network takes the SNR estimation γ_est and the channel estimation h_est as inputs and outputs the optimum analog beamformer F_RF. A Lambda layer is implemented to guarantee that the output F_RF satisfies the constant modulus constraint. A loss function correlated to the objective in Eq. 5 is used to train the proposed network and is given by Eq. 6:

Loss = −(1/N) Σ_{n=1}^{N} log₂(1 + γ_n |h_n^H F_RF,n|²),
where N expresses the number of training samples, and h_n, γ_n, and F_RF,n denote the CSI, SNR, and analog beamformer output corresponding to the nth sample. Minimizing this loss corresponds to increasing the spectral efficiency.
In the training stage, channel samples and noise, in addition to the transmitted pilot symbols, are simulated based on the system model. The channel estimation method in [6] is then applied to estimate the CSI. Finally, the channel estimation h_est and the SNR estimation γ_est are fed to the network as the input.
It is assumed that γ_est = γ, as mentioned in Sect. 3. The proposed network then attempts to minimize the loss function defined in Eq. 6 to generate the analog beamformer F_RF,n. The channel parameters and SNR values (perfect CSI) are simulation-generated and are used in the computation of the loss function, as shown in Fig. 2. Using the perfect CSI in the loss computation while feeding the channel estimation to the network as input forces the network to learn how to approach the optimum spectral efficiency under perfect CSI while remaining robust to estimation errors.
In the deployment stage, channel estimation is applied again and fed to the proposed network to generate the optimized beamformer. Perfect CSI is not required in the deployment stage; it is only used during the offline stage to calculate the loss function. The network parameters are fixed after training and are not updated during deployment.
Since the channel estimation (the input) is complex-valued while the proposed network is real-valued, the real and imaginary parts of the channel estimation are concatenated, together with γ_est, to form a (2N_T + 1) × 1 real-valued input vector, followed by three hidden layers with 256, 128, and 64 neurons, respectively. Algorithm 1 explains the proposed network workflow.
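The workflow can be sketched as a plain-NumPy forward pass. The weight initialization and the phase-output "Lambda layer" mapping are assumptions in the spirit of BFNN-style designs [16]; training itself would run in a framework with automatic differentiation:

```python
import numpy as np

def prelu(x, alpha):
    # PReLU activation: x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def init_params(n_t, rng, widths=(256, 128, 64)):
    # One (weight, bias, alpha) triple per layer; alpha = 0 initializer
    dims = [2 * n_t + 1, *widths, n_t]
    return [(0.05 * rng.standard_normal((o, i)), np.zeros(o), 0.0)
            for i, o in zip(dims[:-1], dims[1:])]

def analog_beamformer(h_est, snr_est, params, n_t):
    # Real-valued input: [Re(h_est), Im(h_est), snr_est] -> (2*N_T + 1,)
    x = np.concatenate([h_est.real, h_est.imag, [snr_est]])
    for W, b, a in params[:-1]:
        x = prelu(W @ x + b, a)          # hidden layers: 256 -> 128 -> 64
    W, b, _ = params[-1]
    phases = W @ x + b                   # N_T raw phase outputs
    # "Lambda layer": mapping the phases onto the unit circle guarantees
    # the constant modulus constraint |[F_RF]_i| = 1/sqrt(N_T)
    return np.exp(1j * phases) / np.sqrt(n_t)
```

Because the network outputs phases rather than complex entries, every beamformer it produces satisfies the constant modulus constraint by construction, regardless of the learned weights.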
Activation functions play a significant role in deep learning networks: they affect the model's ability to converge and its convergence speed. The neural network's computational efficiency, accuracy, and final output are also determined by the activation function. An activation function is assigned to every neuron and decides whether the neuron is activated or not. It is crucial to have computationally efficient activation functions, since they are evaluated for each neuron in the network for every data sample. Furthermore, backpropagation, the most widely used training method, places additional computational effort on the activation function. The derivative of a linear activation function is constant and independent of the input, which prevents the use of backpropagation to train the model. Moreover, if linear activation functions are used, the last layer is always a linear function of the first layer regardless of how many layers are implemented; the neural network then effectively operates as a single layer.
Recent deep learning networks employ non-linear activation functions, which enable the model to create the desired mapping between the network's inputs and outputs and are essential for learning complex data. Non-linear functions have a derivative that depends on the input and therefore allow backpropagation, and they allow multiple layers to be stacked to create a deep neural network.
In this paper, we use the parametric rectified linear unit (PReLU) function [24], which overcomes several disadvantages of earlier functions. The sigmoid and tanh functions suffer from the vanishing gradient problem, in which very high and very low input values yield almost no change in the prediction; they also produce outputs that are not zero-centered and are computationally expensive. The rectified linear unit (ReLU) function suffers from the dying ReLU problem: when its inputs are negative or reach zero, backpropagation cannot update the affected weights and those neurons stop learning. The Leaky ReLU function tackles the dying ReLU problem but still cannot give consistent predictions for negative inputs.
PReLU does not suffer from the dying ReLU problem: it performs well with negative inputs and allows backpropagation. It also treats the slope of the negative part as a learnable parameter, providing consistent predictions for negative inputs, as in Eq. 7:

f(x_i) = x_i if x_i > 0, and f(x_i) = α_i x_i otherwise.
Notice that x_i is a single entry of the input vector, and the subscript i in α_i allows the nonlinear activation to vary across channels, which is known as channel-wise PReLU; if the same α is learned for all features, channel-shared PReLU is used. Accordingly, α defines how the function behaves. We tested different activation functions and found that PReLU achieves the best results. PReLU improves model fitting with no extra complexity. The results in [24] show that PReLU enhances the performance of both small and large models without adding computational cost.
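The relationship between these activations can be checked directly: with α = 0, PReLU reduces to ReLU, and with a small fixed α it matches Leaky ReLU; what distinguishes PReLU is that α is learned during training rather than fixed:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def prelu(x, alpha):
    # Eq. 7: f(x_i) = x_i if x_i > 0, else alpha_i * x_i
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(prelu(x, 0.0))    # identical to relu(x)
print(prelu(x, 0.01))   # identical to leaky_relu(x)
print(prelu(x, 0.25))   # the initialization suggested in [24]
```

Passing a vector of per-channel values for `alpha` gives channel-wise PReLU; a scalar `alpha` corresponds to the channel-shared variant.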
Considering the online stage only, the proposed network complexity can be calculated as the number of floating-point operations (FLOPs) across all layers, given by (2N_I − 1)N_O per fully connected layer [25], where N_I and N_O refer to the input and output dimensions, respectively. The FLOP count of the proposed network is about 0.14 million when N_T = 64. For comparison, the complexity of the conventional HBF algorithms in [3][4][5] is about 0.26 million for the same N_T, and the complexity of the same network with the traditional ReLU is also 0.14 million.
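The count can be reproduced with the (2N_I − 1)N_O rule; the input and hidden dimensions are those given above, while the N_T-dimensional output layer is an assumption:

```python
def dense_flops(n_in, n_out):
    # FLOPs of one fully connected layer: (2 * N_I - 1) * N_O  [25]
    return (2 * n_in - 1) * n_out

n_t = 64
dims = [2 * n_t + 1, 256, 128, 64, n_t]  # input, three hidden layers, output
total = sum(dense_flops(i, o) for i, o in zip(dims[:-1], dims[1:]))
print(total)  # roughly 0.15 million, the same order as the cited 0.14M
```

The exact figure depends on the output dimension and on whether bias additions are counted, which accounts for the small gap from the 0.14 million quoted above.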
The PReLU activation function is associated with an initializer, a regularizer, a constraint, and a shared-axes option. The initializer determines how the α weights are initialized; we tested the default α = 0 and α = 0.25, the value recommended by the authors in [24], found that the output is better with the default value, and adopted it in this study. The regularizer controls weight fluctuations by applying penalties to outliers; the authors in [24] noted that some regularizers may force α towards certain values and sway PReLU towards ReLU or Leaky ReLU, so we did not use a regularizer. Network parameters can also be constrained to a fixed range during training; we did not use any constraints, so the activation function is not forced to be monotonic. The shared-axes option enables sharing α across spatial dimensions and is set to none in the proposed network.
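In Keras, these four options map directly onto the `PReLU` layer's constructor arguments. A sketch of the configuration described above (default zero initializer, no regularizer, no constraint, no shared axes), assuming a TensorFlow/Keras implementation:

```python
import numpy as np
import tensorflow as tf

# Channel-wise PReLU with the configuration used in this study:
# alpha initialized to zero, no regularizer, no constraint, no shared axes.
act = tf.keras.layers.PReLU(
    alpha_initializer="zeros",
    alpha_regularizer=None,
    alpha_constraint=None,
    shared_axes=None,
)

x = tf.constant([[-2.0, -0.5, 0.5, 2.0]])
y = act(x)  # with alpha = 0 the output initially matches ReLU
```

Setting `shared_axes` (e.g. `shared_axes=[1]`) would switch to the channel-shared variant, learning a single α for all features.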

Simulation and Results
In our simulation, a uniform linear array of N_T = 64 transmit antennas with half-wavelength spacing is deployed at the BS. The channel model in [5] is used and the same parameters as those in [18] are applied. The channel estimation algorithm in [6] is employed to obtain h_est. The Adam optimizer and the PReLU activation function are used. The network layers, the output shape (dimension), the number of trainable parameters in every layer, and the activation function are listed in Table 1. A batch normalization layer is implemented before every layer to enhance convergence. Finally, the Lambda layer ensures that the constant modulus constraint is met at the output. The number of layers, the neurons in each layer, and the samples used in the offline training stage are set as in [16]. Assuming the number of estimated paths is correct (L_est = 3), Figs. 3, 4, and 5 show the spectral efficiency (SE) versus signal-to-noise ratio (SNR) for three different estimation levels, defined by three pilot-to-noise power ratios (PNRs), i.e., − 20 dB, 0 dB,

Conclusion
In this paper, it has been shown that the PReLU activation function can enhance spectral efficiency by small margins over the traditional ReLU function. The same hyperparameters, datasets, and architecture were used in both scenarios; only the activation function was changed. α was tested at 0 and 0.25, and the zero value was adopted as it showed better results. PReLU appears to matter more for large datasets than for small ones, where the effects of the vanishing gradient and dying ReLU problems can be significant. Therefore, PReLU can be considered for large and deep networks to gain marginal performance improvements over the traditional ReLU at no additional complexity.
Funding There is no funding.

Availability of Data
The data will be made available upon request.

Code Availability
The code will be made available upon request.

Declarations
Conflict of interest The authors of the manuscript titled "Deep Learning-Based Beamforming for Millimeter-Wave Systems Using Parametric ReLU Activation Function" confirm that there is no conflict of interest among the authors.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Tarek Abed Soliman received the B.S. degree in electronic and communication engineering from the Delta Higher Institute of Engineering and Technology, Egypt, in 2012. He is currently pursuing the M.S. degree in electrical engineering at Menoufia University, Egypt. He has also been working with Vodafone Egypt for more than five years as a Technical Project Coordinator. His research interests include massive MIMO, mmWave communications, hybrid beamforming, and network resource allocation for 5G networks.