An Accurate, Robust and Low Dimensionality Deep Learning Localization Approach in DM-MIMO Systems Based on RSS

Localization in distributed massive MIMO (DM-MIMO) systems based on the fingerprinting (FP) approach has recently attracted great interest. However, this method suffers from severe multipath and signal degradation, so its accuracy deteriorates in complex propagation environments, which produce highly variable received signal strength (RSS). Providing robust and accurate localization is therefore the goal of this work. In this paper, we propose an FP-based approach that improves localization accuracy by reducing the noise and the dimensionality of the RSS data. In the proposed approach, the fingerprints rely solely on the RSS from the single-antenna MT collected at each of the receive antenna elements of the massive MIMO base station. After creating a radio map, principal component analysis (PCA) is performed to reduce noise and redundancy. PCA reduces the data dimension, which leads to the selection of the appropriate antennas and reduces complexity. A clustering algorithm based on K-means and affinity propagation clustering (APC) is employed to divide the whole area into several regions, which improves positioning precision and reduces complexity and latency. Finally, to obtain highly precise location estimates, the similar data within each cluster are modeled using a well-designed deep neural network (DNN) regression. Simulation results show that the proposed scheme improves positioning accuracy significantly. The approach has high coverage and improves the average root-mean-squared error (RMSE) to a few meters, as expected in 5G and beyond networks, and demonstrates the superiority of the proposed method over previous location estimation schemes.

In a DM-MIMO system, the vector of RSS values measured at the distributed antennas can be regarded as a unique fingerprint of a specific location. Therefore, the localization problem can be solved using pattern recognition, which consists of fingerprint extraction, fingerprint matching, and ultimately location estimation [9].
In this study, we propose a robust and precise localization method using a dimension reduction technique, clustering, and regression, which are accomplished via two modes: the offline mode and the online mode. The extracted RSS samples from the whole area are analyzed during the offline mode. First, the dimensions of RSS samples are reduced using principal component analysis (PCA). The whole area is then split into several sub-areas using a combination of clustering algorithms based on the K-means and affinity propagation clustering (APC) algorithms. Ultimately, a deep neural network (DNN) regression is applied to the RSS samples of each cluster. The accuracy of the model is estimated using a validation dataset. When a new fingerprint is given in the online mode, it is first preprocessed, then its cluster is specified, and finally, its location is estimated. We now summarize our major contributions in four aspects.
1. In the preprocessing step, we apply the PCA technique to all data samples to denoise the RSS samples, extract effective features, and remove unimportant features from the RSS vectors. Preprocessing speeds up and improves the accuracy of our proposed machine learning-based method, since the training time and complexity are reduced significantly with fewer dimensions (features). It also helps us to select a proper set of RRHs.
2. In the clustering step, a fast-converging, initialization-independent clustering method relying on a combination of the K-means and AP clustering algorithms is proposed. This method reduces latency and computational complexity and helps to improve localization accuracy.
3. We propose a DNN regression for each cluster, trained on all the data of the corresponding cluster, to estimate the location more precisely.
4. The performance of the proposed localization method is evaluated in terms of root-mean-squared error (RMSE) via simulations and compared to the work in [9].
The rest of the paper is organized as follows. Section 4.2 overviews existing techniques and related works for localization. In Section 4.3, we present the system model. The positioning method is proposed in Section 4.4. Simulation results are presented and discussed in Section 4.5. A conclusion is presented in Section 4.6.

User Positioning in Massive MIMO System
In recent years, user positioning in M-MIMO has attracted much attention, and there are several research works in this area. The authors in [10], [11], and [12] use angle-of-arrival (AoA) information to estimate UE position in M-MIMO systems. In [13] and [14], the combined information of AoA, angle-of-departure (AoD), and time delay is used for user positioning in M-MIMO, where in [13] a mm-Wave M-MIMO system including LOS scenarios is considered. In [15], a compressed sensing approach is proposed to estimate the location of a MT from time-of-arrival (ToA) data recorded at multiple M-MIMO BSs. In [16], an environment sensing method is employed in a highly directional 60 GHz mm-Wave network to estimate MT positions. However, the localization in all of the above techniques relies on information obtained from a collocated massive MIMO (CM-MIMO) system configuration, where the BS hosts an array of antennas. These methods are therefore not applicable in DM-MIMO systems, where single-antenna remote radio heads are considered. In [17] and [18], the Gaussian process regression (GPR) ML algorithm is employed based on RSS measurements in DM-MIMO systems. In [3], the performance of several ML algorithms used in conjunction with fingerprint-based MT localization for DM-MIMO wireless system configurations is investigated and evaluated. In [9], RSS-based positioning is performed using a machine learning method that relies on the affinity propagation clustering algorithm and the GPR algorithm. Among the relevant works, the study of [9], whose analysis is based on GPR, is the most pertinent to our investigation. We expand on the work presented in [9], using data compression and deep learning algorithms to provide higher localization accuracy with less computational complexity.

Machine Learning and Deep Learning for User Positioning
Localization techniques are classified into four main categories: proximity-based, angle-based, range-based, and fingerprinting-based. The proximity-based technique is the most straightforward: the location is approximated within a particular radio coverage area based on the locations of the BSs. Many BSs are therefore required, which is not suitable for large areas [19]. The angle-based technique, which relies on the AoA of the received signal, is not efficient in non-line-of-sight (NLoS) situations because it produces coarse positioning errors [20], [21]. In range-based techniques, one must compute the distance between the MT and at least three BSs; the MT location is then estimated using trilateration. This can be accomplished through radio signal information received from the MTs, such as ToA and received signal strength (RSS). The ToA method is known for its complexity because it requires very expensive hardware at the BS, such as high-accuracy clocks for time synchronization [22]. In addition, it performs poorly in NLoS environments. It has been demonstrated that the RSS method is appropriate in non-urban environments, where the path loss is expected to increase steadily with distance [3]. Its sensitivity to propagation conditions can be mitigated when the RSS method is employed in conjunction with a fingerprinting (FP) based method [17].
In an FP-based method, the location of MTs is estimated based on prerecorded data, called fingerprints, using ML and deep learning (DL) algorithms [17]. Since FP-based positioning methods perform well in highly-cluttered multipath environments [17], [23], they can be used in many systems such as WiFi networks [24,25,26]. In addition to received signal information, channel state information (CSI) [27], [28] has been used as the position fingerprint. In recent years, the FP-based localization method has attracted significant interest for meeting the mobile positioning requirements of 5G wireless communication systems, due to its broad applicability and high cost-efficiency without any hardware requirement on the MTs [29].
Several machine learning methods, including GP methods [17] and, more recently, deep learning methods [30], [27], have been applied and investigated for wireless user positioning. However, the methods proposed for WiFi systems [27,31,32] are not applicable to M-MIMO systems because they do not consider the associated inter-user interference. In addition, they concentrate on the downlink, where the MTs estimate their own positions and bear the computational cost, while in M-MIMO systems positioning is performed on the uplink, where the BS estimates the MTs' positions.

System Description
In the considered single-cell DM-MIMO system (Fig. 1), K single-antenna users transmit signals to M single-antenna RRHs on the same time-frequency resource. High-speed front-haul links connect the RRHs to a central processing unit (CU). When the RRHs receive the signals transmitted by the users on the uplink, each RRH individually records its own multi-user RSS value and sends it to the CU. The CU handles the multi-user interference and extracts the per-user RSS values from the multi-user RSS values. The CU then forms an M × 1 RSS vector for each user to perform localization [33], [34]. Details are as follows.

Propagation Model
To explain the uplink of a multi-user DM-MIMO system in more detail, let $\mathbf{w}_k$ be the symbol vector transmitted by user $k$ with transmission power $\rho$. If $g_{mk}$ is the flat-fading channel gain between user $k$ and RRH $m$, the sum symbol vector $\mathbf{y}_m$ received at RRH $m$ is given by

$$\mathbf{y}_m = \sqrt{\rho}\sum_{k=1}^{K} g_{mk}\,\mathbf{w}_k + \mathbf{n}_m. \qquad (1)$$

In (1), $g_{mk} = q_{mk}\sqrt{h_{mk}}$ is a flat-fading channel, where $q_{mk}$ denotes small-scale fading represented by an independent and identically distributed (i.i.d.) zero-mean complex Gaussian random variable with unit variance, i.e., $q_{mk} \sim \mathcal{CN}(0, 1)$, $h_{mk}$ is the large-scale fading coefficient, and $\mathbf{n}_m \sim \mathcal{N}(0, \sigma_n^2 \mathbf{I})$ is the additive white Gaussian noise. Note that the large-scale fading coefficient $h_{mk}$ can be modeled [35] as

$$h_{mk} = b_0 \left(\frac{d_{mk}}{d_0}\right)^{-\alpha} z_{mk}, \qquad (2)$$

where $b_0$ is the path loss at reference distance $d_0$, $d_{mk}$ is the distance between user $k$ and RRH $m$, $\alpha$ is the path-loss exponent (typically dependent on the environment and the range), and $z_{mk}$ is the log-normal shadowing coefficient with $10\log_{10} z_{mk} \sim \mathcal{N}(0, \sigma_z^2)$.

Mitigating Multi-user Interferences
For measuring the RSS, we consider the power of the received signal at RRH m, which is given by ||y_m||² according to (1). Note, however, that ||y_m||² at RRH m is in fact the multi-user RSS, because the symbol vectors transmitted by all K users are combined at RRH m. Consequently, ||y_m||² cannot be directly used to estimate the position of user k, since the RSS of each user is not separately distinguishable. To overcome this, the symbol vectors w_k in (1) should be mutually orthogonal and known in advance at the RRH [17].
Therefore, the users must transmit an orthogonal set of pilot signals during channel estimation [36]. The RSS $p_{mk}$ of user $k$ can then be obtained from (1) [34] as

$$p_{mk} = \rho\,|q_{mk}|^2\,h_{mk}. \qquad (3)$$

From (3), we can see that the RSS varies due to the small-scale fading and shadowing of the wireless channel. The variation due to small-scale fading can be reduced by averaging it over multiple time slots, according to the channel hardening effect [37]. The shadowing effect, however, which is position-dependent, cannot be averaged out [38]. Therefore, the RSS between user $k$ and RRH $m$, obtained from (2) and (3) and converted to the dB scale, is given [34] by

$$p_{mk}^{\mathrm{dB}} = p_0^{\mathrm{dB}} - 10\,\alpha \log_{10}\!\left(\frac{d_{mk}}{d_0}\right) + 10 \log_{10} z_{mk}, \qquad (4)$$

where $p_0^{\mathrm{dB}} = 10\log_{10}(\rho b_0)$ is the uplink RSS at the reference distance $d_0$. Once the per-user RSS values $p_{mk}$, $m = 1,\ldots,M$, $k = 1,\ldots,K$, are extracted as above, the CU forms the uplink RSS vector of user $k$,

$$\mathbf{p}_k = [\,p_{1k}, p_{2k}, \ldots, p_{Mk}\,]^T, \qquad (5)$$

which is considered as the fingerprint.
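As an illustration, the per-user RSS model in (4) can be simulated with a short numerical sketch. Everything here is an assumption for illustration only: a single path-loss exponent replaces the piecewise model used in the simulations, and the geometry, user count, and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

M, K = 36, 4            # RRHs and users (illustrative)
p0_db = 21.0 - 47.5     # p0_dB = 10*log10(rho*b0): rho = 21 dBm, b0 = -47.5 dB
alpha, d0 = 2.0, 1.0    # single path-loss exponent and reference distance (m)
sigma_z = np.sqrt(3.0)  # shadowing standard deviation in dB

# random RRH and user positions in a 200 m x 200 m area
rrh_xy = rng.uniform(0.0, 200.0, (M, 2))
user_xy = rng.uniform(0.0, 200.0, (K, 2))

# pairwise distances d_mk, clipped below by the reference distance d0
d = np.linalg.norm(rrh_xy[:, None, :] - user_xy[None, :, :], axis=2)
d = np.maximum(d, d0)

# (4): p_mk^dB = p0^dB - 10*alpha*log10(d_mk/d0) + 10*log10(z_mk)
shadow_db = rng.normal(0.0, sigma_z, (M, K))
p_db = p0_db - 10.0 * alpha * np.log10(d / d0) + shadow_db

# (5): the M-dimensional fingerprint of user 0
fingerprint_user0 = p_db[:, 0]
```

Each column of `p_db` is one user's fingerprint vector, the quantity the CU assembles for localization.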

A Clustering and Deep Learning Approach-Based Fingerprinting
An overview of the structure of the proposed localization method is shown in Fig. 2, which consists of two distinct modes: the offline mode and the online mode.
During the offline mode, the system captures the RSS fingerprints from a grid of reference points (RPs) with known locations, and each fingerprint is labeled with the corresponding location coordinates. The labeled data is divided into training, validation, and testing datasets: learning is done with the training dataset, performance is checked with the validation dataset, and the accuracy of the localization system is reported on the testing dataset. PCA is then applied for dimension reduction, after which only the subset of dimensions (features) with the maximum variance is retained, producing an efficient feature set. In the clustering step, the reduced-dimension training data is divided into several clusters using an efficient clustering method based on K-means and AP. A cluster identification algorithm is then employed for cluster matching and coarse localization. Ultimately, a DNN regression is trained for each cluster, based on its similar data distribution. The accuracy of the model is evaluated using the validation dataset; if it is not sufficient, the clustering and regression parameters are modified in each iteration until convergence is achieved.
During the online mode, the learned model is used to estimate the unknown locations of the test data. We first pass the test data through the preprocessing stage to transform it and reduce its dimensions. Then, its cluster (region) is identified using the cluster identification algorithm. Finally, the DNN regression of the corresponding cluster is used to estimate the location.

Offline Mode
Assume there are $L$ training data points corresponding to different RPs. The radio map is then formed as

$$\mathbf{P} = [\,\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_L\,]^T \in \mathbb{R}^{L \times M}, \qquad (6)$$

where each row $\mathbf{p}_l$ of the radio map $\mathbf{P}$ is an $M$-dimensional fingerprint that corresponds to the training x-coordinate $x_l$ and the training y-coordinate $y_l$, $l = 1, \ldots, L$.

Pre-processing
The core step in data mining and machine learning is data pre-processing, which consists of data transformation and noise reduction, and dimensionality reduction with PCA in our method [39]. Each step is explained in detail as follows.

-Data Transformation
In data transformation, each feature value $p_{l,m}$, $m = 1, \ldots, M$, of each training fingerprint $\mathbf{p}_l$ is standardized [39] using

$$s_{l,m} = \frac{p_{l,m} - \mu_m}{\sigma_m}, \qquad (7)$$

where $\mu_m$ and $\sigma_m$ are the mean and the standard deviation of the $m$-th feature. With standardization, the training vector $\mathbf{p}_l$ is converted to $\mathbf{s}_l$, $l = 1, \ldots, L$, and the radio map $\mathbf{P}$ is changed into $\mathbf{S} \in \mathbb{R}^{L \times M}$.
-Dimensionality reduction
PCA is employed for denoising and dimension reduction in order to map the standardized radio map $\mathbf{S}$, which lies in an $M$-dimensional space, to a $D$-dimensional space with $D < M$, while the most relevant information is maintained [40]. In PCA, the principal components are found from the sample covariance matrix of $\mathbf{S}$,

$$\mathbf{C} = \frac{1}{L}\,\mathbf{S}^T\mathbf{S}, \qquad (8)$$

whose $D$ eigenvectors associated with the largest eigenvalues form the columns of $\mathbf{E} \in \mathbb{R}^{M \times D}$. Each column of $\mathbf{E}$ outlines a PC; the PCs are orthogonal to each other and sorted in decreasing order of captured variance. More precisely, the first eigenvector $\mathbf{e}_1$ is the direction that captures the maximum variance of the data; the second eigenvector $\mathbf{e}_2$ is the direction with the greatest variance among those orthogonal to the first eigenvector, and so on. Therefore, the low-dimensionality radio map is $\mathbf{U} = \mathbf{S}\mathbf{E}$, where $\mathbf{U} \in \mathbb{R}^{L \times D}$ is defined row-wise as

$$\mathbf{u}_l = \mathbf{s}_l\,\mathbf{E}, \qquad l = 1, 2, \ldots, L. \qquad (9)$$
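The two pre-processing steps above (standardization followed by a PCA projection) can be sketched numerically. The data below is a synthetic stand-in for a radio map; the dimensions and the choice of D are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, D = 400, 36, 5

# synthetic "radio map" with correlated columns, standing in for RSS data
P = rng.normal(0.0, 1.0, (L, M)) @ rng.normal(0.0, 1.0, (M, M))

# standardization: s_{l,m} = (p_{l,m} - mu_m) / sigma_m
mu, sigma = P.mean(axis=0), P.std(axis=0)
S = (P - mu) / sigma

# PCA via eigendecomposition of the sample covariance of S
C = (S.T @ S) / L
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]      # re-sort descending by variance
E = eigvecs[:, order[:D]]              # M x D loading matrix of top PCs
U = S @ E                              # L x D low-dimensional radio map
```

The columns of `U` then have non-increasing variance, matching the ordering of the principal components described above.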

Clustering
A clustering algorithm is required to split the whole area into several regions based on the collected RSS data. K-means is a very popular clustering algorithm, extensively used due to its fast convergence. However, it is sensitive to the initial conditions: the number of clusters must be predefined, and a random set of initial exemplars is selected in advance. Therefore, many runs of K-means are needed to get a good clustering result. Even so, there is no guarantee i) that an appropriate initialization will occur during the repeated runs, or ii) of a unique clustering, because different randomly chosen initial clusters yield different results.
In contrast, the affinity propagation (AP) clustering algorithm [41] has the initialization-independence property: all RSS samples have an equal chance to become a cluster head (CH). The optimal number of clusters is then obtained by iteratively passing two kinds of messages, named availability and responsibility, to maximize a fitness function until a good set of CHs emerges [41]. Therefore, APC can provide a good set of CHs quickly. However, it can sometimes fail to converge, particularly for large similarity matrices. Considering the convergence property of K-means and the good performance of affinity propagation, a new clustering method is employed: the APC algorithm is first used to determine the optimal number of clusters and the initial CHs; then, K-means is employed to produce the final clustering result by iterating from the initial CHs.
As mentioned in [9], the AP clustering algorithm requires two inputs to divide $\mathbf{U}$ into clusters: the similarity matrix $\mathbf{S}_{\mathrm{Sim}}$ and the preference $pref$, which are defined as follows [41]:

$$\mathbf{S}_{\mathrm{Sim}}(l, l') = -\|\mathbf{u}_l - \mathbf{u}_{l'}\|^2, \qquad (10)$$

where $1 \le l, l' \le L$ and $l \ne l'$, and $pref$ is set to the median of the similarities. The clustering algorithm used in our research is summarized in Algorithm 1. The validity of a clustering is measured by the silhouette, a well-known measure of how similar a training RSS vector $\mathbf{u}_l$, $l = 1, \ldots, L$, is to its own cluster (cohesion) compared to other clusters (separation). Averaged over all training RSS vectors, it is defined [42] as

$$SI = \frac{1}{L}\sum_{l=1}^{L} \frac{d(\mathbf{u}_l) - f(\mathbf{u}_l)}{\max\{d(\mathbf{u}_l), f(\mathbf{u}_l)\}}, \qquad (11)$$

where $d(\mathbf{u}_l)$ is the average distance between $\mathbf{u}_l$ and all training RSS samples in other clusters and $f(\mathbf{u}_l)$ is the average distance between $\mathbf{u}_l$ and all training RSS samples in the same cluster. A clustering with a sufficiently high SI value is regarded as a valid clustering.
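A minimal, self-contained sketch of the two-stage clustering follows. The affinity propagation loop is a compact implementation of the standard damped responsibility/availability message passing, not the authors' code; the damping factor, iteration counts, toy data, and fallback behaviour are all assumptions.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, n_iter=200):
    """Minimal APC: damped responsibility/availability message passing."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(n_iter):
        # responsibilities: R(i,k) = S(i,k) - max_{k'!=k} [A(i,k') + S(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx].copy()
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1.0 - damping) * R_new
        # availabilities: A(i,k) = min(0, R(k,k) + sum_{i'!=i,k} max(0, R(i',k)))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1.0 - damping) * A_new
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    return exemplars if len(exemplars) else np.array([0])  # defensive fallback

def kmeans(X, centers, n_iter=50):
    """Plain K-means seeded with the APC exemplars (step 4 of Algorithm 1)."""
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.stack([X[labels == t].mean(axis=0) if np.any(labels == t)
                            else centers[t] for t in range(len(centers))])
    return labels, centers

# toy reduced-dimension data: three well-separated groups
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.5, (20, 2)) for c in ((0, 0), (10, 0), (0, 10))])

# (10): negative squared Euclidean similarity; preference = median of similarities
S_sim = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(S_sim, np.median(S_sim[S_sim < 0]))

exemplars = affinity_propagation(S_sim)
labels, centers = kmeans(X, X[exemplars].copy())
```

APC fixes the number of clusters and the initial centers, so the final K-means pass is deterministic given its output, which is the motivation for the combined scheme.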

Cluster selection
To determine the cluster corresponding to a new data point, a cluster identification method based on a KD-tree is used, and its accuracy is evaluated on the valid clusterings. For this purpose, a subset of the clustered training RSS samples is selected as a validation dataset, and their cluster IDs are assumed unknown. This subset is used to measure the accuracy of the cluster identification algorithm. First, a KD-tree, which finds similar data quickly, is built for each cluster. Then, for each RSS sample of the validation dataset, the KD-trees return its K_nn nearest neighbors from each cluster; among them, the one with the minimum distance is selected. The predicted cluster ID of the validation RSS sample is thus the cluster ID of its nearest neighbor.
To estimate the accuracy of clustering, we determine the cluster-membership error by comparing the predicted and the true cluster IDs of the validation RSS samples. A threshold is set on the accuracy of the cluster identification algorithm. If the accuracy of a valid clustering is below the threshold, that clustering is discarded. Otherwise, among the clusterings that pass both the validity check and the cluster-identification accuracy requirement, the one with the highest validity determines the best number of clusters.
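The identification logic above can be sketched as follows. For brevity, a brute-force nearest-neighbour search stands in for the per-cluster KD-tree; the data, cluster layout, and K_nn value are illustrative assumptions.

```python
import numpy as np

def identify_cluster(u, clusters, k_nn=3):
    """Coarse localization: per cluster, find the k_nn nearest training samples
    to the query u, then return the ID of the cluster holding the overall
    nearest neighbour (a KD-tree per cluster would replace the brute force)."""
    best_id, best_dist = -1, np.inf
    for cid, data in clusters.items():
        nearest = np.sort(np.linalg.norm(data - u, axis=1))[:k_nn]
        if nearest[0] < best_dist:
            best_id, best_dist = cid, nearest[0]
    return best_id

rng = np.random.default_rng(0)
clusters = {0: rng.normal(0.0, 1.0, (30, 5)),   # toy cluster around the origin
            1: rng.normal(8.0, 1.0, (30, 5))}   # toy cluster around (8,...,8)
query = np.full(5, 8.2)                          # clearly belongs to cluster 1
cid = identify_cluster(query, clusters)
```

Comparing `cid` against the true cluster ID over a held-out validation subset gives the identification accuracy used for the threshold test described above.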

Algorithm 1: The proposed clustering method
Require: Preference value pref; the training matrix U = [u_1, u_2, . . . , u_L]^T
1: Compute the similarity matrix S_Sim
2: Run the APC algorithm to get the CHs C = {C_1, C_2, . . . , C_T}
3: Set the number of clusters T and the initial centers for K-means from C
4: Run K-means clustering

Regression
In order to obtain very precise localization, we use a DNN to solve the regression problem in each cluster. Fig. 3 shows the fully connected multi-layer neural network structure used in our method, which consists of an input layer, hidden layers, and an output layer. The input layer receives the initial data for further processing: after dimension reduction, each RSS vector is composed of $D$ RSS values, so the number of input nodes equals the number of dimensions of each RSS vector. The output layer produces the required output; since the position of the MT is the output, the number of output nodes is two. The hidden layers sit between the input and output layers, where the intermediate computations are performed. Each hidden layer uses the output of the previous layer to perform a non-linear operation, defined as

$$\mathbf{h}_k = \phi\left(\mathbf{W}_k \mathbf{h}_{k-1} + \mathbf{b}_k\right), \qquad (12)$$

where $\mathbf{W}_k$ is a fully connected weight matrix representing all connections between the nodes of the $(k-1)$-th layer and the nodes of the $k$-th layer, $\mathbf{b}_k$ is the bias vector of the $k$-th layer, and $\mathbf{h}_{k-1}$ is the output of the previous layer. The weights and biases are initialized to random values; the model is then trained using the back-propagation (BP) method and the Adam optimizer [43] to minimize the loss function, with the network parameters (i.e., weights and biases) updated iteratively until convergence. $\phi(\cdot)$ is the activation function: we use the Rectified Linear Unit (ReLU) [44], i.e., $\phi(x) = \max(x, 0)$, in the hidden layers and a linear function, $\phi(x) = x$, in the output layer, since localization is a regression problem.
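A sketch of the forward pass of such a regressor is given below, assuming two hidden layers of 64 units (the actual layer widths are not fixed by the text) and omitting training entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 27                     # input size after PCA (as found in the simulations)
sizes = [D, 64, 64, 2]     # input, two hidden layers (assumed widths), (x, y) output

# random initialization of weights and biases (He-style scaling for ReLU)
weights = [rng.normal(0.0, np.sqrt(2.0 / m), (n, m))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(u):
    """Hidden layers apply h_k = ReLU(W_k h_{k-1} + b_k); output is linear."""
    h = u
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)   # ReLU hidden layers
    return weights[-1] @ h + biases[-1]  # linear output: estimated (x, y)

xy_hat = forward(rng.normal(0.0, 1.0, D))
```

In the full method, one such network per cluster is trained with back-propagation and Adam on that cluster's labeled fingerprints.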

Online Mode
In this phase, the position of a test user whose location is unknown is estimated. Suppose there are $\hat{L}$ test users. After dimensionality reduction, the $\hat{L} \times D$ testing matrix $\hat{\mathbf{U}}$ is used to estimate the $\hat{L} \times 2$ location coordinates $(\hat{x}, \hat{y})$. For each test user's data $\hat{\mathbf{u}}$, the process of location estimation is described below.

-Step 1: Cluster selection
The cluster ID $t$ of the testing data point $\hat{\mathbf{u}}$ is determined using the cluster selection algorithm.

-Step 2: Location estimation
The DNN regression model of cluster $t$ is used to estimate the location $(\hat{x}, \hat{y})$ of the fingerprint $\hat{\mathbf{u}}$.

Performance Evaluation
In this section, we compare, using simulations, the location estimation of GPR [17], APC-GPR [9], and the proposed method in a DM-MIMO system with $M = 36$ single-antenna RRHs, $L = 400$ training locations, and $\hat{L} = 16$ test users. Training users are placed every 10 m in a grid configuration over the whole 200 m × 200 m area. Test users are placed at random, as shown in Fig. 4. For training, the RSS matrix $\mathbf{P}$ is generated using (4) with user transmit power $\rho$ = 21 dBm, reference path loss $b_0$ = −47.5 dB, and shadowing noise variances $\sigma_z^2$ = 1, 3, 5 dB. We set the path-loss exponent to $\alpha = 0$ for 0 ≤ $d_{mk}$ < 10 m, $\alpha = 2$ for 10 m ≤ $d_{mk}$ < 50 m, and $\alpha = 6.7$ for 50 m ≤ $d_{mk}$, according to the 3GPP urban micro propagation model [45]. The PCA technique is applied to $\mathbf{P}$ to reduce the dimensions of the RSS vectors and to generate the transformed RSS radio map $\mathbf{U}$. The clustering algorithm is then applied to cluster all the training data, using the similarity matrix and preference values generated by (10). Once the optimal number of clusters is obtained, the K-means algorithm is run 100 times with the number of clusters equal to 6. The KD-tree is also evaluated with different values of $K_{nn}$. Finally, the proposed DNN model is trained with different numbers of hidden layers and activation functions. The root-mean-squared error (RMSE) between the real coordinates $(x_l, y_l)$ of the test users and their estimates $(\hat{x}_l, \hat{y}_l)$ is used as the performance metric, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{\hat{L}}\sum_{l=1}^{\hat{L}} \left[(x_l - \hat{x}_l)^2 + (y_l - \hat{y}_l)^2\right]}. \qquad (13)$$

The RMSE is averaged over the Monte-Carlo realizations; lower RMSE values indicate better location estimation performance.
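The RMSE metric described above can be computed as in the sketch below; the coordinates are toy values chosen so the result is easy to verify by hand.

```python
import numpy as np

def rmse(true_xy, est_xy):
    """Average RMSE over test users: sqrt of the mean squared position error."""
    err2 = ((true_xy - est_xy) ** 2).sum(axis=1)  # (x - x_hat)^2 + (y - y_hat)^2
    return np.sqrt(err2.mean())

# toy example: one perfect estimate and one estimate 5 m off
true_xy = np.array([[0.0, 0.0], [3.0, 4.0]])
est_xy = np.zeros_like(true_xy)
value = rmse(true_xy, est_xy)   # errors 0 m and 5 m -> sqrt(12.5) ≈ 3.54 m
```

In the simulations this quantity is further averaged over Monte-Carlo channel and shadowing realizations.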

Preprocessing
For an efficient and accurate clustering algorithm, we need to extract the essential RSS values received by the RRHs by reducing the noise and the high dimensionality of the data. As mentioned, PCA is employed in the offline mode to extract the most important feature set from the original RSS data set while preserving the same level of positioning accuracy. Each RSS sample vector in the online mode is likewise transformed into its low-dimension representation and then compared with the corresponding low-dimension radio map. Using PCA, the optimal number of components that capture the greatest variance in the data is found. In this work, a 98% variance criterion is considered; different variance thresholds may be chosen depending on the application's specific requirements. Fig. 5 shows how the variance is captured by the principal components. We see that the first three components explain the majority of the variance in our data. From Fig. 6, we can see that the first 27 components contain 98% of the variance. Therefore, the number of principal components is set to 27, and the dimension of the original data is reduced accordingly.
Fig. 6: The cumulative sum of the PCA components' variance. The first component alone contains more than 20% of the total variance; 27 components account for 98% of the RSS variance.
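Selecting the number of components from a cumulative variance threshold can be sketched as follows; the data here is synthetic, so the resulting D will generally differ from the 27 components found for the actual radio map.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 400, 36

# synthetic standardized radio map with correlated features
S = rng.normal(0.0, 1.0, (L, M)) @ rng.normal(0.0, 1.0, (M, M))
S = (S - S.mean(axis=0)) / S.std(axis=0)

# per-component variances = eigenvalues of the sample covariance, descending
eigvals = np.sort(np.linalg.eigvalsh((S.T @ S) / L))[::-1]

# cumulative explained-variance ratio and the 98% criterion
ratio = np.cumsum(eigvals) / eigvals.sum()
D = int(np.searchsorted(ratio, 0.98) + 1)   # smallest D reaching >= 98%
```

The same criterion with a different threshold simply moves the cut-off along the cumulative-variance curve of Fig. 6.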

Clustering
By running AP, the optimal number of clusters is found to be 6, which is used as the input to the K-means clustering. With 6 clusters, the silhouette value is maximized. The KD-tree algorithm with $K_{nn}$ = 3 is used for cluster identification, as in [9]. Fig. 7 shows the average RMSE of the test users' location estimates as a function of the shadowing noise variance, ranging from 1 dB to 5 dB, for the different methods. The average RMSE increases with the shadowing noise variance in all methods. When we apply PCA to the GPR and AP-GPR methods of [17] and [9], respectively, a similar increase in average RMSE with shadowing noise variance is observed; however, the methods using PCA have a lower average RMSE than those without PCA, and the proposed method has a significantly lower average RMSE than all the others. Using PCA reduces the noise and the number of dimensions of the data, which increases stability and reduces the computational complexity.
Fig. 7: Average RMSE of GPR [17], AP-GPR [9], and the proposed method with M = 36 and L = 400, when the shadowing noise variance is 1, 3, and 5 dB.

Conclusion
We proposed an efficient, low-dimensionality FP-based method using PCA, APC with K-means, and a DNN to estimate the user's location from RSS values in a DM-MIMO system. In the proposed method, after preprocessing the data (denoising and dimension reduction), the whole testbed is first divided into clusters using the AP and K-means algorithms, which reduces the computational cost of online positioning. AP was chosen for clustering due to its initialization-independence and better selection of CHs, and K-means was combined with AP due to its fast convergence. A KD-tree was then used for cluster identification to quickly find the relevant cluster, and a DNN was applied for fine location estimation within each cluster. The proposed method was compared to previous works in terms of localization accuracy. Numerical results justified the proposed localization system over previous schemes. Through simulations, we also showed that increasing the shadowing noise variance degrades localization performance.

Declarations
- Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
- Conflicts of interest: The authors declare no conflicts of interest.
- Availability of data and material: Not applicable.
- Code availability: Not applicable.