An Improved and Low-dimensional Fingerprint-based Localization Method in Collocated Massive MIMO-OFDM Systems

Localization has drawn significant attention in 5G due to the fast-growing demand for location-based services (LBS). Massive multiple-input multiple-output (M-MIMO) has been introduced in 5G as a powerful technology due to its evident potential for communication performance enhancement and localization in complicated environments. Fingerprint-based (FP) localization methods are promising for rich scattering environments thanks to their high reliability and accuracy. The Gaussian process regression (GPR) method can be used as an FP-based localization method to facilitate localization and provide high accuracy. However, this method has high computational complexity, especially in large-scale environments. In this study, we propose an improved and low-dimensional FP-based localization method in collocated massive MIMO orthogonal frequency division multiplexing (OFDM) systems using principal component analysis (PCA), the affinity propagation clustering (APC) algorithm, and Gaussian process regression (GPR) to estimate the user's location. Fingerprints are extracted based on instantaneous channel state information (CSI) by taking full advantage of the high-resolution angle and delay domains. First, PCA is used to pre-process the data and reduce the feature dimension. Then, the training fingerprints are clustered using the APC algorithm to increase prediction accuracy and reduce computational complexity. Finally, each cluster's data distribution is accurately modelled using GPR to provide support for further localization. Simulation results reveal that the proposed method outperforms the benchmark method in terms of localization accuracy, achieving 93% reliability for 10-meter accuracy.


Introduction
For supporting users with high quality of service, recognizing their location is essential. Therefore, location information has recently become an important characteristic to drive location and context-aware services in wireless communications [1]. Nevertheless, it is a notoriously challenging problem to provide precise and reliable location information of user equipments (UEs) by using multipath propagation in wireless communications [2].
Currently, with the advent of the fifth generation (5G) of wireless communications, multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems have received great attention from the localization community due to their potential for improving user-localization accuracy [3]. Indeed, by employing a very large number of antennas at the base station, massive MIMO-OFDM, besides improving spectral and energy efficiency [4], can obtain a higher multipath resolution in the angle and delay domains to provide high-accuracy localization for location-based services [3]. Furthermore, resilience against small-scale fading is provided through processing measurements across the massive array [5]. Therefore, the potential of massive MIMO-OFDM to support location-based services is one of the main economic drivers of 5G wireless communications [6].
Global positioning system (GPS) is the most well-known localization technique. Although it provides a precise estimate of a mobile terminal's (MT's) position, it suffers a loss of accuracy in areas where there is no direct line of sight (LOS) between the transmitter and receiver [7]. Consequently, there are range-based localization methods [8,9,10,11], which rely on radio signal information received from MTs such as angle-of-arrival (AoA), time-of-arrival (ToA), and received signal strength (RSS). However, AoA-based methods suffer from non-line-of-sight (NLOS) errors [8]; in ToA-based methods, the base stations (BSs) need to be synchronized [9]; and RSS-based methods suffer from coarse range estimation errors in complex environments [12].
Another approach is fingerprinting (FP), which has attracted extensive attention for localization in recent years due to its promising performance in complex multipath environments. It is flexible and can be used in many systems, such as WiFi networks [13] and systems where channel state information (CSI) [14] is used as the location fingerprint. Unlike traditional methods, it uses massive data to train a model, which is then employed for localization [3]. For this purpose, many machine learning (ML) and deep learning (DL) algorithms are used. However, ML and DL algorithms often have to deal with data complexity due to the high dimensionality of the data. Some solutions, such as dimensionality reduction algorithms, have been proposed to reduce this complexity [15]. Considering all the benefits of massive MIMO-OFDM systems and FP methods, integrating 5G and fingerprints can be a good solution for localization in rich scattering environments.
Localization with massive MIMO is still in its nascent stage. In [16], AoA is estimated precisely in massive MIMO systems employing very large uniform rectangular arrays (URAs). The authors in [17] consider the combined information of time delay, angle of departure (AoD), and AoA for localization of MTs in a massive MIMO system. In [18], a compressed sensing approach is proposed to determine an MT's location directly from acquired data, such as ToA information at multiple massive MIMO BSs. The Gaussian process regression (GPR) method is employed in [19] to estimate the location of users using a vector of RSS, which is considered as a fingerprint in a distributed massive MIMO (DM-MIMO) system. A fingerprinting method is presented in [20], wherein an angle-delay channel power matrix (ADCPM) is first extracted as a fingerprint. Then a system is employed for clustering the fingerprints with a mathematical model, the joint angle-delay similarity coefficient (JADSC). However, efficiency is lost because the effective range of the JADSC depends on the scatterers' density.
In the present study, we consider a collocated massive MIMO-OFDM system where the BS is equipped with a large array of antennas to serve single-antenna UEs over their coverage area. We propose a machine learning localization method using the CSI of the collocated massive MIMO-OFDM system to achieve accurate localization resolution. For this purpose, the fingerprints are extracted from the known channel estimations. Then a combination of principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) is used to pre-process the data, reduce the dimensions of the data features, and visualize the data. Later, an affinity propagation clustering (APC) and Gaussian process regression (GPR)-based fingerprinting positioning system is presented, which consists of two phases: the training phase and the online positioning phase. In the first phase, a large area is divided into small clusters using an optimal clustering method based on the APC algorithm. The data distribution within each cluster is precisely modeled using the GPR method. Then, in the positioning phase, the user's cluster is first specified by a cluster identification algorithm based on a multi-layer perceptron (MLP) neural network. Its location is then estimated using the GPR model of that cluster.
The remainder of this paper is structured as follows. In Section II, we present the system model of collocated massive MIMO-OFDM. In Section III, we describe the proposed machine learning localization method in detail. The results obtained through simulations are shown in Section IV. Finally, conclusions are provided in Section V.

Massive MIMO-OFDM Channel Model
In this section, we consider the uplink of a multi-user collocated massive MIMO-OFDM system. In this system, we aim to localize the K single-antenna UEs, which are randomly distributed in the coverage area. The UEs transmit signals to the BS, which is equipped with M antennas in the form of a uniform linear array (ULA), through F ≫ 1 different scattering paths, as shown in Fig. 1. According to Fig. 1, the wireless signals propagate through multiple scattering paths, where the fth path of user k has AoA θ_{f,k} ∈ (0, π) and distance d_{f,k} between the UE's antenna and the first receive antenna. Assume that for this multipath wireless channel, the CSI is known at the BS through uplink channel estimation. The channel impulse response (CIR) vector associated with the fth path of the kth user is given by [20]

h_{f,k} = ω_{f,k} e(θ_{f,k}),    (1)

where ω_{f,k} ∼ CN(0, σ_{f,k}) is the complex attenuation of the fth path and e(θ) ∈ C^{M×1} is the array response vector related to the AoA θ, given [20] by

e(θ) = [1, e^{−j2π(d/λ) cos θ}, . . ., e^{−j2π(M−1)(d/λ) cos θ}]^T,    (2)

where λ and d represent the carrier wavelength and the spacing between two adjacent antennas, respectively. For OFDM systems, the channel frequency response (CFR) of the nth subcarrier can be defined as the summation of the time-domain CIRs with different delays [20]:

g_{k,n} = Σ_{f=1}^{F} h_{f,k} e^{−j2πnτ_{f,k}/N_sc},    (3)

where τ_{f,k} = ⌈d_{f,k}/(v T_s)⌉ is the temporal propagation delay (in samples) corresponding to the fth path, d_{f,k}/v is the ToA of each path, v is the speed of light, and T_s and N_sc are the sample interval and the number of subcarriers in the OFDM system, respectively. Then the overall CFR matrix of user k is defined as

G_k = [g_{k,1}, g_{k,2}, . . ., g_{k,N_sc}] ∈ C^{M×N_sc}.    (4)
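As a concrete illustration of the CIR (1) and CFR (3) above, the channel construction can be sketched in a few lines of NumPy. This is a simplified sketch rather than the paper's simulation code: it assumes half-wavelength antenna spacing (d/λ = 0.5) and integer sample delays.

```python
import numpy as np

def array_response(theta, M, d_over_lambda=0.5):
    """ULA response e(theta): [1, e^{-j2pi(d/lambda)cos(theta)}, ...]."""
    return np.exp(-2j * np.pi * np.arange(M) * d_over_lambda * np.cos(theta))

def cfr_matrix(omegas, thetas, delays, M, n_sc):
    """Overall CFR matrix G_k (M x n_sc): for each subcarrier n, the sum
    over paths of h_{f,k} * exp(-j 2 pi n tau_{f,k} / n_sc), as in (3)."""
    n = np.arange(n_sc)
    G = np.zeros((M, n_sc), dtype=complex)
    for w, th, tau in zip(omegas, thetas, delays):
        h = w * array_response(th, M)  # CIR vector of this path, eq. (1)
        G += np.outer(h, np.exp(-2j * np.pi * n * tau / n_sc))
    return G
```

For a single path, every entry of G_k has magnitude |ω|, since both the array-response and delay phasors have unit modulus.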

Fingerprint Extraction
For creating a fingerprint, it is required to extract characteristics that remain approximately constant, as assumed by the FP method. Therefore, wide-sense stationary features extracted from the instantaneous CSI are considered as the fingerprint. Discrete Fourier transform (DFT) operations are employed to establish a simple mapping from the CFR matrix to a sparse structure [20], and the angle-delay channel response matrix of user k is given as

H_k = F^H G_k G^*,

where F ∈ C^{M×M} is a phase-shifted DFT matrix and G ∈ C^{N_sc×N_g} is a DFT matrix with [G]_{n,q} = (1/√N_sc) e^{−j2πnq/N_sc}, N_g being the number of guard subcarriers. The left multiplication by F^H and the right multiplication by G^* map the frequency-domain CFR to the angle domain and the delay domain, respectively. The complex gain related to the ith AoA and the jth ToA is represented by the (i, j)th element of H_k, and the ADCPM fingerprint is defined as

P_k = H_k ⊙ H_k^*,

where ⊙ denotes the Hadamard product, so that [P_k]_{i,j} = |[H_k]_{i,j}|². As shown in Fig. 2, the ADCPM represents the AoA, the ToA, and the channel power of each path related to the scattering environment of user k.
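The ADCPM mapping can be prototyped directly. This is a minimal sketch that uses plain (non-phase-shifted) DFT matrices; it preserves the angle-delay power structure even if the exact grid offset used in [20] differs.

```python
import numpy as np

def adcpm(G_k, N_g):
    """ADCPM fingerprint: P = (F^H G_k G*) Hadamard its conjugate,
    i.e., elementwise |.|^2 of the angle-delay channel response."""
    M, n_sc = G_k.shape
    F = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M) / np.sqrt(M)
    G = np.exp(-2j * np.pi * np.outer(np.arange(n_sc), np.arange(N_g)) / n_sc) / np.sqrt(n_sc)
    H = F.conj().T @ G_k @ G.conj()  # angle-delay channel response matrix
    return (H * H.conj()).real       # (i, j): power at i-th angle, j-th delay bin
```

The result is an M × N_g nonnegative real matrix, which is exactly the shape the radio map construction below assumes (B = M × N_g attributes per reference point).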

Fingerprinting Localization Method Based on Clustering and Regression
Fig. 3 shows an overview of the architecture of the proposed localization method. The ADCPM is extracted as a fingerprint from the channel estimation result known to the BS. The remaining structure of the proposed location estimation method consists of two phases: the training phase and the estimation phase.
In the training phase, the training data are collected from a grid of known-location reference points (RPs). Then, the fingerprints are labeled with their corresponding location coordinates, and the original dataset is created. The proposed method is based on four principal building blocks: i) pre-processing, which is used to standardize the data, reduce the dimensions of the features, and visualize the data; ii) clustering, where similar data are grouped and stored in the database; iii) cluster identification, which is employed for coarse localization; and iv) regression, where an accurate model is created for each cluster based on its similar data distribution. We describe each block in detail in the following subsections.
During the estimation phase, we randomly select some unknown locations to collect the testing data. Similar to the training phase, the collected data are processed using the same method to generate fingerprints. Then, the corresponding cluster (cluster ID) is first determined using a cluster identification algorithm. Accordingly, the small region in which the new user is most probably located is identified. Lastly, accurate position estimation is applied using the GPR model of the corresponding cluster to improve the accuracy of the location estimate.

Offline Phase
Let us assume there are L training RPs. The ADCPM fingerprint of each RP is first vectorized into a one-dimensional array, such that

r_l = vec(P_l)^T ∈ R^{1×B}, l = 1, . . ., L.

Therefore, each sample r_l has B = M × N_g attributes, i.e., A_b, b = 1, . . ., B, and the radio map is defined as

R = [r_1^T, r_2^T, . . ., r_L^T]^T ∈ R^{L×B},

where each row of R is a B-dimensional training vector r_l related to the training x-coordinate x_l and the training y-coordinate y_l, l = 1, . . ., L. Therefore, the corresponding L × 1 training coordinate vectors are defined as

x = [x_1, . . ., x_L]^T, y = [y_1, . . ., y_L]^T.

Pre-processing
Data pre-processing is a core stage in data mining and machine learning. Data transformation, dimensionality reduction, and data visualization are well-known techniques in this step [21].

-Data transformation
Standardization is the main pre-processing step in data mining, where attribute values are rescaled from different dynamic ranges to a common range [21]. Models train faster on standardized datasets and yield better quality, efficiency, and more accurate clustering results [22].
Based on the nature of the dataset, it is essential to select an appropriate standardization method. In our case, for each training vector r_l, each value r_{l,b}, b = 1, . . ., B, of attribute A_b is standardized as

s_{l,b} = (r_{l,b} − µ_b) / σ_b,    (10)

where µ_b and σ_b represent the mean and the standard deviation of attribute A_b. After standardization, each training vector r_l is converted to s_l, l = 1, . . ., L, and the radio map R is changed into S ∈ R^{L×B}.
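A minimal NumPy version of this column-wise standardization; the guard against zero-variance attributes is our addition, not part of the paper's description.

```python
import numpy as np

def standardize(R):
    """Column-wise z-score of the radio map, eq. (10):
    s_{l,b} = (r_{l,b} - mu_b) / sigma_b."""
    mu = R.mean(axis=0)
    sigma = R.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant attributes
    return (R - mu) / sigma
```

After this step every attribute has zero mean and unit variance, which is what the PCA step below assumes.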

-Dimensionality reduction
The number of attributes that describe the dataset is its dimensionality. These dimensions are represented as columns. Dimensionality reduction involves reducing the number of these columns to obtain a reduced or "compressed" representation of the original data, especially in the presence of the curse of dimensionality. If the columns are correlated, there is redundant information that affects the training model results [23]. Therefore, it is crucial to use dimensionality reduction techniques to avoid overfitting and reduce the model's complexity. In our case study, we apply principal component analysis (PCA) for dimensionality reduction.
PCA is a well-established method for dimensionality reduction. PCA is an orthogonal linear transformation that maps a given dataset from a B-dimensional space to a D-dimensional space such that D < B, while retaining the most relevant information [24]. Applying PCA before clustering is a powerful approach for high-dimensional datasets. The procedure of PCA is explained in the following. After data standardization, which was explained in the previous section, the covariance matrix of the radio map S is computed as

Σ = (1/L) S^T S,    (11)

where Σ ∈ R^{B×B}. In order to find the principal components in the new feature space, we need to compute the eigenvalues λ_i and eigenvectors e_i of the covariance matrix Σ, satisfying Σ e_i = λ_i e_i, i.e.,

(Σ − λ_i I) e_i = 0.    (12)

Since e_i is a non-zero vector, (12) can hold only if det(Σ − λ_i I) = 0. Let the B eigenvalues in descending order form the diagonal matrix

Λ = diag(λ_1, λ_2, . . ., λ_B), λ_1 ≥ λ_2 ≥ . . . ≥ λ_B.

The matrix of eigenvectors obtained from the eigenvalue decomposition of Σ is E = [e_1, e_2, . . ., e_B] ∈ R^{B×B}. Keeping only the eigenvectors associated with the D largest eigenvalues of Σ yields

E_D = [e_1, e_2, . . ., e_D] ∈ R^{B×D},

whose columns represent the principal components (the new dimensions), which are orthogonal to each other and arranged in decreasing order of variance. More precisely, the first eigenvector e_1 is the direction in which the data vary the most, the second eigenvector e_2 is the direction of greatest variance among those orthogonal (perpendicular) to the first, and so on. The last step transforms the standardized radio map into the new, lower-dimensional radio map:

N = S E_D,

where N ∈ R^{L×D}.
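The eigendecomposition steps above translate directly to NumPy. This sketch assumes its input S is already standardized.

```python
import numpy as np

def pca_transform(S, D):
    """Covariance of the standardized radio map, eigendecomposition,
    keep the D leading eigenvectors, then project: N = S @ E_D."""
    Sigma = (S.T @ S) / S.shape[0]      # B x B covariance (S is standardized)
    vals, vecs = np.linalg.eigh(Sigma)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]      # reorder to descending variance
    E_D = vecs[:, order[:D]]            # B x D matrix of principal components
    return S @ E_D
```

By construction the projected components are mutually uncorrelated, with variances equal to the leading eigenvalues and sorted in decreasing order.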

-Data visualization
The t-distributed stochastic neighbor embedding (t-SNE) algorithm is a probabilistic method for data visualization. It is well known in machine learning for its remarkable ability to transform high-dimensional data into lower dimensions while preserving the neighborhood structure of the dataset [25,26]. Given L standardized training data points, in the t-SNE algorithm the similarity of data point s_l to s_{l′}, where 1 ≤ l, l′ ≤ L and l ≠ l′, is the conditional probability [25]

p_{l′|l} = exp(−∥s_l − s_{l′}∥² / 2σ_l²) / Σ_{m≠l} exp(−∥s_l − s_m∥² / 2σ_l²).

p_{l′|l} is high for nearby data points, whereas it is relatively small for widely separated data points. The bandwidth σ_l is associated with a predefined input parameter Perp known as "perplexity", which can be loosely interpreted as the number of effective neighbors of each data point and is defined [27] as

Perp(p_l) = 2^{H(p_l)},

where H(p_l) is the Shannon entropy, given by

H(p_l) = −Σ_{l′} p_{l′|l} log₂ p_{l′|l}.

Then the L × L similarity matrix P_High in the original high-dimensional space is formed, with symmetrized entries defined [25] as

p_{l,l′} = (p_{l′|l} + p_{l|l′}) / 2L.

The t-SNE algorithm learns a D-dimensional map {n_1, . . ., n_L}, n_l ∈ R^{1×D}, of the original data that reflects the similarities p_{l,l′} as well as possible. For this purpose, a Student t-distribution with one degree of freedom is used in the low-dimensional space, yielding an L × L similarity matrix Q_Low whose entries are

q_{l,l′} = (1 + ∥n_l − n_{l′}∥²)^{−1} / Σ_{m≠m′} (1 + ∥n_m − n_{m′}∥²)^{−1}.

To find the low-dimensional projections n_l of the input data s_l, t-SNE employs a gradient-based technique that minimizes the Kullback-Leibler divergence between P_High and Q_Low as the cost function [25]:

C = KL(P_High ∥ Q_Low) = Σ_l Σ_{l′≠l} p_{l,l′} log(p_{l,l′} / q_{l,l′}).

The gradient of the Kullback-Leibler divergence between P_High and the Student-t based joint probability distribution Q_Low is given [25] by

∂C/∂n_l = 4 Σ_{l′} (p_{l,l′} − q_{l,l′}) (n_l − n_{l′}) (1 + ∥n_l − n_{l′}∥²)^{−1}.
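The conditional probabilities and the perplexity definition can be checked numerically. For brevity this sketch uses one shared bandwidth σ instead of the per-point σ_l that t-SNE tunes from Perp.

```python
import numpy as np

def conditional_probs(S, sigma=1.0):
    """p_{l'|l} from the Gaussian kernel above, with a single shared
    sigma for all points (a simplification of the per-point sigma_l)."""
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # p_{l|l} = 0 by convention
    return P / P.sum(axis=1, keepdims=True)

def perplexity(P):
    """Perp(p_l) = 2^{H(p_l)}, with H the base-2 Shannon entropy of row l."""
    H = -(P * np.log2(np.clip(P, 1e-300, 1.0))).sum(axis=1)
    return 2.0 ** H
```

As a sanity check, for three mutually equidistant points each row of P is uniform over the other two points, so every perplexity equals 2 ("two effective neighbors").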

Clustering and Clustering Validation
The training data clustering and cluster validation procedure are presented in Fig. 4. After pre-processing, let us define N as the compressed radio map whose D-dimensional rows are n_l, l = 1, . . ., L. First, a clustering algorithm is employed. Then, the quality of the clustering is evaluated by a cluster validity index. In our method, we consider different clustering algorithms, which are explained in the following.

-Affinity propagation clustering
The affinity propagation clustering (APC) algorithm [28] divides the training fingerprints into clusters by giving each fingerprint sample an equal chance to become a cluster exemplar (cluster head) [29]. Let us assume that the radio map N obtained after pre-processing in the training phase is represented as

N = [n_1^T, n_2^T, . . ., n_L^T]^T ∈ R^{L×D},

where each row of N is the fingerprint of the lth RP.
Fig. 4: The pre-processing scenarios.

In K-means clustering, the number of output clusters and the corresponding random set of initial exemplars must be identified in advance [29]. Therefore, APC outperforms K-means clustering because it is independent of initialization and provides a better selection of the cluster exemplars.
The APC algorithm requires two types of real-valued input to divide N into clusters: the similarity matrix S_Sim and the preference pref. A pairwise similarity s(n_l, n_{l′}) (for l ≠ l′) indicates how well n_{l′} is suited to serve as the exemplar for n_l. Since we aim to minimize the squared error, the similarity calculation in APC is based on the negative squared error (Euclidean distance):

s(n_l, n_{l′}) = −∥n_l − n_{l′}∥², 1 ≤ l, l′ ≤ L, l ≠ l′.    (25)

Also, pref is defined as the median of the pairwise similarities:

pref = median{ s(n_l, n_{l′}) }.    (26)

To evaluate the quality of the clustering results, a cluster validity index such as the silhouette (SI) is considered [30]. The SI is a well-known measure of how similar a training vector n_l, l = 1, . . ., L, is to its own cluster (cohesion) compared to other clusters (separation). It is averaged over all training vectors and is defined [31] as

SI = (1/L) Σ_{l=1}^{L} ( d(n_l) − f(n_l) ) / max{ d(n_l), f(n_l) },

where d(n_l) is the average distance between n_l and all training vectors in other clusters and f(n_l) is the average distance between n_l and all training vectors in the same cluster.
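A compact, from-scratch sketch of the APC message passing (responsibility and availability updates with damping); the update rules follow [28], and the demo data below are synthetic stand-ins for the compressed fingerprints.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Plain APC on a similarity matrix S whose diagonal holds the
    preferences; returns each point's exemplar index."""
    L = S.shape[0]
    R = np.zeros((L, L))  # responsibilities r(i, k)
    A = np.zeros((L, L))  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(L), idx]
        AS[np.arange(L), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(L), idx] = S[np.arange(L), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col = Rp.sum(axis=0)
        A_new = np.minimum(0.0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - Rp.diagonal())
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)

# demo: two well-separated synthetic "fingerprint" groups
rng = np.random.default_rng(0)
N_demo = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
S_sim = -((N_demo[:, None] - N_demo[None, :]) ** 2).sum(-1)  # eq. (25)
np.fill_diagonal(S_sim, np.median(S_sim[~np.eye(20, dtype=bool)]))  # eq. (26)
labels = affinity_propagation(S_sim)
```

With the median preference, the message passing settles on one exemplar per group, so the number of clusters emerges from the data rather than being fixed in advance, which is the property the text contrasts with K-means.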

Cluster Identification
In this section, to identify the corresponding cluster of each fingerprint, we use an artificial neural network (ANN) classifier. The ANN we use is the multi-layer perceptron (MLP), which is employed and evaluated for coarse localization. For this purpose, a subset of the clustered training data points is selected as a validation dataset, and it is supposed that their cluster IDs are unknown. This subset is used to obtain the accuracy of the cluster identification model. Let us suppose there are D attributes after dimensionality reduction and 20% of the training data are selected as the validation dataset. The details of this cluster identification algorithm are described in the following.

-Multi-layer Perceptron (MLP) Neural Networks (NN)
MLP is a supervised learning algorithm that learns a function h(·) : R^D → R^T by training on a dataset, where D is the number of input attribute dimensions and T is the number of cluster IDs at the output. Given a set of attributes and a target, MLP can learn a non-linear function for classification. MLP consists of three kinds of layers: input, hidden, and output. As shown in Fig. 5, we associate the input nodes with the attributes A_1, . . ., A_D and the output nodes with the predefined cluster IDs.
The leftmost layer, known as the input layer, consists of a set of neurons {A_1, A_2, . . ., A_D} representing the input attributes. The hidden layers lie between the input and output layers, where the transitional computations are performed. Each hidden layer applies a non-linear operation to the output of the previous layer:

h_k = ϕ(W_k h_{k−1} + b_k),

where W_k is a fully connected weight matrix representing all the connections between the nodes of the (k − 1)th layer and the nodes of the kth layer, b_k is the bias vector of the kth layer, and h_{k−1} is the output of the previous layer. The weights and biases are initially set to random values; the model is then trained using the back-propagation (BP) method and the Adam optimizer [32], which minimizes the loss function and updates the network parameters (i.e., weights and biases) iteratively until convergence is achieved. ϕ(·) is the activation function; in our case, we use the rectified linear unit (ReLU) [33], ϕ(n) = max(n, 0), in the hidden layers, and the softmax function as the activation of the output layer, so that the output values of all output neurons sum to 1:

softmax(z_i) = e^{z_i} / Σ_{j=1}^{T} e^{z_j}.

The output layer receives the values from the last hidden layer and transforms them into output values. The model is then evaluated by comparing the real cluster IDs with the estimated cluster IDs.
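As a sketch of this classifier, scikit-learn's MLPClassifier matches the description (ReLU hidden layers, softmax output, Adam optimizer). The data below are synthetic stand-ins for the compressed fingerprints and their cluster IDs, and the hidden-layer sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# synthetic stand-in for the compressed fingerprints of two clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (60, 35)), rng.normal(3, 0.5, (60, 35))])
y = np.array([0] * 60 + [1] * 60)  # cluster IDs as classification targets

# ReLU hidden layers + Adam, as described above; softmax is applied
# automatically at the output for multi-class classification
clf = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    solver="adam", max_iter=500, random_state=0).fit(X, y)
acc = clf.score(X, y)
```

`predict_proba` returns the softmax outputs, so each row sums to 1, mirroring the output-layer normalization in the text.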

Making a Regression Model
Let us suppose the matrix N is divided into T clusters N′_t, t = 1, . . ., T, with known locations [x_t, y_t], so that

N = [N′_1^T, N′_2^T, . . ., N′_T^T]^T.

Consequently, the training data of each cluster are modeled using a GPR model, which takes the fingerprint as input and provides the UE's location as output. For simplicity, only the data of one cluster are considered in this part, but the same procedure is applied to the other clusters as well. Therefore, we consider the tth cluster, which has L′ training samples.
Let us define f_x(·) and f_y(·) as the functions that map the fingerprint vector n_k of any user k in cluster N′_t to the 2-dimensional location coordinates (x_k, y_k), such that

x_k = f_x(n_k) + v_x, y_k = f_y(n_k) + v_y,    (30)

where v_x and v_y are error terms modeled as i.i.d. Gaussian random variables with zero mean and variances σ²_vx and σ²_vy, respectively. From (30), we can see that estimating the x-coordinate x_k (and y-coordinate y_k) from n_k is a nonlinear regression problem in machine learning. Among non-linear regression methods, we choose GPR because it is a powerful Bayesian non-parametric approach that provides a probability distribution over all possible values [34]. Also, GPR methods perform well in terms of multiple metrics, including the squared prediction error [35]. For simplicity, the details of the GPR model are presented for the x-coordinates of the users; the same procedure can be applied to the y-coordinates.
In (30), the function f_x(·) is assumed to be random and to follow a Gaussian process with zero mean and covariance matrix C_t (also known as a kernel matrix), such that

f_x ∼ GP(0, C_t).    (31)

The covariance function between user k and user k′ in cluster N′_t is the weighted sum of a squared-exponential term, a linear term, and a delta function [36], defined as

c_t(n_k, n_{k′}) = γ exp(−∥n_k − n_{k′}∥² / 2ϑ²) + ϖ n_k n_{k′}^T + σ²_vx δ(k, k′),    (32)

where c_t(n_k, n_{k′}) is an element of C_t and the delta function δ(k, k′) equals 1 if k = k′ and 0 otherwise. According to (32), we need to estimate and optimize the unknown GPR hyperparameter vector φ_t = [γ, ϑ, ϖ] from the training data.
Based on the GP assumption, i.e., that the data can be represented as a sample from a multivariate Gaussian distribution, we know that x_t is Gaussian distributed: x_t ∼ N(0, C_t). The log-likelihood function is used to derive the maximum likelihood estimator of the parameter vector φ_t. The estimate φ̂_t is obtained by solving

φ̂_t = argmax_{φ_t} log p(x_t | N′_t, φ_t).    (33)

The optimization problem in (33) can be solved using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm, a quasi-Newton method that approximates the BFGS update using a limited amount of computer memory. It is a well-known algorithm for parameter estimation in machine learning and is explained in detail in [38].
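A hedged sketch of this fitting step with scikit-learn, whose GaussianProcessRegressor maximizes the log marginal likelihood with L-BFGS-B by default, matching (33). The squared-exponential + linear + white-noise kernel mirrors the covariance structure described in (32), and the cluster data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (ConstantKernel, RBF,
                                              DotProduct, WhiteKernel)

# synthetic stand-in for one cluster: fingerprints N_t and x-coordinates x_t
rng = np.random.default_rng(0)
N_t = rng.normal(size=(50, 5))
x_t = N_t @ rng.normal(size=5) + 0.05 * rng.normal(size=50)

# squared-exponential + linear + noise terms, in the spirit of eq. (32)
kernel = ConstantKernel() * RBF() + DotProduct() + WhiteKernel()
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(N_t, x_t)
pred, std = gpr.predict(N_t[:5], return_std=True)
```

After fitting, `gpr.kernel_` holds the optimized hyperparameters, playing the role of φ̂_t.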

Online Positioning Phase
In this phase, the position of a test user whose location is unknown is estimated. Let us suppose there are L test users. For this purpose, the L × D testing matrix N is used to estimate the L × 1 x-coordinate vector x. For each test user datum n, the process of location estimation is described below.

-Step 1: Cluster identification
Based on the cluster identification algorithm, the cluster ID t of testing data point n is estimated.

-Step 2: Location estimation
By using the GPR model of the related cluster t, the x-coordinate x of the test user fingerprint n is estimated. We now determine the posterior density of the location, i.e.,
p(x | x_t, N′_t, n). According to [39], this distribution is Gaussian, and the best estimate of x is the mean of this distribution:

x̂ = c_t^T C_t^{−1} x_t,

where c_t = [c_t(n, n_1), . . ., c_t(n, n_{L′})]^T and L′ is the number of training data in cluster t. Also, the uncertainty of the estimate is given by its variance:

σ̂²_x = c_t(n, n) − c_t^T C_t^{−1} c_t.
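The posterior mean and variance above are a few lines of linear algebra. In this sketch, when the test fingerprint coincides with a training point and the kernel is noise-free, the mean reproduces that point's coordinate and the variance collapses to zero.

```python
import numpy as np

def gp_location_estimate(C_t, c_vec, c_nn, x_t):
    """Posterior mean and variance for a test fingerprint, per the
    expressions above: mean = c^T C^{-1} x, var = c(n,n) - c^T C^{-1} c."""
    alpha = np.linalg.solve(C_t, x_t)
    mean = c_vec @ alpha
    var = c_nn - c_vec @ np.linalg.solve(C_t, c_vec)
    return mean, var

# demo: 1-D RBF kernel on three training fingerprints
pts = np.array([[0.0], [1.0], [3.0]])
x_coords = np.array([2.0, -1.0, 0.5])
K = np.exp(-0.5 * (pts - pts.T) ** 2)  # noise-free Gram matrix C_t
c = K[0]                                # test fingerprint = training point 0
m, v = gp_location_estimate(K, c, 1.0, x_coords)
```

Using `np.linalg.solve` instead of forming C_t^{−1} explicitly is the standard numerically stable choice.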

Results and Discussion
In this section, we compare the position estimation performance of the two-stage fingerprint clustering method in [20], which is considered as a benchmark, and the proposed method. To simulate a common urban wireless propagation scenario, a 120° sector with radius R = 500 m is considered. At the center of the sector, we place a BS equipped with a ULA, where the number of antennas M is 128. The major wireless parameters employed in the simulation are set to values typical in LTE [40], [41], as listed in Table 1. Also, to keep the analysis simple, it is assumed that the location of a UE is estimated based on the 20 nearest scatterers. The RPs are uniformly distributed every 10 m in a grid configuration over the whole target area, whereas the test users are randomly distributed in the sector. In our case, the number of training RPs L is 893 and the number of testing data points is 157. For each path, the location of the scattering point is used to compute the AoA, and the CFR of each UE is calculated according to (1) and (3).

Preprocessing Results
As mentioned in the offline phase, the fingerprints of the RPs are first extracted and stored in the dataset with their corresponding coordinates. The training data R are standardized according to (10), and the pre-processing scenarios are employed. To evaluate the performance of the proposed method, the new dimensionality of the training data should first be specified using PCA. In PCA, a vital step is to estimate how many principal components are needed to explain the variance of the data. The relation between the number of principal components and the percentage of explained variance is shown in Table 2. We can see that the first 20 components contain approximately 80% of the variance, while we need 58 components to describe close to 95% of the variance. Since we aim to retain 90% of the variance, the number of principal components is set to 35, and thus the dimensionality of the original data is reduced to 35. Fig. 6 shows the t-SNE visualization obtained after applying PCA with 35 principal components to our localization dataset. We can see that the data are clearly separated into sub-groups, and that the samples are well spaced apart and grouped together with their respective locations. If we now use a clustering algorithm to pick out the separate clusters, we can automatically assign new points to a label.

Fig. 6: Preprocessing data using PCA and t-SNE.
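The component-selection rule used here (the smallest D whose cumulative explained-variance ratio reaches a target such as 90%) can be sketched as follows.

```python
import numpy as np

def components_for_variance(S, target=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    vals = np.linalg.eigvalsh(np.cov(S, rowvar=False))[::-1]  # descending
    ratio = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(ratio, target) + 1)
```

On the paper's dataset this kind of sweep yields the 20/35/58 component counts quoted above for 80%/90%/95% of the variance; on any other data the counts will of course differ.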

Clustering and Cluster Validation Results
For clustering the compressed training data, the k-means, APC, and agglomerative algorithms are applied. Before applying APC, according to (26), we select the median value of the similarity matrix in (25) as the initial preference value. Then, by increasing this value, we generate a range of preference values, because changes in the preference value can result in quite different clustering results. APC takes these values as input to cluster the compressed training matrix N. The maximum number of iterations in APC is set to 200. To evaluate the performance of the proposed method, the valid number of clusters should first be specified. It should be noted that the clustering results affect the cluster identification accuracy and the computational complexity, and further affect the position estimation accuracy, as shown later in this section. For this purpose, the valid clusterings, which have high quality, are first identified. Then, the results of the validity indexes are averaged over 100 Monte-Carlo runs. Fig. 7 shows the quality of clustering of the training data points using the k-means, affinity propagation, and agglomerative clustering algorithms based on the silhouette (SI) value. It should be noted that the value of SI lies in the range of −1 to 1. A larger value of SI represents higher clustering quality: clusters are significantly separated when SI > 0.5, whereas the cluster structure is unsuitable when SI < 0.2. According to the clustering results, when we use affinity propagation to cluster the training data points, we obtain a better average silhouette, which means that the data are well separated. We also obtain good clustering quality with agglomerative clustering, especially with 5 clusters.

Accuracy of Cluster Identification
As mentioned in the offline phase, the MLP algorithm is applied for cluster identification. To evaluate the accuracy of the MLP model, 20% of all training data are considered as a validation dataset, and it is supposed that their cluster IDs are unknown. Then, by comparing the estimated cluster IDs with the real ones, the accuracy is obtained. In our analysis, the accuracy of the model is 92%.

Positioning Performance
For estimating the position, the GPR model is trained by solving the log-likelihood maximization problem in (33).
During the online positioning phase, the fingerprints of the new UEs are extracted and their positions are estimated with our positioning method. Then the estimated positions are compared with the true positions, and the performance of the proposed fingerprint wireless positioning system is evaluated. In this part, by considering the two-stage fingerprint clustering method in [20] as a benchmark, we first present the localization accuracy of the proposed method. Simulation results indicate that the proposed method is suitable for massive MIMO-OFDM systems. Also, the effect of the number of BS antennas on location estimation performance is evaluated. For this evaluation, a massive MIMO-OFDM system with 128 BS antennas is considered. Fig. 8 shows the cumulative distribution of the estimation errors for the different methods. We can see that the accuracy of the proposed method is better than the benchmark, with 93% reliability for 10-meter accuracy. For comparison, in [20], the reliability for 10-meter accuracy was 70%.
The impact of the number of BS antennas on the localization accuracy is demonstrated in Fig. 9. When the number of antennas is increased from 64 to 128, the reliability for 10-meter accuracy increases from 81% to 90%. Thus, increasing the number of BS antennas increases the localization accuracy.

Conclusion
We proposed a low-dimensional cluster-based approach to estimate the user's location from the CSI in a collocated massive MIMO-OFDM system. In the proposed method, all high-dimensional training data were first mapped into a lower-dimensional space. Then the whole testbed was divided into clusters using different clustering algorithms, which reduces the computational cost of online positioning. APC was chosen for clustering due to its initialization-independent property and better selection of cluster representatives compared with k-means and agglomerative clustering. An MLP was used for cluster identification to allow quick finding of the related cluster. Also, GPR was applied for fine location estimation within each cluster. The proposed method was compared with a previous work in terms of localization accuracy.

Fig. 1: The wireless channel from an arbitrary user k to the BS.

Fig. 3: Overview of the proposed position estimation scheme.

Fig. 8: The CDF of the location errors.

Fig. 9: The CDF of the location errors with different numbers of antennas M.

Fig. 2: Example …

Fig. 5: Structure …

Fig. 7: Comparison …

Table 1: Wireless parameters.

Table 2: The projection loss of the principal components.