Anomaly Detection in Multi Tiered Cellular Networks using LSTM and 1D-CNN

Self-Organizing Networks (SONs) are considered one of the key features for automating network management in the new generation of mobile communications. The upcoming fifth generation (5G) mobile networks are likely to offer new advancements for SON solutions. In the SON concept, self-healing is a prominent task that comprises cell outage detection and cell outage compensation. 5G networks are expected to have ultra-dense deployments, which makes cell outage detection both more critical and harder for network maintenance. Therefore, by imitating the ultra-dense multi-tiered scenarios of 5G networks, this study investigates femtocell outage detection with the help of Long Short-Term Memory (LSTM) and one-dimensional Convolutional Neural Networks (1D-CNN), using time sequences of Key Performance Indicator (KPI) parameters generated in user equipments. In the proposed scheme, probable anomalies in femto access points (FAPs) are detected and classified within predetermined time sequence intervals. On average, the outage states of the femtocells are correctly predicted among healthy and anomalous states in more than 80% of the cases.


Introduction
and outage detection may become a much harder task due to sparse user statistics and vertical handovers [2,3,4,5].
Many outage detection studies have focused on macro cell anomalies. Within this scope, handover statistics based on KPI measures were employed on COD analysis in [6]. In [7], COD has been handled with the help of neighbor cell list reports by detecting outage cells according to the changes in the topology generated by visibility graphs. In [8], Channel Quality Indicator (CQI) was used within a composite hypothesis for outage detection by means of a discriminant function.
Machine learning methods have also been popular in outage detection of macro cells. In [9] and [10], clustering algorithms and Bayesian Networks have been applied to COD, respectively. In [11], as an alternative to machine learning procedures, an anomaly detection method based on statistical processing of big data derived from KPI measurements has been introduced.
Some researchers proposed approaches for detecting anomalies in cellular networks regardless of the type of the anomalous base station, such as macro, micro, femto or pico base stations. Within this framework, the K-Nearest Neighbors method has been applied in [4] for COD in multi-tier networks. Hidden Markov Models (HMM), another well-known maximum likelihood classifier, were also studied for COD by training on data from healthy cells and outage cells in order to predict the outage status of the base stations [2]. Recently, deep learning approaches have started attracting attention among researchers in the area of COD. In [12], Recurrent Neural Networks were comparatively analyzed along with traditional Support Vector Machines in terms of COD performance. Another study was introduced in [13], where anomaly detection in cellular networks was studied by using Convolutional Neural Networks (CNN). On the other hand, some researchers have presented studies about COD in small cells only, so as to focus on its difficulties when compared to the relatively easier detection of macro cell anomalies. In this context, Wang et al. proposed a cooperative COD scheme for femtocells in a conventional heterogeneous network by means of spatial correlations among users [3].
This study aims to develop a foresight for cell outage management in forthcoming 5G networks with multi-tiered ultra-dense deployments. Inspired by the advancements of deep learning methods in time sequence analysis, we propose using Long Short-Term Memory and one-dimensional CNN for the detection of femtocell outages, which pose extra challenges. In the proposed scheme, we suggest not only detecting femtocell outages but also classifying the outages into anomalous subclasses according to the severity of the degradation of the femto access point functionality.
As a variant of Recurrent Neural Networks (RNNs), LSTM modules have been used for time sequence labeling tasks in many areas so far. Similarly, 1D-CNN structures have also been used for extracting features from fixed-length data like audio recordings and various other time series of sensor data. Therefore, in this framework, we separately employed LSTM and 1D-CNN to investigate outage patterns by using time sequences of metrics measured on UEs placed around the femtocell sites. Signal to Interference plus Noise Ratio (SINR) and CQI were utilized as input feature time series data within the aforementioned deep learning structures for the training and testing phases.
The main contributions of this study are as follows. This study:
1) Employs and compares two deep network approaches, namely LSTM and 1D-CNN, for the detection and identification of anomalous states of densely deployed femtocells, to inspire outage management in upcoming 5G networks.
2) Introduces aggregation decision methods integrated with LSTM and 1D-CNN to boost the anomaly detection performance.
3) Detects degradations by using only CQI and SINR, two fundamental features of the UEs.
The rest of the paper is organized as follows. Section 2 describes the radio access network (RAN) structure and the deep learning methods used in this study. Section 3 introduces the details of the anomaly detection and classification algorithms and the training procedures. Section 4 presents the results together with a discussion of their interpretation. Finally, Section 5 concludes the paper.

Methods/Experimental
This study investigates the detection of anomalous states of FAPs in ultra-dense cellular networks to provide insight into the management of self-organizing 5G networks. The femtocells in this study are considered to be in one of four states: healthy, degraded, crippled and catatonic, where degraded, crippled and catatonic are the anomalous states. In anomalous states the service quality of a FAP is reduced to some extent. Degraded FAPs have a slightly lower performance than healthy ones and can resume normal operation after the environmental effects causing the anomaly disappear. Crippled FAPs have serious problems and may carry very little traffic. Catatonic FAPs, on the other hand, are generally out of service due to catastrophic failures such as serious power cuts [2]. FAPs go into anomalous states when their output power is reduced, for reasons such as hardware or software failures (for example, implementation failures in channel processing), external power supply problems or misconfiguration. However, service providers may not notice these state changes quickly and efficiently. By making use of the measured UE data related to received signal strength, which are also reported to base stations, the anomalous states might be detected with deep network applications like LSTM and 1D-CNN, which have been employed in many fields so far.
LSTM is utilized in this study for its ability to learn temporal dependencies in the UEs' time series data in order to detect anomalies regarding outages. In addition, another deep learning method, the 1D-CNN structure, is employed for its powerful capability to extract spatio-temporal anomaly patterns that might be generated in outage events.

Key Performance Indicators
KPIs are generally utilized for monitoring and optimizing cellular network performance. Therefore, KPIs may also be well suited to anomaly detection tasks. KPI

Radio Access Network
We generate time sequences composed of CQI and SINR values of UEs with the help of a well-known downlink system level simulator for cellular networks [14]. In order to synthesize time sequences of UE data, we employed 7 tri-sectored macro BSs in a hexagonal geometry, with each macro BS having 120 UEs in our simulator. Additionally, we generated 2 FAPs in each  Table 1. In this study healthy, degraded, crippled and catatonic FAPs are assumed to radiate with a power of 30 dBm, 20 dBm, 10 dBm and -10 dBm, respectively. We ran the simulations for a duration of 60 milliseconds and recorded UE data every millisecond. Within this duration, we degraded the transmit power of the femtocell at a random time for each of the three possible anomaly cases, with the condition that the FAPs are initially in the healthy state, i.e., the transmit power is 30 dBm. For healthy cases we did not reduce the transmit power of the FAP at all. We employed three different shadowing conditions with standard deviations of 2, 5 and 8 dB to reflect fading effects in our analysis.
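As a rough illustration of the degradation procedure described above, the following Python sketch builds the per-millisecond transmit power trace of a single FAP. The power levels and the 60 ms duration come from the text; the function name `tx_power_trace` and the uniform choice of the onset time are assumptions made for this sketch, not details of the simulator in [14].

```python
import random

# Transmit powers (dBm) per FAP state, as stated in the text.
TX_POWER_DBM = {"healthy": 30, "degraded": 20, "crippled": 10, "catatonic": -10}
SEQ_LEN_MS = 60  # simulation duration, one sample per millisecond


def tx_power_trace(state, rng):
    """Return the per-millisecond FAP transmit power (dBm) for one run."""
    if state == "healthy":
        return [TX_POWER_DBM["healthy"]] * SEQ_LEN_MS
    # Assumption: the onset is uniform over 1..59, so the trace always starts
    # healthy and always contains at least one anomalous sample.
    onset = rng.randint(1, SEQ_LEN_MS - 1)
    healthy_part = [TX_POWER_DBM["healthy"]] * onset
    anomalous_part = [TX_POWER_DBM[state]] * (SEQ_LEN_MS - onset)
    return healthy_part + anomalous_part


rng = random.Random(0)
trace = tx_power_trace("crippled", rng)
```

The trace only models the FAP side; the CQI and SINR sequences seen by the UEs would still have to come from the system level simulation, including shadowing.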

LONG SHORT-TERM MEMORY
In the deep learning field, LSTM is known to be a special type of artificial RNN ar- Unlike RNNs having one hyperbolic tangent layer, a typical LSTM unit is composed of a cell, which is the memory part, an input gate, an input modulation gate, an output gate and a forget gate. The input gate decides whether new data enters the cell, whereas the input modulation gate controls the extent to which the new data enters the cell. The forget gate decides what information will be eliminated from the cell state, and the output gate controls the extent to which the value in the cell contributes to the activation of the LSTM unit output.
Recurrent neural networks, including LSTM networks, consist of repeating neural network modules arranged in a chain, as shown in Fig. 2. By using activations from previous cycles as inputs to the current network, a decision for the current input can be made, so that LSTM is better suited for sequential labeling purposes [16]. The relations concerning the hidden state $h_{t-1}$, current state In the compact equations above, $f_t$ is the forget gate value, $i_t$ is the input gate value, $\check{c}_t$ is the input modulation gate value, $o_t$ is the output gate value, $c_t$ is the current state (cell memory) value, $h_t$ is the value of the output (hidden state) of the LSTM unit at time step $t$, $\sigma$ is the activation function (sigmoid, ReLU, etc.), tanh is the hyperbolic tangent function and • is the element-wise Hadamard product. $W_{xf}$,
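To make the roles of the gates concrete, here is a minimal NumPy sketch of one LSTM time step in its commonly used textbook form. It is not claimed to reproduce the exact equations of this paper: the weight dictionaries `W`, `U`, `b`, the gate key `"g"` for the input modulation gate, the hidden size of 8 and the choice of the sigmoid for $\sigma$ are all assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W[k], U[k], b[k] hold the input weights, recurrent weights and bias of
    gate k in {"f", "i", "g", "o"} (hypothetical names for this sketch).
    """
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # input modulation gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    c_t = f_t * c_prev + i_t * g_t  # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)        # hidden state / unit output
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 2, 8  # two features per step (CQI, SINR); hidden size is arbitrary
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_in)) for k in "figo"}
U = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "figo"}
b = {k: np.zeros(n_hidden) for k in "figo"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
sequence = rng.normal(size=(60, n_in))  # placeholder for a 60 ms UE sequence
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)
```

After the loop, `h` summarizes the whole 60-step sequence and could be passed to a classification layer; the weights here are random, so the values carry no meaning beyond showing the data flow.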

1-DIMENSIONAL CONVOLUTIONAL NEURAL NETWORKS
CNNs are very powerful tools in many modern artificial intelligence applications, particularly in machine learning and computer vision tasks. The architecture of a generic CNN has one or more convolutional layers, followed by a pooling layer, a flattening layer and a fully-connected layer where these layers help in learning the features, then patterns and objects in the data of interest.
In the literature many CNN applications have a multidimensional structure, especially in tasks related to image and video data [17]. On the other hand, 1D-CNN is also used on data with a time sequence character, such as text, handwriting and speech signals, and in natural language processing [18]. 1D-CNN searches for temporal patterns and differences in the direction of elapsing time via a convolution kernel window [19]. For illustrative purposes, the convolution process with a kernel of size 2 is shown in Fig. 4: the 1D-CNN kernel is slid in the direction of elapsing time so that it can extract the temporal pattern changes in the time data composed of CQI and SINR values of the subject UEs, in order to investigate any probable anomalies.
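The sliding of a size-2 kernel along the time axis can be sketched in a few lines of Python. The CQI values and the difference kernel below are hypothetical and chosen only to show how a sudden drop produces a strong kernel response; in a trained network the kernel weights are learned. As is usual in deep learning, the operation is a cross-correlation (the kernel is not flipped).

```python
def conv1d_valid(series, kernel):
    """Slide `kernel` along `series` (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[t:t + k]))
        for t in range(len(series) - k + 1)
    ]


# Hypothetical CQI samples: stable, then a sudden drop at the fifth sample.
cqi = [12, 12, 12, 12, 5, 5, 5]
diff_kernel = [-1, 1]  # responds to the change between consecutive samples
response = conv1d_valid(cqi, diff_kernel)
print(response)  # [0, 0, 0, -7, 0, 0]
```

The single non-zero response marks the time step at which the pattern changes, which is the kind of temporal feature the convolutional layers are meant to pick up.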
The first layer, conv1, in our 1D-CNN structure is the first convolutional layer. The inputs to this layer are the input data volume, $I \in \mathbb{R}^{n_{IH} \times n_{IW}}$, and the learning filters, also called kernels, $F \in \mathbb{R}^{n_{FH} \times n_{FW}}$, where $n_{IH}$ and $n_{IW}$ are the height and width of the input volume and $n_{FH}$ and $n_{FW}$ are the height and width of the filters applied. In this layer parallel convolution processes are employed between each kernel and the input volume in the desired directions, followed by a bias and a rectified linear unit (ReLU) operation, to generate the output of conv1, which is also the input of the second convolutional layer (conv2), $I_{conv2}$. The output of the second convolutional layer is also the input to the pooling layer, $I_{maxpool}$. As illustrated in Fig. 5, two consecutive convolutional layers are used in our CNN architecture to better extract features composed of lower-level features. At the end of the convolutional layers, activation maps, also called feature maps, which hold the kernel responses for every spatial position, are generated. In our work the number of kernels, $N_{kernel}$, is 32, which means that 32 possible patterns are subject to our feature extraction task.
In our study we set the filter height, $n_{FH}$, to 3 because the sign of anomalies in CQI and SINR on UEs generally emerges suddenly within a few milliseconds, and we sample CQI and SINR every millisecond in our RAN simulations. These sudden pattern deviations due to anomalous degradations are also the main reason why we did not apply any striding or zero padding to our data, so as not to miss any significant patterns. The height of our input volume, $n_{IH}$, is set to 60, which is the employed UE time sequence duration in milliseconds. Since we used time sequences of two metrics, the width of our input volume, $n_{IW}$, equals 2. Our filter width is equal to that of the input volume, meaning that $n_{FW}$ is also set to 2.
In the sequential structure of CNNs, pooling layers generally take place right after the convolutional layers to reduce the spatial size and the number of parameters by down-sampling. We employed a max-pooling operation, which computes the largest value in each patch of each activation map. The input to this layer, $I_{maxpool}$, is reduced according to the max-pooling filter size, $n_{mp}$, which can be considered as the down-sampling rate and is chosen as 2 in our work.
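The layer dimensions given above can be checked with a small NumPy sketch of the two convolutional layers and the max-pooling layer. The sizes (60×2 input, 32 kernels of height 3, pooling size 2, no padding, stride 1) come from the text; the random weights, the helper names `conv1d_relu` and `max_pool1d`, and the assumption that each conv2 kernel spans all 32 feature maps of conv1 are illustrative choices, not details confirmed by the paper.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution (stride 1, no padding) followed by ReLU.

    x: (steps, channels); kernels: (n_kernels, height, channels); bias: (n_kernels,)
    """
    n_k, k_h, _ = kernels.shape
    steps_out = x.shape[0] - k_h + 1
    out = np.empty((steps_out, n_k))
    for t in range(steps_out):
        window = x[t:t + k_h]  # (height, channels)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis."""
    steps_out = x.shape[0] // size
    return x[:steps_out * size].reshape(steps_out, size, x.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2))       # n_IH = 60 time steps, n_IW = 2 metrics
k1 = rng.normal(size=(32, 3, 2))   # 32 kernels, n_FH = 3, n_FW = 2
k2 = rng.normal(size=(32, 3, 32))  # assumption: conv2 kernels span all 32 maps

o1 = conv1d_relu(x, k1, np.zeros(32))   # shape (58, 32)
o2 = conv1d_relu(o1, k2, np.zeros(32))  # shape (56, 32)
pooled = max_pool1d(o2, 2)              # shape (28, 32)
```

Without padding, each convolution with a height-3 kernel shortens the time axis by 2 samples (60 to 58 to 56), and pooling with size 2 halves it to 28.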
The output of the max-pooling layer is also the input of the flattening layer, $I_{flat}$. The flattening layer reshapes the input by converting it into a long 1-dimensional feature vector output, which is also the input to final fully con- tioning that in practical application, the training step is done once at the beginning of network monitoring and hence takes no time and brings no computational burden in the real-time classification and detection phases.

Classification
Every test sequence of the UEs residing in the ROI is applied to the trained model so as to produce a discrete probability distribution vector, $PD_{UE}$, with four probability values as expressed in Equation (7). For every UE in the ROI, the $PD_{UE}$ vector holds the $p_{healthy}$, $p_{degraded}$, $p_{crippled}$ and $p_{catatonic}$ values, which are the probabilities that the UE is exposed to the healthy, degraded, crippled and catatonic states, respectively. Thus, the probabilities in this vector are accepted as scores revealing the degree of involvement of each UE with the given anomalous or healthy states. The classifier predicts the anomalous state of the UE time sequences by picking the most probable class, that is, the class with the highest probability in $PD_{UE}$.
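The per-UE decision rule described above amounts to an argmax over the four probabilities. A minimal Python sketch follows; the function name `predict_state` and the example probabilities are hypothetical.

```python
STATES = ("healthy", "degraded", "crippled", "catatonic")


def predict_state(pd_ue):
    """Return the most probable state for one UE probability vector."""
    if len(pd_ue) != len(STATES):
        raise ValueError("expected one probability per state")
    best = max(range(len(STATES)), key=lambda k: pd_ue[k])
    return STATES[best]


# Hypothetical model output for one UE sequence.
pd_ue = [0.10, 0.15, 0.60, 0.15]
print(predict_state(pd_ue))  # crippled
```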

Aggregation Decision
However, since we aim to decide on the state of the FAP rather than on the UE being exposed to such a probable anomaly, we propose an aggregation decision mechanism for predicting the state of the FAP by As an alternative to ensemble averaging one can also propose another aggregation decision method, which we call majority voting. We start this procedure by examining the $PD_{UE}$ of every UE in the ROI, revealed by the classifier, and determine the pre-
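Because the description of the two aggregation methods is incomplete here, the following Python sketch shows only one plausible reading, under stated assumptions: ensemble averaging is taken to mean averaging the $PD_{UE}$ vectors of all UEs in the ROI and picking the most probable state, and majority voting is taken to mean that each UE votes for its own most probable state. The tie-breaking rule and the example vectors are hypothetical.

```python
from collections import Counter

STATES = ("healthy", "degraded", "crippled", "catatonic")


def ensemble_average(pd_vectors):
    """Average the UE probability vectors, then pick the most probable state."""
    n = len(pd_vectors)
    mean = [sum(pd[k] for pd in pd_vectors) / n for k in range(len(STATES))]
    return STATES[max(range(len(STATES)), key=lambda k: mean[k])]


def majority_vote(pd_vectors):
    """Let every UE vote for its own most probable state."""
    votes = Counter(
        max(range(len(STATES)), key=lambda k: pd[k]) for pd in pd_vectors
    )
    # Assumption: ties are broken in favour of the lower state index.
    top = max(votes.values())
    return STATES[min(k for k, v in votes.items() if v == top)]


# Hypothetical PD_UE vectors of three UEs inside the ROI.
roi = [
    [0.05, 0.05, 0.85, 0.05],
    [0.40, 0.35, 0.15, 0.10],
    [0.45, 0.30, 0.15, 0.10],
]
print(ensemble_average(roi))  # crippled
print(majority_vote(roi))     # healthy
```

The example is constructed so that the two rules disagree: one very confident UE dominates the average, while two weakly confident UEs win the vote.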

Results & Discussion
In this section, we present the results of our proposed anomaly detection procedures in terms of overall accuracy and discuss them. We used overall accuracy as a measure of how well our proposed anomaly detection methods identify and classify outages among the four state categories. We   In addition, Fig. 8 also shows that shadowing is another factor that affects the classification and anomaly detection procedure. We notice that as shadow fading becomes harsher, under the same ROI, the classification accuracies all decrease. This remarkable reduction shows that harsher shadowing makes anomaly detection a harder task, as the KPI data fed to the classifiers become noisier.
The employed aggregation decision method also affects the overall anomaly detection performance. The anomaly classification accuracies for all ROI values and both aggregation decision methods are given in Fig. 9 for LSTM at 2 dB shadowing and in by multipath channel effects, so as to reduce misclassification [22].
In multi-label classification, the recall rates of the individual categories also characterize the classifier. The recall rate is a label-based evaluation measure which gives the ratio of correctly predicted samples of a category to the total actual samples of that category, as expressed in Equation (8), where TP indicates the True Positives and FN indicates the False Negatives.
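The recall definition above, TP / (TP + FN) per category, can be sketched as follows. The label lists are made-up examples and do not correspond to any result reported in this study.

```python
def recall_per_class(y_true, y_pred, labels):
    """Recall = TP / (TP + FN), computed separately for every label."""
    recalls = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        recalls[label] = tp / (tp + fn) if (tp + fn) else float("nan")
    return recalls


# Hypothetical ground truth and predictions for eight FAPs.
labels = ["healthy", "degraded", "crippled", "catatonic"]
y_true = ["healthy", "healthy", "degraded", "degraded",
          "crippled", "crippled", "catatonic", "catatonic"]
y_pred = ["healthy", "degraded", "degraded", "degraded",
          "crippled", "healthy", "catatonic", "catatonic"]
recalls = recall_per_class(y_true, y_pred, labels)
print(recalls)
# {'healthy': 0.5, 'degraded': 1.0, 'crippled': 0.5, 'catatonic': 1.0}
```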
In Fig.11  A good classifier is expected to have a TPR as high as possible.
In Fig. 12 [3]. Likewise, in 4-category FAP anomaly classification we reached accuracies of more than 80% on average, which is even better than the average results reached in [2], where more easily detectable macro BSs are also involved. Moreover, our study does not require any data regarding neighboring cells as in [7], and does not require any KPI data pre-processing as in [15]; thus it has the potential to be operated in run-time applications owing to its relatively reduced complexity.
Considering the results of this study, we can deduce that a smart way of managing anomalies in FAPs of 5G networks is to monitor the signal quality and channel conditions in the radio access network and to choose the appropriate deep learning method and the optimum ROI accordingly. In that regard, in 4-category classification, it is better to use LSTM in harsher channel conditions and 1D-CNN in good signal conditions. In the two-state scheme, employing a larger ROI gives better accuracies in good signal conditions, whereas a smaller radius is more suitable in harsher conditions. Therefore, the OSS should keep track of channel conditions and determine the most appropriate detection method and a suitable ROI for anomaly detection in cellular networks.

Conclusion
In this study we have worked on detecting the anomalies in femtocells and classify-

Availability of data and materials
The data used and/or analysed during the current study are available from the corresponding author on reasonable request.