On the Classification of Modulation Schemes Using Higher Order Statistics and Support Vector Machines

The recognition of modulation schemes in military and civilian applications is a major task for intelligent receiving systems. Various Automatic Modulation Classification (AMC) algorithms have been developed for this purpose in the literature. However, classification with low computational complexity and reasonable processing time remains a challenge. In this paper, a feature-based approach with various classifiers is employed, using statistical features as well as higher-order moments and cumulants. An over-the-air (OTA) recorded dataset consisting of four analog and ten digital modulation schemes is used to test the proposed method at 0–20 dB SNR. The overall accuracy of the quadratic Support Vector Machine (SVM) is found to be as high as 98% at 10 dB. Comparison with other AMC papers published in the literature indicates that the proposed method presents higher accuracy, especially on a realistic, channel-impaired OTA dataset.


Introduction
Automatic Modulation Classification (AMC) plays a vital role in a wide range of applications in the military and civil industries. Implementations of AMC can be largely categorized into two groups [1]: likelihood-based (LB) [2] and feature-based (FB) [3] approaches. In the LB approach, hypothesis testing is utilized in conjunction with the likelihood function. Even though this method gives an optimal solution, the probabilistic nature of LB is computationally heavy. Moreover, it is sensitive to model mismatches such as phase and timing errors, frequency offsets and noise [4,5]. In contrast, the computational complexity of the FB approach is much lower, and its implementation is much simpler [6]. The FB method is based on extracting features from the received signal, so its performance depends heavily on how distinctive those features are. The FB approach consists of feature extraction and classification stages. In the literature, various features have been utilized, for example, statistical features of the instantaneous amplitude, phase and frequency characteristics of the received signal, including Higher-Order Moments and Cumulants (HOMs/HOCs), also known as higher-order statistics (HOS) [7], spectrum symmetry [8-10], and the wavelet transform [11]. The set of features is basically selected based on modulation type, classification method, channel effects, etc. [12-21]. Various classifiers, including decision trees (DT), k-nearest neighbors (KNN), ensemble classifiers and support vector machines (SVM), have been used in the classification stage of AMC. Classifiers differ in complexity and accuracy across applications. In a DT, the data is classified by splitting it into different branches; at the end of each branch, the classification decision is made using class labels called leaves.
DTs come in different flexibility variants called Fine, Medium and Coarse Trees. The DT has a wide range of applications, especially in data mining and statistics, because of its low memory requirement; however, it has been reported to perform poorly when the number of features is low or the class diversity is high [22]. KNN is employed for both classification and regression. Simply put, it is based on the assumption that data belonging to the same class lie near each other: the decision is made by computing the distance of a sample to its neighboring labeled samples, generally using the Euclidean distance. A major drawback of the KNN algorithm is that the prediction speed decreases as the number of samples and classes increases. Depending on the flexibility of the algorithm, there are variants such as Fine, Medium, and Coarse KNN. Ensemble classification algorithms combine multiple classifiers to increase the classification performance compared with a single classifier; Bagged and Boosted Trees are popular examples [23]. SVM is widely reported in the literature for classification and regression problems, as it combines low computational complexity with high performance. In SVM, data points are projected into an N-dimensional feature space, and hyperplanes separating the classes are determined accordingly [24]. Deep learning (DL) has also received great attention in AMC implementations [25,26], while this paper focuses only on classical ML-based classification.
When the latest AMC publications in the literature are examined, the following conclusions can be drawn about what has been done recently. The authors in [27] proposed a recognition algorithm based on the differential nonlinear phase Peak Factor (PF) that can successfully classify both MPSK and MQAM signals; for signals exceeding an SNR of −1 dB, the classification accuracy is reported to be over 97%. In [28], a recognition algorithm using the cyclic spectrum and SVM classification is proposed, with classification of MPSK and MQAM signals performed for SNR values between 0 and 7 dB. A modulation recognition framework based on SVM classification is given in [29], where four schemes of secondary modulated signals were investigated; the features were extracted from the spectrum of the raw signal. Feature extraction from the intermediate frequency band of radar signals in the time-frequency domain was used for classification by the authors in [30], employing SVM and KNN classifiers. When distinguishing radars of the same class, the classification accuracy was reported to be 91% for SNR values between 5 and 15 dB, while in the worst case, for SNR between −1 and 10 dB, the accuracy was found to be 64%. The authors in [31] proposed a novel and robust DL method that leverages both contextual and handcrafted features; its practicality was assessed by investigating the classification accuracy of 11 modulation schemes.
Most of the studies in the literature have used simulated channel effects or Additive White Gaussian Noise (AWGN) channels. However, these do not represent realistic channels. Moreover, the modulation schemes studied in the literature were very limited [5,8,11,16], and only HOCs or statistical characteristics were used in feature extraction, which may not work for diverse modulation schemes [6,9,15,19].
In this study, a wide range of over-the-air (OTA) recorded signals at different Signal-to-Noise Ratio (SNR) levels is used for modulation classification. A total of 14 modulation schemes, four analog and ten digital, from the dataset in [32] are considered. Statistical features including mean, variance, skewness and kurtosis, as well as moments and cumulants up to the 8th order, are employed. Seventeen classifiers, including derivatives of DT, KNN, SVM and ensemble classifiers, have been employed, and their performance has been compared at three SNR levels (0, 10 and 20 dB). Linear, quadratic and cubic SVMs are used for further processing due to their higher classification accuracy. The effects of the feature sets on the classification of particular modulation types are evaluated as well. Finally, the classification performance is compared with published works in the literature.
The rest of the paper is organized as follows: Sect. 2 presents the proposed method, including feature extraction and classification. Section 3 presents the classification performance and discusses the results in comparison with the literature. Section 4 draws some conclusions along with future work.

Signal Preprocessing
The dataset contains over-the-air (OTA) recorded signals in IQ (In-phase and Quadrature) form [32]. First, the time-domain complex signal can be formed as [12,13]

s(n) = I(n) + jQ(n),

where I(n) and Q(n) are the in-phase (I) and quadrature (Q) components of the signal samples, and n is the sample (time) index. The instantaneous amplitude a(n), phase φ(n) and frequency f(n) characteristics of the complex signal can be derived as follows:

a(n) = √(I²(n) + Q²(n)), φ(n) = arctan(Q(n)/I(n)), f(n) = (1/2π)[φ(n) − φ(n−1)].

In the next step, these must be normalized by subtracting their mean values in order to avoid biases superimposed in the data collection stage. Following the method in [33], the normalized instantaneous signal characteristics are obtained by removing the mean of each characteristic.
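As an illustration, the instantaneous signal characteristics can be computed directly from a recorded IQ array with NumPy. This is our own sketch, not the authors' code: the function name is hypothetical and a unit sampling rate is assumed.

```python
import numpy as np

def instantaneous_features(iq, fs=1.0):
    """Instantaneous amplitude, phase and frequency from an (N, 2) IQ record."""
    s = iq[:, 0] + 1j * iq[:, 1]            # s(n) = I(n) + jQ(n)
    a = np.abs(s)                           # instantaneous amplitude a(n)
    phi = np.unwrap(np.angle(s))            # instantaneous phase, unwrapped
    f = np.diff(phi) * fs / (2 * np.pi)     # instantaneous frequency f(n)
    # remove the mean of each characteristic to avoid collection-stage bias
    return a - a.mean(), phi - phi.mean(), f - f.mean()
```

For a pure complex tone, the mean-removed amplitude and frequency are both (numerically) zero, which makes a quick sanity check of the implementation.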

Feature Extraction and Classification
Deriving statistical features, instead of using the complex signal characteristics directly, reduces both the computational complexity and the feature space dimension. Therefore, the following features, namely the mean (μ_x), variance (σ²_x), skewness (γ_x), and kurtosis (κ_x), are derived for any signal characteristic x(k) of length N:

μ_x = (1/N) Σ x(k),
σ²_x = (1/N) Σ (x(k) − μ_x)²,
γ_x = (1/N) Σ (x(k) − μ_x)³ / σ³_x,
κ_x = (1/N) Σ (x(k) − μ_x)⁴ / σ⁴_x,

where μ_x is the mean of x(k) [33].
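A minimal NumPy transcription of these four statistics (an illustrative sketch, not the authors' code) could look as follows:

```python
import numpy as np

def statistical_features(x):
    """Mean, variance, skewness and kurtosis of a signal characteristic x(k)."""
    mu = x.mean()
    var = ((x - mu) ** 2).mean()          # population variance, 1/N convention
    sigma = np.sqrt(var)
    skew = ((x - mu) ** 3).mean() / sigma ** 3
    kurt = ((x - mu) ** 4).mean() / sigma ** 4
    return mu, var, skew, kurt

print(statistical_features(np.array([1.0, 2.0, 3.0, 4.0])))
```

Note that the 1/N (population) convention is used throughout, matching the formulas above; library routines sometimes default to the 1/(N−1) sample convention instead.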
Additionally, higher-order moments (HOMs) and cumulants (HOCs) are included to achieve better classification performance, especially for high-order modulation schemes. HOMs can be written as

M_pq = E[ s(n)^(p−q) (s*(n))^q ],

where E[·] denotes expectation, * denotes complex conjugation, and p, q determine the order of the moment. The HOMs also play a role in obtaining the HOCs [12]. For example, one of the HOCs (C_42) is defined by

C_42 = M_42 − |M_20|² − 2M_21².

Based on [34], the amplitudes of the HOMs and HOCs are taken into account. All features are listed in Table 1.
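The moment and cumulant computations can be sketched directly from these definitions; `hom` and `c42` are illustrative names of our own, not from the paper.

```python
import numpy as np

def hom(s, p, q):
    """Higher-order moment M_pq = E[s^(p-q) * conj(s)^q] of a complex signal."""
    return np.mean(s ** (p - q) * np.conj(s) ** q)

def c42(s):
    """Fourth-order cumulant C_42 = M_42 - |M_20|^2 - 2*M_21^2."""
    return hom(s, 4, 2) - np.abs(hom(s, 2, 0)) ** 2 - 2 * hom(s, 2, 1) ** 2

# The paper takes the amplitudes of these quantities as features, e.g. |C_42|.
```

As a sanity check, for an ideal unit-power BPSK sequence C_42 evaluates to the well-known theoretical value of −2.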
In the next step, SVMs are used for classification. SVM is a supervised machine learning algorithm that shows high performance when classifying noisy and high-dimensional data samples. It uses hyperplanes to separate the data among the classes [35]. The SVM logic is that all modulation classes are mapped through a kernel function, a transformation with which the separation of classes is conducted. If a linear kernel is used, the SVM is called linear; if a non-linear kernel is used (e.g., polynomial or Gaussian kernels), the SVM is called non-linear. In the linear SVM classifier, the kernel can be defined by

K(a, w) = a · w,

where a = [a_1, a_2, …, a_k] is the input feature vector and w is the weight vector. The weight vector is optimized during training by designing the hyperplane to attain maximum separation. The kernel concept helps reduce the computational complexity incurred by a nonlinear SVM when mapping the dataset to a high-dimensional space. The polynomial kernel functions used for the nonlinear SVM classifiers can be given in the form

K(a, w) = (a · w + c)^d,

where d is the degree of the polynomial: d = 2 for the quadratic SVM and d = 3 for the cubic SVM [43]. Basically, SVM handles multi-class problems through a set of binary classification sub-problems. A comparison of multi-class SVM strategies is listed in Table 2.
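The three SVM variants can be set up with scikit-learn as sketched below. The toy two-class feature table stands in for the HOS feature vectors; the data, labels, and variable names are purely illustrative, not the paper's dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Toy feature table: two well-separated classes, two features each.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])

linear_svm = SVC(kernel="linear")                      # K(a, w) = a . w
quadratic_svm = SVC(kernel="poly", degree=2, coef0=1)  # d = 2
cubic_svm = SVC(kernel="poly", degree=3, coef0=1)      # d = 3

for clf in (linear_svm, quadratic_svm, cubic_svm):
    clf.fit(X, y)
```

Under the hood, `SVC` handles the multi-class case by decomposing it into binary sub-problems (one-vs-one), consistent with the binary sub-problem formulation described above.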

Results and Discussion
Based on the features and classifiers described in the previous section, OTA modulated signals of 14 modulation types are classified. Each modulated signal record contains 1024 IQ samples, and there are 4096 records for each modulated signal at each SNR value. The overall accuracies of the classifiers are listed in Table 3. It can be seen that the polynomial variants of SVM outperform all the others. The memory requirement of the DT is considered small, while its prediction speed is fast enough. The depth of the tree can be increased to obtain higher performance, but this could lead to overfitting. Based on the number of leaves, DTs are named coarse, medium and fine; the fine tree performs better than the coarse and medium trees for a relatively large number of classes. KNN classifiers exploit the distance to neighboring classified samples. Their performance depends on the number of neighbors, the distance-weighting function and the distance metric; generally, the lower the dataset dimension, the higher the performance. Their memory requirement and complexity are high, and their prediction speed is slow compared with DT classifiers. Ensemble classifiers, on the other hand, can be considered a combination of multiple learning algorithms to achieve better prediction performance, although they are hard to interpret. The performances of the linear, quadratic and cubic SVMs are plotted in Fig. 1 within the SNR range of 0-20 dB. They show similar performance at low SNR values (up to 8 dB), after which the linear SVM shows slightly lower performance. Overall, all three classifiers achieve around 95% accuracy above 6 dB SNR. The lower bound of the classifiers appears to be somewhere around 2 dB, where the accuracy drops to 83%. In this paper, 80% of the records are used to train the classifier, while the remaining 20% are used for testing.
Next, it is interesting to see how the training data size affects the classification performance. It is known that increasing the training data size can improve the classification accuracy to some extent. For this purpose, based on the previous analysis, the quadratic SVM was selected and trained with different sizes of the training dataset. There is no major difference in classification accuracy between 20% training data (819 of 4096 records) and 60% training data (2458 of 4096 records), as shown in Fig. 2. However, the classification accuracy increases slightly (2-3%) when the training data reaches 80% of the total data (3277 of 4096 records). Here, 5-fold cross-validation was also applied to avoid overfitting. The processing time for feature extraction and training is around 3 h. Tables 4 and 5 show the confusion matrices of the quadratic SVM classifier at 0 dB and 10 dB SNR, respectively. At low SNR (0 dB), some of the modulation schemes, including 32PSK, 16APSK, GMSK and OQPSK as well as the DSB analog amplitude modulation schemes, have very low classification accuracy (between 30 and 60%, and as low as 28% for a few other schemes). The classification accuracy increases significantly (to 87.5% or above) when the SNR increases to 10 dB (Table 5). FM and GMSK reach 100% classification accuracy at 10 dB SNR. Furthermore, OOK, BPSK, QPSK, AM-SSB-SC and OQPSK have similar classification accuracy (99%). When both confusion matrices are examined, it can be seen that the classification accuracy of OOK and AM-SSB-SC is almost independent of SNR variations, which could be attributed to the constellation diagrams of these modulation schemes. Overall, the average classification accuracy increases from 61.8% to 98% when the SNR increases from 0 to 10 dB.
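The evaluation protocol (80/20 train/test split with 5-fold cross-validation on the training portion) can be sketched as below. The synthetic feature table and all names are illustrative stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the HOS feature table: three well-separated
# "modulation classes", 60 records each, 4 features per record.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(60, 4)) for c in range(3)])
y = np.repeat(np.arange(3), 60)

# 80/20 train/test split, as used in the paper
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = SVC(kernel="poly", degree=2, coef0=1)   # quadratic SVM
clf.fit(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)

# 5-fold cross-validation on the training portion to guard against overfitting
cv_acc = cross_val_score(clf, X_tr, y_tr, cv=5).mean()
print(f"test accuracy {test_acc:.2f}, 5-fold CV accuracy {cv_acc:.2f}")
```

Comparing the held-out test accuracy against the cross-validated accuracy is a simple way to detect the overfitting that the 5-fold protocol is meant to guard against.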
The classification performance is highly dependent on the features, so the effects of various feature sets on the classification performance are examined. Mean, skewness, variance and kurtosis constitute the statistical feature set; the effects of the higher-order moment and cumulant sets need to be studied carefully. At low SNR, the statistical feature set without moments and cumulants may not work well for the OOK, 4ASK, 8ASK, and QPSK modulations; adding moments and/or cumulants greatly increases the classification performance. For BPSK, 16APSK, and OQPSK, cumulants perform poorly, whereas the use of moments at both low and high SNR significantly increases the classification performance. For the 32PSK, 32APSK, FM and SSBSC modulations, the use of statistical features along with moments and cumulants gives sufficient classification performance. For GMSK modulation, cumulants alone were insufficient to achieve high accuracy, while moments or statistical features alone sufficed. It should be noted that the classification accuracy at SNR values above 6 dB is almost independent of the feature sets for DSBSC, DSBWC, SSBSC, BPSK, 32PSK, 32APSK and OOK. Fig. 3 presents how different feature sets perform for the OOK, 16APSK, FM, and GMSK modulation schemes as an example.
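Such a feature-set ablation amounts to re-running the classifier on column subsets of the feature table. The sketch below illustrates the idea on synthetic data in which, by construction, one group of columns carries no class information; the grouping and all names are our own assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Illustrative 12-column feature table: columns 0-3 play the role of the
# statistical set, 4-7 the moments, 8-11 the cumulants. By construction,
# only the first two groups carry class information here.
y = np.repeat(np.arange(3), 30)
informative = rng.normal(size=(90, 8)) + 2.0 * y[:, None]
noise = rng.normal(size=(90, 4))
X = np.hstack([informative, noise])

feature_sets = {
    "statistical": slice(0, 4),
    "moments": slice(4, 8),
    "cumulants": slice(8, 12),
    "all": slice(0, 12),
}

scores = {}
for name, cols in feature_sets.items():
    clf = SVC(kernel="poly", degree=2, coef0=1)
    scores[name] = cross_val_score(clf, X[:, cols], y, cv=5).mean()
    print(f"{name:12s} 5-fold CV accuracy: {scores[name]:.2f}")
```

On this toy table, the uninformative "cumulant" columns score near chance while the full set scores near 100%, mirroring how per-modulation feature-set effects are compared in the study.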
Finally, the performances of the modulation classification methods published in the literature, including both ML- and DL-based approaches, are summarized in Table 6. In order to demonstrate the contribution of this study, Table 6 needs to be examined carefully.

Table 5 Confusion matrix of the quadratic SVM classifier at 10 dB SNR

Conclusions
This study presents the classification of 14 modulation schemes based on an over-the-air (OTA) dataset. The assessment of various feature sets, including higher-order statistics (moments and cumulants), is presented as well. The performances of three SVMs are examined for the classification of analog and digital modulation schemes. The proposed work is compared with published works in the literature from different aspects, including the number and diversity of modulation schemes, channel models, SNR levels and classifier types, as well as accuracy. The classification performance of the proposed study outperforms all reported works, with more realistic channel effects and an extended number of modulation schemes. Overall, the results show that modulation classification based on robust features, including statistical features and HOMs/HOCs, along with SVM has the potential to play an important role in many AMC applications in realistic radio channels. Moreover, selective features may reduce the computational complexity and processing time of some applications.
Funding The authors have not disclosed any funding.
Data availability Enquiries about data availability should be directed to the authors.