Radio Transmitter Identification Under Small Sample Conditions

Radio transmitter identification is an emerging technology that uses RF fingerprints to identify different radiation sources, and it has important applications in the field of wireless security. The choice of classifier is one of the key factors affecting recognition performance. Aiming at the shortcomings of a single classifier model, we propose a radio transmitter identification method based on improved SVM-KNN. First, the square integral bispectra (SIB) algorithm is used to extract signal fingerprint features from the steady-state signal of a mobile phone. Then, instead of the traditional Euclidean distance, the manifold geodesic distance is used to calculate the distance between the test sample and each support vector. Experimental studies using mobile phones of the same model show that the improved SVM-KNN algorithm is less affected by the choice of kernel function parameters and achieves higher recognition accuracy without increasing the computational complexity. We also demonstrate the robustness of our method over a wide range of signal-to-noise ratios.


Introduction
Radio transmitter identification is defined as the technology of extracting fingerprint features from the signals transmitted by radio transmitters and using prior information and a classifier to determine which radio transmitter the signals belong to (Zhao Y et al. 2018). The radio transmitter identification technique has been intensively studied for military communications (Zhang Q et al. 2020), cognitive radio (Flowers B et al. 2020), and wireless network security (Zhou Y et al. 2016). Usually, the fingerprint features of a radio transmitter are non-stationary, non-Gaussian, and non-linear (Liang J H et al. 2017). At the same time, they also need to satisfy three characteristics: time-shift invariance, scale variability, and phase retention. The square integral bispectra (SIB) of the signal can well meet these fingerprint feature requirements. Since no bispectra value of the signal is omitted or reused, the important feature information of the radio transmitter can be obtained. SIB can overcome the distortion of phase and amplitude information, effectively suppress Gaussian noise (Han J et al. 2017), and solve the problem of fingerprint feature extraction for similar radio transmitters from the same manufacturer, the same batch and the same model. Therefore, it is very appropriate to use SIB to extract the bispectra feature as the fingerprint feature (Cao R et al. 2018).
The pattern classification ability of the classifier is the main factor that determines the performance of a specific pattern recognition method. The design of the classifier is a key technology in radio transmitter identification, and research on high-performance classification algorithms has always been a main topic of radio transmitter recognition. At present, classifier design falls into two main categories. The first comprises the widely used classifiers based on statistical decision theory, such as the Decision Tree (Hsu J Y et al. 2020), the Naive Bayesian classifier (Yang G et al. 2020), the Neural Network classifier (Nandi A K et al. 2020), and the Nearest Neighbor classifier (Kiashari S et al. 2020). The disadvantage of this kind of classifier is that it needs a large number of training samples to reach its best classification performance, and it is difficult to achieve the desired classification effect under small-sample conditions. The second comprises classifiers based on statistical learning theory, of which the Support Vector Machine (SVM) is the most representative (Han, J et al. 2018). In order to solve multiclassification problems, the "one-to-many", "one-to-one" and "directed acyclic graph" schemes are used to extend it. However, these methods are time-consuming (Sun, J et al. 2018), sensitive to outliers (Kim, L. S et al. 2017), and prone to over-fitting on noisy data (Zhang, X et al. 2017). Many later classifiers improve on the ones above. The fuzzy support vector machine proposed by Ren et al. can classify five typical multifunctional radar signals with a correct rate of over 82% (Ren, M et al. 2016). Jiang et al. proposed a spatiotemporal information fusion method based on Dempster-Shafer evidence theory, which can effectively deal with uncertain information in the process of specific radiation source identification and obtain reasonable identification results. Zhang et al. constructed a hybrid classification model combining k-nearest neighbors, random forests and neural networks to identify radiation sources with an accuracy rate of over 97% (Zhang, X et al. 2017). Zhang proposed a method to classify fault signals based on an LSSVM fusion algorithm, whose recognition performance is significantly higher than those of the single LSSVM and multikernel LSSVM classifiers (Zhang J. 2020). Starting from the correlation between samples, Tang et al. constructed a classifier based on cooperative representation to identify radio transmitters, which showed stronger effectiveness on training sets of different sizes (Tang et al. 2017). Yang et al. used a deep belief network to classify communication radiation sources, with a recognition rate of up to 94.6% (Yang et al. 2018). Xu et al. proposed a hierarchical clustering method with weighted general Jaccard distance and an effective global pruning strategy, and applied it to radiation source recognition; it is superior to similar methods in terms of computational efficiency and recognition accuracy (Xu, X et al. 2018). Youssef et al. investigated conventional deep neural nets, convolutional neural nets, support vector machines, and deep neural nets with multi-stage training, the last of which achieved 100% classification accuracy on 12 transmitters (Youssef, K et al. 2018). Riyaz et al. described a method for uniquely identifying a specific radio among nominally similar devices using a combination of SDR sensing capability and machine learning (ML) techniques, which demonstrates 90-99 percent experimental accuracy at transmitter-receiver distances varying between 2-50 ft over a noisy, multi-path wireless channel (Riyaz, S et al. 2018). Gong et al. proposed an unsupervised SEI framework based on information-maximized GANs (Info-GANs) and radio frequency fingerprint embedding (RFFE), which achieved a faster convergence speed and a higher evaluation score than state-of-the-art algorithms (Gong J et al. 2020).
The essence of radio transmitter identification is a multi-classification problem. Existing classifiers based on statistical decision theory and statistical learning theory, such as KNN and SVM, can be used to classify nonlinear data. However, a single classifier still has shortcomings. For example, KNN is easy to implement and performs better than SVM on multiclassification problems, but its prediction results are easily affected by sample imbalance (Kuang, L et al. 2019). Although SVM is a small-sample learning algorithm with good robustness, it cannot directly handle multiclassification problems, and it easily misclassifies points near the hyperplane (Zhang, X et al. 2017). In order to exploit the advantages of each model, (Lin, Y. & Wang, J. 2014) proposed a combined model of SVM and KNN. This method improves performance on multi-class problems, but it is easily affected by parameter selection.
In this paper, we propose a radio transmitter recognition method based on improved SVM-KNN. In Section II, the square integral bispectra (SIB) is used to extract the bispectra features of radio transmitter signals as their fingerprints. The dimensionality of the fingerprints is then reduced by SOPDA (Hu H S et al. 2020), and the improved SVM-KNN is adopted to improve the recognition rate. In Section III, the validity and reliability of the algorithm are verified by experiments on mobile phone datasets of the same manufacturer, batch and model.

Radio Transmitter Identification Process.
The radio transmitter identification process is shown in Fig. 1. Radio transmitter identification mainly collects the original signal data from the radio transmitter through a communication receiver. After signal preprocessing, the fingerprint features of the original signal data are obtained by Square Integral Bispectra (SIB). Then the dimensionality of the SIB features is reduced through SOPDA (Hu H S et al. 2020). The improved SVM-KNN classifier is used to determine the radio transmitter to which the original signal data belongs, and finally the judgment result is output. The nature of radio transmitter recognition can be regarded as a pattern recognition problem: the identified object is trained and learned after feature extraction, and finally classified and discriminated (Dudczyk, J et al. 2015).

The bispectrum is the two-dimensional Fourier transform of the third-order cumulant. Assuming that the steady-state signal of the communication radiation source is $x(t)$, its third-order cumulant can be expressed as:

$$c_{3x}(\tau_1,\tau_2)=E\{x^{*}(t)\,x(t+\tau_1)\,x(t+\tau_2)\} \qquad (1)$$

where $*$ represents conjugation and $\tau_1$ and $\tau_2$ represent delays. If $X(\omega)$ is the Fourier transform of $x(t)$, the bispectrum is:

$$B_x(\omega_1,\omega_2)=X(\omega_1)\,X(\omega_2)\,X^{*}(\omega_1+\omega_2) \qquad (2)$$

The integral bispectrum method sets different integration paths on the bispectrum plane; the integration path of SIB consists of a group of squares centered at the origin. The steps of the algorithm for extracting the bispectra features of a radio transmitter with SIB are as follows:
1. Assuming that the radio transmitters to be identified belong to $c$ classes and each class of radio transmitter contributes $N$ sample signals, the $k$-th sample signal of class $i$ can be expressed as $x_i^{k}(t)$.
2. The bispectrum of the sample signal is calculated by (2).
3. The SIB feature is obtained by integrating the bispectrum along each square path:

$$\mathrm{SIB}(l)=\oint_{S_l} B_x(\omega_1,\omega_2)\,d\omega_1\,d\omega_2,\quad l=1,\dots,L \qquad (3)$$

where $L$ is the total number of integration paths for SIB and $S_l$ represents the $l$-th integration path adopted by SIB.
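As a concrete illustration, the SIB extraction above can be sketched in a few lines. This is a minimal sketch rather than the authors' implementation: the single-segment FFT-based bispectrum estimate, the grid half-width, and the use of the bispectrum magnitude along each square path are simplifying assumptions.

```python
import numpy as np

def sib_features(x, n_paths=32):
    """Square integral bispectra (SIB) sketch.

    Estimates the bispectrum B(w1, w2) = X(w1) X(w2) X*(w1 + w2)
    from one signal segment, then integrates |B| along square contours
    centred at the origin (Chebyshev rings max(|k1|, |k2|) = l).
    """
    x = np.asarray(x, dtype=complex)
    x = x - x.mean()                     # third-order cumulants assume zero mean
    X = np.fft.fft(x)
    N = len(x)
    M = n_paths                          # half-width of the bispectrum grid
    k = np.arange(-M, M + 1)
    K1, K2 = np.meshgrid(k, k)
    B = X[K1 % N] * X[K2 % N] * np.conj(X[(K1 + K2) % N])
    rings = np.maximum(np.abs(K1), np.abs(K2))   # square contour index
    # feature l = integral of |B| over the l-th square path S_l
    return np.array([np.abs(B[rings == l]).sum() for l in range(1, M + 1)])
```

The returned vector has one entry per integration path, matching the observation later in the paper that the feature dimension equals the number of integral paths.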

Dimensionality reduction based on SOPDA.
Similarity order preserving discriminant analysis uses the similarity between samples to define a stable structural relationship among them and obtains the projection space based on this relationship. Experimental results in (Hu H S et al. 2020) demonstrated the superiority of the method; especially when the number of training samples is small, it greatly improves recognition accuracy.
Assume $X=[x_1,x_2,\dots,x_n]$ is a training sample set with $C$ classes, where $x_i \in \mathbb{R}^{m}$ represents a sample, $m$ is the feature dimension and $n$ is the number of samples. For any sample $x_i$, $i = 1, 2, \dots, n$, its projection in the low-dimensional space is expressed as:

$$y_i = W^{T} x_i \qquad (4)$$

where $W \in \mathbb{R}^{m \times d}$ is the projection matrix, $d$ is the selected projection dimension, and $y_i$ is the feature after dimensionality reduction. See (Hu H S et al. 2020) for details.
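The projection step in (4) can be illustrated with a short sketch. How SOPDA actually learns $W$ is defined in (Hu H S et al. 2020); here a PCA basis stands in for $W$ purely to show the mechanics of mapping $m$-dimensional features to $d$ dimensions.

```python
import numpy as np

def project(X, d):
    """Stand-in for the projection step of Eq. (4): y_i = W^T x_i.

    SOPDA derives W from similarity-order constraints (Hu et al. 2020);
    here W is illustrated with the top-d PCA directions, computed via SVD,
    only to demonstrate the dimensionality-reduction mechanics.
    """
    Xc = X - X.mean(axis=0)                    # centre the samples (rows)
    # columns of W: top-d right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                               # W in R^{m x d}
    return Xc @ W, W                           # each row is y_i = W^T x_i
```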

Improved SVM-KNN classifier.
In Fig. 2, assume that the sample to be identified is $x$, and $S_1$, $S_2$ are the two classes of support vectors. Calculate the distance from $x$ to $S_1$ and the distance from $x$ to $S_2$, and take their difference. When the distance difference is greater than the given threshold, which means $x$ is far from the hyperplane (region II in the figure), the SVM can classify it correctly. When the distance difference is smaller than the threshold, which means $x$ is close to the hyperplane (falling into region I), the SVM method computes the distance from $x$ to only one representative point per class, which easily leads to misclassification: each class of support vectors chooses only one representative point, and that point sometimes cannot well represent its class. In this case, all the support vectors of each class are taken as representative points by introducing KNN, improving the accuracy of the classifier.

Fig. 2. SVM-KNN classification
The steps of the SVM-KNN algorithm are as follows:
Step 1: The SVM algorithm is used to find the support vectors and the corresponding coefficients $\alpha_i$ and constant $b$.
Step 2: Take a test sample $x$ from the test dataset $T$, where $SV$ represents the support vector dataset and $k$ indicates the number of nearest neighbors.
Step 3: Compute the distance difference $\Delta d$ between $x$ and the two classes of support vectors.
Step 4: If $\Delta d > \varepsilon$, calculate the SVM decision value and output the result. If $\Delta d \le \varepsilon$, apply the KNN algorithm with the parameters $x$, $SV$, $k$.
Step 5: Move to the next test sample and return to Step 2.
In Step 4, when the KNN algorithm is used, the support vector set is used as the representative point set of the classification algorithm.
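A minimal sketch of the decision rule described in these steps, for the binary case, using scikit-learn's SVC. The function name `svm_knn_predict`, the threshold `eps` and the neighbor count `k` are illustrative choices, and the distance here is still Euclidean; the paper's improvement replaces it with the manifold geodesic distance.

```python
import numpy as np
from sklearn.svm import SVC

def svm_knn_predict(clf, y_train, x, eps=0.6, k=3):
    """SVM-KNN decision rule (binary case), sketched from the steps above.

    If the test sample is far from the hyperplane (|g(x)| > eps), the SVM
    decision is trusted (region II); otherwise a k-NN vote over all support
    vectors decides (region I).
    """
    g = clf.decision_function(x.reshape(1, -1))[0]
    if abs(g) > eps:                          # region II: trust the SVM
        return clf.classes_[int(g > 0)]
    sv = clf.support_vectors_                 # region I: vote among SVs
    sv_labels = y_train[clf.support_]
    d = np.linalg.norm(sv - x, axis=1)        # Euclidean in this sketch
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(sv_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```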
For the binary classification problem, there is only one hyperplane and the same normal vector is used to calculate the distance, so the distance between the sample point and the hyperplane can be conveniently calculated by the SVM-KNN algorithm. For the multiclassification problem, the existence of multiple hyperplanes and multiple normal vectors requires finding the normal vector corresponding to each hyperplane, which increases the computational complexity of the SVM-KNN algorithm. In addition, the original SVM-KNN algorithm usually uses the Euclidean distance between the test sample and each support vector. The Euclidean distance metric is valid for samples that cluster in Euclidean space, but invalid for data with a non-linear manifold structure.
In order to solve the above problems, the manifold geodesic distance (Wang H et al. 2021) is used instead of the Euclidean distance from the test sample to all the support vectors of each class, and the resulting distance is then compared with the given threshold, which yields higher prediction accuracy. The detailed steps for calculating the geodesic distance are as follows:
Step 1: Construct a neighbor graph. If sample point $x_j$ belongs to the $k$ nearest neighbors of sample point $x_i$, connect an edge between them; otherwise, leave them unconnected.
Step 2: Initialize the distance matrix: $d_G(i,j) = d_E(i,j)$ if $x_i$ and $x_j$ are connected, and $d_G(i,j) = \infty$ otherwise, where $d_G$ is the geodesic distance and $d_E$ is the Euclidean distance between $x_i$ and $x_j$.
Step 3: Calculate the shortest paths: for each intermediate point $x_m$, update $d_G(i,j) = \min\{d_G(i,j),\; d_G(i,m) + d_G(m,j)\}$.
Step 4: After iterating over every value of $m$, $d_G$ will contain the shortest-path distance between all pairs of sample points in the neighborhood graph.
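The four steps above amount to building a k-nearest-neighbor graph and running the Floyd-Warshall algorithm over it. A compact sketch (the function name and the symmetric-graph choice are assumptions):

```python
import numpy as np

def geodesic_distances(X, k=5):
    """Manifold geodesic distance following Steps 1-4.

    Builds a k-nearest-neighbour graph weighted with Euclidean distances,
    then runs Floyd-Warshall so D[i, j] becomes the shortest-path
    (geodesic) distance between samples i and j on the graph.
    """
    n = len(X)
    # pairwise Euclidean distances d_E(i, j)
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D = np.full((n, n), np.inf)               # Step 2: init to infinity
    np.fill_diagonal(D, 0.0)
    for i in range(n):                        # Step 1: keep only k-NN edges
        nn = np.argsort(E[i])[1:k + 1]
        D[i, nn] = E[i, nn]
        D[nn, i] = E[i, nn]                   # keep the graph symmetric
    for m in range(n):                        # Steps 3-4: Floyd-Warshall
        D = np.minimum(D, D[:, m:m + 1] + D[m:m + 1, :])
    return D
```

On data lying along a curved manifold, these distances follow the manifold rather than cutting straight through the ambient space, which is the property the classifier exploits.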

Verification System Construction.
The signal acquisition equipment is an RM200 receiver, and the sampling frequency is set to 60 MHz. The communication signals of 4 Honor 20 mobile phones are used; the communication frequency band of the phones is 890-915 MHz. The signals are received by the RM200 receiver, and the host computer software is operated to set the sampling rate and time so as to capture a certain length of the transmitted signal. The receiver down-converts the captured signal and then outputs two I/Q intermediate frequency signals.

Analysis of Experimental Results.
In this paper, two groups of experiments are carried out to demonstrate the effect of the proposed algorithm: one analyzes the effect of using the bispectra feature to characterize the radio transmitter to which a signal belongs, and the other evaluates the performance of the radio transmitter identification method based on improved SVM-KNN.
The first group of experiments analyzes the effect of using the bispectra feature to characterize the radio transmitter to which a signal belongs. The bispectra features of the 4 mobile phones were extracted and compared through the bispectra sectional view and the bispectra stereogram, where the sectional view is obtained by taking a cross-section of the bispectra stereogram. The bispectra sectional view and the bispectra stereogram of the 4 mobile phones are shown in Fig. 3. It can be seen from Fig. 3 that the bispectra features of the sample signals of different phones are slightly different. Combined with the previously mentioned characteristics of bispectra features and their advantage of noise suppression, the bispectra features can be used as fingerprint features of the radio transmitter signal.
The second group of experiments evaluates the performance of the radio transmitter identification method based on improved SVM-KNN. The dimension of the signal bispectra eigenvector equals the number of integration paths used by SIB. In order to preserve relatively more bispectra features of the signal, a relatively large number of integration paths needs to be selected, which leads to a relatively high dimension of the signal bispectra eigenvector. Therefore, if high-dimensional bispectra features are directly used for classification and identification, the "curse of dimensionality" will occur, reducing the time-efficiency and recognition rate of some classifiers (such as KNN and SVM). Consequently, in order to improve the recognition rate, dimensionality reduction of the high-dimensional bispectra eigenvector must be performed.
In this experiment, the dimensionality of the datasets was reduced by the SOPDA method. 200 samples were taken from each phone; 100 were randomly selected as training samples, and the remaining 100 were used for testing. The improved SVM-KNN algorithm, the SVM algorithm and the original SVM-KNN algorithm are compared on the feature datasets, with the number of nearest neighbors set to 3. Fig. 4 shows the values of SIB after dimensionality reduction through SOPDA. As can be seen, the feature sets of the four phones are different, which means that the four phones can be discriminated by the improved SVM-KNN classifier. ① The three methods use the same kernel function and different thresholds for the identification experiments. When the kernel function is the linear kernel and the threshold $\varepsilon$ takes different values, the classification results of the three methods are shown in Table 1 and Table 2. From Table 1 and Table 2, when the value of $\varepsilon$ is 0.1~0.6, the recognition rate of SVM-KNN is higher than that of SVM. When the value of $\varepsilon$ is 0.7~1.0, the recognition rate of SVM-KNN decreases gradually, and its recognition performance is slightly lower than that of SVM. The recognition rate of the improved SVM-KNN varies little with the value of $\varepsilon$; when $\varepsilon = 0.6$, a recognition rate of up to 90.6% can be achieved.
In general, the improved SVM-KNN algorithm has better recognition performance than the SVM and SVM-KNN algorithms, and the calculation times of the three algorithms are equivalent. ② The three methods use the same threshold and different kernel functions for the identification experiments.
From experiment ①, we can see that the recognition rate is highest when the threshold is 0.6, so in this experiment the threshold is set to 0.6. The kernel function selected for the three algorithms is the RBF kernel, and the experimental results are compared for five different values of the kernel parameter $\sigma$. It can be seen from Table 3 that regardless of the value of $\sigma$, the improved SVM-KNN algorithm has higher classification accuracy than the SVM and SVM-KNN algorithms. When the value of $\sigma$ gradually changes from 0.25 to 50, the recognition rate of the SVM algorithm increases gradually, while those of SVM-KNN and improved SVM-KNN increase first and then decrease. For smaller values of $\sigma$, the recognition rate of the SVM-KNN algorithm is higher than that of the SVM algorithm, but for larger values of $\sigma$, SVM obtains better recognition results than SVM-KNN. The SVM algorithm remains greatly affected by the kernel function parameter $\sigma$: its highest and lowest recognition rates differ by 15.6%. The recognition results of the SVM-KNN and improved SVM-KNN algorithms are not sensitive to the selection of $\sigma$. Therefore, the experimental results show that the improved SVM-KNN algorithm has better recognition performance than the SVM and SVM-KNN algorithms without increasing computation time, and is less affected by the selection of kernel function parameters. Fig. 5 shows the recognition rate with and without SOPDA, which illustrates that high-dimensional bispectra features reduce the robustness of the classifier and affect the recognition performance. The calculation burden with and without SOPDA is presented in Fig. 6. As can be seen, classifying after dimensionality reduction gives a lower calculation burden. Fig. 7 shows the average recognition performance of the five methods for the mobile phones under SNRs of 10~30 dB; the recognition rate increases as the SNR increases.
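The kernel-parameter sensitivity experiment can be sketched as follows, assuming a scikit-learn SVC and the common convention $\gamma = 1/(2\sigma^2)$ for the RBF width; the data split and $\sigma$ grid are illustrative, not the paper's exact protocol.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def rbf_sigma_sweep(X, y, sigmas=(0.25, 1.0, 5.0, 20.0, 50.0)):
    """Sketch of the kernel-parameter sweep behind Table 3.

    For each RBF width sigma (scikit-learn parameterises the kernel as
    gamma = 1 / (2 sigma^2)) an SVM is trained and its test accuracy
    recorded, mirroring how sensitivity to sigma is measured.
    """
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5,
                                          random_state=0)
    acc = {}
    for s in sigmas:
        clf = SVC(kernel='rbf', gamma=1.0 / (2.0 * s ** 2)).fit(Xtr, ytr)
        acc[s] = clf.score(Xte, yte)          # accuracy at this sigma
    return acc
```

The spread between the highest and lowest accuracies in the returned dictionary is the sensitivity measure quoted above (15.6% for plain SVM in the paper's experiment).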
The recognition rate of the improved SVM-KNN is better than those of the other methods under the same SNR. Even when the SNR reaches 30 dB, the average recognition rates of RFFE-InfoGAN (Gong J et al. 2020), SVM, LSSVM fusion and SVM-KNN do not exceed 92%. Each row of the confusion matrix represents the label distribution of the phone individual feature data identified by the improved SVM-KNN, and each column represents the distribution of the individual feature data corresponding to each phone label. As can be seen, the identification is biased towards mobile phone 2 when identifying phone 3, and towards phone 3 when identifying phone 4.

Conclusion
The choice of classifier is one of the key factors in realizing radio transmitter identification. In this letter, we propose a radio transmitter identification method based on improved SVM-KNN in response to the shortcomings of a single classifier. The manifold geodesic distance is used to replace the Euclidean distance between the test sample and the various support vectors. The effectiveness of our method is verified on data collected from 4 mobile phones of the same model. The experimental results show that, over a wide range of signal-to-noise ratios, the improved SVM-KNN algorithm has better recognition performance and robustness than existing recognition algorithms. Since the performance of the proposed method is satisfactory, we will study the application of this method in sensor networks in the future.