Automatic modulation recognition of radar signals based on histogram of oriented gradient via improved principal component analysis

Automatic modulation recognition (AMR) of radar signals plays a critical role in electronic reconnaissance. Current AMR algorithms are mainly based on convolutional neural networks (CNNs), which learn a feature hierarchy by building high-level features from low-level ones. However, for time–frequency analysis-based methods, distinct low-level features in the time–frequency spectrum can already reflect modulation characteristics. This study therefore develops a novel approach based on low-level shape descriptors, combining histograms of oriented gradients (HOG) with a support vector machine (SVM). Comparison studies with classic CNN-based methods are also conducted, and the experimental results demonstrate that the HOG-SVM approach achieves comparable accuracy at markedly lower computational cost. To further enhance classification precision under low signal-to-noise ratios (SNRs), an improved principal component analysis (IPCA) denoising algorithm is developed to improve signal quality against an intense noise background. Experiments on simulated and measured signals demonstrate that the proposed algorithm can accurately distinguish signals in intense noise environments.


Introduction
The automatic modulation recognition (AMR) technique automatically recognizes the waveform of radiation sources without prior knowledge [1]. Such a capability supports assessing the threat level of radiation sources, which is indispensable in autonomous driving, electronic support measures (ESM), and electronic countermeasures (ECM). The development of AMR has therefore attracted considerable attention from an increasing number of researchers. Previous methods are mainly based on feature extraction and classifiers: handcrafted features with inter-class discriminability are fed to a classifier to recognize various radar signals [2][3][4][5]. However, the selection of handcrafted features depends heavily on the experience of researchers. Moreover, these features have to be re-selected whenever the system needs to recognize new waveforms.
Inspired by the structure and functions of the human brain, deep learning applies artificial neural networks to analysis and learning. Owing to its talent for automating the feature-extracting process, deep learning has pushed the performance of speech identification [6], language understanding [7], and computer vision [8] to soaring heights. As an essential branch of deep learning, deep convolutional neural networks have demonstrated an outstanding capacity to classify large-scale image datasets [9]. Some research has achieved excellent results by applying convolutional neural networks (CNN) to recognize signal waveforms.
Literature [10] designed a multi-level CNN based on different maps of the signal to recognize modulated signals. In literature [11], minimum shift keying (MSK), frequency-shift keying (FSK), 4FSK, frequency modulation (FM), binary phase-shift keying (BPSK), quadrature phase-shift keying (QPSK), and amplitude modulation (AM) signals were classified with an accuracy of 95% at an SNR of 2 dB; a CNN operating on the denoised cyclic spectrum contributed to this performance. In literature [12], a variant CNN was adopted to extract features from the autocorrelation spectrum. Benefitting from the powerful ability of CNN, this recognition system achieved nearly 100% accuracy for sinusoidal frequency modulation (SFM), linear frequency modulation (LFM), FSK, BPSK, QPSK, and no extra-modulated (NS) signals when the SNR is over − 2 dB. LeNet-5 [13], one of the classic networks, was utilized to recognize LFM, Costas, BPSK, Frank code, and T1-T4 signals in literature [14]. Combining time-frequency analysis and CNN, this approach reached an overall precision of 93% at an SNR of − 2 dB. Literature [15] improved the LeNet-5 model following the noted AlexNet [16] model; richer convolution depth and the introduction of the ReLU activation function yielded better performance under low SNRs. Besides, literature [17][18][19] has achieved satisfactory results by applying CNN-based methods.
It seems that CNN has dominated the field of signal recognition. CNN can integrate high-level features from low-level features such as shape, texture, and edge. Owing to the richer semantic information of high-level features, CNN achieves outstanding performance on large-scale complex image classification tasks. However, for radar signal waveform recognition, and especially for classification methods based on time-frequency analysis, the energy distributions of different modulation signals in the time-frequency spectrum (TFS) already differ significantly. Therefore, it is essential to compare CNN-based methods with methods based on low-level features and discuss the pros and cons of each technique.
So far, there has been no research on the modulation classification of radar signals based on low-level features. Hence, this research develops a novel recognition method utilizing a shape descriptor of the signal TFS. First, the gradient of the time-frequency image is calculated. Then, a histogram is generated according to the gradient direction. Next, the histograms of oriented gradients (HOG) of each area are stitched together to form a descriptor. Finally, a support vector machine (SVM) is trained to realize signal waveform recognition. The proposed HOG-based method can accurately and effectively extract shape features from time-frequency images. Additionally, we designed a comparative study between the designed HOG-SVM and classic CNN methods in terms of accuracy, calculation time, and the parameter count of each model. The results show that, compared with methods utilizing classic CNNs, the newly proposed method achieves the same excellent recognition accuracy while leading the CNNs in calculation time and number of required parameters.
In order to further improve the classification results under low signal-to-noise ratios (SNRs), improved principal component analysis (IPCA) has been designed in this paper. Compared with the traditional PCA method, IPCA can automatically separate the signal and noise components according to the power of the eigenvalue difference spectrum. The quality of original signals is significantly improved via IPCA, enhancing the recognition results in intense noise environments.
The contributions of the paper are: (1) A novel radar signal AMR method is designed to replace the traditional CNN-based approach. (2) The modulation recognition performance of CNN and HOG-SVM is compared; the fast HOG-SVM method with a small number of parameters facilitates hardware implementation. (3) We develop an IPCA denoising method for adaptive noise reduction, which improves recognition performance under low SNRs.

Recognition system overview
The signal intercepted by the receiver is contaminated by noise; its model can be written as

$$r(t) = o(t) + n(t)$$

Here, $r(t)$ represents the intercepted signal, $o(t)$ denotes the original modulated signal, and $n(t)$ is the channel noise, assumed to be additive white Gaussian noise (AWGN). Eight types of radar signals are considered: NS, LFM, SFM, EQFM, FSK, 4FSK, BPSK, and Frank code signals. Their detailed models can be found in the related works [12,14,15].
The received radio frequency radar signal is first down-converted and sampled to obtain the intermediate-frequency (IF) signal. Afterward, modulation recognition proceeds in five steps. First, signal quality is improved via the IPCA denoising algorithm. Second, the time-frequency image is obtained via the smooth pseudo-Wigner-Ville distribution (SPWVD) transformation. Third, binarization, time gating, and frequency filtering are adopted to further enhance the difference between object and background in the time-frequency images. Next, HOG descriptors are extracted as the classification basis. Finally, after completing the training process of the SVM, the modulation type of the signal can be identified with high precision. The system flowchart is given in Fig. 1.

Principal components analysis
PCA, a classic data analysis technique, maps high-dimensional data to a low-dimensional space via linear projection based on the minimum mean square error principle [20]. PCA has been widely applied in data dimensionality reduction and denoising [21].
Assume that the discrete signal $x_N$ is the superposition of a useful signal and noise, $x_N = s_N + f_N$. Here, $s_N$ is the useful signal and $f_N$ is the noise signal. The Hankel matrix $H$ of the noisy signal can be obtained by Hankel transformation [22], which can also be written as

$$H = S + F$$

where $S$ and $F$ are the Hankel matrices of $s_N$ and $f_N$, respectively.
Since the variance $\sigma_F^2$ of the AWGN is constant and there is no statistical correlation between the noise signal and the useful signal, the covariance matrix $R$ of $H$ can be written as

$$R = H H^{T} = S S^{T} + \sigma_F^2 I$$

Here, $I$ is the identity matrix. Let $V \Lambda_S V^{T}$ be the eigenvalue decomposition of $S S^{T}$; then $R$ can also be written as

$$R = V\left(\Lambda_S + \sigma_F^2 I\right) V^{T}$$

where $\Lambda_S = \mathrm{diag}[\lambda_1^2\ \lambda_2^2 \cdots \lambda_r^2\ 0 \cdots 0]$ holds the nonzero eigenvalues of $S S^{T}$. These nonzero eigenvalues represent the useful signal components carrying the primary energy. Therefore, noise suppression can be realized by reconstructing the signal within the subspace spanned by the principal components and discarding the worthless noise subspace. The dimension of the signal subspace is critical to the quality of the reconstructed signal: too small a dimension loses useful signal information, while too large a dimension leaves the noise incompletely removed. Thus, an algorithm that precisely estimates the signal-subspace dimension is vital to the final noise reduction.
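The subspace reconstruction described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes the embedding dimension `L` and the signal-subspace dimension `r` are supplied by the caller, and uses an SVD of the Hankel matrix followed by anti-diagonal averaging.

```python
import numpy as np

def hankel_pca_denoise(x, L, r):
    """Denoise a 1-D signal x by truncating the SVD of its Hankel matrix.

    L : number of rows of the Hankel matrix (embedding dimension)
    r : assumed dimension of the signal subspace (components kept)
    """
    N = len(x)
    K = N - L + 1
    # Hankel (trajectory) matrix: H[i, j] = x[i + j]
    H = np.array([x[i:i + K] for i in range(L)])
    # SVD; the leading r singular vectors span the signal subspace
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_r = (U[:, :r] * s[:r]) @ Vt[:r]
    # Reconstruct the signal by averaging over the anti-diagonals
    out = np.zeros(N)
    cnt = np.zeros(N)
    for i in range(L):
        out[i:i + K] += H_r[i]
        cnt[i:i + K] += 1
    return out / cnt
```

For a narrowband component such as a sinusoid, the Hankel matrix is nearly rank-2, so a small `r` preserves the signal while discarding most of the noise energy.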
The traditional PCA algorithm [23] sets a threshold on the ratio of each eigenvalue to the total power to separate the eigenvalues into signal and noise subspaces. However, a pre-defined threshold cannot suit different signals in all situations, and this method becomes invalid especially under low SNRs. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) have been employed to adaptively search for the dimension of the signal subspace in literature [24,25]. However, these parametric approaches are time-consuming because they evaluate every possible subspace dimension. Hence, a concise method that can adaptively divide the subspaces is urgently needed.

Improved principal components analysis
Considering the shortcomings of existing methods, a novel approach based on the power of the eigenvalue difference spectrum is developed in this paper to extract the number of primary signal components adaptively. The eigenvalue difference spectrum is the sequence of differences between neighboring eigenvalues, which fully reveals the changing trend of the eigenvalues. Especially against a strong noise background, heavy noise raises the overall eigenvalue amplitude spectrum sharply, whereas this rising trend is smoothed out in the difference spectrum. According to (4), the difference spectrum can be written as

$$d_i = \lambda_i - \lambda_{i+1}, \quad i = 1, 2, \ldots, N - 1$$

Theoretically, the difference-spectrum values of the noise subspace are nearly zero, so the signal subspace mainly determines the difference-spectrum energy. Thus, the separation of signal and noise can be achieved by screening the eigenvalue segment that contributes the primary energy. This paper performs threshold filtering on the difference spectrum based on prior information. The specific steps are as follows: (1) Generate 200 groups of noise-free signals for each modulation type.
(2) Calculate the ideal signal subspace of these noise-free signals. (3) Superimpose noise of different powers on the noise-free signals (SNR ranges from − 4 to 4 dB at intervals of 1 dB). (4) According to the ideal signal subspace, compute the power $P_s$ of the signal subspace and the power $P_N$ of the noise subspace in the difference spectrum. (5) Calculate the power ratio of each signal as $P_s/(P_N + P_s)$. (6) The resulting power ratios are then used to set the filtering threshold that separates the signal and noise subspaces in the difference spectrum.
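A minimal sketch of the difference-spectrum idea follows. It substitutes a hypothetical cumulative-power threshold `ratio` for the paper's offline-learned threshold, and uses the singular values of the Hankel matrix as the eigenvalue sequence.

```python
import numpy as np

def ipca_subspace_dim(x, L, ratio=0.95):
    """Estimate the signal-subspace dimension from the eigenvalue
    difference spectrum. `ratio` is a hypothetical threshold standing
    in for the one the paper learns from noise-free training signals."""
    K = len(x) - L + 1
    H = np.array([x[i:i + K] for i in range(L)])
    lam = np.linalg.svd(H, compute_uv=False)   # sorted in descending order
    d = lam[:-1] - lam[1:]                     # difference spectrum
    p = d ** 2                                 # power of each difference
    cum = np.cumsum(p) / p.sum()
    # smallest dimension whose difference-spectrum power reaches `ratio`
    return int(np.searchsorted(cum, ratio) + 1)
```

Because the noise-subspace differences are nearly zero, almost all of the difference-spectrum power concentrates at the boundary between the signal and noise eigenvalues, so the cumulative power crosses the threshold at a small index.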

Histogram of oriented gradient
HOG descriptors, among the most successful human detection features, have been widely used in image analysis and machine vision [26,27]. HOG excels at describing texture and shape because it uses the gradient distribution of local image regions to characterize edge information [28][29][30]. The cell and the block are the two computation units in HOG feature calculation. First, the image is separated into small units (cells). Second, the histograms of the cells within larger regions (blocks) are gathered. Finally, the histograms of all blocks are merged to form the shape descriptor. Figure 2 describes the process of HOG feature extraction, which can be realized in three main steps.
Step 1: Gradient calculation. To obtain the gradient histogram, the horizontal and vertical gradients are calculated first. Assume a pixel lies at coordinate (x, y) with grayscale value g(x, y). The horizontal and vertical gradients, denoted $G_x(x, y)$ and $G_y(x, y)$, respectively, are calculated as

$$G_x(x, y) = g(x + 1, y) - g(x - 1, y)$$
$$G_y(x, y) = g(x, y + 1) - g(x, y - 1)$$

Then, the gradient magnitude and the gradient direction can be described as

$$M(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \quad D(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}$$

Step 2: Direction vote. The unsigned (or signed) orientations are quantized into K orientation bins. The orientation bins of a pixel are determined by its gradient direction, and the pixel's gradient value is voted proportionally into the corresponding bins.

(Fig. 3: the processes of data enhancement, with SFM at − 4 dB selected as an example.)

For example, for a pixel at coordinate (x, y) with gradient magnitude M(x, y) and gradient direction D(x, y) satisfying $bin(i) \le D(x, y) < bin(i+1)$ for two neighboring bins, the direction vote can be denoted as

$$f_{bin(i)} = M(x, y)\,\frac{bin(i+1) - D(x, y)}{L_{bin}}, \qquad f_{bin(i+1)} = M(x, y)\,\frac{D(x, y) - bin(i)}{L_{bin}}$$

where $f_{bin(i)}$ represents the value voted to orientation bin(i) by the pixel (x, y), and $L_{bin}$ is the angular width of each orientation bin ($L_{bin} = 180°/K$ for unsigned gradients and $L_{bin} = 360°/K$ for signed gradients).
The gradient values of all pixels in a cell are voted into bins corresponding to their respective directions. The K -number gradient histogram descriptor is formed in this way.
Step 3: Histogram combination. The histograms of all cells in a block are merged into one vector, and the feature descriptor of the entire image is formed by concatenating the vectors of all blocks. Assume that the image is divided into $a \times a$ cells, each block contains $b \times b$ cells, and there is no overlap between blocks; then the number of blocks is $((a - b)/b + 1)^2$. Thus, the length of the final feature descriptor of the entire image is

$$\left(\frac{a - b}{b} + 1\right)^2 \times b^2 \times K$$
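The three steps above can be sketched as follows. This simplified version uses hard nearest-bin voting instead of the proportional vote, omits block normalization, and assumes a square grayscale image whose side is a multiple of the cell size; the default hyperparameters mirror the ones selected later in the paper (8 × 8-pixel cells, 4 × 4-cell blocks, 1/2 overlap, 8 signed bins).

```python
import numpy as np

def hog_descriptor(img, cell=8, block=4, stride=2, K=8):
    """Simplified HOG sketch: signed gradients, hard nearest-bin voting,
    no block normalization."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # horizontal central difference
    gy[1:-1, :] = img[2:, :] - img[:-2, :]      # vertical central difference
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # signed direction in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * K).astype(int), K - 1)

    # Per-cell orientation histograms (magnitude-weighted)
    ncell = img.shape[0] // cell
    hist = np.zeros((ncell, ncell, K))
    for i in range(ncell):
        for j in range(ncell):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bins[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            for k in range(K):
                hist[i, j, k] = m[b == k].sum()

    # Concatenate overlapping blocks of cells into the final descriptor
    nblock = (ncell - block) // stride + 1
    feats = []
    for i in range(nblock):
        for j in range(nblock):
            feats.append(hist[i * stride:i * stride + block,
                              j * stride:j * stride + block].ravel())
    return np.concatenate(feats)
```

With a 64 × 64 image and the defaults above, there are 8 × 8 cells, 3 × 3 blocks, and the descriptor length is 9 × 16 × 8 = 1152.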

Data enhancement
To further improve the representation ability of the descriptor, data enhancement is performed on the TFS grayscale image in four steps. First, the Otsu method [31] is adopted to binarize the grayscale image. Second, to eliminate the negative impact of processing noise generated by the SPWVD kernel, small connected regions (less than 10% of the largest connected region) are cleared (their pixel values set to zero). Third, time gating and frequency filtering are performed to remove the areas without signal energy distribution [14]. Finally, the image is resized to the proper size, with its aspect ratio normalized, using bilinear interpolation [15]. Figure 3 shows the processes of data enhancement. Through data enhancement, the noise in the image is further removed, and the shape distribution of the signal components is converted from local characteristics to global characteristics, which improves the representation ability of the descriptor and promotes recognition performance.
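The Otsu binarization in the first step can be sketched as below. This is a standard histogram-based implementation, not the paper's code, and it assumes integer grayscale values in 0–255.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold that maximizes the between-class
    variance of the grayscale histogram (values assumed in 0..255)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # probability of class 0 (<= k)
    mu = np.cumsum(p * np.arange(256))    # cumulative mean of class 0
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0    # guard the empty-class endpoints
    return int(np.argmax(sigma_b))
```

Pixels above the returned threshold are set to one and the rest to zero, separating the signal envelope from the background.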

Support vector machines
The margin-based SVM is an outstanding machine learning classifier in both speed and accuracy [32,33]. SVM maps data into a high-dimensional space through a kernel function; an optimal separating hyperplane is then constructed to divide the data with inter-class separability.

Comparison with CNN
In order to benchmark the designed HOG-SVM method, we built several CNN models based on classic network structures, including LeNet-5 [13,14], AlexNet [15,16], and VggNet [9]. Their network structures are presented in Table 1. The simulation analysis is further described in the next section.

Results and analysis
In this section, we first introduce the evaluation criteria and experiment datasets. SNR is an indicator of signal quality, which can be calculated as

$$\mathrm{SNR} = 10 \log_{10}\frac{\sigma_s^2}{\sigma_n^2}$$

where $\sigma_s^2$ and $\sigma_n^2$ are the variances of signal and noise, respectively.
The correlation coefficient is an evaluation criterion to assess the similarity between two waveforms $a$ and $b$:

$$\rho = \frac{\left|\sum_{n} a(n)\, b(n)\right|}{\sqrt{\sum_{n} a^2(n) \sum_{n} b^2(n)}}$$

where the closer $\rho$ is to 1, the more similar the waveforms are.
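As a small illustration of the two criteria above (not tied to the paper's code), the SNR and a normalized inner-product correlation coefficient can be computed directly:

```python
import numpy as np

def snr_db(signal, noise):
    """SNR = 10 * log10(sigma_s^2 / sigma_n^2)."""
    return 10 * np.log10(np.var(signal) / np.var(noise))

def corr_coef(a, b):
    """Waveform similarity: |<a, b>| / (||a|| * ||b||)."""
    return abs(np.dot(a, b)) / np.sqrt(np.dot(a, a) * np.dot(b, b))
```

For example, a unit-amplitude sinusoid plus an interfering tone of amplitude 0.1 has a power ratio of 0.5/0.005, i.e. an SNR of 20 dB.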
Precision, used to describe the recognition results, is defined as

$$P = \frac{\sum_i \mathrm{TP}_i}{N_a}$$

where $\mathrm{TP}_i$ are the correctly recognized samples of class $i$, $\mathrm{FP}_i$ are the incorrectly recognized samples, and $N_a = \sum_i (\mathrm{TP}_i + \mathrm{FP}_i)$ is the total number of samples. The experiment datasets contain eight kinds of simulated radar signals and three kinds of measured signals. The model is trained on simulated data and verified on both simulated and measured data. The carrier frequency of all signals is distributed in (0.1–0.4)·fs, and the bandwidth of the FM signals lies in (0.1–0.4)·fs. FSK and 4FSK adopt random codes [5,7,11,13]. Barker codes are applied in the BPSK signals. The phase number of the Frank code signals ranges from 4 to 6. The code width of the FSK and 4FSK signals lies in (0.125–0.25)·N, that of the BPSK signals in (1/64–1/32)·N, and that of the Frank code signals in (1/100–1/64)·N. Here, N = 1024 denotes the length of the discrete signal, and fs is 400 MHz. For each signal type, 200 samples are simulated per 2 dB from − 6 to 8 dB as training data, 100 samples per 2 dB from − 6 to 8 dB as validation data, and 100 samples per 2 dB from − 8 to 8 dB as testing data, giving 12,800 training samples, 6400 validation samples, and 7200 testing samples in total.

Performance of IPCA
This section evaluates the denoising performance of the proposed IPCA. 200 Monte Carlo trials are carried out per 1 dB from − 6 to 0 dB for each type of signal, and the correlation coefficient is evaluated as a function of the SNR. Table 2 compares the denoising performance of the traditional PCA [23] and IPCA. The proposed IPCA significantly improves the signal quality and is remarkably better than the traditional PCA algorithm: the traditional PCA can hardly repair the signal time-domain waveform under low SNRs, while the waveform quality after IPCA shows an outstanding improvement.

Recognition result analysis
Experiments on the validation set guide the choice of block size, cell size, overlap degree, and the number of orientation bins. Based on the best performance on the validation set, 8 × 8-pixel cells, 4 × 4-cell blocks, 1/2 block overlap, and 8 signed-gradient bins are selected as the final hyperparameters of the system. Figure 4 reveals the recognition performance for each radar signal. The recognition results of each signal are almost 100% once the SNR is over − 2 dB. As the SNR decreases further, the recognition precision of the various signals drops markedly. Even at an SNR of − 8 dB, the classification system can still recognize NS, EQFM, FSK, 4FSK, and BPSK with more than 80% precision, indicating that the developed approach is valid and robust.
It is demonstrated in Fig. 5 that the existing approaches [12,14,18] are strongly affected by noise and cannot robustly recognize signals in an intense noise environment. In comparison, the recognition accuracy of the designed method clearly surpasses these approaches at low SNRs, showing that our approach has better antinoise performance. (Table 3 lists the overall precision (%) of this paper's approach and of [14], [18], and [12].) Compared with the TFS-based method [14], the proposed method achieves better recognition performance at the additional computational cost of IPCA. [12] and [18] spend less time on signal processing (autocorrelation spectrum, frequency spectrum, and power spectrum). However, [18] developed three networks, which increases the calculation cost, and [12] applied a long short-term memory structure, which requires the output of the previous node to compute the present node; thus, that method has low inference efficiency.

Effects of data processing
To evaluate the influence of the data processing, comparisons with a system without IPCA and a system without data enhancement are conducted, respectively. It is clearly shown in Fig. 6 that the IPCA algorithm dramatically increases the classification precision: the IPCA denoising approach repairs the original signal well and improves the time-frequency spectrum, yielding a satisfactory performance. (Fig. 7 shows the overall precision (%) of HOG-SVM, VggNet [9], AlexNet [15], and LeNet-5 [14].) Besides, data enhancement further raises recognition precision by enhancing the contrast between envelope and background and unifying the envelope distribution.

Comparison result with CNN
This part compares the proposed HOG-SVM method and the classic CNN-based methods on the same dataset (data processed via IPCA and data enhancement). For the CNN-based methods, the optimizer is SGD, the batch size is 64, the learning rate is 0.001, and a total of 100 epochs are run. Network computation is carried out on an Intel i7-10700 CPU. After each training epoch, the network performance is verified on the validation set; finally, the weights with the best validation performance are selected to identify the test data. As shown in Fig. 7, the proposed method has almost the same recognition accuracy as VggNet when the SNR is greater than or equal to − 6 dB. Additionally, the classification precision of the HOG-SVM method is higher than that of AlexNet and far beyond that of the traditional LeNet-5. This demonstrates that the proposed method has excellent cognitive ability and can clearly describe the differences between different modulated signals. Besides, Table 4 shows that the classic CNN-based methods are more time-consuming than the HOG-based method: the proposed method not only completes training faster but also recognizes signals faster. Finally, our approach does not need to store a large number of parameters. Owing to the linear kernel function, the system parameters are the linear predictor coefficients of each binary classification, and their total can be calculated as

$$Num_P = M \times s$$

Here, $Num_P$ is the number of parameters of the model, $M$ is the number of binary classifications, and $s$ is the length of the linear predictor coefficients. In our eight-class experiments, M = 28 and s = 1152, the latter being the length of the HOG feature. There are 32,256 parameters in total in our model, far fewer than in AlexNet and VggNet. Hence, the proposed method facilitates hardware implementation and on-chip storage.
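The parameter count can be checked arithmetically. The sketch below assumes a 64 × 64 time-frequency image (an assumption; the excerpt does not state the image size explicitly), which together with the chosen hyperparameters reproduces the HOG length of 1152:

```python
# One-vs-one linear SVM for 8 classes: M = 8 * 7 / 2 = 28 binary classifiers.
n_classes = 8
M = n_classes * (n_classes - 1) // 2

# HOG length s: assumed 64x64 image, 8x8-pixel cells -> 8x8 cells;
# 4x4-cell blocks with 1/2 overlap (stride of 2 cells) -> 3x3 blocks;
# 8 orientation bins per cell.
blocks = ((8 - 4) // 2 + 1) ** 2
s = blocks * 4 * 4 * 8

# Total stored parameters: one weight vector of length s per binary classifier.
Num_P = M * s
```

This matches the 32,256 parameters reported above.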

Recognition performance based on measured data
To demonstrate the practicality of the developed method, the model trained on simulated data is verified with measured data. Figure 8 shows the signal collection scenario. Signals are generated by a National Instruments signal source and radiated through an antenna; they are then received by another antenna and sampled by an Agilent oscilloscope. LFM, SFM, and NS signals are collected in this experiment, with parameters within the ranges in Table 3. 15 samples per signal type are collected at each SNR, and signals with different noise levels are obtained by changing the transmit power. The SNR of the measured radar signals can be written as

$$\mathrm{SNR}_{\mathrm{measured}} = 10 \log_{10}\frac{\sigma_s}{\sigma_N}$$

where $\sigma_N$ is the power of the environmental noise, $\sigma_s = \sigma_r - \sigma_N$ is the power of the noise-free radar signal, and $\sigma_r$ is the power of the collected signals. Figure 9 reveals the practicality of the developed method: the system recognizes each signal accurately, similar to its performance on simulated data. This excellent practicality further shows the superiority of the proposed approach.

Conclusion
Precise AMR of radar signals is an essential ingredient of radar reconnaissance systems. In this paper, IPCA, with excellent antinoise ability, is designed to restore signals polluted by noise. Then, gradient descriptors are extracted from the TFS of the signals. After training the SVM classifier on the training data, precise classification of various signals is realized. The recognition system can identify NS, LFM, SFM, EQFM, FSK, 4FSK, BPSK, and Frank code signals in an intense noise environment. Even at an SNR of − 6 dB, the developed approach still reaches an overall precision of 97.37%. Compared with classic CNN-based methods, the designed approach is faster and easier to implement in hardware without sacrificing accuracy. This study will positively affect ECM, ESM, and other aspects of modern electronic warfare.
Authors' contributions Kuiyu Chen contributed to software and writing; others performed reviewing and editing.
Funding This research was financially supported by National Natural Science Foundation of China (61971226, 61801220) and Natural Science Foundation of Jiangsu Province for Excellent Young Scholars (BK20200075).