Quantile-Frequency Analysis and Deep Learning for Signal Classification
Ta-Hsin Li (thl@us.ibm.com), IBM T. J. Watson Research Center, Yorktown Heights, NY 10598-0218, USA

This paper proposes a new method for signal classification based on a combination of deep-learning (DL) image classifiers and a recently introduced nonlinear spectral analysis technique called quantile-frequency analysis (QFA). The QFA method converts a one-dimensional signal into a two-dimensional representation, the quantile periodogram (QPER), which represents the signal's oscillatory behavior in the frequency domain at different quantiles. With a moving window, the QFA method can also convert a signal into a sequence of such two-dimensional representations, called short-time quantile periodograms, that are localized in time to represent the signal's time-dependent or nonstationary properties. The DL image classifiers take these representations as input for signal classification. The benefit of this QFA-DL classification method over the traditional frequency-domain method based on the power spectrum and spectrogram is demonstrated by a numerical experiment using real-world ultrasound signals from a nondestructive evaluation application.


Introduction
Deeply rooted in physics and Fourier analysis, many time series data in engineering applications can be characterized by their spectral properties in the frequency domain [1,2]. Spectral features derived from the power spectrum, also known as the periodogram, have been used for signal classification in these applications. One such example is nondestructive evaluation (NDE) for composite inspection [3][4][5][6]. In this application, the time series data comprise echoes reflected from composites when probed by ultrasound waves. Spectral signatures of these signals are deemed to contain information about structural defects in the material. Training a classifier on spectral features of defective and non-defective samples offers a path toward automatic detection of structural defects. This paper introduces a new spectral analysis method called quantile-frequency analysis (QFA) for signal characterization and classification. The QFA method, developed recently in the statistical literature [7][8][9], is a nonlinear technique that explores the dynamics of time series data beyond the second-order statistical properties used by traditional spectral analysis methods. Instead of second or higher-order moments [10][11][12][13], the QFA method examines spectral properties of a signal at different quantiles, and represents these properties as a two-dimensional function of frequency and quantile level. At a fixed quantile level, the cross section of this two-dimensional function reduces to a one-dimensional function of frequency called the quantile periodogram (QPER). Unlike the traditional periodogram or power spectrum (PER), the quantile periodogram represents the signal's oscillatory behavior around a quantile rather than the mean of the signal.
By varying the quantile level up and down to cover a range of quantiles, the QFA method using the resulting quantile periodograms is able to offer a richer view of the signal's dynamics compared to the traditional spectral analysis methods using the power spectrum.
The QFA method can also be applied to smaller segments of a signal extracted by a moving window. This short-time QFA or STQFA method creates a sequence of short-time quantile periodograms (STQPER) suitable for signals that are better characterized by time-dependent or nonstationary properties. It is analogous to the traditional way of analyzing nonstationary signals using short-time power spectra (STPER), also known as the spectrogram, where spectral features are localized in time.
To take advantage of the enhanced capabilities of QFA and STQFA for signal classification, we propose to apply deep-learning (DL) image classifiers to the two-dimensional functions produced by QFA or the sequences of such functions by STQFA. This paper presents the results of an experiment with this method on a set of ultrasound NDE data.
Specifically, our experiment considers the multilayer perceptron (MLP) classifier and the convolutional neural network (CNN) classifier [14]. These classifiers are designed to take the two-dimensional functions from QFA as input and classify a set of ultrasonic signals with or without structural defects [3]. The performances of these classifiers are evaluated using a cross-validation scheme and compared with the performances of similar classifiers using power spectra as input instead of quantile periodograms. A recent study in [15] employs a combination of simple linear or quadratic classifiers with functional principal components of the QFA representation. The results in this paper show that a higher classification accuracy can be achieved by the proposed QFA-DL method compared to the contenders in [15]. Applying the CNN classifiers to the sequences of STQPER is shown to be more effective than applying similar classifiers to the sequences of STPER (or spectrogram). Table 1 summarizes the classifiers and the inputs considered in our experiment. Figure 1 shows the data analysis process.
The QFA-DL method was tested on a different data set in [16] with limited experimentation. The present paper provides a more comprehensive study of the method. Instead of just presenting the most favorable results, this paper also explores the sensitivity and computational complexity of the DL classifiers. Our experiment includes a variety of configurations and learning parameters. The resulting classification accuracies are presented together with the computer times entailed in training these classifiers. We believe such information should be disclosed to benefit the practitioner in making trade-offs between accuracy gains and effort costs in practice.
Due to recent successes in computer vision and natural language processing, DL methods, together with other machine-learning (ML) methods, have been proposed for signal classification in NDE applications [17]. These methods include the deployment of DL and ML classifiers in the frequency domain using spectral features as well as in the time domain using time series data [18][19][20][21]. In this paper, we focus on the frequency-domain approach which enables us to compare the QFA representations with the power-spectrum-based alternatives in the same domain.
The remainder of this paper is organized as follows. Section 2 reviews the QFA method and its statistical properties. Section 3 discusses the dataset and the experiment of the QFA-DL classifiers. Section 4 contains the numerical results of the experiment. Concluding remarks are provided in Sect. 5.

Quantile-Frequency Analysis
The traditional power spectrum, or periodogram (PER), of a time series is the Fourier transform of the sample autocovariance function of the time series; it is used in many applications, but can be inadequate for characterizing signals with nonlinear properties because of its exclusive reliance on the second-order statistics [8,10]. Alternative spectral analysis methods have been developed to compensate for this deficiency, including the polyspectra derived from higher-order moments [10][11][12][13]. The recently developed quantile-frequency analysis (QFA) method [7][8][9] represents another effort aimed at enriching the capability of spectral analysis. Inspired by the power of quantile regression [22] and in recognition of the fact that the power spectrum can be reconstructed from least-squares regression with a trigonometric regressor, the basic idea of QFA is to replace the least-squares solution in this reconstruction procedure by a quantile regression solution obtained with a range of quantile levels. Specifically, given a time series {y_t: t = 1, ..., n}, a frequency ω ∈ (0, π), and a quantile level α ∈ (0, 1), the quantile regression problem takes the form

(λ̂_n(ω, α), β̂_n(ω, α)) := argmin_{λ ∈ ℝ, β ∈ ℝ²} Σ_{t=1}^n ρ_α(y_t − λ − x_t^⊤(ω) β),   (1)

where x_t(ω) := [cos(ωt), sin(ωt)]^⊤ and ρ_α(y) := y(α − I(y < 0)) is the check function of quantile regression. The quantile periodogram can then be defined from the regression coefficients as

q_n(ω, α) := (n/4) ‖β̂_n(ω, α)‖²,   (2)

or, alternatively (up to a scaling convention in [8]), from the reduction of the objective function achieved by the trigonometric regressor:

q̃_n(ω, α) := Σ_{t=1}^n ρ_α(y_t − λ̂_n(α)) − Σ_{t=1}^n ρ_α(y_t − λ̂_n(ω, α) − x_t^⊤(ω) β̂_n(ω, α)),   (3)

where λ̂_n(α) denotes the sample α-quantile of the series.
In typical applications, the quantile periodograms are evaluated at the so-called Fourier frequencies where ω takes the form 2π k/n with k being an integer. At these frequencies, the traditional power spectrum can be reconstructed by the same procedure (1)-(3) when ρ α (y) in (1) is replaced by the squared loss 1 2 y 2 to form a least-squares regression. In the special case of α = 0.5, the quantile regression problem (1) becomes a least-absolute-deviations (LAD) problem, and the resulting quantile periodogram is called Laplace periodogram [7]. The LAD method in general is regarded as a robust alternative to the least-squares method [23][24][25]. By varying the parameter α between 0 and 1 in (1), one can steer the quantile regression solution to focus on any quantile in the entire range of the signal and produce the best trigonometric fit that captures its oscillatory property around this quantile. The quantile periodogram in (3) tends to exhibit a smoother sample path with respect to ω, which is desirable for locating spectral peaks [24]; the quantile periodogram in (2) can be easily generalized to construct biquantile cross-periodograms for single time series and quantile cross-periodograms for multiple time series [9,26].
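To make the construction concrete, the following minimal sketch computes a coefficient-based quantile periodogram by direct minimization of the check-function objective. The solver choice (SciPy's Nelder-Mead) and the function names are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(r, alpha):
    # Quantile-regression check function: sum of rho_alpha(r) = r * (alpha - I(r < 0))
    return np.sum(r * (alpha - (r < 0)))

def quantile_periodogram(y, omega, alpha):
    """Coefficient-based quantile periodogram: fit y_t ~ lambda + a*cos(omega*t)
    + b*sin(omega*t) by quantile regression, then return (n/4)*(a^2 + b^2)."""
    n = len(y)
    t = np.arange(1, n + 1)
    X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])

    def objective(theta):
        lam, a, b = theta
        return check_loss(y - lam - X @ np.array([a, b]), alpha)

    # Start from the sample alpha-quantile with zero trigonometric coefficients
    theta0 = np.array([np.quantile(y, alpha), 0.0, 0.0])
    res = minimize(objective, theta0, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 10000})
    _, a, b = res.x
    return n / 4.0 * (a * a + b * b)
```

A sinusoid buried in noise then produces a pronounced peak at its frequency for an interior quantile level, with the off-peak ordinates remaining small.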
Besides representing the strength of a signal's trigonometric components at different quantiles across frequencies, the quantile periodograms also possess important statistical properties when the time series is viewed as a sample of a generative random process.
More precisely, let {Y_t} be a strictly stationary random process with a continuous marginal distribution function F(y) := Pr{Y_t ≤ y} and bivariate distribution functions F_τ(y, y′) := Pr{Y_t ≤ y, Y_{t+τ} ≤ y′} (τ = ±1, ±2, ...). Then, under suitable conditions [7][8][9] and with λ_α := F^{−1}(α) denoting the α-quantile of {Y_t}, the quantile periodograms defined by (2) and (3) for fixed ω and α have an asymptotic exponential distribution as n → ∞, and the mean of this exponential distribution takes the form

q(ω, α) = η²(α) f(ω, α),   (4)

where η²(α) := α(1 − α)/ḟ²(λ_α), with ḟ(·) denoting the density of F(·), and where f(ω, α) equals the normalized power spectrum of the binary process {I(Y_t ≤ λ_α)}, regardless of how the quantile periodogram is defined by (2) or (3).
The function q(ω, α) in (4), called the quantile spectrum, is the counterpart of the traditional power spectral density function of a second-order stationary random process with variance σ² and autocorrelation sequence {r_τ}, which takes the form

f(ω) := σ² Σ_{τ=−∞}^{∞} r_τ exp(−iτω).

In (4), the quantity η²(α) contains information about the marginal distribution F(·), and the quantity f(ω, α) is a spectral representation of the diagonal bivariate distribution functions F_τ(λ_α, λ_α) satisfying

f(ω, α) = Σ_{τ=−∞}^{∞} r_τ(α) exp(−iτω),   r_τ(α) := {F_τ(λ_α, λ_α) − α²}/{α(1 − α)},

where {r_τ(α)} is the autocorrelation sequence of the binary process {I(Y_t ≤ λ_α)}. For fixed τ, the function C_τ(α, α′) := F_τ(λ_α, λ_{α′}) is known as a bivariate copula. According to Sklar's theorem [27], this function, and hence f(ω, α), does not depend on the marginal distribution of {Y_t}.
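Because f(ω, α) is, up to normalization, the spectrum of the clipped series I(Y_t ≤ λ_α), it can be estimated directly from data by computing the ordinary periodogram of that binary series. The sketch below illustrates this connection; the function name and the use of the empirical quantile are illustrative assumptions.

```python
import numpy as np

def clipped_periodogram(y, alpha, omega):
    """Periodogram ordinate of the binary series I(y_t <= empirical alpha-quantile),
    normalized by its theoretical variance alpha*(1 - alpha)."""
    n = len(y)
    z = (y <= np.quantile(y, alpha)).astype(float)
    z = z - z.mean()                      # remove the mean (approximately alpha)
    t = np.arange(1, n + 1)
    c = np.sum(z * np.cos(omega * t))
    s = np.sum(z * np.sin(omega * t))
    return (c * c + s * s) / (n * alpha * (1 - alpha))
```

For an i.i.d. noise series the ordinates average to about 1 across the Fourier frequencies, reflecting the flat normalized spectrum of a serially independent binary process.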
In some applications, spectral properties of the signal evolve over time. A traditional tool for analyzing such time-dependent or nonstationary properties is called the spectrogram. It comprises a sequence of short-time power spectra or periodograms (STPER) computed from the Fourier transform of signal segments carved out by a moving window. The same moving-window technique can be used to produce a sequence of short-time quantile periodograms (STQPER).
Specifically, let the s-th segment obtained from a moving window {w_1, ..., w_u} of length u < n be denoted by {w_j y_{(s−1)h+j}: j = 1, ..., u} (s = 1, ..., m), where h is the hop size between consecutive segments and u − h is the amount of overlap between two segments. Then, by applying (1) and (2) or (3) to these segments, we produce a sequence of short-time quantile periodograms q_n^(s)(ω, α) (s = 1, ..., m). Analyzing the sequence of STQPER in general constitutes what we call the short-time QFA or STQFA method. Signal classification in particular can be accomplished by training a classification algorithm developed for image sequences using the sequences of STQPER as input.
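The moving-window segmentation itself is a small amount of bookkeeping. A minimal sketch, assuming a Hanning taper and a hop size h expressed in samples (the function name and the hop parameterization are illustrative):

```python
import numpy as np

def windowed_segments(y, u, hop):
    """Split y into m overlapping segments of length u, advancing by `hop`
    samples, and taper each segment with a Hanning window {w_1, ..., w_u}."""
    w = np.hanning(u)
    m = (len(y) - u) // hop + 1
    return np.stack([w * y[s * hop : s * hop + u] for s in range(m)])
```

For example, for a signal of length n = 258 with u = 32, advancing the window by 8 samples (a quarter of its length, an assumed hop) yields m = 29 segments, matching the number of STQPER arrays per signal used in our experiment.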

Data and Experiment
The objective of our experiment is twofold. First, we would like to assess the benefit of quantile periodograms over traditional power spectra as spectral features when coupled with DL classifiers for a classification task in the NDE application. Second, we would like to investigate the sensitivity of the DL classifiers to configurations and learning parameters when leveraged to maximize the benefit of these spectral features for classification.
Our experiment is based on a set of ultrasound signals available at https://www.math.umd.edu/~bnk/DATA/. This data set was produced by the NASA Langley Research Center for an NDE study on the structural integrity of aircraft panels made of adhesively bonded aluminum layers [3]. It comprises 708 pulse-echo ultrasound signals of length n = 258 acquired by a broadband transducer with a center frequency of 3.5 MHz and sampled at 25 MHz.
Of these signals, 324 are labeled "bond" and 384 "disbond". According to the data description, the "bond" signals are from fabricated aluminum lap joints and an actual Boeing 737 aircraft panel. The top or outer layer of all lap-joint specimens is 1 mm (40 mils) thick; the second aluminum layer bonded to the top layer varies in thickness across specimens, from 0.5 mm (20 mils) to 2.29 mm (90 mils); the thickness of the epoxy layer bonding the specimens also varies across specimens, from 0.1 mm to 0.4 mm. The aircraft panel likewise has an outer aluminum layer 1 mm thick, and its "bond" signals are acquired not at lap-joint locations but at doubler-joint locations, which contain two layers of aluminum bonded together with a very thin layer of epoxy. The "disbond" signals from fabricated specimens are acquired at locations with a single layer of aluminum 1 mm (40 mils) thick as well as at material-thinned locations with thickness between 36 and 40 mils. The "disbond" signals from the aircraft panel are acquired at locations corresponding to a single layer of aluminum with paint on top and a corrosion-resistant coating on the inside, ranging from 0.2 mm to 0.3 mm in thickness. These signals have considerable intra-class variability due to various factors, including the thicknesses of the bond layers, epoxy, paint, and coating, and the amount of material loss.
The superiority of the quantile periodograms over the power spectra for classifying these signals has been demonstrated in a recent publication [15]. It is based on a method which couples QFA with functional principal component analysis (FPCA) to reduce the dimension of features before simple linear and quadratic classifiers are used for classification. The experiment in this paper further expands the previous study by investigating whether or not the superiority remains when more sophisticated and powerful classifiers such as MLP and CNN are applied directly to quantile periodograms and power spectra without a separate preprocessing step for dimension reduction. The MLP classifier, illustrated by Fig. 2, takes a flattened QPER array as input in a straightforward way and produces a binary class label as output. We used a two-layer MLP with the sigmoid function as activation of the output nodes, and treated the numbers of hidden nodes, H 1 and H 2 , and their activation function, σ MLP , as experimental parameters for performance comparison.
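The computation performed by such a two-layer MLP is simple to state. The sketch below gives its forward pass on a flattened QPER array; the weight names and shapes are illustrative assumptions, and training (back-propagation against a binary cross-entropy loss) is left to a framework such as Keras-TensorFlow, as in our experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2, w3, b3, act=np.tanh):
    """Two-hidden-layer MLP: flattened QPER array -> probability of class 1.
    act plays the role of sigma_MLP; the output node uses the sigmoid."""
    h1 = act(x @ W1 + b1)          # first hidden layer, H1 nodes
    h2 = act(h1 @ W2 + b2)         # second hidden layer, H2 nodes
    return sigmoid(h2 @ w3 + b3)   # sigmoid output node -> binary class score
```

Thresholding the output at 0.5 yields the predicted "bond"/"disbond" label.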
The CNN classifier prepends the MLP classifier with a convolution (or filtering-pooling) network, illustrated in Fig. 3, which is optimized jointly for feature learning. This classifier takes the QPER array as an image and renders it into a feature vector to feed the MLP network for classification. We used a two-layer convolution network with 2 × 2 windows for maximum pooling and 5 × 5 filters for feature map extraction, and treated the numbers of filters, F 1 and F 2 , and their activation function, σ FIL , as experimental parameters together with the experimental parameters for the MLP layers.
For each signal in the data set, a 128 × 45 QPER array was created according to (3) with frequencies 2πk/258 (k = 1, ..., 128) and quantile levels 0.06, 0.08, ..., 0.94. The entire set of 708 signals was divided randomly into five (approximately) equal-sized subsets to facilitate a five-fold cross-validation (CV) scheme for the assessment of out-of-sample classification accuracy. These QPER arrays, with class labels and CV fold indices, are available at https://github.com/IBM/qfa. For each fold of the five-fold CV scheme, the MLP and CNN classifiers were trained by a Keras-TensorFlow procedure with an 80-20% training-validation split, and the percentage of correct classification was computed on the remaining data not involved in training. The average of these percentages over the folds was used as the final measure of out-of-sample test accuracy for performance comparison. The classifiers were trained with the Adam optimizer in TensorFlow with batch size 50; the Glorot Normal method [28] was used to initialize all weights; an early stop rule was enabled with the patience parameter set to 2, the factor for learning rate reduction to 0.5, and the maximum number of epochs to 500. Python code for this procedure in a Jupyter Notebook is available at https://github.com/IBM/qfa.
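The random fold assignment underlying the CV scheme can be sketched as follows; the seed and function name are illustrative assumptions.

```python
import numpy as np

def cv_folds(n_samples, n_folds=5, seed=0):
    """Randomly partition sample indices into n_folds approximately
    equal-sized folds for cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return [np.sort(fold) for fold in np.array_split(perm, n_folds)]
```

Each fold serves once as the held-out test set while the remaining data are split 80-20% into training and validation sets; the final score is the average of the per-fold test accuracies.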
For comparison, we also trained and tested the MLP and CNN classifiers that take the power spectra (PER) of the NDE signals as input using the same experimental settings and learning parameters. While applying MLP to the PER is straightforward, applying CNN to the PER requires the use of one-dimensional instead of two-dimensional filters and pooling windows.
Furthermore, we considered an application of the short-time QFA or STQFA method to the NDE data in recognition that a better performance may be achieved by accommodating the possibility of time-dependence in the spectral features of these signals. In our experiment, each signal in the dataset was converted into a sequence of 29 STQPER arrays with a moving Hanning window of length u = 32 and 25% overlap, and each STQPER array was a 15 × 45 matrix produced with frequencies 2πk/32 (k = 1, ..., 15) and quantile levels 0.06, 0.08, ..., 0.94. These sequences of STQPER are available at https://github.com/IBM/qfa. As an example, Fig. 5 depicts the sequences of STQPER for the signals shown in Fig. 4.

Fig. 5 Sequences of short-time quantile periodograms (STQPER) for the bond signal (top) and the disbond signal (bottom) shown in Fig. 4
We trained two types of CNN classifiers using the sequences of STQPER as input. The first classifier, CNN-2D, treats each sequence of STQPER as a 15 × 45 image with 29 channels and employs 2D filters for each channel. The second classifier, CNN-3D, treats each sequence of STQPER as a three-dimensional 15 × 45 × 29 array and employs 3D filters for feature extraction. For comparison, we also trained and tested the CNN classifier (with 2D filters) using the two-dimensional spectrograms (or sequences of STPER) of the NDE data as images. These 15 × 29 spectrograms were produced with the same moving window as the STQPER. Figure 6 depicts the spectrograms of the signals shown in Fig. 4.

Results
We considered various settings of the experimental parameters for the MLP classifier, including two choices of activation function, σ MLP = relu or tanh, and 36 choices for the numbers of hidden nodes (H 1 , H 2 ). Figure 7 presents the test accuracies of the MLP classifier using quantile periodograms (QPER) and the MLP classifier using ordinary periodograms (PER). For each fixed combination of activation function, learning rate, and input type, the accuracy scores are ranked from the highest (rank 1) to the lowest (rank 36) across the 36 choices of architectural parameters (H 1 , H 2 ). Figure 7 depicts the accuracy scores against their ranks.
These results show that a higher accuracy of classification can be achieved by the MLP classifier when quantile periodograms are used as input instead of power spectra. The results also show that proper tuning of the MLP classifier is essential to realize the benefit of quantile periodograms, because poor choices of architectural parameters or learning rate can lead to inferior outcomes. Moreover, for most of the experimental settings, the MLP classifier with quantile periodograms as input outperforms the QFA-FPCA method in [15], the best accuracy scores of which lie within the range of 95.6%−97.2%, depending on the classifier and the number of principal components. Having confirmed the superiority of quantile periodograms over power spectra for classifying the NDE signals, let us examine the impact of activation function, architectural parameters, and learning rate on the MLP classifier using quantile periodograms. Toward that end, consider the top five and bottom five MLP classifiers found in our experiment. Table 2 contains the test accuracy scores and architectural parameters of these classifiers together with the average number of epochs (# Epochs) for training them over the CV folds and the total computer time in minutes (Time).
For both activation functions, the higher learning rate produced better top-five accuracy scores but much poorer bottom-five scores compared to the lower learning rate. The higher learning rate tends to work better for smaller networks than larger ones, whereas the lower learning rate shows greater robustness to the choice of architectural parameters. When coupled with the higher learning rate, the tanh activation function produced the two highest accuracy scores, 98.73% and 98.45%, among all MLP classifiers in our experiment, but it also yielded lower accuracies compared to relu when the lower learning rate was used instead.
For the CNN classifier, our experimental parameter settings included two choices of activation function in the MLP layers: σ MLP = relu or tanh, 10 choices for the numbers of hidden nodes (H 1 , H 2 ) taking values in {512, 1024, 2048, 4096} with the constraint H 1 ≥ H 2 , two choices of activation function in the convolution layers: σ FIL = relu or linear, and 10 choices for the numbers of filters (F 1 , F 2 ) taking values in {32, 64, 128, 256} with the constraint F 1 ≥ F 2 . Therefore, for each combination of activation functions, there were 100 sets of architectural parameters from different choices of (F 1 , F 2 ) and (H 1 , H 2 ). We also considered two learning rates: 0.0001 or 0.00001. Figure 8 shows the test accuracies of the CNN classifier using the quantile periodograms (QPER) and the CNN classifier using the ordinary periodograms (PER). Each curve depicts the test accuracy scores against their ranks among the 100 choices of (F 1 , F 2 ) and (H 1 , H 2 ) for a fixed combination of activation function σ MLP , learning rate, and input type.
These results show that while the added feature-learning capability of the CNN classifier was able to boost the best accuracy scores for the power spectra, these scores remained inferior to the best accuracy scores achieved by the quantile periodograms. However, to realize the benefit of quantile periodograms, a proper choice of activation functions and learning rate was essential for the CNN classifier, as it was for the MLP classifier. Notably, the CNN classifier exhibited greater sensitivity to the choice of learning rate than the MLP classifier, regardless of input type. Figure 9 further compares the impacts of different choices of activation functions σ FIL and σ MLP on the CNN classifier using quantile periodograms. Table 3 contains more details of the top five and bottom five CNN classifiers for each combination of activation functions.

Fig. 8 Test accuracy of the CNN classifier using quantile periodograms (QPER) and ordinary periodograms (PER) with activation function σ FIL = relu for each combination of activation function σ MLP (relu or tanh), learning rate (0.0001 or 0.00001), and input type (QPER or PER). Results are sorted and ranked with respect to 100 choices of architectural parameters

Fig. 9 Test accuracy of the CNN classifier using quantile periodograms with learning rate 0.0001 and different combinations of activation functions (σ FIL , σ MLP ). Results are sorted and ranked as in Fig. 8
As can be seen, the highest accuracy score of 98.59% was achieved with (σ FIL , σ MLP ) = (relu,relu), but the choice of (σ FIL , σ MLP ) = (linear,relu) showed greater robustness to the architectural parameters and generally required shorter computer times. Both choices yielded much better bottom-five scores than the other combinations of activation functions. The best scores of the CNN classifier in Table 3 are comparable to the best scores of the MLP classifier in Table 2, except in the case of (σ FIL , σ MLP ) = (linear, tanh). So, using quantile periodograms, the added feature-learning capability of CNN did not offer substantial improvements over MLP, but it did require much longer computer times.
Furthermore, let us consider the results in Fig. 10 obtained from STQPER using the CNN classifier with different choices of architectural parameters. The corresponding results from spectrograms or STPER are also shown in Fig. 10 for comparison. These results were obtained with (σ FIL , σ MLP ) = (linear,relu), learning rate 0.0001, and the same 100 architectural parameters as used previously. Table 4 contains some details of the top five and bottom five classifiers using STQPER. Clearly, the CNN-3D classifiers outperform the CNN-2D classifiers by offering consistently higher accuracy scores for the same ranks across the choices of architectural parameters. This is achieved, of course, with a higher computational cost. In comparison with the CNN classifiers using spectrograms or STPER, the classifiers using STQPER not only provide consistently higher accuracy for the same ranks but also lower sensitivity to the choices of architectural parameters.
Finally, we remark that choosing the right architectural and learning parameters for DL classifiers such as MLP and CNN is known to be crucial but difficult; it often requires a great deal of experience combined with trial-and-error experimentation. Table 5 illustrates this difficulty in our experiment with the MLP and CNN classifiers using quantile periodograms as input. By treating each choice of architectural parameters as a trial, Table 5 shows the percentage of such trials for the MLP and CNN classifiers to achieve a test accuracy in different brackets above the 97.20% benchmark given by the non-DL classifiers in [15]. The percentages were calculated over 36 choices of architectural parameters for MLP and 100 choices for CNN. All classifiers were trained with learning rate 0.0001. With the exception of CNN using QPER with (σ FIL , σ MLP ) = (linear,tanh), a majority of trials for each classifier in Table 5 was able to produce an accuracy score higher than the benchmark; this majority is as high as 85.0% for the CNN classifier using QPER with (σ FIL , σ MLP ) = (linear,relu) and 92.0% for the CNN-3D classifier using STQPER with the same activation functions. On the other hand, merely 4.0−5.6% of trials for three out of six classifiers reached the top bracket with accuracy greater than or equal to 98.45%. These results suggest that the chance for an MLP or CNN classifier to outperform a non-DL classifier is fairly high, but the chance of finding the best architectural parameters can be very slim. The small chance of finding the best architectural parameters also applies to the classifiers using PER and STPER (see supplementary material). While the focus of this paper is on the frequency-domain approach using spectral features, the interested reader is referred to the supplementary material for comparisons with a time-domain method which applies the MLP and CNN classifiers to the raw time series data instead of their power spectra.

Concluding Remarks
We have introduced the quantile periodogram and QFA as a method of spectral analysis and demonstrated its superiority over the traditional power spectrum and spectrogram as features for signal classification using DL classifiers. Our experiment with the real-world NDE signals provides another piece of evidence in support of the assertion that quantile periodograms offer a richer view of time series data than power spectra. Our experiment confirms the ability of DL classifiers to take advantage of QFA in delivering more accurate classification results than some conventional techniques; it also highlights the potential difficulty in choosing the right architectural and learning parameters to fully realize this potential. Overcoming this difficulty remains an active research area.