Prediction of H-type Hypertension Based on Pulse Wave MFCC Features Using Mixed Attention Mechanism

H-type hypertension increases the risk of stroke and cardiovascular disease, posing a serious threat to human health. Pulse diagnosis in traditional Chinese medicine (TCM), combined with deep learning, can independently predict suspected H-type hypertension patients by analyzing their pulse physiological activity. However, traditional time-domain feature extraction suffers from high noise and baseline drift, which degrades classification accuracy. In this article, we propose an effective prediction method based on frequency-domain pulse wave features. First, we filter the time-domain pulse waves by removing high-frequency noise and baseline drift. Second, the Hilbert–Huang Transform is used to transform the time-domain pulse wave into a frequency-domain waveform characterized by Mel-frequency cepstral coefficients. Finally, an improved BiLSTM model combined with a mixed attention mechanism is built to predict H-type hypertension. On 337 clinical cases from Longhua Hospital affiliated to Shanghai University of TCM and the Hospital of Integrated Traditional Chinese and Western Medicine, threefold cross-validation shows that sensitivity, specificity, accuracy, F1-score and AUC reach 93.48%, 95.27%, 97.48%, 90.77% and 0.9676, respectively. In addition, we calculate the importance of both time-domain and frequency-domain features according to node purity in Random Forest and study the correlation between features and classification. The proposed model achieves better generalization performance than classical traditional models and has good reference value for TCM clinical auxiliary diagnosis.


Introduction
Hypertension with a plasma homocysteine (HCY) level greater than 10 μmol/L is defined as "H-type" hypertension [1]. According to one study, H-type hypertension accounts for about 75% of Chinese adult hypertension patients (91% of males and 60% of females) [2]. The Chinese guidelines for the prevention and treatment of hypertension indicate that the intensity of stroke caused by elevated blood pressure in the Chinese population is 1.5 times that in Western populations. A large number of studies have shown that an elevated plasma HCY level is an independent risk factor for cardiovascular and cerebrovascular diseases, and that the plasma HCY level is positively correlated with the risk of cardiovascular and cerebrovascular events. However, as with blood pressure, there is no clear cut-off value for plasma HCY [3]. To date, measuring the homocysteine content of blood is the only way to clinically diagnose H-type hypertension [4]. Therefore, a noninvasive and economical method for rapid detection of H-type hypertension is of great significance. The pulse condition in traditional Chinese medicine (TCM) contains abundant physiological and pathological information about the human body and is a window for transmitting and observing changes in internal functions [5,6]. Because a disease can often be identified from pulse waves, the pathological pulse has become an important basis for diagnosis. To improve the accuracy of pathological diagnosis, multiple artificial intelligence approaches have recently been applied to pulse diagnosis in TCM.

Related Work
In recent years, various pulse wave acquisition instruments and pulse wave feature analysis methods have been applied in TCM. The earliest pulse wave analysis methods extracted feature points via signal processing, such as wavelet-transform-based zero-crossing detection [7], amplitude thresholding with sliding-window localization of the main peak [8], and the time-domain differential period ratio [9]. However, because of their heavy workload and low recognition rate, these methods could not capture the global information of the pulse wave. Luo et al. [10] applied AdaBoost to hypertension prediction based on time-domain pulse waves and reached a classification accuracy of 86.41%. Feng and Li [11] used fuzzy C-means clustering to classify frequency-domain pulse wave characteristics. Zhang et al. [12] used random forest (RF) to reduce the feature dimension of pulse waves and then applied SVM classification, improving classification accuracy by 10%. However, these methods did not model the sequential characteristics of the pulse wave. With the advancement of deep learning, convolutional neural networks (CNN) have been widely used in image processing. Zhang [13] proposed a dimension-extension preprocessing for CNNs that used sample statistical features and the Hilbert–Huang transform to extend the input dimension and speed up training. Liu and Zhou [14] extracted single-period and multi-period pulse wave features with a CNN and combined them with frequency-domain features, achieving a classification accuracy of 93%. Yan et al. [15] transformed pulse waves into thresholdless recurrence plots for classification with a VGG-16 network, reaching an accuracy of 98.14%, which offered a new perspective on pulse wave feature extraction. However, most of the approaches above classify only the time-domain waveform without considering the frequency-domain characteristics of the pulse wave.
In this article, we propose an effective frequency-domain pulse wave classification model for H-type hypertension using a mixed attention mechanism. The filtered time-domain pulse wave is transformed into frequency-domain Mel-frequency cepstral coefficients (MFCC), and the mixed attention mechanism is applied to extract locally and globally relevant pulse wave features. Experiments show that the proposed model excels in classification accuracy and generalization performance.

Filtering
In clinical pulse wave sampling, external interference, the subject's breathing, slight body movements and similar factors cause the collected signal to differ from the actual pulse, which results in high-frequency noise and baseline drift [16][17][18]. The wavelet transform is usually used to reduce high-frequency noise: its orthogonality, direction selectivity and variable time-frequency resolution allow it to locate abrupt change points of the signal on the time axis and filter out the high-frequency noise of the pulse wave. Common methods for removing baseline drift from the pulse wave include the wavelet transform (WT), empirical mode decomposition (EMD) and the smoothness priors approach (SPA) [19]. WT and EMD generally require tuning several parameters, and when the interference occupies a wide frequency band the filtering parameters are difficult to set. SPA adjusts its frequency response only through a single smoothing parameter, which makes filtering fast.
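A minimal sketch of SPA, assuming the standard smoothness-priors formulation in which the smooth trend is the solution of a regularized least-squares problem and the effective signal is what remains after subtracting it. The λ² weighting of the second-difference penalty is an assumption; some implementations weight it by λ instead, which changes the λ needed for a given cut-off.

```latex
$$
\hat{z} = \arg\min_{z}\; \lVert y - z \rVert^2 + \lambda^2 \lVert D_2 z \rVert^2
        = \left(I + \lambda^2 D_2^{\top} D_2\right)^{-1} y,
\qquad
p = y - \hat{z} = \left(I - \left(I + \lambda^2 D_2^{\top} D_2\right)^{-1}\right) y
$$
$$
D_2 = \begin{pmatrix}
1 & -2 & 1 & 0 & \cdots & 0\\
0 & 1 & -2 & 1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & 1 & -2 & 1
\end{pmatrix} \in \mathbb{R}^{(N-2)\times N}
$$
$$
\text{trend gain}(f) \approx \frac{1}{1 + 16\lambda^2 \sin^4(\pi f / f_s)},
\qquad
f_c \approx \frac{f_s}{2\pi\sqrt{\lambda}}
$$
```

Under this formulation, λ = 2500 and f_s = 200 Hz give f_c ≈ 0.64 Hz at the half-gain point, the same order as the 0.5 Hz cut-off quoted in the text; the exact figure depends on the cut-off criterion and on whether λ or λ² is used.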
In SPA, the effective pulse wave signal p is obtained from the original pulse wave signal y using the identity matrix I (of the same size as the observation) and the second-order difference matrix D_2, weighted by a regularization parameter λ; different values of λ give different filtering properties. The baseline drift frequency is 0.2–0.3 Hz and the sampling frequency is 200 Hz. At this sampling frequency, λ = 2500 gives a cut-off frequency of 200 × 0.0025 Hz = 0.5 Hz, which effectively removes baseline drift below 0.5 Hz from the original pulse waveform. The signal-to-noise ratio (SNR) and root mean square error (RMSE) are used as evaluation indicators of pulse wave denoising: the larger the SNR and the smaller the RMSE, the better the denoising effect. In these indicators, p̂(n) represents the original signal, p(n) denotes the signal after removal of baseline drift, and N is the length of the original signal.
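The detrending and the two evaluation indicators can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' code: it uses the λ² smoothness-priors form, a dense linear solve (adequate for a few thousand samples; a banded solver scales better), and the conventional definitions SNR = 10·log10(Σp̂² / Σ(p̂ − p)²) and RMSE = √(Σ(p̂ − p)² / N). The synthetic 1.2 Hz "pulse" and 0.25 Hz drift are made up for the demo.

```python
import numpy as np


def second_difference_matrix(n):
    """(n-2) x n matrix D2 whose rows apply the stencil [1, -2, 1]."""
    d2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        d2[i, i:i + 3] = (1.0, -2.0, 1.0)
    return d2


def spa_detrend(y, lam):
    """Smoothness-priors detrending: p = (I - (I + lam^2 D2^T D2)^-1) y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d2 = second_difference_matrix(n)
    # Solve for the smooth trend instead of forming the matrix inverse explicitly.
    trend = np.linalg.solve(np.eye(n) + lam**2 * (d2.T @ d2), y)
    return y - trend


def snr_db(p_hat, p):
    """SNR in dB between the reference signal p_hat(n) and the filtered signal p(n)."""
    p_hat, p = np.asarray(p_hat, dtype=float), np.asarray(p, dtype=float)
    return 10.0 * np.log10(np.sum(p_hat**2) / np.sum((p_hat - p) ** 2))


def rmse(p_hat, p):
    """Root mean square error over the N samples."""
    p_hat, p = np.asarray(p_hat, dtype=float), np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((p_hat - p) ** 2)))


# Synthetic example: a 1.2 Hz "pulse" plus 0.25 Hz baseline drift, 8 s at 200 Hz.
fs = 200
t = np.arange(8 * fs) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
drift = 0.5 * np.sin(2 * np.pi * 0.25 * t)
raw = pulse + drift
filtered = spa_detrend(raw, lam=2500)
print(rmse(pulse, raw), rmse(pulse, filtered))
```

Comparing against the clean synthetic pulse makes the effect of detrending visible; on clinical data only the raw and filtered signals are available, so the indicators are computed between those two.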

MFCC Feature
Mel-frequency cepstral coefficients (MFCC) are cepstral parameters extracted in the Mel-scale frequency domain [20][21][22]. Each frame of the pulse wave is represented by a vector of Mel cepstral coefficients, and the MFCC features of consecutive frames are continuous and correlated along the pulse wave time axis. Physically, MFCC features are a set of feature vectors obtained by encoding the spectral envelope and details of the signal, and they represent how signal energy is distributed across frequency ranges. Traditional MFCC extraction transforms the time-domain signal into the frequency domain with the Fourier transform. The Fourier transform is a global transform and works well for stationary signals. However, the pulse wave is non-stationary, and the Fourier spectrum cannot fully describe the local frequency characteristics of the signal, which limits classification performance on pulse waves of patients with H-type hypertension. Therefore, this article proposes an improved MFCC feature extraction, whose detailed process is shown in Fig. 1.
In the pre-emphasis step, a Gaussian filter was applied; in the framing (sampling) step, 256 sampling points were taken as one observation unit so that each observation unit contains at least one pulse wave period. In the windowing step, we selected the Hamming window to improve continuity at the left and right ends of each frame and to reduce reconstruction error. The pulse wave signal changes abruptly on the time scale, so traditional EMD cannot effectively separate the Intrinsic Mode Function (IMF) components by characteristic scale, nor clearly reflect the intrinsic characteristics of the pulse wave; applying the EMD sifting process directly to the pulse wave produces mode mixing. In this article, we add adaptive white noise in the pulse wave decomposition stage, so that the superimposed noise helps separate pulse wave components at different time scales and is then cancelled, which also eliminates the reconstruction error caused by the added noise. This preserves the decomposition accuracy of the pulse wave and reduces mode mixing. The noise-added pulse wave signal set is formed by adding independent Gaussian white-noise realizations to the original pulse wave.
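The framing and windowing steps can be sketched as below. The 256-sample frame length comes from the text; the 50% overlap (hop of 128 samples) is an assumption, and the Gaussian pre-filter is omitted. At 200 Hz a 256-sample frame spans 1.28 s, which covers one full period for heart rates above roughly 47 beats per minute (60 / 1.28 ≈ 46.9).

```python
import numpy as np


def frame_and_window(x, frame_len=256, hop=128):
    """Split a 1-D pulse wave into overlapping frames and apply a Hamming window.

    Returns an array of shape (n_frames, frame_len). Signals shorter than one
    frame are zero-padded to a single frame. hop=128 is an assumed overlap.
    """
    x = np.asarray(x, dtype=float)
    if x.size < frame_len:
        x = np.pad(x, (0, frame_len - x.size))
    n_frames = 1 + (x.size - frame_len) // hop
    # Row r holds sample indices r*hop ... r*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    window = np.hamming(frame_len)  # tapers both ends of each frame
    return x[idx] * window


fs = 200
signal = np.sin(2 * np.pi * 1.2 * np.arange(10 * fs) / fs)  # 10 s at 200 Hz
frames = frame_and_window(signal)
print(frames.shape)  # (14, 256)
```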
Here g_i(t) (i = 1, 2, …, I) denotes the i-th Gaussian white-noise realization, and I is the total number of noise-added pulse wave instances.
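A minimal mathematical sketch of the noise-assisted decomposition and the Hilbert marginal spectrum in standard HHT notation. The noise amplitude ε, the IMF symbols c_k(t), the residual r_K(t), the number of IMFs K and the instantaneous amplitude/phase symbols a_k(t), θ_k(t) are introduced here for illustration and are assumptions. In adaptive-noise variants, each IMF is typically obtained by averaging the corresponding modes extracted from the I noise-added copies, which cancels the added noise.

```latex
$$x_i(t) = x(t) + \varepsilon\, g_i(t), \qquad i = 1, 2, \dots, I$$
$$x(t) = \sum_{k=1}^{K} c_k(t) + r_K(t)$$
$$z_k(t) = c_k(t) + j\,\mathcal{H}[c_k](t) = a_k(t)\, e^{j\theta_k(t)}, \qquad \omega_k(t) = \frac{d\theta_k(t)}{dt}$$
$$H(\omega, t) = \sum_{k=1}^{K} a_k(t)\, \delta\!\left(\omega - \omega_k(t)\right), \qquad h(\omega) = \int_0^{T} H(\omega, t)\, dt$$
```

Here x(t) is the original pulse wave, 𝓗 denotes the Hilbert transform, and sampling the marginal spectrum h(ω) on a discrete frequency grid gives the spectrum that the Mel filter bank is applied to.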
Via EMD with adaptive noise, the pulse waveform can be decomposed into IMF components of each order and a corresponding residual, such that the waveform equals the sum of its IMFs plus the residual. The marginal spectrum, which serves as the frequency-domain feature of the pulse wave, is obtained from the IMF components by the Hilbert–Huang transform (HHT) followed by integration over time. Traditional MFCC extraction via the Fourier transform cannot reflect how the pulse wave changes within a given period of time or which frequencies are present at a given moment, because the Fourier transform is suited only to stationary signals, not to non-stationary signals such as pulse waves. In this article, the improved HHT is applied to capture the non-stationary characteristics of the pulse wave. Mel triangular filter banks are then applied to smooth the frequency-domain features, eliminating harmonic effects and highlighting resonance peaks. Each Mel triangular filter has a triangular frequency response that rises from zero at its lower edge to a peak at its center frequency and falls back to zero at its upper edge, with the center frequencies spaced uniformly on the Mel scale.
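Given IMFs from any EMD variant (the decomposition itself is not implemented here), the Hilbert marginal spectrum can be approximated numerically as below: each IMF's instantaneous amplitude is accumulated into the frequency bin of its instantaneous frequency and summed over time with step 1/fs. The FFT-based analytic signal mirrors the construction used by scipy.signal.hilbert; the 128-bin frequency grid and the synthetic IMFs are assumptions for illustration.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal x + j*Hilbert(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negatives.
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def marginal_spectrum(imfs, fs, n_bins=128):
    """Accumulate instantaneous amplitude of every IMF into frequency bins over time."""
    edges = np.linspace(0.0, fs / 2, n_bins + 1)
    spectrum = np.zeros(n_bins)
    for imf in np.atleast_2d(imfs):
        z = analytic_signal(imf)
        amplitude = np.abs(z)[1:]
        phase = np.unwrap(np.angle(z))
        inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz, one value per sample step
        valid = (inst_freq >= 0) & (inst_freq < fs / 2)
        bins = np.digitize(inst_freq[valid], edges) - 1
        np.add.at(spectrum, bins, amplitude[valid] / fs)  # time integration, dt = 1/fs
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, spectrum


fs = 200
t = np.arange(4 * fs) / fs
imfs = np.vstack([np.sin(2 * np.pi * 10 * t), 0.3 * np.sin(2 * np.pi * 40 * t)])
centers, spec = marginal_spectrum(imfs, fs)
print(centers[np.argmax(spec)])
```

Unlike an FFT magnitude spectrum, the amplitude here is attributed to the frequency each IMF actually has at each instant, which is why the marginal spectrum suits non-stationary signals.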
The logarithmic energy of each filter output is then calculated, and applying the discrete cosine transform (DCT) to these log energies yields the MFCC features of the pulse waveform.
Here H(k) denotes the marginal spectrum signal fed to the filter bank, and L is the MFCC order, set to 12.
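The filter bank, log-energy and DCT steps can be sketched as follows, applied to per-frame marginal spectra H(k) sampled on a frequency grid. Several choices are assumptions rather than the article's settings: 24 Mel filters, the common Mel formula 2595·log10(1 + f/700), DCT indices n = 1..L that drop the 0th coefficient, and the ±2-frame regression used for the first- and second-order difference coefficients (which bring the 12 static coefficients to the 36 features ranked later in the article). The random spectra are stand-ins.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(freqs, n_filters=24):
    """Triangular filters with centers equally spaced on the Mel scale.

    freqs: (K,) frequency in Hz of each spectrum bin. Returns (n_filters, K).
    """
    freqs = np.asarray(freqs, dtype=float)
    edges = mel_to_hz(np.linspace(hz_to_mel(freqs[0]), hz_to_mel(freqs[-1]), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for m in range(n_filters):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)  # peak of 1 at center
    return bank


def mfcc_from_spectrum(spectra, freqs, n_coeffs=12, n_filters=24):
    """spectra: (frames, K) non-negative marginal spectra H(k) -> (frames, n_coeffs)."""
    energies = np.atleast_2d(spectra) @ mel_filterbank(freqs, n_filters).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    m = np.arange(n_filters)
    n = np.arange(1, n_coeffs + 1)           # coefficients 1..L (0th dropped)
    dct_basis = np.cos(np.pi * n[:, None] * (m[None, :] + 0.5) / n_filters)
    return log_energies @ dct_basis.T


def deltas(features, width=2):
    """Regression-based difference coefficients along the frame axis."""
    features = np.asarray(features, dtype=float)
    t = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:width + k + t] - padded[width - k:width - k + t])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))


rng = np.random.default_rng(3)
freqs = np.linspace(0.0, 100.0, 128)         # 0 Hz to Nyquist at fs = 200 Hz
spectra = rng.random((20, 128))              # stand-in for per-frame marginal spectra
static = mfcc_from_spectrum(spectra, freqs)  # (20, 12)
features = np.hstack([static, deltas(static), deltas(deltas(static))])
print(features.shape)  # (20, 36)
```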

Improved BiLSTM Model with Mixed Attention Mechanism
In this article, the time-domain pulse waveform is transformed into the frequency domain to extract MFCC features. A long short-term memory network (LSTM) [23,24] can learn and remember long-range dependencies in temporal pulse wave MFCC features through its gate mechanism. BiLSTM [25] improves on the traditional LSTM with two LSTM layers that process the input in the forward and reverse directions, respectively. BiLSTM combines the outputs of the two layers and extracts feature correlations from both directions, which strengthens the feature extraction of LSTM. When pulse wave data pass through the BiLSTM layer, the hidden state units expand the channel dimension of the pulse wave from one to N, and the correlation among the expanded channels is often ignored. Therefore, this article adds a channel attention mechanism to the BiLSTM model to learn the correlation of pulse wave features along the channel dimension. However, when the input sequence is overly long and the redundancy of the bidirectional input increases, the feature vectors cannot accurately express the correlation between data, and because of the length limit the model cannot retain all important information. Therefore, we add a spatial attention mechanism on top of the channel attention, which learns selectively from the input sequence, retains the intermediate results of the BiLSTM encoder and correlates them with the pulse wave sequence in the output. Together they form a mixed attention mechanism that learns the correlation of pulse wave features in both the channel and spatial dimensions. The structure diagram is shown in Fig. 2.
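To make the channel expansion concrete, a minimal NumPy forward pass of a BiLSTM is sketched below. It assumes 32 units per direction (from the hyperparameter settings), a single input feature per time step as the text describes, the standard LSTM gate equations without peephole connections, and random untrained weights. The forward and backward outputs are concatenated, so each time step grows from 1 to 64 channels.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_direction(x, W, U, b):
    """Run one LSTM direction over x of shape (T, D).

    W: (4H, D), U: (4H, H), b: (4H,); gate order is input, forget, cell, output.
    Returns the hidden states, shape (T, H).
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x_t in x:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the memory cell
        h = sigmoid(o) * np.tanh(c)                    # expose the gated cell state
        states.append(h)
    return np.stack(states)


def bilstm(x, forward_params, backward_params):
    """Concatenate forward and time-reversed LSTM outputs: (T, D) -> (T, 2H)."""
    fwd = lstm_direction(x, *forward_params)
    bwd = lstm_direction(x[::-1], *backward_params)[::-1]  # realign to original order
    return np.concatenate([fwd, bwd], axis=1)


def random_params(input_dim, hidden, rng):
    """Hypothetical untrained weights, uniform in +-1/sqrt(hidden)."""
    scale = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-scale, scale, (4 * hidden, input_dim)),
            rng.uniform(-scale, scale, (4 * hidden, hidden)),
            np.zeros(4 * hidden))


rng = np.random.default_rng(0)
x = rng.standard_normal((153, 1))  # 153 time steps, 1 feature each
out = bilstm(x, random_params(1, 32, rng), random_params(1, 32, rng))
print(out.shape)  # (153, 64)
```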
To focus on the relevance of the channel dimension, the spatial dimension information [26] is first compressed: each channel d_c, the c-th feature of the input matrix, is averaged over the input feature height h by global pooling. The channel weights are then obtained through a first fully connected layer with weights W_1 and ReLU activation, followed by a second fully connected layer with weights W_2 and sigmoid activation. Finally, the original features are recalibrated by the channel weights to obtain the weighted channel features.
Here F_rw denotes the recalibrated output, i.e. the channel-wise product of each channel weight and its feature map d_c.
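The squeeze, excitation and recalibration steps can be sketched on a BiLSTM output of shape (T, C) as below. The reduction ratio r = 4, the omission of bias terms and the random weights are assumptions; global average pooling over the T time steps plays the role of averaging each channel over the input feature height h.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(d, W1, W2):
    """Squeeze-and-excitation over channels.

    d:  (T, C) feature map; column c is the channel feature d_c
    W1: (C // r, C) first fully connected layer (followed by ReLU)
    W2: (C, C // r) second fully connected layer (followed by sigmoid)
    Returns the recalibrated features F_rw (T, C) and the channel weights (C,).
    """
    z = d.mean(axis=0)                         # squeeze: average each channel over time
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return d * s[None, :], s                   # scale: channel-wise product s_c * d_c


rng = np.random.default_rng(1)
C, r = 64, 4
d = rng.standard_normal((153, C))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
f_rw, weights = se_block(d, W1, W2)
print(f_rw.shape, weights.min() > 0, weights.max() < 1)
```

Because the weights lie in (0, 1), the block can only rescale channels, never mix them, which keeps its parameter cost small relative to the BiLSTM.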
To focus on the correlation along the spatial dimension, a multilayer perceptron (MLP) computes the correlation between the Query of a given element and each of its Keys, which yields the weight coefficient [27,28] of the Value corresponding to each Key.
The correlation scores are then normalized by SoftMax into a probability distribution in which the weights of all elements sum to 1; through this normalization, SoftMax also emphasizes the weights of important elements.
Finally, we take the weighted sum of the Values to get the final Attention Value.
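One common way to realize the MLP scoring, SoftMax normalization and weighted sum is additive (Bahdanau-style) attention, sketched below. The tanh scoring function, the attention size A, the random weights and the use of the mean hidden state as the Query are assumptions; the article does not specify them.

```python
import numpy as np


def additive_attention(query, keys, values, Wq, Wk, v):
    """query: (Dq,), keys: (T, Dk), values: (T, Dv); Wq: (A, Dq), Wk: (A, Dk), v: (A,).

    Returns the attention value (Dv,) and the attention weights (T,).
    """
    scores = np.tanh(keys @ Wk.T + Wq @ query) @ v  # one-hidden-layer MLP score per key
    weights = np.exp(scores - scores.max())         # SoftMax, shifted for numerical stability
    weights /= weights.sum()                        # weights are positive and sum to 1
    return weights @ values, weights                # weighted sum of the Values


rng = np.random.default_rng(2)
T, D, A = 153, 64, 16
h = rng.standard_normal((T, D))  # e.g. BiLSTM outputs used as keys and values
query = h.mean(axis=0)           # assumption: mean hidden state as the query
context, w = additive_attention(query, h, h,
                                rng.standard_normal((A, D)) * 0.1,
                                rng.standard_normal((A, D)) * 0.1,
                                rng.standard_normal(A))
print(context.shape, round(float(w.sum()), 6))
```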
Because the global pooling layer may discard some important characteristics, a global attention mechanism is added to supply auxiliary information. The overall architecture of the proposed model is shown in Fig. 3, and the model training pseudocode is shown in Table 1.

Hyperparameters and Evaluation Indexes
Clinical pulse wave sampling is affected by multiple factors and sometimes yields incomplete data, so MFCC feature lengths remain inconsistent even after preprocessing. Therefore, the maximum MFCC feature length of 153 is taken as the standard, and shorter sequences are zero-padded. The padded MFCC features are used as the network input. The BiLSTM layer has 32 units with a dropout rate of 0.1, and the model is trained with the Adam optimizer with an initial learning rate α = 0.001 and exponential decay rates β1 = 0.9 and β2 = 0.999. The batch size is set to 32.
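The zero-padding and the optimizer settings can be sketched as follows. Truncating sequences longer than 153 frames, padding at the end rather than the start, and the ε = 1e-8 term in the Adam update are assumptions; the remaining values come from the text. The update shows how β1 and β2 act as exponential decay rates for the running first and second moments of the gradient.

```python
import numpy as np

HYPERPARAMS = {
    "max_len": 153, "bilstm_units": 32, "dropout": 0.1,
    "learning_rate": 0.001, "beta1": 0.9, "beta2": 0.999, "batch_size": 32,
}


def pad_mfcc(features, max_len=HYPERPARAMS["max_len"]):
    """Zero-pad (or truncate) a (frames, n_features) MFCC matrix to max_len frames."""
    features = np.asarray(features, dtype=float)
    padded = np.zeros((max_len, features.shape[1]))
    n = min(max_len, features.shape[0])
    padded[:n] = features[:n]
    return padded


def adam_step(theta, grad, m, v, t, hp=HYPERPARAMS, eps=1e-8):
    """One Adam update with bias correction; t counts steps starting from 1."""
    m = hp["beta1"] * m + (1 - hp["beta1"]) * grad       # decayed mean of gradients
    v = hp["beta2"] * v + (1 - hp["beta2"]) * grad**2    # decayed mean of squared gradients
    m_hat = m / (1 - hp["beta1"] ** t)
    v_hat = v / (1 - hp["beta2"] ** t)
    theta = theta - hp["learning_rate"] * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


x = pad_mfcc(np.ones((120, 36)))
print(x.shape, x[119:121, 0])  # (153, 36) [1. 0.]
theta, m, v = adam_step(np.array(0.5), np.array(2.0), 0.0, 0.0, t=1)
print(theta)  # about 0.499: the first step moves by roughly the learning rate
```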

Filtering Process
Filtering the original pulse wave effectively reduces high-frequency noise and baseline drift. In this article, wavelet bases of different families and orders are applied to denoise the original pulse signal. Figure 5 shows the SNR and RMSE after denoising as functions of the wavelet order N for the db, sym and coif wavelet bases, and Table 2 lists the corresponding values. The sym7 wavelet gives the maximum SNR (45.5407) and the minimum RMSE (0.03723), so sym7 is selected as the wavelet basis. Symlet (sym) wavelet bases are more symmetric than db and coif, which effectively reduces phase distortion and noise when the pulse signal is reconstructed. The sym7 wavelet has a support width of 13 and 7 vanishing moments; with it, the pulse wave shows good regularity, its energy is concentrated, and the boundary effect of the wavelet transform is effectively reduced (Fig. 6). Figure 7 shows the approximation and detail components of each level in a single cycle after wavelet decomposition. Figure 7a shows the approximation components (low-frequency coefficients) of the pulse wave; the approximation components at different levels reflect the variation of the low-frequency signal. Figure 7b shows the detail components (high-frequency coefficients); by processing the detail components before reconstruction, the high-frequency signal and noise can be eliminated. As shown in Fig. 7b, high-frequency information exists in the first three levels of detail components at some sampling points of the pulse wave. The detail components at levels 1 and 3 correspond to high-frequency interference from 20 to 120 Hz, and the detail components at level 2 correspond to 50 Hz interference. The fourth and fifth detail components mainly contain the pulse wave signal.
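Soft thresholding, which the text applies to the first three detail levels, can be sketched as below. The article does not state how the threshold is chosen; the universal threshold σ√(2 ln n), with σ estimated from the median absolute deviation of the detail coefficients, is an assumption. In practice the sym7 decomposition and reconstruction could be done with PyWavelets (pywt.wavedec and pywt.waverec), which also offers pywt.threshold(..., mode='soft'); only the thresholding rule is written out here.

```python
import numpy as np


def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr; values with |c| <= thr become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)


def universal_threshold(detail):
    """sigma * sqrt(2 ln n), with sigma estimated robustly from the detail coefficients."""
    detail = np.asarray(detail, dtype=float)
    sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(detail.size))


rng = np.random.default_rng(4)
detail = 0.05 * rng.standard_normal(512)  # stand-in for noisy level-1 detail coefficients
detail[100] = 2.0                         # one genuine large coefficient
cleaned = soft_threshold(detail, universal_threshold(detail))
print(np.count_nonzero(cleaned), cleaned[100])
```

Unlike hard thresholding, the soft rule also shrinks the surviving coefficients by the threshold, which avoids discontinuities at ±thr and tends to give smoother reconstructions.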
The decomposition coefficients of the first three detail levels are processed by soft thresholding and then reconstructed, which improves the removal of high-frequency noise. Figure 6 shows the comparative results of various baseline drift removal methods. According to the analysis, for the pulse wave instances of H-type hypertension, the SNR of SPA increased by 4.52% and the RMSE decreased …
Figure 8 shows the filtering process, i.e. removal of high-frequency noise and baseline drift. The filtering reduces the average amplitude of the pulse wave, eliminates extreme points, and maintains the stable periodicity of the pulse wave. Figure 9 shows the time-domain pulse wave of one frame together with the IMF components and residuals of each order obtained via EMD with adaptive noise; the higher the order, the lower the frequency. The Fourier transform can only reflect high-frequency information, whereas EMD with adaptive noise yields frequency information at different scales, …
Fig. 9 Comparison of pulse wave decomposition via FFT and EMD with adaptive noise
Fig. 10 Comparison of performance for various models
Meanwhile, Improved MFCC achieves higher classification accuracy and generalization performance than MFCC. Table 3 shows the classification results of various models. For the traditional RF model, compared with MFCC, Improved MFCC increases recall by 0.78% but lowers precision by 2.45%. This is because the RF model is highly sensitive to class imbalance: its predictions are biased toward the majority non-H-type class, which leads to high precision. Nevertheless, compared with MFCC, the F1-score of Improved MFCC improves by 0.45%. This is because we add adaptive white noise, apply the HHT and obtain the marginal spectrum, which reduces mode mixing and the influence of short-lived, low-amplitude high-frequency components.
In addition, the method accurately reflects the actual frequencies of the non-stationary pulse wave and makes the MFCC coefficients more representative, which improves classification performance. Figure 11 shows the ROC and PR curves of the various models and methods. For RF and SVM as well as deep learning models such as BiLSTM, the Improved MFCC method outperforms MFCC, Handled Pulse and Original Pulse on the evaluation indexes. Therefore, Improved MFCC has better classification accuracy and generalization performance. It follows that the filtered frequency-domain MFCC features are more discriminative than time-domain features, effectively reduce mode mixing, and enhance classification performance.

Ablation Study
To evaluate the attention mechanisms, we conducted an ablation study by adding spatial and channel attention blocks to the BiLSTM model separately. Figure 12 shows the performance of the different models, Table 4 lists the evaluation indexes of the BiLSTM models with different attention mechanisms, and Figure 13 shows their ROC and PR curves. Compared with BiLSTM, the Accuracy, Recall, Precision, F1-score, AUC and AP of the BA model (BiLSTM with spatial attention) increase by 4.73%, 13.18%, 0.71%, 8.58%, 4.45% and 2.73%, respectively, so BA classifies better than BiLSTM. This is because the spatial attention mechanism effectively extracts important location features of the pulse wave, provides more useful information to the fully connected layer, and improves the generalization performance of the model. Compared with BiLSTM, the Accuracy, Recall, Precision, F1-score, AUC and AP of BSE (BiLSTM with SE block) improve by 4.44%, 11.63%, 1.91%, 8.07%, 4.25% and 3.12%, respectively, so BSE also classifies the frequency-domain MFCC features of the pulse wave better. This is because the channel attention mechanism assigns different weights to different channels, which enhances the feature extraction ability of each channel and thereby improves classification performance.
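For reference, the evaluation indexes compared above can be computed from binary labels and predicted scores as sketched below. Treating H-type as the positive class (label 1), the 0.5 decision threshold and the toy labels are assumptions; the AUC uses the rank (Mann–Whitney) formulation, with ties counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics with H-type (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "f1": (2 * precision * sensitivity / (precision + sensitivity)
               if precision + sensitivity else 0.0),
    }


def roc_auc(y_true, scores):
    """AUC = probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 0.75
```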
According to Table 4, the BSEA model (BiLSTM with SE-block attention followed by spatial attention) applies channel attention first and then spatial attention, and its F1-score reaches 0.9006. BASE (BiLSTM with Attention and SE block) applies spatial attention first and then channel attention, and its F1-score is 0.8871. BSEA thus outperforms BASE, showing that the order in which channel and spatial attention are added to BiLSTM affects classification performance. Data compression via global average pooling may lose some detailed features. Therefore, after the channel attention mechanism, we also add a global spatial attention mechanism to supplement the lost features. Adding this global spatial attention to BSEA gives the BSEAA model. Compared with BSEA, BSEAA increases Accuracy by 0.6% to the highest value of 0.9348, Recall by 0.77%, Precision by 0.58%, F1-score by 0.71%, AUC by 1.08% and AP by 1.2%. Adding global spatial attention to BASE gives BASEA (BiLSTM with Attention, SE block and Additional Attention). Compared with BASE, BASEA increases Accuracy by 1.19%, decreases Recall by 0.77%, and increases Precision by 2.82%, F1-score by 1.46%, AUC by 0.29% and AP by 0.32%. Therefore, adding global spatial attention on top of BiLSTM effectively improves overall classification performance: the global pooling layer of the channel attention mechanism ignores some important features, and the parallel global attention mechanism supplements, to a certain extent, the features lost by global pooling.
Fig. 13 The ROC and PR curves of different models
Compared with BASEA, BSEAA increases Accuracy by 0.6%, decreases Recall by 0.78%, and increases Precision by 2.62%, F1-score by 0.6%, AUC by 1.04% and AP by 1.1%. Therefore, BSEAA achieves the best classification accuracy and generalization performance among the improved BiLSTM-based models. Figure 14 shows the training time and number of parameters under threefold cross-validation for the BiLSTM models with different attention mechanisms. Adding the channel attention mechanism to BiLSTM increases the number of parameters by only 66 (0.55%) while increasing the F1-score by 8.07%, so in cost-benefit terms channel attention is an economical addition. Spatial attention has higher time and space complexity than channel attention: adding it improves the F1-score by 8.58%, similar to channel attention, but nearly doubles the number of parameters. Therefore, when model complexity is a concern, the channel attention mechanism is preferred for improving classification performance. Meanwhile, as the number of instances increases, the training time and the number of model parameters also increase. The number of parameters is proportional to the training time, and the training times of the models differ by about 20 s. As attention modules are added, the number of parameters increases and classification accuracy improves remarkably.

Analysis of Feature Importance
In addition, we added 29 healthy controls (male 10.3%, female 89.7%, age 20.34 ± 0.61) and calculated the P-values and 95% confidence intervals of sex, age and time- and frequency-domain pulse wave features for two comparisons: healthy controls vs. H-type patients, and H-type vs. non-H-type patients. Table 5 shows the significance tests of the top 5 features in the time-domain and frequency-domain importance rankings, for healthy controls vs. hypertensive patients and for H-type vs. non-H-type hypertension.
In this article, we ranked 36 frequency-domain pulse wave MFCC features [29] by the Gini impurity of the RF algorithm [30,31], as shown in Fig. 15. The top 5 features are the 8th second-order difference coefficient (S), S_8; the 12th first-order difference coefficient (F), F_12; and S_10, S_1 and F_8. Most of the features with a major influence on classification are first- or second-order difference coefficients: the first-order differences account for 42.64% and the second-order differences for 56.9% of the importance, together more than 99%. These dynamic coefficients describe the correlation between adjacent pulse wave frames in detail and therefore have a crucial influence on classification performance. Many existing studies classify pulse waves within a single period, where the static MFCC characteristics have a certain impact on model performance. However, pulse waves in different periods may be more strongly correlated, which could allow a more refined classification of H-type hypertension instances. In addition, many components of the first- and second-order difference coefficients, such as F_8 and S_12, strongly influence classification. The physical meaning of these coefficients with respect to pulse waves has not yet been explained, but they may play an important role in classifying H-type hypertension. Our future work will include studying the characteristics of the first- and second-order difference coefficients and the correlation …
In this article, we also calculated the correlation between time-domain pulse graph features and the classification of H-type hypertension, and derived the feature importance ranking. As shown in Fig. 13, the importance of each of the top 4 features, w1/T, h4/h1, h4 and t1/t4 [32,33], is higher than 5%.
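The Gini-based importance used to rank features can be illustrated on a toy example. The sketch computes Gini impurity, the sample-weighted impurity decrease of each split, and normalizes the accumulated decreases into importances; in scikit-learn, RandomForestClassifier.feature_importances_ reports this mean decrease in impurity averaged over the trees. The tree below is hypothetical, and the feature names merely reuse the article's S_8 and F_12 labels for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Gini decrease of one split, weighted by the node's share of all samples."""
    n = len(parent)
    child = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - child)


def feature_importances(splits, n_total):
    """Sum impurity decreases per feature and normalize them to sum to 1."""
    totals = defaultdict(float)
    for feature, parent, left, right in splits:
        totals[feature] += impurity_decrease(parent, left, right, n_total)
    norm = sum(totals.values())
    return {f: value / norm for f, value in totals.items()}


# Hypothetical tree: the root splits on S_8, and its impure child splits on F_12.
root = [1, 1, 1, 0, 0, 0, 0, 0]
splits = [
    ("S_8", root, [1, 1, 1, 0], [0, 0, 0, 0]),
    ("F_12", [1, 1, 1, 0], [1, 1, 1], [0]),
]
print(feature_importances(splits, n_total=len(root)))
```

Here the root split removes 0.28125 of impurity and the second split 0.1875 (after weighting by its 4 of 8 samples), so S_8 receives 60% and F_12 40% of the importance.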
w1/T refers to the duration of the aortic pressure rise, which is related to the appearance time of the pre-dicrotic wave and to peripheral resistance. h4/h1 mainly reflects the level of peripheral resistance: when peripheral blood vessels contract, resistance increases and h4/h1 increases (> 0.45); conversely, when peripheral resistance decreases, h4/h1 decreases (< 0.30). h4 is the amplitude of the dicrotic notch, i.e. the height from the bottom of the dicrotic notch to the baseline of the pulse wave. The height of the dicrotic notch corresponds to diastolic blood pressure and is related to the peripheral resistance of the arteries and to aortic valve closure. The top 3 features therefore all reflect the influence of peripheral vascular resistance in H-type hypertension patients on classification. Studies [34] have shown that H-type hypertension is an independent risk factor for arteriosclerosis and atherosclerosis, which is consistent with our results and suggests that H-type hypertension is correlated with arteriosclerosis and atherosclerosis. t1/t4 is related to cardiac ejection function: the larger t1/t4, the slower the rapid ejection phase of the heart and the weaker the left ventricular systolic function.
Therefore, by studying the correlation between time-domain and frequency-domain pulse wave features and the classification of H-type hypertension, we can help clinicians further study the influence of the patient's vascular