Multi-Regression Fusion Based Blood Pressure Estimation

Estimating blood pressure (BP) with traditional cuff based instruments is inconvenient and does not allow continuous measurement. On the other hand, smart watch based instruments make continuous BP estimation for monitoring human health conditions convenient and easy. This paper proposes a method for estimating the BP values continuously using only the photoplethysmogram (PPG). First, the PPGs are denoised by the discrete cosine transform (DCT). It is worth noting that the conventional DCT denoising approach simply retains the low frequency DCT coefficients. In contrast, this paper proposes a training based method for selecting the DCT AC coefficients, where the AC coefficients refer to the non-DC DCT coefficients. Second, features based on specific points in the PPGs are extracted, and the feature vectors are categorized into two classes of the BP values using a random forest (RF) classifier. Third, for each class and each type of the BP values, three popular regressions, namely the support vector regression (SVR), the RF regression and the L1 norm criterion based linear regression, are used to estimate the BP values. These three estimated BP values are then fused together via an L2 norm based regression. Since different classes and different types of the BP values are considered separately, each class and each type of the BP values can be estimated more accurately. Moreover, since multi-model fusion is employed for combining the three regressions, the overall estimation results are more accurate. The computer numerical simulation results show that the average root mean squares errors (RMSEs) of the systolic BP (SBP) and the diastolic BP (DBP) estimated by our proposed method are 9.12 and 8.19, respectively. Our proposed multi-model fusion based BP estimation approach thus achieves a higher accuracy than the individual regressions.


Introduction
The prevalence of hypertension in men is similar to that in women of the same age [1]. In the United States, the prevalence of hypertension was 22.4% among people aged 18 to 39, 54.5% among people aged 40 to 59, and 74.5% among people aged 60 or above [1]. Hypertension is also a major cause of many serious diseases such as cardiovascular diseases (CVDs) and renal diseases. In China, approximately 290 million patients suffer from CVDs, of whom 245 million have hypertension [2]. Besides, hypertension is the most common cause of death or disability worldwide. In terms of mortality, 40% of Chinese deaths are due to CVDs, which is higher than the share due to cancers or other diseases [2]. Since hypertension is a major cause of CVDs [3] and contributes a large percentage of the mortality, it is used as a key criterion for the diagnosis of CVDs and as an index for monitoring human health conditions. As a result, accurately and continuously estimating the BP values in daily life is highly significant for the early detection and suppression of hypertension.
In clinical practice, there are two approaches for estimating the BP values: the invasive approach and the non-invasive approach. In the invasive approach, the BP estimation is performed using intraarterial cannulation. However, arterial intubation involves tedious steps. In the non-invasive approach, the BP estimation is performed using a cuff via oscillometry or auscultation. However, the cuff based method may cause arterial occlusion. More importantly, neither approach supports continuous measurement. In order to address these issues, a device for continuously estimating the BP values outside the clinic is required.
In the past few years, various continuous and non-invasive BP estimation methods have been proposed [4]. A common method is to employ the pulse transit time (PTT) and/or the pulse wave velocity (PWV) [5] to perform the BP estimation. Here, the PTT is the time taken by an arterial pulse wave to travel between two arterial measurement sites within the same cardiac cycle of a single systolic diastolic event. In practice, the PTT is usually acquired by estimating the time difference between the QRS complex or the R wave of the recorded electrocardiogram (ECG) and a characteristic point, such as the pulse foot or the peak, on the peripheral pulse of a photoplethysmogram (PPG) [6]. From here, it can be seen that synchronous measurement of the ECG and the PPG is required [7]. This poses two challenges. The first challenge is the high cost of designing the circuit for performing the synchronous measurements. The second challenge is the acquisition of the ECGs, because acquiring the ECGs requires a closed circuit in the body. In order to acquire ECGs of good quality, the subject is required to remove metallic objects and jewelry; otherwise, the impedance of the body changes and the acquired ECGs are affected. This brings an inconvenience to the subjects because some metallic objects, such as dental implants and pacemakers, are difficult to remove [8].
The rationale and motivation of this paper are to develop a method for performing continuous and non-invasive BP estimation outside the clinic. To achieve this goal, first the PPGs are denoised via a discrete cosine transform (DCT) approach. Second, features are extracted based on some specific points in the PPGs. Third, the random forest (RF) is employed to classify the PPGs into two classes of the BP values. Fourth, three common regressions are employed to estimate the BP values. Finally, the three estimated BP values are fused together to obtain the final estimated BP values. The innovations and contributions of this paper are as follows. First, since acquiring the ECG may not be appropriate for subjects wearing metallic objects and jewelry, while the PPG is an optical signal that is much easier to acquire, this paper employs only the PPGs to perform the continuous BP estimation. Second, since the conventional DCT denoising approach retains all the low frequency coefficients, the signal to noise ratio (SNR) may still be low at the frequencies corresponding to the small valued DCT coefficients. To address this issue, this paper proposes a training approach that selects a subset of the DCT AC coefficients for performing the denoising of the PPGs. Third, since different regressions have different characteristics, this paper proposes an L2 norm based regression for fusing three popular regressions to estimate the BP values.
The rest of this paper is organized as follows. Section 2 presents the PPG based devices. Our proposed DCT based method for performing the denoising of the PPGs is presented in Section 3. Section 4 presents the features extracted for performing the BP estimation. Section 5 presents the classification of the normal class of the BP values and the prehypertension class of the BP values. Section 6 reviews three popular regressions. Section 7 presents the multi-model fusion of the above three regressions. Section 8 presents the computer numerical simulation results. Finally, a conclusion is drawn in Section 9.

The PPG based devices
A pulse oximeter consists of a light source and a photodetector and is used to capture the PPGs. It can detect the cardiovascular pulse waves transmitted through the body as well as determine and record the changes in the human blood flow during each heartbeat. Hence, it becomes realistic to estimate the BP values indirectly from the PPGs [9]. However, it is inconvenient to perform continuous measurement via a pulse oximeter. On the other hand, modern wearable devices such as smart watches and smart wristbands usually contain PPG sensors instead of ECG sensors, because PPG sensors offer a higher availability and a lower cost than ECG sensors. However, the PPG is much more easily corrupted by noise. Hence, in order to acquire a high quality PPG, a high sampling rate and a very effective denoising algorithm are required.

Data source
There are 219 subjects from the Guilin People's Hospital in the Guang-Xi Province of China, who gave written informed consent prior to the study. 48% of them are males and 52% are females. Their ages range from 21 to 86 years, with a median age of 58 years. They suffer from a variety of diseases including hypertension, diabetes, cerebral infarction and cerebral blood supply insufficiency.
This dataset contains 657 PPGs acquired from these subjects together with their corresponding reference BP values. The PPGs are acquired by a PPG sensor with the model number SEP9AF-2, which contains dual LEDs with wavelengths of 660nm in the red light region and 905nm in the infrared region. The PPGs are sampled at 1kHz and quantized using a 12-bit analog to digital converter [10].
As many of the PPGs are of poor quality, they could not be used for extracting the features. Therefore, only 169 PPGs are retained. Here, 89 PPGs are from healthy subjects with normal blood pressure values and 80 PPGs are from subjects with prehypertensive blood pressure values. As it covers a wide range of BP values, this dataset is highly representative. This paper classifies these 169 PPGs into two classes of the BP values, namely the normal class of the BP values and the prehypertension class of the BP values.

Denoising algorithm
In this paper, the DCT is used to perform the denoising of the PPGs. Here, the DCT is a kind of frequency representation. The first DCT coefficient is the average value of the signal and hence is referred to as the DC coefficient; all other coefficients are referred to as the AC coefficients. The proposed denoising algorithm consists of two stages, namely the training stage and the performing stage. Figure 1 shows the procedures of the training stage, where 40 PPGs are taken as the training signals. The detailed procedures are as follows:
Step 1: The DCT coefficients of these 40 training PPGs are computed.
Step 2: For each training PPG, the following sub-procedures are performed:
Step 2a: The absolute values of all the DCT AC coefficients are sorted in descending order.
Step 2b: The 32 DCT AC coefficients with the largest absolute values are selected and the remaining DCT AC coefficients are set to zero.
Step 2c: In each iteration, an additional DCT AC coefficient is selected from these 32 DCT AC coefficients, starting from the one with the largest absolute value. Then, the inverse DCT (IDCT) is applied to the zero masked DCT coefficients, which contain both the DCT DC coefficient and the selected DCT AC coefficients. Step 2c is repeated until the best denoising performance, as judged by an expert, is achieved.
Step 2d: The ratio of the energy of the selected DCT AC coefficients to the total energy of all 32 DCT AC coefficients is computed.
Step 3: Let R be the column vector containing the 40 energy ratios found in Step 2d. Let w be a column vector containing 32 weights. Let D be the matrix whose columns are the vectors of the 32 DCT AC coefficients of each training PPG found in Step 2b, so that the dimension of D is 32×40. The optimal w is found by solving the following least squares optimization problem:

min_w ‖Dᵀw − R‖₂²    (1)

Figure 2 shows the procedures of the performing stage. For each PPG to be denoised, the following procedures are performed:
Step 4a: The 32 DCT AC coefficients with the largest absolute values are selected.
Step 4b: Define an energy ratio M for selecting the DCT AC coefficients. It is set to the inner product between the w found in Step 3 and the vector of the 32 DCT AC coefficients found in Step 4a.
Step 4c: In each iteration, an additional DCT AC coefficient is selected from these 32 DCT AC coefficients, starting from the one with the largest absolute value. Step 4c is repeated until the ratio of the energy of the selected DCT AC coefficients to the total energy of all 32 DCT AC coefficients found in Step 4a is close to the value of M.
Step 4d: Apply the IDCT to the zero masked DCT coefficients obtained in Step 4c to obtain the initial denoised PPG.
Step 4e: If the total number of peaks of this initial denoised PPG is more than 10, then the value of M is reduced by 0.001 and the algorithm goes back to Step 4c to reselect the DCT AC coefficients.
Step 5: Take the obtained denoised PPGs as the final denoised PPGs for performing the feature extraction discussed in the next section.

To show the superiority of our proposed denoising algorithm, it is compared to the four stage filtering approach [11], the block diagram of which is shown in Figure 3. It is worth noting that a greater filter length results in a narrower transition band and a smaller ripple magnitude, and hence a better denoising performance. However, the peak values of the PPG then drop more. Since the features employed for performing the BP estimation are based on the characteristic points, distortions of these characteristic points significantly degrade the estimation performance. Figure 4a shows a realization of the noisy PPG, while Figure 4b and Figure 4c show the PPGs denoised by our proposed algorithm and by the four stage filtering approach, respectively. It can be seen from Figure 4b and Figure 4c that the peak values of the PPG denoised by our proposed algorithm are the same as those of the original PPG, while the peak values obtained by the four stage filtering approach are lower than those of the original PPG. Moreover, it can be seen from Figure 4b that our proposed method preserves the characteristic points of the PPG. On the other hand, it can be seen from Figure 4c that the four stage filtering approach over-smoothens the PPG and the characteristic points of the PPG are lost. Hence, our proposed denoising algorithm is more appropriate than the four stage filtering approach for performing the BP estimation.
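To make the performing stage concrete, the following is a minimal pure-Python sketch of Steps 4a to 4d, assuming an orthonormal DCT; the function names and interface are illustrative, and the peak counting of Step 4e is omitted.

```python
import math

def dct(x):
    # Orthonormal DCT-II; the k = 0 coefficient is the DC coefficient.
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def idct(X):
    # Inverse of the orthonormal DCT-II (i.e. the DCT-III).
    N = len(X)
    out = []
    for n in range(N):
        s = X[0] / math.sqrt(N)
        s += sum(math.sqrt(2.0 / N) * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def denoise(x, M, n_candidates=32):
    """Keep the DC coefficient plus the largest AC coefficients until their
    energy ratio among the top candidates reaches M, then invert (Steps 4a-4d)."""
    X = dct(x)
    # Step 4a: rank AC coefficients (index >= 1) by absolute value.
    ac = sorted(range(1, len(X)), key=lambda k: abs(X[k]), reverse=True)[:n_candidates]
    total = sum(X[k] ** 2 for k in ac) or 1.0
    kept, energy = [], 0.0
    for k in ac:                      # Step 4c: add coefficients largest-first...
        kept.append(k)
        energy += X[k] ** 2
        if energy / total >= M:       # ...until the energy ratio reaches M.
            break
    mask = [X[0]] + [0.0] * (len(X) - 1)   # always keep the DC coefficient
    for k in kept:
        mask[k] = X[k]
    return idct(mask)                 # Step 4d: inverse DCT of the masked spectrum
```

For a signal that is sparse in the DCT domain, `denoise` with M close to 1 reconstructs the signal almost exactly, since the few large AC coefficients carry nearly all of the AC energy.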
Feature extraction
Figure 5 shows the characteristic points in the PPGs for extracting the features.
5. The area under the PPG within the time interval between the maximum slope and the systolic peak.
6. The area under the PPG within the time interval between the systolic peak and the dicrotic notch.
7. The area under the PPG within the time interval between the dicrotic notch and the minimum peak.
8. The heart rate.
The above 8 features are extracted from each denoised PPG and they form a feature vector. Further details of these features can be found in [12].
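Once the characteristic points have been located, the area features and the heart rate above are straightforward to compute. The sketch below, with illustrative names, assumes the points are given as sample indices at the 1 kHz sampling rate; detection of the characteristic points themselves is not shown.

```python
def area_between(ppg, fs, i_start, i_end):
    """Trapezoidal area under the PPG between two characteristic points
    given as sample indices, with sampling rate fs (Hz)."""
    area = 0.0
    for n in range(i_start, i_end):
        area += 0.5 * (ppg[n] + ppg[n + 1]) / fs
    return area

def heart_rate(peak_indices, fs):
    """Average heart rate in beats per minute from systolic peak indices."""
    intervals = [(b - a) / fs for a, b in zip(peak_indices, peak_indices[1:])]
    return 60.0 / (sum(intervals) / len(intervals))
```

For example, with fs = 1000 Hz, systolic peaks spaced 1000 samples apart correspond to a heart rate of 60 beats per minute.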

Classification of the normal class of the BP values and the prehypertension class of the BP values
It is worth noting that there are two types of the BP values, namely the systolic BP (SBP) values and the diastolic BP (DBP) values. Also, for each type of the BP values, there are two classes, namely the normal class and the prehypertension class. Here, all the feature vectors are divided into the training set A and the test set B without overlap, where the training set A and the test set B contain 70% and 30% of all the feature vectors, respectively. It is worth noting that only the class labels of the PPGs are required to train the classifier; the corresponding BP values are not required. Hence, both the training set A and the test set B contain feature vectors corresponding to both the SBP values and the DBP values as well as to both the normal class and the prehypertension class. The training set A is used to train the RF classifier and the test set B is used for evaluating the classification performance.
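A minimal sketch of the 70%/30% split, assuming the feature vectors and their class labels are held in Python lists; names are illustrative. The RF classifier itself (e.g. scikit-learn's RandomForestClassifier) would then be trained on set A.

```python
import random

def split_features(feature_vectors, labels, train_fraction=0.7, seed=0):
    """Randomly split feature vectors into training set A and test set B
    without overlap (70% / 30% by default)."""
    idx = list(range(len(feature_vectors)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    cut = int(round(train_fraction * len(idx)))
    train_idx, test_idx = idx[:cut], idx[cut:]
    A = [(feature_vectors[i], labels[i]) for i in train_idx]
    B = [(feature_vectors[i], labels[i]) for i in test_idx]
    return A, B
```

With the 169 retained PPGs, this split gives 118 training feature vectors and 51 test feature vectors.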

Reviews on three popular regressions
For each type of the BP values (the DBP values and the SBP values) and each class of the BP values (the normal class and the prehypertension class), the training set A is further divided into the training set C and the training set D without overlap. Here, the ratio of the total number of the feature vectors in the training set C to that in the training set D is 1:7. The feature vectors in the training set C are employed to train the support vector regression (SVR), the RF regression and the L1 norm criterion based linear regression discussed below, while the feature vectors in the training set D are employed to perform the fusion discussed in Section 7.

SVR
The support vector machine (SVM) is widely used for performing classification, while the SVR is widely used to approximate an unknown function [13]. In this paper, the affine linear function defined in [13] is employed as the regression function. Let x_i be the i-th feature vector, y_i be its corresponding true value, p be the vector normal to the regression surface, which is found by the training, and e be the bias. The regression function is defined as follows:

f(x_i) = pᵀx_i + e    (2)

Let ε be the width of the interval zone. It is worth noting that the regression maps the feature vectors to the corresponding desirable values with the absolute errors bounded by ε; that is, most of the absolute errors lie within the interval zone. For the absolute errors outside the interval zone, define ξ_i as the corresponding positive slack variable. The design of the SVR is formulated as an optimization problem subject to the absolute errors being bounded by the sum of ε and ξ_i. Therefore, the constraints of the optimization problem are defined as

|y_i − pᵀx_i − e| ≤ ε + ξ_i and ξ_i ≥ 0 for i = 1, 2, ..., N,

where N is the total number of feature vectors used for the training. A multi-objective function is defined as the flatness function of the weight vector plus the sum of the positive slack variables:

min 0.5‖p‖² + C Σ_{i=1}^{N} ξ_i

Here, C is the parameter for trading off the flatness of the weight vector against the sum of the positive slack variables [13]. Finally, the solution of this convex optimization problem is found via a numerical optimization computer aided design tool such as Matlab, and the obtained solution is employed as the weight vector of the regression.
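As an illustration of the formulation above, the following Python sketch evaluates the epsilon-insensitive objective for given p and e; the solver itself, as noted, is left to a numerical tool, and all names here are illustrative.

```python
def eps_insensitive_loss(p, e, x, y, eps):
    """max(0, |p.x + e - y| - eps): zero inside the interval zone of width eps."""
    r = abs(sum(pj * xj for pj, xj in zip(p, x)) + e - y)
    return max(0.0, r - eps)

def svr_objective(p, e, X, Y, C, eps):
    """Flatness term 0.5*||p||^2 plus C times the total slack over the training set."""
    flatness = 0.5 * sum(pj * pj for pj in p)
    slack = sum(eps_insensitive_loss(p, e, x, y, eps) for x, y in zip(X, Y))
    return flatness + C * slack
```

A point whose residual lies inside the interval zone contributes no slack, so only the flatness term penalizes it; a point outside the zone contributes its excess residual, weighted by C.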

RF regression
The RF is an extended variant of bagging of decision trees. In particular, the RF selects the feature vectors randomly for training the decision trees. First, the bootstrap method [14] is used to randomly select some feature vectors from the original training set; the unselected feature vectors constitute the out-of-bag data. Second, some features are randomly selected from all the features, and the best partitioning feature is selected as the node to build a decision tree. Each decision tree is a weak learner. These procedures are repeated to build the forest. The arithmetic average over all the weak learners is the final output of the regression [15].
Here, the RF improves the formation of the decision trees in the following sense. For a general decision tree, a feature is selected among all the features and a node is divided into the left and the right subtrees of the decision tree [16]. The RF, however, randomly selects a part of the features from all the features, and then an optimal feature among them is selected to partition the node into the left and the right subtrees. This further enhances the generalization ability of the decision tree methods.

L1 norm criterion based linear regression
As in the SVR, denote N as the total number of feature vectors used for the training and x_i for i = 1, 2, ..., N as the feature vectors. Let y_i be the corresponding true value. Let w and b be the weight vector and the bias of the linear regression, respectively. The regression is designed by minimizing the L1 norm of the residuals:

min_{w,b} Σ_{i=1}^{N} |y_i − wᵀx_i − b|

The solution of this optimization problem is found numerically. Finally, the test set B is employed for evaluating the overall regression performance.
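Since the L1 norm problem has no simple closed form, one common numerical route, used here purely as an illustration and not necessarily the authors' solver, is iteratively reweighted least squares; the sketch below fits a single-feature model y ≈ w·x + b.

```python
def l1_linear_fit(x, y, iters=50, delta=1e-6):
    """L1 (least absolute deviations) fit of y ~ w*x + b via iteratively
    reweighted least squares: weight_i = 1 / max(|residual_i|, delta)."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        r = [yi - (w * xi + b) for xi, yi in zip(x, y)]
        u = [1.0 / max(abs(ri), delta) for ri in r]
        # Weighted least squares normal equations for (w, b) via Cramer's rule.
        Sxx = sum(ui * xi * xi for ui, xi in zip(u, x))
        Sx  = sum(ui * xi for ui, xi in zip(u, x))
        S1  = sum(u)
        Sxy = sum(ui * xi * yi for ui, xi, yi in zip(u, x, y))
        Sy  = sum(ui * yi for ui, yi in zip(u, y))
        det = Sxx * S1 - Sx * Sx
        w = (Sxy * S1 - Sx * Sy) / det
        b = (Sxx * Sy - Sx * Sxy) / det
    return w, b
```

Because each residual is weighted by its reciprocal magnitude, large residuals are down-weighted, which reproduces the L1 behaviour noted later in the paper: a few points may keep large errors while the bulk of the points are fitted closely.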

Computer numerical simulation results
In this paper, the root mean squares error (RMSE) is employed as the criterion for evaluating the performances of the various methods. Since the RMSEs obtained from different realizations of the RF are different, both the classification algorithm and the multi-model fusion algorithm are repeated 20 times. Then, the average classification results and the average RMSEs of the multi-model fusion are computed for evaluating the performances. Table 1 shows the RMSEs obtained by the three individual regressions and our proposed multi-model fusion after performing the RF based classification, with the PPGs denoised via our proposed denoising algorithm. Table 2 shows the corresponding RMSEs with the PPGs denoised via the four stage filtering approach. It can be seen from Table 1 and Table 2 that the RMSEs obtained via our proposed denoising algorithm are lower than those obtained via the four stage filtering approach, except for the RF regression. Table 3 shows the RMSEs obtained by the three individual regressions and our proposed multi-model fusion without performing the RF based classification, with the PPGs denoised via our proposed denoising algorithm. It can be seen from Table 1 and Table 3 that the RMSEs obtained by our proposed multi-model fusion after performing the RF based classification are lower than those obtained without performing the RF based classification. Since the individual regressions are developed for each class of the BP values, the ranges of the BP values within the classes after performing the RF based classification are significantly smaller than the range of the BP values without performing the RF based classification. As a result, the individual regressions after performing the RF based classification yield lower RMSEs than those without performing the RF based classification. This explains why performing the RF based classification is more effective than not performing it.
From Table 1 to Table 3, it can be seen that the RMSEs of our proposed multi-model fusion are lower than those of the individual regressions in all the cases, including the case with the RF based classification and our proposed denoising algorithm, the case with the RF based classification and the four stage filtering approach, and the case without the RF based classification with our proposed denoising algorithm. To understand why the multi-model approach yields an improvement over the individual regressions, it is worth noting that the characteristics of the different regressions are different. For the SVR, the fitting function depends on the chosen kernel function, and the SVR yields large RMSEs for the PPGs whose fitting functional values are far away from their corresponding true BP values. For the L1 norm criterion based linear regression, a few PPGs have very large errors while most of the PPGs have small errors, due to the sparsity of the solution of an L1 norm optimization. For the RF regression, the RMSEs differ between realizations because randomness is introduced in the RF regression. Our proposed multi-model approach, on the other hand, fuses these regressions together, and this optimal averaging among the regressions results in the lowest RMSEs. This explains why our proposed multi-model fusion is more effective than the individual regressions.

In order to test the robustness of our proposed method, the split of the feature vectors between the training set A and the test set B is changed to 60% and 40%, respectively. The corresponding results are shown in Table 4, Table 5 and Table 6, respectively. Similar results are obtained. Hence, our proposed multi-model fusion is robust.
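The L2 norm based fusion can be sketched as an ordinary least squares fit of the fusion weights over the individual regressors' outputs on the fusion training set D. The sketch below, with illustrative names, solves the small normal equations directly by Gaussian elimination.

```python
def fuse_weights(preds, y):
    """Least squares fusion: find weights a minimising ||P a - y||_2^2,
    where row i of preds holds the individual regressors' estimates for PPG i."""
    m = len(preds[0])                       # number of regressors (3 in the paper)
    # Normal equations (P^T P) a = P^T y.
    A = [[sum(p[i] * p[j] for p in preds) for j in range(m)] for i in range(m)]
    c = [sum(p[i] * yi for p, yi in zip(preds, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for k in range(m):
        piv = max(range(k, m), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, m):
            f = A[r][k] / A[k][k]
            for j in range(k, m):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    a = [0.0] * m
    for k in range(m - 1, -1, -1):
        a[k] = (c[k] - sum(A[k][j] * a[j] for j in range(k + 1, m))) / A[k][k]
    return a

def fuse(pred_row, a):
    """Fused BP estimate for one PPG from the individual regressors' estimates."""
    return sum(p * w for p, w in zip(pred_row, a))
```

Once the weights are learned on the fusion set, `fuse` produces the final BP estimate for each test PPG as the weighted combination of the three individual estimates.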

Conclusion
This paper develops a method for performing continuous BP estimation via the PPGs only. First, a training based selection of the DCT coefficients is proposed for performing the denoising of the PPGs. Then, a multi-model fusion method is proposed to combine the BP values estimated by the various regressions. The computer numerical simulation results show that our proposed denoising algorithm and multi-model fusion achieve lower average root mean squares errors than the individual regressions and the four stage filtering approach. This approach can also be applied to estimate other health indices such as the blood glucose concentration, the heart rate and the respiratory rate.
Since the proposed approach is an artificial intelligence approach, its accuracy depends heavily on the correlation between the data in the training set and that in the test set. If the environment for acquiring the test set changes, for example the races of the subjects, then the accuracy will drop. The application of transfer learning to the model developed in this paper will be investigated in the future to address this problem.

Figure 1
The procedures for training the DCT coefficients.

Figure 2
The procedures of the performing stage.

Figure 3
The procedures for performing the four stage filtering approach.

Figure 5
The characteristic points in the PPGs for extracting the features.