Normalization and Comparison of Photoplethysmography Between Normal and Patient Groups Using Deep Neural Networks

Photoplethysmography (PPG) is easy to perform and provides a variety of measurements, including details of heart rate and arrhythmia. However, automated PPG methods have not been developed because of their susceptibility to motion artifacts and differences in waveform characteristics among individuals. With increasing use of telemedicine, there is growing interest in the application of deep neural network (DNN) technology for efficient analysis of vast amounts of PPG data. This study proposes an automatic algorithm incorporating DNNs for individual and patient-group identification; this is achieved by selecting normally measured waveforms, deleting error regions, and normalizing the pulse wave to obtain 10 “section values” that can be easily compared to other waveforms. The proposed algorithm was able to distinguish between patients aged 60–75 years with diabetes and hypertension and healthy subjects aged 25–35 years (AUC = 0.998). On the other hand, errors were frequently observed in identification of individuals (AUC = 0.819).


Introduction
As the demand for noncontact telemedicine services increases, the amount of medical data directly measured by patients themselves outside the hospital setting is increasing [1]. To improve the reliability of remotely transmitted medical data, a technology capable of capturing the characteristics of individuals and patients with diseases is required [2]. Photoplethysmography (PPG) is one of the most frequently used medical monitoring technologies owing to its convenience, and it has been employed to predict the likelihood of disease and for identification of individuals [3][4][5][6][7][8]. However, most conventional automatic PPG analyzers have difficulty in distinguishing waveforms caused by motion artifacts, electrode connection failures, and similar errors, so increasing the reliability of automatic analysis requires deep neural network (DNN) technology to normalize PPG waveforms and exclude artifacts [9][10][11]. Although many methods for determining the likelihood of disease and identifying individual characteristics have been proposed, they have the limitation of requiring human experts to determine whether the PPG waves are informative or uninformative [5][6][7][8]. Also, these methods have not been validated because it is difficult to obtain sufficient data through manual analysis. Therefore, a number of DNN techniques have been applied with the goal of replacing human experts in PPG analysis. However, their accuracy remains low, and more training data are required in the machine learning process to increase the accuracy of these DNNs [12][13][14][15][16][17][18][19].
Recently, telemedicine in Korea has generated large amounts of PPG data [20]. Reliable DNN techniques have been developed to select data from the transmitted PPGs and normalize the PPG waveform according to the heart-rate (HR) cycle, while excluding data affected by motion artifacts or electrode connection failures [21][22][23][24]. It is expected that a novel DNN algorithm for normalizing the PPG data of healthy subjects and patients with diseases will improve automatic PPG analysis for individual and patient-group identification.
The six DNN models in this study identify features of the heartbeat in PPG waveforms and evaluate whether the PPG waves are informative. The algorithm selects the informative region in the PPG and divides it into 10 sections according to the phases of the HR cycle. Then, it calculates the mean ± standard deviation of each section, for use as criteria in individual and patient-group identification. This study was performed to evaluate the individual and patient-group identification accuracy of our new algorithm.

Results
Personal identification using PPG data
Figures 1a-c show example PPG data from subjects #1, #3, and #15, respectively, for identification based on previously obtained section-specific data. Although subjects #1 and #3 both belonged to the young healthy group, their data showed some differences, as did those of subject #1 compared to one of the elderly patients (subject #15). Each individual's statistical data were obtained for 10 sections, and the mean PPG velocity ± standard deviation was calculated for each section in advance by analyzing the data of at least 120 heartbeats. The statistical data were obtained in five measurement periods and were normalized according to the heart rate cycle. In Fig. 1, the top (red) and bottom (blue) lines are the criteria for subject #1, calculated as the sum and difference, respectively, of the mean and standard deviation per section. The newly measured values of subject #1 are consistent with the criteria of subject #1, shown as lines in Fig. 1a. As shown in Fig. 1b, the PPG waveform of normal subject #3 was similar to that of normal subject #1, in that the reflection wave was larger than that of the patient group. However, there was a difference in the occurrence of the reflection wave. If this method can find differences even among individuals within the same group, it may be possible to distinguish individuals based on the waveform characteristics. However, common characteristics among subjects in the normal group can lead to many errors in individual identification. As shown in Fig. 1c, subject #15, who suffered from hypertension and diabetes, was older than the healthy subjects in the control group. As the patient group shows distinct differences from the healthy controls, the possibility of misidentifying a patient as a healthy subject is low.
Figures 1d-f compare the PPG data of subjects #15, #10, and #1 with the statistical data of subject #15. The newly measured PPG data of subject #15 show high consistency with the statistical data of subject #15, as shown in Fig. 1d. Figure 1e shows that, although subjects #15 and #10 were the same age and had small reflection waves on PPG in common, they were recognized as distinct individuals based on differences in their pulsation waves. However, the common features and simplicity of the waveforms among the subjects in the patient group lead to a high probability of individual identification failure.
The receiver operating characteristic (ROC) curve in Fig. 2 plots the sensitivity (true-positive rate, TPR) against the false-positive rate (FPR) for distinguishing each individual from all other individuals. The Z-values express the differences between newly measured PPG data and the previous statistical data, normalized by dividing by the standard deviation and then averaged. The area under the curve (AUC) value of 0.819 indicates that identification of individuals is inaccurate, but possible to a limited extent.
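As an illustration of how an AUC of this kind can be derived from Z-values, the following Python sketch computes ROC points and the AUC from two lists of scores: comparisons of new data against a subject's own statistical data, and comparisons against the statistical data of other subjects. The function names and all numerical values are hypothetical and are not taken from the study; the sketch also assumes that a smaller Z-value indicates a closer match.

```python
def roc_auc(scores_same, scores_other):
    """AUC as the probability that a matching comparison has a smaller
    Z-value than a non-matching one (ties count as one half)."""
    wins = 0.0
    for s in scores_same:
        for o in scores_other:
            if s < o:
                wins += 1.0
            elif s == o:
                wins += 0.5
    return wins / (len(scores_same) * len(scores_other))


def roc_points(scores_same, scores_other):
    """(FPR, TPR) pairs obtained by sweeping the acceptance threshold
    over every observed Z-value; a comparison is accepted if Z <= threshold."""
    thresholds = sorted(set(scores_same) | set(scores_other))
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s <= t for s in scores_same) / len(scores_same)
        fpr = sum(o <= t for o in scores_other) / len(scores_other)
        points.append((fpr, tpr))
    return points


# Hypothetical averaged absolute Z-values, for illustration only.
same_subject = [0.4, 0.6, 0.9, 1.1]
other_subjects = [0.8, 1.5, 2.0, 2.4]

auc = roc_auc(same_subject, other_subjects)
curve = roc_points(same_subject, other_subjects)
print(f"AUC = {auc:.3f}")
```

The pairwise formulation is used here because it needs no external library; for large datasets a rank-based computation would be more efficient.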
Patient group identification using PPG data
Table 1 shows the differences in beat-to-beat PPG waves between the healthy control and patient groups in each of the 10 sections. Distinct differences between the younger healthy group and older patient group were observed in sections 2-6. After normalizing the differences by the standard deviation, the waveform was clearly larger in the healthy control subjects than in the patients in section 5 (p = 0.05). The patients' waveforms were generally larger than those of the healthy controls in sections 2 and 3. The lines in Fig. 3a and b show the statistical data obtained from the younger healthy control subjects, namely the sum and difference of the mean and standard deviation per section of their PPG waves. In Fig. 3a, the newly measured PPG data from subject #6 in each section are shown as circles between the upper and lower lines. Figure 3b shows the differences between the statistical data of the control subjects and the values for subject #14, who suffered from diabetes and hypertension. In Fig. 3c, as the PPG values of subject #14 are between the lines corresponding to the mean ± standard deviation of the elderly patients with diabetes and hypertension, subject #14 was judged to belong to the patient group. However, Fig. 3d shows that the data of subject #6 are distinct from those of the patient group.
Figure 4a shows the accuracy of the data classification (healthy control or patient group) based on analysis of the ROC curve, calculated from the difference between the measured data and the Z-values. Figure 4b shows the classification accuracy for 353 unknown, newly obtained datasets (healthy controls, n = 150; patients, n = 203) not included in the model-building process. The AUC value was 0.998.

Discussion
A number of previous studies used DNN techniques to analyze PPG and obtain reliable parameters related to cardiovascular disease [10][12][13][14][15][16][17][18][19]. The recognition score (RS) obtained from the DNN models in this study indicates whether the PPG waveform is caused by heartbeat or errors [22]. Previous studies have also shown that HR and heart rate variability (HRV) obtained by an algorithm incorporating DNN and selecting only high-RS PPG data are more reliable than those of conventional algorithms that produce HR and HRV from PPG without removing ambiguous data that include errors [23]. In addition, when the RS is low, SPO2 and HR measured simultaneously using conventional SPO2 devices can show large errors [24]. These studies showed that DNN algorithms are useful for determining the reliability of remotely measured vital signals [21][22][23][24].
If the algorithm proposed in this study is applied to a remote medical information system, it could increase the efficiency and reliability of medical data collection by automatically requesting the patient to remeasure questionable data when some data differ significantly from previous data. A remote medical information system that can remove unreliable data automatically, thus reducing the amount of data that must be inspected by staff, will decrease operation costs.
As shown in Fig. 2, identification of individuals using our algorithm is not entirely reliable because of the high error rate (> 20%). Nevertheless, it would be useful if the system could distinguish mistakenly transmitted PPG waveforms of family members coresiding with the patient. When the patient's re-transmitted data still differ from their statistical norm, a message could be sent to the medical staff instructing them to inspect the medical data and identify the cause of the changes in the patient's condition. Therefore, the individual identification function can contribute to the efficient management of telemedicine data.
As shown in Fig. 4, patient-group classification accuracy was very high (AUC = 0.998), because the peaks corresponding to heartbeat and reflection were higher in the young healthy control group than in the older patient group. However, there were significant differences in both age and disease type between the healthy control and patient groups in this study, so it is unclear which of these two factors caused the differences. Takazawa et al. reported larger reflection waves in young people. In addition, the amplitudes of reflection waves are lower in patients with cardiovascular disease than in healthy controls of the same age [6][25]. The collection and analysis of PPG waveform characteristics from more patients according to age and disease type would enable a remote artificial-intelligence system to determine the likelihood of diseases accurately.
Although PPG waveforms can elucidate a patient's condition, most manual methods of PPG analysis have not been verified. Our proposed algorithm is useful for collecting various types of PPG waveforms and identifying patients' conditions based on specific characteristics, and will contribute to the development of new monitoring techniques for patients in remote locations.

Methods
PPG analysis system
Figure 5 shows the PPG waveform analysis system. The data are transmitted through a telemedicine network, and the system determines whether the vital signs are normal or have changed for any reason. Initially, six DNN models inspect PPG waveforms associated with heartbeats, "blood pressure reflection" from arteries, and motion artifacts. Abnormal waves (RS < 80) are removed, because previous studies have shown that a low RS in PPG analysis indicates that the waveform is related to motion artifacts or probe connection failures rather than the heartbeat. The DNNs also determine the start and end of the heartbeat cycle for a normal PPG waveform. Each PPG waveform corresponding to a heartbeat is normalized in terms of amplitude and differentiated to obtain the PPG velocity (vPPG). Beat-to-beat vPPG is divided into 10 sections, the mean values of which represent the characteristics of the waveform. The mean ± standard deviation values of the 10 sections reflect the characteristics of an individual's PPG waveform. These "section values" of beat-to-beat vPPGs may differ among patients, even those suffering from the same diseases. However, commonalities among patients with the same disease may also be seen, and may be distinct from those of healthy subjects. The individual mean ± standard deviation vPPG section values are obtained through five measurements conducted at different times. They are also saved in the system and used as criteria to assess the accuracy of individual and patient-group identification for each newly assessed individual. Figure 6 shows the PPG waveform-processing steps.
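The processing steps described above (RS filtering, amplitude normalization, differentiation, and division into 10 sections) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function names are hypothetical, amplitude normalization is assumed to be min-max scaling, the velocity is assumed to be a first difference scaled to a cycle of unit length, and the sections are assumed to be of equal duration. Beats are assumed to be already segmented and scored by the DNN models.

```python
from statistics import mean, stdev

N_SECTIONS = 10      # number of sections per heartbeat cycle (from the text)
RS_THRESHOLD = 80    # beats with RS < 80 are discarded (from the text)


def normalize_amplitude(beat):
    """Assumed min-max scaling of one beat to the range 0..1."""
    lo, hi = min(beat), max(beat)
    if hi == lo:
        raise ValueError("a flat beat cannot be amplitude-normalized")
    return [(x - lo) / (hi - lo) for x in beat]


def velocity(beat):
    """First difference, scaled so that one cycle has unit duration
    (an assumption that makes beats of different length comparable)."""
    scale = len(beat) - 1
    return [(b - a) * scale for a, b in zip(beat, beat[1:])]


def section_means(signal, n_sections=N_SECTIONS):
    """Mean of each of n_sections consecutive, near-equal slices."""
    n = len(signal)
    if n < n_sections:
        raise ValueError("signal is shorter than the number of sections")
    bounds = [i * n // n_sections for i in range(n_sections + 1)]
    return [mean(signal[bounds[i]:bounds[i + 1]]) for i in range(n_sections)]


def section_values(beats, recognition_scores):
    """(mean, SD) per section over all beats with RS >= RS_THRESHOLD."""
    kept = [b for b, rs in zip(beats, recognition_scores) if rs >= RS_THRESHOLD]
    if len(kept) < 2:
        raise ValueError("at least two reliable beats are needed for an SD")
    per_beat = [section_means(velocity(normalize_amplitude(b))) for b in kept]
    return [(mean(col), stdev(col)) for col in zip(*per_beat)]


# Synthetic example: two linear ramps of different length and one noisy
# beat with a low recognition score, which is excluded.
ramp_a = [float(i) for i in range(21)]
ramp_b = [float(i) for i in range(31)]
noisy = [0.0, 5.0] * 10 + [0.0]
values = section_values([ramp_a, ramp_b, noisy], [95, 88, 40])
```

Because both ramps rise linearly over one normalized cycle, every section mean in this synthetic example is close to 1 and every standard deviation is close to 0, regardless of the number of samples per beat.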
Each DNN in the system consists of one input layer (124 × 1), two hidden layers (124 × 1, 124 × 1), and one output layer (21 × 1) [23]. Three of the DNNs find the set (S) point, the onset (O) point, and the peak point of the reflection wave (W point), which are related to heart contractions [11]. Another DNN finds the Z point associated with blood pressure reflection from peripheral arteries. The other two DNNs find the start and end points of uninformative sections of data, which are mostly caused by motion artifacts and probe connection failures.
To determine whether the PPG waveforms represent normal heartbeats or noise, the recognition rates of the S, O, and W points of PPG waveforms during the beat-to-beat period should be considered [23]. Error regions should not be included in the beat-to-beat period. RS is calculated by summing the recognized S, O, and W points and subtracting the recognized error regions. The RS value of a heartbeat cycle can be used as an index of the reliability of SPO2 and HR data measured simultaneously using an SPO2 device [24]. Previous studies indicated that the W point shows the highest reliability in terms of identifying heartbeats with DNNs [22]. Thus, the PPG waveforms are divided at the W points to normalize the beat-to-beat vPPG.
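To give a sense of the size of these networks, the short Python sketch below counts the trainable parameters implied by the stated layer sizes. It assumes that the layers are fully connected and that each has a bias term; the text does not state either detail, and the function name is hypothetical.

```python
# Layer sizes stated in the text: input, two hidden layers, output.
LAYER_SIZES = [124, 124, 124, 21]
N_MODELS = 6  # number of DNN models in the system (from the text)


def dense_parameter_count(layer_sizes):
    """Weights plus biases for a fully connected network (assumption)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


per_model = dense_parameter_count(LAYER_SIZES)
all_models = N_MODELS * per_model
print(per_model, all_models)
```

Under these assumptions each model has 33,625 parameters, so the six models together remain small enough for routine server-side inference.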

Identification of individuals and diseases
During measurement, 40 beat-to-beat vPPGs are usually obtained and transmitted to the system. If the percentage of normal heartbeats with a high RS (≥ 80) does not exceed 90% of all heartbeats, the system sends a message to the patient indicating the need for re-measurement. Subsequently, all the reliably measured beat-to-beat vPPGs are divided into 10 sections, and the value of each section is represented as the mean ± standard deviation. For ease of comparison with other values in the same section, the statistical ranges are normalized as Z-values using the following equation:
Z-value = (mean of the criterion data − mean of the measured data) / (SD of the criterion data)
Criterion data can be obtained from individuals or groups of patients with the same disease. In this study, 15 subjects transmitted PPG waveforms on 428 occasions through the telemedicine system; 75 of the transmitted datasets were used to create 15 individual criterion datasets for individual identification and 2 group criterion datasets for the healthy control and patient groups. Absolute Z-values were calculated for the 10 sections, and the average sectional Z-values were analyzed to determine how closely the newly measured data (353 datasets) conformed to the criterion data of individuals, the healthy control group, and the patient group.
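The re-measurement rule and the Z-value comparison described in this section can be sketched in Python as follows. The function names and example numbers are hypothetical; the thresholds (RS ≥ 80, more than 90% reliable beats) and the Z-value formula follow the text, and the sketch assumes that the criterion data are stored as one (mean, SD) pair per section.

```python
from statistics import mean

RS_THRESHOLD = 80            # minimum RS of a reliable beat (from the text)
MIN_RELIABLE_PERCENT = 90    # this percentage must be exceeded (from the text)


def needs_remeasurement(recognition_scores):
    """True if reliable beats do not exceed 90% of all transmitted beats."""
    reliable = sum(rs >= RS_THRESHOLD for rs in recognition_scores)
    # Integer comparison avoids floating-point issues at exactly 90%.
    return reliable * 100 <= MIN_RELIABLE_PERCENT * len(recognition_scores)


def section_z_values(criterion, measured_means):
    """Z = (criterion mean - measured mean) / criterion SD, per section."""
    return [(c_mean - m) / c_sd
            for (c_mean, c_sd), m in zip(criterion, measured_means)]


def mean_abs_z(criterion, measured_means):
    """Average of the absolute sectional Z-values."""
    return mean([abs(z) for z in section_z_values(criterion, measured_means)])


# Hypothetical example: 10 sections with identical criterion statistics.
criterion = [(1.0, 0.5)] * 10
measured = [1.5] * 10
score = mean_abs_z(criterion, measured)

borderline = [85] * 36 + [50] * 4   # exactly 90% reliable beats
acceptable = [85] * 37 + [50] * 3   # 92.5% reliable beats
print(score, needs_remeasurement(borderline), needs_remeasurement(acceptable))
```

A new measurement would then be attributed to the individual or group whose criterion data give the smallest average absolute Z-value, or flagged when that value exceeds a chosen threshold; the choice of threshold corresponds to a point on the ROC curve.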
All experimental protocols of the clinical trial were approved by the institutional review board of Kangwon National University (KNUH-2020-06-008-008). All methods were carried out in accordance with relevant guidelines and regulations. Remotely transmitted data from eight healthy control subjects aged 25-37 years, and seven patients aged 63-78 years with diabetes and hypertension, were used. Table 2 shows the diseases, ages, and numbers of the subjects. The data from the first five measurements were used to determine the criteria for individual and patient-group identification. The other data were then used to evaluate the accuracy of the individual and patient-group identification.

Figure 1
Example personal identification data from subjects #1 and #15.

Figure 2
Receiver operating characteristic (ROC) curves for personal identification using newly measured PPG data and statistical data.

Figure 3
Example patient group identification data from subjects #6 and #14.

Figure 4
Accuracy of the classification of unknown data to the healthy control and patient groups.

Figure 5
Block diagram of the PPG waveform analysis system, which incorporates DNN models to select normal data and identify critical changes.