Study population and data collection for construction of a CNN
All data used for construction of a CNN for automatic HF classification were retrospectively acquired at the University of Tokyo hospital between 2013 and 2020 from participants aged 20 years or older. The flow of data creation for the CNN is shown in Figure 1a. The healthy control group was composed of individuals who underwent a comprehensive medical examination and were diagnosed as having no cardiovascular disease (CVD). All patients with HF were classified as New York Heart Association (NYHA) I, NYHA II, NYHA III, or NYHA IV according to NYHA functional classification criteria42. The definitions of the classifications are as follows:
- Healthy control: Participants have no CVD.
- NYHA I: Patient has been hospitalized for HF but has no limitation of physical activity; physical activity does not cause fatigue, palpitation, or dyspnea.
- NYHA II: Patient has been hospitalized for HF, and physical activity is slightly limited by fatigue, palpitation, or dyspnea, but they have no symptoms at rest.
- NYHA III: Patient has been hospitalized for HF, and physical activity is markedly limited by fatigue, palpitation, or dyspnea, but they have no symptoms at rest.
- NYHA IV: Patient has been hospitalized for HF and cannot carry on any physical activity without discomfort, and they experience symptoms of HF at rest.
ECG data were obtained from the control group at an annual health check center at the University of Tokyo hospital, where individuals receive regular yearly health checks. ECG data were obtained from patients clinically diagnosed with HF during their hospitalization in the Department of Cardiology. Standard 10-s, 12-lead ECGs were recorded from all patients with HF on admission, and a NYHA class was assigned based on the patient’s symptoms and evaluation of their medical examination by expert cardiologists. Data lacking a NYHA classification were excluded, as were data from patients with a pacemaker or poor ECG recordings due to motion artifacts, inaccurate electrode application or excessive noise. The data used for CNN construction were 25,368 10-s ECGs from 6,901 participants, including both HF patients and healthy individuals.
Data preprocessing
Standard 10-s, 12-lead ECGs were recorded at 500 Hz with patients in a supine position in a resting state. Only lead I ECG data were selected for this study. To train the CNN model, we segmented the 10-s ECG recordings into heartbeat waveforms as independent input data43. Before the heartbeat segmentation, however, the recordings were preprocessed to eliminate baseline drift and noise. First, the baseline drift was removed using the wavelet decomposition method. ECG data were decomposed into sublevels, and the final approximation coefficient was taken as the baseline drift and subtracted from the original signal. Next, a Butterworth bandpass filter was applied to remove power-line noise and high-frequency distortion. Thereafter, heartbeats were detected by using the Pan-Tompkins algorithm to recognize the peaks of the R waves on the ECG recordings44. A window was then used to segment the heartbeats between 0.34 s before and 0.72 s after the R-wave peaks to capture the PQRST complexes. This process enabled us to adjust the alignment of heartbeats using the R-wave peaks.
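The windowing step can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation: the function name `segment_heartbeats`, the choice to include the R-peak sample in the window (giving 531 samples at 500 Hz), and the rule that drops beats whose window runs past the edge of the recording are all assumptions. The R-peak indices are taken as already detected.

```python
def segment_heartbeats(signal, r_peaks, fs=500, pre_s=0.34, post_s=0.72):
    """Cut one fixed-length window around each R-wave peak.

    signal  : sequence of lead-I samples (assumed already filtered)
    r_peaks : sample indices of detected R-wave peaks
    Returns a list of windows in which the R peak sits at index `pre`.
    """
    pre = round(pre_s * fs)    # 170 samples before the peak at 500 Hz
    post = round(post_s * fs)  # 360 samples after the peak at 500 Hz
    beats = []
    for r in r_peaks:
        start, stop = r - pre, r + post + 1  # +1 keeps the R sample itself
        # Assumption: beats whose window runs off the recording are dropped.
        if start < 0 or stop > len(signal):
            continue
        beats.append(list(signal[start:stop]))
    return beats
```

Because every window places the R peak at the same index, the segmented heartbeats are aligned with one another without any further registration step.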
Heartbeats were annotated based on the NYHA classification of the corresponding 10-s ECG recordings. To evaluate the generalizability and stability of the proposed algorithm, we merged the NYHA classes pairwise. The resulting three classes were healthy controls, NYHA I-II, and NYHA III-IV. We then trained and tested the model using the three-group dataset. To increase computational efficiency, we removed outlier heartbeats from the NYHA classes and retained the remaining heartbeats for training and validation. Outliers were identified by their Euclidean distance to the center heartbeat.
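A minimal sketch of distance-based outlier removal for the heartbeats of one class is given below. The text does not specify how the center heartbeat or the cutoff were defined, so the pointwise mean used as the center and the `keep_fraction` parameter are hypothetical choices made only for illustration.

```python
import math

def center_heartbeat(beats):
    """Pointwise mean of equal-length heartbeats (assumed definition of the center)."""
    n = len(beats)
    return [sum(column) / n for column in zip(*beats)]

def remove_outliers(beats, keep_fraction=0.95):
    """Keep the heartbeats closest to the center in Euclidean distance.

    keep_fraction is a hypothetical cutoff; the source does not state one.
    The original order of the retained heartbeats is preserved.
    """
    center = center_heartbeat(beats)
    distances = [math.dist(beat, center) for beat in beats]
    n_keep = max(1, round(keep_fraction * len(beats)))
    closest = sorted(range(len(beats)), key=lambda i: distances[i])[:n_keep]
    return [beats[i] for i in sorted(closest)]
```

In this sketch the function would be applied separately to each class, so that the center reflects the typical waveform of that class only.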
CNN modeling and validation
The CNN takes heartbeat waveforms as one-dimensional time-series inputs and outputs label predictions as NYHA classes. The learned features reflected by the CNN parameters were used to identify the NYHA class from the test heartbeat waveform. Each convolutional layer was followed by rectified linear unit (ReLU) activation and 10% dropout to regularize the large number of parameters. Specifically, the kernels (filters) of the successive convolutional layers were [1×128], [1×2], and [1×2], respectively. The learned feature maps were then flattened and fed into two fully connected layers, followed by a softmax output layer with nodes corresponding to the NYHA classes.
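To make the layer description concrete, the sketch below implements the three basic operations in plain Python and tracks how the sequence length shrinks through the stated kernel sizes. It is an illustration under assumptions, not the trained network: a single channel, stride 1, no padding, and a 531-sample input are all hypothetical, and the numbers of filters and fully connected units are not given in the text.

```python
import math

def conv1d_valid(x, kernel, bias=0.0):
    """Single-channel 1-D convolution (cross-correlation), stride 1, no padding."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(x):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, v) for v in x]

def softmax(logits):
    """Numerically stable softmax over the class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conv_stack_lengths(input_len, kernel_sizes=(128, 2, 2)):
    """Output length after each convolution, assuming stride 1 and no padding."""
    lengths = []
    n = input_len
    for k in kernel_sizes:
        n = n - k + 1
        lengths.append(n)
    return lengths
```

Under these assumptions, `conv_stack_lengths(531)` gives the feature-map lengths after the [1×128], [1×2], and [1×2] layers, which shows that the first, wide kernel does most of the temporal aggregation.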
The proposed model was trained, validated, and tested using data that were randomly split into three datasets such that no ECG recording contributed heartbeats to more than one set. Because the three-group datasets were imbalanced, they were resampled so that the three classes contained the same number of heartbeats. The performance was evaluated based solely on the heartbeats in the independent test dataset, which enabled an efficient global evaluation.
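The splitting and balancing steps can be sketched as below. The split fractions, the random seeds, and the use of random undersampling to the smallest class are assumptions made for illustration; the text states only that the sets do not share ECG data and that the classes were adjusted to the same size.

```python
import random

def split_by_recording(recording_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Randomly assign whole recordings to train/validation/test sets.

    Splitting at the recording level keeps heartbeats from one ECG
    in a single set. The fractions are hypothetical.
    """
    ids = sorted(set(recording_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))

def balance_classes(beats_by_class, seed=0):
    """Randomly undersample every class to the size of the smallest one."""
    rng = random.Random(seed)
    target = min(len(beats) for beats in beats_by_class.values())
    return {label: rng.sample(beats, target)
            for label, beats in beats_by_class.items()}
```

Oversampling the minority classes would equalize the class sizes just as well; undersampling is shown here only because it is the shorter sketch.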
Visual explanation of the CNN model used to identify heart failure and NYHA classification
To understand which parts of the heartbeat waveform are most important for NYHA classification, gradient-weighted class activation mapping (Grad-CAM) was used to show the gradient of the classification score with respect to the convolutional features determined by the network. This can help a clinician understand why the CNN model makes a given classification. The idea of Grad-CAM is to calculate the gradient of the final classification score with respect to the final convolutional feature map. The places where this gradient is large are the places in the data upon which the final score most depends. In other words, the data points on the heartbeat waveform that have the highest Grad-CAM scores contribute most to the classification. To create an “average” Grad-CAM for each NYHA class, we calculated the pointwise Grad-CAM scores (normalized to between 0 and 1) for each heartbeat. We then calculated the average heartbeat waveform for each class and counted, at each data point, how often the Grad-CAM score was equal to 1. In the resulting “average” heartbeat waveform and frequency map, the places where high Grad-CAM scores occur most frequently indicate the features that have the greatest impact on the classification.
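The aggregation into an “average” Grad-CAM can be sketched as follows, taking the per-heartbeat Grad-CAM scores as already computed. The function names are hypothetical, min-max scaling is assumed as the normalization, and returning all zeros for a constant map is an assumption to avoid division by zero.

```python
def normalize01(cam):
    """Min-max scale one heartbeat's Grad-CAM scores to the range [0, 1]."""
    lo, hi = min(cam), max(cam)
    if hi == lo:
        # Assumption: a constant map carries no peak, so return zeros.
        return [0.0] * len(cam)
    return [(v - lo) / (hi - lo) for v in cam]

def peak_frequency_map(cams):
    """Count, per time point, how often the normalized score equals 1."""
    counts = [0] * len(cams[0])
    for cam in cams:
        for i, v in enumerate(normalize01(cam)):
            if v == 1.0:
                counts[i] += 1
    return counts

def average_waveform(beats):
    """Pointwise mean of the equal-length heartbeats of one class."""
    n = len(beats)
    return [sum(column) / n for column in zip(*beats)]
```

Plotting the frequency map over the average waveform of a class then shows which portions of the PQRST complex most often receive the maximum score.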
Retrospective cohort study
To evaluate the CNN’s ability to detect a temporal change in HF patients, we obtained longitudinal ECG data from the electronic medical records of patients who had been hospitalized for HF at least once. These retrospective time-series data were used to analyze the performance of the CNN in diagnosing HF severity. Thirty patients with a history of HF hospitalization (n=30, male: 13 (43%), mean age: 66±2.4 years) were randomly selected from the database. All patients were over 20 years old and had no history of CIED implantation. Lead-I ECG data extracted from 12-lead ECGs and NYHA classifications were collected. We then evaluated the performance of the CNN using these datasets.
Prospective observational pilot study
To evaluate the feasibility of using a CNN model to detect HF based on remote ECG monitoring, we performed a prospective pilot study (Trial registration: UMIN Clinical Trials Registry, UMIN000042073 (http://www.umin.ac.jp/ctr/index.htm)). We recruited patients during a hospitalization for worsening HF. The inclusion criteria were: (1) age 20 or over; (2) history of hospitalization for HF; and (3) histologically confirmed diagnosis of cardiovascular disease. The exclusion criteria were: (1) under age 20; (2) histologically confirmed diagnosis of pulmonary hypertension; and (3) prior CIED implantation. Fifteen consecutive patients who met the inclusion criteria (n=15, male: 13 (87%), mean age: 54±3.7 years, EF: 45.8±4.4%) participated in the prospective study. Participants were instructed to record their ECGs at home every day using a portable ECG monitor and then to transmit the ECG data using the monitor’s built-in remote monitoring system. All patients visited our outpatient department every one or two months. Plasma BNP levels and the NYHA classification, assigned by cardiologists based on the patient’s symptoms, were collected at each outpatient visit. The first patient was enrolled and the pilot study began in November 2021. To evaluate the performance of the CNN, the time-course data for an HF-index defined by the CNN using the lead-I ECG waveform were compared with the plasma BNP levels.
All clinical studies were approved by the institutional ethical committee of the University of Tokyo (No. 2020024NI-(3)). For the retrospective cohort, the requirement for written informed consent was waived by the institutional ethical committee. Each patient in the prospective cohort provided informed consent before study enrollment. The study protocol complied with the Declaration of Helsinki.
Single-lead ECG device and remote home monitoring system
In the prospective study, participants recorded ECGs using a portable ECG monitor. We used a wireless single-lead device with two electrodes placed at either end of the device body (SHINDENKUN®, SIMPLEX QUANTUM Inc. Tokyo, Japan). When the participant gripped the stick-shaped portable ECG device, the bipolar lead-I ECG was recorded. The ECG data were then sent to a data server via the internet, and the CNN assigned the data a NYHA classification and calculated the HF-index in real time.
Statistical considerations
We compared baseline demographics among patients. Categorical variables were expressed as numbers (percent) and compared between the groups using the chi-square test. Continuous variables were compared using unpaired Student’s t-tests. All analyses were two-sided, and values of P less than 0.05 were considered statistically significant. To quantify the validation performance, we estimated the area under the curve (AUC) from receiver-operating characteristic (ROC) curves and the sensitivity, specificity, and accuracy with 95% confidence intervals (CIs). Accuracy was evaluated at the optimal operating point on the ROC curve, which jointly maximized sensitivity and specificity. The 95% CI was estimated by bootstrapping with 1,000 random resamples. All training and validation were implemented using MATLAB R2020a and performed on an NVIDIA GeForce RTX 2080 Ti platform.
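The two performance estimates described above can be sketched as follows for a single binary (one-versus-rest) comparison. This is an illustration under assumptions, not the authors' MATLAB code: the percentile bootstrap, the fixed seed, and the use of Youden's J (sensitivity + specificity − 1) as the criterion for the optimal operating point are choices not stated in the text, and the function names are hypothetical.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric=accuracy, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

def youden_threshold(y_true, scores):
    """Score threshold maximizing sensitivity + specificity - 1 (assumed criterion).

    A case is called positive when its score is >= the threshold.
    Labels are 1 (positive) and 0 (negative); both must be present.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    best_thr, best_j = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j
```

For the three-class problem, one such comparison per class would be needed. Resampling whole patients instead of individual heartbeats would be a stricter alternative, because heartbeats from the same patient are correlated.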