Study Samples
We enrolled 144 participants in a population-based retrospective cohort study at Ramathibodi Hospital. The data were collected from electronic medical records from January 1, 2010, to December 31, 2019.
Adults (> 18 years of age) with a clinical diagnosis of heart failure (HF; New York Heart Association functional class [NYHA FC] I-IV) who underwent evaluation via chest X-ray (CXR), echocardiography, or pulmonary capillary wedge pressure (PCWP) measurement were eligible for this study. Pregnant participants and those with complex congenital heart disease, right-sided heart failure, or end-stage malignancy were excluded.
We collected data on baseline characteristics at the time of HF presentation, including age, sex, body weight, height, body mass index, symptoms and signs of HF, comorbidities, medications, devices, blood pressure, heart rate, NYHA FC, left ventricular ejection fraction (LVEF) by echocardiography, N-terminal pro-B-type natriuretic peptide (NT-proBNP), and PCWP.
Ascertainment of HF diagnosis
We used the ICD-10 codes I50.0, I50.1, and I50.9 to identify HF diagnoses. The data for the HF diagnosis were obtained from electronic medical records, in which we reviewed the clinical findings of HF and evaluated the CXR data. HF was diagnosed when the signs and symptoms of HF were present together with a typical HF CXR and abnormal clinical parameters: an elevated NT-proBNP level, abnormal echocardiography, and/or an elevated PCWP. A typical HF CXR was defined by evidence of redistribution, interstitial edema, alveolar edema, cardiomegaly, and/or pleural effusion. Abnormal echocardiography was defined as an LVEF < 40% (HF with reduced ejection fraction) or an LVEF ≥ 40% plus left ventricular hypertrophy, left atrial enlargement, or diastolic dysfunction (HF with mildly reduced ejection fraction and HF with preserved ejection fraction). An elevated PCWP was defined as a PCWP > 18 mmHg determined through right heart catheterization (9, 10, 14–16).
Dataset
We obtained 240 HF CXRs from 144 participants at Ramathibodi Hospital. The RAMA dataset was reviewed and verified by two cardiologists and two radiologists, with consensus defined as agreement among at least two of the four reviewers. Prior to interpretation, the CXRs were labeled by consensus as showing pulmonary congestion (interstitial edema or alveolar edema), cardiomegaly, or pleural effusion. Two cardiologists verified the clinical diagnosis on the basis of the signs and symptoms of HF, the PCWP, the NT-proBNP level, and the LVEF. The images were then resized from 1024 × 1024 to 256 × 256 pixels, and the final matrix was flattened. The 240 images were split into training, testing, and validation sets at a ratio of 70:15:15, respectively (17, 18). Feature vectors consisted of clinical factors and dense image-derived features, and the HFNet outputs were used to evaluate the area under the receiver operating characteristic curve (AUC) for pulmonary congestion, cardiomegaly, and pleural effusion, together with their concomitant clinical parameters (NT-proBNP, LVEF by echocardiography, and PCWP).
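As an illustration of this preprocessing, a minimal sketch is shown below; the file paths, label array, and random seed are placeholders, and the exact pipeline used for the RAMA dataset may differ.

    # Minimal preprocessing sketch (assumed workflow, not the exact RAMA pipeline):
    # resize the CXRs to 256 x 256 pixels and split them 70:15:15.
    import numpy as np
    from PIL import Image
    from sklearn.model_selection import train_test_split

    def load_cxr(path):
        """Read a CXR, resize it to 256 x 256, and scale pixel values to [0, 1]."""
        img = Image.open(path).convert("RGB").resize((256, 256))
        return np.asarray(img, dtype=np.float32) / 255.0

    # cxr_paths and labels (edema, cardiomegaly, effusion per image) are placeholders.
    images = np.stack([load_cxr(p) for p in cxr_paths])        # shape (240, 256, 256, 3)
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.30, random_state=42)       # 70% for training
    x_test, x_val, y_test, y_val = train_test_split(
        x_rest, y_rest, test_size=0.50, random_state=42)       # 15% test, 15% validation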
Deep learning modeling
Our model, HFNet, was built using Keras version 2.3.1 and Python version 3.7.9. In the initial phase of our study, we used a DenseNet121 model pretrained on the CheXNet reference data (13). Subsequently, we customized the model by removing its last layer and adding three dense layers targeting the three findings associated with heart failure, namely, edema, cardiomegaly, and effusion. This approach enhanced the ability of the model to accurately identify and analyze these specific findings.
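This head-replacement step could be sketched as follows; the CheXNet weight file name and the sizes of the added dense layers are assumptions, since the text does not specify them.

    # Sketch of the DenseNet121 head replacement; the weight file and the sizes of
    # the added dense layers are assumptions.
    from keras.applications import DenseNet121
    from keras.layers import Dense
    from keras.models import Model

    base = DenseNet121(include_top=False, weights=None,
                       input_shape=(256, 256, 3), pooling="avg")
    base.load_weights("chexnet_densenet121_weights.h5", by_name=True)  # hypothetical file

    x = Dense(64, activation="relu")(base.output)    # assumed sizes of the added layers
    x = Dense(32, activation="relu")(x)
    outputs = Dense(3, activation="sigmoid")(x)      # edema, cardiomegaly, effusion
    model = Model(inputs=base.input, outputs=outputs)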
Handling Missing Data
To ensure robustness in real-world scenarios, we carefully addressed missing data in our model. For clinical vectors such as the PCWP, which may have missing values, we employed a data preprocessing step in which missing values were imputed with the mean of the respective feature, as this provided a reasonable approximation given the nature of our dataset.
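As a concrete illustration, a minimal mean-imputation sketch is shown below, assuming the clinical vectors are stored as a NumPy array with NaN marking missing entries; the actual preprocessing code is not reproduced here.

    # Mean-imputation sketch: replace missing values (NaN) in each clinical feature,
    # such as the PCWP, with the mean of the observed values for that feature.
    import numpy as np

    def impute_mean(clinical):
        """clinical: (n_samples, n_features) array with NaN marking missing entries."""
        filled = clinical.astype(np.float64, copy=True)
        col_means = np.nanmean(filled, axis=0)          # per-feature mean of observed values
        rows, cols = np.where(np.isnan(filled))
        filled[rows, cols] = col_means[cols]
        return filled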
Subsequently, we extended the layers of the model and performed transfer learning on the RAMA dataset. Using this approach, we developed the HFNet model, which is specifically tailored to identify and analyze findings related to heart failure. The HFNet architecture consists of an input layer that takes an (N, 256, 256, 3) input tensor, where N is the number of samples. This input flows into a convolutional layer (Conv2D) that produces a tensor of shape (N, 252, 252, 128) for feature extraction. The next layer is a max pooling layer that downsamples the tensor to (N, 126, 126, 128). Thereafter, we applied a Flatten layer to transform the three-dimensional tensor into a feature vector of shape (N, 2032128), which was concatenated with the clinical feature vector. The concatenated features were connected to two dense layers of 16 neurons each with the ReLU activation function, to which dropout was applied. Finally, for the multilabel classification problem, the output layer was a dense layer of shape (N, 3) that classified the three radiological findings. A unique feature of our model, in contrast to conventional AI models, is that it was not developed solely from chest X-ray input data; instead, it utilizes clinical vectors such as the PCWP in conjunction with the imaging data. The overall architecture is illustrated in Fig. 1.
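In Keras terms, this layer stack could be sketched as follows; the 5 × 5 kernel size (which yields the stated 252 × 252 feature maps under valid padding), the dropout rate, and the number of clinical features are assumptions not given in the text.

    # Sketch of the HFNet layer stack described above; kernel size, dropout rate,
    # and clinical-vector size are assumptions.
    from keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                              Dense, Dropout, Concatenate)
    from keras.models import Model

    n_clinical_features = 4                               # placeholder size of the clinical vector

    image_in = Input(shape=(256, 256, 3))                 # (N, 256, 256, 3)
    clinical_in = Input(shape=(n_clinical_features,))     # clinical vector (e.g., PCWP)

    x = Conv2D(128, (5, 5), activation="relu")(image_in)  # -> (N, 252, 252, 128)
    x = MaxPooling2D(pool_size=(2, 2))(x)                 # -> (N, 126, 126, 128)
    x = Flatten()(x)                                      # -> (N, 2032128)

    x = Concatenate()([x, clinical_in])                   # join image and clinical features
    x = Dense(16, activation="relu")(x)
    x = Dropout(0.5)(x)                                   # assumed dropout rate
    x = Dense(16, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(3, activation="sigmoid")(x)           # edema, cardiomegaly, effusion

    hfnet = Model(inputs=[image_in, clinical_in], outputs=outputs)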
Our training strategy applied mini-batch training with a batch size of 16. Early stopping was used to prevent overfitting: training stopped when the validation loss had not improved for 20 epochs, and the best model parameters were restored from the best epoch, as shown in Fig. 2. A dynamic learning rate was applied using ReduceLROnPlateau, which reduced the learning rate by 10% when the validation loss did not improve for five consecutive epochs. The initial learning rate was set to 0.001.
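Continuing the sketch above, this training setup could be expressed with Keras callbacks as follows; the Adam optimizer, the epoch budget, and the interpretation of the learning-rate reduction factor are assumptions.

    # Training-setup sketch: mini-batches of 16, early stopping with a patience of
    # 20 epochs (best weights restored), and ReduceLROnPlateau with a patience of 5.
    # "Reduced by 10%" is interpreted here as factor=0.1 (the literal reading would
    # be factor=0.9); the optimizer and epoch budget are assumptions.
    from keras.callbacks import EarlyStopping, ReduceLROnPlateau
    from keras.optimizers import Adam

    hfnet.compile(optimizer=Adam(learning_rate=0.001), loss="binary_crossentropy")

    callbacks = [
        EarlyStopping(monitor="val_loss", patience=20, restore_best_weights=True),
        ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
    ]

    # x_train, clinical_train, y_train, x_val, clinical_val, y_val are placeholders
    # for the image, clinical, and label arrays from the 70:15:15 split.
    history = hfnet.fit([x_train, clinical_train], y_train,
                        validation_data=([x_val, clinical_val], y_val),
                        batch_size=16, epochs=200, callbacks=callbacks)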
Sample size calculation
A power analysis was conducted using G*Power software to determine the appropriate sample size for our study. We selected the "linear multiple regression: fixed model, R² deviation from zero" analysis because our AI model includes three predictors related to edema, effusion, and cardiomegaly on chest X-ray for assisting heart failure diagnosis. An effect size (f²) of 0.15, a medium effect by Cohen's convention, was chosen based on our understanding of the research question. The significance level (alpha error) was set to 0.05, representing the probability of rejecting the null hypothesis when it is actually true. The desired power was set to 0.95, representing the probability of correctly rejecting the null hypothesis when it is actually false. Based on these criteria, G*Power calculated a required sample size of 119 participants to achieve a power of 0.95. This sample size was determined to be adequate for detecting an effect size of 0.15 with 95% power for our AI model's ability to detect congestive findings on chest X-ray.
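The same calculation can be reproduced outside G*Power; the sketch below is one such check, using SciPy and the noncentrality convention λ = f² · N that G*Power applies to this test.

    # Reproduction of the G*Power calculation with SciPy: find the smallest N for
    # which the F test of "R^2 deviation from zero" with 3 predictors, f^2 = 0.15,
    # and alpha = 0.05 reaches a power of 0.95 (noncentrality lambda = f^2 * N).
    from scipy import stats

    def required_sample_size(f2=0.15, n_predictors=3, alpha=0.05, target_power=0.95):
        n = n_predictors + 2
        while True:
            df1, df2 = n_predictors, n - n_predictors - 1
            crit = stats.f.ppf(1 - alpha, df1, df2)                # critical F under H0
            power = 1 - stats.ncf.cdf(crit, df1, df2, f2 * n)      # power under H1
            if power >= target_power:
                return n, power
            n += 1

    print(required_sample_size())   # should be close to the 119 participants reported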
Statistical analysis
The baseline characteristics are presented as percentages for categorical variables and as means with standard deviations for continuous variables. Continuous variables that were not normally distributed are presented as medians with interquartile ranges (IQRs). The normality of the data was assessed using the one-sample Kolmogorov–Smirnov test. To evaluate the predictive accuracy of the models, the AUC was calculated. IBM SPSS Statistics version 23 was used for the statistical analyses.
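The statistical analyses themselves were performed in SPSS; purely as an equivalent illustration of the AUC evaluation for the three findings, a scikit-learn sketch is shown below, with x_test, clinical_test, and y_test as placeholders from the earlier sketches.

    # Illustrative per-label AUC computation for the HFNet outputs (an equivalent of
    # the reported AUC evaluation, not the SPSS procedure used in the study).
    from sklearn.metrics import roc_auc_score

    y_prob = hfnet.predict([x_test, clinical_test])   # predicted probabilities, shape (N, 3)
    for i, name in enumerate(["pulmonary congestion", "cardiomegaly", "pleural effusion"]):
        print(name, roc_auc_score(y_test[:, i], y_prob[:, i]))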