Using Deep Learning to Detect Pediatric Congenital Heart Disease by Chest Radiography


Purpose: Detecting congenital heart disease (CHD) is challenging in low-income areas with limited resources. Chest radiography is usually available in these areas, but its diagnostic accuracy needs to be improved. The aim of this study was to establish a deep learning-based diagnostic tool for paediatric CHD using chest radiography. Methods: A total of 11,105 chest radiographs from 11,105 paediatric patients in two centres were labeled, and a convolutional neural network (CNN) was trained to diagnose CHD on the dataset from Hospital 1. To demonstrate the generalizability and clinical usefulness of the CNN, its accuracy was reported on the test sets from both centres. We also trained a second CNN to evaluate pulmonary blood flow (PBF) on the patients' chest radiographs. The predictions of both CNNs were compared with the decisions of three experienced radiologists (10–35 years of experience) who had no patient information. Results: The accuracy of the CNN for detecting CHD was 81.4% (95% confidence interval, [78.3%–84.2%]) in Hospital 2 and 85.6% (95% confidence interval, [84.0%–87.1%]) in Hospital 1. The overall accuracy of the CNN for evaluating PBF was 78.9% (95% confidence interval, [74.3%–83.0%]). Compared with the three experienced radiologists, the CNN was statistically non-inferior at a 5% margin for accuracy (P < 0.025) in detecting CHD and evaluating PBF. Conclusions: The deep learning-based diagnostic tool we developed could effectively detect CHD and evaluate PBF on chest radiography. It could potentially aid radiologists and cardiologists in the future.


Introduction
Congenital heart disease (CHD) is a significant cause of paediatric morbidity and mortality. The prevalence and incidence of CHD are reported to be approximately 3.7–4.3% and 1% of live births, respectively [1]. Pediatric CHD patients benefit from timely detection, treatment and intervention. In highly developed countries, the combination of fetal echocardiography, pulse oximetry and routine clinical assessment is considered an effective and suitable method to detect CHD [2; 3]. However, in low-income, low-resource areas without those diagnostic methods, many children with simple but serious CHD are not detected in time, which can lead to preventable suboptimal clinical outcomes [4; 5].
In areas with limited resources, chest radiography is a helpful and common method to detect paediatric CHD. Radiologists can evaluate the heart size, heart shape, and pulmonary blood flow (PBF) on the chest radiograph. However, the reported accuracy of chest radiography in distinguishing children with CHD from normal children ranges from 30% to 78% in different clinical settings [6; 7]. Diagnostic accuracy varies even among experienced paediatric radiologists who are blinded to the patients' history. Moreover, most local radiologists and cardiologists in these rural areas lack working experience in major tertiary medical centers and may not be familiar with CHD, which may lead to more serious misdiagnosis of children with CHD.
Deep learning has the potential to help human experts with disease diagnosis and management, especially in areas with limited resources. In diagnostic radiology, a convolutional neural network (CNN) has been shown to diagnose paediatric pneumonia on chest radiographs automatically and with high accuracy [8]; CNNs have also been used to estimate prognosis and the pulmonary-to-systemic flow ratio in CHD patients [9; 10]. Therefore, we hypothesized that a CNN can help radiologists and cardiologists detect and diagnose CHD on chest radiographs of pediatric patients. To test this hypothesis, we used a large dataset of labeled chest radiographs from Hospital 1 to train the CNN, then evaluated its predictive performance on the test sets from Hospital 1 and Hospital 2. To provide further assistance to radiologists and cardiologists, we also trained a second CNN to classify the PBF of the patients on chest radiography.

Study Population
A total of 11,105 paediatric patients under the age of 14 years from two centers in different cities were included in the study, covering January 2014 to March 2020. The dataset from Hospital 1 consisted of 5165 patients with CHD and 5240 normal controls, while the dataset from Hospital 2 consisted of 350 patients with CHD and 350 normal controls. Final diagnoses were determined from clinical information, computed tomography, echocardiography or surgical reports. Increased PBF in CHD patients was defined as a ratio of pulmonary-to-systemic blood flow (Qp:Qs) greater than 1.5, derived from cardiac catheterization [11]. Decreased PBF was diagnosed by a combined evaluation of oxygen saturation and morphological features on computed tomography angiography [12]. The normal controls were pediatric patients with other diseases but without cardiopulmonary disease, based on clinical history and physical examination. No patient information such as age, sex, clinical history, physical examination results, or results of other diagnostic examinations was included with the chest radiographs.

Chest Radiography Acquisition
All chest radiographs were performed as part of routine clinical care, and each patient had only one chest radiograph. These chest radiographs were acquired by multiple radiology technologists using radiographic equipment from different vendors (including GE Healthcare, Philips Healthcare, Siemens Healthineers, Shimadzu, and Fujifilm). All chest radiographs were retrieved from the PACS of the respective hospitals at the highest quality possible, in original size, and without any specific window setting.

Deep Learning and Statistics
The deep learning system we developed consisted of two separate CNNs (Fig 1). The first CNN was used to determine whether the patient has CHD (label: positive or negative). The second CNN was used to evaluate the PBF of the patients (label: normal, increased, or decreased).
Both CNNs were trained on the chest radiographs alone, without any other information such as clinical history.

CNN1 for detecting CHD
A total of 11,105 chest radiographs from two centers were included in this deep learning experiment. The 10,405 chest radiographs from Hospital 1 were randomly separated into a training set (CHD: 3165; normal control: 3240), a validation set (CHD: 1000; normal control: 1000) and a test set (CHD: 1000; normal control: 1000). To evaluate the ability of the CNN to generalize across populations and screening settings, 700 chest radiographs (CHD: 350; normal control: 350) from Hospital 2 served as the external test set.
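The random separation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the integer IDs, the helper name `split_by_class`, and the seed are assumptions, and only the class counts come from the text.

```python
import random

def split_by_class(ids, n_val, n_test, seed=0):
    """Shuffle one class's image IDs and cut them into train/validation/test."""
    rng = random.Random(seed)          # fixed seed (assumed) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is used for training
    return train, val, test

# Hypothetical integer IDs standing in for the Hospital 1 radiographs.
chd_ids = range(5165)                   # 5165 CHD patients
normal_ids = range(5165, 5165 + 5240)   # 5240 normal controls

chd_train, chd_val, chd_test = split_by_class(chd_ids, n_val=1000, n_test=1000)
nor_train, nor_val, nor_test = split_by_class(normal_ids, n_val=1000, n_test=1000)

split_sizes = {
    "train": (len(chd_train), len(nor_train)),
    "validation": (len(chd_val), len(nor_val)),
    "test": (len(chd_test), len(nor_test)),
}
print(split_sizes)
```

Splitting each class separately keeps the validation and test sets balanced at 1000 per class, which matches the reported composition.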
We retrained a VGG-16 network, pretrained on approximately 1.28 million images from the ImageNet Large Scale Visual Recognition Challenge [13], as the final model. The network was trained by stochastic gradient descent using the Adam optimizer, with cross-entropy as the loss. The output layer used a sigmoid activation function. To monitor training, the cross-entropy loss on the validation set was used as the main metric for selecting the best model. Early stopping was used to avoid overfitting and to improve training efficiency.
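Model selection by validation loss with early stopping can be illustrated with a small framework-independent sketch. The patience value and the loss sequence below are hypothetical; the text does not report the actual stopping criterion.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    best_epoch: index of the lowest validation loss seen before stopping
                (the checkpoint that would be kept as the best model).
    stop_epoch: index of the epoch at which training would halt.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the best model.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop training.
            return best_epoch, epoch
    # The loss sequence ended before the patience budget was exhausted.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation cross-entropy values, one per epoch.
losses = [0.69, 0.55, 0.48, 0.45, 0.46, 0.47, 0.46, 0.49, 0.50]
best, stop = early_stopping_epoch(losses, patience=3)
print(best, stop)
```

With these made-up losses the model from epoch 3 would be kept, and training would halt three epochs later, once the validation loss had failed to improve.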
Accuracy on the test sets of both Hospital 1 and Hospital 2 was used as the main metric to evaluate the predictive performance of the CNN (Figs 2 and 3). Moreover, the final decisions of the CNN on the 2000 test-set chest radiographs from Hospital 1 and the 700 chest radiographs from Hospital 2 were compared with the interpretations of three experienced radiologists (with 35, 25 and 10 years of experience, respectively). The radiologists reviewed the chest radiographs individually and were blinded to patient history, case composition and ratio. Confusion matrices were constructed for the CNN and for the experienced radiologists.
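The text does not state how the 95% confidence intervals for accuracy were computed. As a hedged illustration, the sketch below uses the Wilson score interval (an assumption; an exact binomial interval is equally plausible). The correct-prediction counts are back-calculated from the reported accuracies (81.4% of 700 and 85.6% of 2000) and are also assumptions.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Counts inferred from the reported accuracies, not taken from the source.
external = wilson_interval(570, 700)     # Hospital 2, reported 81.4%
internal = wilson_interval(1712, 2000)   # Hospital 1, reported 85.6%

print("Hospital 2: %.3f to %.3f" % external)
print("Hospital 1: %.3f to %.3f" % internal)
```

Under these assumptions the intervals come out close to the reported ones; small differences in the last digit would be expected if a different interval method was used.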
For the second CNN, which evaluated PBF, we used a pre-trained VGG-16 architecture [14], fine-tuning the weights of the upper convolutional layers and retraining the weights of the fully connected layers. The output layer used a softmax activation function. The network was again trained by stochastic gradient descent using the Adam optimizer, with cross-entropy as the loss and with early stopping.
Accuracy on the test set was used as the main metric in this multi-class classification problem. The output decisions of the CNN on a test set of 360 chest radiographs were also compared with the interpretations of the same three experienced radiologists described above. Confusion matrices were again constructed for the CNN and for the experienced radiologists (Fig 4).
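A confusion matrix for the three PBF labels, together with the overall accuracy and a per-class true positive rate, can be computed as in the sketch below. The six example predictions are made up for illustration and are not study data; the function names are likewise hypothetical.

```python
from collections import Counter

LABELS = ("normal", "increased", "decreased")

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Fraction of all cases that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def true_positive_rate(matrix, index):
    """Fraction of cases with true class `index` that were predicted as that class."""
    return matrix[index][index] / sum(matrix[index])

# Toy labels, purely illustrative.
y_true = ["normal", "normal", "increased", "increased", "decreased", "decreased"]
y_pred = ["normal", "increased", "increased", "increased", "normal", "decreased"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm), true_positive_rate(cm, LABELS.index("decreased")))
```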
To test whether the diagnostic performance of the CNN was non-inferior to that of each experienced radiologist on the test set, a non-inferiority test was performed with a pre-defined margin of -5% [15]. A one-sided P value <0.025 was considered statistically significant.
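The exact form of the non-inferiority test is not given in the text. One common choice, used here purely as an assumption, is a one-sided Wald z-test on the difference of two independent proportions shifted by the margin. The example applies it to the decreased-PBF true positive rates reported in the Discussion (70.8% for the CNN, 55.8% for the 35-year experienced radiologist), with counts of 85 and 67 out of 120 inferred from those percentages rather than stated in the source.

```python
import math

def noninferiority_test(x_new, n_new, x_ref, n_ref, margin=0.05):
    """One-sided Wald test of H0: p_new - p_ref <= -margin.

    Returns (z, p_value); a small p_value supports non-inferiority.
    """
    p_new = x_new / n_new
    p_ref = x_ref / n_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    if se == 0:
        raise ValueError("standard error is zero; the Wald test is undefined")
    z = (p_new - p_ref + margin) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Counts inferred from reported percentages (assumption: 120 cases per reader).
z, p_value = noninferiority_test(85, 120, 67, 120, margin=0.05)
print(z, p_value, p_value < 0.025)
```

Because the CNN and the radiologists read the same radiographs, a paired analysis may be more appropriate; the unpaired form is shown only because it needs nothing beyond the summary rates.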

Results
A total of 11,105 patients under the age of 14 years from two centers were enrolled in this study. The distribution of chest radiographs from CHD patients in the two centers is shown in Table 1.

Discussion
In the present study, we used a large dataset of paediatric chest radiographs to train a CNN for detecting paediatric CHD. To provide further assistance to radiologists, we also trained a second CNN to evaluate the PBF of the patients from the chest radiographs. Our results indicate that the CNNs performed well both in detecting CHD (accuracy, 81.4% (95% confidence interval, [78.3%-84.2%]) in the external test set) and in evaluating the PBF of paediatric patients (accuracy, 78.9% (95% confidence interval, [74.3%-83.0%]) in the test set) on chest radiographs.
The deep learning-based diagnostic tool we have developed has the potential to be widely used in hospitals in small cities and rural areas with limited availability of advanced imaging modalities such as echocardiography, CT, and MRI, to help radiologists and cardiologists diagnose CHD from chest radiographs. Classically, chest radiography has been established as a valuable tool in the evaluation of patients with a heart murmur or suspected CHD [16]. It has been taught that specific CHDs can have classic manifestations on chest radiography. Typical examples include a boot-shaped heart [11] for tetralogy of Fallot [17], an "egg on a string" for D-transposition of the great arteries [18], and a snowman appearance for total anomalous pulmonary venous connection [17]. Furthermore, chest radiography is useful for assessing PBF in CHD patients, and PBF is one of the most important findings for the differential diagnosis of CHD, especially when the availability of cross-sectional imaging is limited. PBF reflects the severity of the hemodynamic status in CHD patients and usually affects their treatment. According to the standard management guideline for CHD, medical treatment is recommended for patients without hemodynamically significant disease, whereas interventional or surgical treatment is recommended for patients with significant hemodynamic change [19]. However, the role of chest radiography has been limited. Diagnostic accuracy varies even among experienced paediatric radiologists who are blinded to the patients' history. Laya et al studied the accuracy of chest radiography in detecting pediatric CHD and found that it ranged from 71.7% to 82.4% for 5 experienced radiologists in distinguishing normal from CHD patients [7], which is similar to the results of the present study.
Therefore, we can expect that most radiologists and cardiologists in rural areas, who lack working experience in major tertiary medical centers, may be more prone to serious misdiagnosis of children with CHD.
Recently, Toba et al trained a CNN to detect CHD patients with significantly increased PBF (Qp:Qs >2.0), and the CNN correctly classified 64 of 100 chest radiographs. However, the accuracy or true positive rate for detecting patients with decreased PBF was not mentioned [10]. In Tumkosit et al, the true positive rate of experienced radiologists or cardiologists on chest radiography was much lower for detecting decreased PBF than for detecting increased PBF [11; 20]. Therefore, detecting CHD patients with decreased PBF should be regarded as an important goal and a difficult task. To address these problems, we trained a CNN that could effectively distinguish patients with increased PBF, patients with decreased PBF, and normal controls. In the present study, the true positive rate of the CNN for detecting patients with decreased PBF was 70.8%, compared with 55.8% for the 35-year experienced radiologist (+15.0%, 95% confidence interval, [3.0%-27.0%], P<0.001), 48.3% for the 25-year experienced radiologist (+22.5%, 95% confidence interval, [10.4%-34.6%], P<0.001), and 21.7% for the 10-year experienced radiologist (+49.2%, 95% confidence interval, [38.2%-60.1%], P<0.001). These results suggest that the CNN may help address the low true positive rate in detecting decreased PBF, probably because a CNN may have a stronger ability in image identification or classification.
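The differences in true positive rate and their confidence intervals can be checked with a short calculation. The sketch below assumes 120 decreased-PBF radiographs in the test set (an inference from the reported percentages, not stated in the text) and an unpaired Wald interval for the difference of two proportions (also an assumption about the method).

```python
import math

def diff_ci(x_a, n_a, x_b, n_b, z=1.96):
    """Difference p_a - p_b with an unpaired Wald confidence interval."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

N = 120            # assumed number of decreased-PBF cases
cnn_hits = 85      # 70.8% of 120, inferred from the reported rate
reader_hits = {"35-year": 67, "25-year": 58, "10-year": 26}  # 55.8%, 48.3%, 21.7%

results = {}
for name, hits in reader_hits.items():
    results[name] = diff_ci(cnn_hits, N, hits, N)
    diff, lo, hi = results[name]
    print(f"{name}: {diff:+.1%} [{lo:.1%}, {hi:.1%}]")
```

Under these assumptions the computed differences and intervals agree with the reported values to one decimal place, which suggests, but does not confirm, that the intervals were obtained in this way.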
Although the sample size was over 10,000 paediatric patients, the present study was still a two-centre study for detecting CHD and a single-centre study for evaluating PBF. The patients in the different groups were neither age-matched nor gender-matched. The CNN we trained seemed to be relatively sensitive but not very specific. These problems could be addressed by multi-centre studies in the future.
In evaluating PBF in CHD patients, we did not have hemodynamic data from cardiac catheterization or MRI flow quantification for all CHD patients with suspected decreased PBF. Cardiac catheterization was a routine examination for CHD patients with suspected pulmonary hypertension, but not an essential examination for all CHD patients. In addition, the CNN for evaluating PBF still needs to be improved. The reason may be the lack of a dataset of CHD patients with increased and decreased PBF confirmed by cardiac catheterization.

Conclusions
In the present study, the two CNNs we trained could effectively detect pediatric CHD and evaluate PBF on chest radiography with accuracy similar to that of experienced radiologists. We postulate that the CNNs can assist radiologists and cardiologists who work in small rural hospitals and have limited experience in detecting CHD on chest radiographs of paediatric patients.
Conflicts of interest All authors declare no conflicts of interest.
Availability of data and material All data and materials support the published claims and are available from the corresponding author.
Code availability The code is available from the corresponding author.
Abbreviations: PBF, pulmonary blood flow; CI, confidence interval