DOI: https://doi.org/10.21203/rs.3.rs-460896/v1
Background: The diagnosis of pneumoconiosis relies primarily on chest radiographs and exhibits significant variability between physicians. Computer-aided diagnosis (CAD) can improve the accuracy and consistency of these diagnoses. However, CAD based on machine learning requires extensive human intervention and time-consuming training. As such, deep learning has become a popular tool for the development of CAD models. In this study, the clinical applicability of CAD based on deep learning was verified for pneumoconiosis patients.
Methods: Chest radiographs were collected from 5424 occupational health examinees who met the inclusion criteria. The data were divided into training, validation, and test sets. The CAD algorithm was trained and then applied to the validation set, while the test set was used to evaluate diagnostic efficacy. Three junior and three senior physicians provided independent diagnoses using images from the test set, followed by a comprehensive diagnosis made with reference to the CAD results. A receiver operating characteristic (ROC) curve was used to evaluate the diagnostic efficiency of the proposed CAD system. A McNemar test was used to compare diagnostic sensitivity and specificity for pneumoconiosis before and after the use of CAD. A kappa consistency test was used to evaluate the diagnostic consistency for both the algorithm and the clinicians.
Results: ROC results suggested the proposed CAD model achieved high accuracy in the diagnosis of pneumoconiosis, with a kappa value of 0.90. The sensitivity, specificity, and kappa values for the junior doctors increased from 0.86 to 0.98, 0.68 to 0.86, and 0.54 to 0.84, respectively (p<0.05), when CAD was applied. However, metrics for the senior doctors were not significantly different.
Conclusion: DL-based CAD can improve the diagnostic sensitivity, specificity, and consistency of pneumoconiosis diagnoses, particularly for junior physicians.
Pneumoconiosis is the most widely distributed and harmful occupational disease in China [1, 2]. It is characterized by the diffuse fibrosis of lung tissue, caused by long-term inhalation of inorganic mineral dust and its retention in the lungs during occupational activities. The cumulative number of cases has reached one million in China and continues to increase at a rate of more than 20,000 new cases per year [2]. Dust inhaled through the respiratory tract can include silicon dioxide particles, which cause pneumoconiosis alveolitis, focal lesions, nodular lesions, dusty fibrosis, massive fibrosis, and other pathological changes in the lungs. These lesions primarily manifest as small round or irregular opacities, diffuse interstitial fibrosis of the lungs, and silicosis masses.
Conventional classification of pneumoconiosis is primarily based on the "International X-ray Classification of Pneumoconiosis" guidelines issued by the International Labor Organization in 2011 [3, 4]. The current standard in China is the "Diagnosis of Occupational Pneumoconiosis" (GBZ 70-2015), which is based on the correct interpretation of chest X-ray films using the profusion of small opacity, lung zone distributions, and pleural plaques to diagnose and classify pneumoconiosis [5, 6].
As the occurrence and progression of pneumoconiosis is a continuous process, the profusion of small opacities in chest radiographs also changes along a continuum. While professional training, comparison with standard films, and improvements to imaging equipment and technology can increase the diagnostic accuracy of occupational physicians, inter-clinician differences remain high due to the subjective nature of image interpretation. Prior studies under the same external conditions have suggested that inconsistent judgments based on the shape, size, and number of small opacities are the primary cause of inconsistent pneumoconiosis classification [7]. As such, increasing the objectivity, accuracy, and consistency of the diagnostic process is highly important.
Computer-aided diagnosis (CAD) provides an objective methodology for improving the interpretation of medical images. The rapid development of computer technology and medical imaging equipment has enabled the effective combination of artificial intelligence and image processing in recent years, which has led to improvements in detection and evaluation of disease severity [8]. Using computer-generated results as a reference, radiologists can draw more accurate conclusions for disease screenings and cancer risk assessments [9–12].
Several recent studies have investigated the application of CAD technology to the diagnosis of pneumoconiosis. New approaches have been developed for image preprocessing, characterization extraction, classifier selection, and optimization [13–15]. However, acquiring the annotated training data used in conventional machine learning models can be both time- and cost-prohibitive, as lesions must be labeled manually by clinical experts. Subjective error and bias can also become problematic in this process.
The rapid development of artificial intelligence has led to broad applications of deep learning (DL) algorithms for medical image analysis. These models utilize a multi-layer network to facilitate the automated learning of implicit relationships within the data. The resulting characteristics are often more diverse and expressive, particularly in tumor imaging applications. DL can provide semi-supervised or unsupervised autonomous learning of a target image for classification tasks. It can also synthesize images with the same characteristics and imitate the independent learning and analysis capabilities of humans, thereby reducing the subjectivity of extracted features [16–19]. At present, there is no relevant report on the evaluation of pneumoconiosis imaging diagnoses based on deep learning computer-aided diagnostic technology.
Convolutional neural networks (CNNs) and deep residual networks (DRNs), an extension of CNNs, are common DL algorithms used for image classification [20]. These models provide the advantages of simplicity, practicality, and generalizability. Variants with different numbers of convolutional layers have also been proposed, including ResNet18, ResNet50, and ResNet101; the last of these includes five network depths, a total of 100 convolutional layers, and a fully connected layer. These and other similar algorithms have been widely applied to image segmentation, detection, and recognition tasks [21, 22]. This study focused on assessing the application value of DL-based CAD technology in diagnosing pneumoconiosis.
This study was approved by the ethics committee of the Occupational Safety and Health Research Center (No. 2018003). Informed consent was waived in this study due to its retrospective nature.
Chest radiographs were acquired from occupational health surveillance examinees from January to September 2017. A total of 6020 images (all from male patients) were included in the dataset. Patients were aged 22 to 67 years (mean age: 48.62±12.1 y) and had 2–38 years of dust exposure history. The quality and diagnosis of chest radiographs were assessed according to GBZ70-2015. Inclusion criteria required digital radiography (DR) image quality to meet second-level film standards. Exclusion criteria included signs of pneumonia, a tumor, tuberculosis, or other lung diseases that might affect the diagnosis. The numbers of cases in the control and study groups, as well as the stage criteria for the study groups, are listed in Tables 1 and 2, respectively. The gold standard was produced by three experienced clinicians. Cases diagnosed as positive by two or more experts were classified as positive, and the remaining images were assumed to be negative. Pneumoconiosis in varying stages is shown in Figure 1.
To avoid the problem of over-fitting (common in machine learning), fifty positive and fifty negative images were selected using a random number table method. These 100 cases were used as the test set, and the remaining positive and negative data were divided into training and verification sets in an 80/20 proportion. The training set was used for preliminary network training, and the test set was used to evaluate the diagnostic efficiency of the CAD system.
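The split described above can be sketched as follows. This is only an illustration: a seeded pseudo-random shuffle stands in for the random number table, the helper name `split_cases` is hypothetical, and the class sizes (4080 negative, 1344 positive) are taken from Table 1. Under these assumptions, holding out 50 cases per class and splitting the remainder 80/20 gives the set sizes listed in Table 1.

```python
import random


def split_cases(case_ids, n_test=50, train_fraction=0.8, seed=0):
    """Hold out n_test cases, then split the rest into training/verification sets."""
    rng = random.Random(seed)          # seeded shuffle, a stand-in for a random number table
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]
    n_train = round(len(remaining) * train_fraction)
    return remaining[:n_train], remaining[n_train:], test


# Class sizes from Table 1; the integer IDs are placeholders for real case identifiers.
neg_train, neg_verif, neg_test = split_cases(range(4080), seed=1)
pos_train, pos_verif, pos_test = split_cases(range(1344), seed=2)

print(len(neg_train), len(neg_verif), len(neg_test))
print(len(pos_train), len(pos_verif), len(pos_test))
```

Splitting each class separately keeps the positive-to-negative ratio roughly equal in the training and verification sets.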
Table 1
Case composition of pneumoconiosis data set*
Data set | Negative | Positive: Stage Ⅰ | Positive: Stage Ⅱ | Positive: Stage Ⅲ | Positive: Total | Total
Training set | 3224 | 704 | 227 | 104 | 1035 | 4259
Verification set | 806 | 176 | 57 | 26 | 259 | 1065
Test set | 50 | 34 | 11 | 5 | 50 | 100
Total | 4080 | 914 | 295 | 135 | 1344 | 5424
Note: *The grading standard of pneumoconiosis is based on GBZ70-2015, promulgated by China.
Table 2
Definition of pneumoconiosis of different stages based on GBZ70-2015
Staging of pneumoconiosis | Overall profusion | Distribution of zone of lung
Stage Ⅰ | 1 | at least 2 areas
Stage Ⅱ | 2 | exceeds 4 areas
Stage Ⅱ | 3 | 4 areas
Stage Ⅲ | - | has large opacity exceeding 2 cm × 1 cm
Stage Ⅲ | 3 | exceeds 4 areas with small opacity clustering or merging large opacity
A DL algorithm based on DRN-ResNet101 was used to establish a CAD system for diagnosing pneumoconiosis. The training parameters were set as follows. The pre-training learning rate was 0.001 (using the SGD algorithm), which was decreased to 0.0001 after 6000 iterations. A total of 10000 epochs were used with a batch size of 64. All images were first desensitized (de-identified), and a U-Net was used to extract the lung fields on both sides by removing invalid areas and resizing the image to a single 256 × 256 pixel matrix [23]. Images were then converted to JPG format and the training data were input to the model. The verification set was used to determine the effectiveness of the model and gradually improve the accuracy of model output through continuous iterative optimization.
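The training settings above can be collected in a small configuration object, shown here as a framework-independent sketch. The names `TrainingConfig` and `learning_rate` are hypothetical, and the exact boundary behavior (whether iteration 6000 itself uses the lower rate) is an assumption, since the text only states that the rate was decreased after 6000 iterations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters quoted in the text; field names are illustrative."""
    base_lr: float = 0.001            # initial SGD learning rate
    decayed_lr: float = 0.0001        # learning rate after the decay point
    decay_after_iterations: int = 6000
    batch_size: int = 64
    image_size: tuple = (256, 256)    # lung-field crop size in pixels


def learning_rate(iteration, cfg=TrainingConfig()):
    """Step schedule: base_lr up to the decay point, decayed_lr afterwards (assumed boundary)."""
    if iteration < cfg.decay_after_iterations:
        return cfg.base_lr
    return cfg.decayed_lr


print(learning_rate(0), learning_rate(5999), learning_rate(6000))
```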
Six physicians were divided into junior doctor (JD) and senior doctor (SD) groups based on their diagnostic experience. The three doctors in the SD group each had more than 10 years of experience in diagnosing pneumoconiosis. Those in the JD group each had 3–5 years of experience. These participants conducted independent diagnoses on the test set and a comprehensive diagnosis with reference to the CAD results, following the GBZ70-2015 guidelines. The interval between the two diagnoses was 10 days. Chest radiograph images were interpreted using a PACS workstation and displayed on Jusha 5 M medical monitors.
The MedCalc 15.2.2 software package (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2015) was used for statistical analysis. A receiver operating characteristic (ROC) curve was used to evaluate the diagnostic efficacy of the CAD system for diagnosing pneumoconiosis [24]. Area-under-curve (AUC), sensitivity, and specificity values were also calculated. A McNemar test was used to evaluate diagnostic sensitivity and specificity for clinicians with and without the use of CAD software. A value of p<0.05 was considered to be statistically significant. A Kappa test was included to evaluate consistency between the CAD results, the physician diagnosis (with and without CAD), and the gold standard. Consistency was poor, fair, and good for K values of 0.01–0.39, 0.4–0.74, and 0.75–1, respectively.
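For readers unfamiliar with the McNemar test, the sketch below shows an exact two-sided version. It depends only on the two discordant counts: the number of cases diagnosed correctly only with CAD (`b`) and only without CAD (`c`). The function name and the example counts are hypothetical, since the paired counts are not reported in this paper, and the exact binomial form is an assumption that may differ from the variant MedCalc applies.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0                      # no discordant pairs: no evidence of a change
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as the observed one
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 6 cases corrected with CAD, 0 cases made worse.
p_value = mcnemar_exact(b=6, c=0)
print(p_value)
```

Because concordant pairs do not enter the statistic, the test is suited to the paired design used here, in which the same physicians read the same images with and without CAD.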
ROC analysis showed that the AUC for CAD-based pneumoconiosis diagnosis was high (0.99; see Figure 2 and Table 3). Diagnostic sensitivity (0.94) and specificity (0.96) were also high and consistent with the gold standard. Diagnostic results with and without the use of CAD are provided in Tables 4 and 5. Including CAD increased the diagnostic sensitivity of the JD group from 0.86 to 0.98 and the specificity from 0.68 to 0.86; these differences were statistically significant. Diagnostic sensitivity for the SD group increased from 0.94 to 0.98 and specificity increased from 0.90 to 0.94; these differences were not statistically significant.
Table 3
ROC results for independent CAD diagnoses
Cutoff value | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI)
0.53 | 0.99 (0.97-1.00) | 0.94 (0.82-0.98) | 0.96 (0.85-0.99)
Note: ROC: receiver operating characteristic; AUC: area under the curve.
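As a reminder of what the quantities in Table 3 measure, the sketch below computes an AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, then reads sensitivity and specificity off a cutoff. The scores are hypothetical (the per-case CAD outputs are not given in the paper), the function names are illustrative, and treating a score at or above the cutoff as positive is an assumption.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec_at(cutoff, pos_scores, neg_scores):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sensitivity = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sensitivity, specificity


# Hypothetical model outputs for four positive and four negative cases.
pos = [0.95, 0.80, 0.60, 0.40]
neg = [0.10, 0.30, 0.50, 0.70]

print(roc_auc(pos, neg))
print(sens_spec_at(0.53, pos, neg))
```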
Table 4
Diagnostic results with and without the use of CAD
JDs | GS - | GS + | Total | JDs+CAD | GS - | GS + | Total
- | 34 | 7 | 41 | - | 43 | 1 | 44
+ | 16 | 43 | 59 | + | 7 | 49 | 56
Total | 50 | 50 | 100 | Total | 50 | 50 | 100
SDs | GS - | GS + | Total | SDs+CAD | GS - | GS + | Total
- | 45 | 3 | 48 | - | 47 | 1 | 48
+ | 5 | 47 | 52 | + | 3 | 49 | 52
Total | 50 | 50 | 100 | Total | 50 | 50 | 100
Note: JD: junior doctor; SD: senior doctor; GS: the gold standard.
Table 5
A comparison of sensitivity and specificity with and without the use of CAD
Group | Sensitivity without CAD | Sensitivity with CAD | P | Specificity without CAD | Specificity with CAD | P
JDs | 0.86 (0.76-0.96) | 0.98 (0.94-1.00) | 0.03 | 0.68 (0.55-0.81) | 0.86 (0.76-0.96) | 0.02
SDs | 0.94 (0.87-1.00) | 0.98 (0.94-1.00) | 0.50 | 0.90 (0.82-0.98) | 0.94 (0.87-1.00) | 0.68
Note: JD: junior doctor; SD: senior doctor.
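The sensitivity and specificity values in Table 5 follow directly from the counts in Table 4, as the sketch below illustrates. The normal-approximation (Wald) interval truncated to [0, 1] is an assumption about how the intervals were computed; it appears consistent with the reported values, but the paper does not state the method. The function name is hypothetical.

```python
from math import sqrt


def proportion_with_wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation 95% interval, clipped to [0, 1]."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# (TP, FN, TN, FP) read from Table 4, with the gold standard as reference.
table4 = {
    "JDs": (43, 7, 34, 16),
    "JDs+CAD": (49, 1, 43, 7),
    "SDs": (47, 3, 45, 5),
    "SDs+CAD": (49, 1, 47, 3),
}

for name, (tp, fn, tn, fp) in table4.items():
    sens = proportion_with_wald_ci(tp, tp + fn)   # TP / (TP + FN)
    spec = proportion_with_wald_ci(tn, tn + fp)   # TN / (TN + FP)
    print(name, [round(v, 2) for v in sens], [round(v, 2) for v in spec])
```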
Kappa test results showed that including CAD increased the diagnostic consistency between the JD group and the gold standard from fair to good, as the kappa value increased from 0.54 to 0.84. The consistency of the SD group improved slightly, with the kappa value increasing from 0.84 to 0.92 (see Table 6).
Table 6
A comparison of diagnostic consistency with and without the use of CAD
Doctors | Kappa without CAD | 95% CI | Kappa with CAD | 95% CI
JDs/GS | 0.54 | 0.37-0.70 | 0.84 | 0.73-0.94
SDs/GS | 0.84 | 0.73-0.94 | 0.92 | 0.84-0.99
Note: JD: junior doctor; GS: the gold standard; SD: senior doctor.
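The kappa values in Table 6 can be reproduced from the 2 × 2 counts in Table 4 using Cohen's kappa, as in the sketch below. The function names are hypothetical, the agreement bands follow the thresholds given in the statistical methods, and labeling values of zero or below as "poor" is an assumption.

```python
def cohen_kappa(tn, fn, fp, tp):
    """Cohen's kappa for a 2x2 table of physician diagnosis versus gold standard."""
    n = tn + fn + fp + tp
    observed = (tn + tp) / n
    # Chance agreement from the marginal totals of each rater
    rater_neg, rater_pos = tn + fn, fp + tp
    gs_neg, gs_pos = tn + fp, fn + tp
    expected = (rater_neg * gs_neg + rater_pos * gs_pos) / n ** 2
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Bands from the methods: poor below 0.40, fair 0.40-0.74, good 0.75 and above."""
    if kappa >= 0.75:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"


# (TN, FN, FP, TP) read from Table 4.
table4 = {
    "JDs": (34, 7, 16, 43),
    "JDs+CAD": (43, 1, 7, 49),
    "SDs": (45, 3, 5, 47),
    "SDs+CAD": (47, 1, 3, 49),
}

for name, counts in table4.items():
    k = cohen_kappa(*counts)
    print(name, round(k, 2), agreement_band(k))
```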
These results demonstrate that CAD can improve the consistency between diagnostic results and a gold standard, particularly for junior physicians, whose performance approached the level of more senior physicians. Sensitivity and specificity both improved significantly for the JD group (p<0.05), and their diagnostic consistency improved from fair to good. Applying CAD also improved the sensitivity, specificity, and consistency for the SD group, though the improvements were not statistically significant.
The diagnosis of pneumoconiosis has conventionally relied on manual interpretation, which can be subjective and inconsistent. This study made full use of the advantages offered by machine learning technology, developing a self-learning training model based on ResNet101 for the representation of pneumoconiosis in chest radiographs. ResNet101 can decompose a problem into multiple direct residual problems, using a residual vector coding scheme for image processing, without the need for additional network parameters or calculations. In this case, model training speed was increased, and classification accuracy was improved [20].
Pneumoconiosis is characterized in radiograph images by a diffuse distribution of low opacity objects of varying size. The shape, size, quantity, and distribution of these structures is difficult to accurately describe. As such, there is a ‘semantic gap’ between low-level image features and high-level medical terms [25]. This study used the entire lung field on both sides of the image as the research object, including the diffuse structures and some surrounding tissue. There was no need to perform complex feature extraction on specific object representations, thereby avoiding these semantic gaps. The algorithm extracted characteristic information for pneumoconiosis lesions and improved the accuracy of model classification. The collected data were used to train a CAD system, based on a ResNet101 model, for automated pneumoconiosis diagnosis.
Output results for 100 chest X-rays in the test group demonstrate the high diagnostic efficiency provided by the proposed CAD model. The AUC, sensitivity, specificity, and consistency were high when compared with a gold standard. In a future study, we will pursue various options for increasing the intuitive nature of predicted results [26]. For example, heat maps have been used in previous studies to visualize a portion of the test image [9, 12]. Pixels with larger values indicate a greater contribution to the result, which could help physicians to better understand the conclusions of automated diagnoses.
Our results showed that the proposed CAD system improved the overall sensitivity, specificity, and consistency of pneumoconiosis diagnoses for physicians with varying levels of experience. Specificity increased for all three physicians in the JD group and sensitivity improved for two of the three. Sensitivity measures the probability of correctly diagnosing positive cases in a test group, while specificity measures the probability of correctly diagnosing negative cases. High sensitivity helps identify patients with pneumoconiosis, allowing for earlier treatment and an improved quality of life. High specificity helps screen out patients who do not require further examination, thereby reducing caseloads. Junior physicians are generally more prone to misdiagnosis due to a lack of experience. However, the proposed CAD system effectively improved both sensitivity and specificity for the JD group, producing an accuracy that was comparable to that of more experienced clinicians. Although the senior physicians were more accurate before including CAD, the proposed system increased their sensitivity and specificity, though the difference was not statistically significant.
The present study has certain limitations. First, although the proposed CAD system, based on a deep residual neural network (ResNet101), can achieve high diagnostic accuracy and consistency for independent diagnoses, pneumoconiosis requires a comprehensive diagnosis. It relies not only on chest X-rays, but also on occupational history, epidemiology, and clinical manifestations. The inclusion of qualitative factors like these in computer-based decision making is a topic that requires further research. In addition, the number of patients included in this study was small and the results exhibited some deviation. In future work, the amount of data will be increased for further analysis. Finally, this study is only a preliminary investigation of whether CAD can diagnose pneumoconiosis. Research on different stages of pneumoconiosis should be conducted in the future.
In summary, the results presented in this study demonstrate that CAD can effectively improve the sensitivity, specificity, and consistency of pneumoconiosis diagnoses, particularly for junior physicians. As such, the proposed model could be a powerful new tool for reducing diagnostic subjectivity and inter-clinician variability.
CAD: Computer-aided diagnosis
SD: Senior doctor
JD: Junior doctor
GS: Gold standard
ROC: Receiver operator characteristic
AUC: Area under curve
95% CI: 95% Confidence interval
Ethics approval and consent to participate.
This study was approved by the Ethics Committee of the National Center for Occupational Safety and Health, NHC, China (ID: 2018003).
Not applicable.
The dataset for the current study is not publicly available, for the purpose of protecting the privacy of the participants. The data are available from the corresponding author upon reasonable request.
Authors declare no conflicts of interest.
This study was funded by the National Center for Occupational Safety and Health, NHC (No. 2020025) for data collection and analysis.
LZ and WH designed the research. ZW, QQ, JZ and CD performed the data collection. ZW, QQ and JZ performed the data analysis. ZW wrote the manuscript. LZ, JZ, QQ and CD revised the manuscript. All the authors have reviewed the final version of the manuscript and approved it for publication.
The authors wish to thank Dr. Xiaopeng Wei, Yingjie Wang and Ruizhen Liu for their support with data collection, and Dr. Wei Wei for her statistical advice.