A deep learning-based lung nodule detection system was developed that is both highly accurate and robust to differences in radiation dose. Using this system as a second reader significantly decreased the number of missed nodules that required follow-up.
One strength of this study is that a large, high-quality training dataset was constructed. This training dataset consisted of data from multiple institutions, and it was annotated under the supervision of radiologists. Compared with the datasets used in previous studies (25), the present dataset had more nodules per scan (5.2) and contained comprehensive annotations, including even small nodules that are at risk of being overlooked and for which creating ground truth is costly. Training with these data might have made this system capable of stable detection irrespective of the radiation dose, scanning modality, or type of nodule.
In comparison with previous studies, the present study has two achievements. The first is that phantom experiments quantitatively demonstrated the stability of the CAD system in detecting nodules irrespective of the image noise level resulting from differences in radiation dose. Deep learning-based AI systems generally exhibit poor robustness to subtle changes in images (26, 27). Because image quality varies with radiation dose, there are concerns that dose may also affect detection results. Liu et al. (11) investigated radiation doses in retrospectively collected data and evaluated the effect of differences in dose on detection performance. However, their method cannot isolate the effect of the image noise level attributable to dose alone. In the present phantom experiments, it was possible to evaluate the effect of differences in dose on detection performance independently of the effects of individual differences between subjects and differences between devices. It was found that, although changes in the SD value did slightly affect the detection results, there were almost no changes in sensitivity or overall detection performance. This may have been because robustness to slight changes in images had been achieved by data augmentation in the form of changes in sharpness and the addition of Gaussian noise.
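The kind of augmentation described above (changes in sharpness plus added Gaussian noise) could look roughly like the following minimal sketch. It is not the implementation used in this study: the function names, the 3x3 box blur used for unsharp masking, and the parameter ranges (`max_amount`, `max_sigma`) are all assumptions chosen for illustration.

```python
import numpy as np


def box_blur(image: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (a stand-in for any smoothing kernel)."""
    image = image.astype(np.float64)
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def change_sharpness(image: np.ndarray, amount: float) -> np.ndarray:
    """Unsharp masking: amount > 0 sharpens, amount < 0 softens the slice."""
    image = image.astype(np.float64)
    return image + amount * (image - box_blur(image))


def add_gaussian_noise(image: np.ndarray, sigma: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return image + rng.normal(0.0, sigma, size=image.shape)


def augment(image: np.ndarray, rng: np.random.Generator,
            max_amount: float = 0.5, max_sigma: float = 20.0) -> np.ndarray:
    """Apply a random sharpness change, then random-strength Gaussian noise.

    The ranges are hypothetical; the values used in the study are not stated here.
    """
    amount = rng.uniform(-max_amount, max_amount)
    sigma = rng.uniform(0.0, max_sigma)
    return add_gaussian_noise(change_sharpness(image, amount), sigma, rng)
```

Drawing the noise level at random for each training sample exposes the network to a range of image noise, which is one plausible route to the dose robustness observed in the phantom experiments.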
Second, it was shown that using a lung nodule CAD system as a second reader increased the detection rate of nodules that require follow-up examinations, irrespective of the experience of the radiologist interpreting the scan. The effect of CAD use was analyzed by nodule characteristics, and CAD was found to be particularly effective in increasing the number of ground-glass and small nodules detected. Ground-glass nodules may represent atypical adenomatous hyperplasia, adenocarcinoma in situ, or another form of lung adenocarcinoma, and if they are discovered early, their prognosis is extremely good (28, 29). If another primary lesion is present, small solid nodules may be metastatic tumors, and their presence affects staging and treatment methods. Lung nodule CAD use in actual clinical practice may improve the detection rate of these lesions, which would be of major benefit to patients.
In the group of readers who had little experience with chest image interpretation, sensitivity for nodules located in the lower lobes was lower than in the more experienced groups. In general, the search for lung nodules is often performed from the apex to the base of the lung. Inexperienced readers took a long time to interpret each image, and sensitivity probably decreased in the latter half of the reading, as concentration tended to decline over time. Because the detection sensitivity for nodules in the lower lobes was improved by using CAD, CAD may help maintain a consistent quality of interpretation.
In the external validation test, the FP rate was higher than that in the internal validation test. To enable comparisons with previous studies, evaluations were conducted using the same datasets, but it was confirmed that these datasets, particularly the external ones, included nodules that had not been annotated, mainly tiny nodules and faint ground-glass nodules. The LNDb dataset in particular contained scans, such as those with numerous nodules, in which only some nodules had been annotated. A very large number of these unannotated nodules were counted as FPs, and the CPM score was therefore lower than the true performance.
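The effect of incomplete annotation on the CPM score can be illustrated with a small sketch. It assumes the common convention in which CPM is the mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per scan, and it reads sensitivity off the FROC curve as a step function rather than by interpolation. The function name `cpm_score` and the toy detections are hypothetical and are not taken from the study's evaluation code.

```python
import numpy as np

# Assumed operating points: FPs per scan at which sensitivity is averaged.
FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def cpm_score(scores, is_true_positive, n_nodules, n_scans):
    """Mean sensitivity at the FP_RATES operating points (step-function FROC)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    # Cumulative TP and FP counts as the score threshold is lowered.
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    # Prepend the "no detections" operating point.
    sensitivity = np.concatenate(([0.0], tp / n_nodules))
    fp_per_scan = np.concatenate(([0.0], fp / n_scans))
    # Best sensitivity reachable without exceeding each FP rate.
    return float(np.mean([sensitivity[fp_per_scan <= r].max() for r in FP_RATES]))


# Toy scan with four real nodules, all detected, ranked by score.
scores = [0.9, 0.8, 0.7, 0.6]

# Fully annotated: every detection counts as a TP.
full = cpm_score(scores, [True, True, True, True], n_nodules=4, n_scans=1)

# Partially annotated: two real nodules lack annotations and count as FPs.
partial = cpm_score(scores, [False, True, False, True], n_nodules=2, n_scans=1)

print(full, partial)
```

In this toy case the detector's output is identical in both evaluations, yet the score drops from 1.0 to 0.5 solely because two correct detections have no matching annotation.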
In the reader performance test of the present study, checking the CAD results increased the number of nodules picked up by a mean of 1 nodule per scan. This increase could raise the number of follow-up investigations, thus increasing both patients’ radiation exposure and doctors’ workload in interpreting images. Further prospective studies are needed to investigate whether CAD use will change the number of cases of lung cancer discovered and the number of follow-up investigations required. To limit any increase in unnecessary investigations, it may be necessary to use an AI system that analyzes the size and characteristics of each individual nodule and estimates its malignancy (30, 31). The combined use of such AI may also reduce the reading time when CAD is used as a second reader. Evaluating clinical efficacy in combination with such an AI system is a task for future research.