The current study validates the efficacy of deep learning methods in enhancing the image quality of 2x images, as demonstrated by both subjective and objective assessments, with the enhanced images meeting clinical diagnostic standards. However, the Likert scores of 2x-real and 2x-simu images differed significantly from those of standard 1x images, which may reflect the subjectivity of visual scoring and its reliance on physician experience. In contrast, objective evaluations using image quality metrics such as LPIPS and FID showed no significant differences from 1x images. FID quantifies the disparity between synthetic and authentic data distributions, while LPIPS gauges the feature distance and perceptual similarity of image patches, potentially indicating consistency in whole-body bone images. Moreover, the observed differences in 2x-real and 2x-simu images did not impact diagnostic efficacy, as their diagnostic accuracy (81.2%, 80.7%), sensitivity (89.47%, 85.00%), specificity (75.56%, 76.74%), and AUC-ROC values (0.82, 0.81) remained comparable to those of 1x images. The results are also similar to those reported in a previous meta-analysis[3], which showed a sensitivity of 86.5% and specificity of 79.9% for diagnosing bone metastases using WBS, affirming the capability of these images to discern malignant bone lesions effectively.
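For readers less familiar with the metrics quoted above, the following minimal Python sketch shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts. The function name `diagnostic_metrics` and the example counts are hypothetical illustrations and are not data from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: malignant cases read as positive/negative.
    tn/fp: benign cases read as negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        # Fraction of malignant cases correctly identified.
        "sensitivity": tp / (tp + fn),
        # Fraction of benign cases correctly identified.
        "specificity": tn / (tn + fp),
        # Fraction of all cases correctly classified.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = diagnostic_metrics(tp=8, fp=5, tn=15, fn=2)
```

Because sensitivity and specificity are computed on different subsets of cases, accuracy depends on the proportion of malignant cases in the cohort, which is one reason all three values are reported together with the AUC-ROC.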
It is noteworthy that in this study, the diagnostic performance of 2x images, acquired at doubled scanning speed, exhibited a slight decrease compared to standard 1x images, although this difference was not statistically significant. The lack of significance may be due to the relatively small sample size and potential subjective bias in physician interpretation, given the reliance of bone scintigraphy interpretation on individual knowledge and experience. Nonetheless, the AUC-ROC value of 2x images was below 0.8, indicating a diminished ability to differentiate malignant bone lesions compared to 1x images and to 2x-real and 2x-simu images[18]. The image quality of 3x-real and 3x-simu images enhanced by deep learning methods was notably superior to that of 3x images according to both subjective and objective assessments. However, their accuracy, sensitivity, and specificity were significantly lower than those of 1x images, with all AUC-ROC values below 0.7, indicating an insufficient capability to meet clinical diagnostic requirements.
A previous study developed a deep-learning noise reduction (DLNR) algorithm for whole-body cadmium zinc telluride (CZT) SPECT images using data from 19 patients, demonstrating that image quality remained at a good-to-excellent level even when the acquisition time was reduced to 60%[19]. In the current study, a dual-head gamma camera, which is more commonly used in clinical practice, was used to develop deep learning models for WBS[20]. By comparison, this study has a relatively large sample size and shows that ultra-fast imaging at 50% of the acquisition time (using both real and simulated datasets) can achieve diagnostic efficacy comparable to that of standard acquisition, as determined by subjective and objective assessments. Similarly, Minarik et al. conducted a visual evaluation of the presence of bone metastases in DL-filtered images with 50% counts and reported no significant difference in diagnostic performance compared to the reference image[21]. A previous study indicated that planar processing enhanced with the noise-reducing Pixon algorithm does not fully compensate for the loss of counts associated with halving the scan time for WBS; that algorithm is based on the principle that the ideal image is represented by the lowest possible number of parameters that correctly represent the raw data image[22]. In this study, two supervised deep learning models were trained, one on a resampled simulation dataset and the other on acquired real clinical image pairs. Our results suggest that advancements in deep learning technology have further improved the quality of low-dose scan images.
Our previous study demonstrated that ultrafast SPECT/CT with 1/7 of the standard acquisition time could be improved using deep learning methods to achieve image quality and diagnostic value comparable to those of standard acquisition protocols[23, 24]. While WBS and bone SPECT imaging both rely on single photon counting principles, the adjacent slices and the corresponding CT in tomography provide more prior image information, so images acquired with 1/7 of the standard acquisition time can be enhanced using deep learning techniques to meet diagnostic requirements. For WBS, acquiring additional scans at finely stratified fractions of the standard scan time, such as 48%, 46%, and 44%, could help refine the minimum scan time for accelerated imaging that ensures image quality and meets clinical requirements. However, in clinical practice, acquiring more low-count images takes considerable effort. A study that generated low-count original images (75%, 50%, 25%, 10%, and 5% counts) from reference images (100% counts) using Poisson resampling indicated that a deep learning method improved image quality and bone metastasis detection accuracy for low-count bone scintigraphy[14]. In our study, the subjective scores of 2x-simu images (generated from low-count images obtained by Poisson resampling of the reference images) and 2x-real images still differed, but their objective LPIPS and FID scores did not differ significantly. In contrast, the objective LPIPS scores of 3x-simu images (likewise generated from Poisson-resampled reference images) and 3x-real images differed significantly. It is worth considering whether the objective image quality differences between simulated and real images increase as the simulated counts are reduced, which emphasizes the need for further investigation with larger sample sizes to validate these observations. As shown in Figure 3, the image quality of 2x-real (D), 2x-simu (F), 3x-real (E), and 3x-simu (G) is improved compared to 2x (B) and 3x (C), with reduced noise and increased radiopharmaceutical counts.
In WBS, there appears to be no significant difference in image quality between 2x-real (D) and 2x-simu (F), or between 3x-real (E) and 3x-simu (G). However, in the enlarged images within the red dashed box, it can be observed that the extent of bone metastatic lesions in 2x-simu (F) and 3x-simu (G) is closer to that in the standard 1x (A) image, whereas in 2x-real (D) and 3x-real (E), the lesion extent is slightly larger, with less distinct boundaries between lesions and normal bone. These findings suggest that simulated images may not accurately reproduce real low-count images. Because the two models shared the same network architecture and training parameters, the difference in the generated images arose from the discrepancy between simulated low-count data and real scanned low-count data. Possible explanations include the continuous metabolism of the radiopharmaceutical during the examination and noise in real acquisitions that is not fully consistent with the Poisson downsampling model of radioactive decay.
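The Poisson resampling used to generate simulated low-count images can be illustrated by binomial thinning: if each detected count is kept independently with probability p, a Poisson-distributed pixel with mean λ becomes Poisson-distributed with mean pλ. The sketch below assumes this approach. The function name `thin_counts`, the synthetic reference image, and the chosen fractions are assumptions for illustration and do not reproduce the pipeline used in this study.

```python
import numpy as np


def thin_counts(counts, fraction, rng=None):
    """Simulate a reduced-count image by binomial thinning of integer counts.

    counts:   array of non-negative integer photon counts (reference image).
    fraction: probability of keeping each count, e.g. 0.5 for a 2x-speed scan.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    counts = np.asarray(counts)
    if np.any(counts < 0):
        raise ValueError("counts must be non-negative")
    rng = np.random.default_rng() if rng is None else rng
    # Each recorded photon survives independently with probability `fraction`.
    return rng.binomial(counts.astype(np.int64), fraction)


# Synthetic stand-in for a full-count planar image (not patient data).
rng = np.random.default_rng(0)
reference = rng.poisson(lam=50.0, size=(64, 64))

half_counts = thin_counts(reference, 0.5, rng)     # analogous to a 2x simulation
third_counts = thin_counts(reference, 1 / 3, rng)  # analogous to a 3x simulation
```

This kind of thinning models counting statistics only. It does not model changes in radiopharmaceutical distribution during the examination or any acquisition effects that depart from ideal Poisson behaviour, which is consistent with the discrepancy between simulated and real low-count images discussed above.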
The advancement of SPECT/CT technology has facilitated quantitative evaluations in bone imaging[25–27]. Several studies have shown strong inter-observer agreement in quantitative analysis and established a significant correlation between standardized uptake values (SUV) derived from SPECT images of bone metastases and those from PET images[28–30]. The SUV in SPECT/CT imaging is based on single photon counting; therefore, in WBS, single photon counts also hold a certain quantitative reference value. One of the deep learning principles utilized in this study is to improve image quality by increasing the radiopharmaceutical counts in images acquired with reduced scan times. Consequently, the counts of 39 lesions in 2x-real, 2x-simu, 3x-real, and 3x-simu images exhibited high consistency with those in the original images. This further confirms the applicability of deep learning-accelerated planar bone imaging in clinical practice from an objective perspective. These findings align with those of a previous study[14], in which lesion counts were strongly correlated with those of the original standard images, albeit with a 10% decrease in counts, possibly attributed to the risk of signal loss due to smoothing observed in that study. The high consistency of maximum lesion counts between deep learning-enhanced images and reference WBS may benefit from the count-consistency preprocessing step, through which the scale gap between low-count scans and full-count scans in different cases was well normalized. Our proposed method also contributes to the good alignment of maximum lesion counts.
This study has several limitations. Firstly, the prospective collection of data from a single center and the relatively small sample size may lead to biased results. Secondly, this study exclusively utilized 99mTc-MDP. Future studies should explore other radiopharmaceuticals to enhance the generalizability and robustness of these deep learning models. Lastly, our study only collected images acquired at 2x and 3x scanning speeds and assessed the quality of these images after deep learning processing. The results indicated that 2x-real and 2x-simu images meet clinical diagnostic requirements. However, further refinement of the scanning speed between 2x and 3x is needed to identify the shortest scanning time that can meet clinical diagnostic needs after deep learning enhancement.