## Unsupervised learning for generating VMIs

Based on the assumption that the linear attenuation coefficients at low and high energies can be expressed as a linear combination of the effective mass attenuation coefficients of two basis materials [23, 24], a VMI obtained from image-space data is a linear combination of the DECT images [8], which can be written as follows:

$$\mathrm{VMI}(E) = w(E) \times \mathrm{CT}^{L} + \left(1 - w(E)\right) \times \mathrm{CT}^{H} \tag{1}$$

where E is an energy level (keV), w(E) is an energy-dependent weighting factor, CTL is the low-kV CT image, and CTH is the high-kV CT image. Because w(E) is larger than 1 at low energy levels (E < 60 keV), the noise in CTL is amplified, so low-keV VMIs often suffer from severe noise. In contrast, w(E) is small at high energy levels (E > 100 keV), so the image quality of high-keV VMIs is similar to that of CTH. Although the VMI+ technique, which mixes VMI(E) with the 70-keV VMI, can improve VMI quality at each energy level [9], the improvement is limited at high energy levels.
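As a sketch of Eq. (1), the blend can be computed per pixel; the weight 1.6 below is a hypothetical w(E) for a low energy level (the calibrated w(E) values depend on the scanner), chosen only to illustrate how w > 1 amplifies noise from the low-kV image:

```python
import numpy as np

def vmi(ct_low, ct_high, w):
    # Eq. (1): VMI(E) = w(E) * CT^L + (1 - w(E)) * CT^H
    return w * ct_low + (1.0 - w) * ct_high

# Toy 2x2 "images" in HU; w = 1.6 stands in for w(E) at a low keV,
# where the weight on CT^L exceeds 1 and its noise is amplified.
ct_l = np.array([[100.0, 50.0], [30.0, 0.0]])
ct_h = np.array([[60.0, 40.0], [20.0, 0.0]])
low_kev = vmi(ct_l, ct_h, 1.6)
```

Because the weight on CTH is negative (1 − 1.6 = −0.6), independent noise in the two inputs adds in quadrature rather than cancelling, which is why low-keV VMIs are the noisiest.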

To further improve the quality of DECT-based VMIs, we propose an unsupervised-learning-based method to generate VMIs from DECT. Based on the concept of DIP [19], the VMI at energy level E can be generated by training a neural network that maps the DECT images to VMI(E). This relation can be described as follows:

$$\mathrm{VMI}(E) = f_{\theta}(\mathrm{CT}^{L}, \mathrm{CT}^{H}) \tag{2}$$

where \(f_{\theta}\) is a convolutional neural network (CNN) with parameters \(\theta\). Both CTL and CTH are model inputs. Because of the strong representational power of CNNs, it is possible to generate more than one VMI(E) at once. We therefore design a CNN model that generates three VMIs at different keV levels, so Eq. (2) can be rewritten as follows:

$$\{\mathrm{VMI}(E_{1}), \mathrm{VMI}(E_{2}), \mathrm{VMI}(E_{3})\} = f_{\theta}(\mathrm{CT}^{L}, \mathrm{CT}^{H}) \tag{3}$$

As shown in Fig. 1, the measured DECT images are fed into a U-Net model [25] that outputs three VMIs at different keV levels. To achieve this in an unsupervised manner, each pair of predicted VMIs is used to re-calculate the DECT images based on Eq. (1), yielding three paired DL-derived DECT imaging sets. By minimizing the differences between the measured and DL-derived DECT images, the U-Net model is constrained to generate three different-keV VMIs directly from the measured DECT images. The loss function can be described as follows:

$$\theta^{*} = \underset{\theta}{\operatorname{argmin}} \; \left\| g\left(\{\mathrm{VMI}(E_{1}), \mathrm{VMI}(E_{2})\}\right) - \{\mathrm{CT}^{L}, \mathrm{CT}^{H}\} \right\|_{2}^{2} + \left\| g\left(\{\mathrm{VMI}(E_{2}), \mathrm{VMI}(E_{3})\}\right) - \{\mathrm{CT}^{L}, \mathrm{CT}^{H}\} \right\|_{2}^{2} + \left\| g\left(\{\mathrm{VMI}(E_{1}), \mathrm{VMI}(E_{3})\}\right) - \{\mathrm{CT}^{L}, \mathrm{CT}^{H}\} \right\|_{2}^{2} \tag{4}$$

where g is a custom function that solves Eq. (1) given two VMIs. With each model-predicted VMI pair (i.e., VMI(E1) and VMI(E2), VMI(E1) and VMI(E3), and VMI(E2) and VMI(E3)) and the known w(E), we can solve Eq. (1) and obtain DECT images. Note that the three paired DL-derived DECT imaging sets are derived from the three model-predicted VMI pairs. In this study, we selected three VMIs spanning low to high energy levels: E1, E2, and E3 were set to 40 keV, 70 keV, and 100 keV, respectively. Mapping the measured DECT images to more than three VMIs is possible; however, generating more VMIs requires more network parameters, which makes the model both harder and slower to train. Based on our preliminary tests, it is reasonable to use one CNN model to simultaneously generate three different-keV VMIs.
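The inversion performed by g can be written in closed form: subtracting Eq. (1) at two energies eliminates one unknown, leaving a per-pixel two-equation solve. A minimal sketch (the weights below are hypothetical placeholders, not the calibrated w(E) values; the paper does not specify g's implementation):

```python
import numpy as np

def g(vmi1, vmi2, w1, w2):
    # Invert Eq. (1) for a VMI pair with known weights w1 = w(E1), w2 = w(E2).
    # Subtracting the two instances of Eq. (1) gives:
    #   VMI(E1) - VMI(E2) = (w1 - w2) * (CT^L - CT^H)
    diff = (vmi1 - vmi2) / (w1 - w2)   # CT^L - CT^H (requires w1 != w2)
    ct_high = vmi1 - w1 * diff         # back-substitute into Eq. (1)
    ct_low = ct_high + diff
    return ct_low, ct_high

# Round trip: blend with hypothetical weights, then recover CT^L and CT^H.
w1, w2 = 1.6, 0.3
ct_l = np.array([[100.0, 30.0]])
ct_h = np.array([[60.0, 20.0]])
vmi1 = w1 * ct_l + (1 - w1) * ct_h
vmi2 = w2 * ct_l + (1 - w2) * ct_h
rec_l, rec_h = g(vmi1, vmi2, w1, w2)
```

Each of the three VMI pairs in Eq. (4) passes through this inversion, producing the three DL-derived DECT imaging sets that are compared against the measured images.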

As described above, one CNN model generates three different-keV VMIs, so more than one CNN model would be needed to cover a wide energy range. For example, four CNN models would be required to generate twelve VMIs over 40–150 keV at a 10-keV interval, which would be time-consuming when many CT slices must be processed. Instead, the other nine VMIs can be obtained from one learned CNN model: the average of its three DL-derived DECT imaging sets is combined with Eq. (1) and the known w(E) to calculate the remaining VMIs. Because this average should have better image quality than the measured DECT images, the quality of the nine calculated VMIs should also be improved. The proposed DL-based method can therefore generate a wide range of different-energy VMIs from one learned CNN model.
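This averaging-and-resynthesis step can be sketched as follows; each DL-derived set contributes one (CTL, CTH) pair, and Eq. (1) then yields a VMI at any energy whose w(E) is known (the weights below are again hypothetical):

```python
import numpy as np

def vmis_from_average(dect_sets, weights):
    # dect_sets: list of (CT^L, CT^H) pairs, one per DL-derived imaging set.
    # Averaging the sets suppresses residual noise before resynthesis.
    ct_l = np.mean([s[0] for s in dect_sets], axis=0)
    ct_h = np.mean([s[1] for s in dect_sets], axis=0)
    # Apply Eq. (1) once per requested energy level.
    return [w * ct_l + (1.0 - w) * ct_h for w in weights]

# Three (identical, for illustration) DL-derived DECT sets and two
# hypothetical w(E) values for additional energy levels.
sets = [(np.full((2, 2), 100.0), np.full((2, 2), 60.0))] * 3
extra_vmis = vmis_from_average(sets, [1.2, 0.5])
```

In practice the three sets differ slightly (they come from different VMI pairs), so the mean acts as a cheap ensemble over one trained model's outputs.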

In this study, the U-Net model shown in Fig. 2 was trained using the mean squared error (MSE) loss function. We used the adaptive moment estimation (Adam) algorithm with a learning rate of 1e-4 and the default parameters (beta1 = 0.9, beta2 = 0.999, and epsilon = 1e-8) to minimize the MSE loss. The number of epochs was set to 2000, and the batch size was set to 1. The U-Net model was implemented in PyTorch, and training was run on a computer with an NVIDIA Titan Xp GPU. DECT images were normalized to values between 0 and 1 before training. For qualitative and quantitative comparison, all DECT-based and DL-based VMIs were multiplied by 4095, and 1024 was then subtracted from the results, giving CT numbers in the range of −1024 to 3071 Hounsfield units (HU).
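The normalization before training and the final rescaling back to HU are exact inverses of each other; a minimal sketch of the mapping described above:

```python
def normalize(hu):
    # Map CT numbers in [-1024, 3071] HU to [0, 1] before training.
    return (hu + 1024.0) / 4095.0

def to_hu(x):
    # Invert the normalization: multiply by 4095, then subtract 1024.
    return x * 4095.0 - 1024.0
```

Keeping both directions as a matched pair avoids off-by-one scaling errors when comparing network outputs against the original HU-valued images.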