Application of super-resolution convolutional neural network technique to improve the quality of soft-tissue window cone-beam CT images

Objectives To assess the feasibility of using a super-resolution convolutional neural network to improve the quality of cone-beam computed tomography (CBCT) images for visualizing soft-tissue structures. Methods Multidetector computed tomography (CT) images of 200 subjects who were assessed for the status of an impacted third molar were collected as training datasets. CBCT images of 10 subjects who were also examined with CT were collected as testing datasets. The training process used a modified U-Net and bone and soft-tissue window CT images. After creating a model to convert bone images to soft-tissue images, CBCT images were provided as input and the model outputted estimated CBCT images. These estimated CBCT images were then compared with soft-tissue window CBCT and CT images, using slices through approximately the same anatomical regions. Image evaluation was performed with subjective observations and histogram descriptions.


Introduction
Cone-beam computed tomography for dental use (CBCT) was developed by Arai et al. 1 and Mozzo et al. 2 in the 1990s, and there has been much interest in how to use this innovative imaging modality to reveal three-dimensional structures of oral and maxillofacial anatomy. [3][4][5][6][7] In addition to improving diagnostic performance, CBCT examination imparts a lower radiation dose than multidetector computed tomography (CT). 8 However, CBCT has a critical disadvantage in comparison with CT, namely its poor differentiation of soft-tissue structures. CBCT sacrifices soft-tissue image quality and accurate knowledge of radiodensity (CT number) to achieve a low patient dose by visualizing only teeth and bone structures. 9 Therefore, even if CBCT images are manipulated to visualize soft-tissue structures by changing the window level and width, they may still be insufficient for effective diagnosis. Consequently, for the evaluation of soft-tissue diseases, CT is more likely to be chosen than CBCT. 3 Many groups have recently developed computer-aided diagnostic systems using convolutional neural networks (CNNs), some of which have been applied to radiological studies of oral and maxillofacial structures. [10][11][12][13][14] The various CNN types reported include those designed to improve image resolution, the so-called "super-resolution techniques". 15 These methods can be implemented during image conversion procedures to create images resembling high-dose CT images from low-dose images, [16][17][18][19][20][21][22][23][24][25][26] and images resembling 7-T magnetic resonance images from 3-T images. 27 If CBCT images displayed using window levels and widths suitable for soft-tissue structures could be improved by such a technique, they could contribute to clinical CBCT-based diagnosis.
Although there are some differences in the raw data between CT and CBCT, the reconstructed images provide almost the same image quality for bone and hard tissue structures. 28 Therefore, if a CNN could learn to identify differences between bone and soft-tissue in reconstructed CT images, the learning model may also be applied to CBCT images.
The present study aimed to assess the feasibility of a super-resolution technique to improve the quality of CBCT images for visualization of soft-tissue structures, doing so through the use of CT data as training datasets to create a trained model.

Results
The training process took 6 days, 14 hours, and 11 minutes to complete. The results of the subjective evaluation are summarized in Table 1. For all anatomical structures, the compressed swCT images showed scores of 4.0, which were equivalent to those of the original CT images. Both the swCBCT and estimated CBCT images showed lower scores than the compressed swCT images for all anatomical structures. In the comparison of the swCBCT images with the estimated CBCT images, all structures showed higher scores on the estimated CBCT images than on the swCBCT images, and the differences were statistically significant in the parapharyngeal and submandibular spaces and lymph nodes. The structures situated inferior to the mandible, such as the digastric muscle, submandibular space, and lymph nodes, showed relatively higher scores, probably because of the small amount of X-ray attenuation by surrounding bony structures. The standard deviations were wide in the swCBCT images, resulting in considerable overlap in the areas of the two tissues. This would contribute to the difference in visibility between the two images.

Discussion
CBCT images are generally obtained with a lower radiation dose than multidetector CT images, and therefore their soft-tissue differentiation is poor. 28 Additionally, radiation is significantly attenuated by the teeth and jawbone, resulting in insufficient X-rays to create clear images. Therefore, it is usually difficult to identify soft-tissue structures situated within the mandibular arch on CBCT images.
In the present testing data (CBCT bone images), the brightness and contrast were adjusted before the images were input into the testing process, imitating the contrast of typical bone-window CT images. However, the adjusted images did not show the same density profiles as the CT images. Another procedure is therefore required to obtain CBCT images that more closely approximate the density profiles of CT images.
According to the results of the subjective evaluation, the estimated CBCT images could be usable in a clinical setting, except for those of the medial pterygoid muscle and parapharyngeal space. The modified U-Net could create images with sufficient visible contrast for soft-tissue diagnosis from CBCT images, but it could not reflect the different densities of muscle and fat enclosed by the maxillary dental arch and cervical vertebrae. In contrast, in the inferior parts of the coverage, where there was little X-ray attenuation from bony structures, soft-tissue density information was already sufficient in the original CBCT images, and the super-resolution CNN technique converted them into images with more visible contrast, especially between the muscle and fat in the fascial space.
A common disadvantage of super-resolution CNNs is that the training process requires a high-capacity graphics processing unit to handle the large amount of image data. Whole CT images could not be analyzed during the training process because of the relatively low capacity of our computer system. To solve this problem, images were compressed to a lower resolution, but future improvements in processor speed could negate the requirement for such a procedure.
The present study had several limitations. First, the subjective evaluation could not exclude observer bias. One option to overcome this problem may be to assess whether the reliability of judgments on diseases affecting soft-tissue conditions improves following application of the super-resolution technique. Second, the evaluated images were compressed from 10-bit to 8-bit gray values. This might have resulted in a loss of some valuable attenuation information, but all seven radiologists found no significant differences between the original soft-tissue window CT images and the compressed swCT images in their evaluations. To verify the results with more accurate CT numbers, the procedure should be improved with normalization processes. Third, the resultant images were difficult to reconstruct in three dimensions because of the png format that was used. A possible solution would be to create three-dimensional images in DICOM format before inputting them into the super-resolution model. Lastly, the images in the training dataset were compressed to a smaller size to reduce the time cost of the training process, but this reduced the image resolution. To solve this problem, the performance of the deep learning machine should be improved.

Materials and Methods
Human rights: The requirement for informed patient consent for inclusion of data was waived by the ethics committee because of the retrospective nature of the data use, and this study obtained ethical approval from the Aichi-Gakuin University ethics committee (No. 496). All methods were carried out in accordance with relevant guidelines and regulations.
CT images of 200 patients were selected from our hospital image database for the model learning process. All CT examinations were performed to evaluate the status of an impacted third molar. Cases with severe inflammation were excluded. When these images were downloaded, the window level and width were set at 900 and 4500 HU, respectively, for the bone-window images, and 60 and 300 HU, respectively, for the soft-tissue-window images. The images were downloaded in Joint Photographic Experts Group (jpeg) format with a resolution of 900 × 900 pixels, and the image patches for the learning process were created by compressing them to 256 × 256 pixels with 8-bit gray values. Of the 60 904 patches obtained, 54 740 patches (from 180 patients) were assigned as training data, and 6164 patches (from 20 patients) were assigned as validation data (Fig. 1). CT images were acquired using an Aquilion PRIME scanner (Canon Medical Systems, Otawara, Japan) with the following parameters: tube voltage, 120 kV; tube current-time product, 100 mAs; slice thickness, 0.5 mm; field of view, 20 cm.
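The window level/width settings above (900/4500 HU for bone, 60/300 HU for soft tissue) define a linear mapping from CT numbers to 8-bit gray values. A minimal numpy sketch of that mapping is shown below; the function name `apply_window` is illustrative, not part of the study's actual export pipeline.

```python
import numpy as np

def apply_window(hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Map HU values to 8-bit gray levels for a given window level/width.

    Values below (level - width/2) map to 0, values above (level + width/2)
    map to 255, and values in between are scaled linearly.
    """
    lo = level - width / 2.0
    hi = level + width / 2.0
    clipped = np.clip(hu, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

# The two windows used in this study:
hu = np.array([-1000.0, 0.0, 60.0, 210.0, 1500.0])   # example HU values
bone = apply_window(hu, level=900, width=4500)       # bone window
soft = apply_window(hu, level=60, width=300)         # soft-tissue window
```

With the narrow soft-tissue window, everything at or below -90 HU is clipped to black and everything at or above 210 HU to white, which is why muscle/fat contrast survives the 8-bit compression while bone detail saturates.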

Preparation of datasets
CBCT images of 10 patients (4272 images) who were also examined with CT were selected from the same image database and prepared for the testing process (Fig. 1). In all 10 patients, the CBCT examinations were performed to clarify the relationship between the mandibular third molar and canal before extraction, whereas the CT was acquired to evaluate post-extraction mandibular nerve damage and inflammation. The two examinations were carried out within 2 years of each other. No severe inflammation, which can affect soft-tissue findings, was observed in the CT images. Each downloaded CBCT image (1039 × 1264 pixels with 8-bit gray level values in jpeg format) was compressed to 256 × 256 pixels for use as an image patch. To imitate bone-window CT images, the brightness and contrast of all the CBCT images were manually adjusted before they were input into the testing process. The CBCT scans were acquired using an Alphard Vega scanner (Asahi Roentgen Ind. Co. Ltd., Kyoto, Japan) with a field of view of 102 × 102 mm and a voxel size of 0.2 mm.

Learning architecture and processes
Training and testing processes were performed using Neural Network Console (Sony Corporation, Tokyo, Japan) with a Geforce 1080 Ti graphics processing unit (Nvidia, Santa Clara, CA). The learning method used a modification of the U-Net CNN reported by Ronneberger et al. 29 The network consisted of a convolutional layer, rectified linear unit (ReLU) activation function layer, and pooling layer (Fig. 2). The training parameters were: learning epochs, 300; initial learning rate, 0.001; solver type, Adam. The trained model was then used to convert the testing CBCT image datasets to soft-tissue quality CBCT images in the Portable Network Graphics (png) format. These images are referred to as estimated CBCT images.
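The three layer types named above (convolution, ReLU, max pooling) form the encoder half of the U-Net. The actual model was built in Neural Network Console; the numpy sketch below only illustrates what each layer type computes on a single-channel image, with an illustrative averaging kernel, and is not the study's implementation.

```python
import numpy as np

def conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit: negative responses are zeroed."""
    return np.maximum(x, 0.0)

def maxpool2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling, halving each spatial dimension."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# One encoder step: convolve, activate, downsample.
x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4 x 4 "image patch"
k = np.ones((3, 3)) / 9.0                      # illustrative 3 x 3 averaging kernel
feat = relu(conv2d(x, k))                      # 2 x 2 feature map
pooled = maxpool2x2(feat)                      # 1 x 1 after pooling
```

In the full U-Net, the decoder path mirrors these steps with upsampling and skip connections so that the 256 × 256 output keeps the spatial detail of the input patch.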

Subjective evaluation of the quality of the estimated CBCT images
The image quality of the estimated CBCT images (Figs. 3a and 4a) was subjectively evaluated on a personal computer display by seven radiologists, all of whom had more than 3 years of experience in interpretation of CT and CBCT images. The radiologists compared the estimated CBCT images with both the test CBCT image expressed with an appropriate window level and width for visualizing soft-tissue structures, which we refer to as the swCBCT image (Figs. 3b and 4b), and the compressed swCT image (Figs. 3c and 4c). The quality of these images was scored using a five-point grading system relative to the original soft-tissue window CT images presented on a DICOM display. In the actual evaluations, the observation windows of the displays were manually adjusted to optimize the visualization of six anatomical structures, including the medial pterygoid and digastric muscles, parapharyngeal and submandibular spaces, submandibular gland, and submental or submandibular lymph nodes. For the evaluation of the fascial space, the radiologists paid special attention to the visibility of the included fat tissue. For lymph nodes, the node to be evaluated was indicated beforehand on the images. The subjective scoring was performed according to the following procedure: Score 0: The anatomical structure was difficult to identify.
Score 2: The anatomical structure was sufficiently identifiable for use in a clinical setting, but the quality was inferior to the original soft-tissue window CT image displayed on a DICOM viewer.
Score 4: The anatomical structure was clearly identified and the quality was equivalent to the original soft-tissue window CT image displayed on a DICOM viewer.
The means and standard deviations of the subjective scores were calculated for 10 patients, and the differences between image types were assessed using the Steel-Dwass test with statistical significance of p < 0.01.
Visibility of the digastric muscle relative to the fat tissue in the submandibular space

The visibility of a soft tissue partially depends on the contrast between the target tissue and adjacent tissues. Therefore, to verify the visibility judgments, the voxel values of the anterior belly of the digastric muscle and the adjacent fat tissue in the submandibular space were measured on a slice of each of the three image types that were subjectively evaluated (estimated CBCT, swCBCT, and compressed swCT images) in the 10 patients. The most appropriate slices, showing the maximum area of the muscle, were selected by a radiologist (MF), and 160-pixel circular regions of interest (ROIs) were set in the bilateral muscles and adjacent fat tissues (Fig. 5). For the estimated CBCT and compressed swCT images, the window level and width were maintained at the same values used when they were created. The windowing of the swCBCT images was determined by a radiologist (MF), so that sufficiently high contrast was shown between the two tissues. The voxel values of the muscle and fat tissues, measured using ImageJ software, [30][31] were totaled for all 10 subjects.
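The ROI measurements were made with ImageJ; as a rough illustration of what a circular-ROI measurement computes, the numpy sketch below extracts the mean and standard deviation of pixel values inside a circular mask. The function name, the test phantom, and the radius (~7 px gives an area close to 160 pixels) are all illustrative assumptions, not values from the study.

```python
import numpy as np

def circular_roi_stats(img: np.ndarray, cy: int, cx: int, radius: float):
    """Mean, SD, and pixel count of a circular ROI centred at (cy, cx)."""
    yy, xx = np.ogrid[:img.shape[0], :img.shape[1]]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    vals = img[mask]
    return float(vals.mean()), float(vals.std()), int(mask.sum())

# Toy phantom: upper half "muscle" (gray 120), lower half "fat" (gray 60).
phantom = np.full((64, 64), 120.0)
phantom[32:, :] = 60.0

muscle_mean, muscle_sd, n = circular_roi_stats(phantom, cy=16, cx=32, radius=7)
fat_mean, fat_sd, _ = circular_roi_stats(phantom, cy=48, cx=32, radius=7)
contrast = muscle_mean - fat_mean  # the muscle-fat contrast underlying visibility
```

A wide ROI standard deviation, as reported for the swCBCT images, would show up here as overlapping muscle and fat value ranges, which is exactly the overlap described in the Results.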

Conclusions
Although the resultant super-resolution CBCT images are not currently of the same quality as multidetector CT images, the feasibility of the super-resolution CNN model was verified, and it may have potential for use in clinical situations. Our next research directions are to improve the image quality and to apply the super-resolution technique to diseases affecting soft-tissue conditions.

Figure 1
Learning and testing processes. CNN: convolutional neural network.

Figure 2
The architecture of the modified U-Net. The network is composed of a convolutional layer, batch normalization layer, rectified linear unit layer, and pooling layer. Conv: convolutional layer, Pooling: maximum pooling layer, ReLU: rectified linear unit layer.