Properties of the SSIM metric in medical image assessment: correspondence between measurements and the spatial frequency spectrum

In radiological imaging, it is important to acquire images of the required diagnostic quality under optimized conditions. Although techniques based on structural similarity (SSIM) have been investigated, concerns have been raised regarding their application to medical images. This study aims to clarify the properties of SSIM as an image quality index for medical images, focusing on digital radiography, by verifying the correspondence between the evaluation results obtained by SSIM and the spatial frequency spectrum. The analysis target was chest X-ray images of a human-body phantom. Various types of processing were applied to the images, and several regions of interest (ROIs) were placed in local areas for analysis. The SSIM was measured using unprocessed data as a reference while changing the calculation parameters, and the spatial frequency spectrum of each local region was analyzed. A significant effect of ROI size on the calculated SSIM was observed: a larger ROI led to SSIM values closer to 1 under all analysis conditions. In addition, a relationship between the ROI size used in the analysis and the frequency components was demonstrated. These results show that careful attention should be paid to the structures included in the ROI and that parameter settings should be reconsidered. Furthermore, when SSIM is used to assess medical images, a multiscale SSIM obtained by changing the ROI size would be useful.


Introduction
In radiological imaging, it is important to obtain the required diagnostic image quality under optimized conditions [1]. The context of the optimization includes any process related to image acquisition, processing, and display. Therefore, radiological technologists and medical physicists engaged in the diagnosis of medical images should review and evaluate the operating conditions at various stages of imaging systems based on an objective image quality index [2][3][4].
For several years, an image quality evaluation metric called the structural similarity (SSIM) has been considered for optimization tasks in the medical imaging field [5]. The SSIM is a quantitative index calculated from the three components of "luminance" (in this paper, "intensity"), "contrast", and "structure" of an image, and its values have been shown to correlate with the subjective sensations associated with an observer's cognition [6][7][8]. This index expresses how structurally similar an image to be evaluated is to a reference image using a value of 0 to 1 (0 indicates completely different images and 1 indicates that the images are the same). It is calculated by considering not only the signal strength of a single pixel but also its correlation with surrounding pixels in a region of interest (ROI). For example, in diagnostic imaging, SSIM is used as a quality assessment metric for medical images that have undergone data compression processing [9,10] or as an evaluation index in parameter determination related to image processing [11]. In addition, it is used to evaluate the effectiveness of denoising algorithms for X-ray images [12]. Furthermore, an index using SSIM has been developed to quantitatively characterize differences in visual impression caused by different image reconstruction techniques [13].
In contrast, in the field of image processing for natural images, performance can be improved by incorporating an SSIM-based index into a noise reduction framework [14] or a segmentation algorithm [15]. In these studies, computational efficiency and robustness are improved by obtaining features and similarities for each patch (local region) in the image, rather than as the average for the entire image. This patch-based approach also seems reasonable in the context of image analysis related to diagnostic imaging.
However, SSIM has also been criticized from various viewpoints; it has some limitations as mentioned in [16] and is not applicable to the evaluation without comparison with the complete original image (i.e., the full reference) [5]. Therefore, careful attention should be paid when using SSIM in the medical field [16]. In addition, it has been pointed out that the correlation between a subjective sensation and SSIM value deteriorates under conditions where medical images contain noise [17,18], and concerns remain regarding its application as an evaluation index. An alternative is the multiscale SSIM technique, which uses Gaussian filtering with different parameters to analyze the similarity of different scales in detail [19]; however, considering the clarity for interpretation and simplicity of the procedure, it would not be a good measure of image quality.
Given the above issues and assuming that it is possible to obtain objective quantitative values corresponding to perceptual image quality, there is ample room for the development of an evaluation metric for medical imaging. However, to use the results of such metrics, the features and significance of SSIM as an evaluation index of medical images must be clarified through more detailed analysis. In this study, focusing on digital radiography (DR), the image quality of DR images processed by various algorithms was evaluated using SSIM with the unprocessed image as a reference. The effect of parameter values in the analysis and the correspondence between the obtained SSIM results and the spatial frequency spectrum of the image were examined.
The main contributions of this research are as follows. To clarify the properties of the SSIM metric in medical image assessment, the influence of various parameters on SSIM analysis was investigated by evaluating the quality of images obtained with different image processing methods. In particular, we show that the analysis region corresponds to a component of the spatial frequency spectrum, and we clarify the effect of the size of the analysis region and of the frequency components of the structure. On the basis of these results as well as general considerations of image quality evaluation using SSIM, we propose a new evaluation index that associates this metric value with the spatial frequency information. Furthermore, this study highlights the need for careful consideration of the suitability of the SSIM metric for medical image assessment tasks.
The paper organization is as follows. The methods section describes the overview of several processing techniques (denoising, sharpening, and noise addition) applied on the original image used for SSIM analysis. Next, the methodology and investigation conditions of the SSIM analysis including the frequency spectrum analysis applied in this study are explained. In addition, we mention the computational complexity of the analysis algorithm and the validation of the results. The results section presents extensive experimental results. The discussion section develops considerations for the interpretation of results regarding analysis parameters and practical medical image analysis. Finally, an idea for an expression of the proposed metric is provided. It also clarifies the issues and limitations of this research.

Methods
This research did not involve the acquisition of images of the human body or image data accompanied by personal information, and no human observer study was conducted to evaluate the images.

Acquisition of image data for the verification
Using a KXO-50SS/DRX-3724HD X-ray system (Canon Medical Systems, Tochigi, Japan), a PBU-50 "Xray-Man" phantom (Kyoto Kagaku, Kyoto, Japan) was imaged with a CALNEO Smart C47 (Fujifilm, Tokyo, Japan), which is an indirect conversion-type flat panel detector. Figure 1 shows the phantom images used for verification. The parameters of the obtained digital images are listed in Table 1. The imaging conditions were as follows: the X-ray source-to-detector distance was 120 cm, the tube voltage was 80 kV, the tube current time product was 2 mAs, and an anti-scatter grid was It was confirmed that the surface dose for the phantom under this condition was 0.12 mGy, which is equivalent to the dose used in clinical practice (the diagnostic reference level for chest radiography in 2020 was 0.4 mGy) [20].

Image processing
Several image processing algorithms (denoising, sharpening, and noise addition) were applied to examine the properties of the SSIM-based image quality evaluation in chest X-ray images.
Two filters were used for denoising: a Gaussian filter, which performs smoothing with linear characteristics [21], and a bilateral filter, which preserves edges during smoothing because of its nonlinear behavior [22].
The Gaussian filter outputs a weighted average of the pixels in the neighboring region, with weight coefficients that follow a Gaussian distribution of the distance from the center pixel. It is expressed as

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \tag{1}$$

where G(x, y) is the distribution of the filter coefficients, the spread of which is adjusted by parameter $\sigma$. In this experiment, the kernel size of the filter was set to 25 × 25 and $\sigma$ was 5.0.
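As a rough illustration of the Gaussian filter coefficients described above, the following sketch builds a 25 × 25 kernel with a spread of 5.0. The function name `gaussian_kernel` and the normalization of the coefficients to a unit sum are assumptions made for this example and are not taken from the authors' program.

```python
import math

def gaussian_kernel(size=25, sigma=5.0):
    """Return a size x size Gaussian kernel whose coefficients sum to 1.

    The unit-sum normalization is an assumption of this sketch; it keeps the
    mean pixel value unchanged after filtering.
    """
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]

kernel = gaussian_kernel()  # 25 x 25 coefficients, largest at the center
```

Because the coefficients decay smoothly with distance from the center, convolving an image with this kernel attenuates fine detail while leaving slowly varying structure largely unchanged.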
The distribution of filter coefficients of the bilateral filter is defined by

$$w(i, j) = \exp\left(-\frac{i^{2} + j^{2}}{2\sigma_{1}^{2}}\right) \exp\left(-\frac{\left(f(x, y) - f(x + i, y + j)\right)^{2}}{2\sigma_{2}^{2}}\right) \tag{2}$$

Here, x and y are the coordinates of the target pixel in the image f(x, y), i and j are the coordinates in the filter, and (x + i, y + j) is the position at offset (i, j) from the target pixel. In a bilateral filter, the spread of the distribution of filter coefficients changes with respect to both the distance and the difference in pixel values from the target pixel; $\sigma_{1}$ is the parameter that controls the variation depending on the distance from the target pixel, and the influence of the difference in pixel value with respect to the target pixel is controlled by $\sigma_{2}$. Therefore, even if the pixels are close to the target pixel, if the difference in the pixel value is large (e.g., at an edge), the weight coefficient will be smaller, and the smoothing effect on those pixels becomes weak. However, in a region where there is no pixel value variation, the effect of the second term on the right side of Eq. (2) decreases, and a normal smoothing effect similar to that of the Gaussian filter is exhibited. In this evaluation, the kernel size of the filter was set to 25 × 25, $\sigma_{1}$ was 5.0, and parameter $\sigma_{2}$ was 50.0.
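The edge-preserving behavior described above can be sketched with a single weight calculation. The code assumes the standard bilateral form (a Gaussian of the spatial offset multiplied by a Gaussian of the pixel-value difference); the function name `bilateral_weight` and the pixel values are hypothetical.

```python
import math

def bilateral_weight(i, j, center_value, neighbor_value, sigma1=5.0, sigma2=50.0):
    """Unnormalized weight of the neighbor at offset (i, j) from the target pixel."""
    # Factor depending on the distance from the target pixel (controlled by sigma1)
    spatial = math.exp(-(i * i + j * j) / (2.0 * sigma1 ** 2))
    # Factor depending on the pixel-value difference (controlled by sigma2)
    value = math.exp(-((center_value - neighbor_value) ** 2) / (2.0 * sigma2 ** 2))
    return spatial * value

flat = bilateral_weight(3, 0, 1000.0, 1000.0)  # same pixel value: pure Gaussian weight
edge = bilateral_weight(3, 0, 1000.0, 1200.0)  # difference of 200 across an edge
```

With these parameters, the neighbor across the edge receives a weight several orders of magnitude smaller than the neighbor in the flat region, so it contributes almost nothing to the smoothed value.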
For sharpening, a classical unsharp masking technique expressed by the following formula [23] was used to evaluate this effect clearly:

$$\mathrm{USM}(x, y) = f(x, y) + k\left(f(x, y) - f_{u}(x, y)\right)$$

where USM(x, y) is the sharpened image of the original. For the generation of the edge component $f(x, y) - f_{u}(x, y)$, $f_{u}(x, y)$ was obtained using the Gaussian filter with the parameters described above. The emphasis coefficient k was set to 3.
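A minimal sketch of unsharp masking, assuming the classical form in which the edge component (original minus smoothed image) is scaled by k and added back to the original. The function name `unsharp_mask` and the one-row example values are hypothetical; in the study the smoothed image comes from the Gaussian filter.

```python
def unsharp_mask(original, blurred, k=3.0):
    """Apply f + k * (f - f_u) element-wise to two equally sized 2D lists."""
    return [
        [f + k * (f - fu) for f, fu in zip(row_f, row_fu)]
        for row_f, row_fu in zip(original, blurred)
    ]

# One image row containing a step edge, and a hypothetical smoothed version of it
original = [[100.0, 100.0, 200.0, 200.0]]
blurred = [[100.0, 125.0, 175.0, 200.0]]
sharpened = unsharp_mask(original, blurred)
```

In this example the two pixels adjacent to the edge are pushed to 25 and 275, which is the undershoot and overshoot that makes edges appear sharper; pixels where the smoothed and original values agree are unchanged.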
In addition, noise addition processing was performed to confirm the influence of noise on the SSIM analysis. Gaussian noise with $\sigma = 50$ and $\sigma = 100$ ($\sigma$ is the standard deviation) was added; this was white noise, that is, its spatial frequency components were not considered.
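The noise addition step can be sketched as follows. This is an illustrative simplification: the function name `add_gaussian_noise`, the fixed seed, the uniform test image, and the absence of rounding or clipping to the valid pixel range are all assumptions of the example.

```python
import random
import statistics

def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean white Gaussian noise with standard deviation sigma to a 2D list."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

# Uniform 100 x 100 test image with a hypothetical pixel value of 8000
uniform = [[8000.0] * 100 for _ in range(100)]
noisy = add_gaussian_noise(uniform, sigma=50.0)

# The measured standard deviation should be close to the requested sigma
measured = statistics.pstdev([pixel for row in noisy for pixel in row])
```

Each pixel receives an independent noise sample, so the added noise has no spatial correlation and its power is spread evenly over all spatial frequencies.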

SSIM analysis
The evaluation of SSIM in [6] was performed using the mean SSIM to analyze the quality of the entire image. In this evaluation, however, local SSIMs were obtained for local regions, instead of the average value over the entire image, to investigate the results for each region with different structures.
For reference image $r$ and target image $t$, the SSIM of the two images is defined as follows:

$$\mathrm{SSIM}(r, t) = [l(r, t)]^{\alpha} \cdot [c(r, t)]^{\beta} \cdot [s(r, t)]^{\gamma} \tag{3}$$

where $l(r, t)$, $c(r, t)$, and $s(r, t)$ represent the similarity of the three elements of intensity, contrast, and structure, respectively, in an ROI set at the same position in both images. These elements are defined as follows:

$$l(r, t) = \frac{2\mu_{r}\mu_{t} + C_{1}}{\mu_{r}^{2} + \mu_{t}^{2} + C_{1}} \tag{4}$$

$$c(r, t) = \frac{2\sigma_{r}\sigma_{t} + C_{2}}{\sigma_{r}^{2} + \sigma_{t}^{2} + C_{2}} \tag{5}$$

$$s(r, t) = \frac{\sigma_{rt} + C_{3}}{\sigma_{r}\sigma_{t} + C_{3}} \tag{6}$$

Assuming $\alpha = \beta = \gamma = 1$ in Eq. (3) and $C_{3} = C_{2}/2$ in Eq. (6), Eq. (3) becomes [24]

$$\mathrm{SSIM}(r, t) = \frac{(2\mu_{r}\mu_{t} + C_{1})(2\sigma_{rt} + C_{2})}{(\mu_{r}^{2} + \mu_{t}^{2} + C_{1})(\sigma_{r}^{2} + \sigma_{t}^{2} + C_{2})} \tag{8}$$

where $\mu$ is the mean value of each ROI, $\sigma$ is the standard deviation, and $\sigma_{rt}$ is the covariance. Moreover, $C_{1}$ and $C_{2}$ are constants represented by

$$C_{1} = (K_{1}L)^{2}, \quad C_{2} = (K_{2}L)^{2}$$

respectively. According to [6], $K_{1} = 0.01$ and $K_{2} = 0.03$. Generally, parameter L is the dynamic range of the image, and if the number of quantization bits is 8 (possible pixel values are 0-255), L = 255 is applied.
In this verification, the unprocessed original image was used as the reference, and each filtered image was used as the target to investigate the effects of properties of the SSIM on changes in image quality. To calculate the local SSIM, ROIs for the analysis were placed in the lung area and abdominal position, as shown in Fig. 1. To capture the characteristics of SSIM image quality evaluation, SSIM was calculated with ROI sizes of 5 × 5, 10 × 10, 20 × 20, 50 × 50, 100 × 100, 200 × 200, and 400 × 400 pixels. In addition, because the dynamic range in medical images is larger than that of general natural images, we compared the SSIM values obtained by setting L to 255, 1,023 and 16,383 to examine the influence of this computational parameter. These parameter changes were applied to 14-bit verification images.
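The simplified SSIM expression, with $C_{1} = (K_{1}L)^{2}$ and $C_{2} = (K_{2}L)^{2}$, can be sketched for a single ROI as below. This is not the authors' program: the function name `ssim_roi`, the use of population (divide-by-n) statistics with uniform weighting, and the example pixel values are assumptions made for illustration.

```python
def ssim_roi(ref, tgt, L=255, K1=0.01, K2=0.03):
    """SSIM of two equally sized ROIs given as flat lists of pixel values."""
    n = len(ref)
    c1 = (K1 * L) ** 2
    c2 = (K2 * L) ** 2
    mu_r = sum(ref) / n
    mu_t = sum(tgt) / n
    # Population variances and covariance (an assumption of this sketch)
    var_r = sum((a - mu_r) ** 2 for a in ref) / n
    var_t = sum((b - mu_t) ** 2 for b in tgt) / n
    cov = sum((a - mu_r) * (b - mu_t) for a, b in zip(ref, tgt)) / n
    numerator = (2 * mu_r * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)
    return numerator / denominator

# Hypothetical 2 x 2 ROIs flattened to lists
ref = [100.0, 120.0, 140.0, 160.0]
tgt = [105.0, 118.0, 150.0, 155.0]

ssim_8bit = ssim_roi(ref, tgt, L=255)
ssim_14bit = ssim_roi(ref, tgt, L=16383)
```

For the same pair of ROIs, the value computed with L = 16383 lies much closer to 1 than the value computed with L = 255, because the constants grow with L and dominate both the numerator and the denominator.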
Furthermore, regarding the computational complexity, the processing time required for the local SSIM analysis with the ROI sizes described above was measured and compared with the time required for conventional mean SSIM analysis [6] in the same local area of 400 × 400 pixels. All SSIM calculations were performed on a PC with the following specifications: LEVEL-R769-127-TAX/Intel Core i7-12700 2.1 GHz, 32 GB memory (iiyama, Tokyo, Japan). The SSIM analysis was implemented as an in-house program written in Python, rather than with an existing library, so that arbitrary parameters could be applied.

Spatial frequency spectrum analysis
The use of the spatial frequency domain to express the characteristics of an image is common in medical image analysis [25][26][27]. Here, we investigated how the SSIM, obtained as a single value, corresponds to the actual spatial frequency spectrum.
For images with complex structures such as a human body used in this verification, it would be difficult to evaluate them using simple frequency response characteristics based on an impulse signal transfer theory such as modulation transfer functions. Therefore, in this verification, an ROI of 400 × 400 pixels was placed in the lung field and abdominal region where the SSIM was calculated and a two-dimensional raw power spectrum of the local region in the image was obtained using ImageJ [28]; the amplitude spectrum data were calculated by taking the square root of the power spectrum value. To describe the frequency spectrum information in an easy-to-understand manner for two-dimensional images, including images with complicated anatomical structures, the radial frequency technique [29,30] was used to obtain the average spectrum in all directions. As shown in Fig. 2, the one-dimensional spectrum $F_{\mathrm{rad}}(f)$ was calculated using the radial method by additive averaging of the two-dimensional spectrum $F(u, v)$ for each frequency based on the distance from the zero frequency (the coordinates of the origin in the frequency domain). Because the target image has already been digitized, the spectrum up to the Nyquist frequency $f_{\mathrm{Ny}}$ was used instead of pre-sampled characteristics containing information above the Nyquist frequency [31]. Because all analyzed image data were digitized by the same imaging system, the Nyquist frequency was 3.33 cycles/mm for all images.
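The radial averaging step can be sketched as follows. The sketch assumes a square two-dimensional amplitude spectrum that has already been computed and centered so that the zero frequency sits at index n // 2, and it bins by rounded integer radius; the function name `radial_average` and these binning choices are assumptions, not details of the ImageJ-based procedure used in the study.

```python
import math

def radial_average(spectrum):
    """Average a centered square 2D amplitude spectrum over rings of equal radius.

    Returns a list whose index is the integer radius (0 = zero frequency),
    limited to radii up to the Nyquist index n // 2.
    """
    n = len(spectrum)
    center = n // 2
    sums, counts = {}, {}
    for v, row in enumerate(spectrum):
        for u, value in enumerate(row):
            radius = int(round(math.hypot(u - center, v - center)))
            if radius <= center:  # discard the corners beyond the Nyquist radius
                sums[radius] = sums.get(radius, 0.0) + value
                counts[radius] = counts.get(radius, 0) + 1
    return [sums[r] / counts[r] for r in sorted(sums)]

# Hypothetical flat spectrum: every direction and frequency has amplitude 2.0
flat_spectrum = [[2.0] * 8 for _ in range(8)]
profile = radial_average(flat_spectrum)
```

For an n × n region, index k of the returned profile corresponds to a spatial frequency of roughly k / (n × pixel pitch), so the last index corresponds to the Nyquist frequency.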

Confirmation for the validity of analysis results
All evaluations up to this point had been conducted on phantom images obtained under specific imaging conditions. To make the results more general, we performed an additional SSIM analysis using a database of clinical chest images [32]. Five images without a nodule, randomly extracted from the 94 such images included in the database, were used. As mentioned above, the analysis regions were the lung field and the abdominal area, and an ROI of 400 × 400 pixels was set manually for each image because the anatomical areas of the lung and abdomen differ between individuals. For this validation, L = 255 was used in the SSIM analysis because the preceding experiments confirmed that it yielded the most reasonable values. The database images were digitized at 0.175 mm and had 12-bit grayscale depth. Therefore, the Nyquist frequency of these images for confirmation was 2.86 cycles/mm; this value was applied for spatial frequency spectrum analysis.
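The Nyquist frequency quoted for the database images follows directly from the sampling pitch, as the short check below shows. The function name `nyquist_frequency` is hypothetical, and the 0.15 mm pitch used for the phantom images is inferred here from the stated 3.33 cycles/mm rather than given in the text.

```python
def nyquist_frequency(pixel_pitch_mm):
    """Nyquist frequency in cycles/mm for a given sampling pitch in mm."""
    return 1.0 / (2.0 * pixel_pitch_mm)

clinical_nyquist = nyquist_frequency(0.175)  # database images, pitch stated in the text
phantom_nyquist = nyquist_frequency(0.15)    # pitch inferred from 3.33 cycles/mm (assumption)
```

Rounded to two decimals, these evaluate to 2.86 and 3.33 cycles/mm, matching the values used for the clinical and phantom spectrum analyses.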

Results
Figure 3 shows the SSIM values obtained by changing the ROI size and the calculation parameter L for images with various processing. Table 2 summarizes the single SSIM values calculated with the original method (using only an 8 × 8 ROI size) [6]. These results indicate that the analysis area and the parameter L affect the calculation results of SSIM. Moreover, this effect differs depending on the local region to be analyzed, and the SSIM value approaches 1 as the ROI size increases in any local region. In addition, the results at L = 16383 demonstrate that the SSIM has a value close to 1 under all processing conditions, and it was clear that the SSIM result strongly depends on the value of parameter L.
To qualitatively confirm whether the SSIM results correlate with the perceptual image quality, Fig. 4 shows the images (400 × 400 pixels) of each local region used for analysis. For example, the image processed by the bilateral filter for the lung field (Fig. 4c) is similar to the reference image (Fig. 4a), which does not contradict the SSIM results close to 1, as shown in Fig. 3. The Gaussian-filtered image (Fig. 4b) and the sharpened image (Fig. 4d) showed strong changes in image quality that reduced their similarity to the reference, so their relatively low SSIM values are an acceptable result. In contrast, for images with added noise, the SSIM was reduced although no significant reduction in similarity was subjectively recognized.
The frequency spectra of each local region are shown in Fig. 5. For the lung field, the spectra of the unprocessed reference data and the bilateral-filtered data were approximately equivalent up to the Nyquist frequency; the spectrum of the noise-added image ($\sigma = 50$) was also relatively similar. However, the spectra of the Gaussian-filtered image and the unsharp-masked image differed from the others. For the abdomen, the spectra of the processed images were significantly different from that of the reference image, although they appear to be comparable at very low spatial frequencies. This trend was also found in the lung field results, although it was not as clear as in the abdominal region. In contrast, outside the low spatial frequencies, the spectral values varied, and it was clear that the spectra were not comparable. Figures 7 and 8 illustrate the results of the SSIM analysis of clinical chest images (Fig. 6) under the various conditions to confirm the validity of the above results. Table 3 lists the SSIM values for an ROI size of 8 × 8 [6]. These SSIM and spectral analysis results, which are represented by the average of five clinical images, show trends consistent with the results in Figs. 3 and 5, suggesting that the influence of parameter changes in the SSIM analysis investigated in this verification is reliable.
The processing time (in terms of computational complexity) required for SSIM analysis using several ROI sizes in this verification was 47.06 ± 0.36 s per analysis region in our computing environment. The time required to calculate a single SSIM value using a single ROI size was 7.88 ± 0.06 s. If the number of plots of the results shown in Figs. 3 and 5 is increased, the computation time increases in proportion to the number of plots related to the ROI size.
Fig. 2 Overview of the spatial frequency spectrum analysis. By additive averaging radially over the two-dimensional spectrum with frequency components in various directions, the entire spectrum is clearly represented as a one-dimensional spectrum

Discussion
In the present study, we performed SSIM evaluations on DR images using different image processing methods. Using the unprocessed image as a reference, we then investigated the effects of the parameters used in the SSIM calculation. In addition, the correspondence between the result of the SSIM values and the spatial frequency spectrum of the image data was evaluated.
The results in Fig. 3 reveal that the SSIM value changes substantially depending on the value of L, which is determined by the image's dynamic range. Here, we consider the effect of parameter L on the calculation of the contrast defined by Eq. (5), which is one of the elements of SSIM. Assuming $\sigma_{r} = 50$, $\sigma_{t} = 30$, and $2\sigma_{r}\sigma_{t} = 3000$ based on the pixel values of actual image data, Eq. (5) gives

$$c(r, t) = \frac{3000 + (K_{2}L)^{2}}{3400 + (K_{2}L)^{2}}$$

which moves toward 1 as L increases. This change in the analysis value means that the correlation between the subjective sensation and SSIM can be lost, depending on the parameters. As already mentioned, the dynamic range of medical images is generally large (from 10 to 16 bits), but the entire range of pixel values is not always included in the analysis region. In other words, the SSIM calculation is performed under the condition that the range of pixel values within the analysis region is very small compared to the dynamic range of the whole image. As found in this evaluation, when SSIM is used to evaluate medical images, the local region is often used as an analysis target rather than the entire image because the structures in the human body are not uniform and vary greatly depending on the local region. Therefore, for tasks in which the range of pixel values differs depending on the analysis region, the value of parameter L should not be fixed to the dynamic range of the image. Hence, it is necessary to devise a technique such that the difference between the minimum and maximum pixel values within the analysis region is used to determine L.
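The dependence of the contrast term on L discussed above can be checked numerically. The sketch below evaluates the contrast element of SSIM for standard deviations of 50 and 30 with $K_{2} = 0.03$ at the three L settings used in this study, and adds a helper that derives L from the pixel range inside an ROI as suggested in the text. The function names `contrast_term` and `local_dynamic_range` and the ROI values are hypothetical.

```python
def contrast_term(sigma_r, sigma_t, L, K2=0.03):
    """Contrast element of SSIM for given standard deviations and dynamic range L."""
    c2 = (K2 * L) ** 2
    return (2 * sigma_r * sigma_t + c2) / (sigma_r ** 2 + sigma_t ** 2 + c2)

# Contrast similarity for the same pair of ROIs under three L settings
contrast_by_L = {L: contrast_term(50.0, 30.0, L) for L in (255, 1023, 16383)}

def local_dynamic_range(roi):
    """Candidate L taken from the pixel range inside the ROI (flat list of values)."""
    return max(roi) - min(roi)

# Hypothetical ROI whose pixel values span only 200 levels of a 14-bit image
local_L = local_dynamic_range([7900.0, 7950.0, 8000.0, 8100.0])
```

With these inputs the contrast term is roughly 0.884 for L = 255, 0.908 for L = 1023, and 0.998 for L = 16383, so an identical difference in standard deviation is almost invisible to the metric once L is set to the full 14-bit range.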
For example, in medical X-ray images, quantum noise causes fluctuations that are small compared to the dynamic range (at most, the fluctuations are the square root of the mean value of the surrounding pixels). Therefore, if the dynamic range of the image is used to determine L without any basis, results that reflect subjective graininess may not be obtained when SSIM is used to analyze the differences in image quality due to dose changes.
X-ray quantum noise causes random fluctuations in medical X-ray images. The influence of noise was also confirmed in this verification: the SSIM values changed considerably according to the amount of noise.
The magnitude of fluctuations introduced by X-ray quantum noise affects the second term on the right-hand side of Eq. (8). This term is calculated from the variances of the reference and target images and the covariance of the two images, which means that its value varies with the amount of noise even if the structure is exactly the same. As shown in Fig. 3, if the parameters are appropriate, changes in the image due to noise can apparently be captured with good sensitivity by the SSIM, but it is also possible that the SSIM responds strongly to graininess that would be subjectively unrecognizable. In other words, there are cases in which changes in SSIM values due to noise do not reflect subjective sensations; thus, it could be difficult to use the SSIM to analyze differences in image quality due to changes in dose in medical images. Furthermore, because the effect of noise is thought to depend on the contrast and structure of the signal in the images, it would be reasonable to consider structural similarity for each local region, as detailed below.
Regarding the calculation of the SSIM, the larger the ROI size, the closer the SSIM values are to 1. Even if the images are processed using the same algorithm, a more complex structure in the analysis region will lead to a higher SSIM value. In the calculation of SSIM (defined in Eq. (8)), the mean and standard deviation of the pixel values within the ROI are used. When pixel values are correlated or contain nonuniform structures, these statistics are affected by the ROI size. That is, if the ROI size is large, the standard deviation within the ROI is characterized by coarse variation, and if the ROI size is small, it is determined by variation in details; a large ROI analysis mostly evaluates low spatial frequency components, whereas a small ROI analysis evaluates high spatial frequency components. As shown in Fig. 5, even where the spectra vary significantly, the low spatial frequency spectra are almost identical, whereas the high spatial frequency spectra close to the Nyquist frequency are different. Hence, as also indicated by the results in Fig. 3, when the ROI size is large (i.e., low-frequency components are considered), the SSIM is close to 1, indicating a high similarity, but when the ROI size is small (i.e., high-frequency components are considered), the SSIM differs depending on the local region and image processing, resulting in differences in similarity.
Hence, in the analysis of SSIM, it is clear that the ROI size and frequency spectrum correspond with each other, and it is important to interpret the effect of the frequency components of the structure present within an analysis region. This means that even if SSIM ≈ 1.0, it may be that only rough structural similarity is assessed, and the application of different ROI sizes and locations for other purposes in the local SSIM assessment should be considered. As discussed in [14] and [15], the introduction of local region information can be expected to improve accuracy and robustness in the context of medical image analysis. In a typical analysis using SSIM, the mean SSIM is obtained by analyzing the entire image with a constant ROI size [6,14]. Based on the findings obtained in this study, for a more detailed analysis using SSIM, it would be useful to apply a multiscale SSIM with ROI size R as a variable (SSIM(R)), as shown in the results of this study. This approach would allow us to observe, in a single graph, structural similarities together with scale features corresponding to the frequency spectra. It is expected that the interpretation of such results would be easy, and the amount of information obtained by this analysis would be large. In addition, because only the ROI size needs to be changed, the approach can be introduced smoothly into the conventional evaluation process without major changes to the SSIM calculation procedure, which is also an advantage. In particular, this metric could be useful for quantitatively understanding changes in image quality caused by complicated nonlinear processing. We note that in this evaluation, only the properties of the evaluation by SSIM were considered, and its usefulness as an index for the optimization of imaging conditions was not mentioned.
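The idea of SSIM(R) can be illustrated with a small synthetic example. The sketch below computes the mean local SSIM over non-overlapping R × R ROIs for several values of R. The tiling and averaging scheme, the function names `ssim_roi` and `ssim_by_roi_size`, and the synthetic images (a ramp with and without a fine checkerboard) are assumptions made for illustration and do not reproduce the ROI placement used in this study.

```python
def ssim_roi(ref, tgt, L=255, K1=0.01, K2=0.03):
    """Simplified SSIM of two flat pixel lists, using population statistics."""
    n = len(ref)
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_r, mu_t = sum(ref) / n, sum(tgt) / n
    var_r = sum((a - mu_r) ** 2 for a in ref) / n
    var_t = sum((b - mu_t) ** 2 for b in tgt) / n
    cov = sum((a - mu_r) * (b - mu_t) for a, b in zip(ref, tgt)) / n
    return ((2 * mu_r * mu_t + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2))

def ssim_by_roi_size(ref_img, tgt_img, sizes, L=255):
    """Mean local SSIM over non-overlapping R x R ROIs, for each R in sizes."""
    height, width = len(ref_img), len(ref_img[0])
    curve = {}
    for R in sizes:
        scores = []
        for y0 in range(0, height - R + 1, R):
            for x0 in range(0, width - R + 1, R):
                ref = [ref_img[y][x] for y in range(y0, y0 + R) for x in range(x0, x0 + R)]
                tgt = [tgt_img[y][x] for y in range(y0, y0 + R) for x in range(x0, x0 + R)]
                scores.append(ssim_roi(ref, tgt, L))
        curve[R] = sum(scores) / len(scores)
    return curve

# Reference: horizontal ramp plus a fine checkerboard (high-frequency detail).
# Target: the same ramp with the checkerboard removed, as after strong smoothing.
N = 16
reference = [[10.0 * x + (5.0 if (x + y) % 2 == 0 else -5.0) for x in range(N)]
             for y in range(N)]
target = [[10.0 * x for x in range(N)] for y in range(N)]

curve = ssim_by_roi_size(reference, target, sizes=(2, 4, 8, 16))
```

In this example the loss of fine detail lowers the value for the smallest ROI to about 0.81, while the value for the 16 × 16 ROI is about 0.99, so plotting the result against R shows at which scale the two images stop being similar.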
Additionally, only the lung field and abdominal area on the phantom chest X-ray image were targeted; other anatomical areas and different modalities were not considered. Furthermore, not all the parameters related to SSIM, such as image quality conditions (e.g., resolution or noise level), were evaluated; concerns regarding simply increasing computation time cannot be easily ruled out. These issues should continue to be extensively investigated because they are considered important for the evaluation of medical images.

Conclusion
Using SSIM, we evaluated several local regions with different structures in processed DR images consisting of chest X-ray images of a phantom and evaluated the influence of the parameters used in the analysis on the results. Furthermore, the correspondence between the calculated SSIM values and spatial frequency spectrum was investigated. It was found that SSIM parameter L substantially affects the results, and under conditions where the range of pixel values differs depending on the analysis region, it is clear that careful attention is required to interpret the results. Moreover, in such cases, the value of L should be reconsidered. The value of SSIM tends to approach 1 as the ROI size increases. Because the ROI size at the time of analysis corresponds to a component of the spatial frequency spectrum, it was shown that consideration of the size of the analysis region and the frequency component of its structure is important. The findings of this study further indicate that the use of SSIM(R) with ROI size R as a variable at the time of calculation would be useful when analyzing a local region with the SSIM metric.
In future work, we will clarify the relationship between subjective perception and calculation parameters of the SSIM. This can potentially improve the usefulness and robustness of SSIM values. In addition, we plan to develop comprehensible indicators for medical image assessment based on the concept of full-reference metrics such as the SSIM.
Author contributions Not applicable.
Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not for-profit sectors.

Competing interests
The authors have no relevant financial or nonfinancial interests to disclose.
Ethics approval This study did not include human participants. In accordance with the regulations of the Gunma Prefectural College of Health Sciences Research Ethics Committee, we confirmed that ethical approval is not required.

Consent to participate Not applicable.
Consent to publish Not applicable.