Method to Determine the Statistical Technical Variability of SUV Parameters

Background: The Standardized Uptake Value (SUV) parameters SUVMax, SUVMean and SUVPeak are among the parameters used for the quantification of Positron Emission Tomography (PET) images. To assess the significance of an increase or decrease of these parameters for diagnostic purposes, it is relevant to know their standard deviation. The sources of the standard deviation can be divided into biological and technical. In this study we present a method to determine the technical variation of the SUV in PET images. Results: The method was tested on images of a NEMA image quality phantom with spheres of various diameters, with a full-length acquisition time of 150 s per bed position and a foreground to background activity ratio of 18F-2-fluoro-2-deoxy-D-glucose (FDG) of 10:1. Our method is based on dividing the full-length 150 s acquisition into subsets of shorter time length and reconstructing the images in the subsets. The SUVMax, SUVMean and SUVPeak were calculated for each reconstructed image in a subset. The coefficient of variation of the SUV parameters within each subset was then used to estimate the expected standard deviation between images at 150 s reconstruction length. We report the largest technical variation of the SUV parameters for the smallest sphere, and the smallest variation for the largest sphere. The expected variation at 150 s reconstruction length does not exceed 6% for the smallest sphere and 2% for the largest sphere. Conclusions: With the presented method we are able to determine the technical variation of the SUV. The method enables us to evaluate the effect of parameter selection and lesion size on the technical variation, and therefore to evaluate its relevance to the total variation of the SUV between studies.


Introduction
Positron Emission Tomography (PET) has become an indispensable diagnostic tool over the last decades.
Computed Tomography (CT) is added to the PET modality for the purpose of attenuation correction; furthermore, PET-CT imaging provides a combined view of functional and morphological information.
The radio-ligand 18F-2-fluoro-2-deoxy-D-glucose (FDG) has ensured the success of PET imaging. The glucose component of the molecule provides a higher uptake of FDG in malignant than in healthy cells [1], while the 18F component provides the detectability in the PET-CT system.
PET-CT images can be reported visually by the nuclear medicine physician; however, an important advantage of PET imaging is that the uptake can be quantified in absolute measures. Quantification of FDG PET enables the staging of cancer and the comparison of follow-up studies to track the evolution of cancer and the response to tumour therapy [2]. An important factor in quantification is to minimize the inter- and intra-observer variability as much as possible.
One of the measures for the comparison of 18F-FDG PET images is the Standardized Uptake Value (SUV). The most used SUV parameters are the SUVMax and the SUVMean. In the SUVMax only the voxel with the highest uptake is considered, while in the SUVMean all the voxels in a certain region or volume are taken into account. The SUVMax has a low inter- and intra-observer variability but a high technical statistical variation. The SUVMean, on the other hand, has a lower technical variation but a higher inter- and intra-observer variability, since the thresholds for the borders of the volume are a determining factor of the result. The SUVPeak was introduced as a "best of both worlds" parameter because it includes the voxels in a fixed limited volume around the voxel with the maximum value, and it might improve the reproducibility of SUV measurements, especially of the most metabolically active tumour regions [3].
The proposed framework for PET Response Criteria in Solid Tumours (PERCIST) suggests considering a 30% change in SUV as a significant variation of tumour activity [4]. Decreasing the variability of the SUV quantification will enable the detection of smaller significant changes and therefore an earlier detection of degeneration or of an effect of therapy. Limiting the variations and standardizing the process as much as possible is therefore essential.
In test-retest studies, patients are injected and scanned twice to assess the variation of the complete PET imaging process. However, to further reduce the variation efficiently, it is important to know which components make up the total variation.
Theoretically, the variation in a PET measurement consists of biological variability and technical variability [5].
Biological variability arises from variations in blood glucose, paravenous administration of FDG and FDG uptake, and can be minimized by standardizing the patient preparation with a protocol defining diet and uptake time. As for the technical variability, studies have shown that important factors affecting the technical variation of the SUV are variations in image reconstruction and scanner characteristics [6,7,8]. Standardizing the positioning of the patient, the acquisition, reconstruction and quantification protocol and the PET-system calibration will reduce the technical variation. However, a certain amount of technical variation is unavoidable due to the statistical variations of the 18F decay and the image reconstruction.
We define this as the "statistical technical variation": the part of the technical variation that cannot be minimized by standardization but is intrinsically present on statistical grounds. Knowing the statistical technical variation is important because it gives insight into the origin of the total variation.
To evaluate the impact of an increase of the dose on the total variation, it is necessary to be able to calculate its effect on the technical variation. Since the statistical technical variation differs between the SUV parameters, it is important to be able to quantify it specifically. Furthermore, the statistical technical variability can be scanner- and reconstruction-specific and will depend on the size of the lesions. The assessment of the statistical technical variation gives insight into its contribution to the total variation and enables evaluation of the effect of several parameters on the technical variation.
In this study we present a method to estimate the statistical technical variation of different SUV parameters and lesion sizes in images acquired with the same scanner and reconstructed with the same method. The basis of the method is that one PET acquisition is divided into a number of time frames and that the variation between the SUV quantifications of the separate time frames is used to estimate the standard deviation of the total acquisition. The proposed method determines the standard deviation of SUV parameters so that the relevance of technical statistical SUV changes in clinical practice can be established in the context of other fluctuations.
The method is described, validated and, as an example of application, applied to a 150 s acquisition of the image quality phantom with a foreground to background activity ratio of 10:1. The values and standard deviations of the SUVMax, SUVMean and SUVPeak of the spheres in the phantom are presented and discussed.

Material And Methods
The method that we describe in this paper for estimating the statistical technical variation of SUV parameters between PET images includes three main steps: (1) acquisition of the dataset; (2) generation and reconstruction of subsets with shorter acquisition frames; and (3) calculation of the variation in the subsets and translation to the variation of the original dataset. To test the validity of our method, the variation is calculated based on subsets with different acquisition lengths.

Image acquisition and reconstruction
A NEMA NU2-2007 image quality phantom was imaged on a Philips Gemini TF PET/CT system (Philips Healthcare, Andover, MA). PET reconstructions were made using the scanner's default Ordered Subset Expectation Maximization (OSEM) reconstruction algorithm with 33 subsets, 3 iterations, a matrix size of 144 × 144 and voxels of 4 × 4 × 4 mm. No Gaussian filter was applied. The reconstruction was corrected for geometrical response and detector efficiency (normalization), random coincidences, scatter and attenuation. Data were stored in list-mode, so that subsets with different acquisition times could be reconstructed. All list-mode reconstructions were decay-corrected to the start time of the acquisition.
The phantom acquisitions were made according to the requirements for the EANM/EARL FDG-PET/CT accreditation [9]. The phantom consisted of a fillable torso compartment acting as background, a cylindrical insert in the centre of the torso compartment and 6 fillable spheres of different diameters (10 mm, 13 mm, 17 mm, 22 mm, 28 mm and 37 mm) placed around the central insert. The fillable torso compartment and the spheres were filled with a solution of water and 18F-FDG. At the start of the scan the activity concentration was 2.10 MBq/ml in the torso background compartment and 20.04 MBq/ml in the spheres, resulting in an actual sphere to background ratio of 9.6:1 (the aim was 10:1) [10]. The original dataset was acquired with a 150 s frame duration. The total acquisition time was 10 minutes (600 s, 4 bed positions). The full-length 150 s list-mode acquisition was divided into subsets of shorter time length, varying from 4 s to 30 s. Attenuation-corrected reconstructions were performed for the different reconstruction lengths, generating as many images as possible per subset; the starting time of the reconstruction was varied so that no coincidences were used in more than one image. For example, for the first subset (4 s reconstruction length), the first image was reconstructed using the coincidences recorded in the first 4 seconds, the second image using those recorded in the following 4 seconds, and so on, generating a total of 37 images. The longest frame length was 30 s, generating a subset of 5 images. A total of 14 subsets was generated, with acquisition lengths of 4 s, 6 s, 8 s, 10 s, 12 s, 15 s, 17 s, 19 s, 20 s, 22 s, 24 s, 26 s, 28 s and 30 s.
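The division of the 150 s acquisition into non-overlapping frames can be sketched as follows. This is a minimal illustration, not the scanner software: the function name `frame_windows` is hypothetical, and it assumes that consecutive frames are contiguous and that an incomplete trailing frame is discarded, which reproduces the 37 and 5 images reported for the 4 s and 30 s subsets.

```python
def frame_windows(total_s, frame_s):
    """Return non-overlapping (start, end) windows in seconds that fit in total_s.

    Assumption: frames are contiguous and an incomplete trailing frame is dropped.
    """
    n_frames = total_s // frame_s
    return [(i * frame_s, (i + 1) * frame_s) for i in range(n_frames)]


# Reconstruction lengths of the 14 subsets, in seconds (taken from the text).
FRAME_LENGTHS_S = [4, 6, 8, 10, 12, 15, 17, 19, 20, 22, 24, 26, 28, 30]

# One list of time windows per subset, all cut from the same 150 s acquisition.
subsets = {length: frame_windows(150, length) for length in FRAME_LENGTHS_S}

print(len(subsets[4]), len(subsets[30]))  # 37 5
```

Because no time window is shared between two images of the same subset, each image is built from a disjoint set of coincidences.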
The Philips reconstruction software automatically corrected each reconstruction for the decay of 18F (half-life of 109.7 minutes [11]), compensating for the time difference between the start of the study and the start of the reconstruction by applying an appropriate scaling factor.
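For orientation, the size of such a scaling factor can be sketched with the standard exponential decay law. The vendor's actual implementation is not described here, so the function below is a hypothetical illustration that only accounts for the delay between the study start and the frame start, not for decay within the frame itself.

```python
import math

F18_HALF_LIFE_S = 109.7 * 60.0  # half-life quoted in the text, converted to seconds


def decay_scaling_factor(delay_s, half_life_s=F18_HALF_LIFE_S):
    """Factor that scales activity measured delay_s after the reference time back to it.

    Assumption: simple exponential decay, ignoring decay during the frame.
    """
    return math.exp(math.log(2.0) * delay_s / half_life_s)


# A frame starting 120 s after the study start needs a correction of roughly 1.3%.
print(decay_scaling_factor(120.0))
```

Within a 150 s acquisition the correction therefore stays small, but it is systematic and would otherwise bias the later frames downwards.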

Image analysis
The datasets were analysed using Python 3. The SUVMean 2D and 3D were calculated considering the complete 2D central plane or 3D volume, respectively, without using thresholding techniques based on pixel values or on a percentage of the maximum value.
SUVPeak: the average value within a 1 cm³ sphere centred on the maximum value of the sphere [12]. The algorithm fitted the sphere, found the maximum voxel value in the 3D volume, used this voxel as the centre of a spherical Region of Interest (ROI) of 1 cm³ and calculated the average value within this 1 ml sphere.
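The SUVPeak definition above can be sketched in a few lines. This is not the analysis script used in the study: the function name, the nested-list volume layout `volume[z][y][x]` and the rule that a voxel belongs to the ROI when its centre lies within the sphere radius are all assumptions made for illustration, with isotropic 4 mm voxels as in the reconstruction.

```python
import math


def suv_max_and_peak(volume, voxel_mm=4.0, peak_volume_ml=1.0):
    """Return (SUVMax, SUVPeak) for a 3D volume given as volume[z][y][x].

    Assumptions: isotropic voxels; a voxel is inside the peak ROI when its
    centre lies within the radius of a sphere of peak_volume_ml.
    """
    # Radius of a sphere with the requested volume (1 ml = 1000 mm^3).
    radius_mm = (3.0 * peak_volume_ml * 1000.0 / (4.0 * math.pi)) ** (1.0 / 3.0)

    # Hottest voxel and its index; this is the SUVMax and the ROI centre.
    suv_max, centre = max(
        (value, (z, y, x))
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
    )

    # Average all voxels whose centre is within radius_mm of the hottest voxel.
    inside = [
        value
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if voxel_mm * math.dist((z, y, x), centre) <= radius_mm
    ]
    return suv_max, sum(inside) / len(inside)


# Toy volume: uniform value 1.0 with a single hot voxel of 10.0 in the centre.
volume = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
volume[2][2][2] = 10.0
suv_max, suv_peak = suv_max_and_peak(volume)
print(suv_max, suv_peak)
```

With 4 mm voxels the 1 ml sphere (radius about 6.2 mm) covers only 19 voxel centres under this rule, so the peak value of the toy volume is (10 + 18) / 19; an implementation that weights partially covered voxels would give a slightly different result.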
The SUV values were calculated per image in a subset. The populations of SUV values were tested for normality with a Kolmogorov-Smirnov test, and all subsets were consistent with a normal distribution. The SUV parameters of the different images in a subset were averaged and their standard deviation was calculated. The coefficient of variation of the SUV parameters was calculated as the standard deviation divided by the average value, multiplied by 100.
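The coefficient of variation described above can be computed as in the following sketch. The input values are made up for illustration, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return (mean, SD, CV in percent) for one SUV parameter within one subset.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, 100.0 * sd / mean


# Illustrative (not measured) SUVMax values of three images in a subset.
mean, sd, cv = coefficient_of_variation([9.0, 10.0, 11.0])
print(mean, sd, cv)  # 10.0 1.0 10.0
```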
In our measurements we can assume a random sampling model, with no correlations, for independent and identically distributed random measurements. The different subsets do not differ in activity or voxel dimensions, and the quantification of the SUV parameters was done using the same ROI dimension. We can therefore describe the ratio of the standard deviations SD of two independent repetitions of PET measurements as [7]:

$$\frac{SD_1}{SD_2} = \sqrt{\frac{RL_2}{RL_1}} \qquad (1)$$

with SD1 and SD2 being the standard deviations and RL1 and RL2 being the reconstruction lengths. By setting the measured coefficient of variation of the SUV in a subset as SD1 and the length of the reconstruction of the specific subset as RL1, it is possible to estimate the variation SD2 between repetitions of scans at the total acquisition time RL2 = 150 s according to:

$$SD_2 = SD_1 \sqrt{\frac{RL_1}{RL_2}} \qquad (2)$$

Formula 2 was used to calculate the estimated SD of the SUV at reconstruction length RL2 = 150 s using each subset of the dataset. Since we divided our acquisition into 14 different subsets, we could calculate 14 different estimates of SD2. We validated our method by testing whether the value of the estimated SD2 was independent of the acquisition length of the images in the subset.
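As a worked example of this extrapolation, the sketch below assumes the square-root scaling expected for counting statistics, SD2 = SD1 · sqrt(RL1 / RL2). The function name is hypothetical; the input of roughly 30% at 4 s is the order of magnitude reported later in the paper for the 10 mm sphere.

```python
import math


def estimate_cv_at_full_length(cv_subset, rl_subset_s, rl_full_s=150.0):
    """Extrapolate a subset CV (percent) to the full reconstruction length.

    Assumption: SD scales with the inverse square root of the reconstruction length.
    """
    return cv_subset * math.sqrt(rl_subset_s / rl_full_s)


# A CV of about 30% measured in the 4 s subset corresponds to about 4.9% at 150 s.
print(round(estimate_cv_at_full_length(30.0, 4.0), 1))  # 4.9
```

Applying the same function to each of the 14 subsets gives 14 estimates of the same quantity; if the scaling assumption holds, they should agree within their uncertainty regardless of the subset length.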
Furthermore, the results were tested to verify whether there is a significant difference in SUV between adjacent spheres. The populations of each pair of adjacent spheres (10 mm and 13 mm, 13 mm and 17 mm, 17 mm and 22 mm, 22 mm and 28 mm, 28 mm and 37 mm) were compared using a two-sample t-test assuming unequal variances, with a significance level alpha = 0.05.
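A two-sample t-test with unequal variances is commonly known as Welch's t-test. The sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; the sample values are invented for illustration and the function name is hypothetical. Obtaining the p-value additionally requires the Student t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) of Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Illustrative (not measured) values for two adjacent spheres.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(t_stat, dof)
```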

Results
After generating and reconstructing the images in the subsets with shorter acquisition frames, the SUVMax 2D and 3D, the SUVMean 2D and 3D and the SUVPeak were calculated for the 6 spheres in each image of a subset. The SUV parameters from all the images per sphere were averaged and their standard deviation was estimated. The average value was plotted as a function of the sphere diameter, obtaining a recovery curve based on SUVMax 2D and 3D, the SUVMean 2D and 3D and the SUVPeak. The results are shown in Fig. 1 as examples for the 4 s (blue line), 15 s (red line) and 30 s (yellow line) subsets.
For the larger spheres, for the SUVMax and the SUVPeak parameters, the shorter acquisition lengths tend to have a higher average value, as shown for the three representative datasets in Fig. 1. The average coefficient of variation of the SUVMax 2D and 3D, SUVMean 2D and 3D and SUVPeak at 150 s reconstruction length, together with its standard deviation, was estimated for spheres of different diameters; the results are plotted in Fig. 7 and 8 for the 10 mm and 37 mm spheres (red and blue lines respectively). The average estimated coefficients of variation are summarized in Table 1. The data were statistically analysed to verify whether the difference between the estimated SD of the SUV parameters was significant between spheres (same parameter, different sphere diameter, so difference between columns in Table 1) and between SUV parameters (different SUV parameter, same sphere diameter, so difference between rows in Table 1).
Concerning the differences between spheres, we report that the difference between the estimated coefficient of variation of the SUV parameters of the sphere with d=10 mm and d=13 mm and with d=28 mm and d=37 mm is significant for SUVMax, Mean and Peak. For SUVPeak, the difference between the estimated coefficient of variation of the sphere with d=13 mm and d=17 mm is also significant. The difference between the estimated coefficient of variation of the SUVMean 2D and 3D is significant between each sphere diameter.
Concerning the differences between SUV parameters, we report no significant difference between the estimated coefficient of variation of the SUV parameters of the two smaller spheres (d=10, 13 mm). The values of the SUVMean 3D are significantly lower than the values of the SUVMax 2D, 3D and SUVPeak for the four larger spheres (d=17, 22, 28, 37 mm). The SUVPeak differs significantly from SUVMax 2D and 3D for d=17 mm and from SUVMean 2D and 3D for d=22, 28, 37 mm.

Discussion
This study describes a methodology to estimate the statistical technical variability of SUV parameters in PET images of different lesion sizes acquired with the same scanner and reconstructed with the same method. The proposed method determines the coefficient of variation of SUV parameters, which is a part of the total variation, and provides an estimate of the relevance of the statistical technical variation in SUV values in clinical practice. The value of this method is that it makes it possible to estimate the influence of lesion size and choice of SUV parameter on the total variation.
The method divides the total acquisition into subsets at different timeframes and estimates the expected coefficient of variation of the total acquisition using the standard deviation within the subsets. The method was tested on a 150 s acquisition with a foreground to background activity ratio of 10:1.
The article shows how subsets of an original scan can be used to estimate the technical variation between images at different reconstruction lengths.
In Fig. 1 the calculated values of the SUV parameters are shown for all spheres and for all acquisition lengths of the timeframes in the subset. We report higher average SUV values for shorter acquisition lengths. This can be explained by the fact that at shorter acquisition lengths, when the images are noisier, the chance is higher that a single voxel or group of voxels will have a higher value due to the higher statistical variation. This effect occurs with the SUVMax and the SUVPeak, but not with the SUVMean, where all the voxels in the region are used for the calculation. Figures 2 to 6 show the coefficient of variation of the SUV parameters as a function of the reconstruction length for the different spheres. This test can provide an expectation of the variation for noisier images (images with a high standard deviation between voxels) or for small lesions. In our case we observe that the variation is higher for shorter reconstruction lengths, suggesting that the contribution of the technical variation might be higher, for example, for images acquired with a shorter acquisition time or with a low-count emitter. The higher variation at shorter reconstruction lengths reaches values up to 30%, in our case, for the sphere with 10 mm diameter. This suggests that, when performing quantification of PET images of small lesions, the effect of the technical variability evaluated with this method might not be negligible when compared with the variation used for diagnostic purposes. Our method might also be used to define the minimal required acquisition length: when the technical statistical variation of the SUV has become negligible compared with the test-retest variation, a longer acquisition time might not add value. Fig. 7 and 8 show that the choice of the length of the subset used for the estimation is not expected to create a bias in the estimation of the coefficient of variation of the full-length dataset.
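The upward bias of a maximum-based parameter in noisier images can be illustrated with a toy simulation. It is not a model of PET reconstruction noise: the sketch assumes independent Gaussian noise on 50 voxels that all share the same true value, and every name and number in it is hypothetical.

```python
import random


def mean_of_maximum(noise_sd, n_voxels=50, true_value=10.0, n_trials=2000, seed=0):
    """Average, over many simulated images, of the maximum of n_voxels noisy voxels.

    Assumption: independent Gaussian voxel noise around one common true value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += max(rng.gauss(true_value, noise_sd) for _ in range(n_voxels))
    return total / n_trials


low_noise = mean_of_maximum(noise_sd=0.5)   # stands in for a long frame
high_noise = mean_of_maximum(noise_sd=2.0)  # stands in for a short frame
print(low_noise, high_noise)
```

In both cases the average maximum lies above the true value of 10, and the excess grows with the noise level, whereas the average over all voxels remains centred on the true value. This matches the behaviour described above for SUVMax and SUVPeak versus SUVMean.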
In Table 1 we report the estimated standard deviation of the SUV quantifications. We report significant differences in variation for different sphere dimensions. The difference is always significant for each SUV parameter for the smallest (diameter of 10 mm) and the largest (diameter of 37 mm) sphere. The difference is significant between all spheres for the SUVMean. The values typically range from 5% for the 10 mm sphere to 1% for the 37 mm sphere. We note that this is in accordance with the range reported for simulated data [13]. The maximum expected variation between images, for any estimated parameter, did not exceed 6% for the smallest object (sphere of diameter 10 mm) and 2% for the largest object (sphere of diameter 37 mm) for an acquisition length of 150 s. This provides an indication of the contribution of the technical statistical variation when the same scanner is used, with equal reconstruction length and activity, and can be compared with the variation measured in FDG PET test-retest studies, which report a typical variation of approximately 10% [14,15,16]. Our study is not a test-retest study and aims to quantify the intrinsic statistical variation to expect when (ideally) repeating exactly the same acquisition, changing no factors other than the statistical ones related to the characteristics of the emitters. The variation measured in our study is smaller than the one typically measured in a test-retest study because we do not have to deal with other factors, such as repositioning of the patient or phantom and reinjection of the activity.
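One way to put these numbers side by side is to assume that the technical statistical variation and all remaining sources are independent, so that their variances add. This assumption is not made in the study itself; the sketch below only illustrates what it would imply for the values quoted above, and the function name is hypothetical.

```python
import math


def residual_variation(total_cv, technical_cv):
    """Variation (percent) left after removing the technical part in quadrature.

    Assumption: independent contributions, total^2 = technical^2 + residual^2.
    """
    if technical_cv > total_cv:
        raise ValueError("technical variation cannot exceed the total variation")
    return math.sqrt(total_cv ** 2 - technical_cv ** 2)


# About 10% test-retest variation and about 5% technical variation (10 mm sphere).
print(round(residual_variation(10.0, 5.0), 1))  # 8.7
```

Under this assumption, a technical contribution of about 5% accounts for a quarter of the total variance of a 10% test-retest variation, while a contribution of about 1% (largest sphere) would be almost negligible.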
Nevertheless, it is important to note that the value of the estimated technical statistical variation calculated for our scanner and reconstruction method is not directly translatable to other centres. The variability in the calculation of SUV parameters inhibits the direct comparison of these values [17]. Other factors introducing variation of the technical components are reconstruction protocols, analysis methods and scan duration, and their influence is too prominent for a direct comparison of the absolute values of the variation [18].
A simple method such as the one described in this article can be routinely implemented to identify the contribution of the technical variation between PET images, so that it can be taken into account during quantification and comparison of images for diagnostic purposes.
The SUV parameters (Max, Mean and Peak) present some significant differences for the same sphere diameter.
For the smaller spheres (d=10, 13 mm), the averaging step introduced in the calculation of SUVMean and SUVPeak does not provide a significant difference in the coefficient of variation in our measurements. For larger lesions the difference between the variation in SUVMean 3D and SUVPeak is significant. This suggests that the dimension of the ROI used for the averaging has a significant effect on the SUV quantification and that too large a ROI might flatten the results. It is worth recalling that our definition of SUVMean was based on knowledge of the objects that were measured, because the ROI was defined as a sphere of diameter equal to the nominal diameter of the imaged sphere. This is not always possible during the analysis of images for diagnostic purposes. In that case another definition of SUVMean must be used and the variation between measurements might be expected to increase [12,14,15]. Other factors present in clinical practice, such as blood glucose levels, velocity of FDG uptake in the lesions or weight recording, can increase the SUV variation in diagnostic images of patients [19,20,21,22,23]. Another method to estimate the technical statistical variation would be to acquire a dataset with a longer acquisition time than the one used for diagnosis and to generate subsets of the long acquisition with a time length similar to the one used for diagnosis. This could be a more direct way to measure the variation, possibly less susceptible to the low photon statistics that we report in images reconstructed with a short reconstruction length. A similar approach has been shown in [7] for SUVMax and SUVMean for a reconstruction length of 5 minutes, with variation between 1.2% and 11.2% depending on the filter, type of acquisition (2D or 3D) and parameter (Max or Mean) used.
For this study we worked with a foreground to background activity ratio of 10:1. To further verify the method with other uptakes, the study could be repeated with other ratios, for example 5:1 and 2.5:1.
Furthermore, the acquisitions could be repeated after a certain number of hours in order to analyse the variation with other levels of noise. As previously discussed, a higher coefficient of variation is to be expected for noisier images.

Conclusion
In this study we present a method to estimate the statistical technical variation of different SUV parameters. We used the method to estimate the statistical technical variation of SUV for different lesion sizes. The method shows that the coefficient of variation of SUVMax, SUVMean and SUVPeak varies as a function of the dimension of the objects in the imaged phantom and ranges between 5 and 6% for the smallest sphere (diameter of 10 mm) and between 0.9 and 2% for the largest sphere (diameter of 37 mm). The coefficient of variation reaches values up to 30% for shorter acquisition lengths (in the order of 4 s), suggesting that the variation might not be negligible for noisier images with low counts. The difference between SUV parameters (Max, Mean and Peak) was not significant for the smaller spheres. The variation in SUVMean 3D was significantly lower for the larger spheres in comparison with the variation of the other parameters. Our method can be used routinely to provide insight into the statistical technical variation of an SUV quantification of images acquired with the same scanner and reconstruction parameters.

Figure 2
Coefficient of variation of the SUVMax 2D as a function of the reconstruction length.

Figure 3
Coefficient of variation of the SUVMax 3D as a function of the reconstruction length.

Figure 4
Coefficient of variation of the SUVMean 2D as a function of the reconstruction length.

Figure 5
Coefficient of variation of the SUVMean 3D as a function of the reconstruction length.

Figure 6
Coefficient of variation of the SUVPeak as a function of the reconstruction length.

Figure 7
Estimated variation of the SUV parameters between repetitions of images with reconstruction length 150 s for a sphere of 10 mm.

Figure 8
Estimated variation of the SUV parameters between repetitions of images with reconstruction length 150 s for a sphere of 37 mm.