The method that we describe in this paper for estimating the statistical technical variation of SUV parameters between PET images includes three main steps:

acquisition of the dataset

generation and reconstruction of subsets with shorter acquisition frames

calculation of the variation in the subsets, and translation to the variation of the original dataset
To test the validity of our method, the variation is calculated based on subsets with different acquisition lengths.
Image acquisition and reconstruction
A NEMA NU2–2007 image quality phantom was imaged on a Philips Gemini TF PET/CT system (Philips Healthcare, Andover, MA). PET reconstructions were made using the scanner's default Ordered Subset Expectation Maximization (OSEM) reconstruction algorithm with 33 subsets, 3 iterations, a matrix size of 144 × 144, and voxels of 4 × 4 × 4 mm. No Gaussian filter was applied. The reconstruction was corrected for geometrical response and detector efficiency (normalization), random coincidences, scatter and attenuation. Data were stored in list mode so that subsets with different acquisition times could be reconstructed. All list-mode reconstructions were decay-corrected to the start time of the acquisition.
The phantom acquisitions were made according to the requirements for the EANM/EARL FDG-PET/CT accreditation [9]. The phantom consisted of a fillable torso compartment acting as background, a cylindrical insert in the centre of the torso compartment, and 6 fillable spheres of different diameters (10 mm, 13 mm, 17 mm, 22 mm, 28 mm and 37 mm) placed around the central insert. The torso compartment and the spheres were filled with a solution of water and 18F-FDG. At the start of the scan the activity concentration was 2.10 MBq/ml in the torso background compartment and 20.04 MBq/ml in the spheres, resulting in an actual sphere-to-background ratio of 9.6:1 (the target is 10:1) [10].
The original dataset was acquired with a frame duration of 150 s. The total acquisition time was 10 minutes (600 s, 4 bed positions). The full-length 150 s list-mode acquisition was divided into subsets with shorter frame lengths, ranging from 4 s to 30 s. For each reconstruction length, attenuation-corrected images were reconstructed from consecutive, non-overlapping time windows obtained by shifting the start time of the reconstruction, so that each subset contained as many images as possible and no coincidence was used twice. For example, for the first subset (4 s reconstruction length), the first image was reconstructed from the coincidences recorded between 0 and 4 s, the second from those recorded between 4 and 8 s, and so on, giving a total of 37 images. The longest frame length was 30 s, giving a subset of 5 images. In total, 14 subsets were generated, with acquisition lengths of 4 s, 6 s, 8 s, 10 s, 12 s, 15 s, 17 s, 19 s, 20 s, 22 s, 24 s, 26 s, 28 s and 30 s.
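The subdivision described above can be sketched as follows. This is a minimal illustration, not the reconstruction software itself: the function name `frame_windows` is hypothetical, and it assumes that each window starts where the previous one ends and that an incomplete trailing window is discarded.

```python
TOTAL_S = 150  # full-length list-mode acquisition per bed position, in seconds
FRAME_LENGTHS_S = [4, 6, 8, 10, 12, 15, 17, 19, 20, 22, 24, 26, 28, 30]


def frame_windows(total_s, frame_s):
    """Return consecutive, non-overlapping (start, end) windows in seconds.

    Assumption: a trailing window shorter than frame_s is dropped.
    """
    n_frames = total_s // frame_s
    return [(i * frame_s, (i + 1) * frame_s) for i in range(n_frames)]


# Number of images that each subset would contain under these assumptions
images_per_subset = {rl: len(frame_windows(TOTAL_S, rl)) for rl in FRAME_LENGTHS_S}
```

Under these assumptions the 4 s subset contains 37 images and the 30 s subset contains 5, which agrees with the numbers quoted in the text.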
The Philips reconstruction software automatically corrected each reconstruction for the decay of 18F (half-life of 109.7 minutes [11]), compensating for the time difference between the start of the study and the start of the reconstruction with an appropriate scaling factor.
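To give an idea of the size of this correction, the sketch below computes the standard exponential decay factor for a given delay. It is an illustration only: the function name is hypothetical, the vendor's actual implementation is not known to us, and decay within the frame itself is ignored.

```python
import math

F18_HALF_LIFE_MIN = 109.7  # half-life quoted in the text


def decay_correction_factor(elapsed_s, half_life_min=F18_HALF_LIFE_MIN):
    """Scaling factor that refers activity measured elapsed_s seconds after
    the start of the study back to the start time.

    Assumption: simple exponential decay, no within-frame correction.
    """
    decay_constant = math.log(2) / (half_life_min * 60.0)  # per second
    return math.exp(decay_constant * elapsed_s)


# Factor for a reconstruction that starts 120 s into the 150 s acquisition
factor_at_120_s = decay_correction_factor(120.0)
```

Because 150 s is short compared with the half-life, the factor stays within a few percent of 1 for every frame in the acquisition.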
Image analysis
The datasets were analysed using a Python 3.7.0 script (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]. The algorithm automatically detected the spheres and their central 2D plane. Different SUV parameters were calculated in each image of the subsets:

SUVMax 2D: the maximum voxel value in the central 2D plane of each sphere;

SUVMax 3D: the maximum voxel value in each sphere;

SUVMean 2D: the average value of the voxels in the central 2D plane of each sphere;

SUVMean 3D: the average value of the voxels in each sphere.
The SUVMean 2D and 3D were calculated considering the complete 2D central plane or 3D volume, respectively, without using thresholding techniques based on pixel values or on a percentage of the maximum value.

SUVPeak: the average value within a 1 cm³ sphere centred on the voxel with the maximum value in the sphere [12]. The algorithm fitted the sphere, found the maximum voxel value in the 3D volume, used this voxel as the centre of a spherical Region of Interest (ROI) of 1 cm³ (1 ml) and calculated the average value within this ROI.
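The five parameters listed above can be illustrated with a short sketch. This is not the script used in the study: the function names, the dictionary-based image representation and the toy data are hypothetical, sphere detection is assumed to have been done already, and a voxel is assumed to belong to the 1 cm³ peak ROI when its centre lies within the ROI radius.

```python
import math

VOXEL_MM = 4.0  # isotropic voxel size from the reconstruction settings
# Radius of a sphere with a volume of 1 cm^3 (1000 mm^3), about 6.2 mm
PEAK_RADIUS_MM = (3.0 * 1000.0 / (4.0 * math.pi)) ** (1.0 / 3.0)


def distance_mm(a, b):
    """Euclidean distance in mm between two voxel indices."""
    return VOXEL_MM * math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def suv_metrics(image, voi, central_k):
    """image maps (i, j, k) to SUV; voi is the set of indices inside one
    sphere; central_k is the slice index of the sphere's central plane."""
    values_3d = [image[idx] for idx in voi]
    values_2d = [image[idx] for idx in voi if idx[2] == central_k]
    # Voxel with the highest value in the 3D volume: centre of the peak ROI
    hottest = max(voi, key=lambda idx: image[idx])
    peak_values = [v for idx, v in image.items()
                   if distance_mm(idx, hottest) <= PEAK_RADIUS_MM]
    return {
        "SUVMax 2D": max(values_2d),
        "SUVMax 3D": max(values_3d),
        "SUVMean 2D": sum(values_2d) / len(values_2d),
        "SUVMean 3D": sum(values_3d) / len(values_3d),
        "SUVPeak": sum(peak_values) / len(peak_values),
    }


# Toy data (hypothetical): uniform 5 x 5 x 5 image with one hot central voxel
image = {(i, j, k): 1.0 for i in range(5) for j in range(5) for k in range(5)}
image[(2, 2, 2)] = 10.0
# Cubic stand-in for a segmented sphere
voi = {idx for idx in image if all(1 <= c <= 3 for c in idx)}
metrics = suv_metrics(image, voi, central_k=2)
```

With 4 mm voxels, the centre-inclusion rule selects the hottest voxel plus its 18 face and edge neighbours, so the discretised ROI is somewhat larger than 1 cm³. A finer sampling or partial-volume weighting would be needed to match the nominal volume more closely.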
The SUV values were calculated for each image in a subset. The populations of SUV values were tested for normality with a Kolmogorov-Smirnov test, and all subsets were consistent with a normal distribution. For each subset, the SUV parameters of the individual images were averaged and their standard deviation was calculated. The coefficient of variation of each SUV parameter was calculated as the standard deviation divided by the average value, multiplied by 100.
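As a minimal sketch of this calculation, the coefficient of variation of one SUV parameter within one subset could be computed as shown below. The function name and the example values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * SD / mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); assumed estimator
    return 100.0 * sd / mean


# Hypothetical SUVMax values from three images of one subset
example_suvs = [9.0, 10.0, 11.0]
example_cov = coefficient_of_variation(example_suvs)
```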
In our measurements we can assume a random sampling model without correlations, that is, independent and identically distributed random measurements. The subsets differ neither in activity nor in voxel dimensions, and the SUV parameters were quantified using the same ROI dimensions. We can therefore describe the ratio of the standard deviations (SD) of two independent repetitions of PET measurements as:
$$\frac{SD_1}{SD_2} = \sqrt{\frac{RL_2}{RL_1}} \qquad (1)$$
where SD1 and SD2 are the standard deviations and RL1 and RL2 the corresponding reconstruction lengths [7]. By setting the measured coefficient of variation of the SUV in a subset as SD1 and the reconstruction length of that subset as RL1, it is possible to estimate the variation SD2 between repetitions of scans at the total acquisition time RL2 = 150 s according to:
$$SD_2 = SD_1 \sqrt{\frac{RL_1}{RL_2}} \qquad (2)$$
Formula 2 was used to calculate the estimated SD of the SUV at reconstruction length RL2 = 150 s from each subset of the dataset. Since the acquisition was divided into 14 subsets, we obtained 14 estimates of SD2. We validated our method by testing whether the estimated SD2 was independent of the acquisition length of the images in the subset.
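A minimal sketch of this estimate is given below, assuming Formula 2 has the counting-statistics form SD2 = SD1 · sqrt(RL1 / RL2). The function name and the input values are hypothetical and are not results from the study.

```python
import math


def scale_sd(sd1, rl1_s, rl2_s=150.0):
    """Estimate the SD at reconstruction length rl2_s from the SD measured
    at rl1_s, assuming SD is proportional to 1 / sqrt(reconstruction length)."""
    return sd1 * math.sqrt(rl1_s / rl2_s)


# Hypothetical measured coefficients of variation (%) per frame length (s)
measured_cov = {6.0: 10.0, 24.0: 5.0}
estimated_at_150_s = {rl: scale_sd(cov, rl) for rl, cov in measured_cov.items()}
```

In this made-up example both subsets give the same estimate at 150 s, which is the behaviour the validation looks for: if the scaling holds, the estimate should not depend on the frame length of the subset it was derived from.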
Furthermore, we tested whether the SUV differed significantly between adjacent spheres. The populations of each pair of adjacent spheres (10 mm and 13 mm, 13 mm and 17 mm, 17 mm and 22 mm, 22 mm and 28 mm, 28 mm and 37 mm) were compared using a two-sample t-test assuming unequal variances, with a significance level of alpha = 0.05.
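The test statistic of a two-sample t-test with unequal variances (Welch's test) can be sketched as follows. The function name and the sample values are hypothetical. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom, because the p-value requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a two-sample t-test that does
    not assume equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical SUV populations of two adjacent spheres
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value to compare against alpha = 0.05.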