Retrospective PET/CT Imaged Based Dosimetry Following Transarterial Radioembolization

Background: Transarterial Radioembolization (TARE) uses beta particle emitting microspheres, containing Yttrium 90 ( 90 Y), to treat unresectable primary and metastatic liver tumors. Current dosimetry models are highly simplistic. 90 Y post-TARE PET/CT images are being evaluated for their use in image based dosimetry post treatment allowing for improved treatment ecacy. We sought to provide guidance on reconstruction and dose calculation algorithms that give the most reliable image based dosimetry. Methods: Retrospective image study for 11 patients each having a 90 Y PET/CT following TARE SIR-Spheres ® treatment. In total, 6 emission images were reconstructed using Ordered Subset Expectation Maximization (OSEM) routines and Bayesian penalized-likelihood reconstruction algorithms (Q. Clear). PET/CT image sets were resampled to 3mm resolution and used to create voxel based dose distributions using our own convolution algorithm (WFBH), SurePlan TM MIRD and SurePlan TM LDM. Additionally, we analyzed the effects of a systematic personalized background subtraction prior to dose distribution creation. Results: Reconstructed activity was highest among Q. Clear β = 350 and 1000 and both OSEM methods. OSEM with 3 iterations and 24 subsets gave comparable dose distributions to Q. Clear β=1000 (max dose ratio of 0.96 ± 0.14). No statistical difference was identied among Q. Clear and OSEM methods comparing dose distributions with and without additional background subtraction post-reconstruction (average local gamma value = 98.01% ± 3.75%). LDM is more sensitive than MIRD to small activity differences between reconstruction methods (p-value= 0.048). LDM calculated max dose values were higher than MIRD. DVH curves showed LDM model potentially underestimates low dose values and overestimates high dose values. Conclusion: Background subtraction is suciently addressed in PET image reconstruction. We recommend post-TARE image based dosimetry to be calculated using image series reconstructed with Q. Clear β= 1000 OSEM


Background
Transarterial Radioembolization (TARE) is a treatment modality for unresectable primary and metastatic liver tumors where beta particle emitting microspheres are injected locally to implant around the tumor(s).
TARE treatment typically utilizes the hepatic artery as a passageway to deliver Yttrium-90 ( 90 Y) containing microspheres that have a maximum radiation delivery range of 11 mm and a mean range of 2.5 mm in tissue [1][2][3]. Prior to treatment, Technetium-99m macroaggregated albumin particles ( 99m Tc-MAA) are used as a predictor of 90 Y microsphere distribution and are assumed to be a valid foundation for dosimetry calculations [1,[4][5][6]. However, due to the complexity of this treatment modality there is often poor agreement in distribution between 99m Tc-MAA particles and 90 Y microspheres upon treatment [7][8][9]. Following the dosimetry process for general brachytherapy procedures, post-TARE emission imaging validation is necessary for patient safety and accurate measures of absorbed dose [7][8][9][10][11].
The most common post-TARE imaging protocol is 90 Y bremsstrahlung single-photon emission computed tomography (SPECT) to verify treatment deposition in the liver along with no extrahepatic microsphere deposition [1,12]. Due to the noisy nature of these images, they are typically used only for qualitative purposes because quantitative assessment is di cult [1,13]. However, many institutions have been looking at the promising option of post-TARE 90 Y positron emission tomography (PET) imaging. Although a majority (99.983%) of 90 Y decays through beta minus decay to the ground sate of Zirconium-90 ( 90 Zr), there is a small decay branch that decays to the 0 + rst excited state of 90 Zr at 1760 keV; which is followed by an internal pair production with a branching ratio of (32.6 ± 0.07) × 10 − 6 [1,[14][15][16]. This small positron emission allows for the use of PET imaging post-TARE as an option for quanti cation of radioactivity within the treated region and improved spatial resolution compared to bremsstrahlung SPECT [13,[17][18][19]. Additionally, from a clinical perspective, these post-TARE 90 Y PET images give an evaluation of a treatment's technical success as well as allow generation of absorbed dose maps which have the potential to predict treatment e cacy [5,20,21]. Furthermore, they can provide a management plan for patient follow-up as well as treatment-planning for successive therapies [21,22]. Others have reported variation amongst reconstruction methods but there is strong evidence to suggest that these 90 Y PET images are accurate in their quanti able data and therefore can be used as dosimetric tools once the necessary parameters are optimized [10,17,23]. This study aims to provide guidance on optimal reconstruction method and dose calculation algorithm for the most reliable image based dosimetry.
The two reconstruction methods explored are a Regularized Reconstruction iterative algorithm (Q.Clear) [24] and an Ordered Subsets Expectation Maximization (OSEM) algorithm. Q. Clear is a Bayesian penalized likelihood reconstruction algorithm with an additional term that allows it to reach full convergence, steering the optimization away from nosier images [24]. OSEM is an accelerated variant of the Maximum-Likelihood Expectation Maximization algorithm using an update equation to create a new image with increased likelihood while reducing reconstruction time by performing updates based on small parts of the data at once [24].

Materials And Methods
Two types of studies were executed in this work to understand post-TARE dosimetry effects based on reconstruction and dose calculation algorithms: phantom studies and patient studies.
Phantom based study A phantom study using a National Electrical Manufacturers Association NU-2 image quality phantom was performed. A 90 Y PET/CT image was initially acquired on a GE Discovery MI DR™ (GE Healthcare, Chicago, IL, USA) system with a scan time of 5 minutes per bed position on day 0, with an increase in acquisition time to 10 minutes and 20 minutes on day 4 and 7 respectively, in order to keep total counts consistent across time points as we account for decay. Six different reconstructed 90 Y PET image series were analyzed for each day: four Q. Clear variations with β parameter values of 350, 1000, 4000 and 7000 and two OSEM variations using 3 iterations and 18 or 24 subsets. This set of PET images were reconstructed at the time of image acquisition on the scanner with a slice thickness of 3.27 mm and pixel spacing of 2.73 mm x 2.73 mm. Each emission image was then retrospectively resampled to 3 mm cubic voxels and analyzed, both using MIM Software SurePlan™ (MIM Software Inc., Cleveland, OH). Each sphere within the phantom was contoured on its associated CT image and the contours were then transferred to each reconstructed PET image. The sphere sizes within the phantom have the following diameters: 10 mm, 13 mm, 17 mm, 22 mm, 28 mm and 37 mm. Recovered activity (RA) values for each sphere was calculated using the equation below where is the mean pixel value in the volume of interest (VOI) and c is the known activity concentration.
Additionally, dose distributions based on these images were created using multiple algorithms.

Patient based study
We followed an IRB approved retrospective patient imaging review in which post-TARE image based dosimetry models were created for 11 patients who underwent treatment at Wake Forest Baptist Medical Center between July 2016 -March 2018 (Table 1 shows patient characteristics). All patients consented to a positron emission tomography/computed tomography (PET/CT) within 4 hours of treatment acquired on a GE Discovery MI DR™ (GE Healthcare, Chicago, IL, USA) system following SIR-Spheres® (Sirtex SIR-Spheres Pty Ltd., St Leonards, Australia) implantation in addition to the standard protocol of a Philips Healthcare™ bremsstrahlung single photon emission computed tomography/computed tomography (SPECT/CT). Emission scans were reconstructed using the same six algorithms described above in the phantom based study and resampled SurePlan™ to 3 mm cubic voxel size.  Figure 1 shows an example of these image based dosimetry models. Methods to compare dose distributions include local gamma analysis, dose pro les taken along a sagittal line through the max dose point de ned by the dose distribution created using MIRD Q. Clear 1000 and dose volume histogram (DVH) curve comparison.
To study the effects of background on post-TARE dosimetry, a CT contour de ned as the whole body minus the liver and lungs expanded by 2 cm was created. Figure 2 is a representative background contour shaded in blue. The average concentration of activity found in this contour was taken as the constant background value (Table 2 shows average statistics for this contour over all patients) for each corresponding image set and a dose distribution with this value subtracted was created using our WFBH dose algorithm (WFBH Bkgrd). These background contour statistics are averaged over all patients in the study and are presented for each reconstruction method Results Shown in Fig. 3 are RA values measured for each image reconstruction series for the four larger sphere sizes in the phantom on Day 0. RA values of the 10 mm and 13 mm spheres are omitted from this gure due to their large uncertainty. With this data we investigated the effects of reconstruction method on RA to validate the accuracy of activity represented in these 90 Y PET images to better understand the reliability of activity in patient images. As sphere size decreased so did the calculated RA value which is due to the partial volume effect [10]. Table 3   The dosimetric effects of these reconstruction methods in patient post-TARE Y 90 PET images with support from phantom studies is the focus of the study. Figure 4 shows a dose pro le based on our phantom Y 90 PET image comparing the different Q.Clear β values while dose calculation algorithm was held constant. Moreover, Fig. 5 shows a representative dose pro le for our patient data. In both gures you can see there is a smoothing effect within the dosimetry due to reduction in noise as β value increases which was quantitatively shown by Scott and McGowan [10].
Through comparable RA values and apparent noise reduction as beta values increase [10] there is no added value in using Q. Clear 350 over Q.Clear 1000 and only gives increased noise that effects resulting dosimetry in an unfavorable manner shown through the dose pro les in Figs. 4 and 5. There is a grouping of pro les in Fig. 5 based on Q. Clear 1000, OSEM 18 and OSEM 24 reconstruction methods showing spatial similarity but further analysis is necessary to determine absolute dose similarities.
Initially a complete analysis on all 6 reconstruction methods were performed for all of our patients. For the purposes of presenting data in a clear manner, we chose to present the data for the following four reconstruction methods: Q. Clear 1000, Q. Clear 4000, OSEM 18 and OSEM 24. Our rational behind these four methods was based on our phantom study results which eliminated Q.Clear 350 and Q.Clear 7000. Although Q. Clear 4000 appears to be inferior to Q.Clear 1000, it is included in our presented analysis due to Rowley et al. deeming it to be visually optimal [12].
A large problem with trying to use radioactive emission images for quantitative purposes is the effect of background noise on activity accuracy. We sought to understand the effects of background on dose distributions and how the various reconstruction methods handle this added noise that is patient speci c.
Overall, both Q. Clear and OSEM methods su ciently handle background within their algorithms based on this study's de nition of what was background activity. Figure 6 shows an example of the dose pro les comparing dose distributions with and without background subtraction for a representative patient. A two-tailed p-value for each patient comparing these dose pro les was calculated and showed no statistical difference between WFBH and WFBH Bkgrd dose distributions. Furthermore, Table 4 shows the local gamma analysis, which compares percent dose difference relative to the dose at the given point [25], values between WFBH and WFBH Bkgrd dose distributions with 3% dose difference at 3 mm distance with no low dose threshold. Gamma values had an average of 98.01% with standard deviation of 3.75%. Figure 7 then shows DVH curves for the liver contour and treated area for a representative patient. The treated area is de ned to be the volume that has a dose value greater than 20% of the max dose. Overall, with both Q. Clear and OSEM reconstruction methods there was no signi cant effect on 90 Y PET based post-TARE dosimetry found when taking the additional step of background subtraction and therefore the reconstruction algorithms themselves handle this appropriately from the beginning. Comparing our WFBH convolution algorithm to the MIRD convolution algorithm through two tailed paired p-values of dose pro les and DVH curves, no statistical difference among WFBH and MIRD dose algorithms were found. Figure 8 shows representative DVH curves comparing MIRD and WFBH dose algorithms and their agreement. Note that the outlier sets of DVH curves are based on the Q. Clear 4000 reconstruction method.
Comparing LDM dose algorithm to MIRD for post-TARE 90 Y PET/CT patient images, LDM was statistically more dependent on reconstruction method than MIRD. For each patient the max dose obtained via LDM and MIRD for each reconstruction method was recorded and it was found that the standard deviation (SD) of the max dose values among the reconstruction methods were statistically higher (p-value = 0.048) within the LDM calculated values compared to the MIRD values. The spread of these deviations are shown in Fig. 9 and highlights how much the max dose uctuates as a result of reconstruction method when using LDM. This same variance in max dose values based on reconstruction method was found in our phantom study with a SD between the reconstruction methods for LDM being 137.98 and MIRD being 52.51As detailed in Table 5, LDM is giving consistently higher dose values across the reconstruction methods when compared to their MIRD counterpart. This overestimation of high dose regions is not just a scaling factor of the entire dose distribution as shown by the leftward shift in Fig. 10, where the percent of contour max is shown on the x-axis. If this were a scaling factor, then the DVH curves for the two dose algorithms would overlap with this renormalization to their respective max dose in place. However, instead we see that LDM algorithm

Discussion
Throughout our study we initially performed our analysis on all six reconstruction method variations. However, it became apparent to us that in combination with other's work [10,12] we could focus the presentation of data better as we advanced in this study by eliminating the presentation of results for methods that were inferior. Scott and McGowan concluded based on phantom studies that the optimal β value for the Q.Clear reconstruction method for Y 90 PET imaging was β = 1000 [10]. Similar RA values were found to Scott and McGowan following the same trend of as β values increase, there is a decrease in RA; likely due to this parameter's involvement in noise suppression [10]. Initially, the potential of using Q. Clear β = 7000 reconstructed images for dosimetry purposes was eliminated due to its signi cantly lower RA values and overly smooth dose pro les. Next, in agreement with Scott and McGowan, Q. Clear β = 350 is less optimal than Q. Clear β = 1000 due to much larger dose gradients from neighboring voxels for this reconstruction method, especially within low dose regions where less of this drastic variation is expected.
Moreover, all RA values are less than 0.80 for Y 90 phantom sphere compartments. Typical uorine-18 phantom studies give values that are much closer to unity across various imaging systems [26] and therefore the need for scaling is less of an issue with typical PET imaging. With these Y 90 PET images however, we need to account for the loss of activity through scaling the image by the inverse of the measured RA. These phantom studies should be done alongside other annual QA calibration measurements in order to have site speci c values and thus, most accurate rescaling values. At this time, we recommend using the RA value calculated for the 37 mm diameter sphere as the scaling factor seeing as its radius (18.5 mm) is larger than the maximum penetration depth of Y 90 beta. In the future, more complex corrections are needed in order to account for RA changes as sphere size decreases.
A goal of this study was to provide guidance on optimal reconstruction method and dose calculation method for post-TARE 90 Y PET image based dosimetry. In order to expand the impacts of this research, OSEM reconstruction methods were included for those institutions that do not have the capabilities of GE imaging based Q. Clear methods. We assume that the Q.Clear method is the closest to ideal since it can be proven to mathematically converge [24]. Dose distributions resulting from OSEM 24 image reconstruction were statistically the same as those derived from Q. Clear β = 1000 based on dose pro les, maximum dose value and DVH curves.
When comparing our dose algorithm, WFBH, to MIRD we found them to be statistically equivalent. The difference between the two algorithms is the scattering kernel used. WFBH uses a 5 × 5 × 5 kernel whereas the MIRD algorithm uses a 3 × 3 × 3 kernel. At this level the extra voxels are negligible in its effect on dosimetry. Considering the maximum penetration depth for 90 Y is 11 mm with an average of 2.5 mm, it is expected that this expanded scattering kernel would not have a signi cant effect on the resulting dose distributions [1][2][3]. Our images have a 3 mm cubic voxel size so considering 3 voxels out from that point versus 5 voxels would be a matter of 9 mm compared to 15 mm respectively. Both of which are well above the average range.
According to Pasciak et al. the main contributing factor to differences in image based dosimetry from mathematical solutions is the image noise in a 90 Y PET/CT scan and less so the dose calculation algorithm when comparing LDM and dose-point kernel (DPK) algorithms [23]. Additionally, Pasciak et al.
concluded through their studies that LDM appears to offer a slight improvement in accuracy compared to DPK when emissions scans are obtained on a PET system with a point spread function (PSF) greater than 3.25 mm at full width at half maximum (FWHM) along with a small voxel size but, in other situations depending on resolution and PSF, LDM leads to equal or greater error [23]. This study demonstrates that the LDM dose algorithm is highly dependent on reconstruction method. When looking at max dose values and DVH curves obtained via MIRD dose algorithms, there are very minimal differences from one reconstructed image to the next whereas with the LDM algorithm the differences are statistically larger. The realization that such subtle changes in the emission scan reconstruction can lead to such large dose value changes is less than ideal for image based dosimetry post-TARE. Moreover, LDM dose algorithms were overestimating high dose regions and underestimating low dose regions. We found that across all six reconstruction methods, LDM would consistently give signi cantly higher max dose values than MIRD except for Q. Clear 7000 (data not shown). However, this is explained by the main differentiating assumption between the two algorithms. LDM calculates a dose for a given voxel based on the assumption that all of the energy is locally deposited within that one voxel, for a given decay event [23]. Whereas with convolution algorithms, speci cally MIRD in this study, the dose value that is calculated considers the neighboring voxels based on the scattering kernel, i.e. not all of the energy is deposited strictly in that one given voxel. Thus, it would be expected that max dose values calculated using LDM would be higher than values obtained through MIRD due to the con nement of deposited energy. The agreement in max dose values between the two algorithms we saw with the Q. Clear 7000 reconstruction method is due to the over smoothing in that reconstructed image which is essentially forcing the activity to spread out across multiple voxels. Thus, the LDM dose algorithm was unintentionally applying the spread of activity between voxels as with the MIRD dose algorithm.

Conclusion
The use of 90 Y PET imaging post-TARE has the ability to identify dosimetric treatment success much sooner than historically where the lack of a successful treatment would not be apparent until follow-up appointments weeks to months after treatment [22]. The overall goal of this study was to advise on the optimal post-TARE 90 Y PET/CT image based dosimetry protocol. The convergence of the Q. Clear method has been proven [24]    De nition of background contour Representative PET image showing the background contour shaded in blue, liver in purple and lungs in pink.

Figure 3
Recovered activity values grouped by reconstruction method Calculated recovered activity for each sphere size in the phantom is shown with one standard deviation error bars. As β value increases the recovered activity accuracy decreases.   curve for the liver. Once again these two dose algorithms give the same curves adding to the argument that background subtraction is su ciently handled within the reconstruction methods.  Maximum dose values based on dose algorithm The spread in standard deviation of the maximum dose value across each reconstruction method for each patient is shown. LDM gives a higher variation in max dose values from one reconstruction method to the next.