Accuracy and uncertainty analysis of reduced time point imaging effect on time-integrated activity for 177Lu-DOTATATE PRRT in clinical patients and realistic simulations

Background. Dosimetry promises many advantages for radiopharmaceutical therapies but repeat post-therapy imaging for dosimetry can burden both patients and clinics. Recent applications of reduced time point imaging for time-integrated activity (TIA) determination for internal dosimetry following 177Lu-DOTATATE peptide receptor radionuclide therapy have shown promising results that allow for the simplification of patient-specific dosimetry. However, factors such as scheduling can lead to undesirable imaging time points, and the resulting impact on dosimetry accuracy is unknown. We use four-time point 177Lu SPECT/CT data for a cohort of patients treated at our clinic to perform a comprehensive analysis of the error and variability in time-integrated activity when reduced time point methods with various combinations of sampling points are employed. Methods. The study includes 28 patients with gastroenteropancreatic neuroendocrine tumors who underwent post-therapy SPECT/CT imaging at approximately 4, 24, 96, and 168 hours post-therapy (p.t.) following the first cycle of 177Lu-DOTATATE. The healthy liver, left/right kidney, spleen and up to 5 index tumors were delineated for each patient. Time-activity curves were fit with either monoexponential or biexponential functions for each structure, based on the Akaike information criterion. This fitting was performed using all 4 time points as a reference and various combinations of 2 and 3 time points to determine optimal imaging schedules and associated errors. Two commonly used methods of single time point (STP) TIA estimation were also evaluated. A simulation study was also performed with data generated by sampling curve fit parameters from log-normal distributions derived from the clinical data and adding realistic measurement noise to sampled activities. For both clinical and simulation studies, error and variability in TIA estimates were estimated with various sampling schedules. Results.
The optimal post-therapy imaging time period for STP estimates of TIA was found to be 3–5 days (71–126 h) p.t. for tumor and organs, with one exception of 6–8 days (144–194 h) p.t. for spleen with one STP approach. At the optimal time point, STP estimates give mean percent errors (MPE) within +/−5% and SD < 9% across all structures with largest magnitude error for kidney TIA (MPE=−4.1%) and highest variability also for kidney TIA (SD=8.4%). The optimal sampling schedule for 2TP estimates of TIA is 1–2 days (21–52 h) p.t. followed by 3–5 days (71–126 h) p.t. for kidney, tumor, and spleen. Using the optimal sampling schedule, the largest magnitude MPE for 2TP estimates is 1.2% for spleen and highest variability is in tumor with SD=5.8%. The optimal sampling schedule for 3TP estimates of TIA is 1–2 days (21–52 h) p.t. followed by 3–5 days (71–126 h) p.t. and 6–8 days (144–194 h) p.t. for all structures. Using the optimal sampling schedule, the largest magnitude MPE for 3TP estimates is 2.5% for spleen and highest variability is in tumor with SD=2.1%. Simulated patient results corroborate these findings with similar optimal sampling schedules and errors. Many sub-optimal reduced time point sampling schedules also exhibit low error and variability. Conclusions. We show that reduced time point methods can be used to achieve acceptable average TIA errors over a wide range of imaging time points and sampling schedules while maintaining low uncertainty. This information can improve the feasibility of dosimetry for 177Lu-DOTATATE and elucidate the uncertainty associated with non-ideal conditions.

Background
Treatment of gastroenteropancreatic neuroendocrine tumors with fixed activity 177Lu-DOTA-octreotide (177Lu-DOTATATE) has been shown to increase overall survival and progression-free survival [1,2]. Despite the promising results of standardized treatment, the need for patient-specific treatment options is indicated by the heterogeneity in pharmacokinetics, especially among tumors.
Dosimetry-guided peptide receptor radionuclide therapy (PRRT) can be used to maximize dose to tumors while ensuring that normal organs are spared during treatment. To perform patient-specific dosimetry accurately, multiple SPECT/CT acquisitions (a minimum of 3, as recommended by [3]) are generally needed in the week following activity administration to quantify the radiopharmaceutical distribution and fit time-activity curves (TACs). This process imposes an imaging burden on both the clinic and the patient. To reduce this burden, reduced and single time point (STP) methods have been investigated for estimating the time-integrated activity on a patient-specific basis. Hänscheid et al. [4] and Madsen et al. [5] have proposed two popular single time point methods that have been applied to 177Lu-DOTATATE imaging. The former relies on an approximation of a monoexponential function that is only valid at times near the effective half-life of the organ of interest.
The latter assumes a population average effective half-life to produce an accurate estimate over a larger range of time points, but it still depends on each patient's kinetics being similar to the population average. These methods have been evaluated [6-10], with the Madsen method being particularly robust over a wide range of assumed patient-specific effective half-lives and imaging time points, although some of these studies caution against using STP methods [6,7]. In addition, other reduced time point methods that employ 2 or 3 SPECT/CTs (2TP and 3TP) have been explored in 177Lu-DOTATATE [10-15] and other therapies [16-20].
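To make the difference between the two single time point approaches concrete, the following Python sketch uses the commonly quoted forms of the two approximations: TIA ≈ A(t)·2t/ln 2 for the Hänscheid method and TIA ≈ A(t)·exp(λ_pop·t)/λ_pop, with λ_pop = ln 2 / T_pop, for the Madsen method. The function names, the monoexponential test curve, and all numerical values are illustrative assumptions and are not taken from this study.

```python
import math

def tia_monoexponential(a0, t_eff):
    """Exact TIA of A(t) = a0 * exp(-ln2 * t / t_eff), integrated from 0 to infinity."""
    return a0 * t_eff / math.log(2)

def tia_hanscheid(activity, t):
    """Hänscheid-type STP estimate: TIA ~ A(t) * 2t / ln 2."""
    return activity * 2.0 * t / math.log(2)

def tia_madsen(activity, t, t_eff_population):
    """Madsen-type STP estimate using an assumed population effective half-life."""
    lam = math.log(2) / t_eff_population
    return activity * math.exp(lam * t) / lam

def percent_error(estimate, reference):
    return 100.0 * (estimate - reference) / reference

# Hypothetical structure: 100 MBq extrapolated to t = 0, true T_eff = 50 h
a0, t_eff_true = 100.0, 50.0
truth = tia_monoexponential(a0, t_eff_true)

for t in (24.0, 50.0, 96.0, 168.0):
    a_t = a0 * math.exp(-math.log(2) * t / t_eff_true)
    pe_h = percent_error(tia_hanscheid(a_t, t), truth)
    # The population half-life (60 h) deliberately differs from the true value
    pe_m = percent_error(tia_madsen(a_t, t, t_eff_population=60.0), truth)
    print(f"t = {t:5.0f} h  Hänscheid {pe_h:+7.1f}%  Madsen {pe_m:+7.1f}%")
```

For a purely monoexponential curve, the Hänscheid form is exact when the imaging time equals one or two effective half-lives, and the Madsen form is exact at any time if the assumed population half-life equals the patient's true value.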
These investigations into optimal imaging schedules for 177Lu-DOTATATE dosimetry with reduced time points have often shared common conclusions, primarily about the importance of including late time points [6,11,12,13,16] and the influence of early time points [6,10,11,14,15]. Investigations into STP methods also tend to recommend similar imaging times due to the necessity for the imaging time to be close to the effective half-life of the target organ. In order to balance the long half-life of tumors with the relatively short half-life of kidneys (a primary organ at risk in 177Lu-DOTATATE PRRT), STP imaging is often recommended at 72-96 h post-therapy (p.t.) [4,8,14].
Prior studies evaluating reduced time point imaging have typically focused only on performance for kidney [6,9,12-15], while generally being limited to 3-time point ground-truth data [6,10,14] or relying on sub-optimal planar imaging as ground truth [9,13,15]. Having access to 4-time point SPECT/CT imaging following 177Lu-DOTATATE PRRT for a cohort of patients and auto-segmentation tools to define multiple structures, we aim to perform a comprehensive evaluation of reduced time point methods through analysis of clinical patient data and simulated patient data with realistic measurement noise modeling. We investigate 1, 2, and 3 time point imaging, identify the optimal sampling schedules, and evaluate the error and variability in time-integrated activity (TIA) estimation with both optimal sampling and other schedules that are non-optimal but allow more flexibility to the clinic and patient.

Patients
Patients with gastroenteropancreatic neuroendocrine tumors who received at least one cycle of 177Lu-DOTATATE PRRT at the University of Michigan Hospital and volunteered for post-therapy SPECT/CT imaging as part of an ongoing IRB-approved research protocol were eligible for this study. A total of 28 patients met these criteria and underwent 4-time point SPECT/CT imaging for subsequent dosimetry. Patient characteristics are summarized in Table 1. The post-therapy imaging process has previously been described in detail [21] and is summarized here. 177Lu SPECT/CT imaging was performed on a Siemens Intevo at approximately 4, 24, 96, and 168 hours p.t. following 177Lu-DOTATATE administration. SPECT/CT acquisitions at all time points were 25 min and used manufacturer recommended settings. Data were reconstructed with Siemens xSPECT Quant, outputting images in units of activity (Bq/mL). Partial volume correction was applied to delineated structures using volume-based recovery coefficients determined from previous phantom experiments [21].

Volumes of interest delineation
For each patient, the largest tumors (up to 5 per patient) were outlined by a radiologist using baseline imaging (CT and MR) and rigidly transferred to post-therapy 177Lu SPECT/CT images. Tumors with volume < 2 mL or located in bone were not considered for analysis due to uncertainties associated with segmentation and poor SPECT resolution.
The following normal structures were also delineated and considered for analysis: healthy liver, left/right kidney, and spleen. Total liver and left/right kidneys were segmented on the CT of the reference 177Lu-DOTATATE SPECT/CT using an automatic deep-learning-based model (MIM Software). Manual slice-by-slice spleen contouring and, as needed, manual fine-tuning of the total liver and left/right kidney deep-learning-based contours were performed by a medical physicist. Delineated tumor volumes were removed from the total liver contour to give the "healthy liver."

Clinical data
Ground truth time-activity curve fitting and integration. The SPECT images were co-registered with a contour-intensity-based SPECT-SPECT registration (MIM Software) [21], and the mean activity in each of the segmented structures as a function of time was fit to either a two-parameter monoexponential curve or a three-parameter biexponential curve, selected using the Akaike information criterion as proposed by Sarrut et al. [22]. Exponential functions with more terms (e.g. 4-parameter biexponential) were not considered since our data were limited to 4 time points, and functions fit with a number of data points ≤ the number of free parameters are often underconstrained in that many combinations of parameters fit the data well [23]. The analytic TIA of the exponential fit function calculated for each structure for each patient in the clinical dataset was considered the ground truth.
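The model selection and analytic integration described above can be sketched in Python as follows. The functional forms are not reproduced in this text, so the sketch assumes a common two-parameter monoexponential, A(t) = C·exp(-λt), and a common three-parameter biexponential with an uptake term, A(t) = C·(exp(-λ1·t) - exp(-λ2·t)); the least-squares form of the Akaike information criterion, the data values, and the already-fitted parameters are likewise hypothetical.

```python
import math

def mono(t, c, lam):
    """Assumed 2-parameter monoexponential: A(t) = c * exp(-lam * t)."""
    return c * math.exp(-lam * t)

def biexp(t, c, lam1, lam2):
    """Assumed 3-parameter biexponential: c * (exp(-lam1 * t) - exp(-lam2 * t))."""
    return c * (math.exp(-lam1 * t) - math.exp(-lam2 * t))

def tia_mono(c, lam):
    """Analytic integral of mono() from 0 to infinity."""
    return c / lam

def tia_biexp(c, lam1, lam2):
    """Analytic integral of biexp() from 0 to infinity (needs lam2 > lam1 > 0)."""
    return c * (1.0 / lam1 - 1.0 / lam2)

def aic_least_squares(residuals, n_params):
    """One common least-squares form of the AIC: n * ln(RSS / n) + 2k."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical 4-point data (h, MBq) and hypothetical, already-fitted parameters
times = [4.0, 24.0, 96.0, 168.0]
measured = [30.0, 48.0, 22.0, 9.5]
fits = {
    "mono": (mono, (52.0, 0.0100), tia_mono),
    "biexp": (biexp, (60.0, 0.0110, 0.1800), tia_biexp),
}

scores = {}
for name, (model, params, _) in fits.items():
    residuals = [a - model(t, *params) for t, a in zip(times, measured)]
    scores[name] = aic_least_squares(residuals, len(params))

best = min(scores, key=scores.get)       # lowest AIC wins
reference_tia = fits[best][2](*fits[best][1])
print(best, round(reference_tia, 1))
```

In this made-up example the data show a clear uptake phase, so the biexponential form is preferred despite its extra parameter.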
Reduced time point fitting. Patient time-activity data were grouped into 4 time periods corresponding approximately to scans performed on days 0 ($t_{D0}$: 3-5 h), 1-2 ($t_{D1-2}$: 21-52 h), 3-5 ($t_{D3-5}$: 71-126 h), and 6-8 ($t_{D6-8}$: 144-194 h) following treatment. For each structure of each patient, the activities from every possible combination of 2 and 3 time points were fit to a monoexponential function and the resulting TIA was compared to the ground truth TIA. This led to 6 combinations for 2 time points and 4 combinations for 3 time points. Note that not all patients had scans that matched one-to-one with the designated time periods. We also evaluate the single time point calculation methods of Hänscheid et al. [4] and Madsen et al. [5], henceforth simply referred to as the Hänscheid and Madsen methods, respectively.
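The enumeration of reduced time point combinations and the monoexponential refit can be illustrated with the short Python sketch below. It uses a log-linear least-squares fit for simplicity, which is an assumption of this example and may differ from the fitting routine used in the study; the period labels and the activity values are hypothetical.

```python
import math
from itertools import combinations

PERIODS = ["D0", "D1-2", "D3-5", "D6-8"]

def fit_mono_loglinear(times, activities):
    """Least-squares line through (t, ln A); returns (c, lam) for A(t) = c * exp(-lam * t).

    With two points the curve passes exactly through both.
    """
    n = len(times)
    logs = [math.log(a) for a in activities]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    return math.exp(y_mean - slope * t_mean), -slope

def tia_from_subset(scans, subset):
    """scans maps a period label to (time in h, activity); subset is a tuple of labels."""
    times = [scans[p][0] for p in subset]
    acts = [scans[p][1] for p in subset]
    c, lam = fit_mono_loglinear(times, acts)
    return c / lam

# Hypothetical kidney data (h, MBq), one scan per time period
scans = {
    "D0": (4.0, 45.0),
    "D1-2": (24.0, 40.0),
    "D3-5": (96.0, 15.0),
    "D6-8": (168.0, 5.6),
}

two_tp = list(combinations(PERIODS, 2))    # 6 schedules
three_tp = list(combinations(PERIODS, 3))  # 4 schedules
for schedule in two_tp + three_tp:
    print(schedule, round(tia_from_subset(scans, schedule), 1))
```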

Simulated data
Due to the relatively small patient data set (N = 28) and the discrete nature of patient time-activity data, we generated additional, clinically realistic time-activity curves by simulation. Simulation was performed in 3 parts as described in the following paragraphs: time-activity curve generation, activity sampling, and adding noise.
Time-activity curve generation. 250 simulated TACs were generated for each evaluated structure (healthy liver, kidney, spleen, and tumor). The distribution of monoexponential and biexponential fits was maintained by bootstrapping the fit type from the clinical dataset 250 times. Within each fit type, a lognormal distribution [8] was fit to the clinical values of each exponential fit parameter (two parameters for monoexponential fits, three for biexponential fits). Each distribution was characterized by the standard deviation of the natural logarithm of the clinical values for that parameter and by a scale equal to the exponential of the mean of the natural logarithm of those values. To ensure realistic TACs and capture a wide range of clinical possibilities, the fitted lognormal distribution for each sampled parameter was further restricted to the minimum and maximum within the dataset. For the rate parameters that determine effective half-life in the monoexponential and biexponential fits, if a smaller minimum or larger maximum effective half-life was reported in the studies summarized by Hou et al. [8], that value was used as the cutoff instead.
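A minimal Python sketch of this parameter sampling step is given below. It estimates the lognormal parameters from the logarithms of a set of values, draws from the distribution restricted to a given range using rejection sampling, and bootstraps the fit type. Rejection sampling, the use of the sample standard deviation, and all data values are assumptions of this illustration, not details confirmed by the text.

```python
import math
import random
import statistics

def fit_lognormal(values):
    """Return (mu, sigma): mean and sample standard deviation of ln(values)."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)

def sample_truncated_lognormal(mu, sigma, lower, upper, rng):
    """Draw from a lognormal restricted to [lower, upper] by rejection sampling."""
    while True:
        x = rng.lognormvariate(mu, sigma)
        if lower <= x <= upper:
            return x

# Hypothetical effective half-lives (h) standing in for clinical fit parameters
clinical_t_eff = [38.0, 45.0, 51.0, 55.0, 60.0, 72.0]
mu, sigma = fit_lognormal(clinical_t_eff)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lower, upper = min(clinical_t_eff), max(clinical_t_eff)
simulated_t_eff = [
    sample_truncated_lognormal(mu, sigma, lower, upper, rng) for _ in range(250)
]

# Bootstrapping the fit type: resample hypothetical labels with replacement
clinical_fit_types = ["mono"] * 20 + ["biexp"] * 8
simulated_fit_types = rng.choices(clinical_fit_types, k=250)

print(round(math.exp(mu), 1), round(sigma, 3), simulated_fit_types.count("biexp"))
```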
Sampling activity from simulated curve fits. For each simulated curve, activity was sampled at 1-hour intervals from 1 to 240 hours post-injection for testing STP methods. For 2TP fitting, activity was sampled at 4-hour intervals. For 3TP fitting, sampling was likewise performed at 4-hour intervals but with restrictions that prevented any two time points from being on the "same day" (within 12 hours) or "overnight": assuming therapeutic injection occurred at the beginning of the day at time 0 h and sampling started 4 h after injection, the only valid sampling times are at the beginning (24 h, 48 h, 72 h, …), middle (4 h, 28 h, 52 h, …), and end (8 h, 32 h, 56 h, …) of each day. This resulted in 240 possible times for the STP method, 1770 combinations for the 2TP method, and 3294 combinations for the 3TP method.
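The sampling schedules can be enumerated with a few lines of Python. The sketch below is one reading of the restrictions described above (a 4-hour grid from 4 to 240 h, day-start, mid-day, and day-end slots for 3TP, and no two 3TP scans within 12 h of each other); the variable names are illustrative.

```python
from itertools import combinations

# STP: every hour from 1 to 240 h
stp_times = list(range(1, 241))

# 2TP: every pair of distinct times on a 4-hour grid (4, 8, ..., 240 h)
grid_4h = list(range(4, 241, 4))
two_tp = list(combinations(grid_4h, 2))

# 3TP: only times at the beginning (+0 h), middle (+4 h), or end (+8 h) of a day,
# starting 4 h after injection, with no two scans within 12 h of each other
valid_3tp_times = [t for t in grid_4h if t % 24 in (0, 4, 8)]
three_tp = [
    combo
    for combo in combinations(valid_3tp_times, 3)
    # combinations() yields ascending times, so checking neighbors is sufficient
    if all(later - earlier > 12 for earlier, later in zip(combo, combo[1:]))
]

print(len(stp_times), len(two_tp), len(three_tp))
```

Under this reading, the enumeration gives the same numbers of schedules as stated in the text for the STP, 2TP, and 3TP methods.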
Measurement noise. SPECT imaging is affected by measurement noise, especially in low-uptake regions or at late time points. Thus, it is important to account for noise when simulating TACs. To estimate the measurement noise to include in the virtual time-activity data, we performed repeat imaging (4 times) of a 177Lu phantom to determine the variability in counts. This process was repeated 10 times with varying acquisition times to imitate the decreased count rate over approximately 10 days that would be expected due to physical and biological decay, although all measurements were performed on the same day. Seven phantom inserts of various shapes and sizes in an anthropomorphic abdominal phantom were filled with activity concentrations representative of patient organs and tumor. The healthy liver was filled with 258 kBq/mL; 2 uniform spheres and 1 uniform ellipsoid were filled with 1682 kBq/mL; 1 uniform ellipsoid and 1 sphere with a cold center were filled with 421 kBq/mL; and the background was filled with 39 kBq/mL. Activity quantification of the repeat imaging with each scan length was used to compute a relative standard deviation for each object and scan length. The "effective activity" of each region was also found for each object and scan length; this corresponds to the activity that would have resulted in the same number of decays had the scan length been 25 minutes (similar to a patient scan). The relative standard deviations and effective activities were used to fit a power law that models measurement noise as a function of activity.
The power law function was then used to add measurement noise at each sampled activity value by providing the standard deviation of a normal distribution about each sampled point along the simulated time-activity curves.
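As an illustration of this noise step, the Python sketch below perturbs sampled activities with zero-mean Gaussian noise whose relative standard deviation follows a power law of the activity. The coefficient and exponent are placeholders, since the fitted values are not given in this text, and the time-activity curve is hypothetical.

```python
import math
import random

def relative_sd(activity, coeff, exponent):
    """Power-law noise model: relative SD = coeff * activity ** exponent."""
    return coeff * activity ** exponent

def add_measurement_noise(activity, coeff, exponent, rng):
    """Return the activity perturbed by Gaussian noise with the modeled SD."""
    sd = relative_sd(activity, coeff, exponent) * activity
    return rng.gauss(activity, sd)

# Placeholder coefficients (hypothetical, not the fitted phantom values)
COEFF, EXPONENT = 0.5, -0.5

rng = random.Random(1)  # fixed seed so the sketch is reproducible
true_curve = [(t, 80.0 * math.exp(-math.log(2) * t / 60.0)) for t in (24, 96, 168)]
noisy_curve = [
    (t, add_measurement_noise(a, COEFF, EXPONENT, rng)) for t, a in true_curve
]
for (t, a_true), (_, a_noisy) in zip(true_curve, noisy_curve):
    print(t, round(a_true, 2), round(a_noisy, 2))
```

With a negative exponent, as assumed here, the relative noise grows as activity falls, which is the behavior expected at late time points and in low-uptake structures.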

Optimal time points and error determination
For each structure of each clinical and simulated patient, TIA was computed using the STP methods and using monoexponential functions refit to the clinical and simulated 2 and 3 time point time-activity data. For each time point combination (sampling schedule), the accuracy of the STP and reduced TP imaging methods was evaluated using root mean square error (RMSE), mean percent error (MPE) with associated standard deviation (SD), and mean absolute percent error (MAPE). The ground truth for the clinical patient data was the TIA corresponding to the original 4-time point fit of the data, while the ground truth for the simulated data was the TIA corresponding to the simulated time-activity curve. The optimal sampling schedule for each structure was defined as the sampling schedule with the lowest RMSE across all patients (clinical or simulated).
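The error metrics and the selection of the optimal schedule can be written compactly in Python. The sketch below assumes that RMSE is computed on percent errors and that SD is the sample standard deviation of the percent errors; both are assumptions of this example, as are the schedule labels and TIA values.

```python
import math
import statistics

def error_metrics(estimates, references):
    """Percent-error summary of TIA estimates against reference TIAs."""
    pe = [100.0 * (e - r) / r for e, r in zip(estimates, references)]
    return {
        "MPE": statistics.mean(pe),
        "SD": statistics.stdev(pe),
        "MAPE": statistics.mean([abs(p) for p in pe]),
        "RMSE": math.sqrt(statistics.mean([p * p for p in pe])),
    }

def optimal_schedule(estimates_by_schedule, references):
    """Return the schedule whose estimates have the lowest RMSE."""
    return min(
        estimates_by_schedule,
        key=lambda s: error_metrics(estimates_by_schedule[s], references)["RMSE"],
    )

# Hypothetical reference TIAs and estimates for two schedules
references = [100.0, 200.0, 400.0]
estimates_by_schedule = {
    ("D1-2", "D3-5"): [102.0, 196.0, 404.0],
    ("D0", "D1-2"): [130.0, 150.0, 520.0],
}
best = optimal_schedule(estimates_by_schedule, references)
print(best, error_metrics(estimates_by_schedule[best], references))
```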

Results
Clinical Data
STP methods. The STP Hänscheid and Madsen methods were evaluated for patient time-activity data at the 4 defined time periods post-radiopharmaceutical injection. The percent error distribution of the STP predictions is presented in Fig. 1 for each structure. $t_{D3-5}$ was the time period with the lowest RMSE across both STP methods for all structures, except that $t_{D6-8}$ slightly outperformed $t_{D3-5}$ (RMSE 7% lower) when the Hänscheid method was applied to the spleen. The optimal time period and various measures of error are summarized for both methods and all four structures in Table 2.
Multi-time point. Results of fitting 2TP combinations of time periods with monoexponential functions are represented by boxplots in Fig. 2. The time period combination with the lowest RMSE was $t_{D1-2}$ followed by $t_{D3-5}$ for kidney, spleen, and tumor; a different combination was optimal for liver. These optimal schedules are presented in Table 2 alongside associated measures of error. MPE in TIA prediction for 3TP combinations is presented in Fig. 3. The 3TP combination with the lowest RMSE was $t_{D1-2}$, $t_{D3-5}$, and $t_{D6-8}$ for all structures. Measures of error for this optimal sampling schedule are also presented in Table 2.

Noise Phantom
The results of the phantom experiment are presented in Fig. 4 with the average effective activity across 4 samples plotted against the relative standard deviation of those samples. A log-log transformation of the data indicates that a power law reasonably describes the noise as a function of effective activity. The coefficients of the power law were determined using ordinary least squares regression on the log-log transformation of the data presented in Fig. 4.
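The regression step can be sketched in Python as follows. The effective activity is computed here as the measured activity scaled by the ratio of the scan length to a 25-minute reference scan, which is one reading of the definition given in the methods; the repeat measurements are invented for illustration and are not the phantom data.

```python
import math
import statistics

def effective_activity(activity, scan_minutes, reference_minutes=25.0):
    """Activity giving the same number of decays in a reference-length scan."""
    return activity * scan_minutes / reference_minutes

def fit_power_law(x, y):
    """Fit y = coeff * x ** exponent by ordinary least squares on (ln x, ln y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.mean(lx), statistics.mean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    coeff = math.exp(my - exponent * mx)
    return coeff, exponent

# Hypothetical repeats: (nominal kBq/mL, scan minutes) -> 4 repeat measurements
repeats = {
    (1682.0, 25.0): [1650.0, 1700.0, 1690.0, 1675.0],
    (421.0, 10.0): [400.0, 440.0, 415.0, 430.0],
    (39.0, 5.0): [30.0, 46.0, 41.0, 35.0],
}

eff, rel_sd = [], []
for (_, minutes), values in repeats.items():
    mean = statistics.mean(values)
    eff.append(effective_activity(mean, minutes))
    rel_sd.append(statistics.stdev(values) / mean)

coeff, exponent = fit_power_law(eff, rel_sd)
print(round(coeff, 3), round(exponent, 3))
```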

Simulated Patients
250 different simulated curves for each structure were generated with measurement noise added to the sampling points before refitting. MPE with 95% confidence interval for the Hänscheid and Madsen STP methods is plotted for each time point in Fig. 5. 2TP sampling schedules for two different first time points (optimal and 48 h) are plotted in Fig. 6. MPE and SD for all 2TP sampling schedules are given as 2D heatmaps in Supplemental Figs. 2-5. These heatmaps indicate that many sampling schedules exhibit MPE within ±5%, even with added measurement noise. Table 3 summarizes the RMSE, MPE (SD), and MAPE at the optimal time point combinations for STP, 2TP, and 3TP sampling methods. A tool that computes various error metrics for a requested non-optimal 1, 2, or 3 time point sampling schedule and provides a visualization of the error has been made available online [24].

[...] the kinetics of both the kidneys and tumor adequately. We note that the Hänscheid approximation is less robust than the Madsen method as the sampling time moves away from the population effective half-life, with large negatively biased errors (Figs. 1 and 5). However, while the Madsen approximation remains unbiased across a wide range of sampling times, its variability increases quickly with distance from the optimal time.

[...] [6,10,11,14,15]. The slight overestimation that we observe in our data, and that is present even in the optimal time point groupings, is due to the monoexponential fits missing information about the uptake phase of the pharmaceutical that is measurable with early time point imaging. Across the clinical and simulated data, 3TP sampling schedules exhibited lower variability than 2TP schedules, but improved MPE was generally only seen when one of the imaging time points was earlier than 48 hours. In that case, the additional time point can capture information about the radiopharmaceutical uptake phase.
As indicated in Table 2, dropping the early ($t_{D0}$) time period from the 4-time point ground truth results in the optimal 3TP clinical sampling schedule, with MAPE ranging from 2.1%-2.5% depending on structure. Note that dropping the late ($t_{D6-8}$) time period results in larger differences due to 3TP monoexponential fitting, with MAPE ranging from 6.7%-15.6% (Fig. 3).
The simulation results corroborate the results derived from the clinical patient data. The effect of measurement noise is negligible in most cases, with the notable exception of large errors (≫100%) and variability (≫100%) associated with choosing 2 late time points that are too close to each other (Supplemental Figs. 2-5). This effect is due to low activity structures (e.g. small, low-uptake tumors) with relatively large measurement noise at late time points that result in unrealistic monoexponential fits. It is also worth noting that the techniques we used to model clinically realistic simulation data could be applied as a framework for investigating error and variability in reduced time point imaging for other radionuclide therapies.
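The sensitivity of two closely spaced late time points to noise can be seen in a small worked example. The Python sketch below applies the same ±3% perturbation to a close late pair and to a well-separated pair sampled from a hypothetical monoexponential curve, then compares the resulting 2TP TIA estimates; the curve parameters and the perturbation size are illustrative assumptions.

```python
import math

def two_point_tia(t1, a1, t2, a2):
    """TIA of the monoexponential through two points; infinite if no decay is seen."""
    lam = math.log(a1 / a2) / (t2 - t1)
    if lam <= 0:
        return math.inf
    c = a1 * math.exp(lam * t1)
    return c / lam

# Hypothetical low-uptake tumor: 2 MBq extrapolated to t = 0, T_eff = 70 h
a0, t_eff = 2.0, 70.0
lam_true = math.log(2) / t_eff
truth = a0 / lam_true

def activity(t):
    return a0 * math.exp(-lam_true * t)

# First scan reads 3% low, second scan reads 3% high
for t1, t2 in ((160.0, 168.0), (24.0, 168.0)):
    estimate = two_point_tia(t1, activity(t1) * 0.97, t2, activity(t2) * 1.03)
    error = 100.0 * (estimate - truth) / truth
    print(f"scans at {t1:.0f} h and {t2:.0f} h: percent error {error:+.1f}%")
```

Because the fitted decay rate is the difference of two log-activities divided by the time separation, a small separation amplifies the same measurement error; with slightly larger noise the close pair can even show no apparent decay, for which no finite TIA exists.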
Gustaffson et al. [...]

Our study provides a comprehensive overview of TIA error and variability that result from applying different reduced and single time point fitting methods to patient data and simulated time-activity curves. However, our current study has some limitations. While we expanded our analysis to include multiple normal organs by exploiting recently available auto-segmentation tools, bone marrow was not included as it is a complex structure that is not easily defined. Furthermore, we observed that time-activity data in regions of marrow uptake were not well fit by the 2 or 3 parameter exponential models that we used as the reference in this work, considering that we have only 4 sampling points. Analysis of reduced time point methods for bone marrow will be undertaken in the future as we are in the process of developing tools for bone marrow dosimetry [25]. We are also limited in our clinical data by a small sample size of 28 patients with full 4-time point imaging. Our simulated data are similarly limited because the simulated time-activity curves are informed by this limited sample size (although cutoffs for effective half-life incorporated values from other cohorts). Furthermore, our simulated curve fit parameters are assumed to follow log-normal distributions based on observations from other groups [8] and supported by KS tests, but the true distribution is not known a priori. The ground truth clinical curve fits were based on 4-time point fitting of measured time-activity data. We chose to allow only 2 and 3 parameter exponential fits to these data because 4 parameter biexponentials were underconstrained for the 4-time point data, but organs of interest can exhibit a 2-phase clearance pattern that is not accurately captured by mono- or biexponential fits.
Simulations using physiologically-based pharmacokinetic models may provide more realistic curves that are not bound to monoexponential or biexponential functional forms, but they are affected by uncertainty in the estimates of the physiological parameters [16-18, 20, 26]. It is also worth noting that there are other methods of STP and reduced time point dosimetry, such as those that employ non-linear mixed models [27,28] or the method of Jackson et al. [29], which uses historical time-activity curves normalized to a single imaging time to estimate the mean and range of TIA; we focus on 2 of the more common and simple implementations of STP dosimetry.

Conclusions
We show that reduced time point methods can be used to achieve acceptable average TIA errors for both tumor and normal organs over a wide range of imaging time points and sampling schedules for 1, 2, and 3 TP imaging regimens with low uncertainty. Provided clinics avoid imaging at two early time points (< 48 h p.t.), 2TP imaging can provide TIA estimates with average error and standard deviation less than 5% of the ground-truth TIA for tumor and kidney, even when accounting for measurement noise. 3TP imaging provides similar performance but with generally lower variability. The 2 common STP methods investigated exhibit slightly higher average error and variability but still show MPE within ±5% and SD < 9% for tumors and organs at optimal time points. STP imaging at time points much different from the optimal increases error and/or variability.

Ethics approval and consent to participate
The study was conducted in accordance with the standards of the University of Michigan Institutional Review Board (IRB), which approved the study. All subjects provided written informed consent for post-therapy SPECT/CT imaging as part of an ongoing research protocol, as required by the IRB.

Consent for publication
All subjects gave written informed consent that included consent to publish data.

Availability of data and material
Some of the datasets generated and analyzed during the current study and an accompanying Python tool are available online at https://github.com/averybpeterson/reduced-tp-error-checker, and an archived version can be found at https://doi.org/10.5281/zenodo.7843928 [24]. The remaining data are available from the corresponding author on reasonable request.

Competing interests
YD is a consultant for MIM Software and DM is an employee of MIM Software.

Funding
This work was supported by grant R01CA240706 from the National Cancer Institute.

Authors' contributions
AP conducted data organization, analysis, and manuscript writing and contributed to the study design. DM contributed to study design, interpretation of data, and manuscript writing. YD contributed to study design, interpretation of data, and manuscript writing. All authors read and approved the final manuscript.

Fig. 3. Percent error distribution for TIA estimates from 3TP fitting of the clinical patient data grouped into 4 time periods. Box plots indicate minimum, maximum, median, 25th, and 75th percentile cutoffs. The black "X" identifies the mean value for each time period with triangle markers indicating the 95% confidence interval.

Fig. 4. Relative standard deviation plotted as a function of effective activity. The fit line, as determined from ordinary least squares regression of a log-log transformation of this data, is plotted as a solid line. The fit equation is also given.

Fig. 5. MPE with 95% confidence intervals for TIA estimates using the Hänscheid (blue) and Madsen (orange) STP methods as a function of sampling time for simulated time-activity curves with added measurement noise. Indicated optimal time points correspond to the minimum RMSE.

Fig. 6. For each structure, the MPE with 95% confidence interval for TIA prediction of simulated 2TP combinations is presented for two different first time points: the optimal time point (green) and 48 h (purple). The optimal combination of first and second time point, based on minimum RMSE, is indicated by a green "X".