Uncertainty in delineating the VOI is a primary source of error along the radionuclide therapy dosimetry chain [4, 13]. The degree of contrast enhancement, spatial resolution, and tumor volume are the main factors that limit the precision with which an observer can assess the lesion boundary on anatomical imaging modalities. Ideally, we would create an average VOI boundary across multiple observers, but this is an impractical use of resources in clinical practice. Our study simulated that ideal situation by having three radiologists repeatedly outline the same tumors on a historical data set. Leveraging these repeated measurements, we have quantified how observer effects contribute to uncertainty in VOI delineation and the corresponding absorbed dose estimates, and we have provided a model that can be used in future studies to estimate uncertainty. To our knowledge, this is the first study to assess the impact of manual lesion contouring variability on dosimetry results via an inter- and intra-operator study.
Dice coefficients revealed that the mean intra-observer spatial overlap (0.85) is significantly greater than the mean inter-observer overlap (0.79) in contours, which substantiates the assumption that operators tend to agree with themselves more than they agree with other operators. Based on variance components analysis, the vast majority of overall variance in volume, mean absorbed dose, and RECIST diameter is attributable to inherent differences between the lesions, such as differences in volume and enhancement (represented by the darkest blue bars in Fig. 4). This observation makes sense in the context of the study because the lesions in our sample varied greatly in size. We anticipated that inter-observer effects would account for a greater proportion of variability than intra-observer effects, which was true in most cases. However, this hypothesis did not hold for mean absorbed dose overall, in small lesions, or in poorly defined lesions. Sensitivity analyses revealed this result to be largely attributable to one very small, poorly defined lesion (Fig. 2(d), lesion code 55.4). Radiologist A defined Lesion 55.4 to have a volume of 1.8 mL on one read and 3.5 mL on another; this nearly two-fold difference was greatly magnified by the application of RC correction at small volumes, thereby creating large variation within Radiologist A’s mean dose measurements. When Lesion 55.4 is excluded, inter-observer variability is larger than intra-observer variability across all outcomes and subgroups, as expected. We also anticipated that variability attributable to the reader would be greater in small versus large lesions and in poorly defined versus well-defined lesions. Indeed, observer effects (the two bars in lighter shades of blue in Fig. 4) account for a greater proportion of the total variance in small lesions than in large lesions. Surprisingly, however, observer effects are not greater in poorly defined lesions than in well-defined lesions. One possible explanation is that the subgroup analyses by boundary definition did not account for lesion volume, which we know to be an important factor in variability.
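As a reminder of what the overlap values quoted above measure, the Dice coefficient of two contours A and B is 2|A∩B| / (|A| + |B|). The following is a minimal sketch only, not the software used in this study: the function name `dice_coefficient`, the representation of a VOI as a set of voxel indices, and the toy contours are all assumptions made for illustration.

```python
def dice_coefficient(voi_a, voi_b):
    """Dice overlap between two VOIs given as collections of voxel indices."""
    a, b = set(voi_a), set(voi_b)
    if not a and not b:
        # Convention assumed here: two empty contours agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Two hypothetical reads of the same lesion on a tiny 2D grid:
# each covers 16 voxels, and the second is shifted by one voxel in x.
read_1 = {(x, y) for x in range(0, 4) for y in range(0, 4)}
read_2 = {(x, y) for x in range(1, 5) for y in range(0, 4)}

# The reads share 12 voxels, so Dice = 2 * 12 / (16 + 16) = 0.75.
dice = dice_coefficient(read_1, read_2)
```

In this toy case a one-voxel shift of a 4-voxel-wide contour already lowers the Dice coefficient to 0.75, which illustrates why overlap metrics are sensitive to boundary placement in small lesions.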
The inter- and intra-observer reliability coefficients presented in Table 2 suggest substantial agreement both between and within readers. Most intra-observer reliabilities are greater than 0.9, reinforcing the conclusion that observations of the same case made by the same reader are generally consistent and reproducible. Encouragingly, inter-observer reliabilities are nearly all above 0.8, reflecting substantial agreement between readers as well. These findings are consistent with those of Meyers et al. [14], who report an inter-observer ICC of 0.94 for volume and 0.73 for mean absorbed dose for delineation on contrast-enhanced CT in a similar cohort of HCC patients treated with 90Y RE. Similarly, McErlean et al. [11] determined intra- and inter-observer reliabilities of 0.957 and 0.954, respectively, for RECIST measurements on CT images, which compare well with our values.
We provide fitted uncertainty curves (Fig. 5) that can potentially be applied to future patient studies to produce an informed estimate of standard uncertainty without implementing the entire error propagation schema. An important caveat, however, is that these values depend heavily on the imaging modality, the imaging parameters, and the contouring method used. We expect that the findings will also be applicable to hepatic lesion types other than HCC, because lesion contouring was done on contrast-enhanced MRI sequences, which are routinely used to evaluate primary and secondary hepatic malignancies. Figure 5 demonstrates that, in general, uncertainty is reduced when progressing from volume to mean dose calculation. This is to be expected because the dose maps are blurred by motion and by the limited spatial resolution of 90Y PET. In Fig. 5, the sharp rise in the mean dose uncertainty at small volumes is partly due to the sharp rise in the RC curve at small volumes.
The relationship between volume and dose uncertainty ascertained by our empirical approach can be compared with results presented by Finocchiaro et al. [13], who used an analytical equation that captures uncertainty in volume as a function of image resolution. Although the pattern is the same, our estimates of uncertainty are much lower. For example, at a volume of 100 mL, Fig. 5 estimates just over 10% uncertainty in volume and about 5% uncertainty in mean dose. In contrast, Finocchiaro et al. estimate over 30% uncertainty in volume and over 25% uncertainty in mean dose. This difference can be mostly attributed to the fact that we used MRI for tumor segmentation, which has much higher spatial resolution than the SPECT imaging used in the comparison study. Furthermore, although uncertainty in segmentation was the dominant factor in their analysis, they included other sources of uncertainty from the dosimetry chain, which were beyond the scope of our study.
We applied our model of segmentation uncertainty from the present study to determine how it impacts a model of tumor control probability (TCP) published previously by our group. We found the largest impact on TCP among lesions with intermediate mean dose values, which is attributable to the shape of the logistic curve (Fig. 6). Overall, our analysis predicts that approximately one in four lesions would have ΔTCP of at least 25% when accounting for standard uncertainty. Although TCP is not presently formally utilized as a clinical decision-making aid, it reflects expected treatment efficacy and patient outcomes. A clinician might make different treatment decisions given 50% probability of tumor control compared to 75% probability. Similarly, our characterization of the effect of segmentation uncertainty on absorbed dose reporting could be used to design processes to reduce its contribution to treatment failures. One solution is to devise a more reproducible and repeatable segmentation method. Another is to plan RE infusions with enough additional dose (“dosimetric margin”) to lesions such that the segmentation uncertainty has little effect on outcome; for example, forcing tumor absorbed doses deeper into the plateau region of a dose-response curve. Such a plan must also take the dose to normal liver parenchyma into account. Nevertheless, the demonstrated benefit of personalized dosimetry in a recent trial [3] suggests that such escalation is feasible in select cases.
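The dependence of ΔTCP on the position along the logistic curve can be illustrated with a short sketch. Everything numerical here is an assumption made for illustration: the logistic parameters `d50` and `k`, the 10% relative dose uncertainty, and the definition of ΔTCP as the TCP spread across a dose interval of plus or minus one standard uncertainty are hypothetical and are not the fitted values or exact procedure of the published model.

```python
import math


def tcp(dose_gy, d50=200.0, k=0.02):
    """Logistic tumor control probability; d50 and k are hypothetical values."""
    return 1.0 / (1.0 + math.exp(-k * (dose_gy - d50)))


def delta_tcp(dose_gy, rel_uncertainty, **params):
    """TCP spread when mean dose varies by +/- one relative standard uncertainty."""
    low = tcp(dose_gy * (1.0 - rel_uncertainty), **params)
    high = tcp(dose_gy * (1.0 + rel_uncertainty), **params)
    return high - low


# Assumed 10% dose uncertainty applied at a low, an intermediate,
# and a high mean absorbed dose (Gy).
spread = {dose: delta_tcp(dose, 0.10) for dose in (50.0, 200.0, 600.0)}
for dose, value in spread.items():
    print(f"mean dose {dose:5.0f} Gy -> delta TCP {value:.3f}")
```

Under these assumed parameters the spread is largest near the midpoint of the curve and nearly vanishes on the plateau, which is the rationale for a dosimetric margin that pushes lesions deeper into the plateau region.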
Although volume uncertainty is expected to be the greatest contributor to dose uncertainty in radionuclide therapies, there are other potential sources of error along the dosimetry chain [4, 5, 13] that we did not investigate, which is a limitation of our study. Some of these, such as those associated with the need for multi-time point imaging of the activity distribution, are not relevant to the 90Y RE application. However, uncertainties due to image mis-registration when transferring the contours defined on baseline MR images to the co-registered PET/CT dose maps are relevant. These could be estimated empirically by intentionally introducing mis-registrations in various directions and calculating the effect on the lesion absorbed dose. Quantifying this effect was beyond the scope of our study aims, but it motivates future work to integrate the analysis conducted here with other sources of variation. Similarly, it is important to note that our analysis of change in TCP only captures uncertainty arising from VOI delineation. Another limitation is that only three radiologists participated; future studies could benefit from including more readers. Additionally, in the absence of ground truth, our study was constrained to assessing observer variability and was not equipped to ascertain accuracy. A previous study of accuracy and reproducibility using synthetic brain MR images reported that manual tracers tend to overestimate lesion margins compared to automated techniques [18]. Regardless, reducing inconsistencies among radiologists reduces variability in dosimetry and has the potential to reduce variability in patient outcomes when using dosimetry-guided radionuclide therapy.
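The empirical mis-registration test suggested above could be prototyped along the following lines. This is a sketch under stated assumptions, not an analysis performed in this study: the 2D synthetic dose map, the dose values, the VOI, the one-voxel shifts, and the helper names `mean_dose` and `shift_voi` are all hypothetical.

```python
def mean_dose(dose_map, voi):
    """Mean absorbed dose over the VOI voxels that fall inside the dose map."""
    values = [dose_map[v] for v in voi if v in dose_map]
    return sum(values) / len(values)


def shift_voi(voi, dx, dy):
    """Translate a VOI by whole voxels to mimic a rigid mis-registration."""
    return {(x + dx, y + dy) for x, y in voi}


# Synthetic dose map (assumed values): a 4x4 region at 100 Gy
# inside a 10x10 background at 20 Gy.
dose_map = {
    (x, y): (100.0 if 3 <= x < 7 and 3 <= y < 7 else 20.0)
    for x in range(10)
    for y in range(10)
}
voi = {(x, y) for x in range(3, 7) for y in range(3, 7)}

baseline = mean_dose(dose_map, voi)

# Relative change in mean dose for one-voxel shifts in four directions.
shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
rel_change = {
    s: (mean_dose(dose_map, shift_voi(voi, *s)) - baseline) / baseline
    for s in shifts
}
```

In this toy geometry each one-voxel shift moves a quarter of the VOI into the background and lowers the mean dose by 20%. With patient data, the same loop over shifts would be applied to the registered dose map, ideally with sub-voxel interpolation.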