To our knowledge, this is the first study to compare CL values calculated using different software programs. A novel aspect of this study was that participants were recruited from three facilities that used different PET scanners. Agreement and correlation analyses revealed a consistent relationship among the CL values calculated by the three methods (CapAIBL, VIZCalc, and Amyquant) for semiquantifying 18F-flutemetamol amyloid PET images, even across facilities using different PET scanners. Furthermore, the results of the linear regression met the criteria established by Klunk et al. in 2015(1), which underscores the robustness and reliability of the analysis. Assessing the consistency of CL values among these methods is important because strong agreement supports their reliability and interchangeability for amyloid quantification on the CL scale.
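A regression check of this kind can be sketched as follows. This is a minimal illustration, not the analysis code used in this study: it fits an ordinary least-squares line of one method's CL values on another's and tests the fit against thresholds commonly cited from Klunk et al. (2015) for Level-1 replication (slope 0.98 to 1.02, intercept within ±2 CL, R² > 0.98). The function names `ols_fit` and `meets_klunk_criteria` and the default thresholds are assumptions for illustration and should be checked against the original criteria before use.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def meets_klunk_criteria(cl_reference, cl_test,
                         slope_range=(0.98, 1.02),
                         max_abs_intercept=2.0,
                         min_r_squared=0.98):
    """Hypothetical helper: check a CL-versus-CL regression against assumed thresholds.

    The default thresholds are the commonly cited Level-1 replication criteria,
    used here as assumptions rather than values taken from this study.
    """
    slope, intercept, r_squared = ols_fit(cl_reference, cl_test)
    ok = (slope_range[0] <= slope <= slope_range[1]
          and abs(intercept) <= max_abs_intercept
          and r_squared > min_r_squared)
    return ok, (slope, intercept, r_squared)


# Hypothetical paired CL values from two pipelines (not study data).
print(meets_klunk_criteria([0, 10, 20, 30], [1, 11, 21, 31]))
```

Reporting the slope, intercept, and R² together, rather than only a correlation coefficient, matters here: two methods can correlate perfectly while one is systematically offset or scaled relative to the other.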
The Food and Drug Administration requires that amyloid PET images be interpreted using a binary read, either positive or negative(16). Although visual assessment shows relatively high interrater agreement across amyloid PET tracers (κ = 0.8–0.9), approximately 10% of cases were read as equivocal(17–19). Given the benefits of early intervention with disease-modifying therapies, early screening with more objective methods of interpreting amyloid PET findings is crucial. Semiquantitative analysis using the SUVR has been reported to be useful for interpreting such equivocal cases(18–20). However, quantitative measures of amyloid PET, including the SUVR, are vulnerable to variation arising from differences in tracers, image acquisition timing, PET scanners, and imaging protocols(21). A further major shortcoming of the SUVR is that it cannot provide a unified scale across radiotracers and target/reference region settings. The CL scale represents a substantial advance in this respect, offering standardized units that address several key challenges associated with SUVR measurements. Its advantages include improved comparability of data across sites and tracers, consistent quantification and cutoffs, improved tracking of longitudinal changes, and simpler interpretation(1). Therefore, validating the consistency and reliability of CL calculations across different pipelines is crucial.
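To illustrate how the CL scale standardizes an SUVR, the linear transformation defined by Klunk et al. (2015) maps the mean SUVR of young amyloid-negative controls (SUVR_YC) to 0 CL and the mean SUVR of typical Alzheimer's disease patients (SUVR_AD) to 100 CL: CL = 100 × (SUVR − SUVR_YC) / (SUVR_AD − SUVR_YC). The sketch below computes an SUVR as the ratio of target-region to reference-region mean uptake and then applies this transform. The anchor values and voxel intensities are hypothetical round numbers, not calibrated 18F-flutemetamol anchors, and the function names are illustrative.

```python
from statistics import fmean


def suvr(target_values, reference_values):
    """Mean uptake in the target region divided by mean uptake in the reference region (e.g., WC)."""
    ref = fmean(reference_values)
    if ref <= 0:
        raise ValueError("reference-region mean must be positive")
    return fmean(target_values) / ref


def suvr_to_centiloid(suvr_value, suvr_yc, suvr_ad):
    """Linear CL transform: suvr_yc maps to 0 CL and suvr_ad maps to 100 CL."""
    return 100.0 * (suvr_value - suvr_yc) / (suvr_ad - suvr_yc)


# Hypothetical anchors for illustration only; real anchors come from a
# calibration specific to the tracer, reference region, and pipeline.
SUVR_YC, SUVR_AD = 1.0, 2.0

value = suvr([1.6, 1.4, 1.5], [1.0, 1.0, 1.0])
print(round(value, 3), round(suvr_to_centiloid(value, SUVR_YC, SUVR_AD), 1))
```

Because the anchors absorb tracer- and pipeline-specific offsets, two pipelines with different raw SUVRs can still report comparable CL values, provided each uses anchors calibrated for its own processing.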
The close agreement among the three CL calculation methods used in this study is noteworthy. In the pairwise Bland–Altman analyses among the three methods, the largest mean difference was only 1.2 CL, observed between CapAIBL and Amyquant. This agreement is notable because several studies have raised concerns that differences in template use, standardization methods, and other technical details could affect the SUVR and, consequently, the calculated CL values(22; 23). In recent years, considerable progress has been made in calculating CL values and/or SUVR from amyloid PET images. For the SUVR measurements, the WC was chosen as the reference region rather than the pons because of its higher stability and sensitivity; the pons may be more prone to error owing to its relatively small volume. Therefore, the WC is preferred as the reference region in this context(24).
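The pairwise Bland–Altman analysis reduces to a few summary statistics. The sketch below computes, for two methods' paired CL values, the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 × the sample SD of the differences); the per-pair means are the values plotted on the x-axis of a Bland–Altman plot. The function name and example values are hypothetical and are not data from this study.

```python
from statistics import fmean, stdev


def bland_altman(cl_a, cl_b, z=1.96):
    """Return per-pair means, per-pair differences (a - b), the bias, and the limits of agreement."""
    if len(cl_a) != len(cl_b) or len(cl_a) < 2:
        raise ValueError("need at least two paired measurements")
    means = [(a + b) / 2 for a, b in zip(cl_a, cl_b)]   # x-axis of the plot
    diffs = [a - b for a, b in zip(cl_a, cl_b)]         # y-axis of the plot
    bias = fmean(diffs)                                  # mean difference between methods
    sd = stdev(diffs)                                    # sample SD of the differences
    return means, diffs, bias, (bias - z * sd, bias + z * sd)


# Hypothetical paired CL values from two pipelines (not study data).
_, _, bias, (lower, upper) = bland_altman([10, 20, 30, 40], [9, 21, 28, 41])
print(f"bias={bias:.2f} CL, limits of agreement=({lower:.2f}, {upper:.2f}) CL")
```

A bias close to zero indicates no systematic offset between methods, while the width of the limits of agreement reflects how far individual measurements can diverge.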
Differences in the final results were anticipated because the three pipelines process data differently. Both CapAIBL and VIZCalc are PET-only (MRI-free) methods, but they differ in how the reference image sets are created. CapAIBL locally and adaptively selects the best M templates for each cortical location and uses a Bayesian framework to generate consistent PiB estimates for each region, which are then expressed as SUVRs. In contrast, VIZCalc computes the similarity between each candidate template and the participant's PET image using zero-mean normalized cross-correlation and selects the candidate with the highest similarity as the optimal template. Standardizing PET images with an average PET template may overestimate the SUVR compared with MRI-based standardization(14; 25). An optimal template, a weighted average of positive and negative templates that maximizes similarity to the participant's PET image, can reduce the standardization error for each patient(26). Unlike Amyquant, VIZCalc and CapAIBL use PET templates as intermediaries for PET image standardization and calculate CL values directly from PET images. The differences detected by the Wilcoxon signed-rank tests between Amyquant and the other two methods may therefore be attributable primarily to differences in the PET templates. However, the differences among the three methods were small, suggesting that they have limited diagnostic significance.
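Zero-mean normalized cross-correlation (ZNCC) and the idea of an optimal template as a weighted average of positive and negative templates can be illustrated with a toy example. This is a sketch under stated assumptions, not VIZCalc's implementation: images are flattened into short lists of voxel values, the candidates are assumed to be blends w × positive + (1 − w) × negative over a fixed grid of weights, and the names `zncc`, `blend`, and `select_optimal_template` are hypothetical.

```python
from math import sqrt


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length voxel lists (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of voxels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant image has no defined correlation; treat as no similarity
    return sum(x * y for x, y in zip(da, db)) / denom


def blend(positive, negative, w):
    """Hypothetical candidate template: voxel-wise weighted average of the two templates."""
    return [w * p + (1 - w) * n for p, n in zip(positive, negative)]


def select_optimal_template(pet, positive, negative, steps=10):
    """Grid-search the blending weight whose template is most similar to the PET image."""
    best_w, best_score = None, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        score = zncc(pet, blend(positive, negative, w))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Toy 5-voxel "images" with hypothetical values.
positive = [0.0, 1.0, 2.0, 3.0, 4.0]
negative = [4.0, 0.0, 3.0, 1.0, 2.0]
pet = blend(positive, negative, 0.3)
print(select_optimal_template(pet, positive, negative))
```

Because ZNCC subtracts each image's mean and divides by its spread, it is insensitive to global intensity scaling and offsets, so the template choice depends on the spatial pattern of uptake rather than on its overall level.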
This study had several limitations. First, because participants were enrolled across the disease continuum, the CL values were unevenly distributed. The Bland–Altman plots show that the mean CL values cluster in the low and high ranges, particularly the low range, whereas the intermediate (equivocal) range is sparsely sampled. Such an uneven distribution can bias consistency assessments: agreement estimated over the entire CL range is dominated by the densely sampled ranges and may therefore overestimate or underestimate agreement, masking discrepancies within specific ranges. Second, this study evaluated only 18F-flutemetamol; whether the results are consistent for other radiotracers remains untested. Third, consistency was validated only for CL calculations using the WC as the reference region. The performance of the three pipelines for CL measurements using other reference regions, such as the pons or the WC plus brainstem, was not evaluated.
In conclusion, this study validated the consistent performance of the CapAIBL, VIZCalc, and Amyquant methods in semiquantifying 18F-flutemetamol amyloid PET images across facilities with different scanners. The CL values derived from these methods showed substantial agreement, supporting their reliability and practical interchangeability for amyloid quantification on the CL scale. Although Amyquant yielded slightly different CL values than the other two methods, we consider these differences unlikely to affect the interpretation of amyloid-related findings.