The present study demonstrated that the harmonization procedure to remove site-specific effects improved the accuracy of detecting the loss of dopaminergic terminals in the striatum in multicentre DAT-SPECT data.
DAT-SPECT is an established method for the differential diagnosis of PD and related disorders [27]. Even without correction, the DAT data differentiated PDs from HCs with acceptable accuracy, using the MDS PD criteria as the gold standard. A diagnostic accuracy of 92.6% without harmonization in a multicentre study is high enough to support the clinical diagnosis of PD. Together with other clinical assessments, including the MDS-UPDRS part III score and olfactometry, uncorrected DAT-SPECT therefore appears sufficiently accurate for clinical practice.
Several studies [1, 4–8, 28, 29] have demonstrated the effects of age, sex and scanner differences on SBR using linear regression models; thus, such effects are to be expected. However, a previous study using the Parkinson's Progression Markers Initiative dataset reported that SBR data without correction for sex or age might be acceptable for differentiating between PDs and HCs in a multicentre DAT study [9]. This finding indicates that the effect size of dopaminergic terminal loss in PD is substantially larger than that of age and sex. Therefore, confounding effects other than the disease were negligible in differentiating PDs from HCs. This interpretation appears reasonable because the loss of dopaminergic terminals progresses before the clinical onset of PD [30, 31], and a substantial reduction in SBR is already present at onset, even in mild cases [32]. Indeed, the present study confirmed that correction for age and sex did not substantially alter the ability of SBR to differentiate between PDs and age- and sex-matched HCs [9]. Correction for age and sex is therefore unnecessary when appropriate control data are available.
By contrast, corrections for age and sex may be required to compare individual data with those in a database [1]. They may also be necessary when the mean difference in SBR between groups is marginal, for example, when SBR is used to differentiate prodromal PD from healthy elderly individuals or from PD. Future studies should address this possibility. Because age, sex, and inter-site differences all affect SBRs, whether to correct for these effects depends on the purpose of the study.
The present study demonstrated that correction for differences between facilities improved the diagnostic accuracy for dopaminergic denervation. The correction for facility differences comprised two factors: (i) correction across scanners by applying a linear transformation equation based on data from a phantom filled with 123I solution for each SPECT scanner (phantom correction) and (ii) standardization of the human operation used to set up the VOI in the software computing the SBR (operation standardization). Both factors exerted significant effects on the SBR. Moreover, we identified substantial interactions between phantom correction and operation standardization, suggesting that the effects differed across facilities (i.e., some sites employed a procedure similar to the standard procedure, whereas others did not). The application of phantom correction and operation standardization achieved high agreement with the clinical diagnosis of PD. Both corrections improved the effect size of the SBR differentiating PDs and HCs from mild to medium, with an improvement rate of approximately 6% (based on ROC-AUC).
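The phantom correction described above can be illustrated with a minimal sketch. It assumes that each scanner's calibration is a straight line, fitted by ordinary least squares, that maps the SBR measured on the phantom to its known reference SBR; the direction of the mapping, the function names and the numerical values are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhantomCalibration:
    """Hypothetical per-scanner linear calibration: reference = slope * measured + intercept."""
    slope: float
    intercept: float


def fit_calibration(measured, reference):
    """Fit the calibration line by ordinary least squares from phantom data."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return PhantomCalibration(slope, intercept)


def correct_sbr(sbr, calibration):
    """Apply the scanner-specific linear transformation to a participant's SBR."""
    return calibration.slope * sbr + calibration.intercept


# Made-up phantom readings for one scanner (measured) and the known reference values.
calibration = fit_calibration([1.0, 2.0, 3.0], [1.5, 3.5, 5.5])
corrected = correct_sbr(2.5, calibration)
```

In such a scheme, one calibration would be fitted per scanner and applied to every participant scanned on it, so that the corrected SBRs from different scanners share a common scale.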
The ROC-AUC improvement may not appear large. However, a difference of this magnitude can have a considerable impact in large-scale studies, such as randomized controlled trials of disease-modifying therapies. In clinical trials involving thousands of participants, a 6% difference in diagnostic accuracy would result in over a hundred misdiagnoses [33–36]. A cohort selected with an accurate diagnostic test should yield a more specific estimate of the intervention outcome and save considerable time and cost in these large-scale studies. Therefore, we strongly recommend phantom correction and operation standardization to reduce false findings from DAT-SPECT when managing a large-scale multicentre SPECT study. The benefit of correction is likely to be considerably greater in clinical trials in prodromal PD, in which SBRs differ only marginally from those of HCs.
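The arithmetic behind this estimate can be checked with a short sketch. It assumes that the 6% is read as an absolute difference in the proportion of participants classified correctly, which is a simplification because the figure is derived from ROC-AUC; the trial size of 2,000 is a hypothetical example.

```python
def expected_extra_misdiagnoses(n_participants, accuracy_difference):
    """Expected additional misclassifications for a given absolute accuracy difference."""
    return round(n_participants * accuracy_difference)


# Hypothetical trial of 2,000 participants with a 6% absolute accuracy difference.
extra = expected_extra_misdiagnoses(2000, 0.06)
```

Under these assumptions, the number of additional misclassified participants scales linearly with the trial size.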
The phantom-operation correction removed the site-effects in HCs; however, the correction only reduced the site-effects in PD (Supplementary Table 2). As DAT-SPECT reflects the severity of PD, this finding can be attributed to the difference in the severity of PD across the sites (Supplementary Table 1) [37]. Hence, the phantom-operation correction likely removed the technical differences across sites only, leaving the difference in the participants’ factors unaffected. This is favourable when we consider analyses using inter-individual differences after the harmonization.
Even with extensive corrections, two HCs had SBRs categorized as PD and three PDs had SBRs categorized as HC according to the SBR cut-off. We assessed the detailed clinical background of these participants. Neither of the two HCs (false positives) displayed increased MDS-UPDRS III scores or general cognitive decline (CDR = 0). However, one of the HCs with reduced SBR completed the TMT-B in 82 s, approaching the cut-off value, which suggests possible latent cognitive decline. The other HC had an OSIT-J score of 5 points, indicating mild olfactory impairment. These participants were followed up in the PADNI cohort to monitor the development of parkinsonism or cognitive decline. The three PDs (false negatives) had MDS-UPDRS part III scores (excluding tremors) of 8, 15, and 14, respectively, indicating relatively mild motor symptoms. PADNI will also follow up these participants to observe possible progression of parkinsonism and a decrease in SBR.
Correction with ComBat improved the diagnostic accuracy to a level comparable with that of the full model-based correction. ComBat harmonization is principally used in genomic and MRI studies as a simple and robust method for correcting measurement bias across facilities. It is a powerful method that can replace laborious procedures, such as phantom scanning at each facility. In addition, it appears useful when phantom scans cannot be performed, for example, in research projects that have already been completed. Moreover, it should be effective for analysing public neuroimaging datasets [38]. However, although ComBat correction is promising, it appears to have a limitation. The ComBat-corrected SBR had a lower mean than the SBRs obtained with the other correction methods. This result was probably attributable to the low mean uncorrected SBR in a facility with numerous participants. Specifically, the mean and variance of a single facility can strongly affect the corrected data, thus compromising generalizability to third parties or in meta-analyses. Therefore, we recommend model-based corrections for age, sex, and site effects whenever possible.
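The dependence of the corrected values on a single large facility can be illustrated with a simplified location-and-scale adjustment. This sketch is not the ComBat algorithm itself: it omits the empirical Bayes shrinkage and the preservation of covariates such as age, sex and diagnosis, and the function name and SBR values are hypothetical.

```python
from math import sqrt
from statistics import mean


def location_scale_harmonize(sbr_by_site):
    """Align each site's mean and spread to pooled values (simplified, not full ComBat)."""
    pooled = [v for values in sbr_by_site.values() for v in values]
    # The grand mean is weighted by participants, so large sites dominate it.
    grand_mean = mean(pooled)

    # Pooled within-site standard deviation (site means removed first).
    squared_residuals = 0.0
    for values in sbr_by_site.values():
        site_mean = mean(values)
        squared_residuals += sum((v - site_mean) ** 2 for v in values)
    pooled_sd = sqrt(squared_residuals / len(pooled))

    harmonized = {}
    for site, values in sbr_by_site.items():
        site_mean = mean(values)
        # Requires at least two distinct values per site, otherwise site_sd is zero.
        site_sd = sqrt(sum((v - site_mean) ** 2 for v in values) / len(values))
        harmonized[site] = [
            grand_mean + pooled_sd * (v - site_mean) / site_sd for v in values
        ]
    return harmonized


# Hypothetical SBR values: a large site reading low and a small site reading high.
example = {
    "large_site": [2.0, 2.2, 2.4, 2.0, 2.2, 2.4],
    "small_site": [3.0, 3.4],
}
harmonized = location_scale_harmonize(example)
```

In this made-up example, both sites are shifted to the pooled mean of 2.45, which lies much closer to the large site's original mean (2.2) than to the small site's (3.2). A data-driven reference of this kind therefore depends on the composition of the sample, unlike a phantom-based reference.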
A limitation of the present study was that the diagnosis of PD depended only on clinical symptoms, levodopa responsiveness, and olfactory tests, without more intensive tests to exclude atypical parkinsonism, for example, 123I-metaiodobenzylguanidine SPECT.
In conclusion, we compared correction methods for evaluating dopaminergic terminal loss in multisite DAT-SPECT data. The prospective phantom and operation correction improved the diagnostic accuracy and effect size, even though one institute lacked HC data. A multisite database with a completely standardized SBR will enable reliable large-scale multisite research, thus overcoming the study-wise limitations of each facility. Furthermore, the ComBat correction reasonably improved the diagnostic accuracy for PD. The ComBat correction is applicable when phantom scanning is unavailable, for example, to compare data with publicly available datasets.