Prediction of future Alzheimer’s disease dementia using plasma phospho-tau combined with other accessible measures

A combination of plasma phospho-tau (P-tau) and other accessible biomarkers might provide accurate prediction about the risk of developing Alzheimer’s disease (AD) dementia. We examined this in participants with subjective cognitive decline and mild cognitive impairment from the BioFINDER (n = 340) and Alzheimer’s Disease Neuroimaging Initiative (ADNI) (n = 543) studies. Plasma P-tau, plasma Aβ42/Aβ40, plasma neurofilament light, APOE genotype, brief cognitive tests and an AD-specific magnetic resonance imaging measure were examined using progression to AD as outcome. Within 4 years, plasma P-tau217 predicted AD accurately (area under the curve (AUC) = 0.83) in BioFINDER. Combining plasma P-tau217, memory, executive function and APOE produced higher accuracy (AUC = 0.91, P < 0.001). In ADNI, this model had similar AUC (0.90) using plasma P-tau181 instead of P-tau217. The model was implemented online for prediction of the individual probability of progressing to AD. Within 2 and 6 years, similar models had AUCs of 0.90–0.91 in both cohorts. Using cerebrospinal fluid P-tau, Aβ42/Aβ40 and neurofilament light instead of plasma biomarkers did not improve the accuracy significantly. The clinical predictions by memory clinic physicians had significantly lower accuracy (4-year AUC = 0.71). In summary, plasma P-tau, in combination with brief cognitive tests and APOE genotyping, might greatly improve the diagnostic prediction of AD and facilitate recruitment for AD trials. Plasma P-tau, in combination with clinical measures, predicts future Alzheimer’s disease dementia in two independent cohorts with high accuracy and is superior to the clinical diagnostic predictions of specialists.

Correctly determining if a patient with subtle cognitive symptoms, such as memory decline, suffers from prodromal or preclinical AD and will progress to AD dementia within the near future remains a challenge for clinicians. The task is nonetheless of utmost importance for a timely referral to a memory clinic, a correct and early AD diagnosis, initiation of symptomatic treatment, planning for the future and, hopefully soon, for initiating disease-modifying treatments. Although there have been impressive developments in biomarkers for AD and progression to AD dementia, such as cerebrospinal fluid (CSF) analysis of β-amyloid (Aβ42 (ref. 1)) or the ratio of Aβ42/Aβ40 (ref. 2), P-tau 3,4 and neurofilament light (NfL) 5, as well as Aβ-positron emission tomography (PET) 6,7 and tau-PET 8,9, the invasive nature, high cost and limited availability restrict their use to a limited number of highly specialized centers. A possible turning point has emerged with the recent development of blood-based biomarkers, making it possible to measure NfL 10,11, Aβ42/Aβ40 (refs. 12,13) and P-tau (phosphorylated at either threonine 181 or threonine 217) in plasma [14][15][16].
Plasma P-tau181 and P-tau217, in particular, have shown especially high diagnostic performance for discriminating AD dementia from other neurodegenerative diseases [14][15][16] . Plasma P-tau has also recently been shown to be suitable for individualized prediction of cognitive decline in individuals with mild cognitive impairment (MCI) 17 . In the clinical workup of patients with cognitive complaints, however, it is unlikely that plasma P-tau (or any other biomarker) will achieve the highest potential predictive accuracy on its own, owing to the multifactorial nature of AD etiology and its heterogenous clinical presentation. There is, therefore, now a need to identify which other measures plasma P-tau should be combined with to produce the most accurate prediction of future AD and establish an optimal diagnostic algorithm of non-invasive, cost-effective and easily available methods for early diagnosis of AD.
Furthermore, before establishing plasma P-tau in clinical practice, alone or as part of an algorithm, it is important to examine whether it actually performs better than the clinical prediction made by a treating physician, which has not been previously examined. The aim of the current study was, therefore, to examine the accuracy of plasma P-tau among patients with mild cognitive symptoms for predicting future AD dementia when combined with other accessible and non-invasive biomarkers. The prediction included not only the discrimination between progression to AD dementia and stable cognitive symptoms but also versus progression to other dementias. The accuracies were compared with the diagnostic prediction of memory clinic physicians who had performed extensive clinical assessments and evaluated cognitive testing and structural brain imaging at baseline. Selection of variables and accuracies from the models were examined in two independent, multi-center cohorts. The primary cohort was the Swedish BioFINDER study, and the validation cohort was the ADNI. The primary outcome was progression to AD dementia within 4 years; secondary outcomes were progression to AD dementia within 2 and 6 years, respectively. Finally, a cross-validated model was established and implemented as an online tool for predicting the individual risk of progressing to AD dementia (http://predictAD.app).
Results
Participants in BioFINDER. The cohort included 340 consecutively enrolled patients with cognitive complaints. One hundred and sixty-four patients were subsequently characterized as having subjective cognitive decline (SCD), and 176 patients were subsequently characterized as having MCI. Ninety-one patients progressed to AD dementia at follow-up; 48 patients progressed to other dementias; and 201 patients did not progress to any dementia. The mean (s.d.) age was 70.7 (5.6) years, and 49% were women. The mean (s.d.) time to dementia was 2.9 (1.5) years, and the mean (s.d.) follow-up time in participants who did not progress to dementia was 4.5 (1.6) years. Participant characteristics are described in Table 1; the enrollment process is described in Extended Data Fig. 1.

Table 1 notes. Data are shown as mean (s.d.) or n (%). Group comparisons were performed using the Mann-Whitney U test. a There were 17 participants with a clinical follow-up diagnosis of AD dementia who were Aβ-negative. These were coded as having other dementias. b Note that plasma P-tau181 was available in only a small subset in BioFINDER, and that pre-analytical and assay differences make the BioFINDER concentrations incomparable with the concentrations in ADNI. c Calibration-related (NfL) or camera-related (MRI) differences make it difficult to directly compare the results between the cohorts for these biomarkers. Note that, before being analyzed in logistic regression models, biomarker concentrations were transformed so that higher values corresponded to more abnormal results. DLB, dementia with Lewy bodies; FTD, frontotemporal degeneration; NA, not applicable; PDD, Parkinson's disease dementia; PSP, progressive supranuclear palsy; VaD, vascular dementia.

Prediction of AD dementia in BioFINDER.
Figure 1 summarizes the model selection process and main results. First, a data-driven model selection was performed to select the model with the lowest Akaike information criterion (AIC), that is, the best tradeoff between model fit and model complexity for predicting AD dementia (see Supplementary Methods for a detailed description of AIC). Variables screened for included key demographics, number of APOE ε4 alleles, brief tests from four cognitive domains, a magnetic resonance imaging (MRI) measure ('AD signature' cortical thickness from temporal regions prone to atrophy in AD 18) and plasma biomarkers (NfL, P-tau217 and Aβ42/Aβ40). Then, a parsimonious model was created by removing as many variables as possible while maintaining a similar model performance, defined as being within two AIC points of the lowest AIC model identified in the first step (ΔAIC < 2) 19,20. Thereafter, variables were removed further in a stepwise procedure to examine the performance of more basic models.
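The two-step selection described above (lowest AIC first, then the simplest model within two AIC points of it) can be illustrated with a minimal sketch. The candidate names, log-likelihoods and parameter counts below are hypothetical placeholders, not values from the study, and the tie-breaking rule is an assumption; the authors' actual procedure is described in their Supplementary Methods.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def parsimonious_model(candidates, max_delta=2.0):
    """Return (name, n_params, aic) for the candidate with the fewest
    parameters whose AIC lies within max_delta of the lowest AIC."""
    scored = [(name, k, aic(ll, k)) for name, ll, k in candidates]
    best_aic = min(entry[2] for entry in scored)
    eligible = [entry for entry in scored if entry[2] - best_aic < max_delta]
    # Fewest parameters first; ties are broken by the lower AIC (assumption).
    return min(eligible, key=lambda entry: (entry[1], entry[2]))


# Hypothetical (name, log-likelihood, number of parameters) triples.
candidates = [
    ("full", -100.0, 7),
    ("reduced", -102.5, 5),
    ("basic", -110.0, 3),
]

selected = parsimonious_model(candidates)
print(selected[0])
```

In this toy input the full model has the lowest AIC, the reduced model is one AIC point above it and is therefore retained as the parsimonious choice, and the basic model falls outside the ΔAIC < 2 window.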
Using the primary outcome (prediction of AD dementia within 4 years), the best model included the predictors plasma P-tau217, number of APOE ε4 alleles, executive function, memory function, cortical thickness and plasma NfL (Fig. 1). Separate AUCs of univariable models for predicting progression to AD dementia at each time point from 2 to 6 years are shown and discussed in Extended Data Fig. 2.

The clinical prediction of AD dementia in BioFINDER. The subsample where the memory clinic physicians at baseline determined the most probable underlying cause of the cognitive impairment (that is, 'clinical prediction') comprised 285 patients, of whom 72 converted to AD dementia during follow-up. Using the primary outcome (progression to AD dementia within 4 years), the AUC for the clinical prediction was 0.72 (95% CI 0.65-0.78), and, for plasma P-tau217 alone, the AUC was 0.81 (95% CI 0.75-0.87) (P = 0.03 versus the clinical prediction). Adding memory, executive function and APOE to plasma P-tau217 provided a further significantly higher accuracy than the clinical prediction (AUC = 0.90, 95% CI 0.86-0.94, P < 0.001) (see Fig. 2 and Supplementary Table 4 for details). Significantly better accuracies for the models versus the clinical prediction were also seen at 2 and 6 years (Supplementary Tables 5 and 6).
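For readers less familiar with the AUC, the following minimal sketch shows what the statistic measures: the probability that a randomly chosen progressor receives a higher predicted score than a randomly chosen non-progressor. The scores are invented for illustration. The sketch does not implement the DeLong comparison used in the paper to test differences between AUCs.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (progressor, non-progressor) pairs in which the
    progressor has the higher score; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities, not study data.
progressors = [0.9, 0.8, 0.4]
non_progressors = [0.7, 0.3, 0.2, 0.1]

print(round(auc(progressors, non_progressors), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.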

Comparison with CSF biomarkers.
To examine the effect of using CSF biomarkers instead of plasma biomarkers, we tested different models with CSF P-tau, Aβ42/Aβ40 and NfL for the main outcome: prediction of AD dementia within 4 years (Supplementary Table 7).

Fig. 2 | [...] Comparisons between AUCs were performed using DeLong statistics. b, ROC curve analyses of the different models for discriminating those who progressed to AD dementia versus those who progressed to other dementias or remained cognitively stable. Note that the comparison with the clinical prediction was performed on a subsample where the clinical prediction was available, hence the slightly different AUCs (and 95% CIs) compared with those shown in Fig. 1. n = 247, of whom 53 progressed to AD dementia within the time period. APOE, apolipoprotein E genotype (number of ε4 alleles); Exec. function, executive function; MRI, magnetic resonance imaging of the cortical thickness of a temporal AD-specific region; ROC, receiver operating characteristic.

In the model with best fit (Fig. 1, top model), the use of CSF P-tau and CSF NfL, instead of plasma P-tau and plasma NfL, produced a non-significantly different AUC (0.93, 95% CI 0.90-0.96, P = 0.44).
In a new data-driven model selection using CSF instead of plasma biomarkers, CSF P-tau, CSF Aβ42/Aβ40, memory, executive function and cortical thickness were selected for the model with best fit. This model also produced a non-significantly different AUC when compared to the model with best fit using plasma biomarkers (0.94, 95% CI 0.92-0.97, P = 0.085). The more basic model with P-tau, APOE, memory and executive function had very similar accuracies (AUC = 0.91 using either CSF or plasma P-tau; P = 0.55). Finally, in univariate analyses, both CSF and plasma P-tau had an AUC of 0.83 (P = 0.95).
Validation in the ADNI cohort. The cohort included 106 participants with SCD and 437 participants with MCI, of whom 102 progressed to AD dementia at follow-up and 28 to other dementias (Table 1). The validation from BioFINDER was carried out in three steps. First, the same type of model selection was performed in ADNI to examine if similar variables were selected (with the exception that plasma P-tau181 was available instead of P-tau217 and that plasma Aβ42/Aβ40 measures were available only in a small subsample and, therefore, not included in the analysis 21,22; see Supplementary Methods for details). Second, the key models identified from the BioFINDER cohort were tested in ADNI. Third, a cross-validated model was constructed and implemented online (see next section). For the primary outcome (predicting AD dementia within 4 years), the same biomarkers as in BioFINDER were selected for the model with best fit in ADNI, with the exception that plasma NfL was not chosen in ADNI (Fig. 3 and Supplementary Table 8). Note that, even though plasma NfL was selected in BioFINDER, it was not a significant predictor (Supplementary Table 1). When testing the variables selected in the parsimonious model from BioFINDER in ADNI (plasma P-tau, MRI, APOE, memory and executive function), the accuracy (AUC = 0.91, 95% CI 0.87-0.94 in BioFINDER) was not different from the model with best fit established in ADNI (AUC = 0.91, 95% CI 0.87-0.94, P = 0.41, ΔAIC < 2) (Fig. 3 and Supplementary Table 8). The more basic model with just P-tau, APOE, memory and executive function performed very similarly in both cohorts (AUC = 0.90, 95% CI 0.86-0.94 in ADNI versus AUC = 0.91, 95% CI 0.87-0.94 in BioFINDER).
Similar accuracies were seen in ADNI compared to BioFINDER for predictions of AD dementia within 2 and 6 years, respectively (Supplementary Table 2 versus Supplementary Table 9 and Supplementary Table 3 versus Supplementary Table 10).

Cross-validation and implementation of a prediction algorithm.
As seen in the comparison between the BioFINDER and ADNI cohorts, a model consisting of plasma P-tau, memory, executive function and number of APOE ε4 alleles provided a good balance among simplicity, accuracy and generalizability for prediction of AD dementia within 4 years (AUCs 0.90-0.91; Figs. 1 and 4). We, therefore, created a new model in BioFINDER, where the estimates of the model could be directly tested in ADNI (and other cohorts). Because of the different plasma P-tau isoforms used in the cohorts, P-tau217 and P-tau181 were converted to binary variables (abnormal/normal). Unbiased plasma P-tau cutoffs were established in independent populations in BioFINDER and ADNI at the mean level + 2 s.d. in Aβ-negative controls (Methods). Cognitive domain scores for both the BioFINDER and ADNI cohorts were established based on the distribution in an independent control sample in BioFINDER without tau or Aβ pathology, adjusted for age and education (Supplementary Methods) 23,24 . The new model had an AUC of 0.89 (95% CI 0.85-0.93) in BioFINDER (P = 0.18 versus the model using continuous plasma P-tau217 data in Fig. 1). When the estimates of the model were validated in ADNI, the AUC was 0.86 (95% CI 0.81-0.90) (Fig. 4a). A secondary cross-validated model was constructed using plasma P-tau z-scores (instead of binary P-tau status) based on the distribution in Aβ-negative control participants (Methods). This model had an AUC of 0.90 (95% CI 0.87-0.94) in BioFINDER, and, when the estimates of the model were validated in ADNI, the AUC was very similar (0.89, 95% CI 0.85-0.93) (Extended Data Fig. 5). The cross-validated models were implemented online at http://predictAD.app where the individual probability of progressing to AD dementia within 4 years can be calculated for new cases (Fig. 4b).
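The steps in this section (a cutoff at the mean + 2 s.d. of Aβ-negative controls, an optional z-score and a logistic model over P-tau status, APOE ε4 count and the two cognitive z-scores) can be sketched as follows. This is an illustration under stated assumptions only: the reference concentrations, the intercept and every coefficient are hypothetical placeholders, not the estimates behind http://predictAD.app, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def ptau_cutoff(reference_values):
    """Cutoff at mean + 2 s.d. of the reference (control) distribution."""
    return mean(reference_values) + 2 * stdev(reference_values)


def z_score(value, reference_values):
    """Standardize a value against the reference distribution."""
    return (value - mean(reference_values)) / stdev(reference_values)


def predicted_risk(intercept, coefficients, predictors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        b * predictors[name] for name, b in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical control concentrations (arbitrary units), not study data.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
cutoff = ptau_cutoff(reference)

# Placeholder coefficients for illustration only (NOT the published model).
PLACEHOLDER_INTERCEPT = -3.0
PLACEHOLDER_COEFFICIENTS = {
    "ptau_abnormal": 2.0,
    "apoe_e4_alleles": 0.5,
    "memory_z": 1.0,      # z-scores inverted: higher = poorer performance
    "executive_z": 0.5,
}

patient = {
    "ptau_abnormal": 1.0 if 7.0 > cutoff else 0.0,  # binary P-tau status
    "apoe_e4_alleles": 1.0,
    "memory_z": 2.0,
    "executive_z": 1.0,
}

risk = predicted_risk(PLACEHOLDER_INTERCEPT, PLACEHOLDER_COEFFICIENTS, patient)
print(f"{risk:.2f}")
```

Dichotomizing P-tau lets one set of coefficients be applied to either P-tau217 or P-tau181, at the cost of discarding information; replacing the binary input with the output of `z_score` corresponds to the secondary model described above.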
Additional analyses. In primary care centers and in centers not specifically specialized in dementia disorders, patients without dementia are unlikely to be correctly classified as having SCD or MCI, because it requires a substantial cognitive battery 25. However, this is often done in specialized memory clinics, and, for this reason, separate analyses in patients with SCD and MCI were performed. When evaluating the predictive accuracy in patients with SCD, we found that plasma P-tau217, APOE, memory and executive function could predict development of AD dementia within 4 years with an AUC of 0.95 in BioFINDER (Supplementary Table 12). When selecting only individuals with MCI, we found that plasma P-tau217, APOE, memory and executive function could predict AD dementia within 4 years with an AUC of 0.86, which increased to 0.88 when also including MRI in BioFINDER (Supplementary Table 14). In ADNI, the same variables could predict AD dementia in MCI with an AUC of 0.90 (Supplementary Table 17). Details of these and additional analyses in SCD and MCI populations are shown in Supplementary Tables 12-19. In addition, negative and positive predictive values for the main models in the whole populations of BioFINDER and ADNI are reported in Supplementary Tables 20 and 21.
Finally, a plasma P-tau181 measure (although with a different assay and analytical platform than for the one used in ADNI) was available in a subsample in BioFINDER (n = 192 for the 4-year prediction). When comparing plasma P-tau217 to this P-tau181 measure for predicting progression to AD dementia within 4 years, no significant differences in predictive accuracies were found (Supplementary Table 22).

Discussion
In this study, we examined how plasma P-tau could be best combined with other easily accessible and cost-effective measures to predict progression to AD dementia, primarily within 4 years, in a heterogenous and consecutively included memory clinic cohort. Although plasma P-tau alone could predict AD dementia accurately within 4 years (AUC = 0.83), the most marked increase in accuracy was seen when it was combined with brief cognitive tests of memory and executive function and APOE genotype (AUC = 0.91; Fig. 1 and Supplementary Table 1). Minor further improvements were seen when also including cortical thickness and plasma NfL (Fig. 1). No significant differences were observed in accuracies when using CSF biomarkers instead of plasma biomarkers (Supplementary Tables 7 and 11). Plasma P-tau217 alone and in combination with other variables had significantly higher accuracy than the clinical diagnostic predictions of memory clinic physicians after a comprehensive baseline assessment including medical history, cognitive testing and computed tomography (CT) or MRI of the brain (Fig. 2 and Supplementary Table 4). The generalizability of these predictors was demonstrated by a similar variable selection and performance in the independent ADNI cohort (Fig. 3) as well as the performance in separate SCD and MCI populations, respectively (Supplementary Tables 12-19). In particular, the combination of plasma P-tau, memory, executive function and APOE genotype had a robust performance and high accuracy in both cohorts (AUC = 0.90-0.91 in BioFINDER and ADNI; Figs. 1 and 4). This selection of variables was used to create cross-validated models that can be used to predict the individual risk for progression to AD dementia for new cases (Fig. 4 and Extended Data Fig. 5; http://predictAD.app).
Although there has been great progress recently in validating plasma P-tau as a biomarker for AD [14][15][16][26][27][28][29][30] and for individualized prediction of cognitive decline in individuals classified as having MCI 17, this paper is, to our knowledge, the first to present how plasma P-tau can be combined with other easily available and cost-effective measures for predicting development of AD dementia in patients seeking medical care based on diverse mild cognitive symptoms. To ensure robustness of the models that included plasma P-tau and other measures, the other measures that were examined in this study were based on the literature of cognitive tests sensitive to the cognitive decline in AD 31, biomarkers that have been shown to measure the underlying disease processes in AD at different stages 11,12,16,18 and known demographic and genetic risk factors of AD and cognitive decline 34,35. Another novelty of the study is the comparison with the clinical prediction (Fig. 2), which shows the true value of implementing plasma P-tau alone or in combination with other measures to improve the diagnostic prediction in clinical practice. Using BioFINDER as the primary cohort of interest had some valuable strengths for determining optimal combinations of tests for use in clinical practice. The population consisted of consecutively recruited patients who had been referred to participating memory clinics, making the cohort heterogenous and representative of a future target population (Table 1). Nonetheless, similar results were obtained in ADNI, which consists of a selected population focused on AD. The cognitive span in both cohorts ranged from subjective to objective cognitive symptoms (that is, both SCD and MCI), which best mimics the clinical scenario where physicians would use the combination of tests.
Fig. 4 | Cross-validation and implementation of an algorithm. a, Cross-validation of the logistic regression model using plasma P-tau status instead of continuous plasma P-tau levels. Model coefficients were established in BioFINDER (AUC = 0.89) and tested in ADNI (AUC = 0.86). z-scores were inverted so that higher scores equal poorer results. A secondary cross-validated model using continuous plasma P-tau z-scores was also established in BioFINDER (AUC = 0.90) and tested in ADNI (AUC = 0.89) (Extended Data Fig. 5). BioFINDER: n = 297, of whom 70 progressed to AD dementia within the time period. ADNI: n = 376, of whom 86 progressed to AD dementia within the time period. b, Implementation of the logistic regression models at http://predictAD.app, where one can enter the raw cognitive test scores, number of APOE ε4 alleles and plasma P-tau status (from either P-tau217 or P-tau181). Alternatively, a plasma P-tau z-score can be entered for a higher predictive accuracy. Age and education are not part of the logistic regression model but are used to calculate cognitive z-scores. The example shows that the predicted individual risk of progressing to AD dementia within 4 years is 95% (error bars represent the 95% prediction interval) in a 75-year-old individual with cognitive complaints who has 12 years of education, one APOE ε4 allele and abnormal plasma P-tau status and scores eight errors on a ten-word delayed recall test (that is, remembers two out of ten words), names seven animals in 1 min and completes the Trail-Making Test B (TMT-B) in 170 s. ADAS-Cog DW recall, 10-word delayed recall test from the ADAS-Cog.

Although the commonly used division into individuals with MCI and cognitively unimpaired individuals (SCD and controls) can make sense from a research standpoint (for example, for studying disease mechanisms in AD 36), it does not easily translate into clinical practice. Depending on MCI definitions and use of cognitive tests and cutoffs, those classified as MCI will greatly vary between populations [37][38][39]. And even if a unified definition and set of cognitive tests were determined, the comprehensive cognitive test battery needed would not fit the testing routines, for example, in primary care 25. This study had limitations. First, plasma Aβ42/Aβ40 was not available in an adequately large sample size in ADNI and could not, therefore, be included in that cohort. However, in BioFINDER, where plasma Aβ42/Aβ40 was available, it was selected only for the prediction within 6 years, and removing it from the model reduced the AUC by less than 0.02. Second, the cognitive tests that were available in both cohorts were limited. This resulted in suboptimal tests for the verbal and visuospatial domains, which could have contributed to their lower accuracy (Extended Data Fig. 2). On the other hand, those representing the memory and executive domains have been extensively validated as sensitive measures of early cognitive decline in AD [31][32][33]. Third, although we did not find a significantly different accuracy between the best-performing plasma-based models (AUC = 0.92) and CSF-based models (AUC = 0.94), we cannot rule out that using a much larger sample size would have identified a significant difference. Fourth, the updated diagnostic criteria for AD are based on a framework of identifying Aβ (A), tau (T) and, depending on stage of the disease, neurodegeneration (N) 36. From this ATN scheme, we included only a biomarker for T (P-tau) in our online algorithm consisting of plasma P-tau, APOE and cognition. However, in this model, we think that the number of APOE ε4 alleles acts as a proxy for Aβ given their incrementally increased association with Aβ burden 34.
As shown when using CSF instead of plasma biomarkers, CSF Aβ42/40 (which is a better biomarker for Aβ pathology than plasma Aβ42/40), and not APOE genotype, was selected in the model with the lowest AIC (Supplementary Table 7, model 3). In addition, P-tau has also been shown to reflect Aβ pathology (that is, it is not a pure tau biomarker) 4,40, further supporting that the predictors included in the algorithm partly reflect both A and T in the ATN scheme. Regarding 'N', although it is an important marker for determining the pathophysiological stage of AD, the inclusion of cognitive tests might have overruled its importance in predicting cognitive decline, because cognition itself probably is more closely linked to progression to dementia in individuals who already experience cognitive symptoms. Fifth, a model using estimates based on absolute plasma P-tau concentrations could not be cross-validated, because different P-tau isoforms were available in BioFINDER (plasma P-tau217) and ADNI (plasma P-tau181). Instead, a cross-validated model including binary plasma P-tau (abnormal/normal), memory, executive function and number of APOE ε4 alleles was validated (Fig. 4). This model had high accuracy in the training cohort but, more importantly, also when applied in the validation cohort. Furthermore, we developed a second cross-validated model using continuous z-scores of plasma P-tau based on a reference population, which resulted in higher accuracy in the cross-validation analysis (Extended Data Fig. 5; both models are implemented at http://predictAD.app). In the near future, it is likely that several different P-tau assays will be available on the market.
The robust cross-validated results found in the present study, despite using different plasma P-tau assays in BioFINDER versus ADNI, open up the possibility to use the same algorithm for different P-tau assays with similar prognostic information, including high-performing assays for either plasma P-tau181 or P-tau217, provided that either binary or standardized continuous P-tau data are used.
As for the potential diagnostic improvements in clinical practice, our comparison with the clinical-based prediction shows a clear advantage of using plasma P-tau in combination with the other measures (increases in AUC from 0.72 to 0.89-0.92; Fig. 2). The clinical prediction consisted of the baseline assessment of memory clinic physicians, showing the potential improved value at a specialist center. In addition, the presented models showed similarly high accuracy when comparing CSF biomarkers and plasma biomarkers (Supplementary Tables 7 and 11). This suggests that the plasma P-tau models might provide a similar substitute for CSF analyses in settings where these are not accessible or too expensive. Venous puncture (for plasma analyses) is also easier for patients to undergo than lumbar puncture (for CSF analysis). In primary care, the implementation of these models is even more important because of the restricted availability of accurate diagnostic tools and the fact that only 20-50% of patients with dementia are routinely recognized and documented 41,42 . Presuming that primary care physicians make less accurate predictions of future AD dementia than memory clinic physicians, the advantage of using brief diagnostic algorithms based on plasma P-tau in primary care would be even greater. However, the present cross-validation does not warrant an accurate prediction of AD dementia in primary care, and further validation is required in large, unselected and ethnically diverse primary care populations with a lower pre-test probability of underlying AD. 
Additionally, future steps before clinical implementation of plasma P-tau-based prognostic algorithms include (1) development of clinical-grade plasma P-tau assays approved by the appropriate regulatory authorities; (2) establishment of a standardized protocol for collecting and handling of plasma; and (3) agreement on appropriate use criteria to avoid misinterpretation and misuse of these prognostic algorithms in the clinic.
Besides clinical use, a prognostic algorithm could be used for recruitment of individuals with early AD to clinical trials. The presented models might, therefore, provide substantial cost benefits compared to using CSF analysis or PET to screen for eligible participants, which could speed up recruitment and, hence, facilitate the drug development process of future disease-modifying treatments for AD 32 .

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41591-021-01348-z.

Participants.
Participants from the Swedish BioFINDER study (http://biofinder.se; NCT01208675) consisted of consecutively included non-dementia patients with mild cognitive symptoms who were referred to the participating memory clinics. Two hundred and fifty-three (81%) patients were referred from primary care units; 40 (13%) patients were referred from other specialist settings; and 20 (6%) patients were self-referrals. The inclusion criteria were as follows: (1) referred to the memory clinic at Skåne University Hospital or Ängelholm Hospital in Sweden owing to cognitive symptoms experienced by the patient and/or informant. These symptoms did not have to be memory complaints but could also be executive, visuospatial, language, praxis or psychomotor complaints; (2) age between 60 and 80 years; (3) Mini-Mental State Examination (MMSE) score of 24-30 points at the baseline visit; (4) do not fulfill the criteria for any dementia; and (5) speak and understand Swedish to the extent that an interpreter was not necessary for the patient to fully understand the study information and neuropsychological tests. The exclusion criteria were as follows: (1) significant unstable systemic illness or organ failure, such as terminal cancer, that makes it difficult to participate in the study; (2) current substantial alcohol or substance misuse; (3) refusing lumbar puncture or neuropsychological assessment; and (4) the cognitive impairment at the baseline visit could, with certainty, be explained by another condition or disease, such as normal pressure hydrocephalus, major cerebral hemorrhage, brain infection, brain tumor, multiple sclerosis, epilepsy, psychotic disorders, severe depression and ongoing medication with drugs that invariably cause cognitive impairment (such as high-dose benzodiazepines).
After inclusion, the patients were categorized as having either SCD or MCI based on an extensive neuropsychological battery performed at baseline, examining verbal, episodic memory, visuospatial ability and attention/executive domains, as previously described 25 . Patients with domain z-scores of ≤ −1.5 in at least one domain were classified as having MCI. In agreement with Diagnostic and Statistical Manual of Mental Disorders, 5th Edition, (DSM-5) criteria for mild neurocognitive disorders, all patients with composite z-scores of −1 to −1.5 were individually assessed by a senior neuropsychologist and classified as having MCI if the performance was assessed to represent a significant cognitive decline in comparison with their estimated premorbid level. All patients with SCD were required to experience cognitive symptoms (but not necessarily in the memory domain). Note that the cognitive tests used for the SCD versus MCI differentiation were different from those used as predictors in the models.
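The SCD versus MCI rule described above can be written as a small decision function. This is a sketch of one reading of the rule: it assumes that the composite z-scores referred to in the text are the per-domain scores, that lower z-scores mean poorer performance, and that the boundary value of −1.0 falls in the individually assessed band. The domain names and the label for the intermediate band are illustrative.

```python
def classify(domain_z):
    """Classify from a mapping of cognitive domain -> z-score.

    z <= -1.5 in at least one domain: MCI.
    Worst score between -1.5 and -1.0: individual assessment by a
    neuropsychologist (the outcome cannot be computed from scores alone).
    Otherwise: SCD.
    """
    worst = min(domain_z.values())
    if worst <= -1.5:
        return "MCI"
    if worst <= -1.0:
        return "individual assessment"
    return "SCD"


# Hypothetical domain scores for three patients.
print(classify({"memory": -1.6, "executive": 0.2}))
print(classify({"memory": -1.2, "executive": -0.4}))
print(classify({"memory": -0.5, "executive": 0.1}))
```

The intermediate band is deliberately not resolved in code, because the text states that those patients were classified by clinical judgment against their estimated premorbid level.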
Participants with at least one follow-up visit and a complete baseline dataset of all variables included in the logistic regression models were selected for this study. See Extended Data Fig. 1 for an enrollment flowchart. The participants were followed longitudinally with yearly follow-ups that included cognitive testing, informant-based activities of daily living (ADL) questionnaires and detailed assessments by physicians experienced in neurocognitive disorders. All patients gave their written informed consent to participate, and the study was approved by the regional ethics committee in Lund, Sweden.
The clinical diagnostic prediction. In a subgroup of patients (those included from the memory clinics in Malmö and Lund), the treating physician at the memory clinic prospectively registered the most likely underlying cause of the cognitive impairment (here called the 'clinical prediction') in the clinical research form at baseline. The clinical prediction was based on the first visit to the clinic (1.5-h-long visit with the patient and informant), informant-based cognitive symptom (CIMP-QUEST 43 ) and ADL (FAQ 44 ) questionnaires, a broad cognitive test battery (https://biofinder.se/data-biomarkers/clinical-evaluation/) and a CT or MRI scan. The physicians had no access to CSF, plasma and PET biomarker data, because these tests were performed after this initial visit.
Cognitive tests. Brief cognitive tests that were available in both BioFINDER and ADNI were selected to approximately represent different cognitive domains. Trail-Making Test B and verbal fluency (animals) were selected as measures of executive/attention performance based on their validated use in modified preclinical Alzheimer's cognitive composites, which are sensitive in tracking cognitive decline in AD 31,33 . The ten-word delayed recall test from the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) has also been validated for detecting early cognitive decline in AD 31,32 and was chosen for the memory domain. The naming objects and fingers task from the ADAS-Cog was used for verbal performance 45 , and the clock-drawing test was used for visuospatial performance 46 . Each domain was converted to a z-score based on the test score distribution in the present population. See Supplementary Methods for the z-scores used in the cross-validated model. In addition, the MMSE was used as a brief test of global cognition with specific sensitivity to the cognitive decline seen in AD 33,47 .
Plasma biomarkers. Blood samples were collected at baseline and analyzed according to a standardized protocol 12 . Plasma P-tau217 and P-tau181 concentrations in BioFINDER were measured on a Meso-Scale Discovery platform, using an assay developed by Eli Lilly, as previously described 14,16,30 . For the plasma P-tau217 assay, biotinylated-IBA493 was used as a capture antibody and SULFO-TAG-4G10-E2 as the detector (both antibodies developed by Lilly Research Laboratory). For plasma P-tau181 assays, biotinylated-AT270 was used as a capture antibody (Thermo Fisher Scientific, cat. no. MN1050) and SULFO-TAG-LRL (antibody developed by Lilly Research Laboratory) as the detector. Note that plasma P-tau181 was available only in a smaller subsample (n = 192) and that a different platform, capture antibody and detection antibody were used compared with ADNI (in addition to the differences in pre-analytical protocols), making the measured concentrations incomparable to the plasma P-tau181 concentrations measured in ADNI. Plasma Aβ42 and Aβ40 concentrations were analyzed using the Elecsys immunoassays on a cobas e601 analyzer (Roche Diagnostics), and plasma NfL was measured using Simoa, as previously described 10 .
Plasma P-tau in the cross-validated model. Cutoffs were developed to evaluate the generalizability of plasma P-tau-based models across cohorts and assays. The cutoffs were established using an out-of-sample population to further add to the robustness of the results. In our previous work in neuropathology-confirmed cases 14,16 , we observed that changes in plasma P-tau are associated with tau pathology only in the presence of Aβ pathology. This was in contrast to cases with low Aβ where very few cases had elevated plasma P-tau. Furthermore, we found that fewer cognitively normal individuals had increased plasma P-tau levels in contrast to those with MCI and AD dementia. Based on these previous findings, we used Aβ-negative cognitively normal individuals to define an independent normal population and establish cutoffs for plasma P-tau217 and P-tau181 in BioFINDER and ADNI, respectively. In BioFINDER, CSF Aβ42/Aβ40 was used to define Aβ status, and the Aβ-negative sample consisted of 215 healthy controls. In ADNI, 18F-florbetapir PET was used to define Aβ status, and the Aβ-negative sample consisted of 547 healthy controls. The cutoff for plasma P-tau abnormality was set at the mean + 2 s.d. in each sample, respectively, similarly to how plasma P-tau cutoffs have been established previously 16 . This approach, thus, resulted in unbiased, non-optimized cutoffs, which should provide higher reproducibility in other populations 48 . The cutoff was >0.387 pg ml −1 for plasma P-tau217 and >38.2 pg ml −1 for plasma P-tau181 (note that differences in P-tau isoforms, assays and platforms explain the large difference in measured concentrations).
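As a minimal sketch of the mean + 2 s.d. rule, the cutoff can be derived from a reference sample as follows. The function names and the reference values are hypothetical and are not study data. The use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def reference_cutoff(reference: Sequence[float], n_sd: float = 2.0) -> float:
    """Cutoff = mean + n_sd * s.d. of the reference sample.

    `reference` stands for concentrations in Aβ-negative, cognitively
    normal individuals. stdev() is the sample s.d. (assumption).
    """
    return mean(reference) + n_sd * stdev(reference)


def is_abnormal(value: float, cutoff: float) -> bool:
    """A concentration is abnormal when it exceeds the cutoff."""
    return value > cutoff


# Made-up reference concentrations, for illustration only.
reference_sample = [0.10, 0.20, 0.30, 0.20, 0.20]
cutoff = reference_cutoff(reference_sample)
print(round(cutoff, 3))          # 0.341
print(is_abnormal(0.40, cutoff)) # True
```

Because the cutoff depends only on the reference sample and not on the outcome, it is not tuned to the cohort in which the prediction model is evaluated.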
In a secondary cross-validated model, z-scored plasma P-tau data were used. The z-scores for plasma P-tau217 and P-tau181 were established based on the distribution in the above-described Aβ-negative samples in the following way: (P-tau concentration − mean P-tau concentration in reference sample) / s.d. of P-tau concentrations in reference sample. Such z-scores can, thus, be obtained from any clinical chemistry lab with a similar reference sample.
CSF biomarkers. CSF was collected and handled according to a structured protocol as previously described 1 . P-tau and the Aβ42/Aβ40 ratio were analyzed using the Elecsys immunoassays on a cobas e601 analyzer 49 . NfL was analyzed as previously described 5 .
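For reference, the z-score definition used for plasma P-tau in the secondary cross-validated model can be written compactly. The symbols are introduced here for illustration only: x is the measured P-tau concentration, and μ_ref and σ_ref are the mean and s.d. in the Aβ-negative reference sample.

```latex
z = \frac{x - \mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}},
\qquad
x > \mu_{\mathrm{ref}} + 2\,\sigma_{\mathrm{ref}}
\;\Longleftrightarrow\;
z > 2
```

On this scale, the mean + 2 s.d. cutoff corresponds to z > 2 in either cohort, provided the z-score and the cutoff are computed from the same reference sample.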
MRI. The MRI protocol for BioFINDER was previously described 50 . As the MRI measure, cortical thickness in temporal brain regions susceptible to atrophy in AD was used (referred to as the 'AD signature' region). The cortical thickness was quantified using FreeSurfer version 5.1 (http://surfer.nmr.mgh.harvard.edu). The AD signature region was calculated based on the cortical thickness in entorhinal, inferior temporal, middle temporal and fusiform regions, as previously described 18 . The AD signature region was chosen instead of hippocampal volume, which performed worse for predicting progression to AD dementia (data not shown).
Outcomes. The primary outcome was prediction of progression to AD dementia versus progression to any other dementia or not progressing to any dementia within 4 years. Four years was chosen to reflect a clinically relevant timeframe in which it seems reasonable for a physician to give prognostic advice to an elderly patient and also a suitable timeframe for clinical trials to detect differences in conversion to dementia. Those who converted to AD dementia within that timeframe were coded as '1' , and stable SCD/MCI and conversion to any other dementia within the timeframe were coded as '0' . Non-dementia converters with follow-ups <4 years were excluded from this analysis. Conversion to AD dementia within 2 and 6 years, respectively, were secondary outcomes with corresponding selection criteria for the examined population. The follow-up diagnosis was based on the treating physician's follow-up assessments and reviewed by a consensus group including memory clinic physicians and a senior neuropsychologist. The diagnosis was based on the DSM-5 criteria for major neurocognitive disorder due to probable AD. In addition, the patient was required to show signs of abnormal amyloid accumulation according either to CSF analysis 49 or Aβ PET 1 in agreement with the National Institute on Aging-Alzheimer's Association criteria for AD 51 . See Supplementary Methods for further details. Plasma biomarkers were analyzed after the follow-up diagnosis had been determined (that is, neither the treating physician nor the consensus group had access to them).
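The outcome coding described above can be sketched as follows. This is one reading of the rules, not the study's code: the function and argument names are hypothetical, and treating 'within 4 years' as 'at or before 4 years' is an assumption.

```python
from typing import Optional


def code_outcome(years_to_ad: Optional[float],
                 years_to_other_dementia: Optional[float],
                 follow_up_years: float,
                 horizon: float = 4.0) -> Optional[int]:
    """Return 1, 0 or None (excluded) for one participant.

    years_to_ad / years_to_other_dementia: time from baseline to the
    diagnosis, or None if that diagnosis was never made during follow-up.
    """
    if years_to_ad is not None and years_to_ad <= horizon:
        return 1  # converted to AD dementia within the horizon
    if (years_to_other_dementia is not None
            and years_to_other_dementia <= horizon):
        return 0  # converted to another dementia within the horizon
    if follow_up_years >= horizon:
        return 0  # still without dementia at the horizon
    return None   # no dementia, but follow-up shorter than the horizon


print(code_outcome(2.0, None, 2.0))   # 1
print(code_outcome(None, None, 2.0))  # None
```

The same function covers the secondary outcomes by setting `horizon` to 2 or 6. Note that a participant who converts to AD dementia after the horizon counts as a non-converter for that horizon.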
Validation cohort. The ADNI was used as the validation cohort (NCT00106899). The data used in the preparation of this article were obtained from the ADNI database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adni-info.org. ADNI was approved by the institutional review boards of all participating institutions, and written informed consent was obtained from all participants at each site (see the Reporting Summary for a list of the institutional review boards).
According to our aim, we selected only non-dementia patients with cognitive symptoms at baseline. This included participants with MCI from the MCI cohort and participants from the healthy control cohort who had significant memory concerns (here referred to as 'SCD'). Inclusion/exclusion criteria are described in detail at www.adni-info.org and in Supplementary Methods. Briefly, all patients in this study were between the ages of 55 and 91 years, had completed at least 6 years of education, were fluent in Spanish or English and were free of any significant neurologic disease other than AD. Patients with SCD had an MMSE score ≥24 and a Clinical Dementia Rating (CDR) score of 0 but expressed concerns about memory impairment on the Cognitive Change Index 52 . Patients classified as having MCI had an MMSE score ≥24, objective memory loss as shown on scores on delayed recall on the Wechsler Memory Scale Logical Memory II test, a CDR of 0.5, preserved ADL and absence of dementia. Detailed criteria for SCD and MCI are specified in the Supplementary Methods. All SCD and MCI participants with at least one follow-up visit and a complete dataset of variables included in the logistic regression models were included in this study. The variables used in the model selection were the same as in BioFINDER except for the plasma P-tau biomarker (in ADNI, P-tau181 was available instead of P-tau217).
The plasma-handling procedures were described previously 27 . Plasma P-tau181 in ADNI was measured on Simoa HD-X instruments (Quanterix) at the Clinical Neurochemistry Laboratory, University of Gothenburg. The capture antibody AT270 (MN1050, Invitrogen) was coated onto paramagnetic beads (103207, Quanterix), and the detector antibody (Tau12, 806502, BioLegend) was biotinylated; these reagents were used together with recombinant tau 441 phosphorylated in vitro by glycogen synthase kinase 3β (TO8-50FN, SignalChem) as the calibrator to build the assay, as described previously 15 . Personnel performing the plasma analysis were blinded to the clinical and biomarker information. The analysis was performed after the follow-up diagnosis had been determined. Further assay details were published previously 27 . Plasma NfL was analyzed using the same Simoa-based assay as described for the BioFINDER study. In ADNI, there were two different plasma Aβ42/Aβ40 assays, and each one was available only in very small datasets. Therefore, plasma Aβ42/Aβ40 was not included in the ADNI models. This is further described in the Supplementary Methods. CSF P-tau181 was analyzed using an Elecsys immunoassay on a cobas e601 analyzer, as previously described 49 .
The MRI measure was extracted from structural brain images acquired using 3T MRI scanners with T1-weighted scans. Cortical thickness regions were quantified using FreeSurfer version 5.1 (http://surfer.nmr.mgh.harvard.edu) and combined to the AD signature composite region, as described in the BioFINDER methods.
To create an outcome variable similar to that in BioFINDER, participants were deemed converters to AD dementia if they had a follow-up diagnosis of AD dementia 53 and were Aβ-positive according to the Aβ PET scan 54 . Cognitively stable participants and converters to other dementias or Aβ-negative AD dementia were, thus, coded as non-AD dementia converters. See Supplementary Methods for further details.
Statistical analysis. Conversion to AD dementia was used as the dependent variable in logistic regression models. All continuous independent variables were transformed to z-scores based on the distribution in the present population. APOE ε4 genotype was coded into two different variables: presence of just one ε4 allele and presence of two ε4 alleles, as per previously described differences in their risk of AD 32,34 . The initial model selection was performed using the R package MuMIn, which tests all different combinations of variables and then ranks the models according to the AIC. For more information on the AIC, see Supplementary Methods and refs. 19,20 . The model with the lowest AIC was selected as the model with the best tradeoff between fit and model complexity. The next step was then to find models with as few variables as possible that had a similar performance (defined as ΔAIC < 2 from the model with the lowest AIC 19,20,30 ). Therefore, a stepwise removal of variables was performed as long as the ΔAIC was <2 from the model with best fit to end up with the 'parsimonious model' . In addition, only variables with P < 0.10 were kept in the parsimonious model. Then, variables were removed using a stepwise procedure in subsequent models to illustrate the added value of the different variables to plasma P-tau. This process was repeated for the time points 2, 4 and 6 years. Comparisons of AUC were performed using DeLong statistics. A two-sided P value <0.05 was considered statistically significant. R version 4.0 was used for all statistical analyses.
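The AIC-based selection logic can be illustrated with a short sketch. It is not the MuMIn workflow used in the study: it is written in Python, it assumes that an AIC has already been computed for every candidate variable subset, and it does not reproduce the additional P < 0.10 filter. The function names, variable names and AIC values are hypothetical.

```python
from typing import Dict, FrozenSet

Model = FrozenSet[str]


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def parsimonious_model(aic_by_model: Dict[Model, float],
                       max_delta: float = 2.0) -> Model:
    """Backward stepwise removal starting from the lowest-AIC model.

    At each step the variable whose removal gives the lowest AIC is
    dropped, as long as that AIC stays within max_delta of the best AIC.
    """
    best = min(aic_by_model, key=aic_by_model.get)
    best_aic = aic_by_model[best]
    current = best
    while len(current) > 1:
        candidates = [current - {v} for v in current
                      if (current - {v}) in aic_by_model]
        if not candidates:
            break
        smaller = min(candidates, key=aic_by_model.get)
        if aic_by_model[smaller] - best_aic >= max_delta:
            break  # removing any further variable costs too much fit
        current = smaller
    return current


# Made-up AIC values for three generic predictors, not study results.
aics = {
    frozenset({"x1", "x2", "x3"}): 100.0,
    frozenset({"x1", "x2"}): 101.2,
    frozenset({"x1", "x3"}): 108.0,
    frozenset({"x2", "x3"}): 120.0,
    frozenset({"x1"}): 109.0,
    frozenset({"x2"}): 125.0,
    frozenset({"x3"}): 130.0,
}
print(sorted(parsimonious_model(aics)))  # ['x1', 'x2']
```

In this toy example, dropping x3 raises the AIC by only 1.2, so the two-variable model is retained, whereas any further removal exceeds the ΔAIC < 2 limit.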
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
For BioFINDER data, anonymized data will be shared by request from a qualified academic investigator for the sole purpose of replicating procedures and results presented in the article and as long as data transfer is in agreement with EU legislation on the general data protection regulation and decisions by the Ethical Review Board of Sweden and Region Skåne, which should be regulated in a material transfer agreement. ADNI data are publicly available in the LONI database (https://ida.loni.usc.edu/).

Code availability
No custom code or mathematical algorithm that was central to the conclusions was used in this study.