Data exploration
Baseline characteristics and outcomes
Our investigations were based on data from ADNI, NACC and ROS/MAP/MARS, including 10,390 initially cognitively unimpaired participants with a baseline visit and at least one post-baseline visit. The observed characteristics and main outcomes of the included participants are given in Table 2.
Table 2. Baseline characteristics and outcomes of initially cognitively unimpaired individuals
| Characteristic | ROS/MAP/MARS (N=1682) | NACC (N=8218) | ADNI (N=490) |
|---|---|---|---|
| Sex, n (%) | | | |
| Female | 1272 (75.6%) | 5326 (64.8%) | 253 (51.6%) |
| APOE4 status, n (%) | | | |
| Homozygote | 29 (1.7%) | 147 (1.8%) | 11 (2.2%) |
| Heterozygote | 285 (16.9%) | 1747 (21.3%) | 131 (26.7%) |
| Non-carrier | 1033 (61.4%) | 4634 (56.4%) | 348 (71.0%) |
| Unknown/not genotyped | 335 (19.9%) | 1690 (20.6%) | Not included |
| Progressed during the observation period, n (%) | | | |
| to MCI | 462 (27.5%) | 1307 (15.9%) | 80 (16.3%) |
| to dementia due to AD | 290 (17.2%) | 569 (6.9%) | 25 (5.1%) |
| to MCI or dementia due to AD | 497 (29.6%) | 1579 (19.2%) | 80 (16.3%) |
| Progressed within 8 years, n (%) | | | |
| to MCI | 360 (21.4%) | 1305 (15.9%) | 71 (14.5%) |
| to dementia due to AD | 170 (10.1%) | 568 (6.8%) | 14 (2.9%) |
| to MCI or dementia due to AD | 389 (23.1%) | 1577 (19.2%) | 73 (14.9%) |
| Age at study entry in years, mean (SD) | 76.2 (7.4) | 73.9 (8.2) | 74.3 (5.8) |
| Age category, n (%) | | | |
| Below 60 | Not included | Not included | 4 (0.8%) |
| 60 to < 65 | 56 (3.3%) | 1116 (13.6%) | 10 (2.0%) |
| 65 to < 70 | 359 (21.3%) | 1593 (19.4%) | 91 (18.6%) |
| 70 to < 75 | 358 (21.3%) | 1776 (21.6%) | 168 (34.3%) |
| 75 years of age or older | 909 (54.0%) | 3733 (45.4%) | 217 (44.3%) |
| Years of education, median (interquartile range) | 16 (13–18) | 16 (13–18) | 16 (14–18) |
| Follow-up in years, median (interquartile range) | 6 (3–10) | 4 (2–6) | 3.5 (2–5) |
AD, Alzheimer’s Disease; ADNI, Alzheimer’s Disease Neuroimaging Initiative; APOE4, apolipoprotein E ε4; MCI, mild cognitive impairment; N, total number of cognitively unimpaired individuals who had at least one post-baseline visit and did not have a diagnosis of MCI or dementia due to AD at study entry in the corresponding cohort; NACC, National Alzheimer’s Coordinating Center; SD, standard deviation
Exploration of the TTE
We explored data from ADNI, NACC and ROS/MAP/MARS to estimate the 5-year risk of a first diagnosis of MCI or dementia due to AD for the target population of Generation Study 1, i.e., 60- to 75-year-old APOE4 homozygotes. The estimation was based on data from 5108 genotyped participants aged 60–75 years, of whom 1478 (29%) were heterozygotes and 146 (3%) were homozygotes. In the pooled ROS/MAP/MARS, NACC and ADNI populations, the estimated event risk (Kaplan-Meier [KM] estimate) was 38% for APOE4 homozygotes, 23% for heterozygotes, and 16% for non-carriers. The higher risk in homozygotes agrees with previous observations of an increased risk of (an earlier) onset of symptoms of cognitive impairment in APOE4 homozygotes (19).
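To illustrate how a KM-based event risk at a fixed horizon is obtained, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. The function name and the toy follow-up data are hypothetical and are not taken from the cohorts; the sketch only shows the mechanics of handling censored observations.

```python
from typing import Sequence


def km_event_risk(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of an event by `horizon`.

    times  : follow-up time of each participant (years)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all participants who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
        # Both events and censored participants leave the risk set.
        n_at_risk -= n_leaving
    return 1.0 - survival


# Hypothetical toy data: six participants, three observed events.
toy_times = [1, 2, 3, 4, 6, 7]
toy_events = [1, 0, 1, 0, 1, 0]
risk_at_5_years = km_event_risk(toy_times, toy_events, horizon=5)
```

Censored participants contribute to the risk set until they leave, which is why the estimate differs from the naive proportion of observed events.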
Assuming a treatment effect of a 33% risk reduction (i.e., a hazard ratio [HR] of 0.67), 218 observed events are needed to reach a power of at least 80% based on the Schoenfeld formula (20). A total sample size of 650 with a randomization ratio of 3:2 for active versus control treatment was planned for Generation Study 1 to reach 80% power to demonstrate a treatment effect on the TTE endpoint alone, using a two-sided test at a type-1 error rate of 4%, when the last randomized participant reaches the 5-year follow-up time.
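The required number of events can be reproduced with a short calculation. The sketch below applies the Schoenfeld formula, d = (z_{1-α/2} + z_{power})² / (p(1-p)·(ln HR)²), under the assumption that the inputs are the ones stated in the text: HR of 0.67, two-sided type-1 error rate of 4%, 80% power, and a 3:2 allocation (p = 0.6 in the active arm). The function name is illustrative.

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hr: float, alpha_two_sided: float, power: float, frac_active: float) -> int:
    """Number of events required by the Schoenfeld formula for a log-rank test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha_two_sided / 2.0)
    z_power = std_normal.inv_cdf(power)
    # Allocation term p * (1 - p); equals 0.24 for a 3:2 randomization ratio.
    allocation = frac_active * (1.0 - frac_active)
    return ceil((z_alpha + z_power) ** 2 / (allocation * log(hr) ** 2))


# Assumed inputs: HR 0.67, two-sided alpha 4%, power 80%, 3:2 allocation.
events_needed = schoenfeld_events(hr=0.67, alpha_two_sided=0.04, power=0.80, frac_active=0.6)
print(events_needed)  # 218
```

The formula depends only on the number of events, not on the sample size; the sample size of 650 follows from the event risk expected over the follow-up period.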
Exploration of the APCC score
Cognitive performance over time was investigated using the APCC in the ROS/MAP/MARS cohorts and the APCC proxy in NACC for three categories of participants: Progressors to dementia due to AD, Progressors to MCI (who did not progress to dementia during the time of observation), and Non-progressors (to MCI, dementia, or both). The mean APCC (or its proxy) score at study entry was 64 in both ROS/MAP/MARS and NACC. Proxies for PACC and RBANS were also investigated. Observed patterns in cognitive decline, shown using standardized z-scores, were similar across the different cognitive composite measures (Fig. 1).
On average, Progressors to only MCI and Non-progressors showed a similar, linear, and nearly flat course over time; the only difference between the two groups was that Progressors to MCI (not to dementia) started at a lower cognitive level than Non-progressors. By contrast, the average cognitive decline for Progressors to dementia was not linear: it started at a low rate about 10 years before diagnosis and became steeper a few years before diagnosis, mainly during the MCI stage. The LOESS estimate for Progressors to MCI (not to dementia) may have been impacted by a differential drop-out after the diagnosis of dementia due to AD.
APCC, Alzheimer’s Prevention Initiative preclinical composite cognitive; LOESS, locally estimated scatterplot smoothing; MCI, mild cognitive impairment; NACC, National Alzheimer’s Coordinating Center; PACC, Preclinical Alzheimer Cognitive Composite; RBANS, Repeatable Battery for the Assessment of Neuropsychological Status. Trajectories are anchored at the time of diagnosis for Progressors to dementia and aligned by median age of Progressors at the time of diagnosis of dementia for Progressors to MCI and for Non-progressors to MCI/dementia.
Splitting by APOE4 status in NACC (which contains most of the APOE4 homozygote data) using the APCC, PACC and RBANS proxies showed that the shape of the curve did not seem to depend much on the genotype if anchored at the time of diagnosis, with a steep decline in cognition occurring 2–4 years prior to the manifestation of dementia (Additional File 1: Fig. S1). Plotting the APCC data versus age showed that homozygotes started to decline cognitively at a younger age than heterozygotes and non-carriers (data not shown).
Taking all these results together, we concluded that the APCC decline was not linear and was mainly driven by how close the diagnosis of dementia was. When APCC was anchored at the time of diagnosis, the impact of genotype and age was minor. In addition, in earlier stages of the disease (i.e., more than 8 years before diagnosis of dementia), Late progressors behaved very similarly to Non-progressors. We also confirmed that the PACC behaved similarly to the APCC.
Modeling results
Identification and evaluation of the TTE model
The data from ROS/MAP/MARS and NACC (total N=9900, Table 2) that were used to fit the TTE model included a total of 2076 participants who progressed to MCI or dementia due to AD (N=497 from ROS/MAP/MARS and N=1579 from NACC, Table 2).
The model included factors for event type and genotype (Fig. 2). Having the event type as a factor allowed us to estimate both TTE endpoints, i.e., time to diagnosis of MCI or dementia due to AD (whichever is first) and time to dementia due to AD, with the same model.
The candidate models (Weibull, piece-wise exponential, exponential and Gompertz) were compared visually by genotype and by AIC. Fig. 2 depicts the predicted probability of remaining event-free (i.e., no diagnosis of MCI or dementia) from the four models overlaid with the KM estimates based on the observed data by genotype, and shows that all models characterized the observed data well across genotypes. The models also included a submodel for the time to diagnosis of dementia due to AD (but not for the time to diagnosis of MCI). The AIC values were 19249 (Weibull), 19070 (piece-wise exponential), 19933 (exponential) and 19517 (Gompertz), favoring the piece-wise exponential and Weibull models over the other two candidates. Since the AIC values of these two models were relatively close, we selected the Weibull model because its higher flexibility and lower complexity compared with the piece-wise exponential model outweighed the slightly better performance of the latter.
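The AIC comparison can be laid out explicitly. The sketch below ranks the four candidates by the AIC values reported in the text and computes each model's difference from the lowest AIC; the variable names are illustrative, and the final choice of the Weibull model rested on additional considerations (flexibility and complexity) that AIC alone does not capture.

```python
from typing import Dict

# AIC values as reported in the text (lower is better).
aic: Dict[str, int] = {
    "Weibull": 19249,
    "piece-wise exponential": 19070,
    "exponential": 19933,
    "Gompertz": 19517,
}

best_aic = min(aic.values())

# Difference of each model's AIC from the best (lowest) AIC, sorted ascending.
delta_aic = {
    model: value - best_aic
    for model, value in sorted(aic.items(), key=lambda item: item[1])
}

for model, delta in delta_aic.items():
    print(f"{model}: AIC={aic[model]}, delta={delta}")
```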
AD, Alzheimer’s disease; AIC, Akaike’s information criterion; MCI, mild cognitive impairment; TTE, time to event. Kaplan-Meier curves: Confidence limits are wide for homozygotes due to the small sample size. Non-genotyped subjects were assumed to be non-carriers.
Candidate individual factors that could explain the between-subject variability of the TTE were identified as APCC at baseline, years of education, APOE genotype, sex, and age at baseline. A backward elimination approach based on AIC was performed. The final model included interactions of event type with APOE4 genotype, age at baseline, years of education, and APCC at baseline. Adding sex to the model did not improve the model fit.
Quality check of the Weibull model for the TTE
A VPC comparing the observed and simulated data showed that the Weibull TTE model fit the data with good accuracy regardless of the type of event, i.e., diagnosis of MCI or dementia due to AD or diagnosis of dementia due to AD (Fig. 3). The adequate fit was confirmed by cross-validation (a model estimated on 50% of the data used to predict observations in the other 50%) (Additional File 1: Fig. S2). It should be noted, however, that the fit was less accurate for the diagnosis of dementia in the homozygote subpopulation, though this was not unexpected considering that it is much smaller than the other genotype subpopulations.
AD, Alzheimer’s disease; CI, confidence interval; KM, Kaplan Meier; MCI, mild cognitive impairment; TTE, time to event; VPC, visual predictive check. Non-genotyped subjects were assumed to be non-carriers.
Identification and evaluation of the APCC models
The data from ROS/MAP/MARS used to fit the APCC models included 536 Progressors and 1352 Non/late progressors with available data on the APCC. Based on the data exploration, a power model was chosen to characterize the cognitive decline as measured by the APCC in the years before and after MCI or dementia diagnosis in the Progressors subpopulation of the ROS/MAP/MARS cohorts.
A linear model was adequate to characterize the time course of the APCC score in the Non/late progressors.
Candidate covariates tested for inclusion in the APCC models were APCC at baseline, APOE4 status (homozygotes, heterozygotes, and non-carriers/non-genotyped), years of education, sex, age at baseline, and age at the time of the first MCI/AD diagnosis. The covariates selected for the final APCC models for Progressors and Non/late progressors are shown in Table 3.
Table 3. Covariates selected for the APCC models
| Covariate | Progressors: impacting baseline values* | Progressors: impacting the progression rate | Non/late progressors: impacting baseline values* | Non/late progressors: impacting the progression rate |
|---|---|---|---|---|
| APCC at baseline | X | X | X | X |
| APOE4 status | | X | | X |
| Years of education | X | X | X | X |
| Age at baseline | | | X | X |
| Age at the time of event | X | | | |
APCC, Alzheimer’s Prevention Initiative preclinical composite cognitive; APOE4, apolipoprotein E ε4. *Covariates impacting the individual’s APCC value 12 years before first diagnosis. Baseline covariates centered around observed medians: APCC logit-transformed and centered around the value 62, age around 74, and years of education around 16. APOE4 status includes homozygotes, heterozygotes and non-carriers/non-genotyped.
The adequacy of the models was assessed via VPCs based on 1000 replications. The simulated data reproduced the decline and variability of the APCC score reasonably well, indicating that the models adequately represent the data (Fig. 4 for Progressors and Additional File 1: Fig. S3 for Non/late progressors), especially in the period from 10 years before to 2 years after diagnosis. Assessing the model fit beyond 2 years after diagnosis is difficult due to differential drop-out and is not within the scope of this study. In addition to the VPC, the normalized prediction distribution errors (NPDE) indicated good adequacy of the models (data not shown).
AD, Alzheimer’s disease; APCC, Alzheimer Prevention Initiative Preclinical Composite Cognitive; MCI, mild cognitive impairment; VPC, visual predictive check. Diagnosis of MCI/Dementia: Diagnosis of mild cognitive impairment or dementia due to AD
APCC, Alzheimer’s Prevention Initiative preclinical composite cognitive; APOE4, apolipoprotein E ε4. Both individual predictions and predictions for a “typical individual” take into account all model covariates; individual predictions are further adjusted by estimates of individual variability not explained by identified covariates
Clinical trial simulation
The simulation platform allowed us to generate a large set of virtual subjects defined by their demographic characteristics (see the Methods section). Clinical endpoints were simulated based on the TTE and APCC models described in the previous sections. Since the APCC score at baseline was simulated based on a model including age and years of education as factors, years of education was removed from the factors in the TTE model to avoid collinearity with the baseline APCC score.
Explorations of endpoints dynamics and dependencies via simulations
The distribution of baseline covariates such as age in the trial population has a major impact on the event rates and on the change in APCC over time. First, we investigated the impact of the parameters of interest using the underlying simulated population, based on bootstrapping (n=100 repeats).
Our main objective was to understand the dependency/variation of the event risk at Year 5 with respect to risk factors and to compare this with published results about APOE4 homozygotes (19). Another objective was to explore the factors/dynamics of the change in APCC and the resulting effect size in terms of reduction of APCC decline compared to the control group.
To optimize the enrichment strategy of Generation Study 1, we examined the impact of restricting the age distribution on the power of the trial by investigating a 1:2:2 ratio of the age groups [60–65), [65–70) and [70–75] years (mean age: 68.2 years) compared with the expected “natural” ratio of 3:2:1 (mean age: 65.4 years). Table 4 shows the results at the simulated population level (not from simulated clinical trials) for the two assumptions on the age distribution and the selected outcomes of interest. In the older population (1:2:2 ratio of age groups), the median TTE was shorter and the event risk at Year 5 was higher than in the younger population (6.5 years and 40% risk for 1:2:2 vs 7.5 years and 34% risk for 3:2:1, for the control group specified by a HR of 1). In comparison, the estimated event risk at 5 years, as determined by the KM estimates for homozygotes, was 38% based on pooled data from ROS/MAP/MARS, NACC, and ADNI (mean age: 67.8 years). Thus, if a clinical trial population includes a smaller proportion of younger individuals (no more than 20% in the age range of 60–65 years), the event rate may be higher than expected.
The derived effect sizes in terms of reduction of the change in APCC score from baseline to Year 5 were low, ranging from 0.23 for a HR of 0.60 (40% risk reduction) to 0.13 for a HR of 0.75 (25% risk reduction), even for the older population (1:2:2, Table 4).
Table 4. Endpoint characteristics for two different age distributions within the selected age range
| HR | 3:2:1: Median TTE, years | 3:2:1: Event risk at Year 5 | 3:2:1: Effect size of APCC change from BL to Year 5 | 1:2:2: Median TTE, years | 1:2:2: Event risk at Year 5 | 1:2:2: Effect size of APCC change from BL to Year 5 |
|---|---|---|---|---|---|---|
| 0.60 | 10.705 | 0.228 | 0.2209 | 9.095 | 0.269 | 0.2319 |
| 0.65 | 10.045 | 0.243 | 0.1916 | 8.830 | 0.286 | 0.2008 |
| 0.67 | 10.000 | 0.248 | 0.1827 | 8.500 | 0.293 | 0.1860 |
| 0.70 | 9.640 | 0.258 | 0.1752 | 8.500 | 0.304 | 0.1765 |
| 0.75 | 9.230 | 0.273 | 0.1364 | 8.000 | 0.321 | 0.1325 |
| 1.00 | 7.500 | 0.343 | 0.0000 | 6.500 | 0.399 | 0.0000 |
APCC, Alzheimer Prevention Initiative Preclinical Composite Cognitive; BL, baseline; HR, hazard ratio; TTE, time to event.
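As a rough cross-check of the event risks in Table 4, and not a description of the simulation itself, the Year-5 risk in the active arm can be approximated from the control-arm risk (HR of 1) under a proportional-hazards assumption, where the event-free probability in the active arm equals the control event-free probability raised to the power HR. The function name is hypothetical; the tabulated values come from simulation and differ slightly from this closed-form approximation.

```python
def approx_active_risk(control_risk: float, hr: float) -> float:
    """Event risk in the active arm under proportional hazards.

    Uses S_active(t) = S_control(t) ** HR, with S = 1 - risk.
    """
    return 1.0 - (1.0 - control_risk) ** hr


# Control-arm Year-5 risks from Table 4 (HR = 1.00 rows).
control_risk_by_age_mix = {"3:2:1": 0.343, "1:2:2": 0.399}

for age_mix, control_risk in control_risk_by_age_mix.items():
    for hr in (0.60, 0.65, 0.67, 0.70, 0.75):
        risk = approx_active_risk(control_risk, hr)
        print(f"{age_mix}  HR={hr:.2f}  approx. risk at Year 5 = {risk:.3f}")
```

For the rows in Table 4, this approximation stays within about 0.01 of the simulated event risks.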
Clinical trial simulation
We implemented a clinical trial simulation platform to sample participants of clinical trials from the simulated population under various options. The platform allows:
- Sampling of participants of clinical trials controlling for the age distribution within the age range of 60–75 years,
- Varying the recruitment pattern and duration,
- Varying drop-out patterns and probabilities,
- Selecting the total sample size, the ratio between active and control groups, and the follow-up time,
- Calculating power to demonstrate a treatment effect in at least one endpoint based on different analysis methods for the two endpoints (TTE and APCC).
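The options listed above can be thought of as a trial design configuration. The following is a minimal sketch of such a configuration, assuming hypothetical class, field and method names; it is not the platform's actual interface. The defaults mirror values stated in this section (650 participants, 3:2 randomization, 1:2:2 age distribution, 5-year follow-up, no drop-out).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def split_by_ratio(total: int, ratio: Sequence[int]) -> Tuple[int, ...]:
    """Split `total` into integer parts proportional to `ratio`."""
    parts = [total * r // sum(ratio) for r in ratio]
    # Assign any rounding remainder to the last group so the parts sum to total.
    parts[-1] += total - sum(parts)
    return tuple(parts)


@dataclass(frozen=True)
class TrialDesign:
    """Hypothetical container for the design options of one simulated trial."""
    n_total: int = 650
    allocation: Tuple[int, int] = (3, 2)       # active : control
    age_ratio: Tuple[int, int, int] = (1, 2, 2)  # [60-65) : [65-70) : [70-75]
    follow_up_years: float = 5.0
    annual_dropout_probability: float = 0.0

    def arm_sizes(self) -> Tuple[int, ...]:
        return split_by_ratio(self.n_total, self.allocation)

    def age_group_quotas(self) -> Tuple[int, ...]:
        return split_by_ratio(self.n_total, self.age_ratio)


design = TrialDesign()
print(design.arm_sizes())         # (390, 260)
print(design.age_group_quotas())  # (130, 260, 260)
```

With the 1:2:2 age distribution, the youngest age group accounts for 130 of 650 participants, i.e., 20% of the sample.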
For a fair comparison of the power of the two endpoints, we first simulated 1000 clinical trials with a fixed total sample size of 650 and a randomization ratio of 3:2 for the active versus control arms under simple assumptions: no drop-outs and a fixed follow-up duration of 5 years for each participant. The results for a family-wise type-1 error rate of 5% and different scenarios for distributing the type-1 error between the two endpoints (using a simple Bonferroni adjustment for testing the two hypotheses) are summarized for the age distribution of 1:2:2 in Table 5.
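The weighted Bonferroni split of the family-wise type-1 error rate can be sketched as follows. The function names are hypothetical, and the decision rule shown (a trial is successful if at least one endpoint is significant at its allocated level) is a simplified reading of the scenarios described above; it does not include the alpha propagation of a graphical procedure.

```python
from typing import Tuple


def split_alpha(family_wise_alpha: float, weight_tte: float) -> Tuple[float, float]:
    """Distribute the family-wise alpha between TTE and APCC (weighted Bonferroni)."""
    if not 0.0 <= weight_tte <= 1.0:
        raise ValueError("weight_tte must be between 0 and 1")
    alpha_tte = family_wise_alpha * weight_tte
    alpha_apcc = family_wise_alpha * (1.0 - weight_tte)
    return alpha_tte, alpha_apcc


def trial_success(p_tte: float, p_apcc: float,
                  family_wise_alpha: float, weight_tte: float) -> bool:
    """True if at least one endpoint is significant at its allocated level."""
    alpha_tte, alpha_apcc = split_alpha(family_wise_alpha, weight_tte)
    return p_tte < alpha_tte or p_apcc < alpha_apcc


# 80%/20% split of a 5% family-wise error rate: 4% for TTE, 1% for APCC.
alpha_tte, alpha_apcc = split_alpha(0.05, 0.80)
```

Power in a given scenario is then the proportion of simulated trials for which the success rule is met.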
Table 5. Power of clinical trial replicates (1000 simulation runs) for the 1:2:2 age distribution
Power given the distribution (%) of type-1 error rate for TTE/APCC:

| HR | 100%/0% (single primary endpoint: TTE) | 80%/20% | 50%/50% | 20%/80% | 0%/100% (single primary endpoint: APCC) |
|---|---|---|---|---|---|
| 0.60 | 0.959 | 0.952 | 0.943 | 0.914 | 0.798 |
| 0.65 | 0.877 | 0.855 | 0.822 | 0.777 | 0.653 |
| 0.67 | 0.840 | 0.819 | 0.785 | 0.735 | 0.581 |
| 0.70 | 0.752 | 0.723 | 0.693 | 0.645 | 0.525 |
| 0.75 | 0.581 | 0.556 | 0.499 | 0.439 | 0.337 |
APCC, Alzheimer Prevention Initiative Preclinical Composite Cognitive; HR, hazard ratio; TTE, time to event.
The power of the TTE endpoint alone was above 75% for HRs from 0.60 to 0.70 (two-sided log-rank test). The overall power was only slightly lower for the 80%/20% distribution of the alpha between TTE and APCC. The power for the APCC endpoint alone (two-sided t-test) was consistently lower than the power for the TTE endpoint alone. Also, the overall power was lower for the 20%/80% distribution of the family-wise type-1 error rate between TTE and APCC than for the 80%/20% distribution.
Based on the simulation results, we incorporated the 1:2:2 age distribution in the design of Generation Study 1, thereby restricting the recruitment of the lowest age group (60–65 years) to 20% of the target sample size. For the primary statistical analysis, the initial distribution of the family-wise type-1 error rate was set to 20% for testing the primary hypothesis on the APCC and to 80% for the TTE within a graphical procedure. This procedure covered the dual endpoints as well as the key secondary endpoint of Generation Study 1 (Clinical Dementia Rating Scale - Sum of Boxes [CDR-SOB]); it adjusted for testing multiple endpoints and allowed alpha to be propagated to another endpoint after rejection of the null hypothesis for one endpoint. In addition, to further increase the power of the TTE endpoint, a variable follow-up time of 5 to 8 years was planned.
Additional simulations tailored to the final design were implemented to assess the power under realistic assumptions regarding recruitment and drop-out patterns (data not shown). The overall power at the projected analysis time was acceptable (ranging from 75% to 96%) for HRs from 0.70 to 0.60, which correspond to risk reductions of 30% to 40%.