Vanishing White Matter
Vanishing White Matter (VWM, OMIM #603896) is a rare leukodystrophy caused by biallelic pathogenic variants in any of the five genes EIF2B1-5, encoding the five subunits of eukaryotic initiation factor 2B (eIF2B). The only known epidemiological data come from the Netherlands, where the incidence is approximately 1:100,000 live births and the prevalence is approximately 1.3:1,000,000 inhabitants (1). Based on studies of genomic databases, the incidence and prevalence in other countries are likely similar (2, 3), making VWM an orphan disease. VWM is characterized by chronic decline with stress-induced episodes of rapid decline, followed by death or by partial or complete recovery. Patients develop motor and cognitive disabilities and die after years of progressive handicap (1). The diagnosis is made when patients present with neurological decline or when family screening is performed because of an affected sibling. An MRI pattern of brain white matter rarefaction prompts genetic testing to confirm the diagnosis (4). The age at manifestation ranges from the antenatal period to adulthood, and earlier onset is associated with a more severe phenotype (1, 5).
Two-thirds of patients have disease onset before 6 years of age; they most often experience rapid neurological decline and have a short life expectancy (1). For patients with onset after 6 years, the disease course is more variable and often more protracted (1). Life expectancy is shortened, but these patients may survive for decades (1). Because survival is much shorter for patients with onset before 6 years than for patients with later onset, two-thirds of living patients are 18 years or older, and living patients below 6 years are scarce (3). In children with VWM, disability is dominated by motor signs (ataxia and spasticity); in adults, it is dominated by changes in behavior and cognition (1). Currently, there is no cure for VWM. Treatment consists of supportive care and avoidance of provocative stressors, such as head trauma and fever (6).
Over the past decade, increased understanding of pathophysiological mechanisms has provided insight into opportunities for therapy. A deregulated integrated stress response (ISR) is the driving pathomechanism of VWM (7). Modulation of the deregulated ISR improves the phenotype of VWM mice (7, 8). The ISR can be targeted on several levels. Numerous compounds affecting the ISR have been identified: compounds reducing endoplasmic reticulum stress (chaperones, e.g. ursodiol), modulating eIF2α phosphorylation, modulating eIF2B phosphorylation (GSK3β inhibitors e.g. trazodone and lithium), activating eIF2B (ISRIB, 2BAct and other eIF2B activators) (7, 9), targeting GADD34 (guanabenz, sephin1, salubrinal) (7, 10), inhibiting ATF4, and modulating factors downstream of ATF4 (3). This means that there are multiple drugs with strong therapeutic potential for VWM.
The year 2021 marked the first therapeutic trial in VWM, an open-label phase 2 drug repurposing trial investigating guanabenz (8, 11, 12). Several novel drugs targeting the ISR are in clinical development for neurological indications, and some compounds are currently being tested, or will soon be tested, in amyotrophic lateral sclerosis (13, 14). Such compounds are also of interest in VWM. With new treatments emerging, the VWM field faces new research and regulatory challenges.
The international VWM registry (3), with worldwide data collection over 20 years, currently comprises over 400 genetically confirmed VWM patients, of whom approximately 250 are alive. Extremely limited patient numbers, a highly heterogeneous patient population, and a complex disease course with both chronic and relapsing-remitting decline that soon leads to irreversible brain damage make drug development challenging. An additional complication is that no validated biomarkers correlating with disease progression are available. The scarcity of eligible trial candidates will worsen further if simultaneous trials compete for patients. These issues make classical randomized clinical trials (RCTs) virtually impossible in VWM.
To accelerate the development and delivery of therapy trials in VWM, the Vanishing White Matter Consortium (VWM Consortium, www.vwmconsortium.org), an academia-led collaboration of international VWM experts, was founded (3). The VWM Consortium has published on trial development in terms of trial design, definition of more homogeneous clinical subtypes, and phenotype-adapted outcome measures (3). In particular, to optimize trial efficiency given the low number of patients eligible for trials, and to minimize the number of patients on placebo given the high unmet medical need, the VWM Consortium considered that placebo control data must be shared (3). In the current paper, we develop a core protocol that functions as a template for trials in VWM, facilitating the sharing of control data while allowing flexibility in trial details.
Innovative trial design
Over the past decade, growing attention has been paid to innovative trial design. Various alternatives to traditional RCTs, such as basket, umbrella, and platform trials, are well established in oncology and, more recently, in COVID-19 (15–17). Such designs enable testing of multiple investigational medicinal products (IMPs), or of a single IMP in multiple conditions, simultaneously, in order to improve trial efficiency and enhance drug development (18, 19). Basket trials are designed to test an IMP in different conditions or disease subtypes (17, 20). Umbrella trials are used to study multiple IMPs in a single condition (17, 20). Platform trials combine features of basket and umbrella trials and can be used to investigate multiple IMPs in multiple diseases, disease stages, or disease subtypes (21). The master protocols of such innovative trials can be adaptive; for example, the arm showing the greatest benefit may be given priority in randomization (15, 22, 23).
Regulatory agencies have expressed interest in master protocols. The Food and Drug Administration (FDA) published a guideline on master trial protocols for oncological diseases (18) and is co-founder of the Clinical Trials Transformation Initiative (CTTI) (24). In the recently revised ‘Guideline on the clinical evaluation of anticancer medicinal products’, the European Medicines Agency (EMA) advises for the first time on master protocols (25). Further, the European Heads of Medicines Agencies launched a working group ‘Clinical Trial Facilitation and Coordination Group’ (26).
All trial set-ups with a master protocol presume predetermined IMPs, conditions, or biomarkers at the start of the trial (27), although amendments adding trial arms with new IMPs or new diseases are possible. A master protocol contains information on the IMPs and conditions investigated and is usually submitted and registered as a single clinical trial. Currently, guidelines for designing and evaluating master trial protocols mostly focus on cancer (18, 25). However, the oncology field differs from the field of VWM and other rare disorders in several respects: the total numbers of patients are higher, and outcome measures, such as survival and relapse-free survival, are generally more uniform. Moreover, the organization and logistics of a platform trial with multiple sponsors are challenging (18). Because all sponsors work under one shared set-up, a single master protocol increases the number of amendments needed, and already complicated trials become even more difficult to manage during execution (28).
We propose a limited core protocol as an innovative trial design for orphan drug development in VWM. Its key gain is the pooling of control data, while it allows flexibility in trial details and operational efficiency for sponsors. This study presents the design and implementation of a core protocol for clinical trials in VWM.
Rationale
The core protocol is designed as a template for phase 2/3 clinical trials in VWM, with the purpose of collecting safety, tolerability, and efficacy data for marketing authorization (including conditional approval, approval under exceptional circumstances, and accelerated approval) and for economic evaluation and benefit-risk assessment as part of health technology assessment (HTA). We combine phase 2 and phase 3 to use the low number of eligible trial candidates efficiently. The template establishes core features for separate trials, including a description of the study population, age-specific endpoints, inclusion and exclusion criteria, the schedule of assessments, a randomization plan, sample size determinations, and a statistical analysis plan. The core protocol comprises a fixed part, with which participating sponsors must comply, and a flexible, unconstrained part, to which other details, such as pharmacokinetics, pharmacodynamics, and biomarkers, can be added (Fig. 1). The sponsor is free to choose the primary outcome measure. Each trial arm is executed by a sponsor with its own Contract Research Organization (CRO). Thus, the design harmonizes clinical trial execution in VWM but does not operate as a single multi-arm, multi-stage trial such as a platform trial. The core protocol enhances efficiency in assessing new therapeutic agents in VWM, because uniformity across trials allows pooling of control-arm data and comparison of treatments.
Three age-specific protocols
Disease severity, progression, and manifestations in VWM vary widely with current age and age of onset (1), precluding a single trial protocol suitable for all patients. Defining more homogeneous subpopulations is therefore a crucial part of the study design. In alignment with the natural history study (1), we made three separate core trial protocols for: (I) adult patients (current age ≥ 18 years), who have predominantly behavioral and cognitive decline; (II) children (current age ≥ 6 to < 18 years), who have predominantly motor manifestations; and (III) young children (current age < 6 years), who have very rapid and severe neurological decline. Double-blind, randomized, placebo-controlled trials are the preferred design and are ethically acceptable for protocols I and II. However, for protocol III, because of the rapid disease worsening and early development of severe and irreparable white matter damage, an open-label design is preferable (3). Over time, placebo and historical controls can be replaced by controls receiving the first effective therapy.
Core protocols I and II are very similar since motor, behavioral, and cognitive assessments are part of both protocols. Predominant cognitive or motor decline in older and younger patients respectively may impact the choice of primary outcome measure. The only difference between core protocols I and II is in the details of the neuropsychological assessment, which depend on current age.
The choice of a current age of 6 years as the cut-off between protocols II and III is based on the natural history (1). For younger VWM patients, the natural history study demonstrated that the rate of decline is consistently faster for onset below 4 years than for later onset and correlates with the exact age of onset. For onset from 4 years on, the rate of decline is similar across ages of onset, with the exception of cognitive decline, which is faster for onset ≥ 18 years. However, some patients with onset before 4 years have a slower disease course with milder neurological decline and longer survival (1). We therefore established a criterion of mild to moderate neurological handicap at a current age ≥ 6 years to distinguish patients with fast regression from those with slower regression. Patients with onset < 4 years but slower regression can thus be eligible for protocol II, and mild to moderate neurological handicap at a current age ≥ 6 years is the main inclusion criterion for protocol II.
Based on the numbers of known available participants, the consortium recommends starting trials in patients ≥ 18 years, followed by patients aged 6 to < 18 years. The number of currently living patients aged < 6 years is extremely low, and few such patients are known to the consortium (1), which hampers trial development.
Trial eligibility
Predicting the VWM phenotype from the genotype is possible only to a limited extent (29). There are many different genotypes, and many patients have private gene variants. Certain genotypes are associated with mild or severe phenotypes (5, 30), but there is considerable intrafamilial variation in disease course (1). Consequently, when patients are diagnosed genetically but do not have neurological signs, disease onset and course are uncertain, making it impossible to assess the efficacy of a new investigational treatment. Neurologically pre-symptomatic patients are therefore not included in the trial protocols. White matter abnormalities on MRI can occur years before neurological manifestations and do not count as neurological manifestations. Likewise, ovarian failure does not count as onset of neurological disease.
Inclusion criteria are defined in terms of clinical functioning to ensure that there is potential to show disease stabilization (3). It is known from other leukodystrophies that treatment in early disease stages is crucial for good outcomes. For example, hematopoietic stem cell transplantation, as used in metachromatic leukodystrophy, X-linked adrenoleukodystrophy, and Krabbe disease, can slow or halt disease progression when applied very early in the disease (31). Previously established inclusion criteria for VWM trials comprise ambulation without or with minimal support and reasonable cognitive function (3). In X-linked adrenoleukodystrophy, perceptual IQ was found to be a better predictor of outcome than verbal IQ (32). Therefore, perceptual IQ and ambulation are used as inclusion criteria for trial protocols I and II. For the often severely handicapped patients < 6 years in protocol III, perceptual IQ is difficult to assess, and an ambulation criterion would exclude almost all patients. Instead, to facilitate comparison with historical controls, Health Utility Index (HUI) scores, as collected in the natural history study (1), are used as the inclusion criterion for protocol III.
Statistical considerations
The heterogeneous disease course and rarity of VWM pose statistical challenges. For this proposal, calculations are based on a minimum study duration of 2 years and the assumption that the IMP stops disease progression. To enhance power, we propose an open-label extension after the 2-year period that lasts until the last patient entering the trial has completed 2 trial years, with a final assessment of all patients at the end; if inclusion takes 1 year, the first patients therefore have a trial duration of 3 years. This recommendation is not included in the power calculations. If the IMP is expected to produce improvement rather than stabilization, the power calculations should be revised, and fewer patients will be needed. Each sponsor using the core protocol will need to define a separate statistical analysis plan, because the planned sample size and randomization ratio depend on the chosen primary outcome and the associated clinically relevant effect.
The relapsing-remitting disease course combined with chronic deterioration in VWM imposes additional constraints on data analysis. Episodic deteriorations introduce noise into the data, which the analysis should account for. One solution is to separate the mean trend from intra-individual variance by increasing the frequency of measurements. However, the total number of measurements must be capped to limit the burden on patients. In this core protocol, patients are assessed at least 6 times: at baseline and at 3, 6, 12, 18, and 24 months.
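The follow-up arithmetic of the proposed open-label extension can be sketched as follows, assuming that every patient remains in the trial until the last-enrolled patient completes the 2-year core period. The function `follow_up_years` and its parameters are hypothetical names used only for illustration.

```python
def follow_up_years(enrolment_time, recruitment_period, core_duration=2.0):
    """Follow-up (years) of a patient enrolled `enrolment_time` years after the
    first patient, when everyone stays in the open-label extension until the
    last-enrolled patient completes `core_duration` years (assumption)."""
    if not 0.0 <= enrolment_time <= recruitment_period:
        raise ValueError("enrolment_time must fall within the recruitment period")
    trial_end = recruitment_period + core_duration  # last patient's final visit
    return trial_end - enrolment_time

# With 1 year of inclusion, the first patient is followed for 3 years.
for start in (0.0, 0.5, 1.0):
    print(f"enrolled at {start} y: {follow_up_years(start, 1.0)} y of follow-up")
```

Early enrollees thus contribute extra assessments at no cost in sample size, which is why the extension is expected to add power even though it is left out of the calculations.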
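Why more frequent assessments help can be illustrated with a minimal sketch, assuming a linear trend per patient, independent residuals, and individual slopes estimated by ordinary least squares. Under these assumptions, the variance of the estimated individual slopes equals the true between-patient slope variance plus the residual variance divided by the spread of the assessment times. The variance components below (residual 0.018, slope 0.001) are the adult HUI cognition values reported under "Sample size and level of power"; the name `slope_sd` is hypothetical, and the episodic decline described above is not modelled.

```python
from math import sqrt

CORE_VISITS = (0, 3, 6, 12, 18, 24)  # months, as in the core protocol

def slope_sd(visit_months, residual_var, slope_var):
    """SD of per-patient least-squares slopes (score units per year)."""
    t = [m / 12 for m in visit_months]        # visit times in years
    t_bar = sum(t) / len(t)
    sxx = sum((ti - t_bar) ** 2 for ti in t)  # spread of assessment times
    # true slope variance + sampling variance of each patient's OLS slope
    return sqrt(slope_var + residual_var / sxx)

print(round(slope_sd(CORE_VISITS, 0.018, 0.001), 3))  # 0.084
print(round(slope_sd((0, 12, 24), 0.018, 0.001), 3))  # 0.1
```

For the core schedule this analytic value (about 0.084) is close to the bootstrap estimate of 0.086 reported for adults, whereas three annual visits would give 0.100, i.e. noisier slopes and larger required sample sizes.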
Sample size and level of power
Considering the very low number of available VWM patients, both overall and per age group, a classically powered RCT is not considered feasible in VWM. The conventional statistical analysis of a double-blind RCT is based on a power of 80%, two-sided testing, and an alpha of 0.05. Strategies for substantially reducing sample sizes include lowering the power to 60%, replacing two-sided with one-sided testing, and increasing alpha to 0.10. Such settings are common in phase 2b trials and are considered acceptable for ultra-rare diseases (33).
For this proposal, historical data on ambulation and HUI scores, collected in the VWM natural history cohort (1), are used for sample size calculations. Single-attribute scores, each describing one domain of function, range from 1 (best) to 0 (worst). The HUI generic score is calculated from the scores of all domains and ranges from 1 (best) to -0.5 (death) (1). To estimate the sample size from historical cohort data, we assume that HUI scores follow a normal distribution, that they are linear in the time since the first HUI assessment, and that the intercepts and slopes of the linear trends vary across patients.
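For a fixed effect size and standard deviation, the normal-approximation sample size is proportional to (z₁₋α/sided + z_power)². The sketch below, a hypothetical helper `z_sum_sq` built on Python's `statistics.NormalDist`, shows how much the relaxations listed above shrink the sample size relative to the conventional setting (80% power, two-sided, alpha 0.05).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_sum_sq(alpha, power, sided):
    """(z_{1 - alpha/sided} + z_{power})**2; for a fixed effect size and SD,
    the normal-approximation sample size is proportional to this value."""
    z_alpha = Z.inv_cdf(1 - alpha / sided)
    z_beta = Z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2

conventional = z_sum_sq(alpha=0.05, power=0.80, sided=2)
ratios = {
    "60% power, one-sided, alpha 0.05": z_sum_sq(0.05, 0.60, sided=1) / conventional,
    "60% power, one-sided, alpha 0.10": z_sum_sq(0.10, 0.60, sided=1) / conventional,
}
for setting, ratio in ratios.items():
    print(f"{setting}: {ratio:.2f} x conventional sample size")
```

Under this approximation, 60% power with one-sided testing at alpha 0.05 needs about 46% of the conventional sample size, and raising alpha to 0.10 brings it to about 30%.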
In VWM patients ≥ 18 years, cognitive decline is the most prominent symptom and is therefore used for power calculations. We chose a HUI cognition score > 0.32 as the minimum to match the inclusion criterion IQ ≥ 50(-40). Based on data from the VWM natural history study (1), the annual decline in the single-attribute HUI cognition score in patients with first assessment at age ≥ 18 years and a baseline score > 0.32 was estimated to be 0.03 points, with a residual (within-subject) variance of 0.018, a between-subject intercept variance of 0.009, a between-subject slope variance of 0.001, and a between-subject intercept-slope covariance of 0.001. The standard deviation of the sample of estimated individual slopes was obtained by parametric bootstrapping in a setting with 6 HUI assessments, where data were generated under a mixed-effects model with random intercept and slope; it was 0.086. Under the assumption that the experimental treatment stops cognitive decline, the standard 1:1 randomization design with a power of 0.8, two-sided testing, and an alpha of 0.05 requires 130 patients per arm to demonstrate a difference in mean 2-year cognitive decline relative to placebo. A total of 260 patients for one trial is not feasible and precludes multiple parallel trials. A sample size of 60 patients per arm would suffice to demonstrate a difference in mean 2-year cognitive decline relative to placebo with 60% power, assuming one-sided testing at a 5% significance level. If the significance level is increased to 10%, the number of patients per arm decreases to 40. If only 30 patients are available per arm, 60% power is achieved when the experimental treatment not only stops cognitive decline but leads to a small increase in the standardized cognition score of 0.005 points per year. A randomization ratio other than 1:1 allows a larger proportion of patients to receive the experimental treatment.
For instance, if the randomization ratio in patients with first assessment at age ≥ 18 years is set at 2:1, the experimental treatment stops cognitive decline, and the significance level is 10%, the power decreases by only 3% at the same total sample size as 40 patients per arm under 1:1 randomization (80 patients).
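The adult figures can be approximately reproduced with the standard normal-approximation formula for comparing mean slopes between two arms, n per arm = 2((z_α + z_β)·SD/Δ)², where Δ is the slope difference (0.03 per year if treatment stops decline) and SD is the individual-slope standard deviation (0.086). Because Δ and SD both double over a 2-year trial, annual values can be used directly. This is a sketch, not the authors' calculation: it returns 130 and 60 as reported but 39 instead of 40, which suggests the reported figures come from a slightly more conservative (for example t-based) method. The same approximation, assuming a fixed total of 80 patients, compares 1:1 with 2:1 allocation. `n_per_arm` and `power_two_arm` are hypothetical names.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_arm(delta, sd, alpha, power, sided):
    """Normal-approximation sample size per arm (1:1) for a difference
    `delta` in mean slopes with individual-slope SD `sd`."""
    z_alpha = Z.inv_cdf(1 - alpha / sided)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def power_two_arm(delta, sd, n_treat, n_control, alpha, sided=1):
    """Normal-approximation power for arbitrary arm sizes
    (the opposite tail is ignored)."""
    se = sd * sqrt(1 / n_treat + 1 / n_control)
    z_alpha = Z.inv_cdf(1 - alpha / sided)
    return Z.cdf(delta / se - z_alpha)

# Adult HUI cognition parameters from the text; treatment assumed to stop the
# annual decline of 0.03, so the slope difference equals 0.03.
DELTA, SLOPE_SD = 0.03, 0.086

print(n_per_arm(DELTA, SLOPE_SD, alpha=0.05, power=0.80, sided=2))  # 130
print(n_per_arm(DELTA, SLOPE_SD, alpha=0.05, power=0.60, sided=1))  # 60
print(n_per_arm(DELTA, SLOPE_SD, alpha=0.10, power=0.60, sided=1))  # 39 (text: 40)

# Fixed total of 80 patients: 1:1 versus 2:1 allocation, one-sided alpha 0.10
TOTAL = 80
p_equal = power_two_arm(DELTA, SLOPE_SD, TOTAL / 2, TOTAL / 2, alpha=0.10)
p_2to1 = power_two_arm(DELTA, SLOPE_SD, TOTAL * 2 / 3, TOTAL / 3, alpha=0.10)
print(f"1:1 power {p_equal:.3f}, 2:1 power {p_2to1:.3f}")
```

The allocation comparison gives roughly 0.61 for 1:1 and 0.58 for 2:1, a loss of about 3 to 4 percentage points, in line with the text.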
In VWM patients aged 6 to < 18 years, motor decline is most prominent and is used for power calculations. We used a HUI ambulation score > 0.16 to match the inclusion criterion of walking ≥ 10 steps without support or with light support of both hands (GMFM-88 item 67). Based on data from the VWM natural history study (1), the average annual decline in the single-attribute HUI ambulation score in patients with first assessment at age 6 to < 18 years was estimated to be 0.029 points, with a residual within-subject variance of 0.008, a between-subject intercept variance of 0.059, a between-subject slope variance of 0.001, and a between-subject intercept-slope covariance of 0.001. The parametric bootstrap estimate of the standard deviation of the individual slopes was 0.062. Under the assumption that the experimental treatment stops motor decline, the standard 1:1 randomization design with a power of 0.8, two-sided testing, and an alpha of 0.05 requires 73 patients per arm. A total of 146 patients is not feasible. A sample size of 34 patients per arm would suffice to demonstrate a difference in mean 2-year motor decline relative to placebo with 60% power, assuming one-sided testing at a 5% significance level. If the significance level is increased to 10%, the number of patients decreases to 22 per arm. If the therapy is expected to increase performance on the ambulation score by 0.01 per year, the number of patients further decreases to 13 per arm. If the randomization ratio in patients with first assessment at age 6 to < 18 years is set at 2:1, the significance level is 10%, and the experimental treatment stops motor decline, the power decreases by only 3% at the same total sample size as 22 patients per arm under 1:1 randomization (44 patients).
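The parametric bootstrap described above can be sketched as a simulation: draw each patient's intercept and slope from a bivariate normal distribution with the reported variance components, add independent residual noise at the 6 core assessments, estimate each patient's slope by least squares, and take the standard deviation of those estimates. This is one plausible reading of the procedure, not the authors' code. The children's ambulation parameters are used; the mean intercept of 0.5 is an arbitrary assumption (it does not affect slope estimates), and `simulate_slope_sd` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import stdev

def simulate_slope_sd(n_patients, mean_slope, residual_var, intercept_var,
                      slope_var, cov, visit_months=(0, 3, 6, 12, 18, 24),
                      mean_intercept=0.5, seed=1):
    """SD of per-patient least-squares slopes when data follow a linear
    mixed-effects model with random intercept and slope. HUI floor and
    ceiling effects are ignored (normality assumed)."""
    rng = random.Random(seed)
    t = [m / 12 for m in visit_months]          # visit times in years
    t_bar = sum(t) / len(t)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    # Cholesky factor of the 2x2 intercept/slope covariance matrix
    l11 = sqrt(intercept_var)
    l21 = cov / l11
    l22 = sqrt(slope_var - l21 ** 2)
    sd_resid = sqrt(residual_var)
    slopes = []
    for _ in range(n_patients):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        intercept = mean_intercept + l11 * z1
        slope = mean_slope + l21 * z1 + l22 * z2
        y = [intercept + slope * ti + rng.gauss(0.0, sd_resid) for ti in t]
        y_bar = sum(y) / len(y)
        # ordinary least-squares slope for this patient
        slopes.append(sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx)
    return stdev(slopes)

# Children (6 to < 18 years), HUI ambulation score, parameters from the text
sd_children = simulate_slope_sd(20_000, mean_slope=-0.029, residual_var=0.008,
                                intercept_var=0.059, slope_var=0.001, cov=0.001)
print(round(sd_children, 3))
```

With 20,000 simulated patients the result is about 0.061, slightly below the reported 0.062. Because per-patient least-squares slopes do not depend on the intercept, the intercept variance and the intercept-slope covariance have no effect on this quantity.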
For patients < 6 years of age, a single-arm open-label study using historical controls for comparison has been recommended previously (3). The best data available for this age group are HUI generic scores (3). We chose a HUI generic score > 0 as the minimum to exclude patients with a very low level of functioning. Based on data from the VWM natural history study (1), the average annual change in HUI generic score in patients with first assessment at age < 6 years and a baseline HUI score > 0 is estimated to be -0.054 points, with a residual within-subject variance of 0.018, a between-subject intercept variance of 0.086, a between-subject slope variance of 0.002, and a between-subject intercept-slope covariance of -0.001. The parametric bootstrap estimate of the standard deviation of the individual slopes was 0.092. We propose superiority testing, in which the null hypothesis is rejected when the average HUI score change in the experimental group is higher than a superiority margin equal to the average change in historical controls plus 1.28 times its standard error. This superiority margin equals the 90% upper confidence bound of the average change in HUI score in historical controls. The standard error of the change in HUI generic scores in the historical controls is estimated at 0.012 under a linear mixed-effects model with a random intercept and slope. Under the assumption that the experimental treatment stops further decline, a single-arm superiority design with a power of 0.8, one-sided testing, and an alpha of 0.05 requires 36 patients, which is not feasible. A sample size of 22 patients would be required to demonstrate a difference in mean 2-year HUI decline relative to historical controls with 60% power at a 5% significance level. Increasing the significance level to 10% leads to a sample size of 14 patients. The choice of the superiority margin influences the sample size considerably.
If the margin is increased to the 95% upper confidence bound of the average change in HUI score in historical controls, 28 patients are required to achieve 60% power at a 5% significance level and 19 patients at a 10% significance level. If the therapy is expected to result in an annual increase in HUI generic score of 0.01, only 10 patients with a baseline HUI score > 0 are needed to achieve 60% power, assuming one-sided testing at a 10% significance level and a superiority margin at the 90% confidence bound.
The patients eligible for protocol III overlap with those eligible for the currently ongoing guanabenz trial, which includes still-ambulant patients with disease onset < 6 years (11). The outcome measures are mostly the same. If interim analyses show efficacy of guanabenz, comparison with historical control data can be replaced by comparison with guanabenz-treated patients.
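The single-arm superiority calculation for protocol III can be sketched with a one-sample normal approximation, assuming that all quantities are on the annual scale: the margin is the historical mean change (-0.054) plus z times its standard error (0.012), and the individual-slope SD is 0.092. `single_arm_n` is a hypothetical helper. The approximation reproduces 36 and 14 but gives 1 to 2 fewer patients than the other reported figures (22, 28, 19, and 10), which suggests those were obtained with a t-based or otherwise more conservative method; the sketch therefore tends to underestimate slightly.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def single_arm_n(hist_change, hist_se, slope_sd, alpha, power,
                 margin_conf=0.90, expected_change=0.0):
    """One-sample normal-approximation size for beating a superiority margin
    set at the upper `margin_conf` confidence bound of the historical mean."""
    margin = hist_change + Z.inv_cdf(margin_conf) * hist_se
    delta = expected_change - margin          # distance above the margin
    z_alpha = Z.inv_cdf(1 - alpha)            # one-sided test
    z_beta = Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * slope_sd / delta) ** 2)

# Protocol III parameters from the text (annual HUI generic score change)
HIST = dict(hist_change=-0.054, hist_se=0.012, slope_sd=0.092)

scenarios = [  # (alpha, power, margin_conf, expected annual change)
    (0.05, 0.80, 0.90, 0.0),
    (0.05, 0.60, 0.90, 0.0),
    (0.10, 0.60, 0.90, 0.0),
    (0.05, 0.60, 0.95, 0.0),
    (0.10, 0.60, 0.95, 0.0),
    (0.10, 0.60, 0.90, 0.01),
]
sizes = [single_arm_n(**HIST, alpha=a, power=p, margin_conf=c, expected_change=e)
         for a, p, c, e in scenarios]
print(sizes)  # [36, 21, 14, 26, 17, 9]; text reports 36, 22, 14, 28, 19, 10
```

Moving the margin from the 90% to the 95% bound narrows the distance between the expected change and the margin, which is why the sample size grows considerably, as the text notes.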
Alternative approaches
An adaptive enrichment design, which allows the eligibility criteria to be updated during the trial, was considered. However, altering inclusion criteria during a trial is not compatible with the data-sharing goals of the core protocol.
A Bayesian approach, a methodology based on continuous learning in which observed data update the probabilities in the statistical model (34, 35), has been used in innovative trials to reduce the sample size needed and increase trial efficiency (16, 23, 33). However, we considered it suboptimal for the core protocol because little prior information is available on the primary outcome in the control group. When such prior information is lacking, a Bayesian approach is unlikely to increase power, and a classical hypothesis testing approach with predefined criteria for null-hypothesis testing (35) is preferable, at a type I error rate of 0.10. After multiple trials have been completed, Bayesian methods can be used to inform a new study with the outcomes of historical controls by means of a power prior. Power priors exist in static and dynamic versions; in the dynamic versions, the degree of borrowing from historical controls depends on the agreement between current and previous trial data (36, 37). In practice, the value of historical controls is limited by differences between trials in design and operationalization. Hence, the best way of sharing controls is to combine different trials running at the same time under the core protocol, whose data can be analyzed with both frequentist and Bayesian methods (38).
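A static power prior for a normally distributed control outcome can be sketched as follows, assuming a known SD and a flat initial prior. The historical control likelihood is raised to a fixed power a₀ between 0 (no borrowing) and 1 (full pooling), so the posterior for the control mean is normal with effective sample size n + a₀·n_h. The numbers in the example are purely illustrative, not VWM data, and `power_prior_posterior` is a hypothetical helper; dynamic versions, which choose a₀ from the agreement between data sets, are not shown.

```python
from math import sqrt

def power_prior_posterior(y_current, n_current, y_hist, n_hist, sigma, a0):
    """Posterior mean, SD, and effective sample size for a normal control mean
    under a static power prior (known sigma, flat initial prior)."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    n_eff = n_current + a0 * n_hist           # historical data down-weighted by a0
    mean = (n_current * y_current + a0 * n_hist * y_hist) / n_eff
    sd = sigma / sqrt(n_eff)
    return mean, sd, n_eff

# Illustrative only: 20 current and 40 historical controls (annual change)
for a0 in (0.0, 0.5, 1.0):
    mean, sd, n_eff = power_prior_posterior(
        y_current=-0.030, n_current=20, y_hist=-0.025, n_hist=40,
        sigma=0.086, a0=a0)
    print(f"a0={a0}: mean={mean:.4f}, sd={sd:.4f}, n_eff={n_eff:.0f}")
```

The choice of a₀ directly trades precision against the risk of bias from between-trial differences, which is why concurrent controls collected under a shared core protocol are preferred over purely historical ones.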
Outcome measures
Key considerations in the selection of outcome measures were that (a) the collection of outcome measures should be practical and trial burden should be minimized to reduce the risk of dropouts and missing data; (b) the set of outcome measures should cover different functional (motor, cognitive and behavioral) domains; and (c) the assessment tools should preferably be usable across broad age ranges and different levels of functioning.
Most of the clinical endpoints were previously published and were selected based on a real-time Delphi consensus procedure (Table 1) (3). However, we identified gaps requiring several additional tests. Again, the consortium used a consensus-based approach for test selection. To assess hand function, we added the 9-hole peg test (39). Because extensive neuropsychological tests, such as the Wechsler Adult Intelligence Scale (WAIS), are often not feasible in cognitively disabled patients (3), we included only WAIS subscales to assess perceptual IQ (which also serves as inclusion criterion). Adding the full WAIS is optional. The Montreal Cognitive Assessment (MoCA) was added as a brief cognitive test frequently used in adults. Because processing speed is frequently affected in leukodystrophies, the consortium voted for adding the Trail Making Test part A (40). The Clinical Global Impression (CGI) (41) and the Caregiver Global Impression (CaGI) rating scales (41), developed by the National Institute of Mental Health, were added as standardized assessment to evaluate treatment effects. Finally, the Food and Drug Administration considers the Columbia-Suicide Severity Rating Scale (C-SSRS) (42) obligatory in protocols for patients > 6 years of age (43).
Time of death is highly dependent on decisions of parents and health care providers and survival was therefore not considered a reliable endpoint. Because of the extreme variability in the stress-provoked episodes of decline, they were also not considered a useful endpoint.
Recently published work in leukodystrophies has revealed the added value of quantitative biomarkers, such as serum neurofilament light chain (44) and, for VWM specifically, quantitative MRI measures (4). In the current core protocols, the chosen MRI pulse sequences allow segmentation and quantification of normal gray and white matter, rarefied and cystic white matter, and cerebrospinal fluid (CSF) (4). Serum neurofilament light chain is part of the core protocol, but no other body fluid biomarkers have been included. Lumbar punctures are not part of the core protocol and are considered optional. We recommend that clinical trials identify non-CSF biomarkers as much as possible to reduce trial burden. To facilitate sharing of control data from different trials, adequate serum and plasma (30 ml blood for patients ≥ 6 years) should be collected at all trial visits. Part of the shared control samples should be stored at a central CRO or at a laboratory contracted by this CRO, so that subsequent trials can compare their biomarker results.
Importantly, the primary, secondary, and exploratory outcome measures are decided by the sponsor. Defining composite outcome measures is optional. It is up to the sponsor whether additional outcome measures, including patient-defined outcomes, are collected. The sponsor also defines which change in the primary outcome measure will be considered significant. Thus, the sponsors of each trial executed under this core protocol retain considerable flexibility.