There is a pressing need to improve the accuracy of clinical trial endpoints in rare diseases.1 Previous research has addressed this by assessing the neutrality of clinical endpoints using the DSS as a surrogate for the Neutral list.4 However, some diseases had more than one DSS, meaning that the neutrality of any study measured against them may vary over time as a function of the body of knowledge on a disease. We expected that the neutrality of the fixed sample of clinical studies would decrease over time as the body of knowledge (operationalised here as the number of indicators) increased. The number of indicators in the surrogate Neutral list increased over time for all diseases, but the neutrality of clinical studies increased over time for one subgroup of diseases and decreased for another. This suggested that the neutrality of clinical studies changed in different ways with respect to the body of knowledge.
In the first subgroup of diseases, the mean neutrality of clinical studies was higher when measured against the composite DSS than against the first DSS, and this appeared to be driven by an increase in sensitivity. That is, as the number of indicators in the surrogate Neutral list grew over time with the body of knowledge, clinical studies included a greater proportion of those indicators. This was not merely a function of a higher number of indicators increasing the probability of a match between the composite DSS and clinical studies, because the pattern did not hold across all diseases. Rather, it suggested a convergence of knowledge, whereby DSS indicators generated through scientific research over time also appeared in the group of clinical studies assessed. We had assumed that the sample of clinical studies was static with respect to growth in the body of knowledge; however, it contained research spanning many years, so the body of knowledge that informed the composite DSS may also have informed a proportion of the clinical studies in the sample. In the study of scientific epistemology (for example, Peirce’s convergence of truth and the mathematics upon which it is based), a convergence of knowledge on a construct observed alongside an increase in sample size can be taken as a sign of the validity of the knowledge generated.13,14
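The overlap logic described above can be sketched with simple set operations. This is an illustrative reading only, not the exact formulas used in this study: the helper names, the example indicator sets, and the treatment of sensitivity as the proportion of surrogate Neutral-list (DSS) indicators that a clinical study also measures are all assumptions made for the sketch.

```python
def sensitivity(study_indicators, dss_indicators):
    """Proportion of DSS (surrogate Neutral-list) indicators that the
    clinical study also measures (one plausible operationalisation)."""
    if not dss_indicators:
        return 0.0
    return len(study_indicators & dss_indicators) / len(dss_indicators)

# Hypothetical indicator sets: the composite DSS adds indicators
# generated by later research to those of the first DSS.
first_dss = {"seizure frequency", "motor function"}
composite_dss = first_dss | {"fatigue", "pain", "cognition"}
study = {"seizure frequency", "fatigue", "pain", "cognition"}

sens_first = sensitivity(study, first_dss)      # 1/2 -> 0.5
sens_composite = sensitivity(study, composite_dss)  # 4/5 -> 0.8
```

In this toy example, sensitivity against the composite DSS exceeds sensitivity against the first DSS, mirroring the convergence pattern described for the first subgroup; a study whose indicators did not track the growing body of knowledge would instead show falling sensitivity as the denominator grew.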
The convergence of knowledge between DSSs and clinical studies in the first subgroup suggested that the disease phenotype operationalised by the indicators shared between them tended towards a more accurate representation of the theoretical Neutral list over time, producing more accurate measures of disease severity. In the second subgroup of diseases, we observed a decrease in neutrality, which we had expected under the incorrect assumption that clinical studies would be unaffected by the increasing body of knowledge. Given our findings in the first subgroup, this can be better interpreted as a divergence of knowledge. If the clinical and DSS studies were methodologically sound, then this divergence may be fertile ground for hypothesis building and further knowledge generation.15 Further research may examine qualitative differences between indicators in each disease, as our findings suggested that DSSs in the first subgroup were more likely to contain indicators that were specific, measurable and objective and that were pathophysiological as well as behavioural and psychological. As we measured clinical studies as a homogeneous group and did not separate them into two timepoints, our findings cannot establish a specific relationship between neutrality and time within the diseases studied. However, our methods were sufficient to demonstrate that changes in how the Neutral list is operationalised over time affect the accuracy of clinical trial disease measurement and that this must be accounted for during the selection of endpoints.
The mean sensitivity of clinical studies for encephalitis remained constant at zero, indicating that clinical studies included no indicators relevant to disease severity as defined by the DSS at either time point; this suggests a divergence of knowledge regarding the operationalisation of disease phenotype between clinical studies and DSSs. The Clinical Assessment Scale for Autoimmune Encephalitis is the only disease-specific DSS developed and validated for use in patients with encephalitis.16,17 The other DSSs for encephalitis included here were designed and validated to measure status epilepticus, a single intracranial complication that covers only part of the disease phenotype.18–20 This could account for the lack of overlap of indicators between clinical studies and DSSs at both timepoints. Additionally, heterogeneity of disease phenotype between and within rare disease subtypes, as is found to a great degree in encephalitis,21–23 may affect the neutrality and standardisation of disease-severity measurement in clinical trials.9
The inaccurate measurement of disease severity in clinical trials may result in patient misclassification.3,9,10 We measured the impact of neutrality, via its components sensitivity and specificity, on the probability of false negative and false positive results at different disease prevalence rates. In a clinical trial setting (20% prevalence rate), in many diseases the probability of a false positive (the classification of a patient as ‘severe’ when they are ‘not severe’) was equal to one. If these disease-severity measures were used as inclusion criteria for trials, our findings suggested a high probability of including patients outside of the target population. Additionally, in many diseases, specificity was equal to zero, meaning that all indicators observed in clinical studies were irrelevant to disease severity. The detection of a treatment effect in these cases could result in the licensing of a medicine with little clinical significance to patients. If no treatment effect was detected, trials may be abandoned and effective medicines rejected at the regulatory stage, so potentially life-changing medications may fail to reach patients; this is a recurrent problem in rare disease clinical trials and may be attributed to a lack of neutrality in endpoint selection.24,25 Further, for these diseases, outcomes of relevance to disease severity may be underrepresented in the body of research, so patients may not benefit from ongoing evidence generation regarding the problems they deal with in their day-to-day lives. We observed a similar pattern at all prevalence rates, which became more pronounced as prevalence increased. This was in line with previous findings and gave confidence in our results.4
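The dependence of misclassification probabilities on prevalence can be sketched with Bayes' rule. This is a generic illustration under assumed definitions (false-positive probability read as 1 − positive predictive value, false-negative probability as 1 − negative predictive value), not necessarily the exact computation used in this study; the function names and example values are hypothetical.

```python
def prob_false_positive(sens, spec, prevalence):
    """P(truly 'not severe' | classified 'severe'), i.e. 1 - PPV
    (an assumed reading of the false-positive probability)."""
    p_positive = sens * prevalence + (1 - spec) * (1 - prevalence)
    if p_positive == 0:
        return 0.0
    return (1 - spec) * (1 - prevalence) / p_positive

def prob_false_negative(sens, spec, prevalence):
    """P(truly 'severe' | classified 'not severe'), i.e. 1 - NPV."""
    p_negative = (1 - sens) * prevalence + spec * (1 - prevalence)
    if p_negative == 0:
        return 0.0
    return (1 - sens) * prevalence / p_negative

# With sensitivity and specificity both zero, every 'not severe'
# patient is classified 'severe', so at a 20% prevalence rate the
# false-positive probability is 1.0:
fp = prob_false_positive(sens=0.0, spec=0.0, prevalence=0.2)  # -> 1.0
```

Under this reading, when specificity is zero the false-positive probability is one at any prevalence below 100%, which is consistent with the pattern described above for measures whose indicators are irrelevant to disease severity.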
First, we assumed that the DSS was a surrogate for the Neutral list, as it was the most accurate representation of the disease phenotype available. However, the Neutral list is an empirically unattainable theoretical concept.5 This is likely to have resulted in an over-estimation of the neutrality of clinical studies relative to a ‘true’ measure of neutrality. Second, Neutral theory assumes that indicators are independent of each other; however, associations may exist between indicators to varying degrees. Finally, we did not control for the effect of the time of publication of clinical studies, which may reasonably be expected to affect the number of indicators they shared with the surrogate Neutral list to some degree (clinical studies published before composite DSSs may be less likely to contain their indicators, although this is not guaranteed, as DSSs are generated from existing bodies of knowledge shared by those who conduct trials). The variation in the year of publication of DSSs between diseases was not suggestive of a confound with respect to the effects noted in this study, and most DSSs were published between 5 and 10 years before the analysis.