Quantifying Benefit-Risk and Addressing Missing Data in Late-Stage Clinical Trials

doi:10.21203/rs.3.rs-3221975/v1


Method Article


This work is licensed under a CC BY 4.0 License

Version 1


Abstract

It is important in clinical trials to quantify the net benefit of treatment and to reliably estimate the treatment effect when some patients drop out prematurely. The current approach to assessing net benefit is subjective, and methods for handling missing data are inadequate or not easily accessible to those who are not subject matter experts. We investigate a method for quantifying benefit-risk and addressing missing data. For benefit-risk, an important set of efficacy and safety endpoints is hierarchically organized. Each pair of patients, one from each group, is compared on the first endpoint in the hierarchy to determine which patient fared better (‘wins’). If the comparison results in a win, the remaining endpoints are not compared. If it is tied, the pair is compared on the next endpoint in the hierarchy, and so on. The results of all such pairwise comparisons are combined as a win ratio: #wins/#losses. We propose that the analyses be presented sequentially and cumulatively, so that the contribution of each component to benefit-risk can be assessed. For missing data, the reason for and timing of missingness are incorporated in the pairwise comparison. For example, if both patients in a pair drop out due to adverse events, the patient who dropped out later wins. Thus, the reason for and the timing of missing data are explicitly incorporated. The Phase 3 PATENT-1 study evaluating riociguat in pulmonary arterial hypertension, and patient preferences of benefit vs risk in renal cell carcinoma, are used to illustrate the methods. The proposed methods provide a more objective way to assess benefit-risk and address missing data.

Keywords: Biostatistics; Benefit-risk; Mann-Whitney method; Missing data; Win ratio

Introduction

Two important topics relevant to late-stage clinical trials are addressed: quantifying the benefit of an intervention relative to its risk of adverse events (AEs), and handling missing data. The assessment of whether benefit outweighs risk is subjective. Consider the case of lecanemab, which was authorized under the U.S. Food and Drug Administration’s (FDA) accelerated approval pathway for the treatment of Alzheimer’s disease. The primary endpoint was change from baseline at 18 months in a measure of cognition in patients with early Alzheimer’s disease. Key secondary endpoints were related to amyloid burden and other measures of cognition and function. The results for the primary and key secondary endpoints were statistically significant. However, the rate of amyloid-related imaging abnormalities (edema/effusions or hemosiderin deposits), which were AEs of interest, was higher with lecanemab (21.5%) than with placebo (9.5%)¹. How does one decide whether the net benefit is positive? How does one weigh statistically significant results on efficacy endpoints against higher rates of certain AEs? The criteria are not pre-specified, and the decision-making process is subjective and reactive. We propose a way to quantify the benefit vs risk trade-off, which has the additional advantages of making the decision-making step transparent and prospective.

Missing data occur when patients do not complete their scheduled assessments, usually because of early withdrawal. The reasons for premature termination of patient participation are diverse and can include the experience of AEs, withdrawal of consent, or loss to follow-up. As has been noted, the ‘reliability of results from clinical trials can be substantially reduced by missing data’². The following three examples of Phase 3 trials each expose a different problem with missing data handling methods.

The first is a trial in patients with sickle cell disease with a primary endpoint of recurrent sickle cell crises. Missingness was front and center at the U.S. FDA Advisory Committee meeting. The FDA summary review document stated that “The review team was concerned with the amount of missing data and the imputation methods the Applicant used to overcome the impact of the missing efficacy data.”³ Not only was the missing data rate high, but the missingness rate was also higher in the active group (36%) than in the placebo group (24%). Further, “for the patients who discontinued the trial medication or placebo, the number of pain crises was imputed as either the mean number of crises (rounded to the nearest integer) in patients in the same trial group who completed the trial or the actual number of crises the patient had at the time of discontinuation, whichever was greater.”⁴ Because dropout was earlier and more frequent in the active group, such an imputation approach will undercount the pain crises in the active group and is not recommended.

The second is the Alzheimer’s trial referred to above. The rate of AEs leading to discontinuation of study drug was higher with lecanemab (6.9%) than with placebo (2.9%). The primary analysis was performed without imputation of missing values. This analysis will be biased in favor of lecanemab because it assumes that patients who discontinue would have responded similarly to patients who remain in the study, an assumption that clearly does not hold when discontinuations are driven by AEs that are more frequent on the active drug.

The third is a trial conducted in patients with transthyretin amyloidosis cardiomyopathy⁵. Although the primary endpoint was a hierarchical combination of all-cause mortality and frequency of cardiovascular-related hospitalizations, we draw attention to the first secondary endpoint, change from baseline to month 30 in the 6-minute walk test (6MWT). Of the patients randomized to the active and placebo groups, 41% and 60%, respectively, did not complete the month 30 6MWT assessment. The higher rate of missing month 30 6MWT data in the placebo group was in part a consequence of the treatment benefit on the primary endpoint. The imputation method should therefore give the active group ‘credit’ for its lower missingness rate. Typical imputation methods, which are conservative, risk substantially understating the treatment effect because of the higher missingness rate on placebo.

The proposed methods are illustrated using data from the Pulmonary Arterial hyperTENsion sGC stimulator Trial-1 (PATENT-1) and, for benefit-risk, are additionally described conceptually by incorporating patient preferences in renal cell carcinoma, for which trial data are unavailable⁶,⁷. PATENT-1 was a randomized, double-blind, placebo-controlled Phase 3 trial evaluating riociguat in patients with pulmonary arterial hypertension (PAH). Patients were randomized in a 4:2:1 ratio to riociguat individual dose titration (IDT), placebo, or an exploratory lower-dose riociguat arm. The predefined efficacy analyses compared the riociguat IDT and placebo groups, and we follow this convention in the analyses discussed below. A total of 254 and 126 patients were randomized to riociguat IDT and placebo, respectively. The primary outcome was change from baseline to week 12 in 6MWT. Among the secondary endpoints was time to clinical worsening, a disease-specific composite endpoint that included events such as death, hospitalization for PAH, and initiation of new PAH therapy.

Methods

Quantifying Benefit-Risk

In attempting to quantify the net benefit, one weighs the relative importance of formally assessed efficacy endpoints against safety findings. The following proposals aim to make this subjective evaluation more objective and transparent.

Hierarchically organize a limited set of important efficacy and safety endpoints. The complete list of safety endpoints to be included in the hierarchy can be based on a review of blinded data and should be finalized prior to unblinding the trial. The hierarchical arrangement depends entirely on the clinical relevance attributed to each endpoint, regardless of whether it is an efficacy or a safety endpoint. The idea is to compare all pairs of patients, one from each group, on the endpoint of highest priority⁸. If deaths are observed in the study, death will be at the top of the hierarchy. Each pairwise comparison results in one of two outcomes: (a) a win for one patient and a loss for the other, or (b) an indeterminate outcome (a tie). If the comparison is a win for one of the two patients, no further comparisons are made for that pair. If it is a tie, the same pair is compared on the next endpoint in the hierarchy, and so on. Each pairwise comparison is made over the common follow-up period, that is, up to the shorter of the two patients’ follow-up times⁹.

Let #wins and #losses denote the number of pairwise comparisons that result in a win and a loss, respectively, for patients randomized to the investigational group. The results of the pairwise comparisons are combined into a single metric, the win ratio = #wins/#losses. The win ratio has become a popular method for combining efficacy endpoints hierarchically¹⁰.
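The pairwise procedure and the win ratio described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each patient is a hypothetical dict of event times (None when no event was observed) plus a follow-up time, only time-to-event endpoints are handled, and the helper names (`time_to_event`, `hierarchical_compare`, `win_ratio`) are illustrative.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical patient record: event times (None = no event observed) and follow-up.
Patient = Dict[str, Optional[float]]
# A comparator returns +1 if the first patient wins, -1 if it loses, 0 for a tie.
Comparator = Callable[[Patient, Patient], int]


def time_to_event(event_key: str, fu_key: str = "follow_up") -> Comparator:
    """Illustrative comparator for one time-to-event endpoint, evaluated over
    the common follow-up (up to the shorter of the two follow-up times)."""
    def compare(a: Patient, b: Patient) -> int:
        horizon = min(a[fu_key], b[fu_key])
        ta, tb = a[event_key], b[event_key]
        a_event = ta is not None and ta <= horizon
        b_event = tb is not None and tb <= horizon
        if a_event and b_event:
            return (ta > tb) - (ta < tb)  # later event wins; same time is a tie
        if a_event != b_event:
            return -1 if a_event else 1   # the patient without the event wins
        return 0                          # neither had the event: tie
    return compare


def hierarchical_compare(a: Patient, b: Patient,
                         endpoints: Sequence[Comparator]) -> int:
    """Move down the hierarchy and stop at the first endpoint that is not tied."""
    for compare in endpoints:
        result = compare(a, b)
        if result != 0:
            return result
    return 0


def win_ratio(treated: Sequence[Patient], control: Sequence[Patient],
              endpoints: Sequence[Comparator]) -> Tuple[int, int, int, float]:
    """Compare every (treated, control) pair; return wins, losses, ties, ratio."""
    wins = losses = ties = 0
    for a, b in product(treated, control):
        result = hierarchical_compare(a, b, endpoints)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties, (wins / losses if losses else float("inf"))


# Hypothetical toy data (times in weeks), not taken from PATENT-1.
treated = [{"death": None, "worsening": None, "follow_up": 12},
           {"death": None, "worsening": 8, "follow_up": 12}]
control = [{"death": 6, "worsening": None, "follow_up": 6},
           {"death": None, "worsening": None, "follow_up": 12}]
endpoints = [time_to_event("death"), time_to_event("worsening")]
print(win_ratio(treated, control, endpoints))
```

With this toy data, two pairs are won by the treated patient, one is lost and one is tied, so the call prints (2, 1, 1, 2.0). Restricting each comparison to the common follow-up means that neither patient is credited or penalized for time during which the other patient was not observed.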

The proposal is to conduct the win ratio analysis sequentially: the first analysis includes only the endpoint at the top of the hierarchy, the second includes this endpoint and the next most important endpoint, and so on. The sequential analyses show the extent to which adding endpoints in a stepwise manner alters the assessment of benefit-risk. If, for example, there is excess risk with mortality as a standalone endpoint, then the analyses of the remaining endpoints may be disregarded even if every subsequent sequential analysis favors the active treatment. Whether there is excess mortality risk may have to be decided subjectively, because the event rate may be too low for a formal assessment based on p-values.
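The sequential, cumulative analyses can be sketched in the same spirit. As a simplifying assumption, the sketch takes each pair's result on every endpoint alone as already computed (+1, -1 or 0, in hierarchy order); analysis k then decides each pair on its first non-tied result among the first k endpoints. The function name and the toy results are hypothetical.

```python
from typing import List, Sequence

# Hypothetical per-endpoint results for six treated-vs-control pairs, in hierarchy
# order (e.g. death, clinical worsening, SAE): +1 win, -1 loss, 0 tie.
PAIRS = [
    [0, 1, -1],
    [0, 0, 1],
    [1, 0, 0],
    [0, -1, 0],
    [0, 0, -1],
    [0, 0, 1],
]


def sequential_win_ratios(pair_results: Sequence[Sequence[int]]) -> List[float]:
    """Win ratio of the k-th cumulative analysis for k = 1..K (illustrative).

    Analysis k uses only the first k endpoints: each pair is decided by its
    first non-tied result among them and is a tie if all k are tied.
    """
    n_endpoints = len(pair_results[0])
    ratios = []
    for k in range(1, n_endpoints + 1):
        wins = losses = 0
        for results in pair_results:
            decided = next((r for r in results[:k] if r != 0), 0)
            if decided > 0:
                wins += 1
            elif decided < 0:
                losses += 1
        ratios.append(wins / losses if losses else float("inf"))
    return ratios


print(sequential_win_ratios(PAIRS))
```

For these toy pairs the sketch prints [inf, 2.0, 2.0]: with only the first endpoint a single pair is decided, so that analysis carries little information, while adding endpoints resolves more pairs.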

Incorporating Patient Preferences in Quantifying Benefit-Risk

A different approach to quantifying benefit-risk is possible when patients’ preferences regarding benefit vs risk trade-offs are available. This is described at a conceptual level (because trial data are unavailable) for renal cell carcinoma⁷. Efficacy was defined by progression-free survival (PFS), assuming no difference in overall survival between treatments; serious adverse events (SAEs) were defined by liver failure and blood clot; and tolerability was defined by fatigue/tiredness, diarrhea, hand-foot syndrome (HFS), and mouth sores. Patients were then asked trade-off questions between efficacy, SAEs, and tolerability. The key findings were:

PFS was the most important attribute to patients when making treatment decisions.
Severe fatigue and severe diarrhea were the most troublesome tolerability effects.
Patients were willing to accept small increases in the risk of SAEs related to liver failure or blood clot in exchange for additional months of PFS.

As before, this approach first compares patients pairwise on time to death. If the comparison is indeterminate, patients are then compared on SAEs related to liver failure or blood clot, and a win is assigned to the patient with fewer SAEs. If the comparison is still indeterminate, then instead of breaking the tie using the remaining hierarchy of PFS and tolerability effects, patients are compared on a score that combines the PFS outcome with the severity of the tolerability effects they experienced. More specifically:

A tolerability score is first computed for each patient according to Table 1. The numerical values in Table 1 are assigned according to the estimated mean patient preference weights⁷. For example, the value of 10 assigned to severe fatigue/tiredness implies that, to an average patient, avoiding severe fatigue/tiredness (compared with no fatigue/tiredness) is as important as increasing PFS by 10 months. A patient’s tolerability score depends on the severity of the most troublesome episode of each tolerability effect that the patient experienced. For example, a patient who experienced severe fatigue/tiredness, severe diarrhea, moderate HFS, and moderate mouth sores has a tolerability score of 10 + 8.7 + 3.3 + 4 = 26 points. A second patient who experienced moderate fatigue/tiredness, mild diarrhea, no HFS, and moderate mouth sores has a tolerability score of 2.5 + 0 + 0 + 4 = 6.5 points. The higher the tolerability score, the less well the treatment is tolerated by the patient.
The tolerability score is then subtracted from the PFS outcome (in months) to arrive at a final combined score for the patient. A win is assigned to the patient with the higher combined score, ‘PFS (in months) minus tolerability score’; if the two combined scores are equal, a tie is assigned. For example, a patient with a PFS of 10 months and a tolerability score of 4 points has a combined score of 10 − 4 = 6 points. This patient will be tied with another patient who has a PFS of 6 months and a tolerability score of 0 (combined score 6 − 0 = 6), but will be considered to have fared better than a patient who has a PFS of 12 months and a tolerability score of 8 (combined score 12 − 8 = 4).
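The combined-score tie-breaker reduces to a small comparator. This sketch assumes each patient is summarized by a hypothetical (PFS in months, tolerability score) pair and that combined scores are compared exactly; the two calls reproduce the three patients in the example above.

```python
from typing import Tuple

# Hypothetical representation: (PFS in months, tolerability score in points).
PatientScore = Tuple[float, float]


def combined_score(patient: PatientScore) -> float:
    """'PFS (in months) minus tolerability score'."""
    pfs_months, tolerability = patient
    return pfs_months - tolerability


def compare_combined(a: PatientScore, b: PatientScore) -> int:
    """+1 if patient a wins, -1 if a loses, 0 if the combined scores are equal."""
    sa, sb = combined_score(a), combined_score(b)
    return (sa > sb) - (sa < sb)


first, second, third = (10, 4), (6, 0), (12, 8)
print(compare_combined(first, second))  # 6 vs 6: tie
print(compare_combined(first, third))   # 6 vs 4: first patient wins
```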

Table 1

Illustration of the tolerability score corresponding to renal cell carcinoma patients’ preferences for trade-offs between benefit and risk, where fatigue/tiredness, diarrhea, hand-foot syndrome (HFS), and mouth sores were identified as tolerability endpoints. The numerical values in this table are assigned according to the estimated mean patient preference weights⁷, except for severe mouth sores, because the trade-off questions in the survey did not include severe mouth sores (the severity levels included for mouth sores were mild, mild-to-moderate, and moderate).
	Fatigue/Tiredness	Diarrhea	HFS	Mouth Sores
Severe	10	8.7	8	6
Moderate	2.5	3	3.3	4
Mild or None	0	0	0	0
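The tolerability score can be computed directly from Table 1. In this sketch the effect keys (`fatigue`, `diarrhea`, `hfs`, `mouth_sores`) and severity labels are hypothetical encodings, ‘mild’ and ‘none’ both map to 0 as in the table’s last row, and the value for severe mouth sores is the table’s assigned value rather than a survey estimate. The two calls reproduce the 26-point and 6.5-point patients from the text.

```python
# Table 1 values: months of PFS judged equivalent to avoiding each effect
# (relative to none). Keys and severity labels are hypothetical encodings.
TOLERABILITY_WEIGHTS = {
    "fatigue":     {"severe": 10.0, "moderate": 2.5, "mild": 0.0, "none": 0.0},
    "diarrhea":    {"severe": 8.7,  "moderate": 3.0, "mild": 0.0, "none": 0.0},
    "hfs":         {"severe": 8.0,  "moderate": 3.3, "mild": 0.0, "none": 0.0},
    "mouth_sores": {"severe": 6.0,  "moderate": 4.0, "mild": 0.0, "none": 0.0},
}


def tolerability_score(worst_severity: dict) -> float:
    """Sum the Table 1 value for the most severe grade of each effect.

    worst_severity maps an effect to 'severe', 'moderate', 'mild' or 'none';
    effects that are not listed are treated as 'none'.
    """
    total = sum(levels[worst_severity.get(effect, "none")]
                for effect, levels in TOLERABILITY_WEIGHTS.items())
    return round(total, 1)  # strip floating-point noise from decimal weights


print(tolerability_score({"fatigue": "severe", "diarrhea": "severe",
                          "hfs": "moderate", "mouth_sores": "moderate"}))
print(tolerability_score({"fatigue": "moderate", "diarrhea": "mild",
                          "mouth_sores": "moderate"}))
```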

This gives the following hierarchical arrangement: time to death, frequency of SAEs (liver failure and blood clot), PFS-tolerability score. As before, the recommendation is to conduct three analyses sequentially:

Time to death,
Time to death, frequency of SAEs related to liver failure and blood clot,
Time to death, frequency of SAEs related to liver failure and blood clot, PFS-tolerability score.

Addressing Missing Data

The proposed method incorporates the reason for and the time of missingness in the handling of missing data, and addresses the concerns illustrated by the three examples presented earlier. It is in line with ICH E9(R1): “[A] patient who discontinues treatment due to toxicity may be considered not to have been successfully treated”¹¹. Under the composite variable strategy, the ICH E9(R1) addendum on estimands incorporates the reason for treatment discontinuation (toxicity) in assigning an outcome to the patient (not successfully treated). The idea is similar to the pairwise comparisons described for the quantification of benefit-risk, with one important exception: a comparison need not be made at the minimum of the two patients’ follow-up times. Instead, the reason why data are missing and the time of missingness are considered in determining a win, a loss, or a tie.

Within the estimand framework, this approach is flexible because different strategies for intercurrent events can be invoked for different pairwise comparisons. For example, if both patients in a pair take rescue medication, the proposed approach allows a win to be assigned to the patient who takes rescue medication later, instead of deciding the win on the outcome of the endpoint itself. On the other hand, if only one patient in a pair takes rescue medication, the win is assigned to the other patient in the pair, who has non-missing data.
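The rescue-medication rule can be sketched as a pairwise comparator. All of the following are illustrative assumptions: the function name is hypothetical, rescue times are None when no rescue medication was taken, higher endpoint values are better, and rescue started at the same time counts as a tie.

```python
from typing import Optional


def compare_with_rescue(a_rescue: Optional[float], b_rescue: Optional[float],
                        a_outcome: float, b_outcome: float) -> int:
    """+1 if patient A wins, -1 if A loses, 0 for a tie (hypothetical helper).

    a_rescue / b_rescue: time rescue medication was started, or None if never.
    a_outcome / b_outcome: endpoint values, assumed higher-is-better.
    """
    if a_rescue is not None and b_rescue is not None:
        # Both took rescue medication: the later start wins (same time: tie).
        return (a_rescue > b_rescue) - (a_rescue < b_rescue)
    if a_rescue is not None:
        return -1  # only A took rescue medication, so B wins
    if b_rescue is not None:
        return 1   # only B took rescue medication, so A wins
    # Neither took rescue medication: decide on the endpoint itself.
    return (a_outcome > b_outcome) - (a_outcome < b_outcome)
```

Note that the endpoint values are ignored whenever at least one patient took rescue medication, which is what distinguishes this composite-style handling from simply comparing observed outcomes.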

Results

Quantifying Benefit-Risk for PATENT-1

The hierarchically arranged endpoints were chosen to be time to death, time to clinical worsening excluding death, time to serious adverse events (SAEs), and change from baseline to week 12 in 6MWT (referred to simply as 6MWT), with event rates shown in Table 2. For 6MWT, a win was assigned only if one patient’s change from baseline exceeded the other patient’s by more than a cut-off, which was 10 meters (m) in one analysis and 30 m in another. This was done to avoid assigning wins based on minor differences, which seems appropriate given the severity of the endpoints higher up in the hierarchy.
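The 6MWT comparison with a cut-off can be sketched as below. The function name is hypothetical and higher changes from baseline are assumed to be better; the same 20 m difference is a win under the 10 m cut-off but a tie under the 30 m cut-off.

```python
def compare_6mwt(change_a: float, change_b: float, cutoff_m: float) -> int:
    """+1 if A wins, -1 if A loses, 0 for a tie on 6MWT change from baseline.

    Hypothetical helper: a win requires a difference strictly greater than
    cutoff_m metres; smaller differences are treated as ties.
    """
    diff = change_a - change_b
    if diff > cutoff_m:
        return 1
    if diff < -cutoff_m:
        return -1
    return 0


for cutoff in (10, 30):
    print(cutoff, compare_6mwt(45, 25, cutoff), compare_6mwt(5, 50, cutoff))
```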

Table 2

PATENT-1: Number (%) of patients with events of interest for quantifying benefit-risk and addressing missing data.
Investigator-reported reason	Riociguat IDT (n = 254)	Placebo (n = 126)	Missing-data analysis	Benefit-risk analysis
Death as dropout reason*	0	2 (1.6)	√	
Death	2 (0.8)	3 (2.4)		√
Clinical worsening excluding death	1 (0.4)	6 (4.8)		√
Serious adverse event	29 (11.4)	23 (18.3)		√
Adverse event leading to early withdrawal	8 (3.1)	6 (4.8)	√	
Other non-complete (ONC), total	9 (3.5)	5 (4.0)	√	
ONC: lost to follow-up	1 (0.4)	0		
ONC: non-compliance with study drug	1 (0.4)	0		
ONC: protocol violation	1 (0.4)	2 (1.6)		
ONC: withdrawal by subject	6 (2.4)	3 (2.4)		
Completed study with missing 6MWT at week 12	4 (1.6)	1 (0.8)	√	
*Three additional patients died, but another reason for dropout was reported before the observed death.

As shown in Table 3, the win ratio exceeded 1 in all stepwise analyses, and the final win ratio (95% confidence interval [CI]) was 1.73 (1.34, 2.27) with a 6MWT cut-off of 10 m and 1.85 (1.37, 2.53) with a 6MWT cut-off of 30 m. Because the win ratio estimate exceeds 1 in all sequential analyses and the lower 95% confidence limit exceeds 1 in the final analysis, one may conclude that the net benefit of riociguat IDT is positive. The CI was calculated by the bootstrap procedure. An easy-to-use formula for the CI is available when the numbers of wins and losses (relative to the total number of comparisons) are not small¹²; it gave CIs similar to the bootstrap CIs for rows 3–5 of Table 3.
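The text states only that the CI was obtained by bootstrap. One common variant, assumed here rather than taken from the paper, resamples patients with replacement within each arm, recomputes the win ratio, and takes percentiles of the resulting distribution. To keep the sketch short it uses a single continuous endpoint and made-up 6MWT changes; the data, replicate count, seed and function names are all hypothetical.

```python
import random
from typing import Sequence, Tuple

# Hypothetical week-12 changes in 6MWT (metres); not PATENT-1 data.
TREATED = [30, 45, 12, 60, -5, 38, 25, 50, 41, 18]
CONTROL = [5, -10, 20, 0, 15, -20, 8, 30]


def win_ratio_continuous(treated: Sequence[float], control: Sequence[float],
                         cutoff: float = 0.0) -> float:
    """Win ratio on one continuous endpoint (higher is better); differences
    not exceeding `cutoff` are ties."""
    wins = losses = 0
    for a in treated:
        for b in control:
            if a - b > cutoff:
                wins += 1
            elif b - a > cutoff:
                losses += 1
    return wins / losses if losses else float("inf")


def bootstrap_ci(treated: Sequence[float], control: Sequence[float],
                 n_boot: int = 2000, level: float = 0.95, seed: int = 2023,
                 cutoff: float = 0.0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling patients within each arm."""
    rng = random.Random(seed)
    stats = sorted(
        win_ratio_continuous(rng.choices(treated, k=len(treated)),
                             rng.choices(control, k=len(control)), cutoff)
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


point = win_ratio_continuous(TREATED, CONTROL)
lower, upper = bootstrap_ci(TREATED, CONTROL)
print(f"win ratio {point:.2f}, 95% bootstrap CI ({lower:.2f}, {upper:.2f})")
```

A resample with no losses yields an infinite win ratio, so with sparse events the upper percentile can be unbounded, much like the ∞ upper limits in the first rows of Table 3.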

Table 3

Benefit-Risk assessment of PATENT-1 via sequential and hierarchical analyses of a limited set of important efficacy and safety endpoints.
Analysis (endpoints included)	#Wins/#Losses	Win ratio	#Ties	95% CI
Death	489/114	4.29	31,401	(0, ∞)
Death, clinical worsening	1,461/231	6.32	30,312	(1.38, ∞)
Death, clinical worsening, SAE	4,517/2,948	1.53	24,539	(0.81, 2.76)
Death, clinical worsening, SAE, 6MWT (10 m cut-off)	18,702/10,797	1.73	2,505	(1.34, 2.27)
Death, clinical worsening, SAE, 6MWT (30 m cut-off)	16,058/8,673	1.85	7,273	(1.37, 2.53)

The variability of the win ratio depends on the proportion of tied outcomes, with a larger proportion resulting in a larger variance. Thus, identifying endpoints that result in fewer ties makes better use of the data collected in the trial and increases the ability to discriminate between the two groups.
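The link between ties and the win ratio can be checked from the counts in Table 3. The hypothetical helper below uses only numbers copied from the table: it verifies that wins, losses and ties add up to all 254 × 126 = 32,004 pairs and reports the win ratio and the share of tied pairs, which falls from about 98% with death alone to about 8% once 6MWT is added with a 10 m cut-off.

```python
N_TREATED, N_CONTROL = 254, 126  # riociguat IDT and placebo arms of PATENT-1

# (analysis, wins, losses, ties) copied from Table 3.
TABLE_3 = [
    ("Death", 489, 114, 31401),
    ("+ clinical worsening", 1461, 231, 30312),
    ("+ SAE", 4517, 2948, 24539),
    ("+ 6MWT (10 m cut-off)", 18702, 10797, 2505),
    ("+ 6MWT (30 m cut-off)", 16058, 8673, 7273),
]


def summarize(rows):
    """Hypothetical helper: (analysis, win ratio, share of tied pairs) per row."""
    total = N_TREATED * N_CONTROL
    out = []
    for name, wins, losses, ties in rows:
        assert wins + losses + ties == total  # every pair is classified once
        out.append((name, round(wins / losses, 2), round(ties / total, 3)))
    return out


for name, wr, tie_share in summarize(TABLE_3):
    print(f"{name:25s} WR = {wr:.2f}  ties = {tie_share:.1%}")
```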

For PATENT-1, the third hierarchical component for quantification of benefit-risk was time to SAE. Very few patients had multiple SAEs; otherwise, it would be appropriate to use the frequency of SAEs as the component endpoint instead, so that when a pair of patients with SAEs is compared, the patient with fewer SAEs wins.

It was noted earlier that the complete list of endpoints could be based on blinded data, with the list finalized just prior to unblinding. However, an outline of the hierarchy can be decided before the start of the trial: for example, whether efficacy (assuming it is not mortality) precedes or follows frequency of SAEs, and whether other anticipated non-serious AEs are worth including in the list.

Addressing Missing Data for PATENT-1

The reasons for missingness in PATENT-1 are shown in Table 2. The rules for determining the outcomes of pairwise comparisons are outlined as follows:

If neither patient in a pair has missing data, the evaluation of a win/loss or a tie is made on the observed 6MWT data. The win is assigned to the patient with a better change from baseline to week 12 in 6MWT value.
If one patient has missing data, and the other patient does not, the win is assigned to the patient with no missing data.
If both patients have missing data, the reasons for missing data are invoked to determine a win, a loss, or a tie. Missing data due to death or AE are considered the worst category. When comparing two patients who both had missing data due to one of these reasons, the patient with the later event wins. Missingness due to loss to follow-up, non-compliance with study drug, protocol violation, and withdrawal by subject are grouped into a single ‘Other non-complete’ (ONC) category. These reasons for study discontinuation were judged not to differ sufficiently from one another to determine who won. Thus, for example, a tie was assigned if one patient discontinued the study due to non-compliance with study drug and another due to protocol violation, regardless of which event occurred earlier. Similarly, a tie was assigned if the earlier missingness was due to ONC and the later was due to death or AE. However, if the earlier missingness was due to death or AE and the later missingness was due to ONC, the patient with missingness due to ONC was assigned a win.
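The rules above can be sketched as a single comparator. This is a minimal illustration, not the authors' code: the `Patient` fields, the reason labels and the choice to treat a simultaneous ONC and death/AE dropout as a tie are assumptions, and patients who completed the study with a missing week-12 6MWT are not covered because the rules do not say how they are classified.

```python
from dataclasses import dataclass
from typing import Optional

WORST = {"death", "AE"}  # dropout due to death or adverse event: worst category
ONC = "ONC"              # other non-complete: lost to follow-up, non-compliance,
                         # protocol violation, withdrawal by subject


@dataclass
class Patient:  # hypothetical record, for illustration only
    change_6mwt: Optional[float]          # week-12 change from baseline; None if missing
    missing_reason: Optional[str] = None  # "death", "AE" or "ONC" when data are missing
    missing_time: Optional[float] = None  # time of dropout, e.g. study day


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def compare(p1: Patient, p2: Patient) -> int:
    """Return +1 if p1 wins, -1 if p1 loses, 0 for a tie."""
    miss1, miss2 = p1.change_6mwt is None, p2.change_6mwt is None
    if not miss1 and not miss2:          # both observed: better change wins
        return sign(p1.change_6mwt - p2.change_6mwt)
    if miss1 != miss2:                   # only one missing: the other patient wins
        return -1 if miss1 else 1
    r1, r2 = p1.missing_reason, p2.missing_reason
    t1, t2 = p1.missing_time, p2.missing_time
    if r1 in WORST and r2 in WORST:      # both death/AE: the later event wins
        return sign(t1 - t2)
    if r1 == ONC and r2 == ONC:          # ONC vs ONC: tie regardless of timing
        return 0
    # One ONC and one death/AE dropout: the ONC patient wins only when the
    # death/AE dropout came first; otherwise (ONC earlier or same time) a tie.
    onc_time, worst_time = (t1, t2) if r1 == ONC else (t2, t1)
    if worst_time < onc_time:
        return 1 if r1 == ONC else -1
    return 0
```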

Figure 1A shows all scenarios in which Patient 1 wins (and thus Patient 2 loses), and Figure 1B shows all scenarios resulting in a tie. After applying the rules shown in Figures 1A and 1B, we calculate the win ratio = #wins/#losses.

Comparing each riociguat IDT patient with each placebo patient using the rules in Figs. 1A and 1B gives:

#wins = 19,958, #losses = 11,741, #ties = 305, WR = 1.70, and 95% CI (1.33, 2.22).

If one takes a naïve approach by assigning ties when at least one patient in a pair has missing data, then:

#wins = 16,599, #losses = 9,332, #ties = 6,073, WR = 1.78, and 95% CI (1.36, 2.37).
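As a quick arithmetic check, both sets of counts cover all 254 × 126 treated-versus-placebo pairs, and the win ratios follow directly from the counts:

```latex
\begin{aligned}
19{,}958 + 11{,}741 + 305 &= 32{,}004 = 254 \times 126,\\
16{,}599 + 9{,}332 + 6{,}073 &= 32{,}004,\\
\mathrm{WR}_{\text{rules}} &= 19{,}958 / 11{,}741 \approx 1.70,\\
\mathrm{WR}_{\text{naive}} &= 16{,}599 / 9{,}332 \approx 1.78.
\end{aligned}
```

The rule-based approach leaves about 1% of pairs tied (305/32,004), compared with about 19% (6,073/32,004) under the naïve approach.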

Although the number of ties is substantially higher with the naïve method, the resulting win ratio and CI are fairly robust. With more missing data, this would likely not be the case. For prospective trial planning, the more detailed rules are still recommended, because the amount and pattern of missing data are difficult to anticipate.

Discussion

In the literature, similar methods, with important differences, have been suggested for quantifying benefit-risk¹³,¹⁴. One of these methods includes AEs of mild and moderate severity in the hierarchical arrangement of endpoints and does not consider sequential analysis¹³. Because the number of patients with mild and moderate AEs tends to be greater than the number with severe AEs, their inclusion can dominate the win ratio. For example, if mild AEs are more frequent in the placebo group and are the reason for a favorable win ratio, that would not be convincing evidence of positive net benefit in trials of treatments for serious diseases. Moreover, sequential analysis is valuable because it reveals the contribution of each endpoint added to the hierarchy. A different method assigns scores to the efficacy and safety endpoints and thus does not lend itself to sequential analyses¹⁴.

How does the use of the win ratio for benefit-risk assessment differ from its use for efficacy evaluations? For efficacy, there is little room for overriding the result because it conflicts with personal judgement. For benefit-risk, there will need to be greater freedom in interpreting results, as some residual subjectivity cannot be avoided. Nevertheless, ordering the outcomes hierarchically in a blinded manner will assist a structured review and support a well-informed judgement. An open question is how the proposed method might be used to determine the power of the sequentially performed analyses.

For missing data, the proposal to incorporate the reason for missingness and the time at which it occurred as components of the endpoint itself addresses the concerns identified in the Introduction. It diminishes any efficacy advantage that would otherwise accrue to the arm with a higher or earlier rate of AE-related dropout. One limitation of the proposed method is that the rules for handling missing data can become complicated for other types of endpoints (e.g., recurrent events). For PATENT-1, the groups compared had a relatively low rate of missingness (9.2%), which limited a more in-depth demonstration.

It is possible that, for a given trial, there is no consensus on how to arrange the endpoints hierarchically. Analyses may then be conducted for each proposed hierarchical arrangement. If the conclusion depends materially on the chosen arrangement, that may expose a limitation not only of the method but also of the subjective assessment itself.

Declarations

Conflict of interest

The authors declared no competing interests for this work.

References

1. van Dyck CH, Swanson CJ, Bateman RJ, et al. Lecanemab in early Alzheimer’s disease. N Engl J Med. 2023;388:9-21.
2. Fleming TR. Addressing missing data in clinical trials. Ann Intern Med. 2011;154:113-7. doi:10.7326/0003-4819-154-2-201101180-00010.
3. Advisory Committee Briefing Material (quote on page 4). https://www.accessdata.fda.gov/drugsatfda_docs/nda/2017/208587Orig1s000SumR.pdf
4. Niihara Y, Miller ST, Kanter J, et al. A Phase 3 trial of L-glutamine in sickle cell disease. N Engl J Med. 2018;379:226-35.
5. Maurer MS, Schwartz JH, Gundapaneni B, et al. Tafamidis treatment for patients with transthyretin amyloid cardiomyopathy. N Engl J Med. 2018;379(11):1007-16.
6. Ghofrani HA, Galiè N, Grimminger F, Grünig E, Humbert M, Jing ZC, et al. Riociguat for the treatment of pulmonary arterial hypertension. N Engl J Med. 2013;369:330-40.
7. Mohamed AF, Hauber AB, Neary MP. Patient benefit-risk preferences for targeted agents in the treatment of renal cell carcinoma. Pharmacoeconomics. 2011;29(11):977-88.
8. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947;18:50-60.
9. Finkelstein DM, Schoenfeld DA. Combining mortality and longitudinal measures in clinical trials. Stat Med. 1999;18(11):1341-54.
10. Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J. 2012;33(2):176-82.
11. ICH E9(R1): Statistical principles for clinical trials: Addendum: Estimands and sensitivity analysis in clinical trials. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/e9r1-statistical-principles-clinical-trials-addendum-estimands-and-sensitivity-analysis-clinical
12. Yu RX, Ganju J. Sample size formula for a win ratio endpoint. Stat Med. 2022;41:950-63.
13. Ceesay T, Mt-Isa S, Heyse JF. Application of the win ratio for benefit-risk analysis. 2018 Joint Statistical Meetings. https://ww2.amstat.org/meetings/jsm/2018/onlineprogram/AbstractDetails.cfm?abstractid=329675
14. Evans SR, Follmann D. Using outcomes to analyze patients rather than patients to analyze outcomes: a step toward pragmatism in benefit:risk evaluation. Stat Biopharm Res. 2016;8(4):386-93.
