Conditional power, predictive power and probability of success in clinical trials with continuous, binary and time-to-event endpoints



Introduction
The need to determine the probability of "study success" may arise at various stages of drug development. For example, the goal of such an exercise could be to make a go/no-go decision at the beginning of a prospective trial based on data from an earlier phase. It can also be used to monitor an ongoing clinical trial and answer questions such as: Should the trial continue or stop? Is the current sample size sufficient? Does the trial need any adaptation? The term "success" or "study success" is often understood as achieving a pre-specified p-value threshold (e.g., two-sided 0.05 or one-sided 0.025) at the end of the trial. In this paper, we call this "trial success" to differentiate it from "clinical success", which is defined as the observed treatment effect size exceeding some threshold value that is often clinically meaningful [1,2]. This paper focuses on the determination of the conditional power (CP), predictive power of success (PPoS) and probability of success (PoS) to assess the chance of "trial success" and "clinical success".
The CP, PPoS and PoS, along with power, are commonly used statistical tools to quantify the chance of success (either "trial success" or "clinical success"). Of these, power and PoS are calculated at the beginning of the trial, whereas CP and PPoS are determined after observing the interim results. All these measures attempt to determine pr(success) (i.e., the chance of success) based on the prior belief about the effect size at the beginning of the trial, the observed effect size in the first part of the trial, or a combination of both. At the beginning of a trial, pr(success) is determined solely from the prior belief. At an interim analysis, on the other hand, the calculation of pr(success) has to take the following into consideration: (a) the available interim results, (b) the uncertainty in the trial data, which is now limited to the post-interim part, and (c) whether or not to use prior information. Further, power and CP are frequentist tools, whereas PoS and PPoS follow the Bayesian paradigm. The two approaches differ in how the available knowledge on the effect size is summarized: Bayesian measures summarize this knowledge as a distribution of the effect, whereas frequentist measures make a best guess of the effect size as a single value; therefore, frequentist measures may not be good indicators of pr(success) [3][4][5][6]. For this reason, PoS is also viewed as the average 'power' over the prior distribution of θ [5,6], whereas PPoS is the average CP over the predictive distribution of θ. Moreover, the Bayesian approach can combine prior knowledge and the available interim results in assessing pr(success), whereas in the frequentist approach all available knowledge must be summarized into a single value.
CP is defined as pr(success) given the interim result and assuming a fixed effect size for the remainder of the trial. Halperin et al. [7] first proposed the use of CP in monitoring a long-term clinical trial. They proposed calculating two CPs for trial success given the current results: (a) assuming the fixed effect size for the remainder of the trial specified under the null hypothesis (H0), and (b) assuming the fixed effect size expected at the beginning of the trial under the alternative hypothesis (H1). Lan and Wittes [8] generalized the calculation of CP using the B-value (see Section 2). Further, Lachin [9] showed that, in a study with an interim futility analysis, any formal stopping boundary based on the CP (along with the type I and II error probabilities) can be expressed using the B-value. The use of PPoS in a clinical trial can be traced back to Choi, Smith and Becker [10], who employed the predictive distribution of a proportion under a beta prior to obtain the "desired probability" of trial success. However, it was Spiegelhalter, Freedman and Blackburn [11] who first proposed a general Bayesian framework to obtain unconditional power by averaging the conditional probability of success over the current opinion of the treatment effect based on the results observed in the first part of the trial. They termed this unconditional power the "predictive probability of rejecting H0". In the remainder of this paper, this quantity is referred to as PPoS. For a prospective trial, Spiegelhalter and Freedman [3] first suggested calculating "average power" as the "overall predictive probability of obtaining a significant result", where the so-called statistical power of the trial is averaged with respect to the prior distribution of belief about the possible effect size. To distinguish it from the PPoS based on interim data, we call it PoS. The frameworks of [10,11,3] apply Bayesian methodology to monitoring clinical trials whose analyses are carried out using conventional frequentist techniques; this framework was later referred to as 'hybrid classical-Bayesian' [12]. Lan, Hu and Proschan [13] discussed the relationship between CP and PPoS. The majority of the earlier works (e.g., [7,10,11]) considered a two-arm trial with a binary endpoint for illustration. However, these frameworks have also been illustrated for continuous endpoints (e.g., see [14]) and survival endpoints (e.g., see [15]).
Calculation of pr(success) (i.e., CP, PPoS and PoS) is often a 'non-trivial' mathematical task [3]. To make it more difficult, the literature in this area suffers from inconsistent terminology for these concepts. For example, PPoS has been referred to as 'predictive power' [1,13], 'Bayesian predictive power (BPP)' [16], 'predictive probability of statistical significance' [2] and 'probability of study success' [17]. Similarly, PoS has been referred to as 'average success probability' [5,6], 'assurance' [4] and 'expected power' [6]. Further, certain areas (e.g., trials with binary endpoints) have received more attention than the rest. For example, we could not find any literature discussing these probability measures in a single-arm trial with a time-to-event endpoint. Despite the wide popularity of these measures, the current literature still lacks a concise presentation of them under a general framework of hypothesis testing. The present work attempts to fill these gaps.
In this paper, we focus on CP, PPoS with or without a prior, and PoS. We first derive general expressions for these pr(success) measures for normally distributed test statistics with a normal prior (Section 3) under a unified framework of hypothesis testing (Section 2). Subsequently, we present the expressions for these measures in single-arm and two-arm trials with continuous, binary and time-to-event endpoints separately in Section 4. For two-arm trials, examples are presented along with a comparison between CP and PPoS and an assessment of the impact of the prior distribution on the predictive distribution of the effect size and on PPoS. Importantly, in Section 4.5 we derive the expressions for CP, PPoS and PoS in a single-arm trial with a time-to-event endpoint, a setting that, to our knowledge, has not been addressed in the literature. In that discussion, we also show that the commonly recommended approximate variance of 1/d (e.g., see [18]) consistently under-estimates the variance of log(median) (see Figure 3), and we derive an alternative expression for the variance. Expressions of PPoS for the binary endpoint with a beta prior are presented in Section 5 with an example. Implementation in R through the LongCART package [19] and an R shiny app (https://ppos.herokuapp.com/) are discussed in Section 6 and illustrated in Appendix 2.

Preliminaries and notations
Notations presented in this section are listed in Table 1. We consider the following general form of hypothesis testing in a clinical trial:
$$H_0: \theta \le 0 \quad \text{vs.} \quad H_1: \theta > 0,$$
where θ is the parameter of interest. For example, θ could be a mean or a proportion in a given population, the difference in means or proportions between two populations, or the log HR. We also assume that results are available from an interim analysis performed after the accrual of t amount of "information" (0 ≤ t ≤ 1). At any given time of analysis, the information t equals the ratio of the number of interim subjects (n, say) to the maximum number of planned subjects (N, say) for continuous and binary endpoints, or the ratio of the number of events observed at the interim analysis (d, say) to the maximum planned number of events (D, say) for time-to-event endpoints. Let θ̂(t) be the estimate of θ at the interim analysis, with standard error (SE)
$$\mathrm{SE}\{\hat\theta(t)\} = k/\sqrt{t}.$$
Note that k is the SE of the estimate at the final analysis and does not depend on t. For example, k = σ/√N in a single-arm trial with a continuous endpoint (where σ is the standard deviation (SD)), and k ≈ 2/√D in a 1:1 randomized two-arm trial with a time-to-event endpoint. With this, for both the Z-test and the log-rank test, the test statistic Z(t) can be expressed as
$$Z(t) = \frac{\sqrt{t}\,\hat\theta(t)}{k}, \qquad H_0 \text{ is rejected at information time } t \text{ if } Z(t) > c(t), \tag{2.1}$$
where c(t) is the rejection boundary. c(t) must be specified in advance such that it preserves the overall type I error rate α. In a single-look design without any interim analysis, c(1) = Φ⁻¹(1 − α), where Φ(·) denotes the cumulative distribution function of a standard normal variate. In a multiple-look design with one or more interim analyses, c(1) must be determined according to the appropriate alpha spending function (e.g., [20]).

Table 1: Notation
θ*: projected value of θ from post-interim data (excludes data up to interim)
θ0, σ0: mean and SD of a normal prior for θ
ψ: σ0²/(σ0² + k²/t)
θ_min: threshold for "clinical success"
γ: γ = c(1) for "trial success" and γ = θ_min/k for "clinical success"
Since E[Z(t)] = (θ/k)√t, the information growth in Z(t) is proportional to √t. Following Lan and Wittes [8], the B-value is defined as
$$B(t) = \sqrt{t}\,Z(t).$$
With interim data available at information time t, the uncertainty is now restricted to the post-interim data (i.e., the data contributing the remaining information 1 − t). As B(1) − B(t) is independent of B(t), we can decompose B(1) as follows:
$$B(1) = B(t) + \{B(1) - B(t)\}. \tag{2.2}$$
Based on Eq. (2.1), this translates to
$$Z(1) = \sqrt{t}\,Z(t) + \sqrt{1-t}\,Z(1-t), \tag{2.3}$$
where Z(1 − t) is the test statistic using the post-interim data only (i.e., data accrued after information t), defined as
$$Z(1-t) = \frac{\sqrt{1-t}\,\hat\theta(1-t)}{k}, \tag{2.4}$$
and θ̂(1 − t) is the estimate of θ based on post-interim data only. From Eq. (2.3), and since Z(1) = θ̂(1)/k, we have
$$\text{Trial success: } \sqrt{t}\,Z(t) + \sqrt{1-t}\,Z(1-t) > c(1); \qquad \text{Clinical success: } \sqrt{t}\,Z(t) + \sqrt{1-t}\,Z(1-t) > \theta_{\min}/k. \tag{2.5}$$
Note the similarity between the "trial success" and "clinical success" criteria: replacing c(1) with θ_min/k in the trial success criterion gives the clinical success criterion. Therefore, we define the general criterion of "success" as
$$\sqrt{t}\,Z(t) + \sqrt{1-t}\,Z(1-t) > \gamma, \tag{2.6}$$
where γ = c(1) for "trial success" and γ = θ_min/k for "clinical success".
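The relations in Eqs. (2.1) to (2.6) can be checked numerically. The Python sketch below is illustrative only (it is not the LongCART implementation, and the function names are hypothetical). It computes Z(t), assembles Z(1) from the interim and post-interim pieces, confirms that the result equals the information-weighted final estimate divided by k, and evaluates γ for both success criteria.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf is Phi, .inv_cdf is Phi^{-1}


def interim_z(theta_hat_t: float, k: float, t: float) -> float:
    """Z(t) = sqrt(t) * theta_hat(t) / k, since SE{theta_hat(t)} = k / sqrt(t)."""
    return math.sqrt(t) * theta_hat_t / k


def final_z(theta_hat_t: float, theta_hat_rest: float, k: float, t: float) -> float:
    """Z(1) = sqrt(t) Z(t) + sqrt(1 - t) Z(1 - t)."""
    z_rest = math.sqrt(1 - t) * theta_hat_rest / k  # Z(1 - t) uses post-interim data only
    return math.sqrt(t) * interim_z(theta_hat_t, k, t) + math.sqrt(1 - t) * z_rest


def gamma_trial(alpha: float) -> float:
    """gamma = c(1) = Phi^{-1}(1 - alpha) for a single-look design."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def gamma_clinical(theta_min: float, k: float) -> float:
    """gamma = theta_min / k, so that Z(1) > gamma  <=>  theta_hat(1) > theta_min."""
    return theta_min / k


# Illustrative values: t = 0.6, theta_hat(t) = 0.2, theta_hat(1 - t) = 0.3, k = 0.1
z1 = final_z(0.2, 0.3, k=0.1, t=0.6)
pooled = (0.6 * 0.2 + 0.4 * 0.3) / 0.1   # theta_hat(1) / k with information weights
print(round(z1, 6), round(pooled, 6))     # both equal 2.4
print(round(gamma_trial(0.025), 3))       # 1.96
# Example 3 setting: k = 2/sqrt(441), clinically meaningful HR 0.80
print(round(gamma_clinical(-math.log(0.80), 2 / math.sqrt(441)), 3))  # 2.343
```

The last line gives 2.343 rather than 2.344 because it uses the unrounded k = 2/21 instead of 0.0952.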
Conditional power, predictive power, probability of success
In this section, we first explain the concepts of CP, PPoS and PoS. Subsequently, generic expressions for these measures are derived using the normal approximation. Calculation of PPoS using the beta-binomial distribution is presented in Section 5.

Conditional power (CP) based on interim results
CP is the probability that the final study result will be statistically significant (or clinically successful), given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as the original design effect, the effect estimated from the current data, or the effect under the null hypothesis [9]. Projecting the value of θ in the post-interim period to be θ*, the estimate of θ from the post-interim data would be distributed as
$$\hat\theta(1-t) \sim N\!\left(\theta^*,\ k^2/(1-t)\right),$$
so that, from Eq. (2.6),
$$\mathrm{CP} = \Phi\!\left(\frac{\sqrt{t}\,Z(t) + (1-t)\,\theta^*/k - \gamma}{\sqrt{1-t}}\right). \tag{3.1}$$
It is intuitive and common to replace θ* by θ̂(t) in the calculation of the CP. In that case, the expression of the CP reduces to (e.g., see [13])
$$\mathrm{CP} = \Phi\!\left(\frac{Z(t)/\sqrt{t} - \gamma}{\sqrt{1-t}}\right). \tag{3.2}$$
This is the CP when the post-interim trend is expected to follow the interim trend.
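A minimal Python sketch of the two CP expressions, assuming the normal approximation of this section. The function names and the interim values are hypothetical. It shows that plugging θ* = θ̂(t) into the general CP gives Eq. (3.2), and that the CP increases with the assumed post-interim effect θ*. Here θ* = 0 corresponds to the CP under H0 in the sense of Halperin et al.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def conditional_power(z_t: float, t: float, gamma: float, theta_star: float, k: float) -> float:
    """General CP when theta_hat(1 - t) ~ N(theta_star, k^2 / (1 - t))."""
    numerator = math.sqrt(t) * z_t + (1 - t) * theta_star / k - gamma
    return PHI(numerator / math.sqrt(1 - t))


def conditional_power_trend(z_t: float, t: float, gamma: float) -> float:
    """Eq. (3.2): CP with theta_star replaced by the interim estimate theta_hat(t)."""
    return PHI((z_t / math.sqrt(t) - gamma) / math.sqrt(1 - t))


# Hypothetical interim look: theta_hat(t) = 0.2, k = 0.1, t = 0.5, gamma = 1.96
theta_hat_t, k, t, gamma = 0.2, 0.1, 0.5, 1.96
z_t = math.sqrt(t) * theta_hat_t / k
for theta_star in (0.0, 0.1, 0.2, 0.3):   # 0.0 is the null value; 0.2 is the interim estimate
    print(theta_star, round(conditional_power(z_t, t, gamma, theta_star, k), 3))
print(round(conditional_power_trend(z_t, t, gamma), 3))  # equals the theta_star = 0.2 line
```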

Predictive power of success (PPoS) based on interim results
CP depends on the treatment effect specified for the post-interim data, and therefore, the calculation of CP can be arbitrary. An alternative is to obtain PPoS as the average CP over the predictive distribution of θ̂(1 − t) at the interim analysis. The use of a prior in the calculation of PPoS is optional; we discuss PPoS below both with and without a prior.
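Suppose the prior knowledge about θ can be summarized by the following prior distribution:
$$\theta \sim N(\theta_0, \sigma_0^2). \tag{3.3}$$
With this prior, the posterior distribution of θ is
$$\theta \mid \hat\theta(t) \sim N\!\left(\psi\,\hat\theta(t) + (1-\psi)\,\theta_0,\ \psi\,k^2/t\right),$$
where ψ = σ0²/(σ0² + k²/t) is the proportion of the contribution of the interim data. Accordingly, the predictive distribution of θ̂(1 − t) is
$$\hat\theta(1-t) \mid \hat\theta(t) \sim N\!\left(\psi\,\hat\theta(t) + (1-\psi)\,\theta_0,\ \psi\,k^2/t + k^2/(1-t)\right).$$
We can now use the predictive distribution of θ̂(1 − t) and Eq. (2.6) to derive the PPoS as follows:
$$\mathrm{PPoS} = \Phi\!\left(\frac{t\,\hat\theta(t) + (1-t)\{\psi\,\hat\theta(t) + (1-\psi)\,\theta_0\} - \gamma k}{k\sqrt{(1-t)\{t + \psi(1-t)\}/t}}\right). \tag{3.4}$$
This is the PPoS given the interim results and the prior distribution. Without the prior distribution, PPoS can be derived as a special case of Eq. (3.4) by setting ψ = 1, which implies a 100% contribution of the interim data to the predictive distribution of θ̂(1 − t). Therefore, after some simple algebraic manipulation, PPoS without the prior distribution is obtained as follows (e.g., see [13,21]):
$$\mathrm{PPoS} = \Phi\!\left(\frac{Z(t) - \gamma\sqrt{t}}{\sqrt{1-t}}\right) = \Phi\!\left(\sqrt{t}\;\frac{Z(t)/\sqrt{t} - \gamma}{\sqrt{1-t}}\right). \tag{3.5}$$
The expressions of the PPoS in Eq. (3.5) and the CP in Eq. (3.2) are very similar except for the additional factor √t inside Φ(·) for the PPoS. A simple consequence is that CP > PPoS for CP > 0.5 and CP < PPoS for CP < 0.5 [13] (see Figures 1, 2 and 4). That is, the PPoS is always less extreme (closer to 0.5) than the CP. For example, with the same threshold, a stopping rule based on the PPoS will always make it harder to stop a trial than one based on the CP.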
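The sketch below evaluates Eq. (3.4) directly from the predictive mean and variance of θ̂(1 − t), and Eq. (3.5) in closed form. All inputs are hypothetical and the function names are illustrative. A very diffuse prior (large σ0) drives ψ to 1 and recovers Eq. (3.5). The loop illustrates that the PPoS stays closer to 0.5 than the interim-trend CP.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def ppos_no_prior(z_t, t, gamma):
    """Eq. (3.5): PPoS without a prior (interim data carry all the weight, psi = 1)."""
    return PHI((z_t - gamma * math.sqrt(t)) / math.sqrt(1 - t))


def ppos_with_prior(theta_hat_t, t, gamma, k, theta0, sigma0):
    """Eq. (3.4): PPoS with the normal prior theta ~ N(theta0, sigma0^2)."""
    psi = sigma0**2 / (sigma0**2 + k**2 / t)            # weight of the interim data
    pred_mean = psi * theta_hat_t + (1 - psi) * theta0  # predictive mean of theta_hat(1 - t)
    pred_var = psi * k**2 / t + k**2 / (1 - t)          # predictive variance of theta_hat(1 - t)
    # Eq. (2.6) <=> t*theta_hat(t) + (1 - t)*theta_hat(1 - t) > gamma*k
    threshold = (gamma * k - t * theta_hat_t) / (1 - t)
    return PHI((pred_mean - threshold) / math.sqrt(pred_var))


def cp_trend(z_t, t, gamma):
    """Eq. (3.2), repeated here only to compare CP and PPoS."""
    return PHI((z_t / math.sqrt(t) - gamma) / math.sqrt(1 - t))


# Hypothetical interim look with t = 0.5 and gamma = 1.96
for z_t in (-1.0, 0.5, 2.5, 3.5):
    cp, pp = cp_trend(z_t, 0.5, 1.96), ppos_no_prior(z_t, 0.5, 1.96)
    print(z_t, round(cp, 3), round(pp, 3))

# Diffuse prior: psi is essentially 1, so Eq. (3.4) collapses to Eq. (3.5)
theta_hat, k, t = 0.2, 0.1, 0.6
z = math.sqrt(t) * theta_hat / k
print(round(ppos_with_prior(theta_hat, t, 1.96, k, 0.0, 1e6), 6), round(ppos_no_prior(z, t, 1.96), 6))
```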

Probability of success (PoS) of a prospective trial at the design stage
The concept of PoS is very similar to that of PPoS: PPoS is the averaged CP over the predictive distribution, whereas PoS is the averaged power over the prior distribution of θ. Unlike PPoS, PoS is calculated at the beginning of the trial, and hence it relies only on the prior distribution of θ. As mentioned before, PoS has also been referred to as 'assurance' [4], and 'expected power' or 'average success probability' [6].
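With the prior distribution specified in Eq. (3.3) and expecting the SE in the trial to be k, the predictive distribution of θ̂(1) is
$$\hat\theta(1) \sim N(\theta_0,\ \sigma_0^2 + k^2).$$
In that case, the PoS given the prior distribution would be
$$\mathrm{PoS} = \Pr\{\hat\theta(1) > \gamma k\} = \Phi\!\left(\frac{\theta_0 - \gamma k}{\sqrt{\sigma_0^2 + k^2}}\right).$$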
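To illustrate that PoS is power averaged over the prior, this sketch compares the closed form with a Monte Carlo average of the power Φ(θ/k − γ) over draws from the prior. The design values (θ0 = 0.25, σ0 = 0.1, k = 0.1, γ = 1.96) and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf


def pos_closed_form(theta0, sigma0, k, gamma):
    """PoS = Phi((theta0 - gamma*k) / sqrt(sigma0^2 + k^2))."""
    return PHI((theta0 - gamma * k) / math.sqrt(sigma0**2 + k**2))


def power_at(theta, k, gamma):
    """Frequentist power P(theta_hat(1)/k > gamma) if theta were the true effect."""
    return PHI(theta / k - gamma)


def pos_monte_carlo(theta0, sigma0, k, gamma, n_draws=100_000, seed=1):
    """PoS as the average power over draws theta ~ N(theta0, sigma0^2)."""
    rng = random.Random(seed)
    return sum(power_at(rng.gauss(theta0, sigma0), k, gamma) for _ in range(n_draws)) / n_draws


# Hypothetical design: prior N(0.25, 0.1^2), final SE k = 0.1, gamma = c(1) = 1.96
cf = pos_closed_form(0.25, 0.1, 0.1, 1.96)
mc = pos_monte_carlo(0.25, 0.1, 0.1, 1.96)
print(round(cf, 3), round(mc, 3))
print(round(power_at(0.25, 0.1, 1.96), 3))  # power at the prior mean exceeds the PoS here
```

Averaging over the prior pulls the PoS toward 0.5 relative to the power evaluated at the prior mean, which is why PoS is often lower than the nominal power when the latter is high.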
The expressions for PoS, CP and PPoS are identical for two-arm trials with binary and continuous endpoints, although the quantities they involve (e.g., δ_n and s_n) are obtained differently.

Expressions of CP, PPoS and PoS by type of endpoints
Expressions of CP, PPoS and PoS for continuous, binary and survival endpoints in single-arm and two-arm trials are derived separately in this section from the general expressions presented in the previous section with the normal approximation. A summary of the notations and expressions of CP, PPoS and PoS discussed in this section is presented in Table 2. For the two-arm trial, the allocation ratio (treatment arm to control arm) is denoted as a : 1, and we denote r² = (a + 1)²/a. The expressions for the single-arm trial can be obtained directly from the corresponding expressions for the two-arm trial by setting r = 1. The intuition for r = 1 is simple: the single-arm design can be thought of as a 1 : 0 allocation ratio (instead of a : 1), in which case r = (1 + 0)/√1 = 1. For clarity, we present the expressions for the single-arm and two-arm scenarios separately.
Recall that the expressions here are presented for the general success criterion in Eq. (2.6): one needs to set γ = c(1) for "trial success" and γ = θ_min/k for "clinical success". We illustrate the calculation of CP, PPoS and PoS based on published clinical trial results for two-arm trials and also compare the behaviour of CP and PPoS.

Continuous endpoint, single-arm trial
We start with the single-arm trial with a continuous endpoint. Denote the population mean as µ and the maximum sample size in the trial as N. We test the following hypotheses:
$$H_0: \mu \le \mu_1 \quad \text{vs.} \quad H_1: \mu > \mu_1.$$
Here, θ = µ − µ1. At the interim analysis with sample size n, the estimate of θ is θ̂(t) = x̄_n − µ1, where x̄_n is the sample mean at interim. The corresponding test statistic is
$$Z(t) = \frac{\sqrt{n}\,(\bar x_n - \mu_1)}{s_n},$$
where s_n is the estimate of the SD (σ) at the interim analysis. Further, in this case, t = n/N and k = s_n/√N.

Conditional power (CP):
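The CP with the future trend of the sample mean as µ* is
$$\mathrm{CP} = \Phi\!\left(\frac{n(\bar x_n - \mu_1) + (N-n)(\mu^* - \mu_1) - \gamma\, s_n\sqrt{N}}{s_n\sqrt{N-n}}\right). \tag{4.1}$$
If we assume that the current trend observed through the interim analysis continues to hold for future data as well (i.e., µ* = x̄_n), then the expression of the CP reduces to
$$\mathrm{CP} = \Phi\!\left(\sqrt{\frac{N}{N-n}}\left[\frac{\sqrt{N}(\bar x_n - \mu_1)}{s_n} - \gamma\right]\right). \tag{4.2}$$
Predictive power of success (PPoS): The PPoS based solely on the interim information can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\sqrt{\frac{n}{N-n}}\left[\frac{\sqrt{N}(\bar x_n - \mu_1)}{s_n} - \gamma\right]\right). \tag{4.3}$$
Incorporating the prior information specified in Eq. (4.6), the expression of PPoS can be refined as
$$\mathrm{PPoS} = \Phi\!\left(\frac{n(\bar x_n - \mu_1) + (N-n)(\tilde\mu_n - \mu_1) - \gamma\, s_n\sqrt{N}}{s_n\sqrt{(N-n)\{n + \psi(N-n)\}/n}}\right), \quad \tilde\mu_n = \psi\,\bar x_n + (1-\psi)\,\mu_0, \tag{4.4}$$
where ψ = nσ0²/(nσ0² + s_n²).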

Probability of success (PoS):
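The PoS of a prospective clinical trial with N subjects and the prior information specified in Eq. (4.6) can be expressed as
$$\mathrm{PoS} = \Phi\!\left(\frac{\mu_0 - \mu_1 - \gamma\,\sigma/\sqrt{N}}{\sqrt{\sigma_0^2 + \sigma^2/N}}\right), \tag{4.5}$$
where σ is the projected SD and k = σ/√N is the projected SE in the trial. For the calculation of the PPoS with prior distribution in Eq. (4.4) and the PoS in Eq. (4.5), the following prior for µ was used:
$$\mu \sim \mathrm{Normal}(\mu_0, \sigma_0^2). \tag{4.6}$$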
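As a check on the single-arm continuous expressions, this sketch evaluates Eqs. (4.2), (4.3) and (4.5). It compares the first two with the general Eqs. (3.2) and (3.5) under the mapping θ̂(t) = x̄_n − µ1, k = s_n/√N and t = n/N. All numerical inputs and function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def single_arm_continuous(xbar_n, s_n, n, N, mu1, gamma):
    """Eqs. (4.2) and (4.3): interim-trend CP and PPoS without prior for H0: mu <= mu1."""
    bracket = math.sqrt(N) * (xbar_n - mu1) / s_n - gamma
    return PHI(math.sqrt(N / (N - n)) * bracket), PHI(math.sqrt(n / (N - n)) * bracket)


def via_general(xbar_n, s_n, n, N, mu1, gamma):
    """Same quantities from Eqs. (3.2) and (3.5), mapping theta_hat(t), k and t."""
    theta_hat, k, t = xbar_n - mu1, s_n / math.sqrt(N), n / N
    z_t = math.sqrt(t) * theta_hat / k
    cp = PHI((z_t / math.sqrt(t) - gamma) / math.sqrt(1 - t))
    ppos = PHI((z_t - gamma * math.sqrt(t)) / math.sqrt(1 - t))
    return cp, ppos


def pos_single_arm(mu0, sigma0, sigma, N, mu1, gamma):
    """Eq. (4.5): PoS at the design stage with prior mu ~ N(mu0, sigma0^2)."""
    return PHI((mu0 - mu1 - gamma * sigma / math.sqrt(N)) / math.sqrt(sigma0**2 + sigma**2 / N))


# Hypothetical interim data: 40 of 100 subjects, mean 5.4, SD 2.0, null mean 5.0
args = dict(xbar_n=5.4, s_n=2.0, n=40, N=100, mu1=5.0, gamma=1.96)
print([round(v, 3) for v in single_arm_continuous(**args)])
print([round(v, 3) for v in via_general(**args)])
print(round(pos_single_arm(mu0=5.5, sigma0=0.3, sigma=2.0, N=100, mu1=5.0, gamma=1.96), 3))
```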

Continuous endpoint, two-arm trial
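Now consider a two-arm trial comparing a treatment (T) with a control (C) arm, with population means µT and µC, respectively. Denote the maximum total sample size as N. We test
$$H_0: \mu_T - \mu_C \le \Delta_1 \quad \text{vs.} \quad H_1: \mu_T - \mu_C > \Delta_1.$$
Here, θ = µT − µC − ∆1. At the interim analysis with total sample size n, θ̂(t) = δ_n − ∆1, where δ_n is the difference between the sample means of the two arms. The corresponding test statistic is
$$Z(t) = \frac{\sqrt{n}\,(\delta_n - \Delta_1)}{r\, s_n},$$
where s_n is the estimate of the pooled SD (σ) at the interim analysis. Further, t = n/N and k = r · s_n/√N.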

Conditional power (CP):
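The CP with the estimated mean difference from post-interim data as ∆* is
$$\mathrm{CP} = \Phi\!\left(\frac{n(\delta_n - \Delta_1) + (N-n)(\Delta^* - \Delta_1) - \gamma\, r\, s_n\sqrt{N}}{r\, s_n\sqrt{N-n}}\right). \tag{4.7}$$
Assuming the interim trend continues to hold for the future data (i.e., ∆* = δ_n), the CP reduces to
$$\mathrm{CP} = \Phi\!\left(\sqrt{\frac{N}{N-n}}\left[\frac{\sqrt{N}(\delta_n - \Delta_1)}{r\, s_n} - \gamma\right]\right). \tag{4.8}$$
Predictive power of success (PPoS): The PPoS without prior distribution can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\sqrt{\frac{n}{N-n}}\left[\frac{\sqrt{N}(\delta_n - \Delta_1)}{r\, s_n} - \gamma\right]\right). \tag{4.9}$$
The PPoS incorporating the prior information specified in Eq. (4.12) can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\frac{n(\delta_n - \Delta_1) + (N-n)(\tilde\delta_n - \Delta_1) - \gamma\, r\, s_n\sqrt{N}}{r\, s_n\sqrt{(N-n)\{n + \psi(N-n)\}/n}}\right), \quad \tilde\delta_n = \psi\,\delta_n + (1-\psi)\,\Delta_0, \tag{4.10}$$
where ψ = nσ0²/(nσ0² + r²s_n²). Probability of success (PoS): The PoS of a prospective clinical trial with N subjects and the prior information specified in Eq. (4.12) can be expressed as (e.g., see [4])
$$\mathrm{PoS} = \Phi\!\left(\frac{\Delta_0 - \Delta_1 - \gamma\, r\,\sigma/\sqrt{N}}{\sqrt{\sigma_0^2 + r^2\sigma^2/N}}\right), \tag{4.11}$$
where σ is the projected pooled SD and k = r · σ/√N is the projected SE in the trial. For the PPoS with prior distribution (Eq. (4.10)) and the PoS (Eq. (4.11)), the following prior was used:
$$\mu_T - \mu_C \sim \mathrm{Normal}(\Delta_0, \sigma_0^2). \tag{4.12}$$
Example 1: In the pragmatic, unblinded, non-inferiority CODA trial [22], 1552 subjects (= N) with appendicitis were equally randomized to receive antibiotics or to undergo appendectomy. The primary outcome was 30-day health status, as assessed with the European Quality of Life-5 Dimensions (EQ-5D) questionnaire (scores range from 0 to 1, with higher scores indicating better health status; non-inferiority margin, 0.05 points). For this illustration, we imagine an interim analysis at a sample size of 776 (= n). According to the O'Brien-Fleming-type alpha spending function, the rejection boundaries for the Z test statistic are 2.96 and 1.97 (= c(1)) at the interim and final analyses, respectively.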
We test the following hypotheses: H0: µT − µC ≤ −0.05 vs. H1: µT − µC > −0.05 (i.e., ∆1 = −0.05). For the calculation of CP and PPoS, consider the following interim results: a mean difference of −0.025 (= δ_n) points with an SD of 0.16 (= s_n). Assuming that the interim trend continues in the remaining part of the trial, the CP for trial success (γ = c(1) = 1.97) is 0.941 (see Eq. (4.8)), while the PPoS without prior is 0.866 (see Eq. (4.9)). Expecting instead a mean difference of −0.030 from the post-interim data (i.e., ∆* = −0.030), the CP would be 0.871 (see Eq. (4.7)). Further, the PPoS for trial success given the interim results and the prior distribution is 0.944 (see Eq. (4.10)).
The predictive distributions of ∆ = µT − µC with and without the prior distribution, and the CP and PPoS for trial success against δ_n (i.e., the interim estimate of the mean difference), are plotted in Figure 1. As mentioned in Section 3.2, we can verify that CP > PPoS for CP > 0.5 and CP < PPoS for CP < 0.5 when prior information is not incorporated in the PPoS. Both the predictive distribution and the PPoS improved owing to the use of an optimistic prior.
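The three Example 1 quantities that do not depend on the prior can be reproduced with a few lines of Python. The prior-based PPoS (0.944) is not reproduced here because the prior parameters are not given in this text. This is a sketch of Eqs. (4.7) to (4.9), not the LongCART code, and the names are illustrative.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

# Example 1 (CODA) inputs as given in the text
N, n = 1552, 776                  # maximum and interim total sample sizes
a = 1                             # 1:1 allocation
r = (a + 1) / math.sqrt(a)        # r^2 = (a + 1)^2 / a, so r = 2 here
s_n, delta_n, delta1 = 0.16, -0.025, -0.05
gamma = 1.97                      # c(1) at the final analysis


def cp_projected(delta_star: float) -> float:
    """Eq. (4.7): CP with a projected post-interim mean difference delta_star."""
    num = (n * (delta_n - delta1) + (N - n) * (delta_star - delta1)
           - gamma * r * s_n * math.sqrt(N))
    return PHI(num / (r * s_n * math.sqrt(N - n)))


bracket = math.sqrt(N) * (delta_n - delta1) / (r * s_n) - gamma
cp_trend = PHI(math.sqrt(N / (N - n)) * bracket)   # Eq. (4.8)
ppos = PHI(math.sqrt(n / (N - n)) * bracket)       # Eq. (4.9)

print(round(cp_trend, 3), round(cp_projected(-0.030), 3), round(ppos, 3))  # 0.941 0.871 0.866
```

Setting `delta_star = delta_n` in `cp_projected` returns the same value as Eq. (4.8), which is a quick consistency check between the two CP forms.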

Binary endpoint, single-arm trial
Let Π denote the population proportion in a single-arm trial with a binary endpoint, and N the maximum sample size in the study. We test the following set of hypotheses:
$$H_0: \Pi \le \Pi_1 \quad \text{vs.} \quad H_1: \Pi > \Pi_1.$$
We have θ = Π − Π1. At the interim analysis with sample size n, θ̂(t) = p_n − Π1, with p_n being the sample proportion. The corresponding test statistic is
$$Z(t) = \frac{\sqrt{n}\,(p_n - \Pi_1)}{s_n},$$
where s_n is the estimate of the SD at the interim analysis. Expressions of the CP, PPoS and PoS in this case can be obtained from the corresponding expressions for the single-arm trial with a continuous endpoint by replacing x̄_n with p_n, µ1 with Π1, and µ* with Π*.

Conditional power (CP):
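The CP with projected proportion Π* for the post-interim data is
$$\mathrm{CP} = \Phi\!\left(\frac{n(p_n - \Pi_1) + (N-n)(\Pi^* - \Pi_1) - \gamma\, s_n\sqrt{N}}{s_n\sqrt{N-n}}\right). \tag{4.13}$$
If the interim trend continues to hold for the future data (i.e., Π* = p_n), the CP reduces to
$$\mathrm{CP} = \Phi\!\left(\sqrt{\frac{N}{N-n}}\left[\frac{\sqrt{N}(p_n - \Pi_1)}{s_n} - \gamma\right]\right). \tag{4.14}$$
Predictive power of success (PPoS): The PPoS based on the interim information can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\sqrt{\frac{n}{N-n}}\left[\frac{\sqrt{N}(p_n - \Pi_1)}{s_n} - \gamma\right]\right). \tag{4.15}$$
Incorporating the prior information specified in Eq. (4.18), the expression of the PPoS can be refined as
$$\mathrm{PPoS} = \Phi\!\left(\frac{n(p_n - \Pi_1) + (N-n)(\tilde p_n - \Pi_1) - \gamma\, s_n\sqrt{N}}{s_n\sqrt{(N-n)\{n + \psi(N-n)\}/n}}\right), \quad \tilde p_n = \psi\, p_n + (1-\psi)\,\Pi_0, \tag{4.16}$$
where ψ = nσ0²/(nσ0² + s_n²).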

Probability of success (PoS):
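The PoS of a prospective clinical trial with N subjects and the prior information specified in Eq. (4.18) can be expressed as
$$\mathrm{PoS} = \Phi\!\left(\frac{\Pi_0 - \Pi_1 - \gamma\,\sigma/\sqrt{N}}{\sqrt{\sigma_0^2 + \sigma^2/N}}\right), \tag{4.17}$$
where σ = √(Π(1 − Π)) is the projected SD and k = σ/√N is the projected SE in the trial, with Π being the projected proportion in the trial. The following prior was used for the PPoS with prior distribution in Eq. (4.16) and the PoS in Eq. (4.17):
$$\Pi \sim \mathrm{Normal}(\Pi_0, \sigma_0^2). \tag{4.18}$$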

Binary endpoint, two-arm trial
Consider a two-arm trial comparing a treatment (T) with a control (C) arm, with population proportions ΠT and ΠC, respectively. Denote the maximum total sample size as N. We test
$$H_0: \Pi_T - \Pi_C \le \Delta_1 \quad \text{vs.} \quad H_1: \Pi_T - \Pi_C > \Delta_1.$$
Here, θ = ΠT − ΠC − ∆1. At the interim analysis with total sample size n, θ̂(t) = δ_n − ∆1, where δ_n = p_T,n − p_C,n is the difference between the estimated proportions (p_T,n and p_C,n) in the two arms.
Let s_n denote the estimate of the pooled SD (σ) at the interim analysis. The corresponding test statistic is
$$Z(t) = \frac{\sqrt{n}\,(\delta_n - \Delta_1)}{r\, s_n}.$$
The expressions of the CP, PPoS and PoS for the two-arm trial with a binary endpoint are the same as those for the continuous case in Section 4.2. That is,
• Eq. (4.7) for the CP with the projected difference from post-interim data as ∆*.
• Eq. (4.8) for the CP with the projected difference similar to that observed at the interim analysis.
• Eq. (4.9) for the PPoS without prior distribution.
• Eq. (4.10) for the PPoS with prior distribution.
• Eq. (4.11) for the PoS.
For the PPoS with prior distribution and the PoS, the following prior for ΠT − ΠC was considered:
$$\Pi_T - \Pi_C \sim \mathrm{Normal}(\Delta_0, \sigma_0^2).$$
Example 2: Fenaux et al. [23] reported the results of a placebo-controlled, phase 3 trial evaluating the effect of luspatercept in patients with lower-risk myelodysplastic syndromes. The primary endpoint was the proportion of patients with transfusion independence for eight weeks or longer during weeks 1 through 24. A total sample size of 210 patients (= N) with a 2:1 treatment allocation ratio would give the study 90% power to detect a difference between response rates of 0.30 in the luspatercept arm and 0.10 in the placebo arm, with a one-sided alpha of 0.025 and a 10% dropout rate. For this illustration, we add an interim analysis at a sample size of 158 (= n). Further, we assume that the clinically meaningful difference is 15% (= θ_min). According to the O'Brien-Fleming-type alpha spending function, the rejection boundaries for the Z test statistic are 2.34 and 2.012 (= c(1)) at the interim and final analyses, respectively.
In this case, we are statistically testing the following hypotheses: H0: ΠT − ΠC ≤ 0 vs. H1: ΠT − ΠC > 0 (i.e., ∆1 = 0). The predictive distributions of ∆ = ΠT − ΠC, and the CP and PPoS against δ_n (i.e., the interim estimate of the difference in proportions), are presented in Figure 2. We can confirm that CP > PPoS for CP > 0.5 and CP < PPoS for CP < 0.5, regardless of the type of success (i.e., trial success or clinical success), when prior information is not incorporated in the PPoS.

Survival endpoint, single-arm trial
Consider a study with a single treatment arm and a time-to-event endpoint. Denote the population median as M and the maximum number of events in the study as D. We test the following hypotheses:
$$H_0: M \le M_1 \quad \text{vs.} \quad H_1: M > M_1.$$
Here, θ = log M − log M1. At the interim analysis with d (< D) events, the estimate of θ is θ̂(t) = log m_d − log M1, where m_d is the estimated median at interim. Assuming that log m_d is normally distributed (e.g., see [18,24]) with var(log m_d) = ξ²/d, a test can be constructed as follows:
$$Z(t) = \frac{\sqrt{d}\,(\log m_d - \log M_1)}{\xi}.$$
Under the exponential time-to-event distribution, var(log m_d) ≈ 1/d (i.e., ξ = 1) when m_d is the maximum likelihood estimate (MLE) of the median, whereas ξ = (β log 2)⁻¹ when m_d is the plain sample median from a Weibull time-to-event distribution with shape parameter β (see Appendix 1b). For other estimators (e.g., the smallest time at which the Kaplan-Meier (KM) estimate of the survival probability drops to 50% or below), ξ² is simply the ratio of the variance of the log of the estimated median to that of the MLE. We have plotted the empirical SE of log(KM estimate of median) for various numbers of events (between 20 and 60) and sample sizes (0%, 30% and 50% more than the number of events) in Figure 3. The empirical SEs were obtained as follows: (a) 5000 datasets were simulated; (b) in each simulated dataset, event times were generated from an exponential distribution with a median of 12 months and the KM estimate of the median was obtained; and (c) the SD of log(KM estimate of median) was computed. The results suggest that 1/√d under-estimates the empirical SE. In comparison, (log 2)⁻¹/√d almost coincides with the empirical SE for N = d. However, as N becomes greater than d, (log 2)⁻¹/√d tends to over-estimate the SE, because the follow-up times of the censored subjects contribute to the increased precision of the KM estimates, which was not factored into the derivation of (log 2)⁻¹/√d. In any case, t = d/D and k = ξ/√D.
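A rough re-creation of the empirical-SE simulation, restricted to the uncensored case N = d, where the KM median reduces to the sample median. This is a sketch under assumptions. It uses `statistics.median`, which averages the two middle order statistics when d is even (a small departure from the KM convention). It also omits the censored scenarios (N > d), which would require a KM estimator. The function name and seed are hypothetical. It illustrates that 1/√d is too small, while (log 2)⁻¹/√d tracks the empirical SE.

```python
import math
import random
import statistics


def empirical_se_log_median(d: int, median: float = 12.0, n_sims: int = 5000, seed: int = 2024) -> float:
    """SD of log(sample median) across n_sims uncensored exponential samples of size d."""
    rng = random.Random(seed)
    rate = math.log(2) / median  # exponential rate whose median equals `median`
    log_medians = []
    for _ in range(n_sims):
        times = [rng.expovariate(rate) for _ in range(d)]
        log_medians.append(math.log(statistics.median(times)))
    return statistics.stdev(log_medians)


for d in (20, 40, 60):
    print(d,
          round(empirical_se_log_median(d), 3),        # empirical SE
          round(1 / math.sqrt(d), 3),                   # 1/sqrt(d)
          round(1 / (math.log(2) * math.sqrt(d)), 3))   # (log 2)^{-1}/sqrt(d)
```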

Conditional power (CP):
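The CP with the projected median from post-interim data as M* is
$$\mathrm{CP} = \Phi\!\left(\frac{d\log(m_d/M_1) + (D-d)\log(M^*/M_1) - \gamma\,\xi\sqrt{D}}{\xi\sqrt{D-d}}\right). \tag{4.20}$$
Projecting the future trend to be identical to the interim trend (i.e., M* = m_d), the expression of the CP simplifies to
$$\mathrm{CP} = \Phi\!\left(\sqrt{\frac{D}{D-d}}\left[\frac{\sqrt{D}\log(m_d/M_1)}{\xi} - \gamma\right]\right). \tag{4.21}$$
Predictive power of success (PPoS): The PPoS based solely on the interim data can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\sqrt{\frac{d}{D-d}}\left[\frac{\sqrt{D}\log(m_d/M_1)}{\xi} - \gamma\right]\right). \tag{4.22}$$
Now, incorporating the prior information specified in Eq. (4.25), the expression of the PPoS is refined as
$$\mathrm{PPoS} = \Phi\!\left(\frac{d\log(m_d/M_1) + (D-d)\log(\tilde M_d/M_1) - \gamma\,\xi\sqrt{D}}{\xi\sqrt{(D-d)\{d + \psi(D-d)\}/d}}\right), \tag{4.23}$$
where log M̃_d = ψ log m_d + (1 − ψ) log M0 and ψ = dσ0²/(dσ0² + ξ²). Probability of success (PoS): The PoS of a prospective trial with D events (therefore, k = ξ/√D) and the prior information specified in Eq. (4.25) can be expressed as follows:
$$\mathrm{PoS} = \Phi\!\left(\frac{\log(M_0/M_1) - \gamma\,\xi/\sqrt{D}}{\sqrt{\sigma_0^2 + \xi^2/D}}\right). \tag{4.24}$$
For the calculation of the PPoS and PoS with prior distribution, the following prior for M was used:
$$\log M \sim \mathrm{Normal}(\log M_0, \sigma_0^2). \tag{4.25}$$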

Survival endpoint, two-arm trial
Denoting the treatment-to-control HR as ∆, in a two-arm clinical trial with a time-to-event endpoint and a maximum target number of events D, we test the following hypotheses:
$$H_0: \Delta \ge \Delta_1 \quad \text{vs.} \quad H_1: \Delta < \Delta_1.$$
Here, θ = log(∆1/∆). At the interim analysis with a total of d events, θ̂(t) = log(∆1/δ_d), where δ_d is the estimated HR. The corresponding log-rank statistic for the trend test is approximately equivalent to
$$Z(t) = \frac{\sqrt{d}\,\log(\Delta_1/\delta_d)}{r}.$$
Further, in this case, t = d/D and k = r/√D.

Conditional power (CP):
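With the projected HR from future data as ∆*, the expression of CP is
$$\mathrm{CP} = \Phi\!\left(\frac{d\log(\Delta_1/\delta_d) + (D-d)\log(\Delta_1/\Delta^*) - \gamma\, r\sqrt{D}}{r\sqrt{D-d}}\right). \tag{4.26}$$
If the interim trend continues to hold in the post-interim data (i.e., ∆* = δ_d), the CP reduces to
$$\mathrm{CP} = \Phi\!\left(\sqrt{\frac{D}{D-d}}\left[\frac{\sqrt{D}\log(\Delta_1/\delta_d)}{r} - \gamma\right]\right). \tag{4.27}$$
Predictive power of success (PPoS): The PPoS based solely on the interim data can be expressed as
$$\mathrm{PPoS} = \Phi\!\left(\sqrt{\frac{d}{D-d}}\left[\frac{\sqrt{D}\log(\Delta_1/\delta_d)}{r} - \gamma\right]\right). \tag{4.28}$$
Incorporating the prior information specified in Eq. (4.31), the revised expression of PPoS is [15]
$$\mathrm{PPoS} = \Phi\!\left(\frac{d\log(\Delta_1/\delta_d) + (D-d)\log(\Delta_1/\tilde\Delta_d) - \gamma\, r\sqrt{D}}{r\sqrt{(D-d)\{d + \psi(D-d)\}/d}}\right), \tag{4.29}$$
where log ∆̃_d = ψ log δ_d + (1 − ψ) log ∆0 and ψ = dσ0²/(dσ0² + r²). Probability of success (PoS): The PoS of a prospective trial with D events (hence, k = r/√D) and the prior information specified in Eq. (4.31) is (e.g., see [17]):
$$\mathrm{PoS} = \Phi\!\left(\frac{\log(\Delta_1/\Delta_0) - \gamma\, r/\sqrt{D}}{\sqrt{\sigma_0^2 + r^2/D}}\right). \tag{4.30}$$
For the calculation of the PPoS and PoS with prior distribution, the following prior for ∆ was used:
$$\log\Delta \sim \mathrm{Normal}(\log\Delta_0, \sigma_0^2). \tag{4.31}$$
Example 3: In the INTELLANCE-I trial in glioblastoma patients evaluating the investigational drug depatuxizumab mafodotin, a total of 639 subjects (= N) were enrolled with a 1:1 allocation ratio [25]. The primary endpoint in the study was overall survival. The target number of events at the final analysis was 441 (= D), and an interim analysis was planned at 332 events. The trial used a weighted log-rank test; here, however, we illustrate assuming the standard log-rank test. Therefore, the expected SE of the log HR at the final analysis is k = 2/√441 = 0.0952. The rejection boundaries for the Z test (i.e., trend test) statistic are 2.340 and 2.012 at the interim and final analyses, respectively. For clinical success, we assume HR ≤ 0.80 (= ∆min), resulting in γ = −log(0.80)/0.0952 = 2.344.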
We test the following hypotheses: H0: HR = 1 vs. H1: HR < 1 (i.e., ∆1 = 1). The phase 2 trial in recurrent disease [26] reported an HR of 0.71 (= ∆0) with 133 events, which was used to construct the prior distribution in Eq. (4.31). Consider the following interim results: an estimated HR of 0.82 (= δ_d) with 346 events (= d). Assuming the projected HR from the post-interim data to be 0.75 (= ∆*), as assumed at the design stage, the CPs for trial success (γ = 2.012) and clinical success (γ = 2.344) are 0.722 and 0.451, respectively (see Eq. (4.26)). However, if we assume that the interim trend continues in the remaining part of the trial, the CPs for trial success and clinical success are 0.561 and 0.288, respectively (see Eq. (4.27)). The PPoS for trial success and clinical success based solely on the interim results are 0.554 and 0.310, respectively (see Eq. (4.28)). If we incorporate the prior distribution, the PPoS for trial success and clinical success are 0.625 and 0.370, respectively (see Eq. (4.29)).
In Figure 4, the predictive distribution of the HR is plotted in the left panel, and the CP and PPoS against the interim HR estimate are plotted in the right panel. For both trial success and clinical success, CP > PPoS for CP > 0.5 and CP < PPoS for CP < 0.5 when prior information is ignored.
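The Example 3 values can be reproduced from Eqs. (4.26) to (4.29). The prior variance in Eq. (4.31) is not stated in this text, so the sketch ASSUMES σ0 = 2/√133, by analogy with k ≈ 2/√D for the 133 phase 2 events. Under that assumption, and with γ for clinical success computed from the unrounded k, the output agrees to three decimals with the values reported above. The function names are illustrative.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

D, d, r = 441, 346, 2.0                  # final and interim events; r = 2 for 1:1 allocation
delta1, delta_d = 1.0, 0.82              # null HR and interim HR estimate
theta_hat = math.log(delta1 / delta_d)   # theta = log(Delta_1 / Delta)
k = r / math.sqrt(D)
gammas = {"trial": 2.012, "clinical": math.log(1 / 0.80) / k}

# ASSUMPTION: prior SD for log HR, 2/sqrt(133), is not stated in the text
HR0, SIGMA0 = 0.71, 2 / math.sqrt(133)


def cp_projected(gamma, hr_star):
    """Eq. (4.26): CP with projected post-interim HR hr_star."""
    num = d * theta_hat + (D - d) * math.log(delta1 / hr_star) - gamma * r * math.sqrt(D)
    return PHI(num / (r * math.sqrt(D - d)))


def ppos_prior(gamma, hr0=HR0, sigma0=SIGMA0):
    """Eq. (4.29): PPoS with prior log(HR) ~ N(log hr0, sigma0^2)."""
    psi = d * sigma0**2 / (d * sigma0**2 + r**2)
    theta_tilde = psi * theta_hat + (1 - psi) * math.log(delta1 / hr0)
    num = d * theta_hat + (D - d) * theta_tilde - gamma * r * math.sqrt(D)
    return PHI(num / (r * math.sqrt((D - d) * (d + psi * (D - d)) / d)))


results = {}
for label, g in gammas.items():
    bracket = math.sqrt(D) * theta_hat / r - g
    results[label] = (
        cp_projected(g, 0.75),                   # Eq. (4.26)
        PHI(math.sqrt(D / (D - d)) * bracket),   # Eq. (4.27)
        PHI(math.sqrt(d / (D - d)) * bracket),   # Eq. (4.28)
        ppos_prior(g),                           # Eq. (4.29)
    )
    print(label, [round(v, 3) for v in results[label]])
```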

Binary endpoint with beta prior, single-arm trial
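Consider a Beta(a, b) prior distribution for the probability of response Π. Denoting the observed number of responses as x_n from n subjects at the interim analysis, the posterior distribution of Π is
$$\Pi \mid x_n \sim \mathrm{Beta}(a + x_n,\ b + n - x_n).$$
Let Y be the number of responses from the remaining N − n subjects. The predictive distribution of Y is (e.g., see [10,27])
$$\Pr(Y = y) = \binom{N-n}{y}\frac{B(a + x_n + y,\ b + N - x_n - y)}{B(a + x_n,\ b + n - x_n)}, \quad y = 0, 1, \ldots, N - n,$$
where B(·,·) is the beta function. Thus, the PPoS would be
$$\mathrm{PPoS} = \sum_{y=0}^{N-n} \Pr(Y = y)\; I(x_n + y),$$
where I(·) is the indicator function that the criterion for trial success (e.g., based on an approximate Z test or an exact binomial test) or clinical success (i.e., the estimated proportion exceeds a certain threshold value) is met.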
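A minimal Python sketch of the beta-binomial PPoS for a single-arm trial. The success rule `z_test_success` (an approximate Z test of H0: Π ≤ 0.2 with c = 1.96) and the interim data are hypothetical choices for illustration; any indicator I(·) can be passed in. Log-gamma arithmetic keeps the beta functions numerically stable.

```python
import math


def log_beta(x: float, y: float) -> float:
    """log B(x, y) via log-gamma, which avoids overflow for large counts."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_pmf(y: int, m: int, alpha: float, beta: float) -> float:
    """P(Y = y) for Y ~ Beta-Binomial(m, alpha, beta)."""
    log_choose = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return math.exp(log_choose + log_beta(alpha + y, beta + m - y) - log_beta(alpha, beta))


def ppos_single_arm(x_n, n, N, a, b, success):
    """Sum the predictive probabilities of the y values for which success(x_n + y, N) holds."""
    alpha, beta = a + x_n, b + n - x_n   # posterior Beta(a + x_n, b + n - x_n)
    m = N - n                            # subjects still to be observed
    return sum(beta_binomial_pmf(y, m, alpha, beta)
               for y in range(m + 1) if success(x_n + y, N))


def z_test_success(x: int, N: int, pi1: float = 0.2, c: float = 1.96) -> bool:
    """HYPOTHETICAL criterion: approximate one-sample Z test of H0: Pi <= pi1 at the end."""
    return (x / N - pi1) / math.sqrt(pi1 * (1 - pi1) / N) > c


# Hypothetical interim data: 16 responders among 40 of 100 planned subjects, Beta(1, 1) prior
print(round(ppos_single_arm(16, 40, 100, 1, 1, z_test_success), 3))
```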

Binary endpoint with beta prior, two-arm trial
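We assume the following priors for the proportions in the treatment (T) and control (C) arms: ΠT ∼ Beta(aT, bT) and ΠC ∼ Beta(aC, bC). At the interim analysis, x_T of n_T subjects in the treatment arm and x_C of n_C subjects in the control arm responded. The posterior distributions are [2]
$$\Pi_T \mid x_T \sim \mathrm{Beta}(a_T + x_T,\ b_T + n_T - x_T), \qquad \Pi_C \mid x_C \sim \mathrm{Beta}(a_C + x_C,\ b_C + n_C - x_C).$$
Let Y_T and Y_C be the numbers of responders among the remaining N_T − n_T and N_C − n_C subjects, respectively. The predictive distributions of Y_T and Y_C are (e.g., see [27])
$$\Pr(Y_j = y_j) = \binom{N_j - n_j}{y_j}\frac{B(a_j + x_j + y_j,\ b_j + N_j - x_j - y_j)}{B(a_j + x_j,\ b_j + n_j - x_j)}, \quad j \in \{T, C\}.$$
Thus, the PPoS at the end of the trial is
$$\mathrm{PPoS} = \sum_{y_T=0}^{N_T - n_T}\sum_{y_C=0}^{N_C - n_C} \Pr(Y_T = y_T)\Pr(Y_C = y_C)\; I(x_T + y_T,\ x_C + y_C),$$
where I(·) is the indicator function for the success criterion, which could be either trial success (e.g., based on an approximate Z test or Fisher's exact test) or clinical success, indicating that the observed difference in proportions exceeds a certain clinically meaningful value.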
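The two-arm PPoS is a double sum over the independent predictive distributions of Y_T and Y_C. The sketch below borrows the trial sizes of Example 4 (340 planned and 170 interim patients per arm, uniform Beta(1, 1) priors). The interim relapse counts and the pooled Z-test success rule are hypothetical, as are the function names.

```python
import math


def log_beta(x: float, y: float) -> float:
    """log B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def predictive_pmf(m: int, alpha: float, beta: float) -> list:
    """Beta-Binomial(m, alpha, beta) probabilities for y = 0, ..., m."""
    base = log_beta(alpha, beta)
    return [math.exp(math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
                     + log_beta(alpha + y, beta + m - y) - base)
            for y in range(m + 1)]


def ppos_two_arm(x_t, n_t, N_t, x_c, n_c, N_c, prior_t, prior_c, success):
    """Double sum over (Y_T, Y_C); the two posteriors, hence predictives, are independent."""
    pmf_t = predictive_pmf(N_t - n_t, prior_t[0] + x_t, prior_t[1] + n_t - x_t)
    pmf_c = predictive_pmf(N_c - n_c, prior_c[0] + x_c, prior_c[1] + n_c - x_c)
    total = 0.0
    for y_t, p_t in enumerate(pmf_t):
        for y_c, p_c in enumerate(pmf_c):
            if success(x_t + y_t, N_t, x_c + y_c, N_c):
                total += p_t * p_c
    return total


def lower_rate_success(e_t, N_t, e_c, N_c, c=1.96):
    """HYPOTHETICAL criterion: pooled two-proportion Z test that the treatment rate is lower."""
    pooled = (e_t + e_c) / (N_t + N_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / N_t + 1 / N_c))
    return se > 0 and (e_c / N_c - e_t / N_t) / se > c


# Hypothetical interim counts: 30 vs. 50 relapses among 170 patients per arm
print(round(ppos_two_arm(30, 170, 340, 50, 170, 340, (1, 1), (1, 1), lower_rate_success), 3))
```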

Example 4:
This example is inspired by the example given in Johns and Andersen [27]. Consider a clinical trial to demonstrate that the relapse rate in patients in the experimental treatment arm is lower than that in the control arm. It was planned to enrol 340 patients in each arm, with an interim analysis after 170 patients in each arm had completed treatment. Non-informative uniform priors were assumed for the two relapse rates: ΠT ∼ Beta(1, 1) and ΠC ∼ Beta(1, 1). In this case, we are statistically testing the following hypotheses: H0: ΠT ≥ ΠC vs. H1: ΠT < ΠC.

A user-friendly R shiny app is also available at https://ppos.herokuapp.com/ to calculate these measures. A screenshot of this shiny app is presented in Figure 5.

Discussion
In this paper, expressions for various measures of the probability of success are presented by type of endpoint. The discussion is restricted to normally distributed test statistics with a normal prior, and to the beta prior for binomial data. For other distributions, the relevant expressions can be obtained using the general framework presented in Section 2, or simulation-based methods such as Bayesian clinical trial simulation (BCTS) [4,17] may be used. A natural question is which pr(success) measure one should prefer. PPoS is often preferred over CP for the following reasons: (1) it has a better predictive interpretation; (2) knowledge about θ (the parameter of interest) is used as a distribution, whereas the frequentist calculation assumes that the value of θ is known without any uncertainty; and (3) prior information can be incorporated in the Bayesian paradigm, unlike in the frequentist paradigm. As evident from Figures 1, 2 and 4, the CP is more aggressive than the PPoS, and hence use of the CP increases the chance of early stopping for futility or efficacy. Lachin [9] has shown that futility termination may markedly decrease the power, in direct proportion to the probability of stopping for futility. Therefore, PPoS seems more useful for monitoring a trial for early termination.
Dallow and Fina [21] discussed the disadvantages of PoS and PPoS; in particular, PPoS can lead to much larger sample sizes than the CP during sample size re-estimation. On the other hand, the effect of varying the prior distribution of θ on predictive power in the context of futility monitoring is discussed by Dmitrienko and Wang [1], and more generally by Rufibach, Burger and Abt [16]. In summary, they proposed using an aggressive prior for futility monitoring, as the use of a non-informative prior may increase the early termination rate. Tang [15] suggested using the upper limit of PPoS in futility monitoring. The effect of the prior on PPoS in the context of a binomial endpoint is discussed by Johns and Andersen [27].
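The sensitivity of PPoS to the prior can be illustrated in a simplified normal model. This is a minimal sketch under our own assumptions, not the paper's expressions: the drift θ = E[Z_final] has a N(m_0, v_0) prior, the interim B-value B(t) = z_t√t is N(θt, t), and the post-interim increment B(1) − B(t) is N(θ(1−t), 1−t). A conjugate update gives the posterior of θ, and PPoS is the predictive probability that B(1) exceeds z_α. As v_0 grows, the result approaches the flat-prior PPoS Φ((z_t − z_α√t)/√(1−t)).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ppos_normal_prior(z_t, t, prior_mean, prior_var, alpha=0.025):
    """PPoS with a N(prior_mean, prior_var) prior on the drift theta = E[Z_final].

    Assumed B-value model: B(t) = z_t * sqrt(t) ~ N(theta * t, t) and
    B(1) - B(t) ~ N(theta * (1 - t), 1 - t), independent of B(t).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    b_t = z_t * t**0.5
    # Conjugate normal update of theta given the interim B-value.
    post_prec = 1 / prior_var + t
    post_mean = (prior_mean / prior_var + b_t) / post_prec
    post_var = 1 / post_prec
    # Predictive distribution of the remaining increment B(1) - B(t).
    pred_mean = post_mean * (1 - t)
    pred_var = (1 - t) + (1 - t) ** 2 * post_var
    # Trial success at the end means B(1) = Z_final > z_alpha.
    return _STD_NORMAL.cdf((b_t + pred_mean - z_alpha) / pred_var**0.5)


if __name__ == "__main__":
    z_t, t = 1.0, 0.5
    print("near-flat  :", round(ppos_normal_prior(z_t, t, 0.0, 1e8), 3))
    print("sceptical  :", round(ppos_normal_prior(z_t, t, 0.0, 1.0), 3))
    print("optimistic :", round(ppos_normal_prior(z_t, t, 3.0, 1.0), 3))
```

With the same interim data, a prior centred near zero lowers PPoS relative to the near-flat prior, while an optimistic prior raises it, so a rule such as "stop for futility if PPoS falls below some cut-off" triggers at different rates depending on the prior chosen.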
One might also consider the predictive power of clinical success (PPoCS) when monitoring for early stopping; however, in general, its use should be discouraged. Saville et al. [2] have pointed out that the PPoS with respect to trial success (referred to as 'predictive probabilities') is naturally appealing for monitoring a clinical trial because (a) the PPoS directly addresses the question of whether the study will be a success at the end, and (b) the PPoS often changes drastically with the accrual of more data, whereas the PPoCS (referred to as 'posterior probabilities') may remain nearly identical. We would also like to point out the potential misuse of PPoS for survival endpoints with delayed treatment effects. In that case, futility criteria for early stopping based on PPoS or CP may be misleading, and any futility criteria must be determined through exhaustive evaluation of operating characteristics.

Figure 1: (Left) Predictive distributions of $\Delta = \mu_T - \mu_C$ and (Right) plots of the CP and PPoS for trial success against $\delta_n$ (interim estimate of $\Delta$) for Example 1. Horizontal and vertical reference lines in the left panel correspond to 50% power and the observed value $\delta_n = -0.025$, respectively.

Figure 2: (Left) Predictive distributions of $\Delta = \Pi_T - \Pi_C$ and (Right) plots of the CP and PPoS for trial success against $\delta_n$ (interim estimate of $\Delta$) for Example 2. Horizontal and vertical reference lines in the left panel correspond to 50% power and the observed value $\delta_n = 0.157$, respectively.

Figure 4: Predictive distribution of HR (i.e., $\Delta$) and the relationship of CP and PPoS for trial and clinical success with $\delta_d$ (i.e., interim estimate of HR) based on Example 3. Horizontal and vertical lines in the left panel correspond to 50% power and the observed HR of 0.82 ($= \delta_d$), respectively.
Suppose we observed the following results at the interim analysis: (a) in the treatment arm, 155 ($= n_T$) of 170 patients responded, with 13 ($= x_T$) subsequent relapses, and (b) in the control arm, 152 ($= n_C$) of 169 patients responded, with 21 ($= x_C$) subsequent relapses. Subsequently, an additional 340 − 170 = 170 patients ($= N_T - n_T$) and 340 − 169 = 171 patients ($= N_C - n_C$) are to be enrolled in the treatment and control arms, respectively. With this information, the PPoS for trial success based on a Z test at the one-sided 0.025 level is 0.536 (see Eq. (5.2)).
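A PPoS of this kind can be computed numerically by enumerating the beta-binomial predictive distributions of the post-interim relapse counts in both arms. The sketch below is an illustration under stated assumptions, not a transcription of Eq. (5.2): it uses uniform Beta(1, 1) priors, treats $x_T$ and $x_C$ as relapses among $n_T$ and $n_C$ patients, lets the $N - n$ future patients add directly to the relapse counts, and declares trial success when a pooled-variance Z test of $\Pi_T < \Pi_C$ exceeds $z_{0.975}$. The choice of test statistic and denominators is our assumption, so the output is not guaranteed to equal 0.536. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def beta_binomial_pmf(m, a, b):
    """P(Y = y) for y = 0..m, where Y ~ Binomial(m, p) and p ~ Beta(a, b)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    pmf = []
    for y in range(m + 1):
        log_p = (math.log(math.comb(m, y))
                 + math.lgamma(y + a) + math.lgamma(m - y + b)
                 - math.lgamma(m + a + b) - log_norm)
        pmf.append(math.exp(log_p))
    return pmf


def ppos_relapse(x_t, n_t, m_t, x_c, n_c, m_c, alpha=0.025, a0=1.0, b0=1.0):
    """PPoS that the final pooled-variance Z test shows fewer relapses on treatment.

    x, n: interim relapses and patients per arm; m: patients still to come (N - n).
    Beta(a0, b0) priors on both relapse rates (a0 = b0 = 1 is uniform).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Posterior Beta(a0 + x, b0 + n - x) -> beta-binomial predictive for future counts.
    pmf_t = beta_binomial_pmf(m_t, a0 + x_t, b0 + n_t - x_t)
    pmf_c = beta_binomial_pmf(m_c, a0 + x_c, b0 + n_c - x_c)
    total_t, total_c = n_t + m_t, n_c + m_c
    ppos = 0.0
    for y_t, p_t in enumerate(pmf_t):
        for y_c, p_c in enumerate(pmf_c):
            r_t = (x_t + y_t) / total_t  # final relapse proportions
            r_c = (x_c + y_c) / total_c
            pooled = (x_t + y_t + x_c + y_c) / (total_t + total_c)
            se = math.sqrt(pooled * (1 - pooled) * (1 / total_t + 1 / total_c))
            if se > 0 and (r_c - r_t) / se > z_alpha:
                ppos += p_t * p_c  # the two arms are independent a priori
    return ppos


if __name__ == "__main__":
    # Interim counts and remaining sample sizes as stated in the example text.
    print(round(ppos_relapse(13, 155, 170, 21, 152, 171), 3))
```

Exact enumeration over all pairs of future counts (about 29,000 here) is used instead of Monte Carlo simulation so that the result is deterministic.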

Table 1: Summary of notations used in Sections 2 and 3 in deriving the general expressions of CP, PPoS and PoS.

Table 2: Summary of notations and expressions of CP, PPoS and PoS with normally distributed test statistics and normal priors presented in Section 4. Subscripts T and C indicate measurements in the treatment (T) and control (C) arms, respectively. ¹ SD: standard deviation; for two-arm trials it is the pooled SD. ² SD in a two-arm trial with a binary endpoint: σ

So far we have discussed normally distributed test statistics and normal priors. Here, we discuss the derivation of PPoS in trials with a binary endpoint and a beta prior. We use the same notation and hypothesis tests presented in Sections 4.3 and 4.4.
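For reference, the standard conjugate beta-binomial steps that such a derivation builds on are shown below for the treatment arm; the control arm is analogous with subscript C. The prior parameters $a_T, b_T$ and the future relapse count $Y_T$ among the remaining $N_T - n_T$ patients are our notation and may differ from the paper's; a uniform prior corresponds to $a_T = b_T = 1$.

$$
\begin{aligned}
\Pi_T &\sim \text{Beta}(a_T, b_T), \qquad x_T \mid \Pi_T \sim \text{Bin}(n_T, \Pi_T),\\
\Pi_T \mid x_T &\sim \text{Beta}(a_T + x_T,\; b_T + n_T - x_T),\\
\Pr(Y_T = y \mid x_T) &= \binom{N_T - n_T}{y}\,
\frac{B(a_T + x_T + y,\; b_T + N_T - x_T - y)}{B(a_T + x_T,\; b_T + n_T - x_T)}, \qquad y = 0, \dots, N_T - n_T.
\end{aligned}
$$

Because the two arms are independent, the PPoS is then the sum of $\Pr(Y_T = y_T \mid x_T)\,\Pr(Y_C = y_C \mid x_C)$ over all pairs $(y_T, y_C)$ for which the final test statistic meets the success criterion.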