In this section, we evaluate the performance of the proposed technique through simulations. No systematic comparison with typical Bayesian approaches is provided, as there always exist combinations of the tuning parameters for the model-based Bayesian methods that could match the results of the proposed method.
Let X follow a normal distribution with a mean of 0.3 and a standard deviation (SD) of 1, denoted as X ~ N(0.3, 1). We generate 100 such random samples to represent the current data and varying numbers of additional random samples, with varying means, to represent the external data. The cap on borrowing from the external data is set at 50. The hypothesis test is:
$$H_{0}: \mu \le 0 \quad \text{vs.} \quad H_{1}: \mu > 0, \qquad (3)$$
where \(\mu\) is the population mean. The nominal significance level is set at one-sided 0.025. To evaluate the type I error of the above scheme, the mean for the current data is set to 0 while the mean for the external data varies from 0.1 to 0.6, a moderate range relative to the SD of 1. For the power evaluation, the mean of the current data is set at 0.3. To put things into perspective, with a frequentist design (without borrowing), the statistical power at a mean value of 0.3 is 85% with a total sample size of 100 and 95% with a total sample size of 150. The simulations were conducted with 20,000 runs for type I error and 5,000 runs for power.
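The quoted frequentist benchmarks can be checked with a standard one-sample power calculation in R; the sketch below uses power.t.test() from base R (a z-test gives essentially the same numbers) and is only a sanity check of the no-borrowing design, not part of the proposed scheme.

```r
# Frequentist (no borrowing) power at a true mean of 0.3, SD = 1,
# one-sided alpha = 0.025; roughly 85% for n = 100 and 95% for n = 150.
power.t.test(n = 100, delta = 0.3, sd = 1, sig.level = 0.025,
             type = "one.sample", alternative = "one.sided")
power.t.test(n = 150, delta = 0.3, sd = 1, sig.level = 0.025,
             type = "one.sample", alternative = "one.sided")
```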
The first type of simulation discounts the external data only through the I² statistic, as described in the method section. Once the external data have been discounted, the normal conjugate for the location parameter is used to combine the two sources of data in order to test the hypothesis in (3). This method is denoted as the CJGI2 (conjugate plus I²) scheme. The external data are randomly generated with an SD of 1 given a specific mean. Of note, for a binary endpoint, the beta-binomial conjugate could be used instead.
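To make the scheme concrete, below is a minimal sketch of a single CJGI2 simulation run. It assumes that the I² statistic comes from a two-study meta-analysis of the current and external sample means via metagen() from the R package meta (with the external standard error reflecting the borrowing cap), that the factor (1 − I²) scales the cap of 50, and that the known-variance normal conjugate combines the two sources; the helper name one_run_cjgi2 and the exact bookkeeping are illustrative rather than the authors' implementation.

```r
# A minimal sketch of one CJGI2 run (assumptions as noted above).
library(meta)

one_run_cjgi2 <- function(mu_cur, mu_ext, n_cur = 100, n_ext = 500, cap = 50) {
  cur <- rnorm(n_cur, mean = mu_cur, sd = 1)   # current data
  ext <- rnorm(n_ext, mean = mu_ext, sd = 1)   # external data
  # Two-study meta-analysis of the sample means; the external SE reflects
  # the borrowing cap rather than the full external sample size.
  m <- metagen(TE = c(mean(cur), mean(ext)),
               seTE = c(1 / sqrt(n_cur), 1 / sqrt(cap)))
  n_eff <- (1 - m$I2) * cap                    # discount the cap by I^2 (0-1 scale assumed)
  # Normal conjugate with known SD = 1: precision-weighted combination.
  post_mean <- (n_eff * mean(ext) + n_cur * mean(cur)) / (n_eff + n_cur)
  post_sd   <- 1 / sqrt(n_eff + n_cur)
  # Reject H0: mu <= 0 when the posterior probability of H0 is below 0.025.
  pnorm(0, mean = post_mean, sd = post_sd) < 0.025
}

# Type I error at an external mean of 0.3: mean(replicate(20000, one_run_cjgi2(0, 0.3)))
# Power at a current mean of 0.3:          mean(replicate(5000, one_run_cjgi2(0.3, 0.3)))
```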
Table 1
Type I error and power for the CJGI2 scheme with varying external means (column headers give the mean for the external data)

| Total external data points | | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 50 | Type I error | 0.079 | 0.094 | 0.106 | 0.085 | 0.062 | 0.042 |
| | Power | 0.873 | 0.912 | 0.928 | 0.925 | 0.889 | 0.886 |
| 500 | Type I error | 0.051 | 0.116 | 0.147 | 0.088 | 0.040 | 0.031 |
| | Power | 0.849 | 0.938 | 0.971 | 0.921 | 0.880 | 0.865 |
| 2000 | Type I error | 0.052 | 0.108 | 0.158 | 0.084 | 0.036 | 0.032 |
| | Power | 0.870 | 0.950 | 0.969 | 0.923 | 0.891 | 0.869 |
As one can see, for different amounts of external data, the type I error control is generally reasonable. The type I error is relatively higher near an external mean of 0.3 and decreases in both directions. This is because, at an external mean of 0.3, the evidence against the null mean of 0 is relatively strong and the borrowing is not heavily discounted, owing to the only moderate disagreement between 0 and 0.3. Toward the lower external means, the greater borrowing is offset by weaker evidence against 0; toward the higher external means, the stronger evidence against 0 is offset by heavier discounting due to the stronger disagreement. The total amount of external data may also interact with the type I error control and power performance, as it influences the stability of the estimates of the mean and variance for the complete set of external data.
The second type of simulation discounts the external data by leveraging both the I² statistic and the heterogeneity p-value, as also described in the method section. This method is denoted as the CJGI2PV scheme. The external data are again randomly generated with an SD of 1 given a specific mean.
Table 2
Type I error and power for the CJGI2PV scheme with varying external means (column headers give the mean for the external data)

| Total external data points | | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 50 | Type I error | 0.060 | 0.071 | 0.076 | 0.058 | 0.048 | 0.034 |
| | Power | 0.864 | 0.882 | 0.917 | 0.915 | 0.902 | 0.868 |
| 500 | Type I error | 0.052 | 0.089 | 0.093 | 0.058 | 0.034 | 0.0285 |
| | Power | 0.854 | 0.922 | 0.946 | 0.918 | 0.894 | 0.872 |
| 2000 | Type I error | 0.050 | 0.092 | 0.097 | 0.059 | 0.036 | 0.026 |
| | Power | 0.868 | 0.932 | 0.961 | 0.943 | 0.902 | 0.865 |
Based on Table 2, one can see that incorporating the heterogeneity p-value into the discounting better controls the type I error without sacrificing much power.
The operating characteristics of the CJGI2PV scheme in Table 2 are reasonably good, given that the frequentist powers are 85% and 95% with sample sizes of 100 and 150, respectively, at a mean value of 0.3.
One may argue that in real-life situations the external data are known, which means there is no variability. In Table 3, which is produced only for the CJGI2PV method, the external data are therefore fixed at the specified mean with a standard deviation of 1. The results are similar to those in Table 2 with a large number of external data points. As the mean and the standard deviation for the external data are fixed in Table 3, random numbers need not be generated for the external data, and the total number of external data points no longer matters.
Table 3
Type I error and power for the CJGI2PV scheme with the external mean and SD fixed (column headers give the mean for the external data)

| | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 |
| --- | --- | --- | --- | --- | --- | --- |
| Type I error | 0.053 | 0.095 | 0.105 | 0.058 | 0.035 | 0.031 |
| Power | 0.862 | 0.916 | 0.944 | 0.923 | 0.898 | 0.876 |
Under the normal distribution, the ideal discount function should be associated with the disagreement in the location parameter adjusted by the variance. Putting the location difference and the variance together, a natural candidate for measuring disagreement is the coefficient of variation (CV). Figure 1, generated with R, illustrates the inefficiency of using the I² statistic alone: essentially no discounting occurs for CV up to 1 because of the statistic's inertia to discount when the two sources of data are in fair proximity. On the other hand, the heterogeneity p-value from the meta-analysis discounts aggressively: approximately 70% of the external data has been discounted away at a CV of 1. The discounting controlled by the average of the I² statistic and the heterogeneity p-value is more reasonable. In general, the discounting can be controlled by \(w\times (1-I^{2})+(1-w)\times p_{h}\), where w varies between 0 and 1. For serious adverse events such as death, w should be chosen smaller so that the discount is more aggressive in case of disagreement. If the discount induced by the p-value is still not sufficient, a customized discount function more aggressive than the bottom curve in Fig. 1 may be warranted. Interestingly, the I² and p-value discounts almost converge for large disagreement; the partnership between the I² statistic and the p-value is not a serendipitous finding. Based on Fig. 1, as long as the discount function is connected to the CV, which is a standardized and unified scale for disagreement, there is a wide class of discount functions to serve clinical needs under specific contexts. In the later examples, we will see that this combination of the I² statistic and the p-value works quite well without further fine tuning, though it may not hit the bull's eye.
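As a small illustration of this family, the sketch below implements the retained-borrowing fraction \(w\times (1-I^{2})+(1-w)\times p_{h}\); with w = 0.5 it reduces to the average used by the CJGI2PV scheme. The helper name retained_fraction is ours, and the numeric example uses the Case 1 values discussed below.

```r
# Retained fraction of the borrowing cap as a weighted blend of (1 - I^2)
# and the heterogeneity p-value; w = 0.5 gives the CJGI2PV average.
retained_fraction <- function(I2, p_h, w = 0.5) {
  w * (1 - I2) + (1 - w) * p_h
}

retained_fraction(I2 = 0, p_h = 0.3232)           # 0.6616, as in Case 1 below
retained_fraction(I2 = 0, p_h = 0.3232, w = 0.3)  # smaller w leans on the more aggressive p-value discount
```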
Automating the discount function, or finding a good guiding value for it, is a major component of the current Bayesian scheme. Haddad et al. used the Weibull cumulative distribution function, with the p-value as an input parameter, to model the power prior parameter; their method comes close to the CV linkage. Although the Weibull approach can control a range of type I error and power, it requires fine tuning in a two-dimensional space due to the location and shape parameters of the Weibull model, and it is less intuitive than the proposed method. Hobbs's commensurate method considers the difference in the study parameter yet still misses the intuitive and unified CV scale.
A systematic comparison between this method and the existing Bayesian methods is beyond the author's depth, as it may require years of practice to reach the required level of competency.
Examples
Two real-life examples are provided below to illustrate the utility of the proposed method.
Case 1: Campbell (2017) provided a review of Bayesian applications in medical device clinical trials. One example in his paper focuses on estimation: the historical data have a 20% event rate (50/250), while the current data have a 17% event rate (85/500). In such a case it is appropriate to use 0% as the reference to derive the "treatment effect". Under the CJGI2 approach, the borrowing is full, while under the CJGI2PV method, the effective borrowing is close to that from the hierarchical model described in Campbell's paper. If a performance goal (PG) were used in his example and the objective were to meet the PG, then the "treatment effect" estimation should be performed against the PG instead of 0%. Using the normal approximation, the standard errors of the treatment effect for the historical and current data are 0.0253 and 0.0168, respectively. As the sample size for the historical data is 50% of that for the current data, no adjustment of the standard error is necessary. Using a generic call to the meta-analysis function metagen() in R, the I² statistic and the heterogeneity p-value are 0% and 0.3232, respectively. The effective sample size borrowed from the historical data under the CJGI2 method is (1 − 0) × 250 = 250, while under the CJGI2PV method it is (1 − 0 + 0.3232)/2 × 250 ≈ 165. This effective sample size of 165 is similar to that described in Campbell's paper (slightly above 150, as quoted), which was derived post hoc (by back engineering) from a hierarchical Bayesian model.

We can also calculate the "Bayesian" point estimate and "credible interval" from the beta-binomial conjugate, starting from a non-informative super prior Beta(1, 1). The effective historical sample size of 165 corresponds to an event count of 33 given the observed event rate of 20%, leading to an updated beta prior of Beta(34, 133). The resulting posterior distribution is Beta(119, 548), with a point estimate of 0.178 and a 95% credible interval of (0.149, 0.207), both close to the results reported by Campbell.
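The Case 1 computation can be sketched in R as follows. The calculation assumes the two-study metagen() call and normal approximation described above; the element names I2 and pval.Q are as understood from the meta package, and the non-integer effective counts are passed directly to the beta posterior.

```r
# Case 1: CJGI2PV borrowing and beta-binomial posterior (sketch).
library(meta)

p_hist  <- 50 / 250                                # historical event rate (20%)
p_cur   <- 85 / 500                                # current event rate (17%)
se_hist <- sqrt(p_hist * (1 - p_hist) / 250)       # ~0.0253
se_cur  <- sqrt(p_cur  * (1 - p_cur)  / 500)       # ~0.0168

m <- metagen(TE = c(p_cur, p_hist), seTE = c(se_cur, se_hist))
retained <- (1 - m$I2 + m$pval.Q) / 2              # (1 - 0 + 0.3232) / 2
n_eff    <- retained * 250                         # ~165 effective historical patients
x_eff    <- n_eff * p_hist                         # ~33 effective historical events

# Beta-binomial updating from the non-informative super prior Beta(1, 1).
a <- 1 + x_eff + 85
b <- 1 + (n_eff - x_eff) + (500 - 85)
round(c(estimate = a / (a + b),
        lower = qbeta(0.025, a, b),
        upper = qbeta(0.975, a, b)), 3)            # ~0.178 and (0.149, 0.207)
```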
Case 2: For the XIENCE diabetic indication, a hierarchical Bayesian model following Pennello and Thompson (2008) was employed to combine the data from four historical trials with the prospective data from two clinical centers (FDA 2015). The analysis is that of a single-arm study against a PG of 14.8%. The primary endpoint is the 1-year target vessel failure (TVF) rate, where TVF is a composite endpoint consisting of cardiac death, target-vessel myocardial infarction, and ischemia-driven target-vessel revascularization. The four historical trials have event rates of 7.8% (34/433), 11.8% (14/119), 7.3% (13/178), and 3.6% (6/169), respectively, with a total combined sample size of 899. The two clinical centers for the prospective data are combined into a single cohort for convenience, with an event rate of 8% (21/261). The differing historical event rates reflect a combination of factors, such as different populations and small sample sizes, with the latter contributing to higher variability; for example, the highest event rate of 11.8% comes from the long-lesion patients with the smallest sample size.

How to handle the historical data is subject to many considerations. For convenience of illustration, we use a random-effects meta-analysis to integrate the historical data, which gives an overall event rate of 7.3% with a 95% CI of [5.3%, 10%] and a standard error of approximately 0.012. The maximum number of patients to be borrowed is assumed to be 50% of the prospective data, which translates into 130; the standard error of the historical meta-analysis mean, adjusted to this maximum borrowing, is 0.032. The "treatment effects" against the PG for the prospective data and the maximum borrowable historical data are −0.068 (0.08 − 0.148) and −0.075 (0.073 − 0.148), respectively, with associated standard errors of 0.017 and 0.032. The generic meta-analysis between the two sources gives an I² value of 0 and a heterogeneity p-value of 0.8468, so the proportion of the borrowing retained is (1 − 0 + 0.8468)/2 ≈ 0.92, which leads to an effective sample size of 120 for the historical data. With an effective sample size of 120 and the observed rate of 7.3%, the associated number of events is approximately 9 after rounding. Combining the prospective data (21/261) and the effective historical data (~9/120) through the beta-binomial conjugate, starting from a non-informative super prior Beta(1, 1), yields a posterior beta distribution of Beta(31, 360), resulting in a Bayesian point estimate of 7.9% and an associated 95% credible interval of (5.3%, 10.6%). These results are again very close to the submitted Bayesian analysis results, which went through extensive simulations with the WinBUGS software and involved several rounds of regulatory interactions. In contrast, the computation using the proposed method is relatively intuitive and straightforward.
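The heterogeneity assessment and the effective historical sample size for Case 2 can be sketched the same way; the treatment effects and standard errors below are the values quoted above, the element names again assume the meta package, and the final beta-binomial updating follows the same pattern as in the Case 1 sketch.

```r
# Case 2: discounting of the historical meta-analysis result (sketch).
library(meta)

# "Treatment effects" vs. the PG of 14.8% and their standard errors,
# for the prospective data and the maximum borrowable historical data.
te <- c(0.080 - 0.148, 0.073 - 0.148)    # -0.068 and -0.075
se <- c(0.017, 0.032)

m <- metagen(TE = te, seTE = se)
retained <- (1 - m$I2 + m$pval.Q) / 2    # (1 - 0 + 0.8468) / 2, about 0.92
n_eff    <- retained * 130               # ~120 effective historical patients
x_eff    <- round(n_eff * 0.073)         # ~9 effective historical events
c(retained = retained, n_eff = n_eff, events = x_eff)
```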