2.1 Patients
We will include adults (18 years or older) who are diagnosed with either early RA (2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria) [18, 19] or established RA (1987 classification criteria) [20]. Patients with inner organ involvement (such as interstitial lung diseases), vasculitis, or concomitant other systemic autoimmune diseases will be excluded. We will include both treatment naïve patients and patients who have insufficient response to previous treatments. We will include patients with moderate to severe disease activity based on any validated composite disease activity measures. Patients who have already achieved remission or at low disease activity at baseline will be excluded. Patients who have used certolizumab (CTZ) within 6 months before randomization will be excluded.
2.2 Interventions
We will include RCTs which compare certolizumab (CTZ) plus methotrexate (MTX) with MTX monotherapy, regardless of doses. If a study compares CTZ + any csDMARDs with any csDMARDs, we will only include patients on CTZ + MTX or MTX from that study. Trials investigating the tapering or discontinuation strategy of CTZ will be excluded.
2.3 Outcomes
Our primary outcome is efficacy defined by disease states, which is achieving either remission (based on ACR-EULAR Boolean or index-based remission definition [21]) or low disease activity (based on either of the validated composite disease activity measures [22]: DAS28 (Disease Activity Score based on the evaluation of 28 joints) ≤3.2 [23], CDAI (Clinical Disease Activity Index) ≤10 [24], SDAI (Simplified Disease Activity Index) ≤11 [25]) at three months (allowance: 2-4 months) after treatment, as a binary outcome. We choose it as our primary outcome because it is suggested as the indicator of the treatment target in both the practice guideline [1] and the guideline for conducting clinical trials in RA [2], and it has been shown to provide more information for future joint damage and functional outcomes compared to relative response (change from baseline) [26].
We have two secondary outcomes. One is efficacy defined by response (improvement from baseline), for which we will use the ACR response criteria ACR50 (50% improvement based on ACR core set variables) [27]. Another is adverse events (AEs). We will perform an IPD-MA separately for patients with all kinds of infectious AEs within 3 months since it is one of the most important AEs for biologic agents. We will also describe other noticeable AEs within 3 months reported in the trials. We will not make predictions models for the secondary outcomes.
2.4 Patient or trial characteristics to be used as potential covariates in the prognostic model
Based on the literature [28-30] and our clinical practice, we propose the following factors as candidates of potential prognostic factors (PFs, baseline factors that may affect the response regardless of the treatment) (Table), which will be used for the baseline risk score development (see section 2.8.2 below). We will try to collect all the information listed below from the data, but only available factors that have been recorded in the trials will be added into the model. We will decide in which type (e.g., continuous, categorical, binary, etc.) a covariate will be put into the model according to the distribution of that covariate after we obtain the data.
2.5 Data collection
We will conduct an electronic search of Cochrane CENTRAL, PubMed, and Scopus, with the keywords: “rheumatoid arthritis”, “certolizumab” or “CDP870” “Cimzia”, “methotrexate” or “MTX”, without language restrictions. We will search WHO International Clinical Trials Registry Platform to find the registered studies. We will search the US Food and Drug Administration (FDA) reports to see if there are any unpublished reports from the pharmaceutical companies. We will select the double-blind RCTs according to the eligibility criteria listed above. For IPD, we will contact the company which markets certolizumab and request IPD through http://vivli.org. We will assess the representativeness of the IPD among all the eligible studies.
2.6 Risk of bias assessment
Two independent reviewers will assess the risk of bias (RoB) for each included RCT according to ‘RoB 2 tool’ proposed by the Cochrane group [32]. For the efficacy primary outcome, RCTs will be graded as “low risk of bias”, “high risk of bias”, or “some concerns” in the following five domains: risk of bias arising from the randomization process, risk of bias due to deviations from the intended interventions, missing outcome data, risk of bias in measurement of the outcome, risk of bias in selection of the reported result. The assessment will be adapted for IPD-MA, i.e. as per the obtained data and not the conducted and reported analyses in the original publications. Finally, they will be summarized as an overall risk of bias according to the RoB 2 algorithm.
2.7 Average relative treatment effect: IPD-MA
We first synthesize the data using one-stage Bayesian hierarchical IPD-MA [33]. We will estimate the average relative treatment effect in terms of odds ratio (OR) for efficacy.
Let yij denote the dichotomous outcome of interest (yij = 1 for remission or low disease activity), for patient i where i = 1, 2, ..., nj in trial j out of N trials, tij be 0/1 for patient in control/intervention group, and pij is the probability of having the outcome.
Where αj is the log odds of the outcome for the control group, in trial j, which is independent across trials; δj is the treatment effect (log OR), which we assume to be exchangeable across trials; δ is the summary estimate of the log-odds ratios for the intervention versus the control arm, and τ2 is the heterogeneity of δ across trials. and normally distributed across trials.
2.8 Predicting treatment effect for patients with particular characteristics: a two-stage model
2.8.1 Data pre-processing
Within each study the outcomes and the covariates will be evaluated for missing data, and we will further look at their distributional characteristics and correlations between the covariates (listed in section 2.4 and 2.5). We will use multiple imputation methods for handling missing data [34]. We will consider data transformation for continuous variables to resolve skewness and re-categorization for categorical variables if necessary. If two or more variables are highly correlated we will only retain the variable that is most commonly reported across studies and in the literature or the variable that has the least missing values.
2.8.2 Stage-one: developing a baseline risk model
In this step we will build a multivariable model to predict the probability that a patient, given her or his baseline characteristics, is likely to achieve remission or low disease activity irrespective of treatment, we will refer to this model as the baseline risk model. The risk model can be built using the patients from the control group only, or from both intervention and control group. The former is more intuitive, however, a simulation study indicated that models based on the whole participants produced estimates with narrower distribution of bias and were less prone to overfitting [35]. We will fit a multivariable logistic regression model:
rij is the probability of the outcome for patient i from trial j at the baseline. b0j is the intercept, which is exchangeable across studies. PFijk denotes the prognostic factor (in total there are p prognostic factors) in study j for patient i, and bkj is the regression coefficient for k prognostic factor in study j and is exchangeable across studies.
In order to select the most appropriate model, we propose two approaches: (1) use previously identified prognostic factors and through discussions with rheumatologists to decide the subset of the most clinically relevant factors and estimate the coefficients using penalized maximum likelihood estimation shrinkage method; (2) use LASSO penalization methods for variable selection and coefficient shrinkage [36].
For each possible model, we will examine the sample size first, in order to assess the reliability of the model. We will calculate the events per variable (EPV) for our model, using all the categories of categorical variables and the degrees of freedom of continuous outcomes [37]. We will calculate efficient sample size for developing a logistic regression model [38]. Validation is essential in prediction model development. Since no external data is available, we can only use internal validation. Via resampling methods like bootstrap or cross-validation we can estimate the calibration slope and c-statistic for each model, to indicate the ability of calibration and discrimination.
2.8.3 Stage-two: developing a meta-regression model for treatment effects
We use the same notation system as that in section 2.7. The rij from stage-one will be used as a covariate in the meta-regression model; both as a prognostic factor and as an effect modifier. Let denote the average of logit-risk for all the individuals in study j. The regression equation will be:
αj is the log odds in the control group when a patient has a risk equal to the mean risk, which is assumed to be independent across trials. g0j is the coefficient of the risk score, while gj is the treatment effect modification of the risk score for the intervention group versus the control group; both are assumed to be exchangeable cross trials and normally distributed about a summary estimate γ0 and γ respectively.
2.8.4 Predicting the probability of having the outcome for a new patient
Assume a new patient i who is not from any trial j, has a baseline risk score calculated from stage-one. In order to predict the absolute logit-probability to achieve the outcome, we use:
We would have estimated δ, γ0 andγ in the meta-regression stage. We will estimate as the mean of logit(rij) across all the individuals and studies. For α, we will estimate it by synthesizing all the control arms. Then we can calculate the individual probability of the outcome both for the control and the intervention and estimate the predicted absolute and relative treatment effect.
To evaluate the performance of the two-stage prediction model, we will use internally validation methods via both the traditional measures, like c-statistic, and measures relevant to clinical usefulness.
2.8.5 Publication bias
Considering that we will probably not be able to include all the relevant research works, as some studies or their results were likely not published owning to non-significant results (study publication bias and outcome reporting bias) [39, 40]. We will evaluate this issue, by comparing the search and screening results (as we will try to retrieve possibly unpublished reports) with the IPD we can get. If necessary, we will address it by adding the study’s variance as an extra covariate in the final IPD meta-regression model (see section 2.8.3).
2.8.6 Statistical software
We will use R for our analyses. Stage 2 will be performed in a Bayesian setting using R2Jags. For the development of the baseline risk model, we will use the pmsampsize command to estimate if the available sample size is enough. We will examine the linear relationship between each one of the prognostic factors and the outcome via rcs and anova commands. The LASSO model will be developed using the cv.glmnet command. We will use the lrm command for the predefined model based on prior knowledge, and then for the penalized maximum likelihood estimation we will use the pentrace command. For the bootstrap internal validation (both for the baseline risk score and for the two-stage prediction model), we will use self-programmed R-routines.