Subgroup Identification for Time-to-event Data Based on AFT Model

Considering the problem of identifying subgroup in a randomized clinical trial with respect to survival time, we present an analysis strategy to find subgroup of enhanced treatment effect. We fit univariate accelerated failure time (AFT) models with covariate-treatment interactions to identify predictive covariates. The false discovery rate is controlled by Benjamini-Hochberg procedure. Then a composite score conversion is employed to transform the set of identified covariates for each patient into a univariate score. To classify patient subgroups, a change-point algorithm is applied to searching for the threshold cutoff instead of using the median. Moreover, we adopted a biomarker adaptive design to check whether the treatment effect exists within certain subgroup. The simulation results show that the change-point method is remarkably superior to the median cutoff particularly when the subgroup sizes vary considerably. Furthermore, the 2-stage adaptive design has good power properties in detecting treatment effect while the type I error is generally controlled. As an illustration, we apply the proposed methods to an AIDS study. In conclusion, when the sample size is sufficient and the censoring rate is mild, the AFT model combined with change-point algorithm performs well in identifying subgroup. We illustrate the proposed method to data from AIDS Clinical Trials Group (ACTG175). In this study, 2139 HIV-infected subjects were randomized to four different treatment groups, including zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine (zal) and ddI monotherapy. We focus on patients receiving treatments: ZDV+ddI (denoted as 1) and ddI monotherapy (denoted as 0). Among them, there are 522 receiving treatment 1 and 559 receiving treatment 0. And the censoring rate is 80.27% and 77.10%, respectively. We choose the days of observing the event ((i) a decline in CD4 T cell count of at least 50 (ii) an event indicating progression to AIDS, or (iii) death) as the response. The covariates include patient’s age and weight, the CD4 and CD8 counts (coded as CD40 and CD80 respectively), Karnofsky score (karnof), days of previously received antiretroviral therapy (preanti), hemophilia (hemo), homosexual activity (homo), history of intravenous drug use (drug), previous non-zidovudine antiretroviral therapy (oprior), previous zidovudine use in the 30 days (z30), previous zidovudine use (zprior), race, gender, antiretroviral history (str2), and symptomatic status (symptom). The first six variables are continuous while others are binary.


BACKGROUND
Personalized medicine, which tailors to a patient's genetic and other unique characteristic, is a rapidly emerging field of health care. The ultimate goal of personalized medicine is to optimize the benefit of treatment by prescribing the right drugs for the right patients with minimal side effects [1]. Today, many cancer treatments are being developed for targeted therapies [2][3][4], in which only a subpopulation is anticipated to benefit from the therapy. To ensure the effect of personalized medicine, it is essential to identify the subgroup who beneficially responses to the targeted treatment than others based on each patient's characteristic. For this reason, the subgroup analysis, if properly used, can contribute to more informed clinical decisions, improved efficiency of the treatment, and reduced cost and side effects.
The subgroup analysis has been studied by a number of scholars [5][6][7][8][9][10]. But almost all of these studies focus on continuous or binary outcome and relatively few methods pertain to right-censored survival endpoints, while it is common that the outcome of interest is a time to an event in clinical data.
For survival outcomes, development of predictive covariates for treatment selection mainly consists of [11]: predictive covariates identification, and subgroup detection. Predictive covariates identification refers to establish a mathematic model to screen the candidate covariates which contribute to diverse treatment effect of patients. Subgroup detection means to identify the potential subgroup with higher treatment response through a classification model.
When interest lies in the relationship between covariates and duration time, the most popular regression model in survival analysis is Cox proportional hazards model. Chen's research [12] used Cox models to identify predictive covariates. But it only evaluates the performance of classifiers and models when the subgroup proportion is 0.2 and 0.5. It is not clear whether and how the sample size, predictive covariate size and censoring rate affect its performance. Besides, it does not explain why the significant level is directly set to be 0.001 and 0.0025 in the subgroup identification when multiple univariate models are used. An alternative to the Cox regression model is the AFT model 4 which doesn't rely on the proportional hazard assumption. The AFT model simply regresses the logarithm of the survival time over the covariates and has an intuitive interpretation. Thus, our work extends that of Chen's [12] in some ways. Our study will focus on the AFT model with the Benjamini-Hochberg procedure to adjust the significance level. Additionally, we will assess its performance when the sample size, censoring rate, subgroup proportion or the number of predictive covariates changes, while they were not provided in Chen's study.
The remainder of the paper is organized as follows. Section 2 presents the proposed analysis strategy for subgroup identification. In Section 3, details are demonstrated for a simulation study.
In Section 4, an application to an AIDS study is described. The findings of the study are discussed in Section 5.

Predictive Covariates Identification
For a given patient, let TR denote the arm indicator (TR = 0 for control and TR = 1 for treatment), and zij represent the measurement for the j-th covariate in the i-th patient, and denote the observed survival time and is log( ) (i = 1,…,n), where n is total number of patients. Patients are randomly assigned to two arms.
Assume there exists two subgroups in the sampled patients: responder (g+) and non-responder where a significant interaction coefficient 3j indicates different treatment responses between the g+ and gpatients defined by the covariate zij; that is, the different zij values of the g+ and gpatients might contribute to diverse treatment effects. And the set of variables zij, for which the coefficient 3j is significant, are regarded as the so-called predictive covariates. Using Equation 1, however, the power to test for interaction effect 3j is often poor [13].
In contrast, Freidlin and Simon [14] proposed excluding the main effect of covariate. In our study, the univariate Weibull AFT model without term 1j zij is as follows.
Equation 2 may possess a higher power to detect an interaction effect than Equation 1. Detailed evaluation and comparisons between Equation 1 and Equation 2 will be evaluated in our study.
As mentioned above, either Equation 1 or Equation 2 must be performed for each covariate in the study so that all potential predictive covariates are identified. As it involves multiple hypotheses testing, the significance threshold should be adjusted. There are adjustment approaches based on the concept of false discovery rate (FDR) , which is the proportion of significant results that are false positives [15]. In our study, the Benjamini-Hochberg procedure is employed to control the FDR. The method sorts the m P-values in increasing order and finds the largest k for which where q is the false discovery rate desired. All hypotheses with index at most k are rejected.
Let U = {x1, x2, … , xk} denote the set of predictive covariates at the pre-specified false discovery rate of q=0.05 in the Benjamini-Hochberg procedure, where k is the number of significant x's. Based on the set U, a classification model will be developed for subgroup detection.

Subgroup Detection
Subgroup detection refers to selecting the responders in the patient population who tend to benefit from a particular treatment. It can generally be divided into two steps. The first step is to acquire each patient's response (score), which predicts each patient's treatment response on the basis 6 of the values of his/her predictive covariates x's. The second step is to dichotomize the patients by finding a cutoff-point for the predictive scores.
In the first step, let wj(j =1,…,k) be the weight assigned to the j-th predictive covariate.
According to the composite score proposed by Matsui et al. [16], the weights can be the estimated regression coefficient 3j (j = 1,…,k) from Equation 1 or Equation 2. Thus, the predictive score for a patient with a set of covariate values is l(x) = Σj wj xj= Σj 3j xj.
In the second step, it may be an intuitive and simple choice to serve a percentile of the predictive scores as the cutoff-point based on the subgroup proportion, while the median is the most shared option [17][18][19][20][21]. Unfortunately, this ratio remains a mystery in reality. Besides, Freidlin and Simon [22] proposed a voting-based classifier (VBC). But 2 pre-determined tuning parameters R and G are necessitated and their specifications are often subjective and require additional analyses.
Alternatively, we apply a change-point algorithm [23], which is a likelihood-based method and has been widely used to detect single or multiple change locations and divide data into a consecutive subset.
In our study, for a given ordered sequence of quantitative scores l={l (1) where P(.) is the probability density function associated with the distribution of the data, and ̂1 , ̂2 are the maximum likelihood estimates of the parameters. Once the cutoff-point l() is obtained, it is available to divide patients into responder and non-responder subgroup depending on their predictive score above or under l(). The R package changepoint [24] is adopted to implement the change-point algorithm. Besides, Chen et al. [12] proposed applying the change-point method to the VBC method so that there is no need to specify 2 tuning parameters.
Besides, it is also crucial to assess whether there exists subgroups prior to the subgroup detection. On the basis of the predictive covariates, we also apply a bootstrap likelihood ratio test 7 (LRT) proposed by Gail and Simon [25] to evaluate the heterogeneity of patients. Provided that the test is significant at a pre-specified  level, then we proceed to subgroup selection. Otherwise, we stop the process and conclude that the sampled patients are homogeneous. The R package mclust [26] is adopted to perform the LRT.

Biomarker Adaptive Design
When a clinical trial involves one or more subgroups, it is crucial to assess the treatment effect on the overall patients as well as the g+ subgroup [11]. Biomarker adaptive designs have been developed to evaluate the treatment effect on the g+ subgroup [14,27,28]. Simply, biomarker adaptive design (presented in Figure 1) consists of an overall test at significance level 1 and a subgroup test at 2 (1+2=, typically,  = 0.05). Firstly, a test of overall treatment effect is carried out at 1 using the whole patients. Secondly, comparison of control and treatment arms in the selected subgroup g+ is carried out at 2.  8 The pipeline of the proposed method is shown in Figure 2. Simulations are used to provide an evaluation of both the accuracy of subgroup detection procedure and the efficacy of treatment effect.

Accuracy of Subgroup Detection Procedure
We consider 100 variables divided into two blocks: the first k variables serve as predictive covariates and are generated from ( 1 , 1 2 )and ( 2 , 2 2 )for the covariates values in g+ and gpatients respectively. The remaining covariates are all generated from ( 2 , 2 2 ). We generate data sets of different sample sizes with equal number of subjects in both arms, various predictive 9 covariates size, or diverse subgroup proportion. Censoring time is simulated from the exponential distribution so that the censoring rate ranges from 10% to 65%. The settings for 1 , 2 , 1 , and 2 are to be 6, 8, 0.5, and 0.25, respectively.
Let yi be the logarithm of failure time and xij be the covariate for the i th individual in a random sample of size n. The survival time is simulated from the AFT model with the intercept term , the true regression coefficient vector , the random error i which follows the extreme value distribution and yields a Weibull regression model, and the scale parameter  which equals 1. As aforementioned, we only consider the situation where treatment is only effective among g+ patients (i.e., u00 = u10 = u01 < u11). According to Equation 3, yi is equivalent to 0 for all control arm patients, and equals to 0 + 1 + ∑ 1 and 0 + 1 + ∑ 2 for responders and non-responders respectively in the treatment arm. To satisfy the situation considered, we have In Figure 3, the performances of two predictive-covariate detection models and 4 classifiers are illustrated when the sample size, censoring rate, or subgroup proportion changes.
According to the results, model Eq_ex generally outperforms model Eq_in. An exception occurs in the model combined Composite M or VBC M, where Eq_ex yields higher sensitivity but lower specificity relative to Eq_in. In different situations, the sensitivity of four classifiers performs 10 similarly within the same model. As the sample size increases (shown in Figure 3.A), sensitivity of 4 classifiers shows dramatic increase. With growing censoring rate (presented in Figure 3.D), sensitivity of 4 classifiers declines dramatically. As the subgroup proportion rises (shown in Figure   3.G), sensitivity of 4 classifiers increases by about 10% and then maintains stable. In terms of specificity, the differences between classifiers with change-point and median become apparent.
High specificity of the classifiers with change-point (Composite C, VBC C) maintains stable while that of the classifiers with median (Composite M, VBC M) drops by 4% to 26% when the sample size increases (presented in Figure 3.B), and rises by 13% to 26% with growing censoring rate or subgroup proportion (shown in Figure 3.E, Figure 3.H), respectively. In generally, the change-point algorithm has an advantage in subgroup selection particularly when the sizes of the 2 subgroups differ substantially. As the subgroup proportion comes up to 0.5, the advantage fade away. When the enhanced treatment effect is fixed, the performance of four classifiers diverges little under different predictive covariates sizes (figure not shown). In a word, with the increase of sample size or decrease of censoring rate, Eq_ex combined with Composite C gradually shows its advantage in subgroup detection.

The Efficacy of Treatment Effect
As mentioned above, we propose using the 2-stage biomarker adaptive design to increase the power of detecting treatment effect.
The   13 We illustrate the proposed method to data from AIDS Clinical Trials Group Protocol 175 (ACTG175). In this study, 2139 HIV-infected subjects were randomized to four different treatment groups, including zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine (zal) and ddI monotherapy. We focus on patients receiving treatments: ZDV+ddI (denoted as 1) and ddI monotherapy (denoted as 0). Among them, there are 522 receiving treatment 1 and 559 receiving treatment 0. And the censoring rate is 80.27% and 77.10%, respectively. We choose the days of

CONCLUSION AND DISCCUSION
Traditional methods often deal with the average effect of a new treatment. It is likely that in the new treatment group, some patients with certain characters (the so-called responders) may have a different treatment effect from the other ones when the two treatment arms do not differ in overall survival. There has been study focusing on the model-based selection for survival outcomes using Cox model combined with a change-point algorithm [12]. This study has evaluated whether the strategy proposed by Chen [12] can be used to subgroup identification in the AFT model with Benjamini-Hochberg procedure to control the FDR. These evaluations are made through simulations conducted across different sample size, censoring rate, subgroup proportion or predictive covariate size. In our study, relatively high-sensitivity and high-specificity are found in subgroup detection and largely improved power is presented in 2-stage biomarker adaptive design across different situations for the univariate AFT model excluding the main effect (Eq_ex) combined with changepoint algorithm. Besides, its corresponding typeⅠerror is controlled well no matter whether LRT is conducted whereas that of Chen's [12] (approximately 0.14) is large without performing the LRT. This is mainly due to that the use of Benjamini-Hochberg procedure that can adjust the significant level in the predictive covariate identification, while it is not clear why the significant level of the Cox model one is directly set to be 0.001 and 0.0025 in Chen's study [12]. In addition, it is found that with the increase of sample size or decrease of censoring rate, Eq_ex combined with Composite C gradually shows its advantage in subgroup selection, but that is unknown in Chen's study [12].
In the real data analysis, classifier based on median and classifier based on change-point algorithm show similar effects, which is probably related to the high censoring rate (78.6%), and the small difference between g+ and gsubgroup sizes. Remarkably, it turns out that the identified responders enjoy a significantly beneficial treatment effect than the non-responders, irrespective of which therapy they received. Besides, the responders in both arms have the same therapeutic effect.
The above simulations and results are based on continuous variables. A future simulation to evaluate the performance with binary variables in our proposed analysis strategy is anticipated.
Besides, the R package mclust is used in the LRT test, where categorical variables are not allowed.
In the future, R package mixtools can be applied to data with both continuous and categorical variables. In addition, the univariate composite model only takes into account individual covariate contributions to the treatment response and it disregards the correlation among the covariates and their possible interaction effects. Possible alternative needs to be developed.
In predictive covariate identification with high dimensional survival data, in which the number of covariates is larger compared to sample size, a penalized AFT model based on the lasso, ridge or elastic net method may be used to screen out less useful covariates in future research. Furthermore, with the development of medical studies, more and more fatal diseases are now curable. Thus, there is an urgency to develop a method to identify cured subgroup patients in treatment group. The mixture cure model is a special type of survival models and it assumes that the studied population is a mixture of susceptible individuals who may experience the event of interest, and cure/nonsusceptible individuals who will never experience the event. It is anticipated to apply the mixture cure model to find out the cure subgroup in future study.