Improved Confidence Intervals for Fixed Term Survival Probabilities in a Small Two-Arm Trial

Background
The confidence interval for the survival probability at a fixed time point provides valuable information on how subjects perform in terms of survival rate. However, in a two-arm trial, when the sample size in each group is small or when the distribution of events within a group is skewed, the confidence interval may become very unstable and thus may not provide accurate information for estimating the survival rate. In addition, when other covariates are available in the dataset, it is important to select the significant variables and include them in the model. Moreover, researchers such as physicians, who pay more attention to the final result, often analyze the treatment group and the control group separately, which may lead to inaccurate prediction.

Methods
In this study, the two treatment groups are combined, and the group indicator variable is treated as a covariate and included in the model. Yuan and Rai's adjusted effective sample size methods are extended, along with the Cox proportional hazards model, the Weibull model, and the log-logistic model, to compute predicted fixed-term overall survival probabilities and the corresponding confidence intervals adjusted for other covariates. Simulations are conducted to obtain coverage probabilities.

The data used in this paper come from a randomized clinical trial conducted by the Radiation Therapy Oncology Group [7]. The dataset is publicly available; therefore, neither ethical approval nor informed consent is needed for our study. The entire trial contains data from 15 sites with 16 participating institutions; however, in this paper, only the data on three sites with the six largest institutions are used. At the beginning of the study, 193 patients were randomly assigned to two treatment groups. Group one (radiation therapy only) has 99 patients, 27 of whom are censored. Group two (radiation therapy with a chemotherapeutic agent) has 94 patients, 26 of whom are censored. Other variables include sex, age, condition, T-staging, and N-staging. Summary statistics can be found in Table 1.

To gain a good understanding of the data, survdiff in R is used to test whether there is any difference between the two treatment groups. According to the p-value (p = 0.3), the two treatment groups are not significantly different. In this case, a closer look at the confidence interval becomes necessary. Zhu et al. [8] evaluate several test procedures for comparing survival functions when the data are interval-censored and the censoring distributions are unequal. A similar approach could be tested for right-censored data in future research.

The Kaplan-Meier curve can be easily produced in R, and its confidence interval can also be obtained. Note that the default method in R for calculating the confidence interval is the Greenwood (log) method, which can be treated as a Wald confidence interval and has been shown not to be robust regardless of the sample size [9]. The Kaplan-Meier estimate of S(t) is

$$\hat{S}(t) = \prod_{i=1}^{k}\left(1 - \frac{d_i}{n_i}\right), \tag{1}$$

where $d_i$ and $n_i$ are the number of deaths and the number of patients at risk at time $t_i$, respectively, and the product runs over the $k$ distinct event times up to $t$. The R code is as follows:

Agresti-Coull-Peto
Brown et al. [9] recommend the Agresti-Coull (AC) interval when the sample size is greater than 40. It is based on the score interval and appears to be a better way to calculate the confidence interval. Moreover, Yuan and Rai suggest that combining the Agresti-Coull interval with Peto's adjusted effective sample size provides better coverage probability [1]. The AC confidence interval can be written as

$$\tilde{p} \pm z_{1-\alpha/2}\sqrt{\frac{\tilde{p}\,(1-\tilde{p})}{n + z_{1-\alpha/2}^{2}}}, \tag{2}$$

where $\tilde{p} = \dfrac{M + z_{1-\alpha/2}^{2}/2}{n + z_{1-\alpha/2}^{2}}$ and $z_{1-\alpha/2}$ is the critical value at the 95% confidence level [10].

Here M is defined as the number of estimated events. The sample size n also needs to be adjusted using Peto's effective sample size: n is replaced by $n'$ [11], which is defined as the number of observations that remain at risk at time t divided by the survival probability at t, that is, $n' = n_t / \hat{S}(t)$, where $n_t$ is the number at risk at t. Replacing n with $n'$ in equation 2 yields the Peto-adjusted confidence interval.
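As an illustration only, and not the authors' R code, the following Python sketch applies equation 2 with Peto's effective sample size in place of n. Two points are assumptions rather than statements from the text: the estimated number of events M is taken to be the effective sample size times one minus the survival estimate, and the resulting interval for the event probability is flipped onto the survival scale. The function name `ac_peto_interval` and the input values are hypothetical.

```python
import math


def ac_peto_interval(surv, at_risk, z=1.959964):
    """Agresti-Coull interval for S(t) with Peto's effective sample size.

    surv    : Kaplan-Meier estimate of S(t), assumed to lie in (0, 1]
    at_risk : number of subjects still at risk at time t
    z       : standard normal critical value (about 1.96 for 95%)
    """
    n_eff = at_risk / surv                  # Peto's effective sample size
    m = n_eff * (1.0 - surv)                # estimated events (assumption)
    n_tilde = n_eff + z ** 2
    p_tilde = (m + z ** 2 / 2.0) / n_tilde  # adjusted event probability
    half = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    # Interval for the event probability, flipped to the survival scale
    # (assumption) and truncated to [0, 1].
    lower = max(0.0, 1.0 - (p_tilde + half))
    upper = min(1.0, 1.0 - (p_tilde - half))
    return lower, upper


# Hypothetical inputs: S(t) = 0.60 with 30 subjects still at risk.
lower, upper = ac_peto_interval(surv=0.60, at_risk=30)
print(round(lower, 3), round(upper, 3))
```

Because the adjusted centre is pulled toward 0.5, the interval is not symmetric around the Kaplan-Meier estimate, which is part of why it behaves better than a Wald interval near the boundaries.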

The R code is as follows:

The survival probability at a fixed time point can be written in terms of the scale parameter and the shape parameter. The confidence interval can also be

The survival probability at a fixed time point can be written in terms of the scale parameter and the shape parameter. To calculate confidence intervals,

Therefore, the confidence interval for the survival function can be written as

Similarly, flexsurvreg can be used to obtain the survival probability and confidence interval. The same approach that was used in the Cox regression and Weibull models to fix a variable at a certain level can also be used in the log-logistic model. The R code is as follows:
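As an illustration only, and not the authors' R code, the sketch below evaluates Weibull and log-logistic survival functions at the fixed time points studied in the paper. It assumes the shape/scale parameterization documented for the `weibull` and `llogis` distributions in flexsurv, namely S(t) = exp(-(t/scale)^shape) and S(t) = 1 / (1 + (t/scale)^shape); the paper's own parameterization may differ. The function names and parameter values are hypothetical and are not fitted to the trial data.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape), assumed parameterization."""
    return math.exp(-((t / scale) ** shape))


def loglogistic_survival(t, shape, scale):
    """S(t) = 1 / (1 + (t / scale) ** shape), assumed parameterization."""
    return 1.0 / (1.0 + (t / scale) ** shape)


# Hypothetical parameter values, chosen only to show the call pattern.
for months in (3, 6, 12, 18):
    s_weibull = weibull_survival(months, shape=1.2, scale=14.0)
    s_llogis = loglogistic_survival(months, shape=1.5, scale=10.0)
    print(months, round(s_weibull, 3), round(s_llogis, 3))
```

Under this parameterization the log-logistic scale is the median survival time, since S(scale) = 0.5, which gives a quick sanity check on fitted values.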

Of the 193 patients, approximately 27% are censored (Table 1). The proportion of events within (Table 3). In terms of coverage probability, AC-Peto has better coverage than Kaplan-Meier but is close to Wilson-Peto in the early stage, whereas at later time points AC-Peto has the best coverage among all methods (Table 4).
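To make the notion of coverage probability concrete, the following Python sketch estimates by simulation how often a 95% Agresti-Coull interval contains the true event probability. It is a deliberately simplified stand-in for the paper's simulations: it assumes plain binomial data with no censoring and no Peto adjustment, and the sample size, true probability, replicate count, and seed are hypothetical choices.

```python
import math
import random


def agresti_coull(events, n, z=1.959964):
    """Agresti-Coull interval for a binomial proportion."""
    n_tilde = n + z ** 2
    p_tilde = (events + z ** 2 / 2.0) / n_tilde
    half = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


def simulate_coverage(true_p, n, n_rep=2000, seed=1):
    """Fraction of simulated intervals that contain true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Draw one binomial sample as a sum of n Bernoulli trials.
        events = sum(rng.random() < true_p for _ in range(n))
        low, high = agresti_coull(events, n)
        if low <= true_p <= high:
            hits += 1
    return hits / n_rep


coverage = simulate_coverage(true_p=0.3, n=40)
print(f"estimated coverage: {coverage:.3f}")
```

A method with good coverage returns a value close to the nominal 0.95; values well below it indicate intervals that are too narrow or badly centred.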

To better predict survival outcomes, significant covariates must be taken into consideration. are relatively close among all levels (Table 7). Therefore, combining level 1 and level 2 is reasonable. In the variable Condition, the distribution of subjects is skewed: level 1 has around 73% of the total, but level 3 and level 4 have only 3% and 0.5%, respectively (Table 7).

Taking a closer look at the distribution of events, because of the small sample size of level 4, the proportion of events there can only be 100% or 0%. This provides little useful information. Based on the definition of this variable, people with a higher level of the condition tend to have a higher risk, so it is reasonable that the proportion increases from level 1 to level 3. In this study, level 3 and level 4 are combined for computation.

In group 1, log-logistic has the highest survival probabilities at all time points (Table 8). Weibull has relatively higher survival probabilities than Cox in the later stage, but the reverse holds in the early stage. In terms of confidence intervals, at 3 months the log-logistic interval is around 15% shorter than Cox, and about 21% shorter at 6 months ( least coverage probabilities at all tested time points (Table 11). Log-logistic has slightly better coverage than Cox at 3 months, 6 months, and 12 months, but not at 18 months.

In group 2, Weibull and log-logistic have higher survival probabilities at most time points (Table 8). Confidence intervals follow the same pattern as in group 1 (Table 9). In terms of coverage probability, log-logistic shows the best improvement at all time points (Table 11). Weibull has better coverage than Cox at earlier stages but becomes worse at later stages.

Further comparisons were made between semi/parametric models and AC/Wilson-Peto methods.

As seen in Table 10, in both groups the semi-parametric and parametric models produce shorter confidence intervals than the AC/Wilson-Peto methods in earlier stages, but in the long term the basic models tend to perform better. In terms of coverage, the semi-parametric and parametric models produce better coverage than the basic models only at 3 months in group 1 (Table 12). Survival curves for Kaplan-Meier, AC-Peto, and Cox regression can be found in Figure 2. All methods compared to Kaplan-Meier can be found in Figure 3 and Figure 4. In most cases, the semi-parametric and parametric models produce shorter confidence intervals in the early stage, but the pattern does not hold for later stages. Similarly, coverage is higher at early stages for the semi-parametric and parametric methods but becomes worse in the long term.

This paper illustrates the group effect in survival calculations with other covariates adjusted. The method can be extended to three or more groups. It can also be extended to determine whether treating group as a covariate benefits large samples as well. The method provides more useful and more accurate information to researchers and clinicians when making a survival prediction. Note that when the distributions of subjects and events are skewed in certain variables, it is important to determine the best way to combine levels within the variable. There are many ways to combine levels, such as collapsing them into two blocks. The best choice always depends on the dataset.

In this paper, the predicted survival probability is used rather than the estimate obtained directly from the data. The reason is that it is more accurate for fixed-term estimation. For example, suppose one event occurred at 9 months and the next at 13 months. If we want the fixed-term survival rate at 12 months (1 year), we know that the Kaplan-Meier estimate is constant between 9 and 13 months because it is a step function, but the interval in this case is very wide. If no event occurs at exactly 12 months, the number of observed events at that time is zero, but the size of the risk set is not. To make sense of the data, we need to calculate the estimated number of events at 12 months and then find the corresponding survival probability and confidence interval.
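The step-function behaviour described above can be sketched in Python with a small product-limit estimator. This is an illustration under stated assumptions, not the authors' R code: the helper names `kaplan_meier` and `survival_at` and the toy follow-up times are hypothetical, chosen so that consecutive events fall at 9 and 13 months.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (event_time, S(event_time)).

    times  : follow-up times in months
    events : 1 if a death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Value of the step function at time t (1.0 before the first event)."""
    s = 1.0
    for event_time, surv in curve:
        if event_time > t:
            break
        s = surv
    return s


# Hypothetical toy data with events at 2, 9, 13, and 15 months.
times = [2, 5, 9, 10, 13, 15, 20]
events = [1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(survival_at(curve, 9), survival_at(curve, 12), survival_at(curve, 13))
```

The estimate at 12 months equals the estimate at 9 months and only drops at 13 months, so the raw step function carries no information specific to the 12-month mark.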

In this paper, only 3-month, 6-month, 12-month, and 18-month survival are tested. The reason for doing so is that 1-year survival is a common clinical indicator for many terminal illnesses, thus, . This could also be tested in the future.

In summary, we have examined six methods for predicting overall survival probabilities and confidence intervals. Coverage probabilities for each method are obtained through simulation. In this paper, we combined the two treatment groups and treated the group indicator variable as a covariate. We included the group covariate in all of our parametric and semi-parametric models. Our aim was to see whether grouping has any impact on the model. We also wanted to see which method provides the best predictive estimates with this improved confidence interval calculation. Our overall aim is to provide a guideline on how basic survival data should be analyzed.