A univariable and multivariable analysis of right censored time-to-event data based on restricted mean survival time: A combination with traditional survival methods.

Background: Many traditional survival analyses and restricted mean survival time (RMST) based analyses have been proposed for dealing with right censored time-to-event data. It is necessary to sort out the conditions and relationship among these methods for instruction. Comparison between hazard ratio (HR) and RMST in a study may promote appropriate understanding and application of the two effect measures. Method : Traditional survival analyses and RMST-based analyses were integrated into a owchart and applied for univariable and multivariable analyses, RMST-based analyses included group comparison for difference in RMST, pseudo-value (PV) regressions, inverse probability of censoring probability (IPCW) regressions with group-specic weights and individual weights with homogeneity of censoring mechanism assumption. 4405 osteosarcomas were captured from Surveillance, Epidemiology and End Results Program Database. HR and difference in RMST (survival time lost or gain, STL or STG) were considered as effect measures and the effect of all included covariates on overall survival were reported for comparison. The relationship between difference in RMST and HR were explored when proportional hazard (PH) assumption was and was not met, respectively. Results: In group comparison and univariable regressions, using difference in RMST calculated by Kaplan-Meier methods as reference, PV regressions (R 2 =0.99) and IPCW regressions with group-specic weights (R 2 =1.00) provided more consistent estimation on difference in RMST than IPCW with individual weights (R 2 =0.09). In multivariable analysis, age (HR:1.03, STL: 3.86 months), diagnosis in 1970~1980s (HR:1.39 STL:27.49 months), metastasis (HR:4.47, STL: 202 months), surgery (HR:0.58, SLG:35.55 months) and radiation (HR:1.46, SLT:44.65 months) met PH assumption and were main independent factors for overall survival. In both univariable and multivariable variables, a robust negative logarithmic linear relationship between HRs estimated by Cox regression and differences in RMST by pseudo-value regressions was only observed when PH assumption


Introduction
Survival data are frequently adopted in prospective and oncological studies which involves both survival time (until the occurrence of an event of interest) and status (event or censor) [1]. This feature introduces a series of special methods called survival analysis. The features of censoring types make survival analyses different. The most common form of censoring is right censoring, which means the event of interest is beyond the end of follow-up. For such data, survival analysis generally starts from data exploration through plotting Kaplan-Meier (KM) curves, and then log-rank test or Wilcoxon test is carried out to compare survival curves. Finally, univariable and multivariable regressions are used to investigate the association between participants' survival time and one or more factors [2]. In both randomized controlled trials and real world studies, there is usually only one factor of interest, such as group variable that estimate intervention effect. Baseline covariates with potential imbalance and covariates with strong or moderate association with primary outcome can be adjusted by multivariable regressions, and unadjusted and adjusted effect measure of the one factor of interested is reported. When focusing on multiple factors, such as risk factors of disease, unadjusted and adjusted effect measure of each factor of interested is reported [3].
In most situations, hazard ratio (HR) from Cox proportional hazard regression (referred as Cox regression) is routinely considered as the preferred and main effect measure in oncology. Note that a key assumption for appropriately using Cox regression is proportional hazards (PH) which requires a constant HR over time [4]. PH assumption should be tested for each included covariate, which may increase workload. When PH assumption is not satis ed, estimated HR from Cox regression is invalid and incorrect conclusion may be achieved [5]. A time-varying HR can be estimated from time-varying Cox regression and parametric survival regression, but its explanation may be tricky. Moreover, the interpretation of HR may be a challenging since it is not an intuitively and clinically summary statistic.
Restricted mean survival time (RMST) has been promoted as an alternative and better summary statistic for survival data when PH assumption is violated [6,7]. Interpretation of one number (RMST) as an outcome shows advantages over the interpretation of time-varying number (HR). It is estimated through calculating the area under the survival curve between 0 and pre-speci c time (τ). RMST is a simple statistic in the presence of non-proportional hazards and can be interpreted in a clinically meaningful perspective. Difference and ratio are two expressions to describe the relationship between two RMSTs, difference in RMST as an absolute effect indicates that patient will live longer (positive) or shorter (negative) and its magnitude presents the size of average gain or lost in life expectancy within pre-speci c time (τ). Generally, unadjusted estimation on difference in RMST is completed adopting Kaplan-Meier method, while adjusted one via standardized survival curves or covariance analysis. In such situation, only one categorical variable of interest can be analyzed, such as interventional group and control group. Pseudo-value (PV) technique and inverse probability censoring weighting (IPCW) technique make it possible to explore the effects of several covariates on RMST in multivariable regression [8,9].
Log-rank test, Wilcoxon test and Cox regression are traditional and common survival analyses, while the estimation of difference in RMST, RMST-based regressions by PV and IPCW are considered as novel and promising model-free methods. Few studies make a summary and comparison among those methods to help researchers to choose correctly and conveniently. The relationship among HR, difference in RMST and coe cients estimated from RMST-based regression should be further discussed. In the current research, our objective is to try to build a clear ow chart for selection appropriate method, and to compare different methods, effect measure of individual covariates on overall survival were reported based on an actual oncological data.

Methods
Design and setting of the study The Surveillance, Epidemiology and End Results (SEER) Program Database was accessed and information on all osteosarcoma cases with complete data on status and survival time from 1970s to 2000s was captured. Cases who died or censored at time 0 were excluded from the analysis because of different processing strategies in general statistical softwares and potential risk of incorrect explanation.

Covariates and outcome measures
Patients' demographic covariates included age, sex, the year of diagnosis and race. According to age, the patients could be categorized into >60years and ≤60 years while in terms of race, they were into white/black and others. Data on tumor included tumor size, the extension of disease, and American Joint Committee on Cancer for staging (AJCC), all of which contained a type of 'UNKNOW'. Tumor size was categorized into >100mm, <100nm and 'UNKNOW'. Moreover, data on whether surgery, chemotherapy and/or radiation were performed were recorded, all of which were classi ed into "yes" and 'no". Patients' outcome variables of interest referred to overall survival, including dead for all causes or alive and survival time in months.

Flowchart for statistical methods
A owchart was designed to integrate conventional survival analyses and RMST-based analyses on right-censored data ( gure 1) and it was divided into two steps: Description and group comparison, univariable and multivariable regressions.
Description should be the rst step to describe survival curves and corresponding statistics for both single group or multiple groups. Graphic description, including KM curves, can be a preferred visualization. Group comparison is suitable for a categorical covariate. Comparing distributions of two or more survival rates is performed by Log-rank test or Wilcoxon test. PH assumption is required for log-rank test, but not for Wilcoxon test [10]. Meanwhile, difference or ratio in areas under survival curve between 2 or more groups can be estimated with pre-speci c truncation time,τ(RMST).
In univariable and multivariable regressions, we consider PH assumption as the rst key step to select suitable regression.
Compared with Cox regression, parametric survival regression required to test and pre-specify probability distribution of survival times, such as Weibull and Exponential distribution, meanwhile, time-dependent covariates (covariates changing over follow-up) and strati ed analysis controlling for nuisance covariates, cannot be incorporated easily. If PH assumption was satis ed, Cox regression would be a preferred and robust method because of its attractive features. If not, two modi ed Cox regression (Cox regression with time-dependent covariates and strati ed Cox regression) and parametric survival regression can be performed instead of standard Cox regression to estimate time-varying HR. RMST-based regression can be an alternative and promising method to deal with non-proportional hazard data, pseudo-value regression (PV) and inverse probability of censoring weighting regression (IPCW). Both of them assumes that censoring distribution is non-informative, where distribution of censoring time does not provide information for distribution of survival time.
In IPCW regression, censoring weights required to be estimated initially, which suggests censoring distribution must be correctly estimated [8]. IPCW method used Kaplan-Meier method to estimate the censoring distribution (modeling censoring mechanism) and conditional probability of having remained uncensored until time t is calculated as weight for each participant. Then, inverse of the estimated weights are included into iterative estimation of coe cients of IPCW regression. Use of Kaplan-Meier method assumes that all subjects share the same censoring distribution, namely homogeneity of censoring mechanism assumption. When a categorical covariate exists and censoring distribution in each group are not the same, it is appropriate to use groupspeci c weights where censoring mechanism keep homogenous in each group and weights were estimated using Kaplan-Meier technique for each group separately. Otherwise, biased weights will be used in coe cient estimation, resulting in incorrect conclusion. There are two methods in IPCW regression, IPCW regression with individual weights and IPCW regression with groupspeci ed weights. If only one categorical variable is of interest and homogeneity of censoring mechanism is questionable, it would be more appropriate to adopt IPCW regression with group-speci ed weights.
Pseudo-values technique has been developed for direct regression modeling survival function, RSMT and competing risks based on right censored data [11]. In PV regression on RMST, jackknife leave-one-out estimation is employed to generate pseudo-values. Firstly, the area under the Kaplan-Meier curve up to time τ is estimated as on the complete data with n sample size, then remove i th subject and repeat the above estimation to get leave-one-out estimator as . Finally, the i th pseudo-values of can be calculated based on the difference between the complete sample and the leave-one-out estimator: . After calculating the pseudo-value for each subject, generalized estimating equation is used to model the effects of covariates on the RMST [9]. Since no weights are estimated from censoring distribution, PV method is not restricted by the homogeneity of censoring mechanism assumption.
PH assumption and homogeneity of censoring mechanism need to check before selecting suitable methods. There are three different methods to evaluate PH assumption: 1) graphical assessment (KM curves and ln (-ln(S(t))) vs ln(t) Curves); 2) add a time interaction, x*log(t), into COX regression and test its signi cance; 3) check global good of tness by plotting and testing association between ranked survival time and Schoenfeld residuals [12,13]. violations of the PH assumption suggests existence of interactions between one or more covariates and time. Homogeneity of censoring mechanism was evaluated applying Kaplan-Meier curves and log-rank test. Note that the event of interest was coded as censoring while others as event to draw Kaplan-Meier plot and conduct log-rank test. The curves showing substantial separation or small P value from log-rank test supported the violation of the homogeneity of censoring mechanism, which suggested censoring patterns of groups were different.

Statistical analysis
The characteristics of the participants were summarized for descriptive statistics. Categorical variables were described using frequencies and percentages, while continuous ones as means with SD or as medians with interquartile ranges when data were skewed. Statistical inference, including conventional survival analyses and RMST-based analyses, complied with the owchart based on statistical knowledge ( gure 1).
Graphics assessment was selected because large sample size may make hypothesis tests in aforementioned method 2 and method 3 be sensitive to little deviation from PH assumption. This assumption would be met when curves in Kaplan-Meier plot did not cross and curves in the log (-log (survival)) versus log of survival time graph paralleled with each other approximately.
For comparing results among these methods, the two assumptions in group comparison and regressions were ignored. Theτwas speci ed utilizing the smallest value of the largest follow-up time across groups minus 5 months. Multiple comparison was employed in group comparison if there were more than 2 groups. Both univariable and multivariable regressions were conducted for Cox regressions and RMST-based regressions. Two modi ed Cox regressions and parametric survival regressions were not applied since estimated time-varying HR may be hard for comparison and explanation in this data. τspeci ed in univariable regression was same as that obtained in group comparison. In multivariable analyses,τwas xed at 480 months (24years).
To explore the in uence of PH assumption on relationship between HR and difference in RMST as well as possible transformation for meta-analysis, we performed data visualization and model tting for the relationship when PH assumption was met or not met. Since HR was expressed as ratio while coe cients based on RMST as difference, a log transformed nonlinear regression was considered when tting the relationship between HR and difference in RMST (difference in RMST l og e (HR)).
All analyses were carried out using the SAS software for windows, version 9.4 TS1M6 (SAS Institute Inc, Cary, NC) with a 2-sided signi cance threshold of P < .05. Description and group comparison were carried out by LIFETEST procedure, and regressions were realized by PHREG procedure, and RMSTREG procedure. Data visualization were conducted in Microsoft Excel 2016.

Results
Information for 4505 patients who met the inclusion criteria was extracted from SEER database. Of those patients, 100 patients with 0 survival time were excluded and 4405 were included for nal analysis. Of 4405, 2389(54.23%) deaths were documented at the end of study during a median follow-up of 35 months. Descriptive characteristics were summarized into table 1. The median age of these patients was 30 years old, and 19.09% of them were older than 60 years. About half of the patients were male (57.05%). The minority of the patients had tumors more than 100mm (4.74%), metastasis (5.93%), III stage (0.61%) and IV stage (8.94%). Furthermore, the majority of the patients experienced surgery (77.80%) and chemotherapy (67.17%), while few cases (14.10%) underwent radiation.

Group comparison and univariable regressions
Assumption tests for all covariates were summarized into table 2 and graphic assessment were collected into supplement. In group comparison (step 1), log-rank test, Wilcoxon test and unadjusted difference in RMST presented similar results at =0.05 in all covariates but sex. Since sex did not satisfy PH assumption, Wilcoxon test was more appropriate and showed no statistical difference between male and female patients. Difference in RMST, however, showed that male patients would live shorter by 24.31 months than female ones signi cantly (P=0.004). See in table 3.
In univariable regressions (step 2), majority of covariates were categorical variables. Using difference estimation in RMST based on Kaplan-Meier method as reference, PV regression and IPCW regression with group-speci ed weights exhibited similar effect (R 2 =0.99 and R 2 =1.00, respectively), but IPCW regression with individual weights showed inconsistent effects (R 2 = 0.09), see in table 3 and gure 2. Even though dichotomous age, sex, race and radiation met the homogeneity of censoring mechanism, IPCW regression with group-speci ed weights gave more consistent and stable estimates than those estimated by IPCW regression with individual weights (table 3 and gure 2). However, both sex, race and chemotherapy turned to be insigni cant in the IPCW regressions with group-speci c weights.
It was noticeable that group comparison in survival rate and IPCW regression with group-speci ed weights were no longer applicable for age as continuous variable. At this point, PV regression indicated that the average survival time was reduced by 4.09 months for each additional year in age (p<0.001). Meanwhile, similar regression coe cient, 4.35, was acquired by IPCW regression with individual weights.

Multivariable regressions with Cox regression and RMST-based regressions
In multivariable regressions, all covariates were included (table 4). IPCW regression with group-speci ed weights was not applicable since more than one categorical covariates broke the homogeneity of censoring mechanism. To compare with Cox regression, interactions between time and covariates that broke PH assumption were not included, their estimated HRs were a sort of average effect over time and may made misleading conclusions.
Compared with results from PV regressions, IPCW regression with individual weights presented great change in both P values and coe cients which suggested that individual weights was not appropriate. Negative difference in RMST was de ned as survival time lost (STL) and positive difference was de ned as survival time gain (STG). Moreover, HR was only presented for those covariates met PH assumption. The overall survival was independently associated with age

Relationship between HR and RMST
Relationship between HR estimated by COX regressions and coe cients estimated by PV regressions were presented in both univariable analysis and multivariable analysis (table3, table 4 and gure 3). In both univariable and multivariable analyses, when covariates met PH assumption, a well-tted and stable logarithmic relationships were observed between HR estimated by COX regressions and coe cients estimated by PV method (R 2 = 0.97 and R 2 =0.92). But for those covariates that broke PH assumption, the estimated HRs were biased, both coe cients and goodness of t of logarithmic relationship changed dramatically, -16.02 vs. -58.11 for coe cients and 0.39 vs. 0.91 for R 2 , respectively.

Discussion
The owchart designed for right-censored survival data could guide the selection and combination of appropriate survival analysis methods (traditional methods and RMST-based methods). The estimation of difference in RMST between groups demonstrated a more intuitive and clinical interpretation than HR, survival time lost and gain. In univariable and multivariable regressions for RMST, pseudo-value regression provided a more robust estimation than inverse probability censoring weighting regression with individual weighs. Although parameter estimates from IPCW regression with group-speci ed weights approximated to those from PV regressions in univariable regressions, statistical power decreased. In both univariable and multivariable regressions, a robust negative logarithmic linear relationship was observed between HRs estimated by Cox regression and coe cients (differences in RMST) by PV regression when PH assumption was hold.
Area under survival curve from time 0 to a pre-speci c follow-up point was called RMST. It can be considered as the average survival time until an event of interest occurs during a de ned period and be estimated without the limitation from proportional hazard assumption. If non-proportional hazards appears, biased estimates will be given by Cox proportional hazard model because HR is inconstant over time; meanwhile, clinical interpretation of HR may not be straightforward for clinicians [14].
Thereby, RMST and difference in RMST show several inferential and clinical advantages over other statistics in survival analysis. RMST and difference in RMST have been accepted by many researchers and wildly used. In a randomized controlled trial, to re ect the convenience of Internet-accessed sexually transmitted infection testing, difference in RMST was tested to assess difference in the time to test [15]. A prospective cohort study exploring the relationship between midlife cardiorespiratory tness and chronic obstructive pulmonary disease performed both Cox models and RMST analysis, estimated HR and changes in RMST of incidence of COPD and presented death simultaneously, which increased the reliability and understandability of the conclusion [16]. In addition to original studies, RMST and difference in RMST have been considered as effect measures in metaanalysis [17,18]. RMST was regarded as an obligatory end point, while HR and difference in RMST as complementary methods to summarize treatment effect in oncological trials [19].
Unadjusted difference in RMST can be easily achieved through calculating the difference in area under the two survival curves based on Kaplan-Meier method. The existence of confounders may distort true relationship between exposure and outcome, confounders-adjusted difference in RMST is essential. Furthermore, precision and statistical power can be improved after adjusting prognostic covariates when compared to RMST [20]. An approach developed by Zucker can estimate covariatesadjusted RMST effect based on Cox regression [21]. But, it was limited to calculate RMST effect of a binary variable of interest (such as experimental group and control group) adjusted for confounders. Firstly, coe cients and model-based covariance matrix are extracted using COX regression, then a covariate-adjusted area under survival curve is constructed for each of the two groups separately, adjusted difference in RMST can be acquired nally.
When the categorical variable of interest has two or more two groups, standardized survival curve can be an alternative way is to obtain adjusted RMST and its difference [22]. This method has two steps: 1) de ning a reference population for the confounders; 2) estimation of adjusted curves for that population. The de nition of reference population is a critical step which determines the generalization of adjusted survival curve. There are two main approaches to compute adjusted curves: marginal analysis (balance data on all confounders using reweighting before modeling) and conditional analysis (modeling a comprehensive overall model and getting average predicted curves from a series of predicted survival curves for any combination of confounders).
Both aforementioned methods must make a clear division of included covariates into effect of interest vs. confounders, and adjusted RMST can be only estimated for the categorical effect of interest. Complex process and di culty in extension limit their friendly applications in RMST estimation. In addition, PH assumption is assumed to be hold as utilization of Cox regression in Zucker's method. Likewise, when Cox regression is adopted in modeling part of standardized survival curves, meeting of PH assumption is also required.
RMST-based regressions, using generalized linear modeling technique, can study the effects of multiple factors or single factor on RMST without limitation on variable type and distinction between key variable and confounding variables. Compared to Cox regression based method and standardized survival curves, RMST-based regression could model RMST directly and facilitate model-based inference and prediction straightforward. Moreover, varied regression-based methods can be adopted, such as nonlinear tting, interaction effect and subgroup analysis.
Two types of direct modeling of RMST are developed. Pseudo-value regression was proposed by Anderson in 2003 to deal with time-to-event data and to make an inference [9]. Codes for pseudo-value method are available for three main platforms (SAS, R and Stata) [23,24]. Another RMST-based regression was based on inverse probability censoring weighting technique proposed by Tian in 2014 [8]. Weights estimation based on correctly speci ed censoring distribution is the key step. In univariable analyses of the current research, point estimation of difference of RMST were similar among Kaplan-Meier method, PV regressions and IPCW regression with grouped weights. When categorical covariate presents in IPCW regressions, individual weights or group-speci ed weights should be chosen based on the result of testing on homogeneity of censoring mechanism [25]. However, in this study, group-speci ed weights gave more accurate estimates than individual weights for both categorical covariates that met assumption and those that didn't meet. Even if categorical covariates satisfy homogeneity of censoring mechanism, censoring distribution in each subgroup can share similar parameters with that in complete data, group-speci ed weights can still work well. So, IPCW with group-speci ed weights may be a preferred method for categorical covariates. When more than one covariates included in multivariable regressions violate homogeneity of censoring mechanism assumption, it is hard to specify multiple strata in group-speci c weights. At this time, forced adoption of IPCW with individual weights presents biased and misleading estimates. In addition, inverse probability may create extreme weight when the probability is close to 0, and small sample size will make inverse probability instable [26]. With such restrictions on IPCW regression, PV regression should be a preferred method in exploring the effects of more than one determinants on survival time in RMST-based regressions.
Ludovic et al reconstructed individual patient data based on survival curves for 54 randomized controlled trials, and HRs, ratio of RMST and differences in RMST were estimated and compared using those data. Accordingly, an agreement on the direction of treatment effect was observed in 50 trials, but neither linear nor non-linear relationship was tted in their study. Furthermore, adjusted RMST cannot be derived based on KM curves in those trials for further comparison [27][28]. In the current study, a stable logarithmic relationship between difference in RMST by PV regression and HR was observed when proportional hazard assumption was held. Transformation between difference in RMST and HR can provide an alternative way to combine the effect size of meta-analysis in evidence-based medicine.
Generally, since a rise in HR corresponds to a reduction in restricted mean survival time, the sign slope of logarithmic transformed HR on difference in RMST is negative. RMST should be typically calculated over a de ned period that has adequate follow-up. Particular time τ speci ed for RMST-based analysis determines the magnitude of difference or ratio in RMST and whether the difference or ratio is statistically signi cant or not. The time-dependent attribute of RMST may make difference in RMST and ratio of RMST change over speci ed τ greatly. Type of end point, such as overall survival or progression-free survival, will also affect the estimation of difference or ratio in RMST. So comparison and transformation of difference and ratio in RMST should be cautious. Fortunately, a step by step tutorial about study design, sample size estimation, and the determination of τwith RMST has been illustrated in details [29].
A strength in this study is that a owchart for method selection has been made and discussed through comparing traditional methods to RMST-based analyses on oncological data with su cient sample size. In addition, a logarithmic-transformed relationship was visualized between HR and difference in RMST with satis ed and unsatis ed PH assumption. A limitation in this study was that details on statistical theories for PV regressions and IPCW regressions were not presented. Not all of survival analyses were presented in the owchart, survival analyses on left-censored data, interval-censored data and competing risks were not discussed here.

Conclusion
In conclusion, the owchart incorporated both traditional survival methods and RMST-based methods for analyzing right censored time-to-event data, and can instruct researchers and clinicians to select appropriate statistical methods for univariable analyses and multivariable analyses. Both PH assumption and the homogeneity of censoring mechanism assumption were two important nodes to determine appropriate selection. HR and difference or ratio in RMST can be reported with equal consideration to improve the communication of clinical evidence.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and materials: Data and SAS code can be available upon request.      Difference in restricted mean survival time estimated from 3 methods in univariable analyses