A univariable and multivariable analysis of time-to-event data based on restricted mean survival time: A combination with traditional survival methods.

Background Hazard ratio is considered as an appropriate effect measure of time-to-event data. However, hazard ratio is only valid when proportional hazards (PH) assumption is met. The use of the restricted mean survival time (RMST) is proposed and recommended without limitation of PH assumption. 4405 osteosarcomas were captured from Surveillance, Epidemiology and End Results Program Database. Traditional survival analyses and RMST-based analyses were integrated into a flowchart and applied for univariable and multivariable analyses, using hazard ratio (HR) and difference in RMST (survival time lost or gain, STL or STG) as effect measures. The relationship between difference in RMST and HR were explored when PH assumption was and was not met, respectively. In univariable analyses, using in by as reference, pseudo-value regressions and inverse probability of censoring probability more consistent estimation on surgery robust negative relationship estimated by when PH > 100 mm tumor size (HR:1.48, STL:49.56), metastasis and local invasion, higher AJCC staging, surgery (HR:0.58, STG:35.55), radiation (HR:1.46, STL:44.65) and chemotherapy (HR:1.26, STL:49.49).


Results
In univariable analyses, using difference in RMST calculated by Kaplan-Meier methods as reference, pseudo-value regressions (R2=0.99) and inverse probability of censoring probability (IPCW) regressions with group-specific weights (R2=1.00) provided more consistent estimation on difference in RMST than IPCW with individual weights (R2=0.09). In multivariable analysis, age (HR:1.03, STL:

Background
Survival data are frequently adopted in prospective and oncological studies which involves both survival time (until the occurrence of an event of interest) and status (event or censor) [1]. This feature introduces a series of special methods called survival analysis. The features of censoring types make survival analyses different. The most common form of censoring is right censoring, which means the event of interest is beyond the end of follow-up. For such data, survival analysis generally starts from data exploration through plotting Kaplan-Meier curves, and then log-rank test or Wilcoxon test is carried out to compare survival curves. Finally, univariable and multivariable Cox proportional hazard regressions are used to investigate the association between participants' survival time and one or more factors [2]. In most situations, hazard ratio (HR) from Cox regression is routinely considered as the preferred and main effect measure in oncology. Note that a key assumption for appropriately using Cox regression is proportional hazards (PH) which requires a constant HR over time [3]. PH assumption should be tested for each included covariate, which may increase workload.
A number of methods have been developed to assess if data meet the criteria of PH assumption, including graphical analysis (Kaplan Meier curves and estimated survival function with double logarithmic transformation), the use of time-dependent covariates and goodness of fit [4]. When PH assumption is not satisfied, estimated HR is invalid and incorrect conclusion may be achieved [5].
Moreover, the interpretation of HR may be a challenging since it is not an intuitively and clinically summary statistic.
Restricted mean survival time (RMST) has been promoted as an alternative and better summary statistic for survival data when PH assumption is violated [6,7]. It is estimated through calculating the area under the survival curve between 0 and pre-specific time (τ). RMST is a simple statistic in the presence of non-proportional hazards and can be interpreted in a clinically meaningful perspective.
Difference in RMST presents benefit or harm for the participants. Difference indicates that patient will live longer (positive) or shorter (negative) and its magnitude presents the size of average gain or lost in life expectancy within pre-specific time (τ). Generally, unadjusted estimation on difference in RMST 4 is completed adopting Kaplan-Meier plot, while adjusted one via standardized survival curves or covariance analysis. In such situation, only one binary variable of interest can be analyzed, such as interventional group and control group. Pseudo-value (PV) technique and inverse probability censoring weighting (IPCW) technique make it possible to explore the effects of several covariates on RMST in multivariable regression [8,9].
Log-rank test, Wilcoxon test and Cox proportional hazard regression are traditional and common survival analyses, while the estimation of difference in RMST, RMST-based regression by PV and IPCW are considered as novel and promising model-free methods. Few studies make a summary and comparison among those methods to help researchers to choose correctly and conveniently.
Furthermore, it was found that different conclusions were achieved from between estimated HR and difference in RSMT when reanalyzing 54 randomized trials [10]. The relationship among HR, difference in RMST and coefficients estimated from RMST-based regression should be further discussed. In the current research, our objective is to try to build a clear flow chart and to compare different methods based on actual oncological data.

Methods
Design and setting of the study The Surveillance, Epidemiology and End Results (SEER) Program Database was accessed and information on all osteosarcoma cases with non-missing outcome from 1970s to 2000s was captured.
Cases with a survival time of zero were excluded from the analysis because of lacking significant information.

Covariates and outcome measures
Patients' demographic covariates included age, sex, the year of diagnosis and race. According to age, the patients could be categorized into > 60 years and ≤ 60 years while in terms of race, they were into white/black and others. Data on tumor included tumor size, the extension of disease, and American Joint Committee on Cancer for staging (AJCC), all of which contained a type of 'UNKNOW'.
Tumor size was categorized into > 100 mm, < 100 nm and 'UNKNOW'. Moreover, data on whether surgery, chemotherapy and/or radiation were performed were recorded, all of which were classified into "yes" and 'no". Patients' outcome variables of interest referred to overall survival, including dead 5 for all causes or alive and survival time in months.
Flowchart for statistical methods a well-designed flowchart was designed to integrate conventional survival analyses and RMST-based analyses ( Fig. 1) and it was divided into difference section and regression section.
There were two key assumptions located in diamonds that need to check before selecting suitable methods, namely proportional hazard (PH) and the homogeneity of censoring mechanism.
Proportional hazard assumption was inspected employing two graphics. This assumption would be met when curves in Kaplan-Meier plot did not cross and curves in the log (-log (survival)) versus log of survival time graph paralleled with each other approximately. Homogeneity of censoring mechanism was evaluated applying Kaplan-Meier curves and log-rank test. Note that the event of interest was coded as censoring while others as event to draw Kaplan-Meier plot and conduct log-rank test. The curves showing substantial separation or small p value from log-rank test supported the violation of the homogeneity of censoring mechanism, which suggested censoring patterns of groups were different.
Difference analysis was suitable only for categorical variables. Comparing distributions of two or more survival curves was performed firstly. Log-rank test and Wilcoxon test were two common methods. PH assumption was required in log-rank test, but not for Wilcoxon test [11]. Meanwhile, difference in RMST between 2 or more groups was estimated. Truncation time,τ, was specified utilizing the smallest value of the largest follow-up time across groups minus 5 months.
PH assumption was the first key step to select suitable regression. If it was satisfied, cox proportional hazard model would be applied and HR would be calculated as effect measures. If not, RMST-based regressions would be applied. In general, there are two kinds of RMST-based regression, inverse probability of censoring weighting regression (IPCW) and Pseudo-value regression (PV).
IPCW method calculates weights using Kaplan-Meier technique and includes them into iterative estimation; and it implicitly assumes that all subjects share the same censoring mechanism, namely the homogeneity of censoring mechanism assumption. If this assumption is questionable, it would be more appropriate to adopt group-specified weights, in which censoring mechanism was homogenous 6 in each group and weights were estimated for each group. When performing multivariable regression, a series of covariates would be included, but only one categorical variable can be specified for calculating group-specified weights, so IPCW regression with group-specified weights was limited in multivariable analysis at present.
PV method employs jackknife leave-one-out estimation to generate pseudo-values, and then pseudovalues are used to model the effects of covariates on the outcome of interest through generalized estimating equation. PV method is not restricted by the homogeneity of censoring mechanism assumption.

Statistical analysis
The characteristics of the participants were summarized for descriptive statistics. Categorical variables were described using frequencies and percentages, while continuous ones as means with SD or as medians with interquartile ranges when data were skewed. Statistical inference, including conventional survival analyses and RMST-based analyses, complied with a well-designed flowchart based on statistical knowledge (Fig. 1).
For comparing results among these methods, all methods were conducted ignoring the two assumptions in difference analyses and regressions. Multiple comparison was employed in difference analyses if there were more than 2 groups. Both univariable and multivariable regressions were conducted for Cox proportional hazard regressions and RMST-based regressions.τspecified in univariable analyses was same as that obtained in difference analysis. In multivariable analyses,τwas fixed at 480 months (24 years).
To explore the relationship between HR and difference in RMST as well as the influence of PH assumption on such relationship, we performed data visualization and model fitting. Since HR was expressed as ratio while coefficients based on RMST as difference, a log transformed non-linear regression was considered when fitting the relationship between HR and difference in RMST (difference in RMST ~ log e (HR)).
All analyses were carried out using the SAS software for windows, version 9.4 TS1M6 (SAS Institute Inc, Cary, NC) with a 2-sided significance threshold of P < .05. Difference analysis was carried out by LIFETEST procedure, and regressions were realized by PHreg procedure, and RMSTreg procedure.
Data visualization were conducted in Microsoft Excel 2016.

Results
Information for 4505 patients who met the inclusion criteria was extracted from SEER database. Of those patients, 100 patients with 0 survival time were excluded and 4405 were included for final analysis. Descriptive characteristics were summarized into Table 1. The median age of these patients was 30 years old, and 19.09% of them were older than 60 years. About half of the patients were male (57.05%). The minority of the patients had tumors more than 100 mm (4.74%), metastasis (5.93%), III stage (0.61%) and IV stage (8.94%). Furthermore, the majority of the patients experienced surgery (77.80%) and chemotherapy (67.17%), while few cases (14.10%) underwent radiation. Assumption tests of proportional hazard and the homogeneity of censoring mechanism of all covariates on 10 covariates were summarized into Table 2. Dichotomous age, diagnosis year, race, the extend of disease, surgery status and radiation status met proportional hazard assumption, while 8 others failed to pass the test. Dichotomous age, sex, race and radiation met the homogeneity of censoring mechanism, and the resting 7 covariates failed, which implied IPCW regression with groupspecified weights was preferable to IPCW regression with individual weights. Table 2 Assumption test of proportional hazard and homogeneity of censoring mechanism

Curves
Graphical analysis Log-rank test  In Fig. 2 All covariates were included in multivariable regressions. IPCW regression with group-specified weights was limited. P values were highly consistent between COX regressions and PV regressions, but changed chaotically and greatly in IPCW regression with individual weights (  The relationship between HR and coefficients estimated by PV regressions were also fitted. Figure 4 showed that the slope of logarithmic transformed HR on coefficients from PV regressions changed slightly from − 109.3 to -128.2 with stably high goodness of fit (R2 = 0.92) when covariates meeting PH assumption, and the slope changed dramatically from − 16.02 to -57.92 but obtained increased goodness of fit (R 2 = 0.91) when covariates broke PH assumption. However, no significant and stable relationship was observed between HR and coefficients estimated by IPCW regression with individual weights in Fig. 5.

Discussion
Firstly, a flowchart was designed for right-censoring survival data to guide the selection and combination of statistical methods, namely traditional methods and RMST-based methods. The estimation of difference in RMST between groups demonstrated a more intuitive and clinical interpretation than HR, survival time lost and gain. In univariable analysis using RMST-based regressions, pseudo-value regression provided a more robust estimation than inverse probability censoring weighting regression with individual weighs regardless of the homogeneity of censoring mechanism assumption was met or not; meanwhile, although parameter estimates from IPCW regression with group-specified weights approximated to those from PV regressions, statistical power decreased. In both univariable and multivariable variables, a robust negative logarithmic linear relationship was observed between HRs estimated by Cox regression and coefficients (changes in RMST) by PV regression when proportional hazard assumption was hold.
Survival function was created for lifetime data using a Kaplan-Meier method (product limit estimator), the area under survival curve from time 0 to a pre-specific follow-up point was called RMST. It can be considered as the average survival time until an event of interest occurs during a defined period and be estimated without the limitation from proportional hazard assumption. If non-proportional hazards appears, biased estimates will be given by Cox proportional hazard model because HR is inconstant; meanwhile, clinical interpretation of HR may not be straightforward for clinicians [12]. Thereby, RMST shows several inferential and clinical advantages over other statistics in survival analysis. RMST has gradually become an essential effect measure in survival analysis. In a randomized controlled trial, to reflect the convenience of Internet-accessed sexually transmitted infection testing, difference in RMST was tested to assess difference in the time to test [13]. A prospective cohort study exploring the relationship between midlife cardiorespiratory fitness and chronic obstructive pulmonary disease performed both Cox models and RMST analysis, estimated HR and changes in RMST of incidence of COPD and presented death simultaneously, which increased the reliability and understandability of 13 the conclusion [14]. In addition to original studies, RMST and difference in RMST have been considered as effect measures in meta-analysis [15,16]. RMST was regarded as an obligatory end point, while HR and difference in RMST as complementary methods to summarize treatment effect in oncological trials [17]. In conclusion, RMST and difference in RMST have been accepted by many researchers and wildly used. Two types of direct modeling of RMST are developed using generalized linear modeling technique.
Pseudo-value regression was proposed by Anderson in 2003 to deal with time-to-event data and to make an inference [9]. In this method, the pseudo-value of RMST was firstly calculated using jackknife leave-one-out method for each observation, and then a generalized estimating equation method was used to estimate the regression coefficients of covariates using the previous pseudo-values of RMST as dependent variables. Codes for pseudo-value method are available for three main platforms (SAS, R and Stata) [19,20]. Another RMST-based regression is based on inverse probability censoring weighting technique proposed by Tian in 2014 [8]. IPCW regression models the probability of being censored as weights with given covariates and assigns the weight calculated by inverse of probability to each observation when solving the regression coefficients. Fortunately, both PV regression and IPCW regression on RMST have been integrated into RMSTREG procedure in the latest version of SAS9.4 [21]. In univariable analyses of the current research, the estimated difference of RMST was identical between Kaplan-Meier method, PV regressions and IPCW regression with grouped weights. It is impossible to specify more than one group to estimate group-specific weights, and only IPCW regression with individual weights is available in multivariable analysis when more than one covariates violate the homogeneity of censoring mechanism assumption. The estimated coefficients, however, were quite different from those obtained from PV regressions in multivariable analyses.
IPCW regression with grouped weights is limited to categorical variables, and testing the homogeneity of censoring mechanism is not suitable for continuous variables. In addition, inverse probability may create extreme weight when the probability is close to 0, and small sample size will make inverse probability instable [22]. With such restrictions on IPCW regression, PV regression should be a preferred method in exploring the effects of more than one determinants on survival time in RMSTbased regressions.
Ludovic et al reconstructed individual patient data based on survival curves for 54 randomized controlled trials, and HRs, ratio of RMST and differences in RMST were estimated and compared using those data. Accordingly, an agreement on the direction of treatment effect was observed in 50 trials, but neither linear nor non-linear relationship was fitted in their study. Furthermore, adjusted RMST cannot be derived based on KM curves in those trials for further comparison [23]. In the current study, a stable logarithmic relationship between difference in RMST by PV regression and HR was observed when proportional hazard assumption was held. Transformation between difference in RMST and HR can provide an alternative way to combine the effect size of meta-analysis in evidence-based medicine.
Generally, since a rise in HR corresponds to a reduction in restricted mean survival time, the sign slope of logarithmic transformed HR on difference in RMST is negative. Particular timeτspecified for RMST-based analysis determines the magnitude of difference in RMST and whether the difference is statistically significant or not. RMST should be typically calculated over a defined period that has adequate follow-up because of its time-dependent attribute. The type of end point, such as overall survival or progression-free survival, will also affect the estimation of difference in RMST. So comparison and transformation between difference in RMST and HR should be cautious. In addition, a step by step tutorial about study design, sample size estimation, and the determination of τwith RMST has been illustrated in details [24].
A strength in this study is that a flowchart for method selection has been made and discussed through comparing traditional methods to RMST-based analyses on oncological data with sufficient sample size. In addition, a logarithmic-transformed relationship was visualized between HR and difference in RMST no matter PH assumption is met or not. A limitation in this study was that details on statistical theories for PV regressions and IPCW regressions were not presented. Not all of survival analyses were presented in the flowchart, such as parametric survival analyses and survival analyses with interval-censored data, which were not discussed here.

Conclusion
In conclusion, HR and difference in RMST should be routinely reported with equal consideration for time-to-event data in prospective studies, which can improve the communication of clinical evidence.
Furthermore, difference in RMST can make the interpretation of clinical results more straightforward than HR. The flowchart will instruct researchers and clinicians to select appropriate statistical methods for univariable analyses and multivariable analyses with the consideration of both PH assumption and the homogeneity of censoring mechanism assumption. PV regression can be a preferred method for RMST-based regressions.   Relationship between hazard ratios and differences in restricted mean survival time in multivariable analyses