Survival Analysis in the presence of a Cure Fraction Using Data of Dialysis Patients: A Bayesian Approach

Background: The aim of this study was to evaluate the goodness of fit of Bayesian mixture and non-mixture cure models for identifying the factors affecting dialysis patients' survival time when a significant proportion of the population has long-term survival. Study Design: A retrospective cohort study. Methods: The data of 252 dialysis patients, among whom 35 died, were used. Since part of the population in this study had long-term survival, Bayesian cure models were fitted and compared using the DIC. The data were analyzed with R and OpenBUGS. Results: Of the 252 dialysis patients, 136 (54%) were male, and the mean (SD) age was 53.39 (18.09) years. The mean (SD) follow-up time was 10.93 (7.82) years. The 10- and 20-year survival rates of these patients were 87% and 73%, respectively. The best fit was obtained with the Bayesian non-mixture cure model (BNCM) with the Dagum distribution. Based on this model, age, body mass index, dialysis duration, frequency of dialysis, age at onset of dialysis, and occupation affected patients' survival. Conclusions: The results demonstrated that the BNCM with the Dagum distribution can be a good choice for analyzing survival data in which a cure fraction may be present.


Introduction
The advanced phase of chronic renal failure, in which life relies on transplantation or dialysis, is called end-stage renal disease (ESRD) [1]. Due to its chronic and disabling nature, this disorder may have a detrimental impact on patients' quality of life and contributes to decreased social interactions, depression, anger, decreased capacity of an individual to perform independent everyday life activities, and eventually increased mortality. It is necessary to determine the factors that influence the survival of these patients, with regard to the challenges and issues that dialysis patients face [2].
With significant advances in all areas of medical science, a considerable proportion of patients with various types of diseases have long-term survival [3]. Thus, in many cases we have survival data in which the number of patients who die from a disease is much lower than the number of censored cases. In such a situation, if all the events occur at the beginning of the study and no event occurs at the end of the Kaplan-Meier (K-M) curve, cure models can be used to provide a more accurate interpretation of the survival data [4].
Cure models are specific types of survival models in which the population is a combination of susceptible cases (those who may experience the event, i.e., patients with short-term survival) and cured or non-susceptible individuals who have long-term survival (those who never experience the event in the follow-up period) [5]. There are two main approaches to modeling survival data with a cure fraction: mixture and non-mixture models.
A long, stable plateau with a high censoring rate at the tail of the K-M curve, together with an adequate follow-up time, indicates that the data are suitable for cure models [6]. Statistical tests can be used to detect the presence of patients with long-term survival. The null hypothesis of such a test states that all individuals may experience death and there is no cure fraction. To reject or retain this hypothesis, the critical values set by Maller and Zhou can be used [7,8].
Cure models do not work well when the sample size is small, and the high censoring rate at the end of the K-M curve is one of the favorable features required to use cure models.
In the secondary data used in the present study, all events occurred at the beginning of the follow-up, and the tail of the K-M curve was stable. However, the number of patients remaining at the end of the study was low (that is, the censoring rate at the end of the survival curve was not high); thus, Bayesian analysis of cure models was used instead of classical analysis.
However, to date, no study has compared the prediction performance of the Bayesian mixture and non-mixture cure models based on Dagum, Weibull, and log-logistic distributions.
Therefore, in this study, the mentioned models were used and compared. The results of this study may be helpful for future studies and analysis of medical and health-related data, where there is the possibility of a fraction of cure.

Secondary Data
Of all patients admitted from 2010 to 2016, the data of 252 patients were recorded in the dialysis ward of Bandar Abbas Hospitals, Iran. Mortality was considered the event of interest, and censored cases included those who were alive at the end of the study, excluded cases, and those treated with a kidney transplant. The survival time of the patients was calculated in years from the onset of dialysis to the end of the study in 2016.
Data were obtained from a given checklist including age, gender, body mass index (BMI), education, age of onset of dialysis, job, blood type, marital status, smoking, history of diabetes, hypertension, renal stones and obstruction, renal cysts and congenital diseases, dialysis duration (hour per session), number of dialysis sessions per week, history of cardiac-respiratory diseases, history of anemia and familial history of chronic renal failure.
In the end, because of the low frequency in some categories of the independent variables, 19 variables were used in the Bayesian cure models: gender, job (five indicator variables), blood type (three indicator variables), marital status, history of smoking, diabetes, and hypertension (all binary), and age at censoring or death, education, dialysis duration, number of dialysis sessions, BMI, and age at diagnosis (all continuous).

Ethical considerations
This study was approved by the Ethics Committee of Kerman University of Medical Sciences (No. IR.KMU.REC.1397.599). Written informed consent was obtained from all the participants.

Mixture cure models
In a mixture cure model, the population is separated into two parts: cured (long-term) survivors and uncured (short-term) survivors. Let p (0 < p < 1) be the probability of being cured, so that (1 - p) is the probability that an individual is susceptible [9]. The corresponding survival function at time t is

$$S(t) = p + (1 - p)\,S_0(t),$$

where S(t) is the survival function of the total population and S_0(t) is the baseline survival function of the susceptible individuals, for which the type I Dagum, Weibull, and log-logistic distributions are assumed in this study. These distributions are described below.
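As a minimal sketch of the mixture formulation, the following Python snippet evaluates the population survival function S(t) = p + (1 - p) S_0(t) with a Weibull baseline. The function names, the Weibull parametrization exp(-(t/scale)^shape), and all numeric values are assumptions made for illustration; they are not estimates from this study.

```python
import math


def weibull_survival(t, shape, scale):
    """Baseline survival S0(t) = exp(-(t / scale) ** shape) for susceptible patients."""
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t, p, baseline_survival):
    """Population survival under the mixture cure model: S(t) = p + (1 - p) * S0(t)."""
    return p + (1.0 - p) * baseline_survival(t)


# Hypothetical values, chosen only for illustration.
p_cured = 0.7


def s0(t):
    return weibull_survival(t, shape=1.5, scale=5.0)


for t in (0.0, 5.0, 20.0, 100.0):
    print(t, round(mixture_cure_survival(t, p_cured, s0), 4))
```

The curve starts at 1 and levels off at the cure fraction p as t grows, which is the plateau that motivates cure models.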

Non-mixture cure models
In this case, the survival function is defined as

$$S(t) = p^{F_0(t)} = \exp\{\ln(p)\,F_0(t)\},$$

where $F_0(t) = 1 - S_0(t)$ is the baseline cumulative distribution function of the susceptible individuals.
To model the cure probability (p) under both mixture and non-mixture cure models, we applied the logistic function.
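The non-mixture formulation and the logistic link for the cure probability can be sketched as follows. This is an illustrative example only: the log-logistic baseline, the coefficient values, and the covariate vector are hypothetical, and the symbol used for the linear predictor is an assumption rather than the authors' notation.

```python
import math


def logistic(eta):
    """Logistic function mapping a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def loglogistic_cdf(t, shape, scale):
    """Baseline CDF F0(t) of a log-logistic distribution (0 for t <= 0)."""
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / scale) ** (-shape))


def nonmixture_cure_survival(t, p, baseline_cdf):
    """Population survival under the non-mixture cure model: S(t) = p ** F0(t)."""
    return p ** baseline_cdf(t)


# Hypothetical coefficients and covariates (intercept, age), for illustration only.
coefficients = [0.5, -0.02]
covariates = [1.0, 50.0]
eta = sum(c * x for c, x in zip(coefficients, covariates))
p_cured = logistic(eta)


def f0(t):
    return loglogistic_cdf(t, shape=2.0, scale=5.0)


print("cure probability:", round(p_cured, 4))
for t in (0.0, 5.0, 20.0):
    print(t, round(nonmixture_cure_survival(t, p_cured, f0), 4))
```

As in the mixture case, S(0) = 1 and S(t) approaches p as F_0(t) approaches 1, but here the cure fraction enters through an exponent rather than as a mixing weight.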

The type I Dagum distribution
Suppose that the survival time of the uncured individuals follows the three-parameter Dagum distribution. Its cumulative distribution function is defined, for t > 0, as

$$F_0(t) = \left[1 + \left(\frac{t}{a}\right)^{-b}\right]^{-c},$$

where b and c are positive shape parameters and a is the scale parameter; covariates can be included in the model through $a = \exp(x'\beta)$. Note that the case c = 1 leads to the log-logistic distribution [10].
The third distribution is the two-parameter Weibull distribution with survival function $S_0(t) = \exp\{-(t/\lambda)^{\alpha}\}$, where $\alpha > 0$ and $\lambda > 0$ are the shape and scale parameters, respectively.
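The relationship between the baseline distributions can be checked numerically. The sketch below assumes the common type I Dagum parametrization F(t) = [1 + (t/a)^(-b)]^(-c) with scale a and shapes b and c; the parametrization, the function names, and the coefficient values are assumptions for illustration, not taken from the fitted models.

```python
import math


def dagum_cdf(t, a, b, c):
    """Type I Dagum CDF with scale a and shapes b, c (assumed parametrization)."""
    if t <= 0:
        return 0.0
    return (1.0 + (t / a) ** (-b)) ** (-c)


def loglogistic_cdf(t, scale, shape):
    """Log-logistic CDF with the same scale/shape convention."""
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / scale) ** (-shape))


def scale_from_covariates(beta, x):
    """Scale parameter a = exp(x' beta), which keeps a positive for any beta."""
    return math.exp(sum(b_j * x_j for b_j, x_j in zip(beta, x)))


# Hypothetical coefficients and covariates (intercept, BMI), for illustration only.
a = scale_from_covariates(beta=[1.0, 0.03], x=[1.0, 24.0])

# With c = 1 the Dagum CDF coincides with the log-logistic CDF.
for t in (0.5, 2.0, 10.0):
    print(t, dagum_cdf(t, a, b=2.0, c=1.0), loglogistic_cdf(t, scale=a, shape=2.0))
```

The extra shape parameter c is what gives the Dagum baseline more flexibility than the log-logistic baseline it contains as a special case.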

Priors
In the Bayesian analysis, normal prior distributions N(0, 100) were assumed for the vector of regression parameters β. For the Dagum, Weibull, and log-logistic distributions, uniform prior distributions were assumed for the shape parameters. Prior independence of the parameters was assumed in all scenarios.

Bayesian Inference
Here, Bayesian mixture and non-mixture cure models based on the Dagum, Weibull, and log-logistic distributions were used. The joint posterior distribution of the model parameters was obtained by combining the joint prior distribution and the likelihood function. The Bayes estimates of the parameters were the means of the Gibbs samples drawn from the joint posterior distribution. Convergence of the MCMC algorithm was monitored with history and autocorrelation plots of the simulated samples [9,15]. Inferences were obtained using R and OpenBUGS.
In addition to the history plot, the Monte Carlo error (MC_error), which is an estimate of the difference between the actual posterior mean and the estimated posterior mean, was used to evaluate the accuracy of the posterior estimate of each parameter. If the Monte Carlo error of a parameter is less than 5% of its posterior standard deviation, convergence is considered to have been reached for that parameter, and no further simulation samples are needed.
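To make the likelihood component concrete, the following sketch computes the right-censored log-likelihood of a mixture cure model with a Weibull baseline. It is an illustration under stated assumptions, not the OpenBUGS code used in the study: the function names, the Weibull parametrization, and the four data points are hypothetical. Each death contributes (1 - p) f_0(t) and each censored case contributes p + (1 - p) S_0(t).

```python
import math


def weibull_survival(t, shape, scale):
    """Baseline survival S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t, shape, scale):
    """Baseline density f0(t) matching weibull_survival."""
    return (shape / scale) * (t / scale) ** (shape - 1.0) * weibull_survival(t, shape, scale)


def mixture_cure_loglik(data, p, shape, scale):
    """Log-likelihood for (time, event) pairs; event is 1 for death, 0 for censored."""
    total = 0.0
    for t, event in data:
        if event == 1:
            # Deaths can only come from the susceptible group.
            total += math.log((1.0 - p) * weibull_density(t, shape, scale))
        else:
            # Censored cases are either cured or susceptible but still alive.
            total += math.log(p + (1.0 - p) * weibull_survival(t, shape, scale))
    return total


# Hypothetical observations (years, event indicator), for illustration only.
toy_data = [(2.0, 1), (8.5, 0), (12.0, 0), (1.5, 1)]
print(mixture_cure_loglik(toy_data, p=0.7, shape=1.2, scale=5.0))
```

Adding the log of the prior densities to this quantity gives the unnormalized log-posterior from which a sampler draws.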
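The 5% rule can be sketched in Python as below. The Monte Carlo error is estimated here with a batch-means approach, which is one common choice; whether it matches the internal OpenBUGS calculation exactly is an assumption, and the function names, batch count, and simulated chain are hypothetical.

```python
import math
import random
import statistics


def batch_means_mc_error(samples, n_batches=20):
    """Estimate the Monte Carlo standard error of the posterior mean by batch means."""
    size = len(samples) // n_batches
    if size < 1:
        raise ValueError("need at least one sample per batch")
    # Leftover samples beyond n_batches * size are ignored.
    means = [statistics.fmean(samples[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


def passes_mc_error_rule(samples, threshold=0.05, n_batches=20):
    """Return True if the MC error is below `threshold` times the posterior SD."""
    mc_error = batch_means_mc_error(samples, n_batches)
    posterior_sd = statistics.stdev(samples)
    return mc_error < threshold * posterior_sd


# Simulated, well-mixed chain standing in for posterior draws of one parameter.
random.seed(0)
chain = [random.gauss(0.1, 0.05) for _ in range(5000)]
print(batch_means_mc_error(chain), passes_mc_error_rule(chain))
```

A strongly autocorrelated or drifting chain produces widely varying batch means, which inflates the MC error and makes the rule fail.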

Model Selection
The Deviance Information Criterion (DIC), as a measure of goodness of fit, was used to compare the mixture and non-mixture models, where a lower DIC value indicates a better model fit.
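For reference, the DIC is commonly computed as the posterior mean deviance plus the effective number of parameters, pD, where pD is the mean deviance minus the deviance evaluated at the posterior means. The sketch below applies that definition; the deviance values and model labels are made-up placeholders and do not reproduce Table 2.

```python
import statistics


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) with DIC = Dbar + pD and pD = Dbar - D(theta_bar)."""
    d_bar = statistics.fmean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Hypothetical deviance draws and plug-in deviances, for illustration only.
candidates = {
    "model A": ([310.2, 312.8, 311.5, 313.1], 308.0),
    "model B": ([305.9, 307.4, 306.1, 308.2], 302.5),
}

results = {name: dic(draws, plug_in)[0] for name, (draws, plug_in) in candidates.items()}
best = min(results, key=results.get)
print(results, "preferred:", best)
```

The model with the smallest DIC is preferred, which balances fit (mean deviance) against complexity (pD).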

Results
Overall, 252 hemodialysis patients were studied; of these, 35 (13.9%) died and 217 (86.1%) were censored. The mean (SD) follow-up time was 10.93 (7.82) years. The 10- and 20-year survival rates of these patients were 87% and 73%, respectively.
Table 1 shows the patients' characteristics. Based on the DIC values in Table 2, the non-mixture models fit better than the mixture models, and the best fit belongs to the Dagum distribution. The worst performance belongs to the Bayesian Weibull cure models (mixture and non-mixture), which have the largest DIC values. The posterior summaries of the parameters of the non-mixture cure model with the type I Dagum distribution are presented in Table 3.
In the short-term survival section of Table 3, the 95% credible intervals for age at death or censoring, BMI, duration of each dialysis session, frequency of dialysis per week, age at onset of dialysis, and occupation do not include zero. Therefore, these variables have a significant effect on short-term survival.
Based on this table, after adjusting for the other factors, a one-unit increase in age increases the odds of failure by 0.11 (OR = exp(0.105)). A one-unit increase in BMI and in the duration of each dialysis session reduces the odds of failure by 0.17 (OR = exp(-0.181)) and 0.31 (OR = exp(-0.364)), respectively. A one-unit increase in the frequency of dialysis and in the age at onset of dialysis increases the odds of failure by 0.28 (OR = exp(0.252)) and 0.09 (OR = exp(0.084)), respectively. The odds of death for housewives are 0.17 lower than for employees (OR = exp(-0.192)).
Also, in the long-term survival section, the 95% credible intervals for age at death or censoring, BMI, and age at onset of dialysis do not include zero, suggesting that these variables have a significant effect on long-term survival.
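The conversion from a posterior mean coefficient to an odds ratio and a relative change in odds can be reproduced with a few lines of Python. The coefficients are the rounded values quoted in the text; the dictionary labels and the helper name are illustrative. Because the coefficients are rounded, a recomputed change may differ from the printed one in the last digit.

```python
import math


def odds_change(beta):
    """Relative change in odds for a one-unit increase: exp(beta) - 1."""
    return math.exp(beta) - 1.0


# Posterior mean coefficients as quoted in the text (rounded to three decimals).
coefficients = {
    "age": 0.105,
    "BMI": -0.181,
    "duration of each dialysis session": -0.364,
    "frequency of dialysis": 0.252,
    "age at onset of dialysis": 0.084,
    "housewife vs. employee": -0.192,
}

for name, beta in coefficients.items():
    print(f"{name}: OR = {math.exp(beta):.3f}, change in odds = {odds_change(beta):+.3f}")
```

A positive coefficient gives an odds ratio above 1 (higher odds of death), and a negative coefficient gives an odds ratio below 1 (lower odds).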

Discussion
A total of 252 dialysis patients with 35 (13.9%) death events were included in this study. Because of the low number of events, classical methods such as Cox regression could not be used. Also, all 35 death events (the target event) occurred at the beginning of the study, and all cases at the end of the study were censored. However, the number of patients remaining at the end of the study was low (that is, the censoring rate at the tail of the survival curve was not high). Thus, Bayesian cure models were used to analyze the data.
Given the nature of the data of dialysis patients, this study aimed to evaluate and compare the performance of Bayesian cure models with Weibull, log-logistic, and Dagum distribution to analyze these data.
Based on the findings of the present study, the variables of age, Body Mass Index, dialysis duration, frequency of dialysis, age of onset of dialysis, and occupation affected dialysis patients' survival.
The findings of the present study revealed that older age was related to higher mortality in dialysis patients. This result is consistent with previous studies [16,17]. Also, similar to the present study, Tsur's study confirmed that higher BMI was significantly associated with longer survival [17]; hence, high BMI is protective in these patients. Further, based on the present study, occupation is one of the most important factors affecting dialysis patients' survival, and its effect has been confirmed in past studies [18].
Based on the present study, a one-unit increase in the duration of dialysis decreased the odds of death by 0.31; the significance of this variable has been verified in other studies [19]. Also, a one-unit increase in the number of dialysis sessions per week increased the odds of death, which is inconsistent with some other studies [20].
The results of this study indicated that the Bayesian non-mixture cure models are superior to the Bayesian mixture cure models, which is consistent with some previous studies [15,21]. However, the study by Swain showed that the mixture cure model has a better fit than the non-mixture cure model based on the generalized Gompertz distribution [11].
Another study compared mixture and non-mixture cure models based on the generalized modified Weibull distribution using a Bayesian approach. Based on DIC values, the mixture and non-mixture cure models gave very similar results [9]. Also, a study assessing mixture and non-mixture cure models found that both classes fit the data well [6].
The results of the study by Jafari Koushaki showed that the non-mixture cure model with the log-logistic distribution performs better than the Weibull non-mixture cure model, as the DIC value is lower for the log-logistic non-mixture cure model [22]; the present study confirms this finding.
The strength of the present study is the implementation of Bayesian cure models with the less commonly used Dagum distribution. Its weakness is the lack of important variables that may affect patient survival more strongly, such as adequacy of dialysis, creatinine level, urea, albumin, and hemoglobin.

Conclusion
The results of the present study demonstrated that the Bayesian non-mixture cure model with type I Dagum distribution can be a good choice for analyzing survival data, where there is the possibility of a fraction of cure.

Consent for publication
Not applicable: individual information has not been published.

Fig 1. Kaplan-Meier estimate of the overall survival function for the dialysis patient data