Survival Analysis of Recurrent Events on Women Breast Cancer: The case of Tikur Anbessa Specialized Hospital, Ethiopia

Background: Breast cancer is the most commonly diagnosed cancer worldwide. It develops from breast tissue and is the most common invasive cancer in women. Recurrent event data, in which individuals experience an event more than once, have become increasingly important in clinical studies. Recurrence is a major clinical indicator and the principal cause of breast cancer-related deaths. The aim of the study was to investigate determinants of the recurrence of breast cancer. Methods: 421 women with breast cancer were included in the study. Data were taken from the medical record cards of patients enrolled from 1 January 2013 to 30 January 2019. A retrospective study design was used to obtain data on women with breast cancer recorded in the oncology department of Tikur Anbessa Specialized Hospital. Unmeasured similarities shared by multiple events within the same patient were modeled using a random effect. The Cox PH model and shared frailty models were used to identify the factors that significantly affect the recurrence of breast cancer. Results: Of the 997 recurrent event records, 609 (61.1%) were recurrences of breast cancer. The shared log-normal frailty model was chosen as the best fit for this breast cancer data set based on the likelihood cross-validation criterion. In the shared log-normal frailty model, age, stage of breast cancer, tumor size, histology grade, breastfeeding and oral contraceptive use were significantly associated with recurrence of breast cancer in women. Conclusion: The shared log-normal frailty model shows that stage (II, III, IV), tumor size (3-5 cm, >5 cm), histology grade (poorly differentiated) and oral contraceptive use significantly increase the risk of recurrence of breast cancer. Breastfeeding, by contrast, significantly decreases that risk.
It is recommended that policy makers, the Ministry of Health and Tikur Anbessa Specialized Hospital make interventions targeted at these high-risk groups for recurrence of breast cancer. Key words: Breast cancer, Counting approach, Recurrent events, Shared frailty model


Background of the Study
Breast cancer is a cancer that develops from breast tissue and is the most common invasive cancer in women [1]. Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death among women worldwide, with an estimated 1.7 million cases and 521,900 deaths [2]. It is a major life-threatening disease and has become a public health problem of great concern. It is also the most common cause of cancer death among women in less developed countries [3]. This cancer accounts for 25% of all cancer cases and 15% of all cancer deaths among women. In both developed and developing countries, breast cancer is a major health problem, accounting for about one-half of all cancer cases and more than 324,300 deaths [4].
Breast cancer is the leading cause of cancer death in Africa [5]. In sub-Saharan Africa, breast cancer is responsible for one in four diagnosed cancers and one in five cancer deaths in women. Recent data estimate that in 2012, 94,000 women in sub-Saharan Africa developed breast cancer and 48,000 died from it. It has been estimated that by 2050 the incidence of breast cancer in Africa will be double the 2012 estimate [6]. A study on the global burden of cancer showed that 2.4 million women were diagnosed with breast cancer in 2015, with 523,000 related deaths [7].
Approximately 60% of deaths due to breast cancer occur in developing countries [8].
In Ethiopia, breast cancer is the leading cancer among women. It is estimated that around 9,900 Ethiopian women have breast cancer, with thousands more cases unreported. Many of the unreported cases are women living in rural areas, who often seek treatment from traditional healers before seeking help from the government health system. A retrospective study conducted at Tikur Anbessa Specialized Hospital (TASH) covered the period from 1997 to 2012. Of the 16,622 new cancer cases registered, 3,460 were new cases of breast cancer, a proportion of 20.8% and approximately 216 cases per year [9].

Study Population and Setting
The study population comprised all women with breast cancer who were registered at TASH from 1 January 2013 to 30 January 2019. All data were carefully reviewed from the registration log book and the patients' registration cards. Where information was incomplete, it was checked against the patient's file. The record was excluded from the analysis if it proved inadequate.
The data structure for modeling recurrent events was checked carefully to identify the first, second, third and subsequent times to recurrence of breast cancer (i.e., recurrent event data). The table below illustrates the data structure required for modeling recurrent times to recurrence of breast cancer. The recurrent event data were modeled using the time from the start of follow-up to the first, second and third recurrence. For patients with no further recurrence, the time to the last censoring was used. Data from patients with a single recurrence and from patients with more than one recurrence were both included. The time interval for each recurrent event of a patient was given by the difference between two successive recurrence times. The response variable for this study is the time to recurrence of breast cancer in women, measured in months. The following variables were considered for their influence on recurrence of breast cancer: age, alcohol use, smoking status, residence, histology grade, breastfeeding, treatment taken, stage, oral contraceptive use, obesity, tumor size, family history, menopausal status and marital status.
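The counting-process layout described above can be sketched as follows. Each patient's follow-up is split into consecutive (start, stop] intervals, one per recurrence, plus a final censored interval. The gap time (stop minus start) is the "difference between two successive recurrence times". The record fields and the function name are hypothetical; the sketch assumes each patient is summarized by a list of recurrence times and an end-of-follow-up time, in months.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    patient_id: int
    interval_no: int  # 1 = first at-risk interval, 2 = second, ...
    start: float      # start of the at-risk interval (months since entry)
    stop: float       # end of the at-risk interval (months since entry)
    event: int        # 1 = interval ends with a recurrence, 0 = censored
    gap: float        # stop - start: time since the previous recurrence


def counting_process_rows(patient_id: int, recurrence_times: List[float],
                          end_of_follow_up: float) -> List[Interval]:
    """Split one patient's follow-up into (start, stop] intervals, one per recurrence."""
    rows: List[Interval] = []
    start = 0.0
    for t in sorted(recurrence_times):
        rows.append(Interval(patient_id, len(rows) + 1, start, t, 1, t - start))
        start = t
    # Last interval: from the final recurrence (or entry) to censoring,
    # kept only if some follow-up time remains after the last recurrence.
    if end_of_follow_up > start:
        rows.append(Interval(patient_id, len(rows) + 1, start, end_of_follow_up,
                             0, end_of_follow_up - start))
    return rows


# Hypothetical patient: recurrences at 6 and 14.5 months, censored at 30 months.
for row in counting_process_rows(1, [6.0, 14.5], 30.0):
    print(row)
```

A patient with no recurrence contributes a single censored row. This is why the number of rows (997 here) exceeds the number of patients.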

Descriptive Statistics
Survival data were described using nonparametric methods. Kaplan-Meier plots were used to compare the survival functions of two or more groups [10]. Frequency distribution tables were also used to summarize the data on the study variables.
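The Kaplan-Meier (product-limit) estimator behind these plots multiplies, at each distinct event time, the conditional probability of surviving past that time. Below is a minimal pure-Python sketch for one time-to-event outcome, such as time to first recurrence. Treating recurrent gap times as independent observations in such a plot would be an additional assumption. The function name and the toy data are illustrative only.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Count events and all observations (events + censorings) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # censored subjects leave the risk set after t
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

For the toy data, the estimate drops to 0.8, 0.6 and 0.3 at times 1, 2 and 3. The observation censored at time 2 still counts as at risk at that time, which is the usual convention.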

Survival Data Analysis
Survival analysis is a collection of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs [11].

Cox Proportional Hazard Model
The most frequently applied method for analyzing recurrent event data is the model of Andersen and Gill (1982), which is based on the standard Cox proportional hazards model [12]. The Andersen-Gill model assumes independence between all observed event times, irrespective of whether these event times belong to the same patient or to different patients. The proportional hazards assumption states that, at any given time, the hazard of the event for an individual in one group is proportional to the hazard for an individual in another group. Equivalently, the ratio of the hazard functions for two individuals with different sets of covariates does not depend on time. The Cox PH model further assumes that time is measured on a continuous scale and that censoring is random and non-informative.
Recurrent survival data can be modeled with a Cox PH model by constructing the data layout so that each subject has one line of data for each recurrent event. The model typically used with this counting process approach is the standard Cox PH model. For recurrent survival data, a subject remains in the risk set for more than one time interval, up to his or her last interval. After that interval the subject is removed from the risk set [13]. The hazard function for the jth individual in the ith group is

$$h_{ij}(t) = h_0(t)\exp\left(\beta^{\top} X_{ij}\right),$$

where $h_0(t)$ is the baseline hazard, $\beta$ is a $p \times 1$ vector of regression coefficients and $X_{ij}$ is the covariate vector for the jth individual in the ith group, $i = 1, \dots, G$ and $j = 1, \dots, n_i$.
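With the counting-process layout, the Andersen-Gill model is fitted by maximizing the Cox partial likelihood. In that likelihood, the risk set at an event time t contains every row whose interval (start, stop] covers t. The following is a minimal pure-Python sketch of that log partial likelihood, assuming Breslow handling of tied event times; names are illustrative. In practice a library routine would be used, for example R's `survival::coxph` with a `Surv(start, stop, event)` response.

```python
import math
from typing import List, Sequence


def ag_log_partial_likelihood(beta: Sequence[float], start: List[float],
                              stop: List[float], event: List[int],
                              x: List[Sequence[float]]) -> float:
    """Andersen-Gill log partial likelihood (Breslow ties).

    Row k is at risk at time t when start[k] < t <= stop[k], so a patient
    re-enters the risk set for each of his or her intervals.
    """
    eta = [sum(b * v for b, v in zip(beta, x[k])) for k in range(len(stop))]
    loglik = 0.0
    for r in range(len(stop)):
        if event[r] != 1:
            continue
        t = stop[r]
        risk_set = [k for k in range(len(stop)) if start[k] < t <= stop[k]]
        loglik += eta[r] - math.log(sum(math.exp(eta[k]) for k in risk_set))
    return loglik


# Toy layout: patient A has rows (0, 5] with a recurrence and (5, 10] censored;
# patient B has one row (0, 8] with a recurrence. One binary covariate.
start, stop, event = [0.0, 5.0, 0.0], [5.0, 10.0, 8.0], [1, 0, 1]
x = [[1.0], [1.0], [0.0]]
print(ag_log_partial_likelihood([0.5], start, stop, event, x))
```

Maximizing this function over beta (for example with the damped Newton iteration discussed under the computational procedure) gives the Andersen-Gill estimates. The model treats all rows as independent, which is what the frailty models relax.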

Checking the Assumption of Proportional Hazards
Schoenfeld residuals are useful to check the proportionality of the covariates over time, that is, to check the validity of the proportional hazards assumption.
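For a single covariate, the Schoenfeld residual at an event time is the covariate value of the row that failed, minus the risk-set average of the covariate weighted by exp(beta * x). The sketch below assumes the counting-process risk sets used above. The proportional hazards test itself, which checks for a time trend in these residuals, is left to standard software such as R's `survival::cox.zph`. The function name is illustrative.

```python
import math
from typing import List, Tuple


def schoenfeld_residuals(beta: float, start: List[float], stop: List[float],
                         event: List[int], x: List[float]) -> List[Tuple[float, float]]:
    """(event time, residual) pairs for one covariate under a fitted coefficient beta."""
    out = []
    for i in range(len(stop)):
        if event[i] != 1:
            continue
        t = stop[i]
        risk_set = [k for k in range(len(stop)) if start[k] < t <= stop[k]]
        weights = [math.exp(beta * x[k]) for k in risk_set]
        weighted_mean = sum(w * x[k] for w, k in zip(weights, risk_set)) / sum(weights)
        out.append((t, x[i] - weighted_mean))
    return out


print(schoenfeld_residuals(0.0, [0.0, 5.0, 0.0], [5.0, 10.0, 8.0], [1, 0, 1],
                           [1.0, 1.0, 0.0]))
```

Under proportional hazards, these residuals (suitably scaled) show no systematic trend when plotted against time.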

Frailty Models
Inference for Cox proportional hazards model was developed under the assumption that the observations are statistically independent, at least conditionally upon covariates. However, this assumption may be violated.
For example, in many epidemiological studies failure times are clustered into groups such as families or geographical units. Unmeasured characteristics shared by the members of a cluster, such as genetic information or common environmental exposures, could influence the time to the studied event [14]. In a different context, correlated data may come from recurrent events, i.e., events that occur several times within the same subject during the period of observation. In frailty models, dependence is produced by sharing an unobserved variable, which is treated as a random effect, or frailty [15,16].

Shared Frailty Model
A natural extension of the univariate frailty model is the multivariate frailty model, in which individuals are allowed to share the same frailty value. The shared frailty model assumes that all members of a cluster (for example, both individuals in a pair) share the same frailty Z, which is why it is called the shared frailty model. Shared frailty models are appropriate when the frailties are specific to groups of subjects, such as subjects within families. In that case the shared frailty model can be used to describe the degree of correlation within groups [17].
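For recurrent events, the cluster is the patient: all of a patient's events share one frailty. The simulation below is a small sketch of the dependence this creates. Each hypothetical patient draws a gamma frailty with mean 1 and variance theta. Given that frailty, recurrences then follow a Poisson process with intensity frailty × rate. The parameter values (theta = 0.5, rate = 0.1 per month, 20 months of follow-up) are illustrative assumptions, not estimates from this study.

```python
import random
import statistics
from typing import List


def simulate_counts(n_subjects: int, theta: float, rate: float,
                    follow_up: float, seed: int = 1) -> List[int]:
    """Number of recurrences per subject under a shared gamma frailty."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        z = rng.gammavariate(1.0 / theta, theta)  # frailty: mean 1, variance theta
        t, k = 0.0, 0
        while True:
            t += rng.expovariate(z * rate)  # next waiting time given the frailty
            if t > follow_up:
                break
            k += 1
        counts.append(k)
    return counts


counts = simulate_counts(5000, theta=0.5, rate=0.1, follow_up=20.0)
print(statistics.mean(counts), statistics.variance(counts))
```

Without a frailty the counts would be Poisson, with variance equal to the mean (about 2 here). With a shared gamma frailty the variance is mean + theta × mean², about 4. This excess variation is the within-patient correlation that the Andersen-Gill model ignores.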

Shared Gamma Frailty Model
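The density of the one-parameter gamma distribution is

$$f_Z(z) = \frac{z^{1/\theta - 1}\exp(-z/\theta)}{\Gamma(1/\theta)\,\theta^{1/\theta}}, \qquad \theta > 0,$$

where $\Gamma(\cdot)$ is the gamma function. Its Laplace transform is

$$\mathcal{L}(s) = E\left[e^{-sZ}\right] = (1 + \theta s)^{-1/\theta}.$$

Thus, the expectation and variance of the frailty variable are 1 and $\theta$, respectively. The shared gamma frailty model (conditional hazard) for individual j in cluster i is

$$h_{ij}(t \mid Z_i) = Z_i\, h_{ij}(t),$$

where $h_{ij}(t) = h_0(t)\exp\left(\beta^{\top} X_{ij}\right)$ is the hazard of the Cox regression model for individual j in cluster i. The $Z_i$ are independent and identically distributed gamma random variables, as in the univariate frailty models.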
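The Laplace transform is what makes the gamma frailty convenient. Integrating the frailty out of the conditional survival exp(-Z H(t)) gives the population-level survival S(t) = (1 + θH(t))^(-1/θ). Here H(t) = H0(t) exp(β'x) is the conditional cumulative hazard of a subject with frailty 1. The sketch below checks the closed form E[exp(-sZ)] = (1 + θs)^(-1/θ) against a Monte Carlo average, using Python's `random.gammavariate(shape, scale)` with shape 1/θ and scale θ (mean 1, variance θ). The function names are illustrative.

```python
import math
import random


def gamma_laplace(s: float, theta: float) -> float:
    """Closed-form E[exp(-s Z)] for Z ~ Gamma(shape=1/theta, scale=theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def gamma_laplace_monte_carlo(s: float, theta: float, n: int = 100_000,
                              seed: int = 7) -> float:
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += math.exp(-s * rng.gammavariate(1.0 / theta, theta))
    return total / n


def marginal_survival(cum_hazard: float, theta: float) -> float:
    """Population survival with the gamma frailty integrated out: L(H(t))."""
    return gamma_laplace(cum_hazard, theta)


print(gamma_laplace(1.0, 0.5), gamma_laplace_monte_carlo(1.0, 0.5))
```

As θ approaches 0 the marginal survival tends to exp(-H(t)), the ordinary Cox survival, so a small frailty variance means little extra heterogeneity.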

Shared Log-Normal Frailty Model
The log-normal frailty model stems mainly from the link with mixed models, where the standard assumption is that the random effects follow a normal distribution. Let $W_i \sim N(0, \sigma^2)$ be a normally distributed random effect and let the frailty be given by $Z_i = \exp(W_i)$.
The corresponding frailty has a log-normal distribution with density

$$f_Z(z) = \frac{1}{z\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln z)^2}{2\sigma^2}\right), \qquad z > 0.$$

The shared log-normal frailty model has the form

$$h_{ij}(t \mid w_i) = h_0(t)\exp\left(\beta^{\top} X_{ij} + w_i\right),$$

where $h_{ij}(t \mid w_i)$ is the hazard function for the jth individual in the ith group, $h_0(t)$ is the baseline hazard at time t, $X_{ij}$ is the vector of p covariates recorded for the individual and $w_i$ is the random effect for the ith group. In this model $h_0(t)$ can be left arbitrary. The $w_i$, $i = 1, \dots, G$, are an independent and identically distributed sample from a density $f_W(\cdot)$.
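Because the normal random effect $W_i$ is centered at 0, the frailty $Z_i = \exp(W_i)$ has median 1. Its mean is exp(σ²/2) and its variance is (exp(σ²) - 1) exp(σ²), so σ² is the variance of the log-frailty rather than of the frailty itself. The sketch below compares these closed-form moments with a simulation and evaluates the conditional hazard h0(t) exp(β'x + w). The value σ² = 0.0756 is the estimate reported later in this section; the other inputs are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def lognormal_frailty_moments(sigma2: float) -> Tuple[float, float]:
    """Mean and variance of Z = exp(W) with W ~ N(0, sigma2)."""
    return math.exp(sigma2 / 2.0), (math.exp(sigma2) - 1.0) * math.exp(sigma2)


def simulate_frailties(sigma2: float, n: int = 200_000, seed: int = 3) -> List[float]:
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    return [math.exp(rng.gauss(0.0, sd)) for _ in range(n)]


def conditional_hazard(h0_t: float, beta: Sequence[float], x: Sequence[float],
                       w: float) -> float:
    """h_ij(t | w_i) = h0(t) * exp(beta'x + w_i)."""
    return h0_t * math.exp(sum(b * v for b, v in zip(beta, x)) + w)


z = simulate_frailties(0.0756)
print(lognormal_frailty_moments(0.0756), statistics.mean(z), statistics.variance(z))
```

Adding w_i on the log-hazard scale is what links this model to the normal random intercepts of mixed models.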

Penalized Likelihood Approach
Semi-parametric hazard models without frailty terms are fitted by maximization of the partial likelihood [14].
For semiparametric frailty models, however, we need to account for the contribution of the unobserved frailty terms. Two estimation methods that can be used to fit semiparametric frailty models are the expectation-maximization (EM) algorithm and the penalized likelihood approach [18,19].
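In a common form of the penalized likelihood approach (used, for example, in the R package frailtypack), the baseline hazard is estimated as a smooth function. Its roughness is penalized: pl = l - κ ∫ h0''(t)² dt, where κ is the smoothing parameter. That this is the exact penalty used in this study is an assumption. The sketch below only evaluates the penalty term numerically for a given hazard function, using central second differences and the trapezoid rule. The function names are illustrative.

```python
from typing import Callable


def roughness_penalty(h0: Callable[[float], float], a: float, b: float,
                      kappa: float, n: int = 2000) -> float:
    """kappa * integral over [a, b] of h0''(t)**2, by finite differences.

    The central difference at the endpoints evaluates h0 one step outside
    [a, b], so h0 must be defined there.
    """
    step = (b - a) / n

    def second_derivative(t: float) -> float:
        return (h0(t + step) - 2.0 * h0(t) + h0(t - step)) / step ** 2

    values = [second_derivative(a + i * step) ** 2 for i in range(n + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return kappa * integral


def penalized_loglik(loglik: float, h0: Callable[[float], float], a: float,
                     b: float, kappa: float) -> float:
    """Penalized log-likelihood: full log-likelihood minus the roughness penalty."""
    return loglik - roughness_penalty(h0, a, b, kappa)


print(roughness_penalty(lambda t: t ** 2, 0.0, 1.0, kappa=1.0))  # h0'' = 2, so about 4
```

A straight-line hazard has no penalty. Larger κ therefore pushes the estimated baseline hazard towards smoother shapes.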

Choice of Smoothing Parameter
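Sometimes it is sufficient to choose the smoothing parameter heuristically, by plotting several curves and choosing the one that seems most realistic. Alternatively, an empirical estimate of the smoothing parameter can be provided, or the smoothing parameter can be chosen by maximizing the likelihood cross-validation score [20]

$$V(\kappa) = \sum_{j=1}^{n} l_j\left(\hat{\theta}_{(-j)}\right),$$

where $l_j$ is the log-likelihood contribution of individual j and $\hat{\theta}_{(-j)}$ is the estimate obtained without individual j.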
In this study, the goodness of fit of the Cox and frailty models is assessed with an approximate likelihood cross-validation criterion (LCV) [21]. The LCV is approximately equivalent to Akaike's information criterion. Lower values of LCV indicate a better-fitting model.
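The approximate LCV reported by frailtypack is LCV = (tr(H_pl⁻¹ H) - l) / n. Here H is minus the Hessian of the log-likelihood at the estimate, H_pl is minus the Hessian of the penalized log-likelihood, l is the log-likelihood and n is the number of subjects. That this study used exactly this formula is an assumption. The trace term acts as an effective number of parameters. Without a penalty it equals the number of parameters p, and LCV reduces to the usual AIC divided by 2n. The sketch uses a small Gauss-Jordan solver so that it needs no external library; all names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return X with A X = B (Gauss-Jordan elimination with partial pivoting)."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lcv(loglik: float, hess: Matrix, hess_pl: Matrix, n_subjects: int) -> float:
    """Approximate LCV = (trace(H_pl^{-1} H) - loglik) / n; lower is better."""
    m = solve(hess_pl, hess)                    # H_pl^{-1} H
    effective_df = sum(m[i][i] for i in range(len(m)))
    return (effective_df - loglik) / n_subjects


h = [[2.0, 0.5], [0.5, 1.0]]
print(lcv(-100.0, h, h, 50))  # no penalty: (2 + 100) / 50
```

With a penalty, H_pl is larger than H, so the effective number of parameters falls below p. This is why LCV rather than AIC is used for penalized semiparametric fits.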

Computational Procedure (Algorithm)
The parameters of the models were estimated with the robust Marquardt algorithm, which combines the Newton-Raphson and steepest descent algorithms. It is more stable than the Newton-Raphson algorithm but preserves its fast convergence near the maximum. The iterations stop when three conditions hold: the difference between two consecutive log-likelihoods is small, the coefficients are stable and the gradient is small enough [22].
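The idea can be sketched in one dimension. The Newton step is damped by inflating the curvature with a factor λ. A large λ gives a short step along the gradient (steepest ascent), and a small λ gives an almost pure Newton step. After an uphill step λ is decreased; otherwise it is increased. The iteration stops on the three criteria listed above. This is a simplified, hypothetical variant for illustration, not the exact update used in the software of [22]. The example maximizes an exponential log-likelihood with d = 12 events in T = 48 person-months, on the log-rate scale.

```python
import math
from typing import Callable


def marquardt_maximize(loglik: Callable[[float], float], grad: Callable[[float], float],
                       hess: Callable[[float], float], theta0: float,
                       lam: float = 1e-3, tol_ll: float = 1e-8, tol_par: float = 1e-8,
                       tol_grad: float = 1e-6, max_iter: int = 200) -> float:
    """One-parameter damped Newton (Marquardt-style) maximization sketch."""
    theta, ll = theta0, loglik(theta0)
    for _ in range(max_iter):
        g, h = grad(theta), hess(theta)
        # Inflated curvature: Newton step when lam is small, short gradient step when large.
        candidate = theta + g / (abs(h) * (1.0 + lam) + lam)
        ll_new = loglik(candidate)
        if ll_new >= ll:  # uphill (or flat): accept and trust Newton more
            converged = (abs(ll_new - ll) < tol_ll and abs(candidate - theta) < tol_par
                         and abs(grad(candidate)) < tol_grad)
            theta, ll = candidate, ll_new
            lam = max(lam / 10.0, 1e-12)
            if converged:
                break
        else:             # downhill: reject and shorten the step
            lam *= 10.0
    return theta


d, T = 12.0, 48.0  # hypothetical events and person-months
eta = marquardt_maximize(lambda e: d * e - math.exp(e) * T,
                         lambda e: d - math.exp(e) * T,
                         lambda e: -math.exp(e) * T, theta0=0.0)
print(math.exp(eta))  # MLE of the rate, d / T = 0.25
```

The accept/reject rule makes the iteration robust far from the maximum. The shrinking λ recovers Newton's fast convergence close to it, which is the trade-off the text describes.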

Assessing Model Adequacy
Regardless of which type of model is fitted and how the variables are selected to be in the model, it is important to evaluate how well the model fits the data. A survival model is adequate if it represents the survival patterns in the data to an acceptable degree. This aspect of a model is known as goodness of fit.

Descriptive Analyses
The data for this study were taken from 421 women who received treatment for breast cancer more than once at the oncology department of TASH, Addis Ababa, Ethiopia, between 1 January 2013 and 30 January 2019. The outcome was the time to a recurrent event of breast cancer. Of the 997 event records, 609 (61.1%) were recurrences of breast cancer and the remaining 388 (38.9%) were censored (see Table 2 of the appendix).

Cox Proportional Hazard Model
We first fitted a Cox proportional hazards model for each risk factor separately before proceeding to more complicated models. Variables with a p-value less than or equal to 0.25 in the univariable analysis were considered for the multivariable model [24]. The full multivariable Cox proportional hazards model was then fitted, including all potential covariates that were significant at the 25% level. In this multivariable analysis, variables with a p-value less than or equal to 0.05 were selected as significant covariates. The results from the standard Cox PH model are presented in the appendix. The PH assumption for all variables included in the model was checked using Schoenfeld residuals. None of the covariate-specific tests was statistically significant (all p-values greater than 0.05), indicating that the covariate effects do not vary over time. The global proportionality test was also not statistically significant, implying that the proportional hazards assumption was not violated (see Table 4 of the appendix).

Shared Gamma Frailty Model
In recurrent event data, subjects may experience the event of interest more than once, so events of patients with the same id are considered correlated. The Cox model can be extended to take this clustered structure into account by treating the clustering as a random effect. Here the main interest is in the heterogeneity between subjects. For the shared gamma frailty model, univariable analysis was conducted first. Variables significant at the 25% level were then taken to the multiple shared gamma frailty model. Treatment taken, age, stage, tumor size, histology grade, breastfeeding and oral contraceptive use were the significant covariates selected from the saturated multiple shared gamma frailty model. Women who had breastfed had a significantly lower risk of recurrence of breast cancer.
Finally, for oral contraceptive use, the estimated hazard ratio is 1.251 (95% CI: 1.06, 1.48). This means that the hazard of recurrence among women using oral contraceptives is 1.251 times that of women not using them. The small p-value (0.0088) indicates that oral contraceptive use significantly increases the risk of recurrence of breast cancer.
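A hazard ratio, its confidence interval and its p-value are linked through the coefficient β = log(HR) and its standard error. The sketch below assumes the reported interval is a Wald interval exp(β ± 1.96 SE). Under that assumption it recovers SE, the z statistic and the two-sided p-value from the reported HR and CI, which serves as a consistency check on the table. The function names are illustrative, and the recovered SE is an approximation affected by the rounding of the reported values.

```python
import math
from typing import Tuple


def wald_from_ci(hr: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float, float]:
    """Back-calculate (beta, SE, z, two-sided p) from an HR and its 95% Wald CI."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return beta, se, z, p


def hr_with_ci(beta: float, se: float, z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """HR = exp(beta) with the Wald interval exp(beta -/+ z_crit * se)."""
    return math.exp(beta), math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)


beta, se, z, p = wald_from_ci(1.251, 1.06, 1.48)
print(round(se, 4), round(z, 2), round(p, 4))
```

For the gamma frailty estimate this gives z of about 2.63 and p of about 0.0085, close to the reported 0.0088. The same check for the log-normal estimate (1.250, 95% CI 1.05 to 1.49) gives a p-value of about 0.012, close to the reported 0.013.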

Shared Log-Normal Frailty Model
Similarly, we conducted univariable analysis for the shared log-normal frailty model. It indicated that treatment taken, age, smoking habit, stage, tumor size, histology grade, breastfeeding and oral contraceptive use were statistically significant at the 25% level. The results from the shared log-normal frailty model are presented in Table 6. Age, stage, tumor size, histology grade, breastfeeding and oral contraceptive use were the only significant covariates selected from the saturated multiple shared log-normal frailty model. Finally, for oral contraceptive use, the estimated hazard ratio is 1.250 (95% CI: 1.05, 1.49). This means that the hazard of recurrence among women using oral contraceptives is 1.250 times that of women not using them. The small p-value (0.013) implies that oral contraceptive use significantly increases the risk of recurrence of breast cancer.
The hypothesis for the variance of the shared log-normal frailty term is $H_0: \sigma^2 = 0$ versus $H_1: \sigma^2 > 0$. The estimated variance of the frailty term is $\hat{\sigma}^2 = 0.0756$ (SE(H): 0.0086), which is significantly different from zero. This means that there is heterogeneity between subjects explained by unobserved covariates. The conclusion follows from a modified Wald test, $W(\hat{\sigma}^2) = 0.075618/0.00864336 = 8.75$, compared with the critical value of a one-sided normal test (1.645 at the 5% level). The modified Wald test is a significance test for the variance of the random-effects distribution, which lies on the boundary of the parameter space. The null distribution of the usual squared Wald statistic then becomes a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ distributions, and the critical values must be derived from this mixture (Molenberghs and Verbeke (2007)). In our case the p-value is less than 5% for the shared log-normal frailty model but not for the shared gamma frailty model. This means that for the shared log-normal frailty model there is a significant frailty effect, so the within-subject correlation cannot be ignored. The same does not hold for the shared gamma frailty model.
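The modified Wald test can be written out in a few lines. Under the 50:50 mixture of χ²₀ and χ²₁, the p-value for z = estimate / SE > 0 is 0.5 × P(χ²₁ > z²), which equals the upper normal tail P(N(0,1) > z). This is why the critical value is the one-sided normal value 1.645 rather than the two-sided 1.96. The function name is illustrative; the inputs are the estimate and standard error reported above.

```python
import math
from typing import Tuple


def boundary_wald_test(estimate: float, se: float) -> Tuple[float, float]:
    """Wald test of H0: variance = 0 vs H1: variance > 0 (parameter on the boundary).

    Null distribution of z**2: 50:50 mixture of chi2_0 and chi2_1, so for z > 0
    p = 0.5 * P(chi2_1 > z**2) = P(N(0, 1) > z).
    """
    z = estimate / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0)) if z > 0 else 1.0
    return z, p


z, p = boundary_wald_test(0.075618, 0.00864336)
print(round(z, 2), p)  # z is about 8.75; p is far below 0.05
```

Using the unmodified two-sided test here would only make the p-value twice as large, so the conclusion of a significant frailty variance does not depend on this correction for these data.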

Comparison of the Cox-PH and Shared Frailty Models
The fitted models were compared using the penalized marginal log-likelihood and the likelihood cross-validation (LCV) criterion, which assesses the goodness of fit of a statistical model [21]. For a parametric approach, LCV is approximately equivalent to the Akaike information criterion (AIC). Lower values of LCV indicate a better-fitting model. Table 7 gives the LCV values of the Cox PH, shared gamma frailty and shared log-normal frailty models. The shared log-normal frailty model was chosen as the best fit for our recurrent event data on breast cancer, based on the residual analysis and the minimum LCV value. Although the differences in LCV between the fitted models were negligible, the Cox-Snell residual plot suggested that the shared log-normal frailty model fits the data better.
The Wald test indicated that the heterogeneity parameter was significant, implying a significant frailty effect; that is, the within-subject correlation cannot be ignored.

Discussion
The main aim of this study was to identify factors affecting the recurrence of breast cancer in women, using data obtained from TASH. The Andersen-Gill model, the most frequently applied method for recurrent time-to-event data, was used to analyze the breast cancer data set. It assumes independence between all observed event times [12]. In addition, frailty models based on the Andersen-Gill (A-G) survival model were used to account for the dependence among the recurrent event times [25,26]. The fitted models were compared using the penalized marginal log-likelihood and the LCV criterion. The shared log-normal frailty model was found to fit better than the Cox PH and shared gamma frailty models [21].
In the univariable analysis, the shared log-normal frailty model showed significant associations with recurrence of breast cancer in women at the 25% level [24]. The variables were treatment taken, age, smoking, tumor size, stage of breast cancer, obesity, histology grade, alcohol use, family history of breast cancer, breastfeeding and oral contraceptive use.
In the multivariable shared log-normal frailty model, the recurrence of breast cancer in women was significantly affected by age, tumor size, stage of breast cancer, histology grade, breastfeeding and oral contraceptive use.
In this study, younger women had the greatest hazard of recurrence of breast cancer, a result also reported in the study by Dignam in 2009. In addition, histology grade at diagnosis significantly affected the recurrence of breast cancer. The hazard was higher for women with histology grade III than for women with histology grade I. This was also reported in previous studies [27,28].
Similarly, the hazard for women with tumors of 3 to 5 cm and above 5 cm was higher than for women with tumors of 2 cm or less. This implies that tumor size significantly increases the risk of breast cancer recurrence [28,29]. The stage of breast cancer also had a significant effect on the recurrence of breast cancer in women. The studies by Demicheli in 2010 and Dignam in 2009 likewise showed that the stage of breast cancer at diagnosis significantly affects recurrence. In both studies, the hazard of recurrence increased with stage [27,30].
Oral contraceptive use was associated with increased recurrence of breast cancer in women, implying that oral contraceptives pose a higher risk of breast cancer recurrence. This finding is consistent with previous studies [31,32]. Furthermore, women treated for breast cancer who had previously breastfed their babies had a lower risk of recurrence than those who had not. This finding is consistent with [33].
The adequacy of the shared log-normal frailty model was checked by plotting the Cox-Snell residuals against the estimated cumulative hazard of those residuals. The plot is approximately a straight line through the origin, indicating that the log-normal frailty model fits the breast cancer data well.
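The logic of this check is that Cox-Snell residuals from a correctly specified model behave like a censored sample from a unit exponential distribution. The Nelson-Aalen cumulative hazard of the residuals should therefore lie close to the line H(r) = r. For a recurrent-event frailty model, one would compute each interval's residual as the estimated frailty times the increase of the estimated cumulative baseline hazard over the interval, times exp(β'x). That this is how the residuals were computed here is an assumption. The sketch below implements the Nelson-Aalen step and demonstrates it on simulated unit-exponential "residuals" with independent censoring, not on the study data. Names are illustrative.

```python
import random
from typing import List, Tuple


def nelson_aalen(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Nelson-Aalen cumulative hazard at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            d += data[i][1]
            removed += 1
            i += 1
        if d:
            cum += d / at_risk
            curve.append((t, cum))
        at_risk -= removed
    return curve


def cumulative_hazard_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function value of the curve at t (0 before the first event)."""
    value = 0.0
    for time, h in curve:
        if time > t:
            break
        value = h
    return value


if __name__ == "__main__":
    # Stand-in for residuals of a well-fitting model: unit exponential, censored.
    rng = random.Random(11)
    resid = [rng.expovariate(1.0) for _ in range(5000)]
    cens = [rng.expovariate(0.3) for _ in range(5000)]
    times = [min(r, c) for r, c in zip(resid, cens)]
    events = [1 if r <= c else 0 for r, c in zip(resid, cens)]
    curve = nelson_aalen(times, events)
    for t in (0.5, 1.0, 1.5):
        print(t, round(cumulative_hazard_at(curve, t), 3))  # close to t if the fit is good
```

Plotting the curve against the residual values and comparing it with the 45-degree line through the origin is the graphical version of this check.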

Conclusion
The data consisted of 997 observations from a total of 421 patients.