Risk Factors Affecting Survival Time to First Recurrence of Breast Cancer Patients: The case of South West Ethiopia

Background: In Ethiopia, breast cancer is the second most common cancer among women, after cervical cancer. This article aimed to identify the potential risk factors affecting time to first recurrence of breast cancer patients in southwest Ethiopia, at Jimma Medical Center, Bedelle Hospital, and other hospitals in the region.
Methods: The data were taken from the medical records of patients registered from January 2012 to January 2019. The study considered a sample of 642 breast cancer patients. Different non-parametric and parametric shared frailty survival models were employed.
Results: Of the 642 breast cancer patients, 447 (69.6%) recovered from (were cured of) the disease. The median cure time from breast cancer was 13 months. The log-normal parametric shared frailty survival model indicated that treatment, stage of breast cancer, smoking habit, and marital status significantly affect the time to first recurrence of breast cancer.
Conclusion: Treatment, stage of breast cancer, and marital status were associated with a shorter time to first cure of breast cancer, while smoking habit was associated with a longer one. To mitigate breast cancer, community awareness of the identified factors should be raised.


Background
Breast cancer is a non-communicable disease. It is one of the leading causes of death among women, followed by cardiovascular disease and cervical cancer.
In both the developed and the developing world, breast cancer causes loss of life among women (WHO, 2017).
Worldwide, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death in women. In 2013, the International Agency for Research on Cancer (IARC) reported 6.7 million new breast cancer cases and 5.5 million deaths among females worldwide (Ferlay, 2013). Projections indicate that by 2030 the annual figures could reach 8.9 million new cases and 5.5 million deaths among females as a result of population growth and aging (Ferlay, 2013).
In Africa, breast cancer accounts for 28% of all cancer cases and 20% of all cancer deaths in women (Brinton et al., 2014). It is also emerging as a major public health problem in Sub-Saharan Africa (SSA) because of population growth and aging, as well as social and economic factors. Many low- and middle-income African countries now face a double burden of cervical and breast cancer, which have been the top cancer killers of women over the last 30 years.
In developed countries generally, breast cancer is the leading cause of death among women (Ferlay, 2013). In western Africa, however, breast cancer mortality rates range from 6% to 20% (WHO, 2012). The incidence rate of breast cancer is 30.4 in eastern Africa, 26.8 in central Africa, 38.6 in western Africa, and 38.9 in southern Africa (Brinton et al., 2014).
Breast cancer is the most common cause of cancer death in Africa, including Ethiopia and other Sub-Saharan countries. According to a study done at "Tikur Anbessa" Specialized Hospital (Tigeneh et al., 2015), breast cancer is the second most prevalent cancer in Ethiopia, after gynecological malignancies. Breast cancer becomes incurable because of late medical presentation, lack of awareness, and limited medical resources and hospitals. In particular, strong traditional beliefs about the symptoms of breast cancer delay biomedical care (Timotewos, 2018). Because of limited access to timely and standard treatment, breast cancer survival tends to be poor in most Ethiopian regions. Despite the government's efforts to reduce the recurrence, incidence, and mortality of breast cancer and to improve the survival of women, breast cancer is still on the rise in Ethiopia.
It is estimated that about 9,900 Ethiopian women have breast cancer. Thousands more cases go unreported, as women living in rural areas often seek remedies from traditional healers before seeking help from the government health system. A retrospective study conducted at TASH from 1997 to 2012 found that, of 16,622 new cancer cases registered, 3,460 were new cases of breast cancer, a proportion of 20.8% and approximately 216 cases per year (Abate et al., 2016).
Traditionally, survival analyses of breast cancer patients have been performed using the Cox proportional hazards model. To fill this research gap, this article extends the analysis using various parametric shared frailty models. The study aims to identify the risk factors that determine time to first recurrence of breast cancer in southwest Ethiopia.
The term frailty was first suggested by Vaupel (1979) in the context of mortality studies, and Lancaster (1979) incorporated the frailty concept into a study of the duration of unemployment. For a random time-to-event variable $T$, we define the probability density function (pdf) of $T$ as (

Stage II
In stage II, the invasive breast cancer shows one or more of the following characteristics:
• No tumor may be found in the breast, but cancer larger than 2 mm is found in 1 to 3 axillary lymph nodes (the lymph nodes under the arm) or in the lymph nodes near the breastbone (found during a sentinel node biopsy).
• The tumor measures less than 2 cm and has spread to the axillary lymph nodes.
• The tumor is between 2 cm and 5 cm and has not spread to the axillary lymph nodes; this includes tumors that are HER2-negative, estrogen-receptor-positive, and progesterone-receptor-negative with an Oncotype DX Recurrence Score of 9.

Stage III
In stage III, the invasive breast cancer shows one or more of the following characteristics:
• The tumor can be any size, or no tumor may be found in the breast, but cancer is found in 4 to 9 axillary lymph nodes or in the lymph nodes near the breastbone (found during imaging tests or a physical exam).
• The tumor is larger than 5 cm and small groups of breast cancer cells are found in the lymph nodes.
• The tumor is larger than 5 cm and cancer has spread to 1 to 3 axillary lymph nodes or to the lymph nodes near the breastbone (found during a sentinel lymph node biopsy).
• The tumor measures more than 5 cm across, is grade 2, is found in 4 to 9 axillary lymph nodes, and is estrogen-receptor-positive, progesterone-receptor-positive, and HER2-positive.
• The tumor may be any size and has spread to the chest wall and/or skin of the breast, causing swelling or an ulcer, and may have spread to up to 9 axillary lymph nodes or to lymph nodes near the breastbone.
• The tumor measures more than 5 cm across, is grade 3, is found in 4 to 9 axillary lymph nodes, and is estrogen-receptor-positive, progesterone-receptor-positive, and HER2-positive.
• Inflammatory breast cancer typically features reddening of a large portion of the breast skin; the breast feels warm and may be swollen. The cancer cells have spread to the lymph nodes and may be found in the skin.

Stage IV
Stage IV defines invasive breast cancer, which spreads beyond the breast and neighboring lymph nodes to other organs of the body: lungs, chest wall, distant lymph nodes, skin, bones, liver, or brain.
The response variable, measured in months, is the time to first recurrence of breast cancer, obtained as the time from registration at the hospital until the patient recovered or was censored.
For the time to first event (in our case, time to first recurrence), different survival models were used.
In survival analysis, the data are conveniently summarized through estimates of the survival and hazard functions.
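The nonparametric summary mentioned above can be sketched as follows, assuming the Kaplan-Meier product-limit estimator is the one intended. The function names and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up times (for example, months since registration)
    events : 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects that share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve


def median_time(curve):
    """Smallest event time at which the estimated survival is 0.5 or less."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data (hypothetical): seven subjects, two of them censored.
toy_curve = kaplan_meier([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 0, 1, 1])
```

Censored subjects contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is why the estimator handles incomplete follow-up.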

Accelerated Failure Time (AFT) Model
Models of survival analysis are used to study data in which the "time until the event" is of interest. The time to an event, which can be measured in years, months, weeks, days, etc., may be a failure time or a survival time (Wogi et al., 2018; Chere et al., 2015; Collett, 1994).
In the AFT models, we measure the direct effect of the explanatory variables on the survival time instead of on the hazard. This characteristic allows for an easier interpretation of the results because the parameters measure the effect of the corresponding covariate on the median survival time.
The model measures how a covariate acts to "accelerate" or "decelerate" survival time through the vector of regression coefficients (Klein, 2003).
On a log scale, the AFT model is analogous to the classical linear regression approach: the natural logarithm of the survival time, $y = \log T$, is modeled as $y = \mu + x'\beta + \sigma\varepsilon$. Let $S_0$ denote the survival function obtained by setting $x = 0$; then we have the following relationship:
$$S(t \mid x) = S_0\left(t \exp(-x'\beta)\right).$$
The effect of the covariates on the survival function is that the time scale is changed by a factor $\exp(-x'\beta)$, which we call the acceleration factor. When $\exp(-x'\beta) > 1$, the survival process accelerates; when $\exp(-x'\beta) < 1$, the survival process decelerates.
The survival function of $T_i$ can be expressed through the survival function of $\varepsilon_i$ (Klein, 2003). For each distribution of $\varepsilon_i$, there is a corresponding distribution for $T$. In this case, the AFT models can be interpreted in terms of the speed of progression of a disease. The effect of the covariates in an accelerated failure time model is to change the scale, and not the location, of the baseline distribution of survival times.
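The time-scaling idea can be illustrated with a short sketch. It assumes the log-linear convention $\log T = \mu + x'\beta + \sigma\varepsilon$, under which every survival-time quantile is multiplied by $\exp(x'\beta)$. The Weibull baseline and its parameter values are hypothetical, chosen only for illustration; the coefficient -0.261 is the radiotherapy estimate reported in the Results.

```python
import math


def baseline_survival(t, lam=0.05, rho=1.5):
    """Hypothetical Weibull baseline S0(t) = exp(-lam * t**rho)."""
    return math.exp(-lam * t ** rho)


def aft_survival(t, linear_predictor):
    """S(t | x) = S0(t * exp(-x'beta)): covariates only rescale time."""
    return baseline_survival(t * math.exp(-linear_predictor))


def baseline_median(lam=0.05, rho=1.5):
    """Time at which the hypothetical baseline survival equals 0.5."""
    return (math.log(2.0) / lam) ** (1.0 / rho)


coef = -0.261                       # coefficient on the log-time scale
time_ratio = math.exp(coef)         # about 0.77
median_with_covariate = baseline_median() * time_ratio
```

Evaluating `aft_survival(median_with_covariate, coef)` returns 0.5, so the median (and every other quantile) is shrunk by the same factor of about 0.77.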

Estimation of AFT model
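AFT models are fitted using maximum likelihood estimation. The likelihood of the $n$ observed survival times $t_1, \ldots, t_n$ is given by:
$$L(\beta, \mu, \sigma) = \prod_{j=1}^{n} \left[f_j(t_j)\right]^{\delta_j} \left[S_j(t_j)\right]^{1-\delta_j},$$
where $f_j$ and $S_j$ are the density and survival functions for the $j$th subject, $\delta_j$ is the event indicator (1 if the event was observed, 0 if censored), and $x_j$ is a vector of covariates for the $j$th subject.
The maximum likelihood parameter estimates are calculated using the Newton-Raphson procedure.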
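As a rough illustration of Newton-Raphson maximum likelihood with right-censored data, the sketch below fits the simplest possible case: an exponential model with no covariates, parameterized by $\theta = \log(\text{rate})$. This is a deliberately reduced example, not the multi-parameter fit used in the study; the function name and the toy data are hypothetical.

```python
import math


def fit_exponential_rate(times, events, tol=1e-12, max_iter=100):
    """Newton-Raphson MLE of the rate of a right-censored exponential model.

    The log-likelihood in theta = log(rate) is
        l(theta) = d * theta - exp(theta) * total,
    where d is the number of observed events and total is the summed
    follow-up time of all subjects (censored or not).
    """
    d = sum(events)
    total = sum(times)
    # Starting value: the estimate obtained if nothing were censored.
    theta = math.log(len(times) / total)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        score = d - math.exp(theta) * total      # first derivative
        hessian = -math.exp(theta) * total       # second derivative
        step = score / hessian
        theta -= step                            # Newton-Raphson update
        if abs(step) < tol:
            break
    return math.exp(theta), iteration


rate, n_iter = fit_exponential_rate([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 0, 1, 1])
```

For this one-parameter model the answer is also available in closed form (events divided by total follow-up time, here 5/41), which makes it a convenient check on the iteration.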

Weibull AFT Model
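The Weibull distribution is a very flexible model for time-to-event data. Its hazard rate is monotone: it may be increasing, decreasing, or constant. The survival and hazard functions of the Weibull model with scale parameter $\lambda$ and shape parameter $\rho$ are given by (Klein, 2003):
$$S(t) = \exp(-\lambda t^{\rho}), \qquad h(t) = \lambda \rho t^{\rho - 1}.$$
Using the above equation, the proportional hazard representation of the survival function of the Weibull model, with proportional hazard coefficients $\gamma_j$, is given by:
$$S(t \mid x) = \exp\left\{-\lambda t^{\rho} \exp(x'\gamma)\right\}.$$
Comparing this with the AFT form $S(t \mid x) = \exp\left[-\exp\left\{(\log t - \mu - x'\beta)/\sigma\right\}\right]$, the parameters $\lambda$, $\rho$, $\gamma_j$ in the proportional hazard model can be expressed by the AFT parameters $\mu$, $\sigma$, $\beta_j$:
$$\lambda = \exp(-\mu/\sigma), \qquad \rho = 1/\sigma, \qquad \gamma_j = -\beta_j/\sigma.$$
The AFT representation of the hazard function of the Weibull model is given by:
$$h(t \mid x) = \frac{1}{\sigma}\, t^{1/\sigma - 1} \exp\left\{-\frac{\mu + x'\beta}{\sigma}\right\}.$$

Log-Logistic AFT Model
The log-logistic distribution has a fairly flexible functional form. It is one of the parametric survival time models in which the hazard rate may be decreasing, increasing, or hump-shaped, that is, it initially increases and then decreases. When the data are censored, the log-logistic distribution is mathematically more tractable than other distributions. The log-logistic model has two parameters, $\lambda$ and $\kappa$, where $\lambda$ is the scale parameter and $\kappa$ is the shape parameter. Its probability density function (pdf) is given by (Bennett, 1983; Cox, 1972):
$$f(t) = \frac{\lambda \kappa t^{\kappa - 1}}{\left(1 + \lambda t^{\kappa}\right)^{2}}.$$
The corresponding survival and hazard functions are given respectively as follows:
$$S(t) = \frac{1}{1 + \lambda t^{\kappa}}, \qquad h(t) = \frac{\lambda \kappa t^{\kappa - 1}}{1 + \lambda t^{\kappa}}.$$
When $\kappa \le 1$, the hazard rate decreases monotonically; when $\kappa > 1$, it increases from zero to a maximum and then decreases to zero.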
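The two hazard shapes of the log-logistic model can be checked numerically. The sketch assumes the parameterization $S(t) = 1/(1 + \lambda t^{\kappa})$ with scale $\lambda$ and shape $\kappa$; the parameter values are hypothetical. Under this parameterization, setting the derivative of $\log h(t)$ to zero gives a peak at $t^{*} = \{(\kappa - 1)/\lambda\}^{1/\kappa}$ when $\kappa > 1$.

```python
def loglogistic_hazard(t, lam, kappa):
    """Hazard of the log-logistic model with S(t) = 1 / (1 + lam * t**kappa)."""
    return lam * kappa * t ** (kappa - 1.0) / (1.0 + lam * t ** kappa)


def hazard_peak_time(lam, kappa):
    """Time of the hazard maximum, or None when the hazard is monotone."""
    if kappa <= 1.0:
        return None          # hazard decreases monotonically
    return ((kappa - 1.0) / lam) ** (1.0 / kappa)


# Hypothetical parameter values, for illustration only.
peak = hazard_peak_time(lam=0.01, kappa=2.0)      # hump-shaped hazard
no_peak = hazard_peak_time(lam=0.01, kappa=0.8)   # monotone decreasing hazard
```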
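Under the AFT model, the hazard function for the $i$th individual in this case is given as:
$$h_i(t) = \frac{1}{\sigma t}\left[1 + \exp\left\{-\frac{\log t - \mu - x_i'\beta}{\sigma}\right\}\right]^{-1}.$$
The log-logistic AFT model with a covariate vector $x$ may be written as:
$$\log T = \mu + x'\beta + \sigma\varepsilon,$$
where $\varepsilon$ has a standard logistic distribution. With $\lambda = \exp(-\mu/\sigma)$ and $\kappa = 1/\sigma$, the survival and hazard functions with covariate $x$ are given respectively as follows:
$$S(t \mid x) = \frac{1}{1 + \lambda t^{\kappa}\exp(-\kappa x'\beta)}, \qquad h(t \mid x) = \frac{\lambda \kappa t^{\kappa - 1}\exp(-\kappa x'\beta)}{1 + \lambda t^{\kappa}\exp(-\kappa x'\beta)}.$$
To interpret the factor $\exp(x'\beta)$ for the log-logistic model, one can notice that the odds of survival beyond time $t$ are given by
$$\frac{S(t \mid x)}{1 - S(t \mid x)} = \frac{\exp(\kappa x'\beta)}{\lambda t^{\kappa}}.$$
Therefore, the log-logistic distribution has the proportional odds property. This model is thus also a proportional odds model, in which the odds of an individual surviving beyond time $t$ are expressed as:
$$\frac{S(t \mid x)}{1 - S(t \mid x)} = \exp(\kappa x'\beta)\,\frac{S_0(t)}{1 - S_0(t)}$$
(Collett, 2003).
Then, the AFT representation of the log-logistic survival function is given by:
$$S(t \mid x) = \left[1 + \exp\left\{\frac{\log t - \mu - x'\beta}{\sigma}\right\}\right]^{-1}.$$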

Log-Normal AFT Model
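Assuming the survival times have a log-normal distribution, the baseline survival function and hazard function are given by (Collett, 2003):
$$S_0(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right), \qquad h_0(t) = \frac{f_0(t)}{S_0(t)},$$
where $\Phi$ is the standard normal cumulative distribution function and $f_0(t) = \frac{1}{\sigma t \sqrt{2\pi}} \exp\left\{-\frac{(\log t - \mu)^2}{2\sigma^2}\right\}$ is the log-normal density. The survival function of the log-normal AFT model is then
$$S(t \mid x_i) = 1 - \Phi\left(\frac{\log t - \mu - \beta x_i}{\sigma}\right),$$
where $x_i$ is the value of a categorical variable which takes the value 0 in one group and 1 in the other group. This implies that the plot of $\Phi^{-1}[1 - S(t)]$ against $\log t$ will be linear if the log-normal distribution is appropriate for the given data set.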
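The linearity check for the log-normal model can be sketched as below: transform each estimated survival probability with the inverse standard normal cdf and regress it on $\log t$. If the log-normal model with parameters $\mu$ and $\sigma$ holds, the points lie on a line with slope $1/\sigma$ and intercept $-\mu/\sigma$. The helper names and the parameter values (2.5 and 0.8) are hypothetical; in practice the survival probabilities would come from a nonparametric estimate rather than from the model itself.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal distribution (mean 0, sd 1)


def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log t - mu) / sigma)."""
    return 1.0 - STD_NORMAL.cdf((math.log(t) - mu) / sigma)


def probit_plot_points(curve):
    """Map (t, S(t)) pairs to (log t, Phi^{-1}[1 - S(t)])."""
    return [(math.log(t), STD_NORMAL.inv_cdf(1.0 - s))
            for t, s in curve if 0.0 < s < 1.0]


def least_squares_line(points):
    """Ordinary least-squares slope and intercept through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Exact log-normal survival values (hypothetical mu = 2.5, sigma = 0.8).
curve = [(t, lognormal_survival(t, 2.5, 0.8)) for t in (3, 6, 12, 24, 48)]
slope, intercept = least_squares_line(probit_plot_points(curve))
```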

Multivariate Frailty Models
Frailty models are random effect models for time-to-event variables, in which the random effect acts multiplicatively on the baseline hazard. The shared frailty model is used with multivariate survival data where unobserved frailty is shared among groups of individuals, and thus a shared frailty model may be thought of as a random effect model for survival data.
The frailty model for univariate data has long been used to account for heterogeneity in times-to-event. Suppose there are $n_i$ subjects in the $i$th subgroup, so that $n = \sum_i n_i$ is the total sample size. The hazard rate for the $j$th individual in the $i$th subgroup is given by:
$$h_{ij}(t) = h_0(t)\exp\left(x_{ij}'\beta + \sigma_u u_i\right),$$
where $h_0(t)$ is the baseline hazard and the $u_i$ are the frailty terms for the subgroups, assumed to be independent with a mean of 0 and a variance of 1. If the number of subjects $n_i$ is 1 for all groups, the univariate frailty model is obtained (Wienke, 2010); otherwise the model is called the shared frailty model (Duchateau, 2002; Klein, 1992), because all subjects in the same cluster share the same frailty value. In general, we use the notation $z_i = \exp(\sigma_u u_i)$, where $z_i$ is a positive quantity assumed to have a mean of one and variance $\theta$. Klein (1992) discusses the ramifications of the assumed distribution of the frailty, whether gamma or inverse Gaussian.
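The link between the frailty variance $\theta$ and within-cluster dependence can be illustrated for the gamma frailty case, where Kendall's tau has the closed form $\tau = \theta/(\theta + 2)$. This sketch covers the gamma distribution only; the inverse Gaussian frailty has a different, more involved expression that is not reproduced here, so the number below is indicative rather than a reproduction of the article's calculation.

```python
def kendall_tau_gamma(theta):
    """Kendall's tau implied by a gamma shared frailty with variance theta."""
    if theta < 0:
        raise ValueError("frailty variance must be non-negative")
    return theta / (theta + 2.0)


# With the frailty variance reported in the Results (theta = 0.11):
tau = kendall_tau_gamma(0.11)   # roughly 0.05
```

A variance of zero gives $\tau = 0$, which corresponds to independent survival times within a cluster.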

Parameter Estimation for shared frailty model
Parameter estimation for right censored clustered survival data, the observation for subject

Model building
In model building, we used purposeful selection to choose the best subset of the covariates.
Purposeful selection of covariates proceeds in the following steps. First, include covariates that are statistically significant in the univariate analysis. Second, include covariates that are considered more important. Third, use the covariates selected in steps 1 and 2 to fit the multivariable model. Fourth, retain the covariates that are statistically significant (p-value < 0.05).
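The four steps above can be sketched as a small routine. Everything here is a placeholder: `fit_multivariable` stands for whatever model-fitting function returns p-values for a list of covariates, and the p-values in the example are hypothetical, not results from the study.

```python
def purposeful_selection(univariate_p, important, fit_multivariable, alpha=0.05):
    """Select covariates in four steps.

    univariate_p      : dict mapping covariate name -> univariate p-value
    important         : covariates kept for step 2 regardless of p-value
    fit_multivariable : callable taking a list of covariate names and
                        returning a dict name -> multivariable p-value
    """
    # Step 1: covariates significant in the univariate analysis.
    significant = {name for name, p in univariate_p.items() if p < alpha}
    # Step 2: add covariates judged important.
    candidates = sorted(significant | set(important))
    # Step 3: fit the multivariable model on the candidates.
    multivariable_p = fit_multivariable(candidates)
    # Step 4: keep the covariates that remain significant.
    return [name for name in candidates if multivariable_p[name] < alpha]


def _stub_fit(names):
    """Hypothetical stand-in for a real model fit."""
    fake_p = {"stage": 0.003, "smoking": 0.001, "age": 0.21, "obesity": 0.40}
    return {name: fake_p[name] for name in names}


selected = purposeful_selection(
    {"stage": 0.01, "smoking": 0.002, "age": 0.03, "obesity": 0.30},
    important=["obesity"],
    fit_multivariable=_stub_fit,
)
```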

Comparison of Models
In this article, the Akaike Information Criterion (AIC) was used to compare the candidate parametric shared frailty models (Tesfay, 2014).
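For reference, the AIC is computed as $-2\log L + 2k$, where $\log L$ is the maximized log-likelihood and $k$ the number of estimated parameters; the model with the smallest value is preferred. The sketch below uses hypothetical model names and log-likelihood values, not the figures from Table 3.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: -2 * logL + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


def best_by_aic(models):
    """models: dict name -> (log_likelihood, n_parameters).

    Returns the name of the model with the smallest AIC and all scores.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fitted models, for illustration only.
best, scores = best_by_aic({
    "model_a": (-100.0, 5),
    "model_b": (-99.5, 7),
})
```

Here the second model fits slightly better but pays for two extra parameters, so the first model is selected.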

Model Diagnostics
After the models have been compared, it is important to check how well the selected model describes the outcome. For the identified accelerated failure time model, the diagnostic plot for the baseline distribution should be linear and pass through the origin (Klein, 1992).

The Results
The outcome was time to first recurrence of breast cancer. As we observe from

Univariate Analysis
Univariate analysis was applied to select covariates that predict the time to first recurrence of breast cancer before proceeding to the multivariable analysis. Univariate models were fitted for each covariate using different baseline distributions (Weibull, log-logistic, exponential, and log-normal) and frailty distributions (gamma and inverse Gaussian). As shown in Table 2, the univariate analyses using Weibull with gamma and Weibull with inverse Gaussian shared frailty distributions show that age of patient, treatment taken, stage of disease, smoking habit, obesity, alcohol consumption, tumor size, oral contraceptive use, and marital status were each significantly associated with time to first recurrence of breast cancer at the 5% level of significance. These significant variables were considered in the multivariable analyses.

Multivariable Analysis and Comparison of Models
Six multivariable survival models were fitted, using exponential, Weibull, and log-normal baseline hazard functions with gamma and inverse Gaussian frailty distributions, and including all the covariates that were significant in the corresponding univariate analysis at the 5% level of significance. The AIC, the most commonly used model selection criterion, was used to compare the models; the model with the minimum AIC value was preferred. Table 3 summarizes the AIC results for the two shared frailty distributions with the three baseline hazard functions. Among those models, the inverse Gaussian shared frailty model with log-normal baseline hazard function has the smallest AIC.
This indicates that the inverse Gaussian shared frailty model with log-normal baseline was the most appropriate of the candidate models for the time to first recurrence of breast cancer patients in southwest Ethiopia under the selected covariates. Analysis based on this model showed that treatment taken, stage, smoking habit, and marital status were significant. Women who received radiotherapy or surgery had significantly different cure times from the reference group (chemotherapy), with acceleration factors of $\phi = 0.77$ and $\phi = 0.53$ and 95% confidence intervals of (0.595, 0.995) and (0.315, 0.897), respectively, neither of which includes 1. The estimated coefficients for women who received radiotherapy and surgery were -0.261 and -0.631, respectively. The negative sign of the coefficients implies a decrease in the logarithm of survival time and hence a shorter expected time to cure of breast cancer. Therefore, women who received radiotherapy and surgery had their cure time shrunk by factors of 0.77 and 0.53, respectively, relative to the reference group (chemotherapy). The other treatment group was not significantly different from the baseline at the 5% level of significance. The time ratio (acceleration factor) for stage IV, with stage I as the reference category, was 0.53, with a 95% confidence interval of (0.343, 0.818). This indicates that the survival time of patients with stage IV was reduced by a factor of $\phi = 0.53$ compared with patients in stage I. The confidence interval of the acceleration factor for smoking habit was (1.896, 4.973), which does not include 1, indicating that smoking habit is a significant prognostic factor for time to cure of breast cancer. Accordingly, women who smoked had their cure time prolonged by a factor of 3.07 compared with women who did not smoke (reference group).
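The reported acceleration factors follow directly from the coefficients by exponentiation, and the confidence limits are obtained the same way on the log scale. The sketch below reproduces that arithmetic. The standard errors are not stated in the text, so the helper that backs them out of the published intervals is an assumption (a symmetric Wald interval on the log scale with z = 1.96).

```python
import math


def time_ratio(coef):
    """Acceleration factor (time ratio) implied by a log-time coefficient."""
    return math.exp(coef)


def implied_standard_error(lower, upper, z=1.96):
    """Standard error implied by a Wald interval for the time ratio (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def confidence_interval(coef, se, z=1.96):
    """Wald interval for the time ratio, computed on the log scale."""
    return math.exp(coef - z * se), math.exp(coef + z * se)


phi_radiotherapy = time_ratio(-0.261)   # about 0.77
phi_surgery = time_ratio(-0.631)        # about 0.53

se_radiotherapy = implied_standard_error(0.595, 0.995)
ci_radiotherapy = confidence_interval(-0.261, se_radiotherapy)
```

An interval that excludes 1 on the ratio scale corresponds to a coefficient whose interval excludes 0 on the log scale, which is the significance statement made in the text.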
The confidence interval of the acceleration factor for the married group is (0.454, 0.935), which does not include 1, so the effect is significant at the 5% level. This indicates that the survival time of married patients was shrunk by a factor of $\phi = 0.65$ compared with the reference category (single). The value of the shape parameter in our working model (log-normal model) is ρ = 2.168. Since this value is greater than unity, the hazard function is unimodal, i.e., it increases up to some time and then decreases. The Kaplan-Meier (KM) curves of the survival and hazard functions of time to first recurrence of breast cancer are shown in Figure 1. The curve shows that the survival probability decreases with increasing time. The hazard function increases during the first few months and then decreases slightly (Figure 1). The heterogeneity among clusters (regions) estimated by the log-normal model was θ = 0.11, and the dependence within clusters was about τ = 5% (Table 4).

After the model has been fitted, it is desirable to determine whether the fitted parametric model adequately describes the data. The appropriateness of the model is therefore assessed from the linearity of the diagnostic plot of the given baseline distribution for the data set.
The respective plots are given in Figure 2; the plot for the log-normal baseline distribution is closer to a straight line than those for the exponential and Weibull baseline distributions. This evidence strengthens the decision made using the AIC that the log-normal baseline distribution is appropriate for the data. A quantile-quantile (Q-Q) plot was made to check whether the AFT model provided an adequate fit to the data, using two different groups of the population. We graphically checked the adequacy of the model by comparing the significantly different groups of the covariates. Almost all of the plots appear to be approximately linear, as shown in Figure 3. Therefore, the accelerated failure time model appears to describe the time to first recurrence in this data set well.
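The idea behind the Q-Q check can be sketched as follows: under an AFT model, the survival-time quantiles of one group are a constant multiple of those of the other, so the paired quantiles fall on a straight line through the origin whose slope estimates the acceleration factor. This simplified sketch uses plain sample quantiles and therefore ignores censoring (with censored data the quantiles would be read off Kaplan-Meier curves instead); the data are hypothetical.

```python
from statistics import quantiles


def qq_points(group_a, group_b, n=10):
    """Pair the sample quantiles (deciles by default) of two groups."""
    return list(zip(quantiles(group_a, n=n), quantiles(group_b, n=n)))


def slope_through_origin(points):
    """Least-squares slope of a line constrained to pass through (0, 0)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


# Hypothetical uncensored times; group_b is group_a with time halved.
group_a = [3, 5, 8, 12, 13, 17, 21, 26, 30, 41]
group_b = [0.5 * t for t in group_a]
slope = slope_through_origin(qq_points(group_a, group_b))
```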

Discussion
To select prognostic factors affecting time to first recurrence of breast cancer, we used univariate analysis with different baseline distributions (Weibull, log-logistic, exponential, and log-normal) and frailty distributions (gamma and inverse Gaussian). All variables significant in the univariate analysis were considered in the multivariable analysis. The models with their baseline hazard functions were compared using the AIC, where the model with the minimum AIC is the most appropriate for the time-to-first-recurrence data (Munda et al., 2012). The inverse Gaussian shared frailty model with log-normal baseline had the smallest AIC. The results of this study showed a clustering effect and heterogeneity among the regions of southwest Ethiopia from which the women with breast cancer came; we assumed that women coming from the same region share the same factors related to time to first recurrence of breast cancer. Accordingly, we accounted for the clustering effect in the hazard function.
The hazard function increases sharply at the early stage, after which failures occur at random (Duchateau, 2008). The heterogeneity and dependence within regions were estimated to be θ = 0.11 and τ = 5%. These values are the smallest frailty variance and Kendall's tau among all the fitted models, in line with the idea that the better the model, the less heterogeneity is observed (Banbeta et al., 2015). Based on the model diagnostics (assumption of linearity), the time to first recurrence of breast cancer was most appropriately fitted by the log-normal distribution when compared with the exponential and Weibull hazard functions.
The results of the multivariable analysis with the selected model, the inverse Gaussian shared frailty model with log-normal baseline, showed that treatment taken, stage, smoking habit, and marital status were the factors that determine the time to first cure of breast cancer.
The estimated median survival time to first recurrence of breast cancer among women in southwest Ethiopia was 13 months, with a 95% confidence interval of [11, 14]. This finding is similar to that of Dye et al. (2012).
Women who received radiotherapy and surgery had shorter cure times than the reference group (chemotherapy). This finding is supported by Abadi et al. (2014). However, the results obtained by Kuru (2008) and Hu (2006) showed that radiotherapy increases the overall survival rate of patients with stage III cancer. Stage III was not significant in our study.
The stage of breast cancer has a significant effect on the time to first recurrence of breast cancer. The survival time of patients with stage IV was shrunk by a factor of $\phi=0.53$ compared with patients in stage I. This is consistent with Demicheli et al. (2010) and Dignam et al. (2009). In general, the results of this study showed that, for patients with stage IV, the time to cure of breast cancer was decreased.

Conclusion
The main objective of this study was to identify risk factors affecting time to first recurrence of breast cancer patients in southwest Ethiopia using different parametric shared frailty models. A total of 642 women with breast cancer from hospitals in southwest Ethiopia were included in the study.
The median time to cure of breast cancer was 13 months. Treatment taken, stage, smoking habit, and marital status were statistically associated with time to first cure of breast cancer.
To mitigate breast cancer, awareness should be raised in the community regarding treatment taken, stage, smoking habit, and marital status.

Ethics approval and consent to participate
Ethical approval to conduct the study and human subject research approval for this study was received from Jimma University, College of Natural Sciences, Research Ethics Committee and the Medical Director of the Hospitals. As the study was retrospective, informed consent was not obtained from the study participants, but data were anonymous and kept confidential.

Consent for publication
Not applicable.

Availability of data and material
The data for this analysis are available on request.