Bayesian Survival Analysis of Heart Failure Patients: A Case Study in Jimma University Medical Center, Jimma, Ethiopia


 Heart failure is the failure of the heart to pump blood with normal efficiency and is a growing global public health issue with a high death rate worldwide, including in Ethiopia. The aim of this study was to identify factors affecting the survival time of heart failure patients in Jimma University Medical Center. To this end, 409 heart failure patients were included in the study, based on data taken from the medical records of patients enrolled from January 2016 to January 2019. Kaplan-Meier plots and the log-rank test were used to compare survival functions; Bayesian survival models were used to identify factors affecting the survival time of heart failure patients. Of the total patients in the study, 164 (40.1%) died. The estimated median survival time of patients was 31 months. The Bayesian log-normal accelerated failure time model fit the heart failure data-set better than the other Bayesian accelerated failure time models used in this study. The results of this model show that the survival time of heart failure patients was significantly affected by age, chronic kidney disease, diabetes mellitus, etiology of heart failure, hypertension, anemia, cigarette smoking and stage of heart failure. The Bayesian log-normal accelerated failure time model describes the heart failure data-set well. Age group (49 to 65 years and greater than 65 years); etiology of heart failure (rheumatic valvular heart disease, hypertensive heart disease and other diseases); presence of hypertension; presence of anemia; presence of chronic kidney disease; smoking; diabetes mellitus (type I and type II); and stage of heart failure (II, III and IV) shortened the survival time of heart failure patients. The hospital, Jimma University Medical Center, needs to improve public awareness for early detection of heart failure.


Background of the Study
Heart failure (HF) is defined as a clinical syndrome in which the heart fails to pump blood with normal efficiency. It is characterized by typical symptoms (shortness of breath, persistent coughing or wheezing, ankle swelling and fatigue) that may be accompanied by signs (elevated jugular venous pressure, pulmonary crackles, increased heart rate and peripheral oedema) caused by a structural and functional cardiac abnormality, resulting in reduced cardiac output and elevated intracardiac pressures at rest or during stress. Because HF is a syndrome and not a disease, its diagnosis relies on clinical examination and can be challenging (Ponikowski et al., 2016; Yancy et al., 2013).
Heart failure is a major global cause of death and a rapidly growing public health issue, affecting approximately 40 million individuals worldwide and causing an estimated 287,000 deaths a year, making it the most quickly growing cardiovascular disorder. Its prevalence is increasing across developed and developing countries as a complication of an aging population (Vos et al., 2015). In the United States of America, the prevalence of HF is nearly 6.5 million, approximately 960,000 new cases are diagnosed each year, the incidence approaches 21 per 1,000 population, and HF accounted for an estimated 1 in 8 deaths in 2017 (Benjamin et al., 2019). In Europe, the prevalence of symptomatic HF is estimated at 5% of the population, and mortality is estimated at about 13% (Huffman et al., 2013).
In Africa, HF has emerged as a major public health problem, imposing enormous pressure on health care systems. Because HF is not a disease by itself, patients with HF may have other underlying causes of death. The Sub-Saharan Africa Survey of Heart Failure, a prospective multi-center study of HF across the continent, showed that HF is predominantly non-ischemic, most commonly caused by hypertension, and that HF strikes individuals in sub-Saharan Africa at a much younger age than in the United States and Europe (Damasceno et al., 2012). Similarly, HF is reported to have caused 2.5% of deaths among all age groups in a sampled hospital-based mortality study in Ethiopia (Misganaw et al., 2014).
In this study, the researchers applied survival analysis since it addresses the limitations of classical regressions such as logistic and linear regression. Most medical studies have used the Cox regression model to assess the survival distribution of heart failure patients, while alternative parametric models, including the exponential, Weibull, log-normal and log-logistic models, have been used to identify prognostic factors (Giolo et al., 2012; Hailay et al., 2015). Parametric survival models can provide a more suitable description of the survival data if one is able to identify the distribution of the survival time (Khanal et al., 2014). The parametric AFT models (i.e., exponential, Weibull, log-normal and log-logistic) have a more realistic interpretation and provide more informative results than the Cox-PH model (Qi, 2009). Epidemiologists have documented several risk factors for HF: age and hypertension (Sheng et al., 2018) and anemia (Ahmad et al., 2017) were associated with an increased risk of mortality among HF patients. Factors such as age, sex, stage of HF, hypertension, anemia and diabetes mellitus have a statistically significant effect on the survival of HF patients (Zeru, 2018).
Parametric survival models play an important role in Bayesian survival analysis, since many Bayesian analyses in practice are carried out using parametric AFT models, which provide computational advantages. The Bayesian approach generates conclusions based on the synthesis of new information from the observed data and historical knowledge or expert opinion. Historical knowledge from similar past studies can be very helpful in interpreting the results of the current study (Ibrahim et al., 2001). The Bayesian approach assumes that the observed data are fixed and that the model parameters are random. Prior probability distributions represent a powerful mechanism for incorporating information from previous studies and for controlling confounding (Ibrahim et al., 2011). Bayesian methods combine objective prior knowledge with the information acquired from the data by using Bayes' theorem (Gelman et al., 2014).
In this study, Bayesian survival models were used to identify the factors affecting the survival time of heart failure patients in JUMC, Jimma, Ethiopia. The application of the INLA method of estimation to Bayesian survival models was the key motivation for applying it to the HF data-set in this study. The main aim of this study was to identify the factors affecting the survival time of HF patients. It seeks to identify the prognostic factors of HF patients, to determine the best parametric survival model for the heart failure data-set, to estimate the survival time of HF patients and to explore Bayesian AFT models using the INLA method.

Statement of the Problem
Heart failure is a serious condition in which the heart is unable to pump enough blood to meet the requirements of the body (Lloyd-Jones et al., 2002). In nearly all regions of the world, HF is both common and increasing; the number of patients with HF is predicted to increase in countries with aging populations, and HF is a leading cause of death (Benjamin et al., 2019).
Studies show that HF is increasing rapidly in African countries, including Ethiopia (Damasceno et al., 2012; Misganaw et al., 2014). An advantage of the Bayesian approach is the possibility of improving the precision of the results by introducing external information through the prior distribution.
Thus, the advantages of the Bayesian approach were the key motivation for applying it to the heart failure data-set in this study, and Bayesian survival analysis using the INLA method was chosen to analyze the data-set. Considering that HF is a growing problem in the country's hospitals and that gaps remain in previous studies, the researchers explored the Bayesian survival analysis of HF patients in JUMC, Jimma, Ethiopia.
Therefore, this study aims to answer the following scientific questions:
 Which factors significantly affect the survival time of heart failure patients?
 What is the estimated survival time of heart failure patients?
 Which parametric survival model is the most appropriate for analyzing the heart failure data set?

General Objective
The aim of this study was to identify factors affecting the survival time of heart failure patients in Jimma University Medical Center, Jimma, Ethiopia using Bayesian Survival Models.

Specific Objectives
The specific objectives of this study were:
 To identify the prognostic factors of heart failure patients.
 To estimate the survival time of heart failure patients.
 To determine the best parametric survival model for the heart failure data-set.
 To explore the Bayesian accelerated failure time models for the heart failure data-set using the INLA method.

Significance of the Study
Studying the survival time of HF patients helps address health problems in society by identifying factors associated with death. In addition, the results of this study might be used to improve awareness of the factors that contribute to the death of HF patients. The study also provides scientific information to the Ministry of Health of Ethiopia, helping policymakers raise public awareness of factors that increase the probability of death due to HF, which is preventable and curable if it is screened and treated at an early stage with appropriate treatment.

Study Area
The study was conducted on data taken from Jimma University Medical Center (JUMC), which is located in Jimma town, Oromia National Regional State, 350 km southwest of Addis Ababa, Ethiopia. JUMC is the only medical center in Jimma zone, serving the majority of people living in Jimma city and its surroundings.

Study Design and Population
A retrospective study design was applied to obtain data on HF patients recorded in JUMC, Jimma, Ethiopia. The study population was all HF patients who had been registered at JUMC over the 3 years from 1 January 2016 to 1 January 2019. The data were carefully reviewed from the registration log book and patients' registration cards; any inadequate information encountered was checked against the file and excluded from the analysis if proven to be inadequate. Thus, the data were collected from patient follow-up records based on the variables in the study.

Inclusion and Exclusion Criteria
Inclusion criteria: -All persons registered with full information, including the study variables of interest, in the registration book or in the chart were considered eligible for the study. To be included in the study, patients must have received treatment at the hospital at least once.
Exclusion criteria: -Patients with insufficient information regarding the study variables in the registration book or in the card were not eligible. HF patients lost from the study without starting any treatment were also not included.

Data Collection Methods
Ethical permission was obtained from JUMC, Jimma, Ethiopia. Secondary data were then extracted from existing hospital records by a trained enumerator and the principal investigator using a checklist (data extraction form).
Starting Time: -the start time of the interval (in months). The time origin, or entry into the study, was the day on which the heart failure patient was diagnosed and first received treatment.
Ending Time: -the time (in months) at which the event occurred, that is, when the heart failure patient died or was lost to follow-up by 1 January 2019 (the end of the study). This means that the survival data are right censored.

Variables in the Study
The response variable was the survival time of heart failure patients (in months), defined as the difference between the time of diagnosis and the time at which one of the events "death", "lost to follow up", "dropped out", "stopped" or "transferred out to other health centers or hospitals" occurred. Death was the event of interest. The status variable was coded as 0 for censored and 1 for death. Non-parametric methods were used to describe the survival data and to compare the survival functions of two or more groups, and Kaplan-Meier plots were employed for this purpose (Kaplan and Meier, 1958). A frequency distribution table was used to summarize the data obtained from the patient registration book based on the study variables in JUMC, Jimma, Ethiopia.

Survival Data Analysis
Survival analysis is the statistical method used for modeling and analyzing data whose principal end point is the time until an event occurs. Survival data are censored in the sense that they do not provide complete information, since subjects of the study may not have experienced the event of interest (Aalen et al., 2008).
Survival analysis is well suited to the heart failure data-set. Such data are very common in medical research, where follow-up studies can start at a certain observation time and can end before all experimental units have experienced the event.
Right censoring: -occurs when the observation of a patient is terminated before the event occurs, so that the true survival time lies to the right of the last known survival time. This type of censoring is the most commonly encountered in survival analysis and is the one considered in this study (Klein and Moeschberger, 2006).

Estimation of Survival Function
The Kaplan-Meier estimator is a non-parametric estimator of the survival function in the presence of censoring. It is based not on the actual observed event and censoring times but rather on the order in which events and censored observations occur (Kaplan and Meier, 1958). Therefore, the Kaplan-Meier estimate of the survival function at time $t$, $\hat{S}(t)$, is given by:
$$\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right)$$
Where: -$t_{(1)} < t_{(2)} < \dots < t_{(k)}$ denote the set of $k$ distinct death times observed in the sample, $n_j$ is the number of subjects alive (at risk) just before time $t_{(j)}$ (the $j$th ordered survival time) and $d_j$ denotes the number who died at time $t_{(j)}$.
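The product-limit calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study; the function name `kaplan_meier` and the toy follow-up times are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(death time, estimated survival)] at each distinct death time.

    times  : follow-up time per subject (e.g. months)
    events : 1 = death observed, 0 = right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)  # alive just before t
        surv *= 1.0 - deaths[t] / n_at_risk          # multiply by (1 - d/n)
        curve.append((t, surv))
    return curve

# Toy data (hypothetical, not the JUMC records)
toy_times = [6, 7, 10, 15, 19, 25]
toy_events = [1, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored subjects never trigger a drop in the curve; they only leave the risk set, which is why the estimate depends on the ordering of events and censorings.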

Comparison of Survival Function
Kaplan-Meier plots are used to see whether or not there is a difference in survival times between the groups of a covariate under investigation. However, the plots cannot be used to decide formally whether the survival times of heart failure patients differ between the groups of each covariate, and the log-rank test was used for this purpose (Mantel and Haenszel, 1959).
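The test statistic for the log-rank test, in its standard two-group form, is given by:
$$Q = \frac{\left[\sum_{j=1}^{k}\left(d_{0j} - \frac{n_{0j}\, d_j}{n_j}\right)\right]^2}{\sum_{j=1}^{k} \frac{n_{0j}\, n_{1j}\, d_j\, (n_j - d_j)}{n_j^2\, (n_j - 1)}}$$
Where: $d_{0j}$ is the number of failures at the $j$th time in the 1st group, $d_{1j}$ is the number of failures at the $j$th time in the 2nd group, $d_j$ is the number of failures at the $j$th time ($d_{0j} + d_{1j}$), $n_{0j}$ is the number at risk at the $j$th time in the 1st group, $n_{1j}$ is the number at risk at the $j$th time in the 2nd group and $n_j$ is the number at risk at the $j$th time ($n_{0j} + n_{1j}$).
The hypotheses to be tested are:
$H_0$: There is no difference between the survival curves.
$H_1$: There is a difference between the survival curves.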
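As an illustration of how the two-group log-rank comparison works, the sketch below accumulates observed minus expected failures in the first group, together with the hypergeometric variance, at each distinct failure time. It assumes the standard chi-square form with one degree of freedom; the function names and the toy data are hypothetical.

```python
import math

def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    groups : 0 (first group) or 1 (second group) per subject
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in failure_times:
        n0 = sum(1 for u, g in zip(times, groups) if u >= t and g == 0)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d0 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 0)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        n = n0 + n1
        obs_minus_exp += d0 - n0 * d / n       # observed - expected, group 0
        if n > 1:                              # variance is zero when n == 1
            variance += n0 * n1 * d * (n - d) / (n * n * (n - 1))
    return obs_minus_exp ** 2 / variance

def chi2_1df_pvalue(q):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(q / 2.0))

# Toy data (hypothetical): four subjects, all failures observed
q_toy = logrank_statistic([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 0, 1])
p_toy = chi2_1df_pvalue(q_toy)
```

Swapping the group labels leaves the statistic unchanged, because the observed-minus-expected sums of the two groups are equal in magnitude and opposite in sign.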

Bayesian Survival Analysis
The Bayesian approach is preferred over the frequentist approach in survival analysis because it draws on more information, combining the likelihood of the data with prior information about the distribution of the parameters. In addition, the Bayesian approach has several advantages over classical methods: it is well known that survival models are generally quite hard to fit, especially in the presence of complex censoring schemes. With the use of the Gibbs sampler and other MCMC techniques, fitting complex survival models is fairly straightforward, and the availability of software like BUGS eases the implementation greatly (Ibrahim et al., 2001). MCMC methods, however, have some limitations, such as the time needed to approximate the posterior and convergence problems (Brooks and Gelman, 1998; Berger, 2013). One of the main reasons to choose Bayesian statistics is to produce more accurate parameter estimates. In addition, with Bayesian statistics one can incorporate uncertainty about a parameter through the prior distribution and update this knowledge with the data (Depaoli, 2014). The Bayesian approach has been described as more useful than the frequentist approach in clinical data analysis and as a suitable data analysis technique for clinical researchers (Bhattacharjee, 2014). The Bayesian approach treats the parameters of the model as random variables, requires that prior distributions be specified for them, and treats the data as fixed.

Components of Bayesian inference:
Prior Distribution: -$\pi(\theta)$ is the probability distribution used to express uncertainty about the unknown parameter $\theta$ before the data are taken into account. It represents the prior information associated with the parameter of interest (Ibrahim et al., 2001).
Likelihood Function: -$L(\theta \mid \text{data})$ is the likelihood function, which gives the probability of observing the sample data given the current parameters. For a set of unknown parameters $\theta$ in the presence of right censoring it can be written as:
$$L(\theta \mid \text{data}) = \prod_{i=1}^{n} f(t_i \mid \theta)^{\delta_i}\, S(t_i \mid \theta)^{1-\delta_i}$$
Where $\delta_i$ is the censoring indicator (0 = censored and 1 = death) and $f$ and $S$ are the probability density and survival functions respectively (Ganjali and Baghfalaki, 2012).
Posterior Distribution: -The posterior distribution combines the prior distribution and the likelihood using Bayes' rule: the likelihood carries information about the model parameters based on the observed data, and the prior carries information about the model parameters available before observing the data. It is obtained by multiplying the prior distribution over all parameters, $\pi(\theta)$, by the full likelihood function, $L(\theta \mid \text{data})$ (Christensen et al., 2011). Assuming that $\theta$ is a random variable with prior distribution $\pi(\theta)$, the posterior distribution of $\theta$, $\pi(\theta \mid X)$, is given by:
$$\pi(\theta \mid X) = \frac{L(X \mid \theta)\,\pi(\theta)}{\int L(X \mid \theta)\,\pi(\theta)\,d\theta}$$
and thus it involves a contribution from the observed data through $L(X \mid \theta)$ and a contribution from prior information quantified through $\pi(\theta)$. The quantity $m(X) = \int L(X \mid \theta)\,\pi(\theta)\,d\theta$ is the normalizing constant of $\pi(\theta \mid X)$, and is often called the marginal distribution of the data or the prior predictive distribution (Ibrahim et al., 2001).
Parametric models play an important role in Bayesian survival analysis, since many Bayesian analyses in practice are carried out using parametric models (Exponential Model, Weibull Model, Log-Normal Model, and Log-Logistic Model). Parametric modeling offers straightforward modeling and analysis techniques (Ibrahim et al., 2001).
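To make the components above concrete, the following sketch evaluates a right-censored log-likelihood for a log-normal survival model and adds a log-prior to obtain an unnormalized log-posterior. It is a simplified illustration under stated assumptions: there is a single location parameter `mu` with no covariates, `sigma` is treated as known, and the prior on `mu` is normal with mean 0 and variance 1000, mirroring the vague prior used for the coefficients later in the paper. All function names are hypothetical.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_sf(z):
    """Survival function 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lognormal_loglik(mu, sigma, times, events):
    """Right-censored log-likelihood: deaths contribute log f, censored log S."""
    total = 0.0
    for t, delta in zip(times, events):
        z = (math.log(t) - mu) / sigma
        if delta == 1:
            total += norm_logpdf(z) - math.log(sigma) - math.log(t)
        else:
            total += math.log(norm_sf(z))
    return total

def log_posterior_unnormalised(mu, sigma, times, events, prior_var=1000.0):
    """Log-likelihood plus log-prior on mu, up to an additive constant."""
    log_prior = -0.5 * mu * mu / prior_var
    return lognormal_loglik(mu, sigma, times, events) + log_prior
```

The normalizing constant is omitted on purpose: it does not depend on the parameters, so it is not needed to locate the posterior mode or to compare parameter values.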

Integrated Nested Laplace Approximation (INLA) Method
The Integrated Nested Laplace Approximation Method was used to estimate the parameters in

Bayesian Model Selection Criterion
For Bayesian models, the Deviance Information Criterion (DIC) was used for model comparison. The preferable model is the one with the lowest value of the DIC (Spiegelhalter et al., 2004). An alternative is the Watanabe-Akaike information criterion (WAIC) (Watanabe, 2010), which follows a more fully Bayesian approach to constructing a criterion. Gelman et al. (2014) claim that the WAIC is preferable to the DIC.
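For readers unfamiliar with these criteria, the sketch below shows how DIC and WAIC can be computed from posterior draws, following the definitions summarized in Gelman et al. (2014). It assumes that posterior samples are available, which is a simplification: INLA derives these quantities from its own approximations rather than from draws. The function names and input layout are hypothetical.

```python
import math
from statistics import mean, variance

def waic(pointwise_loglik):
    """WAIC on the deviance scale.

    pointwise_loglik[s][i] = log-likelihood of observation i under draw s
    (at least two draws are required for the variance term).
    """
    n_obs = len(pointwise_loglik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in pointwise_loglik]
        lppd += math.log(mean([math.exp(v) for v in column]))
        p_waic += variance(column)  # effective number of parameters
    return -2.0 * (lppd - p_waic)

def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = D(theta_bar) + 2 * p_D, with p_D = mean(D) - D(theta_bar)."""
    p_d = mean(deviance_draws) - deviance_at_posterior_mean
    return deviance_at_posterior_mean + 2.0 * p_d
```

Both criteria trade goodness of fit against an effective number of parameters, so a smaller value indicates a better balance between fit and complexity.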

Bayesian Model Diagnostics
The most common ways of checking goodness of fit are the Bayesian Cox-Snell residual plot and the predictive distribution. Model checking and adequacy play an important role in models for survival data. In Bayesian analysis, Chaloner (1991) defined the Bayesian version of the residuals.
Predictive-distribution checks, including how the sample data are classified and which techniques are applied for model criticism, followed Piironen and Vehtari (2017).
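The Cox-Snell check can be sketched as follows for a log-normal AFT model. The residual of each patient is the fitted cumulative hazard at the observed time, and a Nelson-Aalen estimate of the cumulative hazard of those residuals should lie close to the 45-degree line when the model fits. This is a plug-in illustration that assumes point estimates (for example posterior means) of the linear predictor and of `sigma`; the function names are hypothetical, and tied residuals are handled sequentially for simplicity.

```python
import math

def cox_snell_residuals(times, linear_predictors, sigma):
    """r_i = -log S(t_i | x_i) under a log-normal AFT model."""
    residuals = []
    for t, mu in zip(times, linear_predictors):
        z = (math.log(t) - mu) / sigma
        surv = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)
        residuals.append(-math.log(surv))
    return residuals

def nelson_aalen(residuals, events):
    """Cumulative hazard of the residuals; censored residuals stay censored."""
    pairs = sorted(zip(residuals, events))
    n, cum_hazard, curve = len(pairs), 0.0, []
    for idx, (r, e) in enumerate(pairs):
        if e == 1:
            cum_hazard += 1.0 / (n - idx)  # one event among those still at risk
            curve.append((r, cum_hazard))
    return curve
```

Plotting the pairs returned by `nelson_aalen` against the line with slope one through the origin gives the diagnostic plot described in this section.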

Descriptive Summaries
The data for this study were taken from 409 patients who received treatment for HF at least once. Most HF patients, accounting for about 54.87% of deaths, came to the hospital for treatment at a later stage, and their survival time appears low at this stage. The Kaplan-Meier estimates for some covariates: Figure 1 (a) below shows that the overall survival rate at the end of the first year was almost 93.1%, and the overall survival rate at the end of 34 months in this study was 31%, with a 95% confidence interval of (23.9%, 40.2%). Figure 1 (b) below indicates that HF patients aged below 49 years had a higher probability of surviving than patients aged 49 to 65 years and patients aged 65 years or older. The probability of surviving was lowest for patients aged 65 years or older.

Checking the Assumption of Cox-PH
As shown in Table 2 below, the p-values of the correlation test (rho) for alcohol consumption, chronic kidney disease and anemia are less than the common 5% level of significance, which shows that the assumption of the Cox-PH model was not valid for the HF data set. In addition, the global test was significant, so the Cox-PH assumption fails overall.

Bayesian Survival Data Analysis
As shown in Table 2, the assumption of the Cox-PH model was not valid for the heart failure data set; in this case parametric AFT models were used for the HF data set. For the HF data set, let $t_i$, $i = 1, 2, \ldots, 409$, denote the survival time of the $i$th heart failure patient. Given that $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is the vector of coefficients of the covariates considered for analysis, where $\beta_0$ is the intercept and $p$ is the number of covariates ($p = 13$), we assume that all these coefficients have a normal prior with mean 0 and variance 1000. We assume that the scale parameter has a gamma prior with shape parameter 1 and inverse scale parameter 0.001; this prior was used for the Weibull, log-normal and log-logistic distributions in INLA (Akerkar et al., 2010). Table 3 below shows the analysis of the HF data set for model comparison using the INLA method. To compare the efficiency of these different models, DIC and WAIC were used, and the model with the smallest value was taken as the best fit. Accordingly, the Bayesian log-normal AFT model using INLA (DIC = 1297.84; WAIC = 1297.47) was found to be the best for the survival time of HF patients among the given alternatives.

Bayesian Log-Normal AFT Model using INLA method
The final results for the Bayesian log-normal AFT model using the INLA method are shown in Table 4. These results show that the survival time of HF patients was statistically significantly affected by age, chronic kidney disease, diabetes mellitus, etiology of heart failure, hypertension, anemia, cigarette smoking and stage of heart failure. were other heart disease as compared to ischemic heart disease of HF patients. The 95% CrI for the acceleration factor of HF patients whose etiology of HF was rheumatic valvular heart disease, hypertensive heart disease or other heart disease did not include one, which implies that these etiologies have a significant effect on the survival time of HF patients, while cardiomyopathy as the etiology of HF has no significant effect on the survival time of HF patients. there is no need to perform the more computationally intense full Laplace approximation.
Therefore, the simplified Laplace approximation was appropriate.
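The link between a coefficient on the log-time scale and the acceleration factor discussed in this section can be illustrated with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, its inputs stand for a posterior mean and the limits of an equal-tailed 95% credible interval for one coefficient, and the values in the usage note are illustrative rather than taken from Table 4.

```python
import math

def acceleration_factor(beta_mean, beta_lower, beta_upper):
    """Exponentiate a coefficient summary and flag whether the CrI excludes 1.

    Returns (exp(beta_mean), (exp(lower), exp(upper)), excludes_one).
    """
    af = math.exp(beta_mean)  # exponentiated posterior mean
    cri = (math.exp(beta_lower), math.exp(beta_upper))
    excludes_one = not (cri[0] <= 1.0 <= cri[1])
    return af, cri, excludes_one
```

For example, `acceleration_factor(-0.5, -0.9, -0.1)` gives a factor of about 0.61 with an interval entirely below one, which would be read as a shorter expected survival time for that group relative to the reference category.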

Bayesian Cox Snell Residual Plots
From the Bayesian Cox-Snell residual plots in the figure below, the Bayesian log-normal AFT model fit the HF data-set best among the five models, since the plot of the Cox-Snell residuals against the cumulative hazard function of the residuals was approximately a straight line with slope one, and the Bayesian Cox-Snell residual plot for the Bayesian log-normal AFT model was nearest to the line through the origin. In addition, the plot indicated that the Bayesian log-normal model describes the HF data-set well. Table 4 shows that the kld for all significant parameters in the Bayesian log-normal AFT model was 0.

Discussions
The main aim of this study was to identify factors affecting the survival time of HF patients using a data set obtained from JUMC, Jimma, Ethiopia. Heart failure is a growing problem in the world, and the overall prevalence of HF in the adult population in developing countries is 7% to 10%, with an exponential rise with age (Adebayo et al., 2017). The descriptive results indicated that a total of 409 HF patients were included in this study; the minimum and maximum event times observed during follow-up were 6 and 36 months respectively.
In addition, fifty percent of the HF patients who received treatment survived 31 months or more.
In this study, about 59.90% of the HF patients were right censored and the remaining 40.10% died. This finding was similar to that of Hailay et al. (2015), in which 31.3% of the HF patients died while the remaining 68.7% were censored.
A survival model was applied to this data set, but the assumption of the Cox-PH model was violated. The Bayesian approach was therefore applied to parametric AFT models, and DIC and WAIC were used to compare the efficiency of the different AFT models (Spiegelhalter et al., 2004; Watanabe, 2010).
The Bayesian log-normal AFT model was the best model to describe the HF data set among the given alternatives. This result was similar to the study done by Avi (2017). From the results of this study, age group has a significant effect on the survival time of HF patients. In addition, the survival time of HF patients appears to decrease as they get older (greater than or equal to 65 years), and different studies were consistent with this result: Adebayo et al. (2017), Zeru (2018) and Sheng et al. (2018). The survival time of HF patients without hypertension was higher than that of patients with hypertension; thus, hypertension had a significant effect on the survival of HF patients. The studies done by Ahmad et al. (2017) and Sheng et al. (2018) show the same result.
On the other hand, the survival time of HF patients who smoke was lower than that of non-smokers, which is similar to the study done by Ahmad et al. (2017). The survival time of HF patients was significantly affected by both types of diabetes mellitus, and the expected survival time of HF patients with either type of diabetes mellitus was shorter than that of HF patients without diabetes; this result is consistent with the studies done by Ahmad et al. (2017) and Zeru (2018). In addition, chronic kidney disease significantly affected the survival time of HF patients, and survival time was longer for HF patients without chronic kidney disease than for those with it; this result was also confirmed by the study of Zeru (2018).
The studies done by Ahmad et al. (2017), and Zeru (2018)  For checking the adequacy of the model, the cumulative hazard plots for the Bayesian Cox-Snell residuals of the Cox-PH, exponential, Weibull, log-normal and log-logistic models were plotted as in Figure 3. The plots were closest to the reference line for the Bayesian log-normal model, which indicates that the Bayesian log-normal model was best for the HF data-set. This result was consistent with the study done by Avi (2017). The conditional predictive ordinate (CPO) and probability integral transform (PIT) were also used for model checking. Before checking adequacy with graphical methods, it is important to check whether any numerical problems occurred during the computation of the conditional predictive ordinate. Since the sum of the failure indicators for the conditional predictive ordinate was zero, no failure was detected, meaning that no numerical problem occurred for the HF data-set. The histogram and scatter plot of the probability integral transform indicated that the predictive values were approximately uniformly distributed, with some outlying deviations, so the predictive distribution matches the actual data reasonably well. This result was consistent with the studies done by Akerkar et al. (2010) and Martino et al. (2011).
The diagnostic plots of the Bayesian log-normal AFT model, including the 95% credible intervals, show that the posterior densities of the parameters were normally distributed. Similarly, the Kullback-Leibler divergence (kld) is a diagnostic that measures the accuracy of the INLA approximation.
In this study the values of kld for all significant parameters in the Bayesian log-normal AFT model were 0. This indicates that the Bayesian log-normal AFT model using the INLA method was fast and highly accurate. This result was also confirmed by the studies done by Martino et al. (2011) and Akerkar et al. (2010).
However, this study was not without limitations. First, the study was based on secondary data gathered from the registration log book and patients' registration cards, which might contain incomplete and biased information. Second, there was a lack of published hospital-based literature in the country on the survival time of heart failure patients using Bayesian survival models. Third, as different studies have pointed out, there are other prognostic factors (such as body mass index and weight) that are assumed to affect the survival time of HF patients; however, data on those variables were not available in the hospital records.

Conclusions
This study used a data set on the survival time of heart failure patients who had received treatment at least once in Jimma University Medical Center. Among the parametric models with exponential, Weibull, log-logistic and log-normal baseline distributions, the Bayesian log-normal AFT model performed best in this study. Fifty percent of the heart failure patients who received treatment survived 31 months or more.
The survival time of HF patients was significantly affected by age, chronic kidney disease, diabetes mellitus, etiology of heart failure, hypertension, anemia, cigarette smoking and stage of heart failure. Of these statistically significant covariates, age (49 to 65 years and greater than 65 years); etiology of heart failure (rheumatic valvular heart disease, hypertensive heart disease and other diseases); presence of hypertension; presence of anemia; presence of chronic kidney disease; smoking; diabetes mellitus (type I and type II); and stage of heart failure (II, III and IV) shortened the survival time of heart failure patients.

Recommendations
Based on the findings of the study, the following are recommended:
 The Ministry of Health and policy makers should work on raising awareness of the risk factors for heart failure.
 The hospital, JUMC, needs to improve public awareness for early detection of HF.
 Awareness should be raised in society regarding cigarette smoking. The mass media can play an effective role in this regard.

Declaration
In conducting the study, the investigators included the following declarations.