Identification of factors affecting the survival lifetime of HIV+ terminal patients in Albert Luthuli municipality of South Africa

Background: South Africa has the biggest HIV epidemic in the world, with Mpumalanga province in which Albert Luthuli municipality is located having the second highest HIV prevalence rate after KwaZulu-Natal province. The objective of the study was to identify the factors that affect the survival lifetime of HIV+ terminal patients in rural district hospitals of Albert Luthuli municipality. Methods : This is a typical retrospective cohort longitudinal design study whereby cohort of HIV+ terminal patients was retrospectively followed from 2010 to 2017 until a patient died, transferred to another hospital, lost to follow-up or was still alive at the end of the observation period. The follow-up time for each patient started at the time the patient got initiated to the ART programme at the hospital’s wellness centr e. Nonparametric survival analysis and semiparametric survival analysis methods were used to analyse the data. Results : Through Cox proportional hazards regression modelling, it was found that ART adherence (poor, fair, good), Age, Follow-up mass, Baseline sodium, Baseline viral load, Follow CD4 count by Treatment (Regimen 1) interaction and Follow-up lymphocyte by TB history (yes, no) interaction had significant effects on survival lifetime of HIV+ terminal patients (p-values < 0.1). Furthermore, through quantile regression modelling, it was found that short, medium and long survival times of HIV+ patients, respectively represented by the 0.1, 0.5 and 0.9 quantiles, were not necessarily significantly affected by the same factors. Discussion: The Cox PH modelling and the quantile regression analysis complemented each other in answering the research question. However, although the Cox PH modelling was the main approach in this study, the quantile regression analysis results are more informative than the Cox PH modelling results. Conclusion: The study identified and modelled the factors affecting the survival of HIV+ terminal patients in Albert Luthuli Municipality by using Logistic regression, Cox PH regression, and Quantile regression modelling. . Cox regression modelled the factors affecting the survival lifetime of HIV+ terminal patients as: ART adherence, Age, Follow-up mass, Baseline sodium, Baseline viral load and interactions of Follow-up lymphocyte by TB history and Follow-up CD4 by Treatment (Regimen 1). The covariates for the quantile models were selected using an automated ‘ Proc QUANTSELECT ’ procedure in SAS version 9.4. The results from quantile regression modelling the data produced the significant factors with negative significant effects on the 0.1 st quantile of the log (survival time) at the 0.1 level of significance as: poor and fair ART adherence relative to good ART adherence, ln (Follow-up viral load) and Follow-up white blood cell count while Follow-up CD4 and the interaction effect of Follow-up white blood cell count by Treatment (Regimen 1) have positive effects. Significant factors with negative significant effects on the 0.5 th quantile of the log (survival time) at the 0.1 level of significance were WHO stage 3 relative to WHO stage 4, poor and fair ART adherence relative to good ART adherence and Baseline haemoglobin while the males relative females, Carolina hospital relative to Embhuleni hospital, WHO stage 1 relative to WHO stage 4 and Follow-up CD4 have positive effects. Lastly, the significant factors with negative significant effects on the 0.9 th quantile of the log (survival time) at the 0.1 level of significance were poor and fair ART adherence relative to good ART adherence and l n(Baseline viral load) while factors with positive effect are Carolina hospital relative to Embhuleni hospital, Follow-up CD4, Baseline white blood cell count and Follow-up sodium.


Introduction
According to United Nations Agency for International Development (UNAIDS) Gap Report (2016), South Africa has the biggest HIV epidemic in the world, with estimated 7 million (12.7%) living with HIV in 2015. Statistics South Africa Report (2015) indicates a rise in AIDS-related deaths from 29.2% in 2014 to 30.5% in 2015. South Africa's Mpumalanga province has the second highest HIV prevalence rate after KwaZulu-Natal province, and the Gert Sibande district which is in Mpumalanga is leading all districts in the country with 46.1% prevalence rate (Masinga, 2014;Motsoaledi, 2013). The Gert Sibande district HIV prevalence stood at 40.5% in 2011 (Bezuidenhout et al., 2014) and at 38.6% in 2015 (National Department of Health, 2015). The Gert Sibande district has Albert Luthuli as one of its seven municipalities.
South Africa is known to be having the largest antiretroviral treatment (ART) programme globally. The number of patients in Gert Sibande receiving ART up to November 2011 was 32 979 (3.2%) (Bezuidenhout et al., 2014), and this number rose to 76 632 (6.75%) in 2015 (Mpumalanga Provincial AIDS Council, 2016). South Africa ART programme has saved millions of lives and infections averted. However, the dual burden of Tuberculosis (TB) and HIV disease continues unabated. South Africa continues to see an increase in HIV related TB incidences and has not yet felt the impact of HIV prevention programmes. There are some statistical methods such as survival analysis methods that can be used to analyse data for monitoring the survival time benefit of HIV/AIDS interventions, and for investigating potential factors that may affect the survival probability of HIV/AIDS patients.
Survival analysis methods are used to estimate the risk of death or progression of a disease and to provide predictions that help clinicians to estimate trends in their patient outcomes (Nakhae & Law, 2011). On one hand, the methods are used to estimate the time period during which an event (for example, death) can happen, and on the other hand, to estimate the impact of various independent factors on the time distribution to the occurrence of an event (Melnyk et al.,1995).
Despite an array of expensive treatments and preventive interventions used to combat HIV/AIDS in South Africa, the prevalence and death toll due to the disease, still remain too high. Consequently, the broad question is "Are there other factors, that include treatments and 3 preventive interventions, which negatively or positively significantly affect the survival lifetime of HIV/AIDS patients in South Africa despite the availability of ARTs?". The main objective of this study is therefore to use survival analysis methods to identify factors that affect the survival lifetime of HIV/AIDS patients in South Africa (despite the availability of ARTs), using data from Albert Luthuli municipality hospitals. 4

Study design
The cohort of HIV+ terminal patients on ART was followed until a patient died or until a patient was censored as a result of loss to follow-up or transferred to another hospital or as a result of the conclusion of the study. The cohort of patients was divided into children (age ≤ 10 years), adolescents (age 11-19 years) and adults (age ≥ 20 years) as was done in previous studies by Bakanda et al. (2011) and by Moshago et al. (2014). Each stratum was further divided into males and females. The main event of interest in this research was death of an HIV+ terminal patient. The patients who transferred out of the hospital or got lost to follow-up were classified as drop-out patients.

Study variables
The variables which form part of the routine hospital records in Albert Luthuli municipality were used in this study and are described as follows. The length of the survival time of an HIV+ terminal patient was the dependent variable for the study. The categorical independent variables were gender, hospital (Carolina, Embhuleni), WHO stage (1, 2, 3, 4), marital status ( single, married, staying together, widowed/separated/divorced), treatments (regimen 1)(NVP+D4T+3TC, EFV+D4T+3TC, EFV+AZT+3TC and EFV+3TC+TDF), ART adherence (poor, fair, good), transferred from hospital (yes, no) and lost to follow-up (yes, no).
The continuous independent variables for the study which except age were classified into baseline and follow-up variables were mass, CD4 cell count, haemoglobin, lymphocyte, white blood cell count, viral load, creatinine, total protein, sodium and alanine transaminase. Diabetes and hypertension were excluded because their records were found in very few patient files.

Sampling procedures and demographics
An optimum sample size was determined using Monkey Survey online sample size calculator and also using Rao soft online sample size calculator (Rao & Rao, 2004), with a margin of error of 5% or 95% confidence level. Alternatively, considering Leulseged and Ayele (2019), the following statistical assumptions could have had been considered in determining the same sample size: survival probability of HIV patients of 0.85 (from Tadege, 2018), 5% precision or margin error, loss of 0.45 (from Damtew et al., 2015) and 95% level of confidence interval.

5
The application of sample size formula (Eneyew et al., 2016) , where N = sample size, Z = 1.96 (critical value at 95% level of confidence), = proportion of deaths and = type-1 error (0.05) would have yielded the same sample size as 357. The estimated total sample size was proportionally and randomly allocated to the two study sites (Embhuleni and Carolina hospitals with proportions of 79% and 21% respectively) and according to the age and gender proportions as per population proportions.

Basics in survival analysis
In the context of this study, suppose that is a random variable representing the survival time of an HIV+ terminal patient during the observation period. Furthermore, suppose that ( ) and ( ) are the distribution and probability density functions of , respectively. Then, the respective survivor and hazard functions of interest are (Box-Steffensmeir et al., 2004): (1) Furthermore, the respective mean and the median survival times are given by: Also, of interest, can be the cumulative hazard function given by (Klein & Moeschberger, 2003;Kleinbaum & Klein, 2012):

The Kaplan-Meier estimator of the survivor function
In the context of this study, let: 0 < 1 < 2 < 3 < … < < ∞ be observed times of death of patients in the cohort of HIV+ terminal patient during the observation period; 1 , 2 , 3 , … , be the respective number of deaths at each of these times; and 1 , 2 , 3 , … , be the corresponding number of remaining patients in the cohort at the respective times. The Kaplan-Meier (KM) estimator or the Product-Limit estimator of the survivor function ( ) when death times are tied is given by (Etikan et al., 2017;Cleves et al., 2010): The cohort of HIV+ terminal patients in this study is not homogeneous with respect to their characteristics that may affect their survival. Hence, it will be necessary to test the equality of survivor functions among groups of patients. That is, to test the null hypothesis of the form 0 : 1 ( ) = 2 ( ) = ⋯ = ( ) ≡ ℎ 1 ( ) = ℎ 2 ( ) = ⋯ = ℎ ( ), for ≥ 0, where ( ) and ℎ ( ) are the respective survivor and hazard functions of the i th group patients, and is the number of groups. The alternative hypothesis of most interest is that the survival time of one group is stochastically bigger or smaller than the survival time for the other group, and is given by: A vertical gap between two graphs of the survival functions shows that at a given period of time, one group had a higher probability of surviving while a horizontal gap shows that one group took longer to experience an event (Etikan et al., 2017).
The log-rank test is the most commonly used statistical test for comparing survival functions of two or more groups (Clark et al., 2003;Etikan et al., 2017). A very important assumption for the appropriate use of the log rank test (and the Cox PH regression model) is that the hazards are proportional over time and this implies that the effect of a risk factor is constant over time (Sullivan, 2016).

The Cox proportional hazards (PH) model
The objective of the study is to identify factors that affect the survival of HIV+ terminal patients. The Cox proportional hazards (PH) model expresses the patient hazard rates as functions of potential such factors (covariates) as follows. Let = ( 1 , 2 , … , ) be a pdimensional vector of the values of the covariates associated with the i th patient. Then, the Cox proportional hazards regression model is as follows (Klein & Moeschberger, 2003;Etikan & Babatope, 2018): where =( 1 , 2 , … , ) is a p-dimensional vector of regression coefficients to be estimated from the data, and ℎ 0 ( ) is the unspecified baseline hazard function that does not have to be estimated. The hazard model in model (7) makes no assumptions about the shape of the hazard function over time. A hazard function could be constant, increasing, decreasing, or it could be a combination of two or three of these graph trends.
Model (7) can be written in terms of the survivor function (Kleinbaum & Klein, 2012) : The model (7)  The parameters are estimated as values of which maximize the Cox likelihood (also called partial likelihood) function for censored data. The partial likelihood function is as follows (Hosmer et al., 2008;Cleves et al., 2010): where is the group of patients at the risk of death at time . In particular, the maximum likelihood estimate of , which is ̂, is found by iteratively solving the equations ln ( ) | =̂= 0. The popular iteration algorithms are the Fisher's scoring (Storvik, 2011) and Newton-Raphson algorithms (Zhou, 2016). The estimate of the covariance of ̂ is a function of the inverse of the matrix 2 ln ( ) 2 | =̂, and is of the form: where ̂ is a diagonal weight matrix, and is the design/incidence matrix whose i th row is . The standard error of ̂ is the square root of the ℎ diagonal element of model (18).
A graph of the cumulative hazard of the Cox-Snell residuals versus the residuals was used to check the goodness of fit of the model. The Cox-Snell residual for the i th observation from which the cumulative hazard of the residuals is (Ansin, 2015;Cleves et al., 2010): where ̂0 ( ) is an estimate of the cumulative baseline hazard rate which is obtained from the Cox PH model fit when all covariates are set to equal to zero ( i T = for all i). The graph of the cumulative hazard of the Cox-Snell residuals versus the Cox-Snell residuals is expected to approximately follow a 45 0 continuous straight line passing through (0,0) with a slope approximately equal to 1 if the conditional distribution of the cumulative hazard function given the covariate vector is reasonably well approximated by the exponential distribution.

Quantile regression in survival analysis
Recall that ( ) = ( ≤ ) = 1 − ( ) (equation (1) Suppose that the potential factors which affect the survival of HIV+ patients are represented as in the Kaplan-Meier estimator of the survivor function above, and that they are related to the survival time by the regression model: where is the survival time of the i th patient, is an unknown scale parameter and the are independent random errors with an assumed distribution. Model (40) is called an accelerated failure time (AFT) model, from which the quantile regression model corresponding to the Cox regression model in is (Chaudhuri et al., 1997;Peng et al., 2008;Flom & Peter Flom Consulting, 2011): where ( ) is the distribution function of which is derived from the distribution of the , and is the unknown vector of effects of the covariates on the ℎ quantile of the distribution of . may vary with which means measures the effect of covariates not only in the centre but also in the upper and lower tails of the survival time distribution.

Nonparametric inferences about the survivor functions
The cohort of HIV+ terminal patients in this study is not homogeneous with respect to their characteristics that may affect their survival. Hence, it will be necessary to test the equality of survivor functions among groups (strata) of patients. Log-rank tests and the Kaplan-Meier functions presented in this section are for independent factors which are presumed by the researcher to be statistically and clinically significant. Hazard ratios for pairwise comparisons of strata are presented at the end of this section in Table 7.  Table 7 show that patients with poor ART adherence and those with fair ART adherence experience the event (death) earlier than those with good ART adherence. The test statistics in Table 2 show that ART adherence strata in Figure 1 are statistically different ( − < 0.0001). The analysis of hazard ratios shows that patients with poor adherence are about 6 times more likely to die from HIV/AIDS related illness relative to patients with fair ART adherence while patients with poor adherence are about 18 times more likely to die from HIV/AIDS related illness relative to patients with good adherence.

Kaplan-Meier survival functions for ART adherence groups
13  Table 3 indicate that the survivor functions of the Baseline viral load strata are statistically different (p − value < 0.0001). The analysis of hazard ratios in Table 7 show that patients with Baseline viral load between 50 HIV RNA copies/mm 3 and 5000 HIV RNA copies/mm 3 have HIV hazard which is around 87% lower relative to patients with Baseline viral load above 5000

Kaplan-Meier survival functions for Baseline viral load
HIV RNA copies/mm 3 .

Kaplan-Meier survival functions for Follow-up lymphocyte by TB history groups
Patients with Follow-up lymphocyte below normal level and with TB history have the lowest   Table 3: Log-rank and other tests for equality of Baseline viral load survivor functions survival experience, and they experienced the event (death) earlier than other groups (see Figure 4). In addition, patients belonging to this stratum did not reach the end of the study alive.
Patients with normal Follow-up lymphocyte levels and without TB history have the highest survival experience. The survivor functions of the strata in Figure 4 are significantly different as evidenced by the p-values in Table 4 (p − value < 0.0001). The hazard ratios in Table 7 show that patients with TB history relative to those without TB history (at Follow-up lymphocyte below normal) are about 2.8 times likely to experience HIV related hazard (death), while patients with TB history relative to those without TB history (at normal Follow-up lymphocyte ) are about 52 times likely to experience HIV related hazard (death).

Kaplan-Meier survival function estimates for Follow-up CD4 by Treatment (Regimen 1)
In Figure 4,

Kaplan-Meier survival functions for Follow-up CD4 and related results
CD4 cell count is a key independent factor in this study because it remains the best measurement of a patient's immune and clinical status, and supports diagnostic decisionmaking, particularly for patients with advanced HIV disease. Figure 5 shows some relevant strata in CD4 count while Figure 6 shows the bar graphs for Baseline CD4 and Follow-up CD4.    Quantile regression modelling at quantile levels 0.1, 0.5 and 0.9 The covariates for the quantile models were selected using an automated 'Proc

Some quantile process plots to study heterogeneity in the data
A visual comparison of the magnitudes of the effects of each covariate as shown in Figure 6 at quantile levels 0.1, 0.5 and 0.9 suggests the existence of heterogeneity in the data. In this section, heterogeneity is investigated using quantile process plots for (effects versus quantile levels) only selected highly significant covariates Follow-up CD4, ln (Baseline viral load) and Treatment (Regimen 1). These covariates were significant either as main effects or as interaction effects in all of Cox regression, logistic regression and quantile regression modelling.

Survival analysis of the effect of covariates on quantiles of the survival time of patients
According to Lin and Rodriguez (2013)

Discussions
The findings in this study might not adequately compare with some related studies because of the following reasons: i) In the case of Damtew et al. (2015) and Moshago et al. (2014), the focus was mainly on death as the survival threat. This study, in addition to considering death as a threat to HIV/AIDS patients, it also considered patient transfer out of hospital and loss of patients to follow-up as phenomena which affect the long-term success of antiretroviral therapy programme and ultimately leading to increased death cases among HIV/AIDS patients especially in the case of lost to follow-up. ii) The iii) Related studies done by Damtew et al. (2015) and Sieleunou et al. (2008) focused on adult patients and on patients with ages greater than 15 years. This study is related to the study by Moshago et al. (2014) in that the sample was exhaustively handled by dividing it into children, adolescents and adults. This was done with the view to find a solution to all groups at once using the limited resources available.

iv)
Most similar studies which were carried out were limited mostly to a follow-up period of 4-5 years as in the case of Sieleunou et al. (2008), Damtew et al. (2015) and Mlangeni & Senkubuge (2016). This research study spanned over a longer follow-up period of 7.5 years with the view to improve on the quality of the findings.
v) The sample size for this research closely matches the sample sizes for studies by Tadege Furthermore, one of the assumptions used in survival analysis is that patients who drop out or are censored have the same survival prospects as those who continue to be followed. This assumption may theoretically hold, but from the point of view of the practicalities of this study, patients lost to follow-up were defaulters to ART adherence and hence were presumed to be at higher risk to HIV/AIDS than those who were followed continuously.

vii)
Unlike most similar studies carried in South Africa for example by Mlangeni & Senkubuge (2016), and other countries which concentrated most on urban institutions; this study just like the study by Sieleunou et al. (2008), analysed the outcomes of ART in a marginalised rural set-up which presents its own peculiar challenges.
viii) This research focused on complete hospital records only as independent factors, that means any other factors with the potential to affect the survival of HIV+ terminal patients which were not part of the hospital records or were incomplete hospital records (for example records on hypertension and diabetes) were not considered in this research.

ix)
Unlike in similar studies by Damtew (2015), Moshego (2012) and by Shebeshi (2011) where WHO stage emerged as a significant covariate, in this study it was not. The difference, the discrepancy or lack of statistical significance could be due to difference in calibration of the research tools, could be attributed to sample size or could probably be as a result of difference in research designs.
The application of quantile regression in identification of how covariates effects vary across patient quantiles, warrants the need for differential treatment based on lower tail (shorter length of follow-up time of a patient), median (median length of follow-up time of patients) and on upper tail (long length of follow-up time of patients). Thus, the study of quantile process plots may be applied in designing and timing of treatments which suit patients with short, medium or long follow-up times.

22
This study has shown that most patients (61.9%) died within 1 year from ART initiation and this concurs with similar studies by Shebeshi (2011) in which most of the HIV/AIDS patients also died during the first 12 months. In addition, this study has found out that the greatest number of patients died within the interval from 6 months to 1 year. These findings on the distribution of mortality among HIV/AIDS patients are important in the timing of the treatment interventions.
The Kaplan-Meier and hazard ratios in this study have shown that HIV/AIDS male patients have lower survival experience than HIV/AIDS female patients. Men are far more likely to die from an HIV-related illness than women, although women are becoming infected with HIV at a much faster rate (Merab, 2018 have neuropsychiatric side-effects, for example efavirenz (EFV) is commonly associated with insomnia and headache (Reid et al., 2012). In addition, ART may lead to the manifestation of TDF nephrotoxicity as has been mentioned above.
Finally, Cox regression, logistic regression and quantile regression are compared in answering the research question of this study. Logistic regression analysis was performed not to directly answer the question but as an exploratory analysis to determine whether Patient status (censored, uncensored) was associated with the covariates. The Cox PH modelling should be preferred (versus the logistic modelling) as it directly answered the research question. In addition, Cox model should be preferred over the logistic model when survival time information is available and there is censoring. As pointed out by Kleinbaum and Klein (2012), the Cox regression uses more information in form of 'survival times', whereas the logistic regression considers a (0, 1) outcome and ignores survival times and censoring.
The Cox PH modelling and the quantile regression analysis complemented each other in answering the research question. However, although the Cox PH modelling was the main approach in this study, the quantile regression analysis results are more informative than the Cox PH modelling results.