Spatial Models for Infants HIV/AIDS Incidence Using an Integrated Nested Laplace Approximation Approach

Background: Kenya has made significant progress in the elimination of mother-to-child transmission of HIV by increasing access to HIV treatment and improving the health and well-being of women and children living with HIV. Despite this progress, broad geographical inequalities in infant HIV outcomes persist. This study aimed to assess the spatial distribution of HIV among infants, areas of abnormally high risk, and risk factors associated with mother-to-child transmission of HIV, using the INLA and SPDE approaches. Methods: Data were obtained from the early infant diagnosis (EID) database, which is routinely collected for infants under one year, for the year 2017. We performed both areal and point-referenced analyses. Bayesian hierarchical Poisson models with spatially structured random effects were fitted to the data to examine the effects of the covariates on infant HIV risk. Spatial random effects were modelled using the conditional autoregressive (CAR) model and stochastic partial differential equations (SPDEs). Inference was done using the Integrated Nested Laplace Approximation (INLA). Posterior exceedance probabilities were produced to identify areas where the relative risk exceeds 1. The Deviance Information Criterion (DIC) was used for model comparison and selection. Results: The CAR model outperformed the competing models in modelling and mapping HIV relative risk in Kenya, having the smallest DIC (DIC = 306.36). The SPDE model outperformed the spatial GLM based on the DIC. Highly active antiretroviral therapy (HAART) and breastfeeding were found to be negatively and positively associated with infant HIV positivity, respectively [-0.125, 95% Credible Interval (Cred. Int.) = -0.348, -0.102] and [0.178, 95% Cred. Int. = -0.051, 0.412]. Conclusion: The study provides relevant strategic information required to make investment decisions for targeted, high-impact interventions to reduce HIV infections among infants in Kenya.


Background
Globally, there has been remarkable progress in the elimination of mother-to-child HIV transmission. The number of new infections among children declined from 290,000 (250,000-350,000) in 2010 to 150,000 (110,000-190,000) in 2015, reflecting the scale-up of coverage of prevention of mother-to-child transmission (PMTCT) services [1]. In Sub-Saharan countries, new paediatric HIV infections declined marginally between 2010 and 2015. A decline of 66% was reported in Southern and Eastern Africa, while 31% was reported in Western and Central Africa [1]. Despite these achievements, concerted efforts are required to reach the UNAIDS targets by 2020 [2]. This can be achieved by ensuring that all pregnant women living with HIV receive lifelong ART and by scaling up the provision of early diagnosis, treatment optimization and care for HIV-exposed infants to prevent mother-to-child transmission of HIV [3]. Elimination of mother-to-child transmission (eMTCT) of HIV would directly contribute to the attainment of the Sustainable Development Goal targets of reducing the maternal mortality ratio, ending preventable deaths of newborns and children under 5 years, and ending the AIDS epidemic [4]. Kenya is one of the 22 priority countries targeted for reduction of mother-to-child transmission (MTCT) of HIV and has been chosen to be validated for the pre-elimination of MTCT of HIV and syphilis by 2021. An estimated 6,613 infants were infected with HIV through mother-to-child transmission in 2015 [5]. This reflects a decline in mother-to-child transmission rates from 16% in 2013 to 8.3% in 2015 [5]. These gains are attributed to the increase in HIV treatment coverage for women and infants. The government of Kenya, through the Ministry of Health (MOH), established the PMTCT program as part of continuing strategies for dealing with the epidemic and interventions to reduce MTCT of HIV.
The goal of the program is to reduce the rate of MTCT of HIV to less than 5% and reduce maternal mortality by 50%.
In Kenya, the early infant diagnosis (EID) program, under the umbrella of the PMTCT program, is responsible for diagnosing HIV in HIV-exposed infants (HEI) and young children under 18 months of age. Polymerase chain reaction (PCR) is the most common virological test used in PMTCT settings for HIV-exposed babies, and seven laboratories nationally have the capacity to conduct PCR testing for HIV. Despite the marked progress in eMTCT of HIV, a few counties in Kenya still register high numbers of infections. New HIV infections among infants exhibit marked geographical disparities, with some counties contributing a disproportionately high number of new infections annually. To achieve a drastic reduction in new HIV infections, focused efforts need to be directed to the poorly performing counties.
Disease mapping has developed immensely in recent years owing to the availability of geo-referenced data and advances in computing, geographical information systems (GIS) and statistical methodology. The interest lies in estimating the relative risk of a disease across a geographical study area, assessing clustering and clusters of disease, and assessing the geographical distribution of disease in relation to potential risk factors (spatial regression). Bayesian methods, which offer a flexible and robust approach, are increasingly used in disease mapping. In the last few years, Markov chain Monte Carlo (MCMC) methods have boosted the implementation of fixed-effects and hierarchical models, particularly in the spatial and spatio-temporal fields [6]. Despite the progress made in Bayesian computing, MCMC methods are not without potential problems. Their samplers involve computationally intensive and time-consuming simulations, especially for high-dimensional models such as hierarchical models. Furthermore, parameter estimation might be impossible, and the algorithms may induce large Monte Carlo standard errors if they are not run for many iterations [7]. Additionally, MCMC methods present issues with convergence of the algorithm to the posterior distribution as well as with the choice of prior distributions. Analysis of large datasets with a high level of spatial disaggregation can lead to long computation times when Bayesian inference is performed via MCMC. Cross-validation tests and sensitivity analyses might be impractical because of the computational demands of MCMC. This can translate to poor interpretation of the results.
The Integrated Nested Laplace Approximation (INLA) proposed by Håvard Rue [8] provides an alternative approach that handles these complexities and provides precise and consistent estimates within a short computational time. INLA is designed for latent Gaussian models, ranging from generalised linear (mixed) models and generalised additive (mixed) models to geoadditive models and time series models. The development of the R package R-INLA has proved valuable for the implementation of INLA. INLA can be integrated with the Stochastic Partial Differential Equation (SPDE) approach proposed by [9] to fit spatial and spatio-temporal models for geostatistical data. There has been a gradual increase in the use of these methods in the analysis of epidemiological and public health data, particularly in spatial and spatio-temporal models. This paper uses spatial models to assess the association between maternal and infant covariates and HIV sero-conversion among HEI using INLA.

Study area and Data
The study was conducted in Kenya. The country is divided into 47 administrative units (counties). We used 2017 early infant diagnosis (EID) program data collected routinely by the Ministry of Health to investigate the spatial patterns of HIV among infants in Kenya. The covariates considered were maternal prophylaxis and infant breastfeeding. HEI are tested through EID, which gives an opportunity for early identification of HIV and linkage to care and treatment services. The facility-level data comprised 68,600 PCR test records collected in the 47 counties in Kenya, covering PCR test type (initial, 2nd, 3rd and confirmatory), PCR test result, sex and age of infant, infant and maternal prophylaxis, mother's HIV status, whether the infant was breastfed, entry point, testing laboratory, and the dates on which samples were received, tested and dispatched. The analyses were restricted to infants under one year born to HIV-positive women. The data were aggregated to provide county-level summaries that were used for the areal analysis. The facility-level EID data were used for the geostatistical analysis.
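The aggregation from facility-level records to county-level summaries can be sketched as follows. This is a minimal illustration in Python, not the authors' R workflow: the records are invented, and the expected counts are computed by applying the overall positivity rate to each county's number of tests, which is an assumption because the source does not state how the expected counts were derived.

```python
from collections import defaultdict

# Hypothetical facility-level records: (county, tests, positives).
# The values are invented for illustration only.
records = [
    ("A", 120, 6),
    ("A", 80, 2),
    ("B", 200, 4),
    ("C", 100, 8),
]

def aggregate_by_county(rows):
    """Sum tests and positive results over the facilities in each county."""
    totals = defaultdict(lambda: [0, 0])
    for county_name, tests, positives in rows:
        totals[county_name][0] += tests
        totals[county_name][1] += positives
    return {c: {"tests": t, "positives": p} for c, (t, p) in totals.items()}

def expected_counts(summary):
    """Expected positives per county if every county had the overall rate.

    Assumption: internal standardisation by the overall positivity rate.
    """
    total_tests = sum(v["tests"] for v in summary.values())
    total_positives = sum(v["positives"] for v in summary.values())
    overall_rate = total_positives / total_tests
    return {c: overall_rate * v["tests"] for c, v in summary.items()}

county = aggregate_by_county(records)
expected = expected_counts(county)

# Crude ratio of observed to expected counts, before any spatial smoothing.
smr = {c: county[c]["positives"] / expected[c] for c in county}
```

A ratio above 1 marks a county with more positive results than the overall rate would predict. The hierarchical models smooth these crude ratios by borrowing information from neighbouring counties.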

Statistical analysis

Variables
The study aimed to examine the relationship between the count of HIV-positive infant test results and two important covariates: highly active antiretroviral therapy (HAART) and the proportion of infants breastfed. We excluded infant prophylaxis since it is highly correlated with HAART.

Statistical models
We developed Bayesian hierarchical Poisson regression models to investigate the spatial heterogeneity of HIV across the counties and assess the effects of the selected covariates. For the i-th county, the count of HIV cases $y_i$ is modelled as
$$y_i \mid \theta_i \sim \mathrm{Poisson}(E_i \theta_i),$$
where $\theta_i$ is the risk of infection and $E_i$ is the expected number of infants infected with HIV in county $i$, $i = 1, 2, \ldots, 47$. The linear predictor is defined as
$$\eta_i = \log \theta_i = \beta_0 + \sum_{k=1}^{m} \beta_k t_{ki} + \sum_{l} f_l(\cdot),$$
where $\beta_0$ denotes the HIV outcome rate for all 47 counties, $\beta_1, \ldots, \beta_m$ quantify the effects of the covariates $t = (t_1, \ldots, t_m)$, and $f_l(\cdot)$ are the spatial random effects that capture spatial variation, modelled using the intrinsic conditional autoregressive (ICAR) prior distribution [10]. In its standard form, the ICAR model is given as
$$\phi_i \mid \phi_{-i} \sim N\!\left(\frac{1}{n_i} \sum_{j \sim i} \phi_j, \; \frac{\sigma^2}{n_i}\right),$$
where $j \sim i$ indicates that counties $i$ and $j$ are neighbours. The conditional expectation of the random effect $\phi_i$ is the average of the effects of its neighbours. The conditional variance depends on its number of neighbours $n_i$: an area with many neighbours will have a smaller variance.
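To make the ICAR conditional structure concrete, the sketch below computes the conditional mean and variance of one area's random effect given its neighbours. It is a Python illustration, not code from the study. It assumes the standard ICAR form in which the conditional variance equals a common variance parameter divided by the number of neighbours, and the four-area neighbourhood map, the effect values and the variance parameter `sigma2` are all hypothetical.

```python
def icar_conditional(i, phi, neighbours, sigma2):
    """Conditional mean and variance of phi[i] given its neighbours.

    Mean: average of the neighbouring effects.
    Variance: sigma2 divided by the number of neighbours (assumed ICAR form).
    """
    nbrs = neighbours[i]
    n_i = len(nbrs)
    mean = sum(phi[j] for j in nbrs) / n_i
    variance = sigma2 / n_i
    return mean, variance

# Hypothetical map of four areas: area 0 touches only area 1,
# while area 1 touches areas 0, 2 and 3.
neighbours = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
phi = [0.2, -0.1, 0.4, 0.0]   # invented spatial effects
sigma2 = 1.0                  # invented variance parameter

mean_0, var_0 = icar_conditional(0, phi, neighbours, sigma2)
mean_1, var_1 = icar_conditional(1, phi, neighbours, sigma2)
```

Area 1 has three neighbours and so receives a third of the conditional variance of area 0, which has only one. This is the sense in which well-connected areas are smoothed more strongly.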
To investigate the association between the outcome variable and the covariates, we fitted four Poisson models to the county-level data. Bayesian inference was done in R [11] using the R-INLA package, which implements the Integrated Nested Laplace Approximation. A stochastic partial differential equation (SPDE) approach with INLA was employed to estimate the posterior marginals in the geostatistical analysis.
Model comparison and selection were done using the deviance information criterion (DIC), which takes into account the trade-off between model fit and complexity [12]. The best model was the one with the smallest DIC value.
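The trade-off that DIC encodes can be illustrated with a short Python sketch. It uses the usual definition, DIC = mean deviance + effective number of parameters, where the effective number of parameters is the mean deviance minus the deviance at the posterior mean. The counts and the posterior draws below are invented, and evaluating the plug-in deviance at the posterior mean of the Poisson means is an assumption of this sketch. R-INLA computes its DIC internally and may differ in detail.

```python
import math

def poisson_deviance(y, mu):
    """Minus twice the Poisson log-likelihood of counts y under means mu."""
    loglik = sum(
        yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu)
    )
    return -2.0 * loglik

def dic(y, mu_draws):
    """Return (DIC, p_D) from posterior draws of the Poisson means.

    mu_draws is a list of draws; each draw holds one mean per area.
    """
    n_draws = len(mu_draws)
    mean_deviance = sum(poisson_deviance(y, mu) for mu in mu_draws) / n_draws
    mu_bar = [sum(column) / n_draws for column in zip(*mu_draws)]
    p_d = mean_deviance - poisson_deviance(y, mu_bar)  # complexity penalty
    return mean_deviance + p_d, p_d

# Invented counts for three areas and three posterior draws of their means.
y = [3, 5, 2]
mu_draws = [
    [2.5, 5.5, 2.0],
    [3.5, 4.5, 2.5],
    [3.0, 5.0, 1.5],
]

dic_value, p_d = dic(y, mu_draws)
```

A model that fits better lowers the mean deviance, while a more flexible model raises the penalty term. The model with the smaller total is preferred.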
In the SPDE model, the Gaussian random field with its covariance function was represented as a Gaussian Markov random field (GMRF). A non-stationary model was achieved by modifying the SPDE to obtain a GMRF with a defined dependence structure that differs from the stationary Matérn covariance. The local nature of the differential operators was used to allow local specification of the range and variance parameters.

Results
Table 1 presents the summary statistics of the variables in the data. In total, of the 68,600 HIV-exposed infants who provided blood samples for PCR HIV testing, 2,363 (3.4%) tested positive for HIV. Of the mothers of the 2,363 HIV-positive infants, 86.33% breastfed their infants while 13.24% did not. The majority (77.06%) of these women accessed the health facility through MCH/PMTCT. Among the infants who tested positive, 61.32% were on prophylaxis, while 63.98% of their mothers were on maternal prophylaxis. Of the infants who tested HIV positive, 81.17% had their first diagnosis after 2 months.

Areal data analysis
The results of fitting the hierarchical models are shown in Table 2. Based on these results, the model with spatially structured random effects (Model 3) offered the best fit (DIC 306.36). Increased use of HAART is negatively associated with mother-to-child transmission of HIV, with a coefficient of -0.802 (95% credible interval: -2.19, 0.60). Breastfeeding is positively associated with mother-to-child transmission of HIV, with a coefficient of 0.558 (95% credible interval: -1.622, 2.778). However, both credible intervals include zero, so neither covariate effect is statistically significant. Figure 1 shows the relative risk map of the counties based on the best-fitting model. The relative risk ranges from 0.71 to 1.71. Counties predicted to have a high risk of infant HIV infection were Makueni, Lamu, Turkana, Marsabit, Samburu, Mombasa, Nairobi, Narok, Taita Taveta and Kilifi. Bungoma, Embu, Murang'a, Garissa, Tana River, Elgeyo Marakwet, Homa Bay and Nyandarua were associated with a lower risk of infant HIV infection (RR < 0.8). The exceedance probability map in Figure 2 displays counties with relative risk above the national risk (RR > 1). This map confirms the counties associated with a high risk of HIV infection (darker colour). The distribution of the spatial random effects (Figure 3) revealed strong spatial patterns at multiple scales. According to these patterns, an elevated risk of HIV infection is associated with living in Makueni, Turkana, Marsabit and Lamu counties. This map shows a pattern similar to the relative risk map, highlighting the counties most affected by spatially structured random effects. Clustering of elevated risk can be observed in the north-west and south-west counties of Kenya.
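An exceedance probability is the posterior probability that a county's relative risk lies above a threshold, here RR > 1. The Python sketch below estimates it as the share of posterior draws above the threshold. The county labels and draws are invented, and working from draws is a simplification chosen for illustration: the study's maps were produced with R-INLA, which works with posterior marginals rather than simulated draws.

```python
def exceedance_probability(rr_draws, threshold=1.0):
    """Share of posterior draws of the relative risk above the threshold."""
    above = sum(1 for rr in rr_draws if rr > threshold)
    return above / len(rr_draws)

# Invented posterior draws of the relative risk for two hypothetical counties.
draws = {
    "County X": [0.9, 1.1, 1.3, 1.2, 1.05],
    "County Y": [0.7, 0.8, 1.02, 0.95, 0.85],
}

exceedance = {name: exceedance_probability(d) for name, d in draws.items()}
```

A value close to 1 indicates strong evidence that the county's risk is above the reference level, and a value close to 0 indicates strong evidence that it is below. Mapping these probabilities separates counties with reliably elevated risk from those whose high point estimate is uncertain.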

Geostatistical analysis
The variogram (Figure 4) obtained from the Poisson GLM indicates the presence of spatial correlation in the data. Table 3 displays the posterior estimates of the Poisson GLM and the SPDE model. The best-fitting model was the SPDE model (DIC 5591.732). Consistent with the results of the areal analysis, HAART was found to be negatively associated with infant positivity, whereas breastfeeding was positively associated with infant positivity. The associations were, however, not significant. Figure 5 shows a map of the sampled locations, and the selected mesh is shown in Figure 6. The triangulation produced a mesh with 1249 vertices. To avoid boundary effects, we used a mesh that extends beyond the study region. The choice of mesh is a trade-off between the accuracy of the GMRF representation and the computational cost [13]. The map of the spatial field (Figure 7, top left) reveals that the spatial random effects cause an increase or decrease in the expected disease counts in specific regions. The spatial patterns of the posterior mean of the latent field and of the mean response are similar, whereas the latent field has a higher variability than the mean response.
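The variogram check for spatial correlation can be sketched as an empirical semivariogram: for each distance bin, half the mean squared difference between pairs of values whose separation falls in that bin. The Python example below is illustrative only. The coordinates and values are invented, the values stand in for model residuals at facility locations, and the binning scheme is an assumption rather than the procedure behind Figure 4.

```python
import math
from itertools import combinations

def empirical_variogram(coords, values, bin_width, n_bins):
    """Semivariance per distance bin; None where a bin holds no pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        distance = math.dist(coords[i], coords[j])
        b = int(distance // bin_width)
        if b < n_bins:  # pairs beyond the last bin are ignored
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Invented locations along a line and invented residual-like values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 1.2, 2.0, 2.6]

gamma = empirical_variogram(coords, values, bin_width=1.0, n_bins=3)
```

When semivariance increases with distance, nearby observations are more alike than distant ones, which is the signature of spatial correlation that motivates adding a spatial random field to the GLM.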

Discussion
In this work, deterministic Bayesian approaches (INLA and INLA-SPDE) were used to analyse areal and point-referenced data in order to assess the spatial distribution of HIV among infants, identify risk factors for mother-to-child transmission, and produce continuous maps of the positivity rates. We employed INLA to fit Bayesian hierarchical spatial models using the R library INLA. The CAR model was identified as suitable for modelling and mapping the relative risk of HIV among infants in Kenya. The findings revealed that breastfeeding increases mother-to-child transmission of HIV. Previous studies [14,15] noted a substantially higher risk of infection among breastfed infants within the first months of breastfeeding compared to later months. A high early transmission rate might be explained by milk that is rich in HIV-infected cells and by the immaturity of the infant's immune system [16]. In resource-limited settings, complementary feeding increases the risk of morbidity and mortality from infectious disease. Until recently, WHO recommended that HIV-positive mothers breastfeed exclusively for the first six months and continue breastfeeding with appropriate complementary foods for at least 12 months while taking antiretrovirals to reduce the risk of post-natal transmission. A previous study [17] showed that perinatal transmission can be greatly reduced in breastfeeding populations on antiretroviral therapy. The results of this study indicated that the use of HAART by the mother reduces the transmission rate of HIV. The use of HAART clearly stands out as a key determinant of MTCT risk, as has been consistently reported in many studies [18][19][20]. The risk of HIV infection among infants was found to be high in Makueni, Lamu, Turkana, Marsabit, Samburu, Mombasa, Nairobi, Narok, Taita Taveta and Kilifi counties, suggesting the need for geographical prioritization of infant HIV prevention interventions to optimise the reduction of new HIV infections.
The random effects map revealed residual variation, suggesting unaccounted variation after including the covariates in the model. Additionally, the spatial patterns in the random effects suggested that possible covariates were omitted from the model. INLA, a promising alternative to MCMC, is a recent methodology for Bayesian inference in hierarchical models. This numerical inference approach has gained popularity because it produces fast and accurate approximations to the posterior distributions. It alleviates one of the most important bottlenecks associated with MCMC, which is computationally intensive, especially for complex models. We employed the INLA-SPDE approach to perform Bayesian inference on spatial hierarchical Gaussian fields. This model was applied to produce continuous predictions of positivity rates. The SPDE model was found to be more suitable than the Poisson GLM for modelling and predicting point-referenced data. The posterior covariate effects were similar to those of the areal analysis. It is apparent from the maps of the random effects and the spatial random field that the inclusion of spatial random effects reveals strong latent spatial patterns that were not explained by the explanatory variables. As shown in Figure 7, the predicted risk of infant HIV infection is highest in the western part of Kenya and lowest in the eastern regions. A possible reason is that the data are clustered in the western regions, with few or no observations in the other parts. The standard errors are high at the periphery, which is to be expected since these areas are extensions of the triangulation. One advantage of INLA-SPDE is its capability to estimate and predict the posterior marginal distributions of the model parameters and the model responses without carrying out extensive simulations.
A major limitation of INLA-SPDE is that it becomes computationally intensive when dealing with non-Gaussian likelihoods [9]. Limitations of this study include the use of routinely collected data, which may be of limited quality due to missing information. Additionally, the data have limited information on potential explanatory variables that could be important in explaining the positivity rates. In particular, data were lacking on important covariates such as maternal virological and immunological status, which could influence transmission rates.

Conclusion
The aim of this study was to investigate the geographical variation of HIV and identify risk factors among infants in Kenya in 2017. INLA, a computationally efficient alternative to MCMC, was used for Bayesian inference. This study has shown the determinants of infant HIV infection and its spatial distribution through mapping of HIV risk. It is apparent that geographical disparities in infant HIV infection remain, and targeted PMTCT interventions and resources should be directed to counties that have high infection rates. It is hoped that the variations in outcomes presented here will stimulate further research into the reasons underlying the disparities and inform policy. Many aspects of this study could be extended. Further research is required to determine suitable spatio-temporal models for modelling and mapping the relative risk of HIV among infants in Kenya. This would make it possible to explain the evolution of the relative risk of HIV in space and time. Incorporating other covariates for mother-to-child transmission of HIV that were not captured in the analysis could yield important insights into the spatial distribution of the disease.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
SNM conceived and designed the study, analyzed the data, prepared figures and tables, and drafted the manuscript. TNOA analyzed the data, authored or reviewed drafts of the paper, trained the lead author in analytical methods, and supervised the writing of the manuscript.

Ethics approval and consent to participate
Ethical approval for the study was obtained from Strathmore University Institutional Ethics Review Committee (SU-IERC).