A prediction risk score for HIV among adolescent girls and young women in South Africa: identifying those in need of HIV pre-exposure prophylaxis

Background: In sub-Saharan Africa (SSA), adolescent girls and young women (AGYW) have the highest risk of acquiring HIV. This has led to several studies aimed at identifying risk factors for HIV in AGYW. However, combining the purported risk variables in a multivariable risk model could be more useful for determining HIV risk in AGYW than considering them one at a time. The purpose of this study was to develop and validate an HIV risk prediction model for AGYW.
Methods: We analyzed HIV-related HERStory survey data on 4,399 AGYW from South Africa. We identified 16 purported risk variables from the data set. The HIV acquisition risk scores were computed by combining coefficients of a multivariable logistic regression model of HIV positivity. The performance of the final model at discriminating between HIV-positive and HIV-negative status was assessed using the area under the receiver-operating characteristic curve (AUROC). The optimal cut-point of the prediction model was determined using the Youden index. We also used other measures of discriminative ability such as predictive values, sensitivity, and specificity.
Results: The estimated HIV prevalence was 12.4% (11.7%-14.0%). The score of the derived risk prediction model had a mean of 2.36 and a standard deviation of 0.64, and ranged from 0.37 to 4.59. The prediction model had a sensitivity of 16.7% and a specificity of 98.5%. The model's positive predictive value was 68.2% and its negative predictive value was 85.8%. The prediction model's optimal cut-point was 2.43, with a sensitivity of 71% and a specificity of 60%. Our model performed well at predicting HIV positivity, with a training AUC of 0.78 and a testing AUC of 0.76.
Conclusion: A combination of the identified risk factors provided good discrimination and calibration at predicting HIV positivity in AGYW. This model could provide a simple and low-cost strategy for screening AGYW in primary healthcare clinics and community-based settings. In this way, health service providers could easily identify and link AGYW to HIV PrEP services.


Background
Adolescents and young people represent a growing share of people living with HIV worldwide. In 2022, it was estimated that over 1.7 million adolescents were living with HIV worldwide, with approximately 88% of them residing in sub-Saharan Africa (SSA). 1 The World Health Organization (WHO) estimated that 30% of all new HIV infections globally are predicted to occur in young people aged 10 to 24. Adolescents aged 10 to 19 years account for approximately 5% of all people living with HIV worldwide, but 10% of all new HIV infections occur among adolescents. 2 In South Africa, the 2016 South African Demographic and Health Survey (SADHS) showed that AGYW are disproportionately affected by HIV compared to their male counterparts. HIV prevalence was approximately 4 times higher in AGYW (12%) compared to their male counterparts (3%). 3

HIV infection in AGYW is predicted by many factors broadly categorized into structural and sexual behavior factors. 4 Structural drivers of HIV are factors that relate to socio-economic status, education, and organizational factors such as health service delivery points, which play a significant role in offering biomedical HIV prevention services including PrEP, as well as schools that offer primary HIV prevention messages. 5 Sexual behavior factors that predict HIV are those that relate to multiple and concurrent partnerships, early sexual debut, transactional sex, and low condom use. 6-10 Sexual behavior factors such as history of anal sex, having a partner suspected of having or known to be living with HIV, and having concurrent partners have been shown to predict HIV acquisition in AGYW. 11 Some studies have shown a protective effect of higher levels of parental education, as well as of the AGYW's own level of education, on HIV acquisition. 8 Gender inequalities, violence against women (VAW), stigma and discrimination, and limited access to sexual and reproductive health information and services are some of the structural factors that hinder AGYW's ability to protect themselves from HIV. 4

AGYW living with HIV experience low school attendance and are more likely to drop out of school due to HIV-related morbidity if they do not adhere to HIV treatment. 12 While HIV infection does not affect school enrolment and retention, a South African study on HIV and educational attainment found that adolescent HIV infection significantly reduced the school progress index. 13 Among women aged 15-44, HIV is one of the leading causes of death globally, with higher death rates observed in AGYW in SSA. 14

South Africa's National Strategic Plan for HIV, tuberculosis (TB) and sexually transmitted infections (STIs) prioritizes the provision of a comprehensive package of high-impact, context-tailored and carefully targeted combination prevention interventions. 15 Combination prevention focuses on the combined delivery of structural, biomedical, and behavioral interventions to maximize the impact of interventions on HIV incidence. Combination HIV prevention is a key strategy in achieving the Joint United Nations Programme on HIV/AIDS (UNAIDS) 95-95-95 targets set in 2020. 16 The targets state that by 2030, 95% of people living with HIV will know their status, 95% of people diagnosed with HIV will receive sustained antiretroviral therapy, and 95% of people receiving antiretroviral therapy will have their viral load suppressed. 17 Most HIV prevention strategies focus mainly on correct and consistent male condom use, which leaves women with less power and control in their intimate relationships. 18

Evidence from studies on the use of HIV PrEP for HIV prevention has shown that PrEP is an effective additional preventive measure for AGYW. 18,19 In December 2015, South Africa became the first SSA country to start implementing PrEP as a biomedical HIV prevention strategy. As of 2022, it was estimated that 792,000 people were using HIV PrEP in both ongoing and planned projects across South Africa, against a target of 250,364. 20 The large gap between the planned target and the achieved uptake suggests that the target was substantially underestimated. There are three forms of HIV PrEP: oral drugs (TDF-FTC), the vaginal ring (dapivirine) and long-acting injectables. 21,22 In South Africa, oral PrEP is currently the most widely used form.

Correspondence to: Reuben Christopher Moyo, Faculty of Medicine and Family Health, Division of Epidemiology and Biostatistics, Stellenbosch University, P.O. Box 241, Cape Town 8000, South Africa. Email: reuben.moyo2014@yahoo.com
Risk perceptions are key in identifying high-risk AGYW to be initiated on PrEP. However, evidence from studies on risk perception and PrEP use has shown that risk perceptions may be inaccurate and driven by an incomplete understanding of the epidemiologic risk profile, often influenced by factors not related to sexual behavior. 23 Evidence from longitudinal risk perception studies among AGYW in South Africa showed no significant differences in HIV positivity between participants categorized as 'low-risk' and those categorized as 'high-risk', indicating that their risk perceptions were inaccurate. 24 Offering PrEP to high-risk populations based on risk scoring may maximize impact and minimize cost. Successful coverage and implementation of PrEP may be affected by health service providers' inability to identify candidates to be initiated on PrEP, in part due to the limited use of risk scoring tools, which have been recommended to maximize PrEP impact. 25 Several barriers that limit PrEP uptake and utilization have been identified. Individual factors include fear of HIV acquisition, fear of side effects, and the burden of taking PrEP daily. Interpersonal factors that limit PrEP uptake are parental influence and absence of a sexual partner. There are also community factors (peer influence, social stigma), institutional factors (long waiting times at clinics, attitudes of health workers) and structural factors (cost of PrEP and mode of administration, accessibility concerns) that affect utilization of HIV PrEP services. 26 Given the increase in new infections coupled with low PrEP coverage among AGYW, our study aimed at developing and validating a risk prediction model for HIV acquisition in order to identify high-risk AGYW who should be linked to HIV PrEP services.

Study design, setting and population
The data used in this study came from the HERStory survey conducted by the South African Medical Research Council (SAMRC) between 2017 and 2018. We analyzed HIV-related data on 4,399 AGYW from six South African districts, namely the City of Cape Town, Ehlanzeni, OR Tambo, Tshwane, uThungulu, and Zululand. The HERStory survey was a community household survey that linked household survey information to participants' clinic records. To ensure that participants were representative of the population from which they were drawn, participants in the survey were selected using a stratified probability proportional to size (PPS) sampling design. Sampling frames for the survey were compiled for each district based on the 2011 census small area layers and were limited to the areas targeted for the planned HIV prevention programs for AGYW. Interviewers were trained prior to data collection on how to avoid different types of information bias. The methodology of the HERStory survey, including the inclusion and exclusion criteria, is described in detail in its final report. 4

Outcome of interest
HIV status: The HIV status of the AGYW was ascertained by HIV testing of dried blood spots. HIV status was coded 1 for HIV-positive status and 0 for HIV-negative status.

Predictors of the outcome
Age: The exact age of the participant.
Age at first sex: The age at which the AGYW first had penetrative sex.
Condom use: Whether the participant used a condom the last time she had sex.
History of sexually transmitted infections (STIs): Whether the participant presented with any STI symptoms.
Number of sexual partners: The number of penetrative sexual partners the participant had ever had.
Partner HIV status: The HIV status of the participant's sexual partner.
Marital status: Whether the AGYW was legally or traditionally married or not.
Transactional sex: Whether the participant engaged in sex for money or other items.
Orphanhood: Whether the AGYW was an orphan or not.
Use of drugs and substances: Whether the participant used drugs and other substances.
Partner age: The age of the participant's current or last sexual partner.
Socio-economic status: Derived from the wealth index score of the AGYW, which measured the participant's socio-economic status based on household assets.
Highest education: The highest education level attained.
District: The geographical district where the AGYW was captured during data collection.
Ever pregnant: Whether the AGYW had ever been pregnant or not.
Rape: Whether the AGYW was ever raped or not.

Statistical analysis
Statistical analyses were conducted in Stata version 16.1. We conducted descriptive analysis using the frequency procedure to produce descriptive statistics in the form of numbers and proportions. To examine the association between HIV status and its purported predictor variables, Pearson's chi-square (χ²) tests were conducted. Predictor variables were considered significant at p < 0.05. To ensure that the estimates produced in this study were representative of the AGYW population in their respective geographical areas, we applied sampling weights, which correct for unequal representation in the sampled population. Missing data was not a concern in this study because only one variable had a missing observation.
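As an illustration of the bivariate test described above, the following minimal Python sketch computes Pearson's chi-square statistic and its p-value for a 2x2 table. It is not the Stata procedure used in the study and it ignores sampling weights; the function name `chi_square_2x2` and the counts are hypothetical and chosen only to show the arithmetic.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (not study data): rows are exposure groups,
# columns are HIV-positive and HIV-negative participants.
counts = [[30, 70], [10, 90]]
statistic, p_value = chi_square_2x2(counts)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A survey-weighted analysis would typically use a design-adjusted test instead of this unweighted version.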

Development and validation of a risk prediction model
We used 70% of the data for training the model and 30% for testing the performance of the model at predicting the outcome when applied to AGYW not used in model fitting. To quantify the amount of HIV risk associated with each explanatory variable after controlling for the independent effects of other covariates such as age, education level and age at sexual debut, a multivariable binary logistic regression model of HIV status on its predictors was used to obtain coefficients for use in deriving the HIV risk scores. Variables in the model were selected using the least absolute shrinkage and selection operator (LASSO). LASSO is a machine learning feature selection method used to maximize the prediction accuracy of the model. Age, number of sexual partners, pregnancy, rape, and transactional sex were forced into the model because they were treated as a priori confounders and have been shown to mediate HIV infection in AGYW. 23,27 We used the likelihood ratio test to select the best-performing model among successive models. The performance of the final model was assessed using discrimination and calibration measures.
We used the area under the receiver operating characteristic curve (AUC) to assess the performance of the model at discriminating HIV-positive versus HIV-negative status on both the training and testing datasets. Calibration was assessed using the Hosmer-Lemeshow test, the Brier score and pseudo-R².
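The two headline measures named above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's Stata code: the AUC is computed as the probability that a randomly chosen HIV-positive participant receives a higher predicted probability than a randomly chosen HIV-negative participant, and the Brier score as the mean squared difference between predicted probability and observed status. The function names and the toy data are assumptions made for the example.

```python
def auc(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (not study data): observed HIV status and predicted probabilities.
observed = [0, 0, 1, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(f"AUC = {auc(observed, predicted):.3f}")
print(f"Brier score = {brier_score(observed, predicted):.3f}")
```

Computing both measures separately on the training and testing portions, as described above, gives a simple check for overfitting: a testing AUC far below the training AUC would suggest the model does not transfer well.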

Scoring of a risk prediction tool
The HIV risk prediction scores were developed by summing the coefficients of the risk variables from the multivariable logistic regression model of HIV positivity (26). The optimal cut-point of the risk score at which AGYW were likely to have an HIV-positive status was determined using the Youden index. 28
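The scoring and cut-point steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coefficient dictionary contains only a few values quoted in the text rather than the full final model, so scores from it are not comparable to the reported cut-point, and the participant scores passed to `youden_cut_point` are invented. The Youden index at a cut-point is sensitivity + specificity - 1, and the optimal cut-point is the one that maximizes it.

```python
def risk_score(profile, coefficients):
    """Sum the coefficients of the risk factors present in a participant's profile."""
    return sum(coefficients[name] for name, present in profile.items() if present)


def youden_cut_point(scores, statuses):
    """Return the cut-point maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(statuses)
    n_neg = len(statuses) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # A participant is classified positive when the score is at or above the cut.
        true_pos = sum(1 for s, y in zip(scores, statuses) if y == 1 and s >= cut)
        true_neg = sum(1 for s, y in zip(scores, statuses) if y == 0 and s < cut)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Illustrative subset of coefficients (hypothetical variable names).
coefficients = {
    "age_20_24": 0.70,
    "partner_hiv_positive": 2.6,
    "sti_symptom": 0.3,
    "orphan": 0.3,
}
profile = {
    "age_20_24": True,
    "partner_hiv_positive": False,
    "sti_symptom": True,
    "orphan": False,
}
example_score = risk_score(profile, coefficients)

# Invented scores and HIV statuses, used only to exercise the cut-point search.
toy_scores = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
toy_statuses = [0, 0, 0, 1, 0, 1]
cut, j = youden_cut_point(toy_scores, toy_statuses)
print(f"example score = {example_score:.2f}, cut-point = {cut}, J = {j:.2f}")
```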

Distribution of study participants
A total of 4,399 AGYW participated in the survey. Overall, most participants were drawn from Zululand (17.9%), Ehlanzeni (17.8%), OR Tambo (17.0%), and Tshwane (15.7%). Approximately 57% of the study participants were aged 15 to 19 while 43% were aged 20 to 24 years. Overall, 69.2% of the participants reported having ever had sex. Of the participants who had ever had sex, 8.8% reported having started sex at or before the age of 15. The proportion of participants who reported that they had ever been raped was 6%, while 38% had been pregnant before. Only 12.1% of the participants reported ever having engaged in transactional sex. Table 1 shows the distribution of selected participants' characteristics.
The overall HIV prevalence among the study participants was 12.4% (11.7%-14.0%). HIV prevalence was higher among AGYW aged 20 to 24 (19.7%) than among participants aged 15 to 19 (6.75%). AGYW who reported having been involved in transactional sex had a higher prevalence (20.0%) than those who never practiced transactional sex (11.6%). HIV prevalence was higher in AGYW with low socio-economic status (13.2%) than in those in the high socio-economic status category (9.2%). The prevalence of HIV was extremely high (57.6%) among AGYW in a relationship with a known HIV-positive partner. The HIV prevalence among AGYW who did not know their partner's HIV status and those who preferred not to reveal their partner's status was approximately 24%. Participants who reported any STI symptom had a higher prevalence of HIV (18.5%) than those who did not (10.6%). There were no significant differences in HIV prevalence between AGYW who had ever used drugs and substances and those who had not. Table 2 shows a comparison of HIV status by selected predictor variables and their corresponding p-values.

Independent HIV risk scores for AGYW
Table 3 shows coefficients and their corresponding risk scores from the final multivariable prediction model of HIV risk factors. Thirteen candidate predictors were used in the final model. Compared to AGYW aged 15 to 19, participants aged 20 to 24 had a higher risk of HIV (β = 0.70, p < 0.001). The risk of HIV infection was slightly higher for AGYW in the lower socio-economic status category (β = 0.17, p = 0.010) compared to those in the high socio-economic status category. Being in a relationship with a partner known to be living with HIV was associated with more than 11 times the odds of HIV infection (β = 2.6, p < 0.001). Not knowing the partner's status and preferring not to reveal the partner's status were also associated with higher risk (β = 1.0, p < 0.001 and β = 1.4, p < 0.001, respectively). The risk score associated with any STI symptom was slightly higher (0.3) compared to those who did not report any STI symptom. AGYW who had lost a parent had a slightly higher risk score (0.3) compared to those who had not. Higher levels of education were associated with a lower risk of HIV infection compared to primary education (β = -0.7, p = 0.020 for secondary education and β = -1.11, p < 0.001 for post-secondary education). There were no significant differences in the risk of HIV with respect to age at sexual debut. The optimal risk score cut-off point estimated using the Youden index was 2.43, with a sensitivity of 71%, a specificity of 60% and an AUC of 0.66. Table 4 shows selected cut-points at various levels.

Key. Sensitivity: The proportion of actual positives which are correctly identified as such. Specificity: The proportion of actual negatives which are correctly identified as such.
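Because the coefficients come from a logistic regression, exponentiating a coefficient gives the corresponding odds ratio. The short sketch below applies this to the rounded coefficients quoted in the text; the dictionary labels are hypothetical, and odds ratios computed from rounded coefficients will differ slightly from those derived from the unrounded model.

```python
import math


def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(beta)


# Rounded coefficients as quoted in the text (labels are illustrative).
quoted_coefficients = {
    "aged 20-24 vs 15-19": 0.70,
    "partner known to be living with HIV": 2.6,
    "secondary vs primary education": -0.7,
    "post-secondary vs primary education": -1.11,
}

for label, beta in quoted_coefficients.items():
    print(f"{label}: beta = {beta:+.2f}, odds ratio = {odds_ratio(beta):.2f}")
```

Coefficients above zero map to odds ratios above 1 (higher odds of HIV), while negative coefficients map to odds ratios below 1 (a protective association).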

Performance of the risk prediction model
LR+: The ratio of the probability of a positive test among the truly positive subjects to the probability of a positive test among the truly negative subjects. LR-: The ratio of the probability of a negative test among the truly positive subjects to the probability of a negative test among the truly negative subjects.
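The likelihood ratios defined above follow directly from sensitivity and specificity, and predictive values follow once a prevalence is assumed. The sketch below is illustrative only: it plugs in the sensitivity and specificity reported at the optimal cut-point and, as an assumption, the overall survey prevalence. The resulting numbers are arithmetic consequences of those three inputs, not values taken from the paper's tables.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity at the reported optimal cut-point; the prevalence
# is assumed to equal the overall survey estimate for this illustration.
lr_pos, lr_neg = likelihood_ratios(0.71, 0.60)
ppv, npv = predictive_values(0.71, 0.60, 0.124)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Because predictive values depend on prevalence, the same cut-point would yield a different positive predictive value in a setting with lower or higher HIV prevalence.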

Discussion
This study aimed at developing and validating a risk prediction model for identifying AGYW with elevated HIV risk based on selected predictors of HIV status. Our risk prediction model showed both good discrimination and calibration at predicting HIV in AGYW. Based on the Youden index, an optimal risk score cut-off point of 2.43 may be indicative of a positive HIV status. This means that AGYW with a risk score of at least 2.43 should be offered screening and linked to HIV PrEP services. This study found that AGYW with HIV-positive partners have more than twice the risk of HIV compared to those with HIV-negative partners. This finding is not unusual because of the increased exposure to HIV in the absence of protection and HIV PrEP use. Deliberate efforts are required to initiate such AGYW on PrEP to reduce their likelihood of seroconversion. Similarly, the risk score was slightly higher in AGYW who reported any STI symptom. This finding supports and strengthens the policy recommendation of offering HIV testing to all clients visiting STI clinics to ascertain their HIV status. The optimal cut-point identified in this study does not replace routine screening in clinical care settings. HIV risk prediction models and their cut-off points are meant to help health care workers stratify AGYW based on risk scoring and provide services according to their risk stratification. These findings inform users of the risk prediction model that HIV programming for AGYW should particularly target AGYW with elevated risk to ensure that HIV prevention interventions are impactful and cost-effective.
Depending on the number of variables selected to assess risk, the risk classification and the optimal cut-off points may change to fit the user's situation. While HIV risk prediction models based on HIV prevalence exist, many available models in the literature do not include key predictors of HIV in AGYW such as partner HIV status, any symptoms of STIs, sexual violence and substance use. This study investigated the contribution of all these predictors and included them in the model. It also contributes to the understanding of the risk prediction approach in HIV epidemiology by presenting the most up-to-date risk assessment tool for AGYW. Our prediction model may be applied in similar settings with a high prevalence of HIV in sub-Saharan Africa. Some prediction tools are based on proximate predictors of HIV only, while others include mediators, combining socio-demographic and proximate predictors. The risk tool developed in Malawi using data from the VOICE trial included 6 predictors in the final model and lacked key predictors of HIV such as partner HIV status, age at first sex and whether the AGYW was involved in transactional sex or not. 29 Similarly, an HIV risk assessment tool for AGYW in South Africa also lacked variables such as partner HIV status and transactional sex. 30 The lack of key variables that strongly predict HIV in a prediction model may reduce the model's performance at discriminating AGYW with an elevated risk of HIV. One feature that has been shown to predict HIV but has not been included in our model is history of TB treatment. History of TB was found to significantly increase the probability of HIV in a study on the use of machine learning techniques to identify HIV predictors in sub-Saharan Africa. 31 A recent study on HIV risk scores among adult populations in sub-Saharan Africa also showed that younger age, non-cohabiting status and recent STIs were consistently identified as predicting future HIV infection with moderate prediction accuracy. 32

This study therefore presents a robust tool that has been developed and validated to accurately capture AGYW with elevated risk. Because of behavior change and other circumstances, risk scores for AGYW may change, hence the need to periodically review and follow up risk over time. Clinical prediction models for HIV have the potential to increase the number of AGYW initiated on HIV PrEP to reduce their risk of acquiring HIV infection. This, however, depends on the accuracy of the prediction model at identifying high-risk AGYW. AGYW falsely identified as high risk when their actual risk is low will be initiated on PrEP unnecessarily. Likewise, AGYW falsely identified as low risk when their actual risk is high will not be initiated on PrEP, which may expose them to HIV infection. Monitoring risk status over time is important to prevent and correct this misclassification. Studies on the use of risk prediction models to identify high-risk populations have shown that these models lose their predictive power over time, in part due to changes in the prevalence of the outcome, which may affect the model's performance. It is important to continuously monitor the performance of the model and update it when it no longer predicts the outcome. The period for updating and recalibrating the risk prediction model depends on changes in the prevalence of the outcome and the need to add or remove predictors. The model may be updated either by using new datasets or by adding candidate predictors. 33,34 Given the high prevalence of HIV among AGYW, we suggest that programming for PrEP should target not only AGYW with high epidemiologic risk but also those with high perceived risk, to increase PrEP coverage. Neglecting those with low epidemiologic risk but high risk perception will reduce PrEP coverage and lead to an increase in new infections among AGYW.

Policy implications and applications
Our findings have policy implications and applications in HIV programming. Firstly, risk scores may be used by service providers to supplement health education and counseling for AGYW in high HIV prevalence settings in SSA to increase coverage of both screening and PrEP initiation. Secondly, HIV risk prediction models may also be useful in monitoring changes in risk over time, to check whether an AGYW's risk score is changing from low to high or vice versa depending on circumstances. 35 Thirdly, the tool may be used to guide resource allocation, so that more resources are directed to geographical areas where many AGYW score above the cut-off point.

Limitations
This study has limitations. Firstly, this study utilized cross-sectional survey data from six South African districts only. This can affect generalizability to other settings and countries where HIV prevalence is low. However, results from this study could potentially be relevant and be applied in countries with high HIV prevalence, mostly in SSA. Secondly, the validation of the model was done using the same data set. This may affect the accuracy of the model if there are systematic differences between the sampled population and AGYW from other settings not represented in the study. Thirdly, this study used HIV prevalence data to develop a risk prediction model for AGYW. The use of prevalence data rather than incidence data may affect the model's capacity to predict new infections. Fourthly, there are many methods of developing risk prediction models, such as generalized linear models and ensemble methods. This study used generalized linear models rather than ensemble models, which perform both feature selection and prediction modeling to increase prediction accuracy. Lastly, the risk tool performs better with more in-depth and personal exposure questions. This may affect its use in busy settings where more time is required and when participants are not willing to disclose such information.

Recommendations
We recommend the use of this risk prediction model to supplement clinical decision making to increase coverage of PrEP use. We also recommend a feasibility study of using the risk tool in clinical settings to assess its user-friendliness and its accuracy at identifying high-risk AGYW.

Conclusion
Our risk prediction model has shown good discrimination and calibration at predicting AGYW with elevated risk of acquiring HIV. It provides a data-driven way of identifying predictors and of identifying AGYW at high risk of infection to be targeted with HIV PrEP services at both clinic and community level.

Figure 1. Training and testing AUC of the prediction model. X-axis is the model's sensitivity while y-axis is 1-Specificity. The blue line represents the testing ROC while the red line represents the training ROC.

Table 2. Comparison of HIV status by predictor variables (N = 4,399).

Table 3. Coefficients and risk scoring for the independent predictors of HIV using the training dataset.

Table 4. Performance characteristics of the selected risk scores based on the training dataset.

Table 5. Prediction model's classification table based on the training dataset.

Table 6. Performance of the risk prediction model on the training and testing datasets.