Association of medical male circumcision and sexually transmitted infections in a population-based study: a targeted maximum likelihood estimation approach

Background: Epidemiological theory and many empirical studies support the hypothesis that there is a protective effect of male circumcision against some sexually transmitted infections (STIs). However, there is a paucity of randomized controlled trials (RCTs) to test this hypothesis in the South African population. Due to the infeasibility of conducting RCTs, estimating marginal or average treatment effects with observational data is of increasing interest. Using targeted maximum likelihood estimation (TMLE), a doubly robust estimation technique, we aim to provide evidence of association between medical male circumcision (MMC) and two STI outcomes. Methods: We investigated the associations between MMC and the two STI outcomes, HIV and HSV-2, using data from the HIV Incidence Provincial Surveillance System (HIPSS) study in KwaZulu-Natal, South Africa. We estimated marginal odds ratios using TMLE and compared estimates with those from propensity score full matching and inverse probability of treatment weighting (IPTW). Results: TMLE estimates suggest that MMC was associated with 46.9% lower odds of HIV (OR: 0.531; 95% CI: 0.455, 0.621) and 20.5% lower odds of HSV-2 (OR: 0.795; 95% CI: 0.694, 0.911). The propensity score analyses also provided evidence of association of MMC with lower odds of HIV and HSV-2. For full matching: HIV (OR: 0.546; 95% CI: 0.402, 0.741), and HSV-2 (OR: 0.705; 95% CI: 0.545, 0.910). For IPTW: HIV (OR: 0.541; 95% CI: 0.405, 0.722), and HSV-2 (OR: 0.694; 95% CI: 0.541, 0.889). Conclusion: Using a TMLE approach, we present further evidence of a protective effect of MMC against HIV and HSV-2 in this hyper-endemic South African setting. TMLE has the potential to enhance the evidence base for recommendations that embrace the effect of public health interventions on health or disease outcomes.


Background
Numerous public health initiatives to better control the prevalence of HIV/AIDS and other sexually transmitted infections (STIs) have been implemented. One such public health intervention has been medical male circumcision (MMC), which focuses on the anatomical structure of the penis. It is well established that the inner foreskin of the penis is highly susceptible to infection and that the surgical removal of the foreskin, the retractable fold of tissue covering the head of the penis, reduces susceptibility to infections. Therefore, circumcision status is recognized as a modifiable risk factor for STIs, including HIV, in men. Among men, MMC has a protective effect against HIV infection and some other STIs acquired via heterosexual transmission [1][2][3]. Evidence from three randomized controlled trials (RCTs) showed that MMC decreased heterosexual acquisition of HIV by 53% to 60%, of herpes simplex virus type-2 (HSV-2) by 28% to 34%, and of genital ulcer disease among men [4][5][6].
The studies highlighted above, amongst others, underline the importance of the relationship between MMC and the acquisition of STIs. However, these studies investigating the associations between MMC and STIs have estimated conditional effects, usually by using traditional regression models. To date, none has estimated average or marginal treatment effects, the kind of effect usually estimated by RCTs and propensity score analyses. There is increasing interest in using observational data to mimic an RCT, in which a marginal treatment effect is obtained by contrasting the outcomes between the exposed and non-exposed groups, and thereby to estimate marginal rather than adjusted or conditional treatment effects [7].
Beyond the choice between marginal and conditional effects, model misspecification is another problem for the assessment of association between the exposure (or treatment) and outcome. A misspecification of the model terms could substantially bias the estimated effects, as well as the statistical inference [8]. Machine learning methods, using automated data-adaptive strategies that capture important patterns and interactions among variables, can help to overcome these limitations [9,10].
Though machine learning has traditionally focused on risk prediction or classification, its utility has been extended to effect estimation and inference [11,12]. Targeted maximum likelihood estimation (TMLE) is a doubly-robust semiparametric method that estimates exposure effects or associations without relying on model specifications [13]. It combines semiparametric estimation, using machine learning algorithms, with an additional estimation process to optimize a parameter of interest (e.g. risk difference, risk ratio, and odds ratio) [12].
The goal of this analysis is to investigate the associations between MMC and two STI outcomes. Specifically, we used a population-based study to estimate the association between MMC and two STIs, namely HIV and HSV-2, among males in the KwaZulu-Natal region of South Africa. We obtained marginal odds ratios using TMLE and further compared our results with estimates from propensity score analyses, including full matching and inverse probability of treatment weighting (IPTW) methods.

Study design and participants
We used data from the HIV Incidence Provincial Surveillance System (HIPSS), a detailed and robust surveillance project that monitored HIV prevalence and incidence trends in KwaZulu-Natal, South Africa. The HIPSS study aimed to assess the impact of programmatic intervention efforts, including HIV-related prevention and treatment programmes, on HIV prevalence, uptake of antiretroviral therapy (ART), CD4 cell counts and viral suppression in a real-world nonexperimental setting. Survey weights that adjust for varying selection probabilities and differential non-response rates were included in the study design. The HIPSS study design, source population and recruitment procedures have been described previously [14,15]. among inhabitants of occupied households and 86.7% of enrolled households. Details on the variables for which data were collected have been previously published [14,15].

Variables and Inclusion criteria of participants
We included men who self-reported their MMC status and were sexually active. The main exposure of interest was MMC status, i.e. whether a participant had undergone MMC or not. Those who reported being uncircumcised or traditionally circumcised (which represents partial removal of the foreskin), or who did not know their circumcision status, were classified as not having MMC. The two outcomes of interest in our analysis were the HIV test result (+ve = 1, -ve = 0) and the HSV-2 test result (+ve = 1, -ve = 0). Covariates collected included age (in years), marital status (married, widowed/divorced/separated, single), education (no education, primary/not completed high school, completed high school, degree/diploma), number of lifetime sexual partners (one, multiple), condom usage (always/sometimes, never), and sexual activity in the last 12 months (yes or no).
These variables are epidemiologically plausible or possible confounders for the relationship between MMC status and the HIV and HSV-2 outcomes.
From the original 3547 male participants, we removed 692 participants who reported never having had sex. We further excluded participants who had missing values for MMC status (n = 5). Our analytic sample consisted of 2850 male participants.

Statistical Analysis
We contrasted the marginally adjusted odds of the HIV and HSV-2 outcomes that would be observed under each level of the MMC exposure. In other words, for each of the two outcomes, we compared the odds when the men were medically circumcised with the odds when they were not. Further, all contrasts were adjusted for the predefined set of important confounders, which include age, marital status, educational level, number of lifetime sexual partners, condom usage, and sexual activity in the last 12 months. We estimated associations of MMC with HIV and HSV-2 using TMLE, full matching on the propensity score [15] and inverse probability of treatment weighting (IPTW) [16].
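For readers who prefer notation, the marginal odds ratio contrasted here can be written as below. This is the standard textbook form rather than a formula taken from the original manuscript; $T$ is the MMC indicator, $Y$ the STI outcome, $W$ the confounders, and the symbols $\psi_{\mathrm{OR}}$ and $\mu_t$ are introduced only for this illustration. It assumes the usual conditions of no unmeasured confounding and positivity.

```latex
\psi_{\mathrm{OR}}
  = \frac{\mu_1 / (1 - \mu_1)}{\mu_0 / (1 - \mu_0)},
\qquad
\mu_t = E_W\!\left[ E(Y \mid T = t, W) \right], \quad t \in \{0, 1\}.
```

Each $\mu_t$ averages the conditional outcome probability over the confounder distribution of the whole sample, which is what distinguishes this marginal contrast from the conditional odds ratio of a regression coefficient.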
The implementation of TMLE is straightforward. Let $T$ and $Y$ denote the exposure (or treatment) indicator and the observed outcome (MMC status and STI outcome, respectively, in this context), and let $W$ be a vector including the identified confounders for the effect of $T$ on $Y$. We first obtained an initial estimate of the conditional expectation of the STI outcome $Y$, given the MMC status and covariates, $Q^0(T, W) = E_0(Y \mid T, W)$. The estimate $Q^0(T, W)$ and the predictions $Q^0(1, W)$ and $Q^0(0, W)$ were obtained with Super Learner. Super Learner is an ensemble method built from a pre-specified library of algorithms and their tuning parameters. It uses cross-validation to adaptively create an optimally weighted combination of estimates from candidate algorithms [17]. Optimality was assessed using 10-fold cross-validation, thereby reducing the chance of overfitting.
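The idea of a cross-validated weighted combination can be illustrated with a deliberately small sketch. Everything below is an assumption made for illustration: the toy data are invented, the two candidate learners (`fit_marginal`, `fit_stratified`) are hypothetical stand-ins for a real library, and a grid search over a single convex weight replaces the optimizer used by actual Super Learner software.

```python
import math

# Invented toy data: (w, y) pairs for one binary covariate and a binary outcome.
block = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 0),
         (1, 1), (1, 1), (1, 0), (1, 1), (1, 0)]
data = block * 20
n_folds = 10
fold_of = [(i // len(block)) % n_folds for i in range(len(data))]

def fit_marginal(train):
    # Candidate 1: predicts the overall outcome proportion for everyone.
    p = sum(y for _, y in train) / len(train)
    return lambda w: p

def fit_stratified(train):
    # Candidate 2: predicts the outcome proportion within each level of w.
    p = {}
    for level in (0, 1):
        ys = [y for w, y in train if w == level]
        p[level] = sum(ys) / len(ys)
    return lambda w: p[w]

learners = [fit_marginal, fit_stratified]

# Out-of-fold predictions: each learner is trained without the held-out fold.
cv_pred = [[0.0] * len(data) for _ in learners]
for k in range(n_folds):
    train = [obs for obs, f in zip(data, fold_of) if f != k]
    fitted = [fit(train) for fit in learners]
    for i, (w, _) in enumerate(data):
        if fold_of[i] == k:
            for j, predict in enumerate(fitted):
                cv_pred[j][i] = predict(w)

def cv_log_loss(alpha):
    # alpha weights the stratified learner, 1 - alpha the marginal learner.
    loss = 0.0
    for i, (_, y) in enumerate(data):
        p = alpha * cv_pred[1][i] + (1.0 - alpha) * cv_pred[0][i]
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(data)

# Pick the convex weight that minimises the cross-validated loss.
grid = [i / 10 for i in range(11)]
best_alpha = min(grid, key=cv_log_loss)
print("weight on stratified learner:", best_alpha)
```

In this toy setting the outcome depends strongly on the covariate, so the cross-validated loss favours the stratified learner; with a richer library the weights would typically be spread across several candidates.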
These estimates $Q^0(T, W)$, $Q^0(1, W)$, and $Q^0(0, W)$ form additional columns in our data matrix. We then plugged our estimates $Q^0(1, W)$ and $Q^0(0, W)$ into the substitution estimator of the parameter of interest, the log odds ratio, to obtain an untargeted estimate. In addition to adding the clever covariate columns $H^*(1, W)$ and $H^*(0, W)$, these values are then combined to form a column $H^*(T, W)$ in the data matrix.
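The quantities in this step can be written out as follows. These are the standard TMLE forms, given here as an assumption since the original expressions did not survive in this text; $g(t \mid W) = P(T = t \mid W)$ denotes the propensity score, and $\hat{\mu}^0_t$ and $\hat{\psi}^0$ are symbols introduced only for this illustration.

```latex
\hat{\psi}^{0}
  = \log \frac{\hat{\mu}^0_1 / (1 - \hat{\mu}^0_1)}
              {\hat{\mu}^0_0 / (1 - \hat{\mu}^0_0)},
\qquad
\hat{\mu}^0_t = \frac{1}{n} \sum_{i=1}^{n} Q^0(t, W_i),
\\[6pt]
H^*(1, W) = \frac{1}{g(1 \mid W)}, \qquad
H^*(0, W) = -\frac{1}{g(0 \mid W)}, \qquad
H^*(T, W) = \frac{I(T = 1)}{g(1 \mid W)} - \frac{I(T = 0)}{g(0 \mid W)}.
```

The first line is the untargeted substitution estimate of the log odds ratio; the second defines the clever covariate evaluated under exposure, under no exposure, and at the observed exposure.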
In the second and final step, we estimated the fluctuation parameter $\varepsilon$ by fitting an intercept-free logistic regression of $Y$ on $H^*(T, W)$ with the logit of $Q^0(T, W)$ being an offset (fixed quantity), where $\hat{\varepsilon}$ is the resulting coefficient of the clever covariate $H^*(T, W)$. We next updated the initial estimate $Q^0(T, W)$ using $\hat{\varepsilon}$.
For the two propensity score methods, full matching and IPTW, we defined the propensity score as the conditional probability that a participant was circumcised, given the covariates. As suggested by [16], we also included the survey weight as an additional covariate in the propensity score model. We then used the estimated propensity scores to create two sets of weights, one derived from full matching and one from IPTW. These induced weights, for each of the two propensity score methods, were then incorporated in a logistic regression model regressing the STI outcome on the MMC status.  [18] and WeightIt [19], respectively.
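The targeting step and the IPTW comparison can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the analysis code: the counts are invented, `q0` is a hypothetical initial fit standing in for Super Learner, the propensity score is the empirical exposure proportion within each level of a single binary confounder, and survey weights are ignored. The IPTW odds ratio is computed from weighted risks, which is what a weighted logistic regression of the outcome on a single binary exposure reproduces.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Invented toy counts keyed by (w, t, y): confounder, exposure, outcome.
counts = {
    (0, 1, 1): 10, (0, 1, 0): 50, (0, 0, 1): 30, (0, 0, 0): 60,
    (1, 1, 1): 15, (1, 1, 0): 25, (1, 0, 1): 60, (1, 0, 0): 50,
}
data = [cell for cell, n in counts.items() for _ in range(n)]
n = len(data)

def q0(t, w):
    # Hypothetical initial outcome fit Q0(T, W); stands in for Super Learner.
    return expit(-0.5 - 0.5 * t + 0.3 * w)

def g1(w):
    # Propensity score P(T = 1 | W = w): empirical proportion exposed.
    exposed = sum(1 for (wi, ti, _) in data if wi == w and ti == 1)
    total = sum(1 for (wi, _, _) in data if wi == w)
    return exposed / total

g = {w: g1(w) for w in (0, 1)}

def clever(t, w):
    # Clever covariate H*(T, W): 1/g(1|W) if exposed, -1/g(0|W) otherwise.
    return t / g[w] - (1 - t) / (1.0 - g[w])

def odds_ratio(p1, p0):
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

def score_and_info(eps):
    # Score and information of the intercept-free logistic regression of Y
    # on H*(T, W) with offset logit(Q0(T, W)).
    score = info = 0.0
    for (w, t, y) in data:
        h = clever(t, w)
        p = expit(logit(q0(t, w)) + eps * h)
        score += h * (y - p)
        info += h * h * p * (1.0 - p)
    return score, info

# Newton iterations for the fluctuation parameter epsilon.
eps = 0.0
for _ in range(25):
    score, info = score_and_info(eps)
    eps += score / info

def q_star(t, w):
    # Updated (targeted) outcome prediction.
    return expit(logit(q0(t, w)) + eps * clever(t, w))

mu1 = sum(q_star(1, w) for (w, _, _) in data) / n
mu0 = sum(q_star(0, w) for (w, _, _) in data) / n
or_tmle = odds_ratio(mu1, mu0)

def weighted_risk(arm):
    # Inverse-probability-weighted outcome proportion in one exposure arm.
    num = den = 0.0
    for (w, t, y) in data:
        if t == arm:
            wt = 1.0 / (g[w] if arm == 1 else 1.0 - g[w])
            num += wt * y
            den += wt
    return num / den

or_iptw = odds_ratio(weighted_risk(1), weighted_risk(0))
print("epsilon:", eps)
print("TMLE OR:", or_tmle, "IPTW OR:", or_iptw)
```

Confidence intervals are omitted from this sketch; in practice TMLE software derives them from the efficient influence curve.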

Results
In the analytical sample of 2850 men, 29.1% reported receiving MMC. These men were more likely than their uncircumcised counterparts to be younger and single. A majority of them had also completed high school, had used a condom with a recent partner, had had sex in the last year, and had had more than five sexual partners (Table 1).
To examine possible violations of the positivity assumption for estimators that rely on the propensity score, including TMLE, we examined the distribution of the estimated propensity score. The histogram of the estimated propensity score by exposure group is shown in Figure 1.
The prevalence of HIV and HSV-2 was lower among men who had received MMC than among those who had not (Table 2). HSV-2 prevalence (53.2%) was higher than HIV prevalence (32.4%). Estimates of the unadjusted odds ratios showed a significant association of MMC with each of the two STI outcomes.
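A numerical companion to the propensity score histogram is a per-group summary of the estimated scores. The sketch below assumes a hypothetical helper, `positivity_summary`, and invented scores; the 0.05 and 0.95 cut-offs for flagging extreme values are illustrative choices, not thresholds from this study.

```python
def positivity_summary(scores, exposed, lower=0.05, upper=0.95):
    """Summarise estimated propensity scores by exposure group."""
    summary = {}
    for group in (0, 1):
        vals = [s for s, t in zip(scores, exposed) if t == group]
        summary[group] = {
            "n": len(vals),
            "min": min(vals),
            "max": max(vals),
            # Scores near 0 or 1 hint at practical positivity violations.
            "n_extreme": sum(1 for s in vals if s < lower or s > upper),
        }
    return summary

# Hypothetical propensity scores and exposure indicators (not HIPSS data).
scores = [0.10, 0.25, 0.40, 0.55, 0.30, 0.02, 0.60, 0.35]
exposed = [0, 0, 1, 1, 0, 0, 1, 1]
summary = positivity_summary(scores, exposed)
for group, stats in summary.items():
    print(group, stats)
```

Overlapping score ranges across the two groups, with few or no extreme values, are what one hopes to see before relying on weighting-based estimators.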
After adjusting for the identified confounders, we found evidence of protective associations between MMC and HIV when the propensity score techniques were utilized (Figure 2).
<insert Table 2 here>

Discussion and conclusion
We examined the utility of a relatively new methodology, targeted maximum likelihood estimation (TMLE), to estimate the association between MMC and sexually transmitted infections among males in a South African population-based study. This study adds to the growing body of knowledge providing evidence of the benefits of MMC in STI prevention. Specifically, we found that for men, MMC has a protective effect against HIV and HSV-2. Though the utilization of TMLE did not indicate a null effect nor alter the direction of the association, we found evidence of more precise effects. high prevalence compared to other STIs, and the application of TMLE for rare outcomes is still in its infancy [20]. Moreover, previous reports [1] have shown that the associations of MMC with STIs other than HIV and HSV-2 were unreliable, as the study was underpowered to detect rare STI outcomes.
Public health interventions for HIV and HSV-2 are critically important to study, especially in African settings with high-burden syndemics. South Africa, with over seven million HIV-positive individuals, has the highest number of people living with HIV in the world, and the KwaZulu-Natal province is the worst hit, with a prevalence of 27% as recorded at the end of 2017 [15]. There were an estimated 417 million cases of HSV-2 globally in 2012 [21]. The currently reported prevalence of HSV-2 in sub-Saharan Africa is as high as 80% among men and women aged 35 and older [22]. Biological and epidemiological evidence further suggests a cofactor effect of HIV and HSV-2. In other words, HSV-2 infection increases the odds of HIV acquisition [23,24].
Parametric models require the correct specification of the functional form of the relationship between the exposure and the confounders, or the outcome-confounders relationship. This requirement is challenging and not usually satisfied in practice. The most attractive and unique property of TMLE is its double-robustness, which reduces bias due to model misspecification [12].
This doubly-robust property ensures that TMLE estimates remain consistent if either the exposure model or the outcome model is consistently estimated. TMLE, like other doubly-robust techniques, offers an opportunity to rely on nonparametric methods (like machine learning) in its estimation process, thereby increasing efficiency [13]. Previous theoretical and simulation studies have shown that TMLE has greater efficiency and less bias when compared with mis-specified parametric and nonparametric singly robust methods [11,28]. This was also evident from our TMLE estimates and confidence intervals in this study.
The proportion of refusals or non-participation in the HIPSS study, at both the household and the individual level, was lower than in most community-based surveys [25]. Although this data source is robust, it is cross-sectional. We are thus limited in our ability to establish the temporal relationship between the self-reported factors and the STI outcomes. In other words, it cannot be determined whether the self-reported factors preceded the STI outcomes or vice versa. Data other than the STI outcomes came from self-reports; hence, our work is likely to suffer from bias due to differential recall or social desirability. Misclassification of circumcision status is also a possibility. Not controlling for important risk factors such as a history of narcotics usage and additional comorbidities, which were not in the HIPSS database, is another limitation of this study.
For HIV, we did not exclude key subpopulations whose odds of acquisition would not result from heterosexual transmission. Our inclusion of these subpopulations would most likely bias associations towards the null since there will be less impact of their circumcision status on their HIV acquisition risk. Most of these limitations will be partly addressed by a planned analysis of a longitudinal cohort study capturing STI incidence, thereby validating findings from this study as well as others that have utilized the HIPSS study.
Our TMLE results provide further evidence of the protective effect of MMC against some STIs in men. This study has important practical implications for studies using nonparametric estimation techniques. Notably, TMLE estimates should be interpreted in light of a careful assessment of the propensity score distribution among the exposed and unexposed, and be compared with results from alternative parametric and nonparametric techniques. Due to its double robustness, TMLE, in comparison to its competitors, often results in efficiency gains and bias reduction of estimated exposure effects. In general, the TMLE method has the potential to advance the field of epidemiology and public health, enhancing the evidence base for recommendations that embrace the effect of public health interventions on health or disease outcomes.

Declarations
Ethics approval and consent to participate