Host Genetic Determinants of the Vaginal Microbiome in Kenyan Women

Background Women with vaginal microbial community state types (CST) with high diversity and a paucity of Lactobacillus crispatus have increased risk of HIV acquisition. Identifying host genetic factors associated with the vaginal microbial composition may aid in elucidating the biological mechanisms regulating these microbial traits and inter-individual variations in associated diseases. Methods We conducted genome-wide associations studies (GWASs) on vaginal microbiome traits on 171 Kenyan women. Study participants were genotyped using the Infinium Global Screening Array and 16S rRNA gene amplicon sequencing was performed to characterize the vaginal microbiome. Linear and logistic regression were performed, adjusting for age and principal components of genetic ancestry, to evaluate the association between L. crispatus, L. iners, G. vaginalis, Shannon diversity index, and CST with host genetic single nucleotide polymorphisms (SNPs). Pathway enrichment analyses were performed to identify biological processes putatively associated with the vaginal microbiome traits. Results At baseline, the median age of study participants was 22 years (IQR: 22 – 25) with 22% having Bacterial vaginosis (Nugent score 7-10). L. crispatus and L. iners were present in 24% and 83% of samples with a mean relative abundance of 31% and 45%, respectively. The most significant SNPs associated were: rs73330467 located between LOC101927488-GRAMD2B ( P =4.79x10 -6 ) for L. crispatus ; rs527430 in the FOXD2-TRABD2B ( P =6.98x10 -7 ) region for L. iners ; rs1229660 in the SNX10-LOC441204 ( P =4.65x10 -6 ) region for G. vaginalis; rs972741 in the ZKSCAN2-HS3ST4 ( P =8.52x10 -7 ) region for Shannon diversity index; and rs2302902 in ELK3 ( P =3.09x10 -6 ) for CST. During pathway enrichment analysis, Toll-like receptors, cytokine production, and other components of innate immune response Conclusions We identified numerous genetic loci on several pathways related to host immunity and infection that were associated with vaginal microbiome traits, providing insight into potential host genetic influences on vaginal microbiome composition. This new information should guide larger longitudinal studies, with genetic and functional comparison across microbiome sites within individuals, and across populations.


Abstract
Background Women with vaginal microbial community state types (CST) with high diversity and a paucity of Lactobacillus crispatus have increased risk of HIV acquisition.
Identifying host genetic factors associated with the vaginal microbial composition may aid in elucidating the biological mechanisms regulating these microbial traits and interindividual variations in associated diseases.
Methods We conducted genome-wide associations studies (GWASs) on vaginal microbiome traits on 171 Kenyan women. Study participants were genotyped using the Infinium Global Screening Array and 16S rRNA gene amplicon sequencing was performed to characterize the vaginal microbiome. Linear and logistic regression were performed, adjusting for age and principal components of genetic ancestry, to evaluate the association between L. crispatus, L. iners, G. vaginalis, Shannon diversity index, and CST with host genetic single nucleotide polymorphisms (SNPs). Pathway enrichment analyses were performed to identify biological processes putatively associated with the vaginal microbiome traits.
During pathway enrichment analysis, Toll-like receptors, cytokine production, and other components of innate immune response were associated with L. crispatus, L. iners, and CST. Multiple genomic loci were replicated, including IL-8 (Shannon, CST), TIRAP ( L. iners Introduction Bacterial vaginosis (BV) is a condition of clinical and public health significance. BV is prevalent in 20-50% of women in sub-Saharan Africa [1]. In a meta-analysis, Atashili et al.
estimated the relative risk of HIV acquisition to be 1.6 times higher for women with BV, equating to a population attributable fraction of 15% due to the high population prevalence [2]. BV is also associated with an increased likelihood of sexually transmitted infections (STI) and adverse pregnancy outcomes [3,4]. BV represents a polymicrobial shift in the vaginal microbiome, often from a Lactobacillus dominant community to one that is diverse, with multiple species of anaerobic bacteria [5]. In particular, L. crispatus enrichment has been shown to be protective against BV, HIV, and STIs [6].
BV is considered a sexually enhanced infection, and individuals with new or multiple sex partners, those who engage in unprotected vaginal sex, or those whose sex partner is uncircumcised are at an increased risk of BV [3,[7][8][9]]. In addition to sexual risk factors, non-sexual risk factors that are associated with BV include intravaginal and vaginal hygiene practices [10], cigarette smoking [11], and race/ethnicity [12]. In numerous studies, women of African descent have increased risk of BV and non-optimal vaginal microbiome composition [13]. These associations with race persist even when controlling for socio-demographics and sexual practices [14,15], leading to the hypothesis that genetic factors may influence the vaginal microbiome composition and subsequently BV [16].
To date, a limited number of studies have examined the host genetic contribution to the vaginal flora and BV. Primarily these studies have focused on host differences in candidate genes responsible for inflammatory mucosal immune response in other infectious conditions [16,17]. As reviewed by Turovskiy et al., studies have targeted single nucleotide polymorphisms (SNPs) in: (1) Toll-like receptor (TLR) genes because of their role in recognizing potential threats and initiating inflammatory responses and activating other immune cells, and (2) pro-inflammatory and inflammatory cytokines (e.g., IL-1β, TNF-α) [15]. Most of these studies, however, evaluated SNPs in relation to BV or to specific taxa associated with BV (e.g., G. vaginalis, A. vaginae spp.) and none of the studies measured the vaginal microflora or host genetic contribution comprehensively, such as with cultivation-independent molecular microbial characterization or a genome wide association study (GWAS), respectively.
To address this gap, we conducted GWAS on vaginal microbiome traits among native Kenyan women. We hypothesized that by conducting GWAS, we would identify novel genomic loci associated with vaginal microbiome traits, furthering our understanding of the underlying genetic factors. Identifying genetic variants may aid in elucidating the biological mechanisms associated with these complex vaginal microbiome traits and associated diseases. Complementary to GWAS, we conducted replication analyses of SNPs previously reported to be associated with vaginal microflora or BV.

Study Sample
Among 171 women included in this analysis, the prevalence of BV at baseline was 22% (Table 1). Over half (53%) of women were HSV-2 seropositive and 10% were HIV positive, which is in keeping with the prevalence of HIV among women in this age range in the general population in this region of western Kenya [18]. L. crispatus was present in 25% of samples (28% mean relative abundance), L. iners was present in 83% of samples (mean relative abundance 45%), and G. vaginalis was present in 75% of samples (25% mean relative abundance).

Genome-wide association results
The genomic control (GC) inflation factor, l, was 1.01, 1.00, 1.00, 1.00, and 1.00 for L. crispatus, L. iners, G. vaginalis, the Shannon diversity index, and CST, respectively, after adjusting for age and the first three principal components. Upon visual inspection of the   Table 2 summarizes the top SNPs (P < 1 x 10 -5 ) for each vaginal microbiome trait and minor allele frequencies (MAF) derived from our data. Overall, no SNP reached genome-wide significance (0.05/336,151 = 1.49E-07). The SNP with the lowest adjusted p-value associated with presence of L. crispatus was rs73330467 (P = 4.79 x 10 -6 , Figure 1A), located in an intergenic region between LOC101927488 and GRAMD2B on chromosome 5. The minor allele G (MAF = 0.07) was associated with an 11.6fold increased odds of detecting L. crispatus. The most significant SNP associated with relative abundance of L. iners was rs527430 (P = 6.98 x 10 -7 , Figure 1B) on chromosome 1, located in an intergenic region between FOXD2 and TRABD2B. The minor allele A (MAF = 0.06) was associated with an increase in relative abundance of L. iners (β = 1.05). The most significant SNP associated with G. vaginalis was rs1229660 (P = 4.65 x 10 -6 , Figure   1C) located between SNX10-LOC441204 on chromosome 7. The minor allele C (MAF = 0.10) was associated with a decrease in G. vaginalis relative abundance (β = -0.99). The most significant SNP associated with Shannon diversity index was rs972741 (P = 8.52 x 10 -7 , Figures 1D) located between ZKSCAN2 and HS3ST4 on chromosome 16. The minor allele A (MAF = 0.19) was associated with an increase in the Shannon diversity index (β= 0.66). The most significant SNP associated with CST was rs2302902 (P = 3.09 x 10 -6 , Figure 1E) located in ELK3 on chromosome 12. The minor allele T (MAF = 0.20) was associated with an increased likelihood of membership in lower diversity CST-I and CST-II as compared to CST-IV (β= 0.41).

Results from Imputed SNPs
We performed genotype imputation in order to expand genomic coverage and interrogate additional SNPs not directly genotyped. After removing SNPs with low imputation quality and low frequency (Rsq < 0.30 and MAF < 1%), several imputed SNPs (but no additional genomic regions) were identified. Figure 2 presents the regional SNP association plots for each of the vaginal microbiome traits. Within the ZKSCAN2-HS3ST4 intergenic region on chromosome 16, several imputed SNPs were more significantly associated with the Shannon diversity index than those directly genotyped ( Figure 2E). The most significant SNP in this region was rs115869045 (P = 4.77 x 10 -7 , Rsq = 0.70).

Conditional Analyses
We conducted conditional analyses for the genomic loci in which the most significant SNP was identified for each vaginal microbiome trait to determine whether additional SNPs contribute to the observed associations. Supplementary Figure 2 displays the regional SNP associations conditioning on the most significant directly genotyped SNP by including the SNP as a covariate in the regression model. As shown in each plot, all of the immediate SNP associations reduced towards the null, suggesting the SNPs identified are the leading SNPs for the observed associations.

Analysis of Previously Reported Loci
We investigated the associations of previously reported SNPs in relation to BV or vaginal microflora to determine whether these associations are relevant in a Kenyan population. To replicate genomic loci associated with the microbiome traits, we extracted the most significant imputed SNP within ± 100 kb of the previously reported SNPs for each microbiome trait in our study. This approach resulted in 6 genomic regions for L. iners, 1 genomic region for G. vaginalis, 7 genomic regions for Shannon Diversity Index, and 4 genomic regions for CST that were significant after Bonferroni correction ( iners, Shannon). vaginalis, as well as 'neutrophil degranulation', though it did not meet Bonferroni significance. Similarly, two pathways were associated with CST, including the most significant pathway 'Class B/2 (secretin family receptors)', and one pathway was associated with the Shannon diversity index, 'Class B/2 (secretin family receptors)'. Three phenotypes were associated with L. iners, with 'Autosomal dominant inheritance' the most significant (P = 0.018). No phenotypes were significantly associated with G. vaginalis. One phenotype was associated with the Shannon diversity index, 'Abnormality of the female genitalia' (P = 0.037). Five phenotypes were associated with CST, including 'Abnormality of the integument' (P = 0.00023).

Discussion
To our knowledge, this is the first GWAS of vaginal microbiome traits. In native Kenyan women, we identified novel genomic loci and biological pathways with biological relevance to host immunity, cell signaling, and infection, supporting the role of host genetics in inter-individual variability in vaginal microbiome traits.
In GWAS, the most significant SNP associated with CST (rs2302902) is located in an intron of ELK3, a member of the ETS transcription factor family, which has not been previously associated with any vaginal microbiome traits or BV. Proteins in the ELK3 subfamily are recruited by serum response factor and participate in transcription regulation, and has been associated in various cancers [19][20][21], including HPV-positive tumors of oropharyngeal cancer [22]and HPV16 in cervical tumors [23]. In meta-analysis, molecular and clinical measure of BV is associated with two-fold increase in risk of high-grade cervical epithelial neoplasia (CIN) or cancer [24]. Non-optimal vaginal microbiome leads to mucosal disruption and a pro-inflammatory environment that can facilitate HPV acquisition [25], but there may also be underlying shared or synergistic mechanism with ELK3. Other SNPs identified in GWAS occurred primarily in intergenic regions that haven't previously been associated with BV or vaginal microbiome traits, and should be assessed for replication in future studies.
Although CST-IV accounted for 89% of BV cases in this sample, 57% of women in CST-IV did not have BV. This is in keeping with many other studies: women with diverse CSTs are much more likely to have BV, but as many as 50% of women with these CSTs do not have BV [26]. Host genetics may partially explain this variability. For example, women with similar vaginal bacterial colonization (such as those within CST-IV) may have different response to bacterial colonization depending on genetic traits. For women who mount an inflammatory response, this immune response may potentiate BV rather being solely a response to BV. In our study, restricting analyses to HIV uninfected women did not change our findings. This is not entirely unexpected, as associations between HIV and subsequent impact on the vaginal microbiome are variable, with some finding no effect [27,28].
Conversely, there is substantial data that demonstrate the temporal association between non-optimal vaginal microbiome composition, mucosal inflammation, and subsequent HIV risk [29], and our results suggest that host genetic differences may underpin this. Wellpowered longitudinal studies that examine how host genetics is associated with varying CST, mucosal immune markers, and BV trajectories could be informative to understanding this variability in pathology within non-optimal CSTs. Additionally studies are needed to explore the potential long range regulation of these SNPs on genes, as well as evaluate for expression or methylation quantitative trait loci associations.

Study Design and Participants
Subjects in this analysis were enrolled in Afya Jozi, Afya Jamii (Kiswahili for "Healthy Pair, Healthy Community"), a prospective cohort study of heterosexual couples in Kisumu, Kenya. Recruitment and eligibility criteria have been previously published [48]. Briefly, we included men aged 18-35 years and their female partners aged 16 years and older, who independently confirmed a sexual relationship of at least 6 months duration. After the baseline visit, couples were scheduled for follow-up at 1 month, 6 months, and 12 months.

Specimen Collection
At each scheduled study visit, participants underwent a standardized medical history and physical examination, followed by a personal interview to obtain socio-demographic and behavioral information. At baseline and each follow-up visit, clinicians collected vaginal swabs for assessment of BV. Gram stained slides were evaluated according to Nugent's criteria, and a score of 7-10 was defined as BV following clinical recommendations [49].
Vaginal swab collection was followed by collection of cervicovaginal lavage (CVL) in 10 ml sterile water. Aliquots of 2mL were stored at -80º until shipment to Chicago for processing.

Vaginal Microbiome Characterization
We used the vaginal microbiome as measured from the baseline visit for this analysis, because antimicrobial treatment was provided for BV and other cervicovaginal infections which may confound subsequent vaginal microbiome observations. DNA extraction, library preparation and sequencing were performed at the University of Illinois at Chicago Sequencing Core (UICSQC). DNA was extracted from CVL specimens using a PowerFecal DNA kit (Qiagen). A two-stage PCR protocol was used to amplify the V3-V4 variable region of bacterial 16S rRNA genes [50]. Illumina MiSeq was used to sequence the amplicons after pooling, implementing V3 chemistry (600 cycles). Quality control and taxonomic annotation were conducted by University of Maryland Institute for Genomic Science (UMD IGS) following previously published protocol [51]. Community state types (CSTs) were determined by hierarchical clustering of relative abundances of taxa, using VALENCIA algorithm implemented by UMD IGS, which uses a distance-based metric to classify each sample to a CST based on the similarity of that sample to the centroid of CSTs identified in a reference set [52]. Raw sequence data files are available in the Sequence Read Archive (National Center for Biotechnology Information; BioProject identifier PRJNA516684).

Genotyping and Quality Control
Genomic DNA was obtained from 200 Kenyan women using oral swabs and was extracted using QIAamp DNA blood mini kit (Qiagen). DNA extraction and isolation was performed at the Center for Population Epigenetics at Northwestern University. Study participants were genotyped using the Infinium Global Screening Array (~640,000 markers; Illumina, Inc., San Diego). The software Illumina GenomeStudio (v2.0.4, Illumina, Inc., San Diego) was used to call single nucleotide polymorphisms (SNPs) at the Genomic Facility, University of Chicago. Study participants with a genotyping call rate less than 95% were excluded from analysis as were study participants with missing vaginal microbiome traits. Due to cryptic relatedness identified during quality control, we restricted all analyses to the maximum number of unrelated individuals. Implementation of these exclusion criteria resulted in 171 study participants for analysis. SNPs with a call rate less than 95%, a minor allele frequency less than 1%, or a Hardy-Weinberg equilibrium P value less than 10 -6 were omitted. After these exclusion criteria, 336,151 remained for further analysis.

Genotype Imputation
We performed genotype imputation to identify additional SNPs associated with the vaginal microbiome traits that were not directly genotyped using the Luhya in Webuye, Kenya reference panel from the 1000 Genomes Project [53]. The program Shapeit2 [54] was used to phase genotypes and imputation was performed using Minimac3 [55]. Imputed genotypes were coded as allelic dosages (estimated counts ranging from 0 to 2). Low quality imputed SNPs, i.e. Rsq < 0.30 and MAF < 1%, were excluded from further analyses.

Statistical Analysis
We evaluated the association between single nucleotide polymorphisms (SNPs) with five Principal components of genetic ancestry were generated using the program EIGENSOFT [56] and the first three principal components were retained and included as covariates.
Quantile-quantile (Q-Q) plots were generated for each trait to visualize test statistics and the genomic control inflation factors were calculated. We used PLINK (v1.90) [57] to test the associations between L. iners, G. vaginalis, Shannon diversity index, and CST with SNPs using linear regression, and association between L. crispatus and SNPs using logistic

Consent for publication: Not applicable.
Availability of data and material: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests:
The authors declare that they have no competing interests