Study Population
We studied 544 women enrolled between 2012 and 2014 in a cohort study of cervical HPV infection and cervical cancer at National Hospital, Abuja, and the University of Abuja Teaching Hospital, Nigeria, as previously described (5, 12-14). All participants were 18 years of age or older, had a history of vaginal sexual intercourse, were not pregnant at enrollment and had no history of hysterectomy. At study entry, we collected data on socio-demographic characteristics and sexual and reproductive history, and confirmed participants’ HIV status from hospital medical records. Participants were asked to return for follow-up after six months, when the history, physical examination and sample collection were repeated. At each study visit, we collected venous blood samples and performed a pelvic examination on all participants. Exfoliated cervical cells were collected with an elution swab system (Copan, Italy) and placed in 1 ml of Amies’ transport medium (Copan).
HPV detection by SPF10/LiPA25
We extracted DNA from the exfoliated cervical cells as previously described (11). Samples were tested for HPV DNA by hybridization of SPF10 amplimers to a mixture of general HPV probes recognizing a broad range of high-risk, low-risk and possibly high-risk HPV genotypes in a microtiter plate format, as described previously (15). All samples that were HPV DNA positive by this SPF10 DNA enzyme immunoassay (DEIA) were genotyped using LiPA25 (version 1). The LiPA25 assay simultaneously provides type-specific information on 25 HPV genotypes and identifies infection by one or more of 13 hrHPV genotypes: 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59 and 68 (16, 17). Because the assay does not distinguish HPV68 from HPV73, we classified the combined HPV68/73 result as low-risk, so hrHPV status was based on the remaining 12 genotypes. We defined hrHPV infection as prevalent if at least one hrHPV genotype was detected in the baseline sample, and as persistent if at least one hrHPV genotype was detected in the samples from both the baseline and follow-up visits. We defined women as persistently hrHPV negative if no hrHPV genotype was detected in either the baseline or the follow-up sample.
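The infection definitions above can be expressed as a small classification routine. This is a minimal Python sketch, not the study's code: the input format (a set of LiPA25 genotype labels per visit, with None for a missing sample) and all function names are hypothetical. One assumption deserves flagging: the text does not say whether persistence requires the same genotype at both visits, so the sketch follows the literal wording (any hrHPV genotype at both visits); a type-specific definition would instead intersect the high-risk types found at the two visits.

```python
"""Sketch: classify hrHPV infection status from LiPA25 genotype calls.

Hypothetical input format: each visit is a set of genotype labels such as
{"16", "52", "68/73"}; None means the sample (or visit) is missing.
"""
from typing import Optional, Set

# The 13 LiPA25 high-risk types minus HPV68, which LiPA25 reports only as the
# combined HPV68/73 call and which the study classified as low-risk.
HR_TYPES = {"16", "18", "31", "33", "35", "39", "45", "51", "52", "56", "58", "59"}


def has_hrhpv(types: Set[str]) -> bool:
    """True if at least one high-risk genotype was detected in a sample."""
    return bool(types & HR_TYPES)


def prevalent_status(baseline: Optional[Set[str]]) -> Optional[str]:
    """Case/control label for the prevalent (baseline) analysis."""
    if baseline is None:
        return None  # excluded: no baseline HPV result
    return "case" if has_hrhpv(baseline) else "control"


def persistent_status(baseline: Optional[Set[str]],
                      follow_up: Optional[Set[str]]) -> Optional[str]:
    """Case/control label for the persistent analysis.

    case:    hrHPV detected at both visits (any hrHPV type; NOT required to be
             the same type, which is an assumption about the definition)
    control: no hrHPV at either visit (persistently negative)
    None:    missing data, or hrHPV at only one visit (not analysed)
    """
    if baseline is None or follow_up is None:
        return None
    b, f = has_hrhpv(baseline), has_hrhpv(follow_up)
    if b and f:
        return "case"
    if not b and not f:
        return "control"
    return None


print(prevalent_status({"68/73", "6"}))   # control: HPV68/73 counted as low-risk
print(persistent_status({"16"}, {"52"}))  # case under the any-type reading
print(persistent_status({"16"}, set()))   # None: positive at one visit only
```

Women who are positive at only one visit fall out of both comparison groups, which is why the persistent analysis uses fewer women than the number with results at both visits.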
Genotyping and Imputation
Samples from the study participants were genotyped using the Illumina Multi-Ethnic Global Array (MEGA), which comprises ~1.7 million markers. The sample-level genotype call rate was at least 0.95 for all participants. We retained SNPs that met all of the following criteria (number of SNPs removed by each filter in parentheses): autosomal location (n=78,713), variant missingness < 0.05 (n=96,410), Hardy-Weinberg equilibrium (HWE) p > 1 × 10⁻⁶ (n=7,692) and minor allele frequency (MAF) ≥ 0.01 (n=564,791). The 958,363 SNPs that passed these quality control filters had a SNP success rate of 0.9985 and were used as the basis for imputation.
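The counts in parentheses, together with the 958,363 retained SNPs, sum to 1,705,969, which matches the ~1.7 million markers on the array. The per-SNP filters themselves can be sketched in plain Python as below. This is an illustration under stated assumptions, not PLINK: genotypes are coded 0/1/2 with None for missing calls, the function names are hypothetical, and the HWE check uses a 1-df chi-square approximation, whereas PLINK 1.9's --hwe uses an exact test, so p-values near the 1 × 10⁻⁶ threshold may differ.

```python
"""Sketch of the per-SNP QC filters described above (not PLINK's code).

Genotypes are coded 0/1/2 (copies of the alternate allele), None = missing.
The HWE test is a 1-df chi-square APPROXIMATION; PLINK 1.9 uses an exact test.
"""
import math
from typing import List, Optional

AUTOSOMES = {str(c) for c in range(1, 23)}


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Approximate HWE p-value from genotype counts (1-df chi-square)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def passes_qc(chrom: str, genotypes: List[Optional[int]],
              max_missing: float = 0.05, hwe_p_min: float = 1e-6,
              min_maf: float = 0.01) -> bool:
    """Apply the autosome, missingness, HWE and MAF filters to one SNP."""
    if chrom not in AUTOSOMES:
        return False
    called = [g for g in genotypes if g is not None]
    if 1 - len(called) / len(genotypes) >= max_missing:
        return False  # keep only variants with missingness < 0.05
    counts = [called.count(k) for k in (0, 1, 2)]
    if hwe_chisq_p(*counts) <= hwe_p_min:
        return False  # keep only variants with HWE p > 1e-6
    alt_freq = (counts[1] + 2 * counts[2]) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) >= min_maf
```

A SNP with genotype counts 25/50/25 is in perfect equilibrium (p = 1), whereas 50/0/50, with no heterozygotes at all, fails the HWE filter.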
Imputation was performed with the Sanger Imputation Service (https://imputation.sanger.ac.uk/) (18), using the Eagle2 algorithm (19) for pre-phasing and the positional Burrows-Wheeler transform (PBWT) (20) for imputation. The reference panel was the African Genome Resources Haplotype Reference Panel, which is based on 9,912 haplotypes (4,956 samples) and includes all African and non-African 1000 Genomes Phase 3 populations as well as additional African genomes from Uganda, Ethiopia, Egypt, Namibia and South Africa, including 2,298 African samples with whole-genome sequence data from the African Genome Variation Project (AGVP) (21) and the Uganda 2,000 Genomes Project (UG2G) (22). The IMPUTE2 INFO score was used as a quality metric to evaluate genotype imputation uncertainty. Imputation yielded ~104 million markers; after filtering for variants with INFO score ≥ 0.3 and MAF ≥ 0.01, ~18 million SNPs remained and were used for the association analysis.
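To make the INFO threshold concrete, the sketch below computes an IMPUTE-style INFO score from per-sample genotype probabilities and applies the INFO ≥ 0.3 and MAF ≥ 0.01 filter. It is a hedged illustration: it assumes probability triples (P(0), P(1), P(2)) as input and uses the information measure commonly described for IMPUTE2, 1 − Σ(f_i − e_i²) / (2Nθ(1 − θ)), where e_i and f_i are the expected dosage and expected squared dosage and θ is the estimated allele frequency. The Sanger service reports its own INFO values, and the function names are hypothetical.

```python
"""Sketch: IMPUTE-style INFO score and the post-imputation filter.

Each sample's imputed genotype is a hypothetical probability triple
(P(0), P(1), P(2)) for 0, 1 or 2 copies of the alternate allele.
"""
from typing import List, Tuple

Probs = Tuple[float, float, float]


def info_score(probs: List[Probs]) -> float:
    """INFO = 1 - sum(f_i - e_i^2) / (2N * theta * (1 - theta))."""
    n = len(probs)
    e = [p1 + 2 * p2 for _, p1, p2 in probs]  # expected dosage E[g]
    f = [p1 + 4 * p2 for _, p1, p2 in probs]  # expected squared dosage E[g^2]
    theta = sum(e) / (2 * n)  # estimated alternate-allele frequency
    if theta <= 0 or theta >= 1:
        return 1.0  # monomorphic: treated as 1 here (an assumption)
    num = sum(fi - ei ** 2 for fi, ei in zip(f, e))  # summed per-sample variance
    return 1 - num / (2 * n * theta * (1 - theta))


def keep_variant(probs: List[Probs], min_info: float = 0.3,
                 min_maf: float = 0.01) -> bool:
    """Keep variants with INFO >= 0.3 and MAF >= 0.01, as in the text."""
    n = len(probs)
    theta = sum(p1 + 2 * p2 for _, p1, p2 in probs) / (2 * n)
    maf = min(theta, 1 - theta)
    return maf >= min_maf and info_score(probs) >= min_info
```

A variant imputed with certainty scores 1, while one where every sample has probabilities (0.25, 0.5, 0.25) carries no information beyond the allele frequency and scores 0, so it is dropped by the 0.3 threshold.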
Statistical Analysis
From the original 544 women, we excluded 27 from the baseline analyses because of incomplete data (5 missing HPV results and 22 missing both HPV and HIV results), leaving 517 women. Of these 517 women, we excluded those who did not return for the follow-up visit (n=9) and those with missing HPV results at follow-up (n=35), and included the remaining 473 women in the analyses of persistent hrHPV infection. For the prevalent hrHPV analysis, we compared 125 women with cervical hrHPV infection at baseline (cases) with 392 women without cervical hrHPV infection at baseline (controls). For the persistent hrHPV analysis, we compared 51 women with hrHPV infection at both the baseline and follow-up visits (cases) with 355 women without hrHPV infection at either visit (controls).
Using LD-pruned SNP genotype data from the same women, we computed principal components from the variance-standardized relationship matrix in PLINK 1.9 (23, 24). Linkage disequilibrium (LD) pruning used the option “--indep 50 5 2”, that is, a window of 50 SNPs, shifted by 5 SNPs at each step, with a variance inflation factor (VIF) threshold of 2. The first principal component was significant in the test for population differentiation, so we included it as a covariate in downstream association analyses.
The association between each genetic variant and prevalent or persistent hrHPV infection was estimated using unconditional multivariable logistic regression under an additive genetic model, adjusted for age, HIV status and the first principal component. Genome-wide significance was set at p < 5 × 10⁻⁸. To test for replication of SNPs previously associated with HPV infection and cervical neoplasia in other populations, we used an additive genetic model adjusted for HIV status and considered p < 0.05 as statistically significant evidence of replication. The analyses were conducted using PLINK.
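The sample counts reported above are internally consistent, as the following arithmetic shows. The last line is derived rather than stated in the text: under the definitions above, women with HPV results at both visits who were neither persistent cases nor persistently negative controls must have had hrHPV detected at only one visit, and so were not part of the persistent analysis.

```latex
\begin{align*}
544 - (5 + 22) &= 517 && \text{women in the baseline (prevalent) analysis}\\
125 + 392 &= 517 && \text{prevalent cases + controls}\\
517 - 9 - 35 &= 473 && \text{women with HPV results at both visits}\\
51 + 355 &= 406 && \text{persistent cases + persistently negative controls}\\
473 - 406 &= 67 && \text{hrHPV at one visit only; not in the persistent analysis}
\end{align*}
```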