Following stringent quality-control steps on genotyping data9, 1,643 OPC cases and 5,256 cancer-free controls remained for analyses (Table 1, Supplementary Tables 1-3). The OC case series comprised 2,359 cases and was used to complement HPV(-)OPC analyses given their common risk factors (Supplementary Table 4).
Based on the serology data, 65.6% of the OPC cases (1,078 of 1,643) were classified as HPV(+). Among the controls with serology data available (1,543), 16 (1%) were classified as seropositive for HPV16; thus, the population specificity was 99.0% (95%CI: 98.3 to 99.4%). Median MFI values of HPV16 serology markers (E1, E2, E7), were markedly higher in cases compared to controls (Supplementary Table 5).
GWA analyses stratified by HPV16 status (1,078 HPV(+) and 565 HPV(-) OPC cases, and 5,256 controls) within each continent and meta-analyzed yielded 7,574,753 SNPS for risk association analyses. GWAs results of the two traits are presented in Figure 1. There was no evidence of genomic inflation for any of the analyses after scaling it to 1,000 cases and controls9 (Supplementary Figure 1). All findings with a P-value less than 1×10−6 are reported in Supplementary Tables 12-14.
The HLA region is significantly associated with HPV(+)OPC risk
A strong signal (P=4.5 x 10-11) mapped to 6p21.32 was identified for HPV(+)OPC cases (Figure 1; Table 2). This area of the genome encodes the HLA region, which regulates the immune response. In addition, two suggestive associations at locus 3q26.1 and 4q35.1 were detected just below the GWAS significance level (P < 1x10–7, Supplementary Table 6).
Stratified analyses, by study characteristics are summarized in Supplementary Figure 2. No evidence of differences in effect size between characteristic strata were observed.
Of note, in this larger study population with HPV status information, we also replicated the previously observed association of three HLA specific alleles (DRB1*1301, DQA1*0103 and DQB1*0603) and their haplotype with decreased HPV(+)OPC risk (Table 2 and Supplementary Figure 2)9.
Fine-mapping of the HLA region reveals two independent loci associated with risk of HPV(+)OPC
The HLA region is highly polymorphic with extensive linkage disequilibrium (LD) and assessing the causality of identified associations is difficult. To fine-map the region, we therefore imputed sequence variation in classical HLA genes. The final set of imputed variants used in association analyses were of high quality; 92.5% of the variants had R2≥0.9, and 68.2% of the less common variants (MAF<0.05) had R2≥0.9.
Associations with risk of HPV(+)OPC extended into the class I and II regions (Figure 2a). The most significant variant was the A allele of rs4713462, an HLA class I intergenic variant that maps 5.6kb 5' of AL671883.1. This variant had not been identified in the previous GWAs analysis9 in which HPV status data was available for only a limited number of cases. (OR =0.66, P=4.5 x 10−11; Figure 2a; Table 2). Functional annotation revealed that rs4713462 is predicted to be an eQTL for HLA-C across different tissues including upper digestive tract tissues. Given the extended LD in the MHC region and the considerable number of variants below genome-wide level significance showing similar associations with HLA-C mRNA levels (Supplementary Table 7), we explored HLA-C expression genetic control across the MHC region. There did not appear to be a consistent association between the rs4713462 A allele and HLA-C gene expression in oral related tissues (Supplementary Figure 3). This may suggest that HLA-C expression alone is unlikely to be mediating this association.
After conditioning on rs4713462, evidence remained of another protective association for the HLA class II region (P < 5 × 10-8). This mapped to an A allele of rs9269942, a multiple nucleotide variation that results in an amino acid change to glutamine in codon 71 (71-Glu) of HLA-DRB1 protein (OR =0.56, P=2.8 x 10−9; Figure 2b, Table 2) and translates to a location in the peptide binding groove of HLA-DRB1*1301 and in three other HLA-DRB1 alleles (0402, 1302 and 1102) for which a protective trend was also identified (Supplementary Figure 4; Supplementary Table 8). This result further reinforces the possible functional role of this variant and refines the association of the HLA haplotype DRB1*1301-DQA1*0103-DQB1*0603 identified in our previous GWAs analysis for OPC9 and with decreased cervical cancer risk10.
When both rs4713462 and HLA-DRB1 71-Glu were conditioned on, no remaining signals were detected (Figure 2c).
In addition to identifying loci associated with HPV-positive OPC risk, haplotype analysis was used to define the boundaries of associated variants within the HLA region given its complexity and extended LD. The only haplotype found associated with OPC risk was the HLA class II haplotype, DRB1*1301-DQA1*0103-DQB1*0603 (Table 2; Supplementary Figure 2a). No other haplotype showed a significant association when considering Class I or both Class I and Class II blocks. Conditional analyses considering DRB1*1301-DQA1*0103-DQB1*0603 and rs4713462 in the same model, also revealed the independence of both effects (Table 3).
We subsequently assessed whether the independent variants identified to be protective of OPC were indicative of a protective effect from a haplotype, from specific amino acids within haplotypes, or from the variants themselves which could be functionally mapped to other variants unrelated to HLA. To refine the associations within the MHC region, model selection was used to identify the combination of variant, amino acid and/or haplotype that best explained the data using BIC (Bayesian Information Criterion) as model selection criterion. Table 3 displays the combined models with the best fit to the data from this search. Model A was considered the best fitting model. This implies that the amino acid 71-Glu in HLA-DRB1*1301 explains the association of the class II haplotype DRB1*1301-DQA1*0103-DQB1*0603 with risk of HPV(+)OPC (Table2; Table 3; Supplementary Figure 2a).
Association of each HLA variant with antibodies against viral proteins suggests differential HLA response to viral proteins
We conducted a protein quantitative trait loci (pQTL) analysis of 8291 MHC imputed variants using HPV16 E6 or L1 antibody levels in all OPC patients to explore associations between these variants and antibody levels. Rs4713462 and 71-Glu were the variants most strongly associated with E6 and L1 antibody levels, respectively (Figure 3, Supplementary Figure 9). Specifically, results showed that the rs4713462 allele (A) was exclusively and strongly associated (B= -0.31, P= 1.3 x 10-8) with lower E6 antibody levels and not with L1 MFI levels (B= -0.07, P= 0.05); 71-Glu, which is located in the peptide binding groove of HLA-DRB1*1301, showed a robust association (B= -0.30, P= 7.0 x 10-8) with lower L1 MFI levels but no correlation with E6 MFI levels (B= -0.12, P= 0.11). This suggests that their association with HPV(+)OPC could potentially be mediated via influencing humoral immune response during viral tumorigenesis and corroborates the independence of these variants in their associations with lower risk of HPV(+)OPC.
We repeated the analyses for the top associated variants in the subset of OPC cases classified as HPV(+) expecting to find a similar association to the one using all OPC cases. Analyses restricting to HPV16 E6+ cases (>1000 MFI) captured again the association with rs4713462 (B=-0.1; P=0.01). The association was less significant as expected given the smaller numbers and reduced range of data (Supplementary Figure 11). We did not see association with the 71-Glu variant and L1 levels when restricting to the HPV16 L1+ cases (B=-0.04; P=0.22). This could again be explained by the more limited data range when one restricts to L1+ cases (Supplementary Figure 11).
Combined seropositivity to HPV16 E6 and L1 was relatively common among OPC cases (in 42%, 685 of 1643 OPC). We therefore investigated the possible influence of E6 or L1 antibody levels on their association with the respective genetic variant by adjusting for the alternative serology marker. As shown in Supplementary Table 11, results are extremely similar to the original results in Figure 3, indicating that adjustment by the other serology marker makes little difference.
Due to the functional connection and correlation of other E proteins (E1, E2, E7) with HPV16 E6, we explored the relationship between the top associated HLA variants and antibodies against these proteins as a sensitivity analysis. We also explored association with other well-known but less common high-risk HPV types (E6 and E7 of HPV18/31/33/35/45/52/58) to assess the question of whether the identified HLA variants are E6 specific associated variants independent of HPV type. Our results suggest that the rs4713462 association might be specific to HPV16 E6 antibodies given that the strongest association was still the one detected with HPV16 E6 (Supplementary Figure 6). Seropositivity for other HPV high-risk types represents a very limited fraction among HPV(+) or HPV(-)OPC reducing the chances for non-HPV16 markers to explain HLA genetic findings (Supplementary Table 9-10).
In the GWA analyses within HPV(-)OPC cases and controls, a novel risk variant at 12p23.3 (rs35189640, OR =2.73, p=1.1 x 10−9; Figure 1b; Table 2) was found significantly associated with risk of HPV(-)OPC. Genomic annotation revealed that rs35189640 is an intronic variant in the BTB domain-containing 11 (BTBD11), a DNA-binding protein such as zinc-finger transcription factors. Interestingly, rs35189640 also maps into a repressed Polycomb-group protein (PcG) binding site specific to epithelial tissue based on chromatin states evaluated in more than 30 tissues (Supplementary Figure 5). No evidence for heterogeneity was detected by geographic region or study characteristics displayed in Table 1 (Supplementary Figure 2d). Other suggestive susceptibility variants were found at five additional loci at 3p26.2, 1p36.13, 13q33.3, 5q34 and 6p21.32 that fell just under the GWA significance level (P < 1x10–7, Supplementary Table 6).
Pooled GWAs of OCC and HPV(-)OPC cases
To improve power and test the reproducibility of our findings, we pooled 2,359 OCC cases with the 565 HPV(-)OPC cases and repeated the analyses. This was done under the assumption that OCC is a predominately HPV(-) disease and has risk factors in common with HPV(-)OPC, i.e. tobacco and alcohol usage.
Manhattan plots for the pooled dataset are displayed in Supplementary Figure 7. In addition to previously reported variants9 for both cancer sites, we identified two susceptibility loci at 6p21.32 (rs3828805, OR=0.77, p=2.5 x 10-9) and 15q21.2 (rs10851478, OR=1.22, p=1.3 x 10-8) common to both cancer entities. The rs3828805 is in LD with HLA-DRB1 71-Glu variant (r2=0.48) that we found to be associated with HPV16 L1 antibody levels in HPV(+) OPC. As expected, rs4713462, which was found exclusively associated with antibodies against the HPV16 E6 viral oncoprotein and is independent from rs3828805 (r2= 0.0004), was not associated with risk of OC/OPC(-) (Table 4). Regarding rs10851478, this variant is a cis-eQTL for the FGF7 gene (Supplementary Figure 8) which is a potent epithelial cell-specific growth factor with an important role in morphogenesis of epithelium.
Assessment of the consistency of effects across studies
Overall, heterogeneity tests showed that HPV(-)OPC and OC share their top disease-related genetic associations while those for HPV(+) and HPV(-)OPC were different revealing a different genetic predisposition and probably genetic architecture (Supplementary Figure 10). In brief, significant heterogeneity (p< 0.05) was identified in the meta-analysis of HPV(+) and HPV(-)OPC for rs4713462 at HLA region and for rs35189640 at BTBD11 locus suggesting an exclusive association with HPV(+) and HPV(-)OPC, respectively (Supplementary Figure 10b). FGF7 and LHPP association showed common and consistent effects to the HPV(-)OPC and OCC analyses (Supplementary Figure 10a) as well as ADH1B that was previously identified as associated with overall OPC and OCC9. BTBD11 showed a stronger effect for HPV(-)OPC (Supplementary Figure 10).