Genome-wide association study identifies novel risk loci for apical periodontitis

Apical periodontitis (AP) is a common consequence of root canal infection leading to periapical bone resorption. Microbial and host genetic factors, and their interactions, have been shown to play a role in AP development and progression. Variations in a few genes have been reported in association with AP however, the lack of genome-wide studies has hindered progress in understanding the mechanisms involved in AP Here, we report the first genome-wide association study of AP in a well-characterized population. Male and female adults (n=932) presenting with deep caries with AP (cases) or without AP (controls) were included. Genotyping was performed using the Illumina Expanded Multi-Ethnic Genotyping Array. Single-variant association testing was performed adjusting for sex and five principal components. Subphenotype association testing, analyses of genetically regulated gene expression, polygenic risk score and phenome-wide association (PheWAS) analyses were also performed. Eight loci reached near-genome-wide significant association with AP (p < 5 × 10–6); gene-focused analyses replicated three previously reported associations (p < 8.9 × 10–5). Sex-specific and subphenotype analyses revealed additional significant associations with variants genome-wide. Functionally oriented gene-based analyses revealed eight genes significantly associated with AP (p < 5 × 10–5), and PheWAS analysis revealed 33 phecodes associated with AP risk score (p < 3.08 × 10–5). This study identified novel genes/loci contributing to AP and revealed specific contributions to AP risk in males and females. Importantly, we identified additional systemic conditions significantly associated with AP risk. Our findings provide strong evidence for host-mediated effects on AP susceptibility.


Introduction
Genome-wide single variant association testing Single variant association analyses were performed using SAIGE (Zhou et al., 2018), which utilizes a generalized mixed model to account for relatedness and a saddlepoint approximation to allow for imbalance in case and control sample sizes. The imputed data was ltered to retain variants with minor allele frequency > 1% and imputation R 2 > 0.5. Genome-wide association analyses for apical periodontitis were conducted, adjusting for ve principal components, and i) adjusting for sex and ii) in sex-speci c male and female analyses. Additional exploratory analyses of AP subphenotypes considering the presence or absence of pulpal and periapical pain were also conducted, adjusting for sex and ve principal components. A conventional p < 5 x 10 -8 threshold was used to denote genome-wide signi cance; p < 5 x 10 -6 was used to identify variants suggestively associated with the phenotype. The variant with the lowest p value within each locus was selected as the sentinel variant. Each sentinel variant was then annotated using ANNOVAR to determine its position relative to the nearest gene(s) .

Power calculations
We calculated power for our single variant GWAS analysis using GAS Power Calculator (Johnson, 2017) Assuming 330 cases and 550 controls, a simple logistic regression model with additive effect, and a prevalence of 0.5, we are 80% powered at genome-wide signi cance (α=5x10 -8 ) to detect a common variant (MAF = 0.25) with an OR of 1.42, or a less common variant (MAF = 0.1) with an OR of 1.60.

PrediXcan analysis
Genetically regulated expression (GReX) levels were estimated using PrediXcan (Gamazon et al., 2015), which employs transcriptome prediction models derived from reference expression and genotype data. Using Joint-Tissue Imputation tissue-speci c expression models (Zhou et al., 2020) for 49 tissues built in the Genotype-Tissue Expression (GTEx) project (Consortium, 2020) reference data, gene expression was imputed for each gene for all available tissues. The imputed expression values were tested for association with AP using a logistic regression model, adjusting for sex and ve principal components. Then, to assess for causality of the genes identi ed using PrediXcan, we used the Mendelian randomization approach described in (Zhou et al., 2020). We created LD scores for all variants in the single variant GWAS results using GCTA v1.93.2 (LD window size = 1000 kb, LD r 2 threshold = 0.01) (Yang et al., 2011). For expression quantitative trait locus (eQTL) data, we used publicly available Genotype Tissue Expression Project v8 "eQTL Tissue-Speci c All SNP Gene Associations" data (https://www.gtexportal.org/home/datasets), retaining for each gene-tissue pair, variants associated with gene expression with P value < 0.05 and all variants present in the JTI model.

Polygenic risk score
We used the summary statistics resulting from our AP GWAS to develop a polygenic risk score (PRS) model using data from 9,221,680 autosomal variants. The PRS model was developed using the PRScs Python tool (https://github.com/getian107/PRScs), which uses an external linkage disequilibrium (LD) reference (the 1000 Genomes Project phase 3 European superpopulation reference panel) and continuous shrinkage model priors on SNP effect size to calculate SNP weights from the AP summary statistics. Our global shrinkage parameter (phi) was set to "auto".
Polygenic risk score application We applied our AP PRS to genetic data in the Vanderbilt University Medical Center biobank with linked electronic health records, BioVU. All BioVU subjects were genotyped on Illumina's MEGA EX Array. Quality control was performed using PLINK v.1.9. All indels and duplicate variants, variants with call rates < 98% and samples with call rates < 97% were removed. Principal components were generated for ancestry classi cation.
Asian. Additional ancestry-speci c quality control steps were performed, removing all variants with a minor allele frequency < 1% and variants missing in > 5% of samples. Variants that deviated from Hardy-Weinberg equilibrium were also removed (p < 1 x 10 -10 ). Haplotype phasing and imputation to TOPMed freeze 8 were performed using Eagle and Minimac4 on the TOPMed Imputation Server (Das et al., 2016;Fuchsberger et al., 2015).
For application of our AP PRS, we restricted to individuals of European ancestry, which comprise the largest BioVU ancestry group. Individual polygenic scores were calculated using PLINK v.1.9. We tested two AP case de nitions for BioVU subjects, based on International Classi cation of Disease (ICD-10) codes. A broader AP case de nition included all individuals with at least one instance of an ICD code under the category of "Disease of pulp and periapical tissues" (K04, K04.*) as a case. A narrower case de nition included only individuals with at least one instance of ICD codes for pulpitis (K04.0), reversible pulpitis (K04.01), irreversible pulpitis (K04.02), acute apical periodontitis (K04.3), or chronic apical periodontitis (K04.4) as a case. We used a two-sample t-test to compare the overall score distribution between AP cases and controls.

Phenome-wide association analysis (PheWAS)
We used the PheWAS R package (Carroll et al., 2014) to run a logistic regression analysis, testing association of each BioVU health record phecode against the AP polygenic risk scores calculated within BioVU. The analysis was restricted to phecodes present in at least 30 subject record sets. The logistic regression model was adjusted for age, sex, and record length. Record length was de ned as each subject's number of unique clinical visits. We corrected for multiple testing by applying a Bonferroni correction for 1,620 phecodes.

Results
Of the 932 study subjects, 53 were excluded during quality control procedures, and 879 individuals (333 cases and 546 controls) were included in the analyses (Supplementary Table 1). A total of 1,736,793 variants passed post-imputation quality control procedures and were included in the single-variant GWAS.

PrediXcan ndings
In our functionally oriented gene-based analyses using PrediXcan, of the 16,709 total genes tested, 8 genes were signi cantly associated with AP (p < 5 x 10 -5 ). Genetically regulated expression of GATC was associated with AP in ve tissues (adrenal gland, one brain tissue, two cell lines, and kidney cortex), while other associated genes, AC010173.1, ZNF750, POP5, ERICH3, COX6A1, SBSN, and AC078788.1, had suggestively signi cant signals in only one tissue (Appendix Table 2). GATC, POP5, and COX6A1 are all in a small region on chromosome 12, suggesting that linkage disequilibrium among variants in the PrediXcan models for these genes, or co-regulation of the genes may be driving association for some of these genes. Mendelian randomization analyses nd a signi cant causal effect of only GATC predicted expression on trait.

Polygenic risk and AP
We created a polygenic risk score from our AP GWAS summary statistics and applied it to an independent biobank with linked electronic health records, BioVU (https://victr.vumc.org/), where we tested association of AP risk score with two AP phenotype de nitions based on International Classi cation of Disease (ICD-10) codes. Using a broader AP case de nition (i.e., all individuals with at least one instance of an ICD code under the category of "Disease of pulp and periapical tissues" [K04, K04.*]), we identi ed 394 cases and 72,204 controls in BioVU. We found a higher risk score in cases compared to controls, albeit not statistically signi cant (two-sample t test, p = 0.34) (Fig. 2a). The area under the receiver operating characteristics curve (ROC) was 0.52, indicating minimal separation of BioVU AP cases from controls (Fig. 2b).
We also performed a sensitivity analysis using a narrower AP case de nition (i.e., including only individuals with at least one instance of ICD We observed slightly less separation in the PRS between cases and controls, though we note that power in the narrower case de nition is reduced due to the lower number of cases (two-sample t test, p = 0.91, ROC = 0.52) (Supplementary Figure 1).

Phenome-wide association analysis (PheWAS)
Although we did not nd evidence of genome-wide signi cant difference in AP PRS in BioVU, we tested our AP risk score for association with 1,620 other phecodes in BioVU. Using Bonferroni correction, a phenome-wide signi cance threshold of p < 3.08 x 10 -5 was obtained, resulting in 33 phecodes signi cantly associated with AP risk score. Among the associated phecodes, nine are related to a circulatory system pathway, ve to endocrine/metabolic pathways, and four to hematopoietic pathway, among others (Fig. 2c, Supplementary Table 3).

Discussion
Here, we report the results of the rst GWAS of AP and reveal novel genes and loci contributing to AP risk. Our results show 24 loci reaching neargenome-wide signi cant association signals, and eight variants associated with higher risk of AP. Consistent with most GWAS studies, most of the associated variants are located in noncoding regions of the genome and may act as proxies of causal variants which are often located in regulatory regions that modulate gene expression .
In the genome-wide analyses, the strongest associated variant is located close to RAP1 GTPase activating protein (RAP1GAP), a negative regulator of Ras-associated protein1 (RAP1). RAP1 plays a major role in normal and in ammatory process such as macrophage phagocytosis, chemokine-induced adhesion and leukocyte migration, lymphocyte and dendritic cell homing, as well as adhesion to extracellular proteins such as bronectin, brinogen, collagen, and laminin (Retta et al., 2006). By inhibiting RAP1 function, RAP1GAP appears to contribute to severe impairment of macrophages to ingest opsonized particles (Maffei et al., 2013). The second highest association was noted with a variant in PALLD (Palladin, cytoskeletal associated protein), which plays a central role in promoting cell motility. Increased palladin expression and cell migration were observed in wounded tissue, suggesting a conserved function for PALLD in post-injury tissue remodeling events (Goicoechea et al., 2008). Of the remaining signi cantly associated variants, those located within SPP1 and MEPE, and those nearby long non-coding RNAs (lncRNAs) are also noteworthy. SPP1 (secreted phosphoprotein 1) facilitates the attachment of osteoclasts to the mineralized bone matrix and displays high-a nity binding to hydroxyapatite; it also acts as a cytokine that upregulates the expression of IFNG and IL12. MEPE (matrix extracellular phosphoglycoprotein) encodes a secreted calcium-binding phosphoprotein that belongs to the SIBLING family of proteins and plays an important role during early odontoblastic differentiation and late dentin mineralization. Long non-coding RNAs (lncRNAs) are emerging as novel promising biomarkers for diagnosis and prognosis of in ammatory conditions such as cancer and cardiovascular disease and have recently been reported as novel players in oral in ammatory disorders including oral squamous cell carcinoma (Magagula et al., 2017;Zhang et al., 2020). Although the expression and function of lncRNAs in AP remain to be elucidated, our results provide new insight into additional regulatory mechanisms in AP pathogenesis.
When stratifying AP into symptomatic and asymptomatic subphenotypes, signi cant associations were noted with 15 variants suggesting that the individual response to pain in cases of AP may have a genetically linked component. Additional gene-based association analyses revealed novel susceptibility genes in known and new signaling pathways and replicated the association of previously reported genes with AP. Most of the associated genes belong to pathways involved in immune-regulation, bone metabolism and peripheral pain. The top associated gene, IL4, has been suggested as likely playing a protective role in AP-induced bone resorption due to its antiosteoclastogenic action, as shown in IL4null mice, and faster AP progression (Freire et al., 2021). TNF expression in AP tissues and its role in NFkB-mediated bone resorption is also well documented (Alvares et al., 2018), whereas TAC1, which encodes products of the tachykinin peptide hormone family (e.g., substance P, neurokinin A, neuropeptide K and gamma), is also a plausible player in AP pathogenesis, given its expression during tooth development and role as a neurotransmitter and vasodilator of relevance to the dental pulp (Weil et al., 1995).
Evidence from epidemiological and animal model studies support sex-based differences in predisposition to AP disease as well as response to treatment (Valerio and Kirkwood 2018). In this study, sex-speci c single-variant association analyses revealed signi cant and non-overlapping associations in males and females. The associated genes are dispersed throughout the genome and have not yet been reported to be involved in AP, although this merely re ects a limitation of publicly available genome-wide expression data generated from oral/dental tissues. These ndings are intriguing and provide evidence of sex bias in predisposition to AP and further corroborate studies in animal models which clearly Integrating GWAS and expression quantitative trait loci (eQTL) data is a powerful analytical framework to predict potential susceptibility genes for complex diseases (Li et al., 2022). Here, we utilized PrediXcan to estimate genetically regulated expression and test for association with AP, genome wide. Predicted expression of eight genes was signi cantly associated with AP; in particular, GATC, COX6A1, and POP5 expression was noted in brain or pituitary gland; GATC expression was predicted to be causal for trait based on Mendelian randomization. Of note, while tissue types in the GTEx project do not include oral/dental tissues (exception for minor salivary gland), gene expression is highly correlated across tissue types for many genes, meaning that even tissues without a clear link to AP can serve as important source for discovery, as gene expression may be very similar in AP-relevant tissues.
AP is a polygenic disorder and therefore association with a single genetic variant is not enough to assess disease risk (de Souza et al., 2019). Instead, the combined effect of a set of variants is necessary to obtain a measure of genetic risk. Though power is often a limitation of sample size, we created a polygenic risk score based on our GWAS ndings and applied it to an independent biobank dataset where we tested the association of AP risk scores with AP phenotypes based on ICD codes.
A non-signi cant difference was found in cases compared to controls; we note that prevalence of AP in BioVU is substantially lower than prevalence estimates elsewhere (Mehrazarin et al., 2017), indicating that AP is likely underreported in the biobank, which likely impacts our power to test our AP risk score in this dataset. Because of these limitations, we note that our PheWAS results should be interpreted cautiously, however, AP risk scores were signi cantly associated with the risk of 33 phecodes related to circulatory system, endocrine/metabolic, and hematopoietic pathways. These ndings reinforce the hypothesis that an individual's AP risk may be modi ed based on the presence of other systemic health/disease markers in the background.
Indeed, increasing evidence has shown the association of AP with systemic conditions including diabetes and cardiovascular disease (CVD) (Aminoshariae et al., 2017;Jakovljevic et al., 2020;Liljestrand et al., 2016). Most of the studies, however, re ect epidemiological associations that do not directly address the underlying biological mechanisms potentially contributing to a shared genetic etiology. In a recent prior study, we showed that AP was signi cantly associated with CVD and related risk factors, particularly hypertension. Further, we showed that a variant in KCNK3 (potassium two-pore domain channel subfamily k member 3), associated with pulmonary arterial hypertension (Ma et al., 2013), was also signi cantly associated with AP (Messing et al., 2019). Corroborating these ndings, one of the top genes associated with AP identi ed in this study, PALLD, has been previously reported to have variants associated with increased risk of pancreatic cancer type 1 and myocardial infarction (Pogue-Geile et al., 2006). These ndings support the potential for a shared genetic predisposition between AP and systemic conditions and further highlight the critical need for dissecting the genetic etiology of AP (Messing et al., 2019).
Our study has some limitations, including that the data was generated from a single dataset of limited size and no replication data is currently available. Nonetheless, our subject population consists of a well-characterized, multi-ethnic sample, including individuals with European, Hispanic/Latino, admixed African, and Asian ancestry. Although population strati cation is a common source of potential confounding in genetic studies, we utilized a genotyping array that is well-suited for capturing variants relevant to multi-ethnic study populations and employed stringent analytical methods that correct for population strati cation, thereby increasing con dence in our results. Additionally, our data also enabled validation of the association between AP and known candidate genes/pathways (e.g., ILs, MMPs, TNF, NOS, NFKB1, and CCL genes) previously reported in different populations.

Conclusion
Our study identi ed novel genes/loci that may contribute to AP phenotypes and strengthened the evidence of sexual dimorphism in uencing AP risk. Despite low power to develop a PRS for AP risk, additional systemic conditions were found to be associated with AP genetic risk. Identifying independent phenotype-associated genes with high reliability remains challenging, especially for complex diseases. Overall, our ndings provide evidence for host-mediated genetic effects on AP susceptibility.

Declarations
Funding This study was supported by National Institute of Dental and Craniofacial Research grant R03DE027494. The content is solely the responsibility of the authors and does not necessarily represent the o cial views of the National Institutes of Health.

Competing Interests
The authors have no relevant nancial or non-nancial interests to disclose.
Author contributions LEP contributed to data acquisition, analyses, interpretation, and drafted the manuscript. LCS, ARV, contributed to sample collection, data interpretation, and critically revised the manuscript. DMS contributed to data acquisition and drafted the manuscript. JEB contributed to design, data interpretation, drafted and critically revised the manuscript. RS, AL contributed to conception, design, sample collection, data interpretation, drafted and critically revised the manuscript. All authors gave their nal approval and agreed to be accountable for all aspects of the work.

Data Availability
The datasets generated during and/or analyzed during the current study are available in dbGaP repository.

Ethics approval
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the UTHealth-Houston Committee for the Protection of Human Subjects and the University of Pittsburgh Institutional Review Board. Informed consent was obtained from all individual participants included in the study. b Imputation r 2 c p value < 5 x 10 -6 indicates suggestive association. a Based on human genome assembly GRCh37. dist, distance to closest gene in base pairs (in parenthesis).
b Imputation r 2 c p value < 5 x 10 -6 indicates suggestive association. a Based on human genome assembly GRCh37. b p value < 5 x 10 -6 indicates suggestive association (bold) Figure 1 Manhattan plot showing the results of our genome-wide association study of AP. a, AP vs. controls; b, AP subphenotypes vs. controls; c and d;

Figures
sex-speci c analysis in females (C) and males (D). Each individual dot represents one SNP. Each point represents a tested genetic variant, in order by location. P-values shown on -base-10 logarithmic scale.
The dashed line represents the P threshold = 5 x 10 -6