We conducted gene-based rare variant aggregation tests to identify novel candidate genes that could explain the missing heritability for NSOFCs. In total, we identified 13 genes with suggestive associations primarily driven by rare missense variations. Seven genes (ALKBH8, CENPF, CSAD, EXPH5, PDZD8, SLC16A9, and TTC28) showed consistent expression in relevant mouse and human craniofacial tissues during the formation of the face and one gene (ABCB1) without a mouse ortholog showed expression in human craniofacial tissues. Further, three genes (ABCB1, TTC28, and PDZD8) were predicted to be intolerant to missense variations.
Using biological plausibility to prioritize loci/genes with suggestive association in GWAS has been previously shown as a valid approach to bypassing the large sample size requirement needed to achieve significant association 22. While gene mutation constraint is a good metric for identifying pathogenic genes, it is more commonly seen with a dominant disease-causing gene and may not be informative in other disease models (e.g., recessive). Thus, we prioritized the 8 associated genes (ABCB1, ALKBH8, CENPF, CSAD, EXPH5, PDZD8, SLC16A9, and TTC28) with consistent expression in relevant craniofacial tissues during human or mouse face development. Majority of these genes (TTC28, CSAD, EXPH5, SLC16A9, ALKBH8, and CENPF) showed biased expression towards mesenchymal cells subtypes from human craniofacial tissues. This could point to the importance of mesenchymal cells in palate formation since these genes were associated with either NSCLP or NSCPO and not NSCLO. Moreover, this could also be due to the stage (CS17) of embryonic development at which the facial prominences were harvested. The CS17 stage (~ 7 weeks post fertilization), a period that coincides with the later stages of lip formation and the beginning of palate formation.
Five genes (ABCB1, TTC28, PDZD8, CENPF, and ALKBH8) have been previously implicated in NSOFCs or diseases presenting with cleft phenotypes. The ABCB1 and TTC28 were associated with NSCL/P while the PDZD8, CENPF, ALKBH8 genes are associated with NSCPO. These genes except the ABCB1 without a mouse ortholog showed consistently high expression and enrichment in mouse craniofacial tissues especially the palate. The ABCB1 gene is an ATB binding cassette gene that functions to regulate fetal exposure to xenobiotics through the placenta 23. Single nucleotide variations in the ABCB1 gene have been reported to increase the risk of NSCL/P 23. The TTC28 gene is in the 22q12.2 region and previous case report on patients with microdeletion of this region implicate this gene as potential candidate for pierre robin sequence- a condition that presents with cleft palate 24. Furthermore, copy number variations overlapping this gene have been reported in cleft palate patients 25. The PDZD8 gene assists with lipid transfer from the endoplasmic reticulum to the endosomes and lysosomes 26,27. Burden of variations though not statistically significant have been previously reported in this gene among patients with cleft lip with or without palate 28. CENPF is a kinetochore associated protein that colocalizes with the IFT88 (a ciliopathy gene) and compound heterozygous mutations in the CEPNF gene were reported in a human fetus with ciliopathic malformations, including cleft palate 29. Mutations in ALKBH8 cause intellectual developmental disorder, autosomal recessive 71, MRT71 (OMIM #618504) in humans. This condition presents with craniofacial dysmorphic features, which include long lips with V-shaped upper lip, macrostomia, and retruded mandible 30. Macrostomia and retruded mandible may cause cleft palate by impeding the elevation of palatal shelves during palate formation 31.
We identified three potential novel genes (CSAD, EXPH5, SLC16A9) for NSOFCs. The burden of variants in CSAD and EXPH5 were associated with NSCLP while those in the SLC16A9 gene were associated with NSCPO. Although these genes lack previous reports of direct association with NSOFCs phenotypes, they have been implicated in processes crucial for craniofacial development. The EXPH5 gene, for instance, has been shown to play a role in cell-cell adhesion 32; a critical process in craniofacial morphogenesis. The SLC16A9 gene is linked to lipid metabolic traits 33; a functionally relevant downstream target of the non-canonical transforming growth factor beta (TGFbeta) signaling - a signaling mechanism involved in face formation. The CSAD gene functions in the biosynthesis of taurine and the role of taurine in organogenesis has been demonstrated in mice 34.
In the current study, we replicated the ABCB1 gene association which was previously reported for NSCL/P in a common variants’ association study. While some reports have demonstrated the accumulation of rare variants in genes previously identified through common variants association analyses 35, suggesting a convergence of common and rare variants in loci/genes associated with NSOFCs phenotypes, other reports suggest that common and rare variants may act through separate loci/genes28. For instance, targeted sequencing of regions surrounding genome-wide significant loci for NSOFCs showed no evidence of rare variants burden in genes/regulatory regions proximal to these loci 28. Additionally, rare variants are population-dependent, which could explain why we did not replicate previously reported genes/loci from rare variant association studies in other populations. Furthermore, our tests of association require that we exclude any gene with only one rare variant, even if the same gene harbored common variants. This might have resulted in the omission of genes with contributions from both rare and common variants if participants in our cohort only harbor one rare variant in the gene. Therefore, future studies should consider a model that allows for the incorporation of both common and rare variants.
Our study has some limitations. First, we used array-based genotype data for discovery. This approach means some rare variants were not examined. Second, we controlled for population stratification by adding the top 10 genotype PCs as covariates in our gene-based association regression models. Adjusting for top PCs has been shown to prevent p-value inflation and reduce false positive rates in common variant analysis, but its performance for rare variant analysis remains controversial 36. The reason is that rare variants, being newer mutations, reflect a more granular population substructure compared to common variants 37. Finally, we restricted our analysis to only the coding (missense and Lof) and splice-altering variants. Rare variants including insertions/deletions in non-coding regulatory regions also contribute to the etiology of NSOFCs; however, defining these regions' analytical units and their functional characterization remains a challenge. Hence, future studies should use WGS data for the gene-based analysis to capture more rare variants and leverage annotated craniofacial enhancers regions to analyze rare variants in non-coding regions.
In summary, we identified 13 genes with suggestive associations in our GWAS data. Among the 13 genes, 3 genes (ABCB1, TTC28 and PDZD8) were predicted to be intolerant to missense variations. Human and mouse transcriptomics data further supported the association of 8 genes. Of the 8 genes, five genes (ABCB1, TTC28, PDZD8, CENPF, ALKBH8) were previously associated with NSOFCs or diseases presenting with cleft phenotypes. The remaining three genes (CSAD, EXPH5, SLC16A9) are potentially novel candidate genes for NSOFCs.