Transcriptome-wide association study identi es novel susceptibility genes contributing to hearing impairment

Chengyong Xie Medical College of Guizhou University Yuguang Niu the First Medical Center of General Hospital of PLA Jie Ping Beijing Institute of Radiation Medicine Qian Zhang Medical College of Guizhou University Zheng Zhang Beijing Institute of Radiation Medicine Yahui Wang Beijing Institute of Radiation Medicine Chenning Yang Beijing Institute of Radiation Medicine Gangqiao Zhou Beijing Institute of Radiation Medicine Yuanfeng Li (  liyf_snp@163.com ) Beijing Institute of Radiation Medicine


Introduction
Hearing impairment (HI) is the most common sensorineural disorder around the world. It is estimated that HI is the fourth leading cause of disability, which affects approximately 1% of the population worldwide 1,2 . HI may lead to decreased work e ciency, social withdrawal and depression, therefore seriously affecting the life quality of humans, and causing huge economic losses to society.
It is commonly agreed that both genetic and environmental factors play important roles in the development of HI 3 . The known environmental risk factors of HI include noise, ototoxic drugs, ear infections and so on 2 . In addition to these environmental risk factors, genetic factors were also strongly suggested to be involved in HI by twin studies 4,5 . Recently, using genome-wide association studies (GWASs), we and others have identi ed a number of single nucleotide polymorphisms (SNPs) that were signi cantly associated with HI, such as rs7598759 (NCL), rs161927 (GRM7), rs2687481 (GRM8), rs681524 (SIK3), rs78043697 (PCDH20), rs7032430 (SLC28A3), rs9574464 (DCLK1), rs10815873 (PTPRD), rs898967 (CMIP), rs4932196 (ISG20) and rs5756795 (TRIOBP) [6][7][8][9][10][11][12][13][14][15] . Despite the signi cant success of GWASs in identifying loci that contribute to the genetic architecture of HI, a number of these identi ed loci re ect little direct biological relevance to this disease 16 . The transcriptomewide association studies (TWASs), which integrated the GWAS data and the expression quantitative trait loci (eQTL) data, has emerged as a valuable supplyment to the GWAS. TWASs examine the associations between gene expressions and traits, rather than the associations between SNPs and traits. Thus, TWASs allowed for more interpretable biologically relevant results 17 . By use of TWAS, a number of genes have been identi ed to be signi cantly associated with several types of common diseases, including attention de cit/hyperactivity disorder (ADHD) and calci c aortic valve stenosis (CAVS) [18][19][20] .
To identify novel genes conferring susceptibility to HI, we here performed a TWAS by combining the individual-level genotype data of two HI cohorts from two previously published HI GWASs 13,21 with the whole blood eQTL data from Genotype-Tissue Expression (GTEx) (v8). We identi ed ARL6IP6 on chromosome 2q23.3 and TMEM127 on chromosome 2q11.2 as the candidate genes of HI. Our ndings expanded our understanding of the genetic susceptibility to HI.

Results
Transcriptome-wide association analyses To detect novel genes conferring susceptibility to HI, we conducted a TWAS using PrediXcan (Fig. 1). In this TWAS, we collected two publicly available GWAS datasets from a Chinese case-control population and an European case-control population, respectively ( Table 1). The Chinese cohort included 298 individuals, consisting of 89 cases and 209 controls. The European cohort included 1,161 individuals, consisting of 578 cases and 583 controls. No outlier was present using the PCA (Supplementary Fig. 1). To ensure the robustness of follow-up analyses, we conducted a uni ed quality controls (Methods), and nally retained 3,830,463 SNPs in the Chinese cohort and 5,767,726 in the European cohort, respectively. We then imputed the gene expressions based on the genotypes of these SNPs using the eQTL data of 670 blood tissues from GTEx (v8) as reference. Thus, we generated the expression levels for a total of 12,624 genes.
Next, we carried out gene-phenotype association analyses for the two cohorts in a linear regression model, with adjustment for the sex, principal component (PC) 1 and PC2 from PCA. In the European cohort, a total of 171 genes reached the transcriptome-wide signi cance threshold of P < 3.96 × 10 -6 (0.05/12,624) (Supplementary Table 1). Among these 171 genes, 90 ones were upregulated, and 81 ones were downregulated in HI cases. In the Chinese cohort, no gene reached the transcriptome-wide signi cance threshold, however, 231 genes showed nominal evidence of association (P < 0.05; Supplementary Table 2).
Among these 231 genes, 121 ones were upregulated, and 110 ones were downregulated in HI cases.
We then conducted meta-analyses to combine the association results of the European cohort and the Chinese cohort. Manhattan plots were created to show the association and direction between genes and HI risk in the meta-analyses (Fig. 2). In the meta-analyses, we identi ed a total of 46 genes signi cantly associated with HI (P < 3.96 × 10 -6 , Supplementary Table 3). Among these 46 genes, four ones (ARL6IP6, TMEM127, TOMM7 and JAM3) showed nominal evidence of association in both cohorts (P < 0.05), and the association directions for these 4 genes were consistent across the two cohorts (Table 2). Subsequent conditional analyses showed that the associations of these four genes were independent each other. Further, these four genes were outside of the loci that were reported to be signi cantly associated with HI (± 500 kb). These results suggested that these four genes may be the novel candidate susceptibility genes contributing to HI.
Four candidate susceptibility genes ARL6IP6 (ADP Ribosylation Factor Like GTPase 6 Interacting Protein 6) gene, which is located on chromosome 2q23.3, reached the transcriptome-wide signi cance (P TWAS = 5.85 × 10 -8 ). The rs7589899 was the lead SNP at the ARL6IP6 locus that was signi cantly associated with HI (odds ratio [OR] for T allele = 0.90, 95% con dence interval [CI] = 0.76-1.07, P GWAS = 7.70 × 10 -14 ). This SNP was located in the third intron of ARL6IP6. The risk allele [C] of this SNP was signi cantly associated with the high expression of ARL6IP6 in whole blood (P eQTL = 0.011; Supplementary Figs. 2a and 2b). Formal Bayesian colocalization revealed a posterior probability (PP) of shared signals of 57.5% (Fig. 3a), providing evidence that the GWAS and eQTL signals may share the same variants at the ARL6IP6 locus. Conditional analyses showed that the association between rs7589899 and HI was not signi cant after adjusting for the expression of ARL6IP6 (P GWAS = 0.88), indicating that ARL6IP6 explains all of the signal at its locus. These results suggested that high expressions of ARL6IP6 may increase the risk of HI.
TMEM127 (Transmembrane Protein 127) gene, which is located on chromosome 2q11.2, reached the transcriptome-wide signi cance (P TWAS = 1.78 × 10 -6 ). The rs607302 was the lead SNP at TMEM127 locus that was most signi cantly associated with HI (OR = 1.12, 95% CI = 0.95-1.32, P GWAS = 1.01 × 10 -4 ). rs607302 is an intergenic variant. The blood eQTL rs607302-TMEM127 indicated that the risk allele [G] for HI is associated with lower mRNA expression levels of TMEM127 in whole blood (Supplementary Figs. 2c and 2d). The formal Bayesian colocalization revealed a PP of shared signals of 75.5% (Fig. 3b), providing evidence that the GWAS and eQTL signals may share the same variants at the TMEM127 locus. Conditional analyses showed that the association between rs607302 and HI was not signi cant after adjusting for the expression of TMEM127 (P GWAS = 1), indicating that TMEM127 explains all of the signal at its locus. These results suggested that higher expressions of TMEM127 decrease the risk of HI. Bayesian colocalization also revealed a PP of shared signals (PP4) of 1.29%, suggesting that the GWAS and eQTL signals may not share the same variants at the TOMM7 locus. These results suggested that the association between TOMM7 expression and HI may be a joint effect of multiple SNPs. JAM3 (Junctional Adhesion Molecule 3) gene, which is located on chromosome 11q25, reached transcriptome-wide signi cance (P TWAS = 3.33 × 10 -6 ). rs10750560 was the lead GWAS SNP at JAM3 locus that was normally associated with both HI (OR = 0.79, 95% CI = 0.66-0.95, P GWAS = 2.42 × 10 -3 ) and the expression of JAM3 (P eQTL = 1.69 × 10 -2 ; Fig. 3d, Supplementary Figs. 2g and 2h). Formal Bayesian colocalization revealed a PP of shared signals (PP4) of 0.76%, providing evidence that the GWAS and eQTL signals may not share the same variants at the JAM3 locus. These results suggested that the association between JAM3 expression and HI may be a joint effect of multiple SNPs.

Pathway enrichment
To understand the biologically relevant pathways involved in HI, pathway analyses were conducted using KEGG, Wiki and Reactome Pathways in the EnrichR web server 22,23 . We used the 46 signi cantly associated genes in the meta-analyses of the two cohorts (P TWAS < 3.96 × 10 -6 ). In this analysis, no gene sets reached the signi cant threshold of P < 2.59 × 10 -5 (0.05/1,927) accounting for multiple testing corrections. However, 12 candidate gene sets were identi ed to be nominally signi cantly associated with HI (P < 0.05; Supplementary Table 4). Intriguingly, four of these 12 candidate gene sets were biological plausibility in the development of HI [24][25][26][27] , including ciliary landscape (P = 5.55 × 10 -4 , ranks the rst), cholesterol biosynthesis (P = 7.09 × 10 -3 , ranks the third), glycerophospholipid metabolism (P = 9.65 × 10 -3 , ranks the fth) and IL-4 signaling pathways (P = 1.11 × 10 -2 , ranks the sixth) (Supplementary Table 4). For example, the cilia on the inner and outer hair cells can convert mechanical de ection signals into electrochemical signals, and participate in the transmission of sound 28 . Cholesterol metabolic pathways were demonstrated to play important roles in recovery of hearing ability following noise-induced hearing loss 25 . IL-4 plays a key role in ototoxicity by activating the phosphorylation of STAT6 and promoting the production of in ammatory cytokines 27 , which were involved in mediating cochlear damage during the acute stage of HI 27 . Glycerophospholipid metabolism was reported to be involved in the the regulation of voltage-gated K+ channel 26 , which plays important roles in the inner ear and hearing loss. Taken together, using TWAS, we identi ed several pathways signi cantly associated with HI. However, the underlying mechanisms of these pathways in the development of HI require further investigation.
To further determine whether the pheWAS traits were genetically correlated with HI, and in which direction, we performed the genetic correlation analyses between HI and the 12 phenotypes from pheWAS 29 ( Supplementary Fig. 3). Interestingly, we detected signi cant positive genetic associations between HI and two other diseases: the narcolepsy (r = 0.164, P = 9.38 × 10 -7 ) and waist-to-hip ratio (r = 0.114, P = 1.28 × 10 -5 ). In addition, we also found normally signi cant positive genetic associations between HI and type 2 diabetes (r = 0.087, P = 7.31 × 10 -4 ), as well as "Never eat eggs, dairy, wheat, sugar" (r = 0.290, P = 8.18 × 10 -3 ) and schizophrenia (r = 0.070, P = 4.60 × 10 -3 ). On the contrary, we detected a normally signi cant negative genetic association between HI and the age at menarche (r = -0.061, P = 1.70 × 10 -2 ). Several of these phenotypes were previously thought to be closely related to HI, thus rea rming the genetic relevance [30][31][32][33] . For examples, higher waist-hip ratio and type 2 diabetes have been previously shown to be risk factors for HI [34][35][36] . In addition, there is a lot of overlapping symptoms in HI and neurological diseases, including schizophrenia 37 . Taken together, these results suggested that there are relationships between HI and several other traits.

Discussion
In the present study, we conducted a TWAS on the risk of HI. We identi ed four novel candidate genes ARL6IP6 (2q23.3), TMEM127 (2q11.2), TOMM7 (7p15.3), JAM3 (11q25) contributing to the genetic susceptibility to HI. We also showed that the ciliary landscape, cholesterol biosynthesis, glycerophospholipid metabolism and IL-4 signaling pathways are associated with HI. To our best knowledge, this is the rst TWAS for the genetic susceptibility to risk of HI.
Recent GWASs have successfully identi ed several loci signi cantly associated with HI. However, the functional signi cance of these associations remain elusive due to the inability to ne-map to tissue-speci c and tissue-relevant genes. In fact, GWAS loci for other complex traits are also di cult to interpret. Thus, the development of methods to prioritize causal genes of GWAS attracts the attentions of geneticists. TWAS is one of the methods to solve this problem, which integrates genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits. In this study, using TWAS, we identi ed four genes associated with HI risk in the whole blood. Thus, our results demonstrated the power of TWAS.
ARL6IP6 is involved in DNA repair and apoptosis 38 , which is mainly expressed in nucleus and plasma membrane. Previous GWAS identi ed that ARL6IP6 is signi cantly associated with ischemic attacks 39 . Moreover, a homozygous truncating mutation in ARL6IP6 was suggested to be the likely cause of cutis marmorata telangiectatica congenita (CMTC), which is associated with ischemic attacks 40 . Interestingly, evidence has shown that ischemic stroke is associated with HI. Previous studies have shown that the blood supply of the auditory system comes from the vertebral basal system, so a ischemic stroke in the vertebral basal circulation can present with acute HI 41 . In fact, the incidence of HI following ischemic stroke is 8.0% 42 .
Thus, the nding that ARL6IP6 is a potential HI susceptibility gene provided genetic evidence for this phenomenon.
TMEM127 encodes a transmembrane protein with 3 predicted transmembrane domains. Previous studies suggested that TMEM127 is mainly expressed in the plasma membrane and endosome 43 , and it dynamically associates with the endomembrane system 44 . In vitro assays indicate that TMEM127 is a negative regulator of mammalian target of rapamycin (mTOR) signaling 44 . Mouse models showed that sustained mTOR activation disrupted the redox balance in neurosensory epithelium, thus causing early-onset death of cochlear hair cells and progressive HI 45 . Treatment with rapamycin reduced the activity of mTOR in cochlear hair cells, and reduced hair cells from injury in vivo 45 . Collectively, these observations suggested that TMEM127 is a candidate gene for the genetic susceptibility to HI.
TOMM7 encodes a component of the protein translocase of the outer mitochondrial membrane (TOM) complex 46 . Previous studies have revealed that TOMM7 was involved in the sorting and accumulation of the preproteins at the outer membrane, which played an important role in protein import 47 . JAM3 encodes a member of the junctional adhesion molecule family, and is reported to be involved in the regulation of cell migration or polarization 48 . However, no evidence has ever shown that these two genes was involved in HI.
Further studies are needed to reveal the underlying mechanisms of these two genes contributing to HI. Several limitations of this study are worth mentioning. First, although this study included two independent cohorts, the sample size remains limited. Replications in additionally independent cohorts are needed.
Secondly, the pleiotropy is statistically indistinguishable from truly causal genes. Thus, the susceptibility genes identi ed by TWAS may be not the truly causal genes. Experimental validations are needed.
In summary, our TWAS reveals novel genes contributing to the susceptibility to HI across the European cohort and the Chinese cohort. These ndings expand our understanding of the genetic susceptibility to HI and might shed light on the prevention and clinical implications of the candidate genes in the treatment of HI.

Study participants
In the present study, we collected two independent datasets for GWAS of HI ( Table 1). The rst GWAS dataset was derived from a Chinese HI cohort, consisting of 89 cases and 209 controls 21 . The second GWAS dataset derived from an European cohort, consisting of 578 cases and 583 controls 13 . The genetic origins of the subjects in these two cohorts were con rmed by principal component analysis (PCA).
According to the World Health Organization, if the minimum threshold of hearing on either side of the ears of a subject is greater than 25 decibel (dB), this subject can be regarded as a case of HI 49 . According to this criteria, the Chinese cohort contains 89 cases and 209 controls. The mean age of the cases (23.8) is similar to that of controls (23.3) (P > 0.05), and all the cases and controls are males ( Table 1). The European cohort contains 578 cases and 583 controls. The mean age of the cases (61.2) is similar to that of controls (61.4) (P > 0.05). The male/female ratios of the cases and controls are 0.98 (286/292) and 1.06 (300/283), respectively (P > 0.05; Table 1).

GWAS data of the Chinese cohort
This GWAS dataset was derived from our previous GWAS for HI in Chinese population, which was publicly available through the following link http://cbportal.org/pub les/JCMM-07-2020-200.zip. Detailed information on these subjects was described previously 21 . Brie y, this GWAS contained a total of 89 cases and 209 controls. All the subjects were recruited from a single factory at Bengbu city, Anhui province, China. The SNPs were genotyped using the Illumina In nium Asian Screening Array-24 (v1.0), and imputed using IMPUTE2 software (v2.3.1). We performed SNP quality controls for further analyses. SNPs were excluded if they: (i) had a call rate < 90%; (ii) had a minor allele frequency (MAF) < 0.05; (iii) were not autosomal SNPs; or (iv) a P value < 1.0 × 10 -4 in Hardy-Weinberg equilibrium (HWE) test. After ltering, a total of 3,830,463 SNPs were retained.

GWAS data of the European cohort
This GWAS dataset was derived from a GWAS for HI in European population, which was publicly available through the following link https://tgen.org/research/research -divisions/neurogenomics/supplementary-data/gwas_polygenic_arhi_fransen_et_al.aspx. Detailed information on these subjects was described previously 13 . Brie y, this GWAS contained a total of 578 cases and 583 controls. All the subjects were recruited from the Antwerp University Hospital at Antwerp city, Belgium 13 . The SNPs were genotyped using the Illumina CNV370 Quad Chip or Illumina HumanOmniExpress BeadChip (Illumina, Inc., San Diego, CA, USA), and imputed using IMPUTE2 software (V2.1.2). We performed the same quality controls as that used in the GWAS dataset of the Chinese cohort. After ltering, a total of 5,767,726 SNPs were included in this analysis.

Principal component analysis (PCA)
To quantify the population structure, principal component analysis (PCA) was implemented in EIGENSTRAT (v3.0). PCA was conducted on the subjects in these two cohorts of this study and additional 2,504 subjects from 1,000 Genomes Project

Transcriptome-wide association study
The TWAS was performed using the PrediXcan software 50 . PrediXcan is a gene-based association test software that prioritizes genes that are likely to be causal for the phenotype. PrediXcan rstly builds transcriptome expression prediction models based on expression quantitative trait loci (eQTL) data. Then, using the prediction models, PrediXcan imputes gene expressions at the individual level based on the genomewide SNPs data. Finally, PrediXcan performs the association tests based on the imputed transcriptome levels 51 . In the present study, we used the prediction model from the blood tissues (n = 670) from GTEx (v8) database in the PredictDB data repository. We performed TWAS on the Chinese cohort and the European cohort separately. Meta-analyses were then performed to combine the association results of these two cohorts. In this study, a total of 12,624 genes were analyzed, leading to the transcriptome-wide signi cance threshold of P = 3.96 × 10 -6 (0.05/12,624).

Bayesian colocalization
To test whether the leading variant of the GWAS and the eQTL signal was the same one, we used the COLOC (version 3.2-1) pack in R to perform colocalization analyses 52 . As the majority of gene expression datasets were generated on the basis of European-ancestry samples, colocalization analyses were based on the GWAS summary statistics from the European cohort and the whole blood eQTL data from European population from the GTEx (v8). COLOC tested for ve hypotheses: H0, no eQTL and no GWAS association; H1, association with eQTL, but no GWAS; H2, association with GWAS, but no eQTL; H3, eQTL and GWAS association, but independent signals; and H4, shared eQTL and GWAS associations. The main interest is to assess whether the GWAS and eQTL signals are consistent with shared causal variants (i.e., H4). The result of this procedure is ve posterior probabilities (PP0, PP1, PP2, PP3 and PP4). In practice, a high posterior probability (PP4 > 50%) indicates that the GWAS and eQTL signals colocalize 53 . We used an R package, LocusCompareR, for visualization of colocalization events in local environments, which generates a combined plot with two locuszoom plots (eQTL and GWAS in the same gene region) and a locus-compare scatter plot (eQTL -log 10 (P) to GWAS -log 10 (P)). The gure could indicate whether the GWAS top locus is also the leading SNP in the eQTL result 54 .

Pathway enrichment analyses
To explore the potential biologically relevant pathways involved in HI, we performed pathway enrichment analyses using the EnrichR web server (https://maayanlab.cloud/ Enrichr/) 22,23 . In this analyses, we used 46 signi cantly associated genes in the meta-analyses of the two cohorts (P TWAS < 3.96 × 10 -6 ). A total of 186 gene sets from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg/), 242 gene sets from the WikiPathways (https://www.wikipathways.org/index.php/WikiPathways) and 1,499 gene sets from the Reactome database (http://www.reactome.org) were included in our analyses. We performed multiple testing corrections using the Benjamini-Hochberg (BH) method, and the false discovery rate (FDR) values of less than 0.05 were considered to be statistically signi cant. However, no gene set reached a signi cant threshold accounting for multiple testing correction (all FDR > 0.05).

Phenome-wide association studies
To check the associations between the four signi cantly associated genes in this study (ARL6IP6, TMEM127, TOMM7 and JAM3) and a wide range of phenotypes in the UK Biobank, we performed a phenome-wide association study (pheWAS) using the publicly available data provided by GWAS Atlas (https://atlas.ctglab.nl) 29 . The database contains 4,756 GWAS from 473 unique studies across 3,302 unique traits. We used the colocalization procedure described above to test for shared causal associations between HI and other phenotypes. The top three phenotypes of each gene were considered to share causal associations with HI 55 . phenotypes were identi ed by pheWAS.

Statistical analyses
The gene-phenotype association analyses were performed in a linear regression model with adjustment for the sex, principal component (PC) 1 and PC2 from PCA. Meta-analyses of the association results generated from the Chinese cohort and European cohort were conducted to assess the pooled genetic effects using the R statistical package (version 4.0.3). Cochran's Q statistic was calculated to test the heterogeneity between groups. The P-values from the meta-analyses were corrected for multiple testing based on FDR correction 57   Meta-analyses were performed to combine the association results in the two cohorts. However, only 4 of the 46 genes (ARL6IP6, TMEM127, TOMM7 and JAM3) showed nominal significance at P < 0.05 in both cohorts, and the association directions for these 4 genes were consistent across the two cohorts. Figure 1 Schematic work ow of this study. We performed a transcriptome-wide association study (TWAS) for the hearing impairment (HI) based on the publicly available genome-wide association study (GWAS) datasets and expression quantitative trait loci (eQTL) dataset. The GWAS datasets were from a Chinese cohort (n = 298) and an European cohort (n = 1,161). The eQTL dataset was from 670 blood tissues from Genotype-Tissue Expression (GTEx) (v8). Follow-up analyses, including the colocalization analyses, phenome-wide association analyse (pheWAS), genetic correlation analyses and pathway analyses, were performed to extensively characterize the identi ed associations. SNP, single nucleotide polymorphism.  The locus-compare scatter plot for the association signals at ARL6IP6, TMEM127, TOMM7 and JAM3 in the European cohort. Colocalization analyses results are shown for ARL6IP6 (a), TMEM127 (b), TOMM7 (c), and JAM3 (d) loci. The locus-compare scatter plot compares the expression quantitative trait loci (eQTL) results and the genome-wide association study (GWAS) results, which indicates whether the GWAS top locus is also the leading SNP in the eQTL result. The eQTL results were based on 670 blood tissues from Genotype-Tissue