Gene expression prediction model
To uncover novel genes significantly associated with susceptibility to RA, after the quality control procedures, we constructed a gene expression prediction model for the target tissue in RA using genotyping and transcriptome sequencing data obtained from 202 matched synovial and blood samples sourced from patients undergoing knee replacement surgery[18] (Figure 1). In order to explore the potential utility of TWAS based on imputed gene expression, we first calculated the heritability of each gene in the synovium. We identified 3,246 genes with significant heritability (P < 0.01), with a mean heritability of 0.277. Subsequent analyses were limited to genes that displayed significant heritability[21].
For each gene exhibiting significant heritability, a gene expression prediction model was built using the single best eQTL, lasso and elastic net. In the synovium, the mean cross-validated R2 values for genes calculated by the lasso models was 0.147, which is a 28.95% enhancement in predicted R2 compared to the predicted values calculated by the single best eQTL models (mean cross-validated R2 = 0.114). The mean cross-validated R2 values for genes calculated by the elastic net models was 0.162, which is a 42.11% enhancement in the single best eQTL models, and 10.20% enhancement in the lasso models. The best prediction model for each gene was selected based on the highest cross-validation R2. The results of the three models are presented in Figure 2, and the elastic net models yielded the best results in comparison.
TWAS results of RA
We performed a TWAS analysis of RA using the predictive expression model developed in the previous step. Since a total of 2,492 features were generated, we set a significance threshold for TWAS.P of 2.01 × 10-5 (0.05 divided by 2,492). Using this threshold, we identified a total of 11 genes that showed potential associations with RA, including 4 genes that were situated beyond 500kb upstream and downstream of outside of previously reported significant loci by RA GWAS, and were labeled as novel genes (Figure 3, Additional file 1: Table S1). RNASET2 was identified as the most significant gene (PTWAS = 1.19×10-19).
Joint/conditional analysis results
In cases where multiple significant candidate features were identified at the same genetic locus, we performed a joint/conditional analysis to determine whether these signals were due to multiple associated features and to evaluate the remaining GWAS signal after removing the gene expression. We identified eight genes that were conditionally independent (Figure 4, Table 1, and Additional file 1: Table S2). Among them, RNASET2 was the most significant gene in the synovial panel (Joint.P = 1.20×10-19), 3 genes were novel.
Several features on chromosome one was no longer significant after modulation of predicted MMEL1 expression in the synovial panel, suggesting that the significant signal at this locus was driven by MMEL1. It can be seen in the bottom panel that the GWAS signal for this gene was not significant by itself (>5×10-8), but was significant in the TWAS results, demonstrating the ability of the TWAS method to identify new significant genes not found by GWAS.
Causal analysis
To identify the causal genes in the association regions, we used FOCUS to derive the PIP for each conditionally independent gene, thereby quantifying its likelihood of being a true disease-causing gene. In addition, to determine whether the gene-RA relationships were due to a common genetic variation (direct causal or multifactorial effect) rather than LD, we used the SMR and HEIDI tests with correction for multiple hypothesis testing. As a result, we identified three genes with PIP > 0.9 for the eight conditionally independent genes identified in the joint/conditional analysis (Table 1 and Additional file 1: Table S3), indicating that these genes may have causal associations with RA. Notably, the genes RNASET2, CD40 and INPP5B had a PIP value of 1.00, indicating a high probability of causality. In addition, we performed SMR analyses and HEIDI tests using EUR and EAS datasets from the 1000 Genomes project, respectively. A total of three causal genes were identified, namely CD40, MMEL1 and INPP5B (using the criteria of PHEIDI>0.05 and PBH FDR<0.05). (Table 1 and Additional file 1: Table S4). These two methods identified a total of four causal genes. Of the genes reported by both FOCUS and SMR analyses, INPP5B regulates the function of inflammatory and immune genes in RA patients by affecting calcium signaling mobilization, and CD40 can contribute to RA development[31,32].
Integrative analysis of TWAS and mRNA expression profiling of RA
To enhance the reliability of the genes identified by TWAS, we performed differential expression analysis. As a result, four out of the eight conditionally independent genes retained significance in the differential expression analysis, with HIP1 and TPRA1 being identified as novel genes (Table 2). HIP1 can regulate arthritis severity and synovial fibroblast invasiveness by altering platelet-derived growth factor receptor (PDGFR) and Ras-related C3 botulinum toxin substrate 1 (Rac1) signaling[33]. The potential of these genes for functional studies is obvious. Furthermore, the statistical significance of CD40 in causal analyses highlights its potential in the field of RA.
TWAS-GSEA
A screening criterion was used, requiring a corrected P-value of less than 0.05. After multiple testing correction, nine candidate gene sets were found to be significantly enriched. The top three were high-density lipoprotein particle assembly, positive regulation of B cell proliferation, abnormal circulating immunoglobulin M (IgM) level. Among the nine gene sets, five contained at least 10 of our genes. The immune response pathway contained the most genes in our results, with 147 genes identified (Additional file 1: Table S5).
These pathways are linked to various diseases and physiological processes. Two gene sets, HP.INCREASED.CIRCULATING.IGM.LEVEL and HP.ABNORMAL.CIRCULATING.IGM.LEVEL, were associated with changes in the levels of IgM levels, which may be linked to a variety of autoimmune diseases, infections, and certain blood malignancies. The gene set GOMF.UBIQUITIN.LIKE.PROTEIN.LIGASE.BINDING is associated with processes related to ubiquitin ligase, which are linked to a wide range of diseases, including neurodegenerative disorders and cancer, as well as physiological processes such as the cell cycle and apoptosis. Other gene sets are associated with the processes of B cell proliferation, activation, and the immune response in various immune diseases such as RA, systemic lupus erythematosus, infections, and tumor immunotherapy.
Drug targets
The TTD and Drugbank (Additional file 1: Table S6) reveal that two genes, CD40 and INPP5B, have existing drugs targeting them. Specifically, the drug D-Myo-Inositol-1,4-Bisphosphate with the Drugbank accession number DB03158 targets Type II inositol 1,4,5-trisphosphate 5-phosphatase. The specific function of Type II inositol 1,4,5-trisphosphate 5-phosphatase is to hydrolyze phosphatidylinositol 4,5-bisphosphate (PtIns(4,5)P2) and the signaling molecule phosphatidylinositol 1,4,5-trisphosphate (PtIns(1,4,5)P3), thereby modulating cellular signaling events[34]. Boehringer Ingelheim Pharmaceuticals has developed a drug called BI 655064 that targets the CD40 gene, which has been approved for use in clinical trials. The drug is currently undergoing Phase II trials for lupus and Phase I trials for RA and thrombocytopenia[35]. The discovery of these existing drugs represents a promising and cost-effective approach to identifying potential therapeutics for RA.