Overview of this study
The Fig. 1 shows the rationale underlying this study. During gene expression, a gene is transcribed into a pre-mRNA, after which the introns are removed while the exons are connected into the mature mRNA. The mature mRNA is under post-transcriptional regulation by miRNAs and other mechanisms. As shown, genetic variants can not only regulate mRNA splicing but also regulate gene expression-related traits by affecting transcription rate or mRNA stability (stability QTL, denoted as stQTL hereafter). From RNA-seq data, we are able to determine the reads mapped to exonic regions to obtain gene expression levels. The mRNA stability can also be calculated by combining the reads aligned to exonic and intronic regions using the REMBRANDTS [32] algorithm. Through the eQTL analysis, genetic variants associated with gene expression are identified to obtain eQTLs. In fact, eQTLs are a mixture of QTLs that affect transcription and stQTLs, as gene expression is controlled by both transcription rate and mRNA stability. Performing an association analysis of gene expression or stability on genetic variation can identify eQTLs and stQTLs, respectively. Simultaneous identification of eQTLs and stQTLs can provide a higher resolution to understand how genetic variants affect gene expression, as well as the information to infer that a genetic variant regulates gene expression by affecting transcription activity or RNA stability. As a proof-of-concept, in this study we applied this framework to GTEx data to simultaneously investigate the eQTLs and stQTLs in lung tissue.
Expression QTLs and Stability QTLs of human lung tissue
To identify and explore stQTLs and eQTLs, we processed the raw RNA-seq data for lung tissues generated by the GTEx project. After performing quality trimming, alignment, and replicate merging from the same donors, we obtained the expression profiles of genes for a total of 289 subjects with matched genetic variation data. With REMBRANDTS, for each subject, we calculated the relative mRNA stability for 13,476 genes with intronic regions and constitutive exons. For QTL identification, we performed the association analysis on 15,122,700 genetic variants located within 100Kb upstream of TSS and 100 Kb downstream of TTS for 13,476 genes using gene expression or mRNA stability as the traits. We identified a total of 142,801 stQTLs and 186,132 eQTLs at the significance level of FDR < 5%. The numbers of QTLs were summarized in Table 1 according to the location of genetic variants on each QTL’s corresponding genes.
Ideally, we would expect that all stQTLs are also eQTLs since a genetic variant that regulates RNA stability should also affect gene expression. However, in practice, identification of different QTL types is complicated by multiple factors, including differential statistical power and LD between genetic variants. Nevertheless, we still observed that there is a very high proportion (70,105) of overlap between stQTLs and eQTLs (Fig. 2A). We also found that 49% of stQTLs were also eQTLs (Fig. 2B), suggesting that nearly half of stQTLs do also significantly affect gene expression. On the contrary, only 37% of eQTLs were also stQTLs. This indicated that although a considerable part of eQTLs were derived from genetic variants that significantly affect stability, more of them were regulated by genetic variants that affect other factors related to gene expression.
Table 1
The summary of the stQTLs and eQTLs identification in GTEx lung tissue samples.
Location
|
Number of stQTL (%)
|
Number of eQTL (%)
|
Upstream
|
50,109 (35.09%)
|
73,647 (39.57%)
|
5'UTR
|
3,877 (2.71%)
|
5,802 (3.12%)
|
CDS
|
3,949 (2.77%)
|
4,404 (2.37%)
|
Intron
|
32,675 (22.88%)
|
37,232 (20.00%)
|
3'UTR
|
2,506 (1.75%)
|
2,411 (1.30%)
|
Downstream
|
49,685 (34.79%)
|
62,636 (33.65%)
|
Total
|
142,801 (100.00%)
|
186,132 (100.00%)
|
The location indicates the genomic position of the genetic variant in its corresponding gene in a QTL. The percentage was calculated from the number of QTLs at each location divided by the total number of QTLs.
By investigating stQTL and eQTL together, it is possible to determine the regulatory mechanisms underlying an eQTL. For example, genetic variant rs3167757 is significantly associated with the HMGN1 expression level (eQTL, FDR = 5e-18) with CC > CT/TT (Fig. 2C). As shown, this genetic variant is also associated with HMGN1’s mRNA stability (stQTL, FDR = 3.7e-30). This result indicated that rs3167757 might regulate the expression level of HMGN1 by affecting its mRNA stability. Indeed, HMGN1-rs3167757 has also been reported as an eQTL in lymphoblastoid cell lines (LCLs) [34, 35]. The rs3167757 is located at the 3'UTR of the HMGN1 gene and overlaps with binding sites of 20 different RBPs [36]. According to the analysis using RBPmap [37] (S1 Table), while the variant C of rs3167757 confers a motif for eight RBPs (CUG-BP, HNRNPF, MBNL1, SFPQ, TRA2B, HNRNPL, SRSF3, and YBX2), the variant T disrupts the binding motifs of five of the RBPs (HNRNPF, MBNL1, SFPQ, TRA2B, and YBX2). Notably, among them, HNRNPF [38, 39], MBNL1 [40, 41], and YBX2 [42] are known to contribute to mRNA stabilization. This is consistent with the observation that genotype CC is associated with higher stability of HMGN1 mRNA than CT and TT. As another example, genetic variant rs34873612 is significantly associated with DDX11 expression level (eQTL, FDR = 2e-60) but not with DDX11 mRNA stability (FDR > 0.1) with GG > GA/AA (Fig. 2D). This result suggested that rs34873612 might regulate the expression level of DDX11 by affecting the transcription rate rather than its mRNA stability. According to the PROMO prediction [43], the rs34873612 is located at the 5'UTR of the DDX11 gene and overlaps with the binding site of three TFs: GR-alpha, GATA2, and GATA3. While the variant G contributes to the binding motifs of these TFs, the variant A disrupts the binding motif of GATA3, which potentially contributes to the decreased DDX11 expression seen in the GA and AA genotypes (Fig. 2D). mRNA stability only contributes partially to gene expression level; consistently, many genetic variants are found to be stQTLs but not eQTLs. For example, rs1062976 is significantly associated with the mRNA stability of SCYL3 (stQTL, FDR = 5e-07) but not its expression level (not an eQTL, FDR > 0.1) with CC > CT/TT (Fig. 2E). Overall, our results indicated that simultaneous identification of stQTLs and eQTLs can provide us with more detailed biological insights on the regulatory effects of genetic variants on a large scale.
Distributions of eQTLs and stQTLs across genic regions
stQTLs are associated with mRNA stability while eQTLs are associated with gene expression by affecting either mRNA stability or gene transcription. Therefore, we expect that their distributions in genes would be different. To examine this, we looked at the distribution of eQTLs and stQTLs in the DNA regions surrounding TSS and TTS of genes. We found that eQTLs are more likely to be located upstream of TSS of their corresponding genes while stQTLs tend to be located downstream of TSSs (Fig. 3A). On the other hand, stQTLs are more likely to be located in the region from TTS to its 10Kb upstream than eQTLs. Both stQTLs and eQTLs are more likely to be located in the upstream region of TTS rather than the genes’ downstream regions (Fig. 3B).
Subsequently, we divided genomic regions associated with genes into upstream, 5'UTR, CDS, Intron, 3'UTR, and downstream regions and then examined the distributions of eQTLs and stQTLs in these regions. Using the distributions of all genetic variants as the background, we calculated the enrichment ratio of stQTLs and eQTLs by using a hypergeometric test [44]. As shown in Fig. 3C, stQTLs are enriched by 2.89-fold in the CDS (P < 2e-308) and by 2.25-fold in 3'UTR (P = 2e-152). This result is consistent with the fact that genetic variants located in these regions might have functional impacts on mRNA stability by affecting RNA secondary/tertiary structure or RBP/microRNA binding. stQTLs are also slightly enriched in intron regions (ER = 1.19 and P = 2e-150). In contrast, eQTLs are enriched in the CDS (ER = 2.22, P = 4e-274), upstream (ER = 1.10, P = 3e-65), 5’UTR (ER = 1.37, P = 5e-76), and 3’UTR (ER = 1.30, P = 6e-20, Fig. 3D), respectively. The enriched eQTLs in these regions may be due to the fact that gene expression can be determined not only by transcriptional activity (genetic variants in upstream, 5’UTR, or CDS regions) but also by RNA stability (genetic variants in CDS or 3’UTR regions). We compared the enrichment ratios of stQTLs and eQTLs and found that stQTLs are more likely to be located in the CDS, intron, and 3'UTR regions, while eQTLs are enriched in the upstream and 5'UTR regions (Fig. 3E).
It should be noted that the resolution of QTL analysis is affected by linkage disequilibrium (LD) between neighboring genetic variants. Based on the genotype data for lung samples used in this study, we performed LD analysis and observed that many eQTL/stQTL loci were in high LD (r2 > 0.9) with each other (S1 Fig). To best exclude the influence of LD, we determined all LD blocks (r2 > 0.9) and within each block selected the most significant genetic variant as the representative stQTL/eQTL. Following that, we re-evaluated the distribution of stQTLs and eQTLs. After LD filtering, we found that the stQTLs are more enriched in CDS (ER = 4.93 and P < 2e-308), 3’UTR (ER = 2.64 and P = 1e-164), and slightly enriched at 5'UTR (ER = 1.14 and P = 9e-07). Of note, we no longer observed the enrichment of stQTLs in intron regions (Fig. 3F). In contrast, the distribution of eQTLs in each gene region is similar to that before LD filtering (Fig. 3G). When the distributions of stQTLs and eQTLs were directly compared, eQTLs were more likely than stQTLs to locate in the 5'UTR and upstream of genes (Fig. 3H), while stQTLs are more likely to locate in the CDS and 3'UTR. No obvious difference was observed between stQTLs and eQTLs in the intron and downstream regions.
stQTLs are significantly enriched in RBP binding sites
Having shown the enrichment of stQTLs in the 3’UTR and CDS regions, we then examined whether stQTLs tend to locate in the binding sites of RBPs or miRNAs, many of which are known to be involved in post-transcriptional regulation of mRNAs. To this end, we investigated the binding sites of RBPs and miRNAs provided by Postar2 [36] and TargetScan [45], respectively, to annotate the stQTLs identified in our analysis. Our results indicated that stQTLs (P = 3e-18, Fisher’s exact test) but not eQTLs (P > 0.1, Fisher’s exact test) are enriched in RBP binding sites. In fact, we found that 26.81% (2,770/10,332) of stQTLs overlap with the binding sites of at least one RBP, which is significantly higher (P = 7e-17, Fisher’s exact test) than 22.10% (2,788/12,617) for eQTLs (Fig. 4A). In addition, we have also examined the overlap with miRNA binding sites and observed a higher proportion of stQTLs (0.19%, 20/10,332) than eQTLs (0.15%, 19/12,617) in the miRNA binding sites, although no statistical significance was detected due to very small genomic regions covered by miRNA binding sites (Fig. 4B).
Then we performed Fisher’s exact test to identify RBPs whose binding sites were enriched for stQTLs (S2 Table) or eQTLs (S3 Table). We identified a total of six significant RBPs (P < 1e-04) including SND1, YTHDC1, DDX3X, ATXN2, RPS3, and UPF1 (as shown in Table 2). Interestingly, SND1 [46–48], DDX3X [49, 50], ATXN2 [51, 52], and RPS3 [53] were known to stabilize their bound mRNAs, while UPF1 is the key factor of nonsense-mediated mRNA decay pathway [54–56]. Moreover, YTHDC1 is a well-known m6A (N6-Methyladenosine) reader [57], which has been found to regulate mRNA splicing [58, 59], alternative polyadenylation [59], and stability [60, 61] through recognizing m6A. Similarly, we identified four RBPs whose binding sites were significantly enriched for eQTLs (P < 1e-04, Fig. 4D and Table 3), among which the two most significant RBPs, DDX3X and SND1, were also enriched for stQTLs. NCBP3 can regulate gene expression by forming a cap binding complex that binds to the 5'cap of pre-mRNA to promote splicing, 3’-end processing, and mRNA exporting [62–64], and AGGF1 was found to repress the expression of pro-inflammatory molecules [65]. These results indicate that stQTLs or eQTLs located in the binding sites of RBPs in lung tissue are indeed likely to have significant regulation on gene stability or expression by affecting the binding of the RBPs.
Table 2
The RBPs whose binding sites were enriched for stQTLs.
RBPs
|
stQTL
|
non-stQTL
|
ER
|
p-value
|
FDR
|
SND1
|
64
|
1,504
|
2.17
|
4E-08
|
5E-06
|
YTHDC1
|
100
|
2,850
|
1.80
|
1E-07
|
7E-06
|
DDX3X
|
226
|
8,348
|
1.39
|
3E-06
|
1E-04
|
ATXN2
|
436
|
17,911
|
1.25
|
7E-06
|
3E-04
|
RPS3
|
91
|
2,923
|
1.59
|
3E-05
|
0.001
|
UPF1
|
198
|
7,533
|
1.35
|
5E-05
|
0.002
|
Six RBPs significantly overlap (Log2 Enrichment-ratio > 0.3 and p-value < 1e-04, Fisher’s exact test) with stQTLs in mature mRNAs in lung. ER: Enrichment ratio.
Table 3
The RBPs whose binding sites were enriched for eQTLs.
RBPs
|
eQTL
|
non-eQTL
|
ER
|
p-value
|
FDR
|
DDX3X
|
250
|
8,331
|
1.69
|
5E-14
|
1E-11
|
SND1
|
58
|
1,509
|
2.15
|
2E-07
|
2E-05
|
NCBP2
|
60
|
1,819
|
1.84
|
1E-05
|
6E-04
|
AGGF1
|
30
|
689
|
2.43
|
2E-05
|
6E-04
|
Four RBPs significantly overlap (Log2 Enrichment ratio > 0.3 and p-value < 1e-04, Fisher’s exact test) with eQTLs in mature mRNAs in lung. ER: Enrichment ratio.
Gender-specific stQTLs
We then examined whether some genetic variants were associated with mRNA stability in a gender-specific manner and denoted them as gender-specific stQTLs. We divided 289 samples into 187 males and 102 females, and then performed association analysis with covariates to implement the gender-specific stQTL classification. If a gene is specifically expressed in males or females, then an stQTL/eQTL association can only be performed in the corresponding gender. Therefore, we focused our analysis on 13,116 genes that are not differentially expressed (FDR > 0.05, t-test) between both genders and then investigated a total of 14,987,511 genetic variants located from 100Kb upstream to 100Kb downstream of a gene. Out of these gene/genetic variants, we identified 71,694 stQTLs in males and 22,841 stQTLs in females (FDR < 0.05), as well as 117,065 eQTLs in males and 48,516 eQTLs in females (FDR < 0.05), respectively. Then we defined male-specific QTLs as those that are significant in males (FDR < 0.05) but not significant in females (P > 0.1), and similarly for female-specific QTLs. In total, we identified 18,893 male-specific and 2,879 female-specific stQTLs, and 32,716 male-specific and 7,484 female-specific eQTLs. After excluding intersection with gender-specific eQTLs, we finally identified 14,683 male-specific and 2,279 female-specific stQTLs. As an example, the association between genetic variant rs397781453 and the RNA stability of SREBP2 is female-specific (Fig. 5A). As shown, we detected a significant association in females with FDR = 4e-04 but not in males (FDR ≥ 0.1). On the other hand, the association between AQP4 and genetic variant rs12954879 is male-specific (Fig. 5B). The RNA stability of AQP4 is significantly associated (FDR = 2e-05) with genetic variant rs12954879 in males but not in females (FDR > 0.1). Of note, both SREBP2 or AQP4 have similar expression levels between males and females (the right panel of Fig. 5A and 5B).