We leveraged WGS and RNA-seq data from 2,622 FHS participants to create a powerful scientific resource of eQTLs. We identified significant cis-eQTL variant-eGene pairs (2,855,111 unique variants and 15,982 unique cis-eGenes) and 526,056 unique trans-eQTL variant-eGene pairs (526,056 unique variants and 7,233 unique trans-eGenes). A large proportion of previously reported cis-eQTL variant-eGene pairs were replicated with directionally concordant effects in our study, including 88% of cis-variant-eGene pairs from GTEx.
Consistent with our previous study and others,7–12,22,23 we found that 90% of eQTL variants identified in the present study are located within 1 Mb of the corresponding cis-eGene and 83% are within 100 kb of the TSS of the corresponding eGene. While the majority of lead eQTL variants (85% of cis and 96% of trans) explained only a small proportion (R² < 0.2) of interindividual variation in expression of the corresponding eGenes, 15% of lead cis-eQTL variants and 4% of lead trans-eQTL variants explained 20% or more of interindividual variation in expression of the corresponding eGenes.24 Additionally, eQTL variants were enriched (p < 0.0001) in disease-associated SNPs identified by GWAS. We further demonstrated the utility of our eQTL resource for conducting causal inference testing. Our MR analyses revealed putatively causal relations of gene expression to several disease phenotypes, including SBP, CAD, and COVID-19 severity. Taken together, the comprehensive eQTL resource we provide can advance understanding of the genetic architecture of gene expression underlying a wide variety of diseases. The interactive and browsable eQTL resource will be posted to the National Heart, Lung, and Blood Institute’s BioData Catalyst site and will be freely accessible to the scientific community.
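To make the "proportion of interindividual variation explained" concrete, the following minimal sketch computes R² from a simple linear regression of expression on genotype dosage for a single variant-eGene pair. It is an illustration only: the function name `variance_explained` and the toy data are hypothetical, and the sketch assumes an unadjusted single-variant model, whereas a real eQTL analysis would typically adjust for covariates before estimating R².

```python
def variance_explained(dosages, expression):
    """Return R^2 from a simple linear regression of expression on dosage.

    For a single predictor, R^2 equals the squared Pearson correlation.
    Assumption: no covariate adjustment (illustrative only).
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        # A monomorphic variant or constant expression explains no variance.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: allele dosage (0, 1, 2) and expression per person.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
toy_r2 = variance_explained(toy_dosages, toy_expression)

# Apply the R^2 >= 0.2 threshold used in the text to the toy pair.
explains_20_percent_or_more = toy_r2 >= 0.2
```

In this toy example the variant would fall in the "20% or more" group; most lead variants in the study fall below that threshold.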
Our study expands current knowledge by creating an accessible and browsable resource of eQTLs based on WGS and RNA-seq technologies. It also includes eQTLs for lncRNAs that were not reported in prior eQTL studies that used array-based expression profiling. Over the past decade, accumulating evidence has shown that lncRNAs are widely expressed and have key roles in gene regulation.25,26 It is estimated that the human genome contains 16,000 to 100,000 lncRNAs.25 We identified 447,598 cis-eQTL variants for 1,518 cis-lncRNAs and 121,241 trans-eQTL variants for 475 trans-lncRNAs (Supplemental Tables 3 and 4). In addition, we identified six lncRNAs that showed putatively causal associations with SBP. However, the functions of these six lncRNAs remain to be determined. Thus, our novel eQTL database may also help in the study of non-protein-coding RNAs in relation to health and disease.
As a proof of concept of the application of the eQTL resource, we performed MR analyses on a small number of cardiovascular traits and COVID-19 severity and demonstrated that the eQTL database can identify promising candidate genes with evidence of putatively causal relations to disease that may merit functional studies. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread across the globe and caused millions of deaths since it emerged in 2019. Recent GWAS of COVID-19 susceptibility and severity 27–29 have identified SNPs in several loci on chromosomes 3, 9, and 21.30 Using our eQTL resource in conjunction with COVID-19 GWAS, we conducted MR analyses that identified seven genes, including OAS1 and IFNAR2, as putatively causal for COVID-19 severity. The OAS1/2/3 cluster has been identified as a risk locus for COVID-19 severity.27 This region harbors a protective haplotype of approximately 75 kilobases (kb) at 12q24.13 among individuals of European ancestry.19 A recent study identified an alternative splicing variant, rs10774671, at exon 7 of OAS1 for which the protective allele “G” leads to a more active OAS1 enzyme.20 Our MR results suggest that both the OAS1 gene expression level and its splice variation are causal for COVID-19 severity.
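The logic of using eQTLs as instruments in MR can be sketched with a small example. The text here does not state which MR estimator was used, so the code below assumes a common choice: a per-variant Wald ratio (GWAS effect divided by eQTL effect) combined across independent variants with a fixed-effect inverse-variance weighted (IVW) average. All function names and numbers are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the eQTL effect.

```python
import math


def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Per-variant causal estimate: GWAS effect divided by eQTL effect.

    The standard error is a first-order approximation that treats the
    eQTL effect as known without error (an assumption of this sketch).
    """
    if beta_eqtl == 0:
        raise ValueError("the instrument must have a non-zero eQTL effect")
    return beta_gwas / beta_eqtl, se_gwas / abs(beta_eqtl)


def ivw_estimate(instruments):
    """Fixed-effect IVW combination of Wald ratios.

    `instruments` is an iterable of (beta_eqtl, beta_gwas, se_gwas) tuples
    for variants assumed to be independent (not in linkage disequilibrium).
    Returns (estimate, standard_error).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for beta_eqtl, beta_gwas, se_gwas in instruments:
        ratio, se = wald_ratio(beta_eqtl, beta_gwas, se_gwas)
        weight = 1.0 / (se * se)
        weighted_sum += weight * ratio
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("at least one instrument is required")
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# Hypothetical summary statistics for three eQTL variants of one gene:
# (effect on expression, effect on the trait, SE of the trait effect).
toy_instruments = [
    (0.5, 0.10, 0.02),
    (0.4, 0.08, 0.02),
    (0.8, 0.16, 0.04),
]
mr_beta, mr_se = ivw_estimate(toy_instruments)
mr_z = mr_beta / mr_se  # z-score for the putative causal effect
```

Each toy variant implies the same ratio of trait effect to expression effect, so the combined estimate equals that ratio; with real data the per-variant ratios differ, and heterogeneity among them can signal violations of the MR assumptions.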
The IFNAR2 gene encodes a protein in the type II cytokine receptor family. Mutations in IFNAR2 are associated with immunodeficiency and measles virus susceptibility, and IFNAR2 plays an essential and narrow role in human antiviral immunity.31 A recent study further showed that loss-of-function mutations in IFNAR2 are associated with severe COVID-19.32 These studies, considered alongside our MR results, provide evidence of a causal role of IFNAR2 expression in severe COVID-19 infection.
This study has several noteworthy limitations. The study sample consisted of White participants of European ancestry who were middle-aged and older; therefore, the eQTLs identified may not be generalizable to other races or age ranges. The current RNA-seq platform included ~7,700 lncRNAs, which is a modest subset of all lncRNAs in the human genome.25 We used MR analyses to infer causal relations of genes to disease traits. MR analysis is predicated on a set of critical assumptions that may not be testable in the setting of eQTL analysis.33,34 Replication of our eQTL findings is warranted in studies with larger sample sizes and more diverse populations.
Our study also has several strengths. The advent of high-throughput RNA sequencing technology provides an unparalleled opportunity to accelerate understanding of the genetic architecture of gene expression. Our study extends the existing literature by identifying novel eQTLs based on WGS and RNA-seq. We demonstrate the potential applications of a vast eQTL resource by analyzing the concordance of eQTL variants with SNPs from GWAS of several disease phenotypes, followed by causal inference analyses that identified promising disease-related genes that may merit functional studies. We created an open and freely accessible eQTL repository that can serve as a promising scientific resource to better understand the genetic architecture of gene expression and its relations to a wide variety of diseases.