Exploring the impact of complex diseases on non-diseased human tissue transcriptomes


 Background

The development of complex diseases is contributed by the combination of multiple factors and complicated interactions between them. Inflammation has recently been associated with many complex diseases and may cause long-term damage to the human body. In this study, we examined whether two types of complex disease systematically altered the transcriptomes of non-diseased human tissues and whether inflammation was linked to identifiable molecular signatures, using post-mortem samples from the Genotype-Tissue Expression (GTEx) project.
Results

Following a series of differential expression (DE) analyses, dozens to hundreds of DE genes were identified in multiple tissues between subjects with and without a history of cerebrovascular disease (CVD) or major depression (MD). DE genes from these disease-associated tissues—the visceral adipose, tibial artery, caudate, and spinal cord for CVD; and the hypothalamus, putamen, and spinal cord for MD—were further analyzed for functional enrichment. Many pathways associated with immunological events were positively enriched in the DEGs of the CVD-associated tissues, as were the neurological and metabolic pathways in the MD-associated tissues. Eight gene–tissue pairs were found to overlap with those prioritized by our transcriptome-wide association studies (TWAS), indicating a potential genetic effect on gene expression for circulating cytokine phenotypes.
Conclusions

Complex diseases like CVD and MD may cause observable changes in the gene expression of non-diseased tissues, suggesting that a long-term impact of diseases, lifestyles and environmental factors may together contribute to the appearance of transcriptomic “scars” on the human body. Furthermore, inflammation is probably one of the systemic and long-lasting effects of cerebrovascular events.


Introduction
Complex diseases are caused by genetic, environmental and lifestyle factors and their interactions, most of which have not yet been identi ed. Recent studies have revealed that the immune system and in ammatory responses are involved in a wide range of complex diseases, such as cardiovascular disease [1], stroke [2], cancer [3], and psychiatric disorders [4]. In ammation is generally de ned as the immune system's response that defends against injury or stress [5]. In a normal in ammatory response, the upregulation of in ammatory activity is strictly regulated. However, with psychological, environmental and biological factors [6][7][8][9], the regulated process can become uncontrolled in the resolution phase, causing a systemic chronic in ammation that contributes to damage in all tissues and organs and increases the risk of diseases that remain global leading causes of disability and mortality [10,11].
The Genotype-Tissue Expression (GTEx) project [12] has established a database of expression data, whole genome sequences, and whole-exome sequences of 54 non-diseased tissue sites across nearly 1,000 individuals (as of the current v8 release). GTEx has also collected subject phenotype data, including demographic information, general medical histories, histories at the time of death, the circumstances of death, and so on. The medical histories were provided by the hospital systems, which recorded the prior care of deceased donors.
Here we evaluate whether past chronic in ammatory diseases can leave biological alterations ('scars') in non-diseased tissues of the human body by comparing the expression pro les of subjects with and without a history of chronic in ammatory disease. We have focused on two types of complex diseasecerebrovascular disease (CVD) and major depression (MD)-since they are typical chronic in ammatory diseases with heritable components [13][14][15][16], and there are enough cases of these in the GTEx project. CVD comprises clinical conditions that impair blood ow to the brain, such as strokes, transient ischemic attacks, intracranial aneurysms, and other vessel diseases [17]. MD is one of the most common psychiatric illnesses, ranging from 3 to 16.9 percent worldwide [18,19], and has a signi cant impact on society. It is characterized by a persistent feeling of sadness or a loss of interest or pleasure in outside stimuli. Previous large-scale genome-wide association studies (GWAS) and meta-analyses have identi ed a large number of genetic loci associated with stroke [20,21] and depression [16,22,23] in multi-ancestry groups. However, genetic variability contributing to the susceptibility mechanism underlying CVD and depression and their interactions with in ammation remains not fully identi ed or characterized.
In this study, we aimed to answer the following two questions: 1) is there any signi cant transcriptomic difference in non-diseased tissues between subjects with and without CVD or MD? 2) if yes, is there any evidence to indicate that in ammation may play a role in shaping the transcriptomic landscape? We performed a differential expression (DE) analysis on each GTEx tissue by comparing the expression pro les between subjects with and without a history of CVD or MD. Signi cant differentially expressed genes (DEGs) identi ed in multiple tissues from the series of DE analyses were included in the downstream functional enrichment analysis. We also performed transcriptome-wide association studies (TWAS) on in ammation biomarkers to nd any overlaps with the DEGs.

Cohorts and risk factors
Multi-tissue RNA-seq data were compiled from the GTEx project, as described in the Materials and Methods section. Subjects with an explicitly reported medical history of cerebrovascular disease or major depression were considered in this study. A total of 16,412 samples across 46 tissues, obtained from 928 subjects, were included in the CVD analysis (Fig. S1), and 16,221 samples across 45 tissues from 926 subjects in the MD analysis (Fig. S2).
Risk factors of complex diseases include clinical variables such as age [24], sex [25], and BMI [26]. The average age of the cohort with a history of CVD was signi cantly higher than that of the non-CVD cohort (CVD 52.09 ± 13.09, non-CVD 57.74 ± 10.79 yrs, P = 3.87 × 10 − 6 , t-test), while BMI, sex and race showed no signi cant differences between the two groups (Fig. S3). The average age of the MD cohort was younger than that of the non-MD cohort (MD 49.69 ± 13.33, non-MD 53.07 ± 12.91 yrs, P = 1.82 × 10 − 2 , ttest). Moreover, females had a higher incidence of developing depression (P = 0.01, Chi-squared test) (Fig.  S4). This is consistent with the higher prevalence of major depressive disorder in women than in men [27].

Differentially expressed genes identi ed in multiple tissues
To investigate whether past CVD or MD left "scars" on any tissues or organs, we implemented the voomlimma [28,29] pipeline to identify genes differentially expressed between the cohorts with and without a history of CVD or MD, using the linear model described in the Materials and Methods section. Seventeen out of 46 and 16 out of 45 tissues displayed signi cant differential expression (false discovery rate < 0.05 and absolute fold change > 1.5) in the CVD and MD analyses, respectively ( Fig. 1a, b). The top four tissues with the highest number of signi cant DEGs were included in the functional enrichment analysis (Adipose -Visceral, Artery -Tibial, Brain -Caudate, and Brain -Spinal cord for CVD), as well as the top three MD tissues (Brain -Hypothalamus, Brain -Putamen, and Brain -Spinal cord).
There was no common DEG shared by the four CVD or the three MD tissues (Fig. 2a, b). A large number of signi cant DEGs identi ed by our CVD model (Table S1) were associated with in ammation. For instance, the most signi cant upregulated gene in the spinal cord, CHI3L1, is related to a variety of in ammatory disorders [30][31][32] and coronary artery disease [33]. Gene FCGR3A-upregulated in three of the CVD tissues (Brain -Spinal cord adj.P.val = 0.02; Brain -Caudate adj.P.val = 0.03; Artery -Tibial adj.P.val = 0.02)-encodes a receptor that binds the Fc portion of IgG, and it affects the pharmacokinetics in patients with Crohn's Disease [34]. LPAR5, which was overexpressed in both the brain caudate and the spinal cord of the subjects with a history of CVD, has been reported to be activated during nerve injury [35], and it transmits pro-in ammatory signals [36]. DLG2, which was downregulated in hypothalamus tissues with MD (Table S2), has been reported to be associated with interferon production [37]. These results suggest that the systematic effects left by CVD and MD, on which in ammation may play a role, can still be identi ed in several post-mortem human tissues on the transcriptomic level.
In ammatory events enriched in differentially expressed genes A functional overview can be gained through gene set enrichment analysis. CVD DEGs from the Adipose -Visceral, Artery -Tibial, Brain -Caudate, and Brain -Spinal cord; and MD DEGs from the Brain -Hypothalamus, Brain -Putamen, and Brain -Spinal cord were further analyzed using the Gene Set Enrichment Analysis method [38]. A broad spectrum of Gene Ontology (GO) terms, with the top signi cantly enriched GO terms in the CVD spinal cord, are presented as an example in Fig. 3.
Strikingly, upregulated genes in all four CVD tissues were signi cantly enriched in immunological events, including antigen binding, T cell proliferation, the interferon-gamma-mediated signaling pathway, and so on (Fig. 3, Table 1), although the visceral adipose only had one signi cant upregulated gene. This is consistent with a large body of evidence that shows that in ammation plays a crucial role in cerebrovascular diseases. In ammation can rupture the intracranial aneurysm wall [39], lead to secondary injury after an ischemic stroke [40], and impact the progression of symptomatic intracranial atherosclerosis [41]. In ammation has also been linked to blood-brain barrier dysfunction [42] and tissue injury [43] in cerebrovascular diseases.
For the MD tissues, in ammatory events were only mainly enriched in downregulated genes in the spinal cord (Table 1) and a few in upregulated genes in the hypothalamus (Additional File 3). Likewise, depression has been associated with increased in ammatory activation in both the periphery and the central nervous system. Many antidepressant agents reduce in ammatory activation in immune cells and lower circulating in ammatory cytokine levels, supporting this association [44]. Furthermore, it is worth mentioning that mitochondrial events and cellular respiration were signi cantly upregulated in the putamen. Since mitochondrial energy metabolism in the putamen has been reported to be highly correlated with emotional and intellectual impairment in Schizophrenics [45], it might also have some hidden links with depression as well. Another interesting result is that hypothalamus upregulated genes were mapped to terms related to cilia (Additional File 3). There is still no obvious evidence connecting cilia with depression so far, but it is an underexplored area worth investigating [46].
We further explored our differential expression results using Disease Ontology (DO) [47] and Human Phenotype Ontology (HPO) [48] annotations. Top enriched diseases and human phenotypes are similar to the biological phenotypes found by our GO analysis (Additional File 3). Immune responses and cerebrovascular lesions were signi cantly enriched in the upregulated genes of the CVD tissues. For example, six DO terms-human immunode ciency virus infectious diseases, temporal arteritis, alopecia areata, autoimmune thrombocytopenic purpura, intracranial aneurysms, and primary immunode ciency diseases-were shared by all four CVD tissues' overexpressed genes (Fig. S5), and they are all diseases associated with in ammation and cerebrovascular accidents.
Our enrichment results reinforce the strong evidence linking in ammation to CVD, as well as other interesting biological phenomena that probably have associations with CVD and MD in analyzed tissues.

Transcriptome-Wide Association Analysis
We then enquired whether the transcriptomic 'scars' were associated with the subject's genotype. Transcriptome-wide association analysis (TWAS) is a powerful approach to prioritize target genes by combining genetic variants identi ed in GWAS with transcriptome data, and can help shed light on possible associations between genetic loci and human complex diseases. Here, we carried out TWAS analysis with S-PrediXcan [49] on four public GWAS summary statistics datasets, which evaluated human circulating levels of C-reactive protein (CRP) [50], monocyte chemotactic protein-1 (MCP1), interleukin-6 (IL-6), and interferon gamma [51] (Table 3). CRP is known as a systemic biomarker of in ammation and has been shown to be a CVD risk biomarker [52] and to increase in patients with MD [53]. Cytokines MCP1, IL-6 and interferon gamma have been reported to have a great probability of contributing to both CVD [54][55][56] and MD [57][58][59]. Our aim was to nd whether any overlaps between gene-tissue pairs identi ed from TWAS and from our DE analyses amongst the tissues with expression variance (including Adipose -Visceral, Artery -Tibial, Brain -Caudate, Brain -Hypothalamus, Brain -Putamen, and Brain -Spinal cord), plus whole blood, since circulating cytokine levels were measured in serum or plasma samples. There were 1,575 and 25 signi cant gene-tissue pairs passing the corrected P-value threshold (P-value/number of genes) found in the CRP and MCP1 data respectively, regardless of tissue types (Table S3). There were eight overlapping gene-tissue pairs (Table 2), amongst which PPP1R18, RP11-238F2.1, FRK had the same direction of variations but others had the opposite direction. All seven protein-coding genes were more or less related to immune responses. Speci cally, it was demonstrated that complement gene C2 was expressed in human post-mortem brain-derived cerebrovascular smooth muscle cells and may amplify the pro-in ammatory effects in brain vessels [60].
The major histocompatibility complex class I chain-related gene A (MICA) is a highly polymorphic gene that encodes protein variants functioning in immune activation and surveillance; our results therefore indicate that there may be a link between MICA and depression.

Discussion
Most CVD and MD transcriptome analyses [61][62][63][64][65][66][67] are restricted to mouse models or a limited sample size of human expression data. In this study, we systematically analyzed expression data for over 16,000 healthy human samples across multiple tissues from GTEx, investigating global transcriptomic alterations on the human body in cases with a history of CVD or MD. We rst built a linear mixed model and applied it to the expression data. Dozens to hundreds of differentially expressed genes were identi ed in the visceral adipose, tibial artery, caudate, and spinal cord for CVD, and in the hypothalamus, putamen, and spinal cord for MD. Furthermore, functional enrichment analysis showed that a large number of annotations pertaining to in ammatory responses were positively enriched in CVD DEGs from all four tissues, and that MD DEGs were mostly associated with neurological and metabolic events. Our results suggest that the long-term sequelae of cerebrovascular accidents and depressive symptoms can still be re ected in post-mortem samples, and that in ammation may be maintained for a long period of time after recovery from CVD.
A growing body of evidence indicates that in ammation not only contributes to the initiation and development of CVD [68,69], it also persists globally in the brain for the long-term after CVD [70]. Neuroin ammation followed by cerebrovascular accidents may promote recovery and further injury, playing both bene cial and detrimental roles [71]. A large-scale GWAS discovered one genetic variant (rs1842681) in the gene LOC105372028 associated with post-stroke outcomes [72]. Furthermore, proteomic studies of post-stroke depression (PSD) reveal that immune dysfunction in stroke survivors is associated with PSD [73,74]. The connection between in ammation and depression is undeniable [53,75]. However, unlike the CVD results, only a small portion of the MD DEGs were enriched in immune responses. This is understandable, since MD is highly heterogeneous and not all individuals exposed to in ammatory challenges develop depression. Still, in ammatory responses that occur before and after cerebrovascular accidents or depression are very complicated, and the underlying mechanism is yet to be elucidated.
Our study also provides evidence of the general and long-term effects left by cerebrovascular events and depression from the transcriptomic aspect. Non-brain tissues with signi cant CVD DEGs were related to vascular diseases and may pose risks to CVD. To be more speci c, adipose tissue and its secreted in ammatory proteins contributed to obesity-associated vasculopathy and cardiovascular risk [76], and they may contribute to CVD as well. Peripheral arterial disease occurring in the tibial artery shared similar risk factors with CVD [77]. Moreover, DEGs such as CHI3L1 and LPAR5 may reveal possible mechanisms for post-CVD outcomes, but further experiments are necessary for validation. Interestingly, the hypothalamus had the highest number of MD DEGs, which is compatible with one of the most enduring and replicated ndings in psychiatry -the activation of the hypothalamic-pituitary-adrenal axis in a subset of MD patients. The identi ed DEGs may play a role in the neuroendocrine function of the hypothalamus. The putamen, which also had many MD DEGs, indicates that ageing is accelerated in patients with major depressive disorders [78]. The pathways enriched in putamen positive DEGs were mainly about mitochondrial functions and the electric transport chain, which replicates previous results and provides new insights into the long-term effects of depression.
Only a few DEGs identi ed by our linear model overlapped with genes prioritized by TWAS for selected cytokine phenotypes. This was expected and is probably due to the small fraction of genetic risk factors shared by complex diseases and these circulating cytokine levels. Additionally, only about 11 percent of heritability was explained by bulk tissue expression quantitative trait loci, according to this study [79]. Therefore, long-term transcriptomic alterations across tissues and organs are probably caused by external factors such as lifestyle and the social and physical environment. Nevertheless, we used bulk RNA-seq data for our analyses, and further utilizing techniques with higher resolution-like single-cell sequencing [80] or decomposition-could reveal more precise signals on speci c cell types.

Conclusions
This study reveals molecular signatures of chronic effects and damage on multiple tissues potentially contributed by complex diseases and associated factors. These signatures may be linked to in ammation and other disease-related pathways. Together, these results indicate that suffering from a complex disease can cause a tissue-wide impact on the transcriptomes, and they also suggest that treatment to attenuate in ammation may improve the body's health in patients recovering from CVD. Our study not only provides insights into these disease mechanisms but also offers a possible route to studying the long-lasting changes caused by chronic diseases on multiple tissues or organs.

GTEx data
Multi-tissue RNA-seq data were collected from the GTEx project [12] v8 release (dbGaP: phs000424.v8.p2). The genes and samples were ltered and quantile-normalized in a tissue-aware manner, as described in the YARN pipeline paper [81].
Subjects with an explicitly reported medical history of cerebrovascular disease (MHCVD, phv00169142.v8.p2) or major depression (MHDPRSSN, phv00169145.v8.p2) were considered in this study. We removed subjects with missing values in their Hardy scale (DTHHRDY), ischemic time (SMTSISCH), or batch ID (SMNABTCH). All cell lines and tissues with less than 12 samples with a history of CVD or less than 10 samples with a history of MD were excluded from our analyses.

Differential Expression Analysis
Differential expression analysis between the samples with and without a history of CVD/MD was conducted using the voom-limma pipeline [28,29]. Brie y, RNA-seq read counts were transformed to log counts per million (log-cpm) with associated precision weights to stabilize the variance in the data using the voom function, followed by linear model tting and the empirical Bayes procedure. According to this paper [82], the multivariate linear regression model that adjusted for known confounders outperforms other methods correcting for hidden confounders, which may remove some of the desired biological signals. Hence, we adopted the linear regression model but replaced the experimental batch (SMGEBTCH) with another batch information (SMNABTCH). This model ts for gender (GENDER), the interval between the time of the donor's death and the sample collection (SMTSISCH), age (AGE), the type of nucleic acid isolation batch (SMNABTCH), and the type of death (DTHHRDY) for the gene expression data (Y): Y ~ β 1 GENDER + β 2 SMTSISCH + β 3 AGE + β 4 SMNABTCH + β 5 DTHHRDY + β 6 MHCVD + ε Y ~ β 1 GENDER + β 2 SMTSISCH + β 3 AGE + β 4 SMNABTCH + β 5 DTHHRDY + β 6 MHDPRSSN + ε The GENDER term was removed from sex-speci c tissues, and the SMNABTCH term was removed from tissues in only one batch. P-values from the regression model were adjusted for multiple testing using the Benjamini-Hochberg method [83].

Functional Enrichment Analysis
Pre-ranked Gene Set Enrichment Analysis (GSEA) [38] was conducted with gene lists ranked by the tstatistics from the results of our DE analyses, with default program parameters and a default background set on GSEA v4.0.  [50] and study GCST004421 [51] downloaded on 19/10/2020. These GWAS datasets examined biomarkers of in ammatory responses, and they were obtained from Caucasian subjects (Table. 3). Gene expression variation was inferred using S-PrediXcan [49] with GTEx v8 elastic-net prediction models (http://predictdb.org/) for the four tissues with expression variation between CVD and non-CVD cohorts: Adipose -Visceral, Artery -Tibial, Brain -Caudate, and Brain -Spinal cord; and the three tissues with expression variation between MD and non-MD cohorts: Brain -Hypothalamus, Brain -Putamen, and Brain -Spinal cord. We ran S-PrediXcan on these tissues one by one in each phenotype. Tissue-gene pairs with P-value < 0.05/(number of tested genes) were considered as signi cant.

Consent for publication
Not applicable.

Availability of data and materials
This project is under the approval of access request #84958 for the dataset General Research Use in Genotype-Tissue Expression (dbGaP: phs000424). The GTEx data were downloaded from dbGaP.

Competing interests
The authors declare that they have no competing interests.