3.1. The blood transcriptome splits into three types
SOM analysis provides one portrait for each of the WPB transcriptomes of the 3,388 LIFE-adult participants (Supplementary File 2 and Figure S 1A). For an overview, we performed unsupervised similarity analysis based on pairwise comparisons of these expression portraits using Pearson’s correlation coefficients, visualized as a pairwise similarity heatmap (Figure 2A). The samples split into three major types: type 1 and type 2 show pronounced anti-correlated expression portraits, while type M forms an intermediate group. The network presentation reveals that WPB transcriptomes of type 1 and type 2 split into separate clusters, while type M samples overlap with both (Figure 2B). The functional context of activated genes was estimated using gene set analysis (Figure 2A, lower part). Type 1 associates with functional categories related to oxygen transport, heme metabolism, neutrophil accumulation and repressed chromatin states of T-cells, while type 2 relates to immune response, transcriptional activity, T-cell accumulation and active chromatin states (see below). Males and females were represented in all types. A higher percentage of men than of women was noted in type 1 (29% versus 19%), and the reverse relation was found for type 2 (37% of men versus 51% of women; Figure 2C). Type 1 is more frequent among elderly persons than type 2; however, for the latter, the age dependence differs between women and men (Figure 2D). The composition of types for women changes virtually monotonically with a steadily increasing percentage of type 1, in contrast to men, whose type composition shows a maximum in the age range of 50–55 years. Note also that the age dependence of type M resembles that of type 1 more than that of type 2, which suggests functional correspondence between types M and 1 (see below).
The type composition of men and women is virtually independent of BMI (body mass index), except for very obese persons (BMI > 35 kg/m²), who seem to accumulate more type 1 transcriptomes (Figure 2D).
Taken together, we identify two major blood transcriptome types and an intermediate type partly resembling type 1. Type 1 accumulates more men and elderly participants and upregulates genes associated with inflammation and increased heme metabolism, while type 2 accumulates women and younger participants and associates with activated immune response and transcriptional activity. The composition of types changes in a gender- and age-specific fashion.
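The pairwise similarity analysis described above can be illustrated with a minimal sketch. It assumes that each expression portrait is available as a flat vector of metagene values; the function names and the toy vectors are hypothetical and do not come from the study data or from the software actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(portraits):
    """Symmetric matrix of pairwise Pearson correlations between portraits."""
    n = len(portraits)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = pearson(portraits[i], portraits[j])
    return sim

# Toy portraits with four metagenes each (illustrative values only)
toy_portraits = [
    [2.0, 1.0, -1.0, -2.0],   # "type 1"-like pattern
    [-2.0, -1.0, 1.0, 2.0],   # "type 2"-like pattern, anti-correlated to the first
    [1.0, 0.0, 0.5, -1.0],    # intermediate pattern
]
sim = similarity_matrix(toy_portraits)
for row in sim:
    print(["%.2f" % v for v in row])
```

In such a matrix, anti-correlated portraits yield coefficients close to -1, while an intermediate portrait correlates only partly with either extreme; ordering the rows and columns by cluster membership gives a heatmap of the kind referred to as Figure 2A.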
Figure 2
3.2. A modular map of gene activation
Clusters of genes with correlated expression profiles appear as red spot-like areas in the transcriptomic portraits, which indicate their overexpression in the respective samples (Figure S 1A). Overall, we identified 13 such major overexpression spots and labelled them with capital letters A – M (Figure 3A, for spot lists of genes see Table S 3 and Supplementary File 3). The map roughly divides into two major areas containing spots predominantly upregulated either in type 1 (and partly also type M) or in type 2 samples, respectively, and a third area with mixed spot assignment, as illustrated by the mean portraits of the transcriptomic types (Figure 3B), the spot profiles (Figure 3C and Figure S 3) and their correlation network (Figure 3D). Gene maps indicate the positions of genes taken from selected functional gene sets within the SOM grid of metagenes (Figure 3A). For example, genes upregulated in erythrocytes and platelets accumulate in spots C and N (up in type 1), respectively, while genes associated with mitochondrial function and RNA processing are found in spots E and G. Signature genes of T cells and of ribosomal function accumulate in and near spots I and J (up in type 2). Spot H accumulates the signature of CD4 cytotoxic T lymphocytes (CTLs), including the marker genes GZMA and PRF1, which were recently found to associate with extreme longevity [31]. Genes with a function in interferon (IFN) response accumulate in spot L without preferential upregulation in any of the three types.
Typically, each of the individual sample portraits shows more than one spot, which reflects the parallel activation of different transcriptional programs and/or their mutual couplings. We subsume frequently observed combinations of expressed spots as so-called combinatorial pattern types (cPATs). Overall, we identified 33 cPATs, which were then used to sub-stratify each of the major transcriptomic types into three subtypes (STs, annotated by 1.1, 1.2, 1.3, M.1, M.2, M.3 and 2.1, 2.2, 2.3, respectively) differing in their mean expression portraits (Figure 3D) and spot expression (Figure 3B and Figure S 2). Some of the spot profiles show marked expression differences between the STs (e.g. spots A, B, D, F), while others change continuously (e.g. spots H–J). Most of the spots are upregulated either in type 1 or in type 2 samples. Interestingly, spot F, which is enriched in genes encoding ribosomal subunit S26 proteins, shows a specific expression pattern with strong upregulation in some of the STs without preference for either type 1 or type 2. Spot co-occurrence analysis indicates that adjacent spots are often observed together, but spots from different areas can also co-occur, especially in samples of type M, which supports their intermediate position between type 1 and type 2. Some of the STs are dominated by samples expressing only one spot, while others, especially of type M, show a broader distribution owing to more heterogeneous expression patterns (Figure S 2C). The sample similarity net indicates that most samples of the different STs accumulate into well-localized clouds reflecting their mutual similarity (Figure 3E and Figure S 2F). The ST composition is virtually age-independent except for ST 1.1, which collects an increasing percentage of men and women at an age above 65 years (Figure S 4).
In summary, the diversity of transcriptional states can be described by the combinatorics of about one dozen modules of co-expressed genes of different functional context which decompose each of the transcriptional types into three subtypes.
Figure 3
3.3. Footprints of functions: cellular programs, infections, telomeres and epigenetics
Next, we performed functional analysis of the transcriptome strata using gene sets taken from the functional categories ‘biological process’ [33] (Figure 4A), ‘hallmarks of cancer’, which offer disease characteristics in a more general context [34] (Figure S 6), ‘telomere maintenance’ [35] and ‘epigenetic states’ (Figure 4A-E). Profiling of these signatures splits them into two major clusters upregulated either in type 1 (marked with green color in the figures) or in type 2 (apricot color), respectively. Type 2 associates, for example, with activation of cell cycling, MYC-target genes and oxidative phosphorylation (oxphos), while inflammation, hypoxia, coagulation, reactive oxygen species and signaling via the TNFalpha, TGFbeta, PI3K-Akt-MTOR and IL6-JAK-Stat3 pathways are activated in type 1. A third cluster (blue color) accumulates signatures related to interferon (IFN) response, which possibly suggests an association with viral infections. We therefore analyzed expression signatures derived recently to differentiate between bacterial and viral infections [36-41] (Figure 4B and C, respectively). The former signatures associate with the ‘inflammatory’ spots A, O and also M, which are upregulated in type 1 samples. In contrast, viral signature genes accumulate strongly in the IFN-response spot L, which is found upregulated in about 10% of all samples.
We were also interested in the expression profiles of genes involved in telomere length maintenance (TM) via activation of telomerase. Mean telomere length in human leukocytes is negatively correlated with lifespan and BMI [42, 43], and it associates with heart diseases, type 2 diabetes, cancer [44-46], lifestyle factors [47], diet [48] and psychological stress [49]. TM genes are more active in type 2 transcriptomes, which suggests that they counteract telomere shortening more strongly in younger (and healthier) individuals (Figure 4D). TM expression associates with cell cycle activity, starvation, oxidative stress, ageing, DNA methylation and other functions related to spots I and J, thus indicating strong mutual coupling between TM and our transcriptome types (Figure S 7).
Next, we analyzed the expression of sets of genes assigned to distinct chromatin states in blood cells under healthy conditions, among them T-, B- and T-regulatory cells [77] (Figure 4E and Figure S 8). Genes in states with active promoters (TssA) and completed transcription (Tx) are expected to show high expression levels, and genes in repressed promoter states low expression levels. This relation is indeed observed in type 2 transcriptomes; however, it reverses in type 1. This reversal suggests de-repression of nominally repressed states and repression of active states in type 1 transcriptomes by epigenetic chromatin remodelling. We recently demonstrated that differentiation and adjustment of cellular programs are governed by subtle cooperation of transcription factor (TF) networks and epigenetics, e.g., via regulation of the polycomb repressive complex 2 (PRC2) and its targets [50]. We find that signatures related to TF networks, which regulate cell functions such as cell cycle, oxphos and transcription and require relatively high expression levels of their major regulatory genes, are upregulated predominantly in type 2 transcriptomes (Figure S 9). In contrast, repressive epigenetic signatures related to PRC2 function, repressive histone marks (H3K27me3) and DNA methylation change antagonistically to those of the TF networks. Interestingly, these profiles show moderate and low expression levels, in accordance with the accumulation of their signature genes in the central region of the map. In summary, type 2 transcriptomes associate with cell cycle, oxphos metabolism, telomere maintenance and immune system activity regulated mainly via transcription factor networks, which become repressed in type 1 transcriptomes in parallel with epigenetic de-repression of inflammatory cellular programs, including responses to infections.
Figure 4
3.4. Previous gene expression signatures of the blood transcriptome
Modules of co-regulated genes of a previous blood transcriptome study [53] agree well with our spot clusters and further specify the functional interpretation in terms of associated blood components such as cytotoxic plasma-, T- and B-cells (upregulated in type 2) and erythrocytes, platelets, neutrophils and cells of myeloid lineage (up in type 1) (Figure 3A and Figure S 10). Another study extracted ageing signatures of the blood transcriptome [11]. Genes of decreasing expression (‘age_dn’) accumulate near spots I and J (up in type 2), while genes of increasing expression (‘age_up’) are found in wider areas around spots A, M and H (up in type 1) (Figure S 11). This asymmetry in the numbers of spots suggests that age_up involves a more heterogeneous collection of molecular mechanisms than age_dn (see below). Another set of signatures was obtained recently in a study of blood transcriptomes collected from patients with sepsis in the context of community-acquired pneumonia (CAP) [8] (Figure 4F). Surprisingly, these signatures correspond to signatures of nominally healthy individuals: patients with less severe CAP show signatures of type 2 transcriptomes, while more severe CAP cases show type 1 transcriptomes associating partly with activation of inflammatory and endotoxin tolerance characteristics [8].
Next, we made use of a repertoire of 382 functionally annotated expression modules extracted from a recent meta-analysis of the blood transcriptomes of 16 disease and physiological states [52] (Figure 4G and Figure S 12). Clustering of these signatures yields, among others, three type 1-like clusters, which are strongly affected by spot O (C1 in Figure 4G), A (C2) or C (C3), respectively. Their profiles resemble those of the different severe CAP transcriptomes and can be interpreted as inflammatory signatures which are modulated by increased and decreased erythrocyte (spot C) and thrombocyte (spot N) activation patterns, respectively. Further, the 382 modules provide a rich repertoire of functional annotations, which support the interpretation of our data (see example profiles in Figure 4G and Supplementary File 4 for the full set of profiles). For example, age_dn modules agree with DNA methylation signatures in the blood. Methylation of CpGs in promoters or enhancers upon ageing presumably represses transcription of the respective downstream genes (see also Figure S 1), which is in agreement with the finding that altered methylation sites are enriched in ageing genes [11]. Moreover, we find strong enrichment of 91 of these modules in at least one of our spots (Figure S 12A). Hence, the spots provide a sort of basis set of co-regulated genes, which further expands into a rich collection of functional annotations of different categories via a multitude of combinations as considered by our cPATs (see above). Correlation analysis of different previous blood signature sets [8, 11, 52, 53] and our spot profiles provides very similar patterns in support of this view on the modular structure of the blood transcriptome (Figure S 13). In summary, comparison of previous blood signatures with our data shows that our spot modules represent a sort of minimum set describing co-expression in the blood transcriptome. This set expands into a rich collection of functional annotations including molecular mechanisms, cellular programs and cell types, but also lifestyle factors, diseases and ageing effects.
3.5. Blood cell signatures and seasonal effects
Gene sets implemented in blood cell deconvolution algorithms such as Cibersort [18] show the characteristic correlation patterns observed also in the other blood signatures (compare Figure 4H and Figure S 14). They link the expression patterns of 22 blood cell types with our spot profiles. Elevated expression (and cell fractions, Figure S 15) of monocytes, neutrophils and eosinophils is observed in type 1 transcriptomes, while overall expression of T- and B-cells is upregulated in type 2. Expression of M1 macrophages and dendritic cells associates with the IFN-response signature (spot L). Furthermore, signatures of monocytes, M0 and M2 macrophages are also enriched in spot L, however, in combination with the inflammatory spot O. Recent studies report seasonal changes of gene expression of the blood transcriptome and of blood cell counts [54, 55]. We find a slight shift of transcriptome characteristics towards type 1 in winter compared with summer for both men and women (Figure S 16). This shift is characterized by increased expression levels of inflammation (spot A), increased erythrocyte expression (spot C) and counts, and decreased levels of thrombocyte characteristics (spot N) and of reticulocyte and eosinophil counts (Table S 5). Overall, the seasonal changes of type compositions are relatively small (less than 3% in men and 1% in women) and are not explicitly considered further.
3.6. Phenotype portrayal: Blood cell counts, lifestyle, medication and disease history
Previous blood transcriptome studies also extracted gene signatures which associate with health-related features such as BMI (body mass index) and smoking status, and also with the development of different diseases such as heart failure [56], dental caries [57], schizophrenia and neoplasms [52]. We find that they are predominantly upregulated in type 1 transcriptomes showing characteristics of ageing and/or inflammation (Figure S 17). The LIFE-adult study provided a series of features characterizing health and lifestyle of the participants in terms of so-called phenotypes (Table S 1). We associated them with the blood transcriptomes in a participant-matched fashion using phenotype portraits, which typically show areas of positive (colored in red) and negative (in blue) correlation between phenotype features and expression profiles in the transcriptome landscape with metagene resolution (Figure 5A, and for details Figure S 19 - Figure S 23). For example, phenotype associations with expression patterns of type 1 (red in the lower left part of the map) or type 2 (red in the upper right part) can be distinguished. In addition, overview maps were generated for each of the phenotype categories, which mark the metagene of maximum (and minimum) correlation for each of the phenotypes studied. Enrichment of phenotypes is evaluated in terms of the distribution of cases among the transcriptome types (Figure 5C; for enrichment significance evaluation using Fisher’s exact test see Figure S 19D - Figure S 23D).
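As an illustration of how enrichment of a phenotype in one transcriptome type can be tested, the following minimal sketch computes a one-sided Fisher’s exact test from a 2x2 contingency table. The function name, the one-sided formulation and the toy counts are assumptions made for illustration; they are not taken from the study.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment.

    Hypothetical 2x2 table layout:
        a = cases inside the type,      b = cases outside the type,
        c = non-cases inside the type,  d = non-cases outside the type.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    n_cases = a + b
    n_type = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_type)
    k_max = min(n_cases, n_type)
    tail = 0
    for k in range(a, k_max + 1):
        # number of tables with k cases inside the type and fixed margins
        tail += comb(n_cases, k) * comb(n_total - n_cases, n_type - k)
    return tail / denom

# Toy counts (illustrative only): 8 of 10 cases fall into the type,
# but only 2 of 10 non-cases do.
p_enriched = fisher_exact_greater(8, 2, 2, 8)
print("p = %.4f" % p_enriched)
```

A small p-value indicates that cases of the phenotype accumulate in the given type more often than expected from the marginal frequencies alone.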
We find that most blood count data correlate either with type 1 (e.g. erythrocytes, reticulocytes, platelets, neutrophils) or with type 2 (lymphocytes) transcriptomes, in agreement with the blood cell transcriptomes analyzed above. Smokers, alcohol consumers (> 30 g/day), obese and elderly people, men, participants taking different categories of medication according to the ATC (Anatomical Therapeutic Chemical) classification and participants with different self-reported lifetime diseases show preferences for type 1 (and partly type M) transcriptomes, while younger, under- and normal-weight participants, women and non-consumers of medication associate preferentially with type 2. The degree of correlation with metagene expression is markedly higher for blood counts than for the other phenotypes (Figure 5C).
Some of the blood count portraits indicate fingerprint-like correlation patterns specific for the different blood components (Figure 5A, B, Figure S 18, Figure S 19 and Figure 4H). The portraits of the phenotypes of the other categories partly resemble those of blood counts, thus reflecting a close association between them. For example, the ‘ageing’ portrait (visualizing the correlation between age and transcriptome) can be understood as a superposition of the red blood cell (RBC) and neutrophil (NE) phenotype portraits, indicating the increased levels of RBC and NE in elderly people (see next subsection). The ‘alcohol consumption’ portrait also resembles the RBC portrait, while smoking reveals an eosinophil (EO)-like pattern. Some of the medication and disease history portraits can be interpreted similarly, reflecting, e.g., that some medications and diseases are more prevalent in elderly people (see the mean age data of each of the phenotypes listed in Table S 1) and consequently associate with increased RBC and NE levels and decreased lymphocyte (LY) counts (Figure S 22-Figure S 23).
Other phenotype portraits, e.g. those of different age ranges (see next subsection) and of different medications, cannot simply be interpreted as composites of blood count portraits. For a more detailed view, we performed correlation and multiple regression analysis to estimate the particular effect of phenotypes on spot expression (parts C – F in Figure S 19 - Figure S 23). We find a close relationship between high correlation coefficients and significant contributions of phenotype coefficients (log p < -6), especially for spots located in the lower left and upper right corners of the map. These refer first of all to age, obesity, gender, RBC, white blood cell (WBC) and LY counts, medications of the groups C (cardiovascular system) and B (blood forming organs) and the previous diseases HL (hyperlipidemia), DIA (diabetes), HT (hypertension) and CAN (cancer).
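The idea of estimating the particular effect of each phenotype on spot expression by multiple regression can be sketched as an ordinary least-squares fit. This is a minimal sketch only: the function name, the choice of plain OLS via the normal equations and the toy data are assumptions, and the significance estimates (p-values) reported in the supplementary figures are not computed here.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations.

    X: list of rows with phenotype values (an intercept column is added here).
    y: list of spot expression values, one per participant.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Toy data (illustrative only): two hypothetical phenotypes per participant,
# an age-like score and a binary sex indicator, with expression constructed
# as 1 + 0.5 * age_score - 2 * sex.
toy_X = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0), (5, 1)]
toy_y = [1 + 0.5 * age - 2 * sex for age, sex in toy_X]
coef = ols_fit(toy_X, toy_y)
print(["%.3f" % c for c in coef])
```

Because all phenotypes enter the fit jointly, each coefficient describes the contribution of one phenotype with the others held fixed, which is what distinguishes the regression view from the pairwise correlations shown in the phenotype portraits.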
In summary, phenotype portrayal visualizes fine structures of the effect of health and lifestyle factors on the blood transcriptome. The portraits reflect alterations of blood cell composition and presumably also the specifics of the transcriptional programs activated in the different cells. The transcriptome types (and subtypes) resolve the heterogeneity of blood transcriptomes, while the spot modules provide a metric for its quantification. Overall, the phenotype portraits enable an intuitive, perception-based interpretation in terms of function and mutual associations between the different features.
Figure 5
3.7. Portrayal of ageing, obesity and of serum markers
Ageing and alterations of the BMI are accompanied by changes of the composition of transcriptome types in a gender-specific fashion (Figure 2D). Functional analysis shows that expression of type 1_up transcriptomes gains with age, while expression of type 2_up decays on average (see the plots with age-ranked samples in Figure S 5 - Figure S 11). Plots of spot expression as a function of age and BMI reveal further details (Figure 6A): spot expressions related to red blood cell (spot C) and platelet (spot N) characteristics increase as a function of age and BMI, with differences between the mean LOESS curves for men and women (compare the red and blue curves), in correspondence with the blood count data (Figure S 18). In turn, the expression curves of spots related to immunity (I and J) decay with age and BMI in a nearly sex-independent fashion. On the other hand, the curves show similar courses at different levels for the transcriptomic types, which suggests type-independent ageing mechanisms. The ageing curves are partly non-linear, with slopes getting steeper for ages above 55–60 years (e.g. for spots A and I, indicative of inflammation and immune response, respectively) or above 65–70 years (spot L, IFN response), which suggests altered mechanisms in elderly people above different age thresholds.
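The LOESS curves mentioned above are locally weighted regression fits. The following minimal sketch shows the principle for a single evaluation point, assuming a local linear fit with tricube weights; the function name, the span parameter and the toy data are hypothetical, and the settings used for Figure 6A are not specified here.

```python
def loess_point(x0, xs, ys, span=0.5):
    """Locally weighted linear fit evaluated at x0 (LOESS-style sketch).

    span: fraction of data points defining the local neighbourhood.
    """
    n = len(xs)
    k = max(2, int(round(span * n)))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, slightly enlarged so
    # that tied neighbours do not all receive zero weight (an assumption
    # of this sketch, not part of the classical definition).
    h = dists[k - 1] * (1 + 1e-9) + 1e-12
    # Tricube weights, zero outside the neighbourhood
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Toy data (illustrative only): spot expression rising linearly with age
ages = [20, 30, 40, 50, 60, 70, 80]
expr = [0.1 * a - 2 for a in ages]
smoothed_45 = loess_point(45, ages, expr)
print("smoothed expression at age 45: %.3f" % smoothed_45)
```

Evaluating such a function on a grid of ages, separately for men and women, gives sex-specific curves; changes of slope above a certain age then show up as curvature in the smoothed curve.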
Gene maps of previous ageing signatures [11] reveal an asymmetrical distribution of ageing_up and ageing_dn genes (Figure 6B). The latter accumulate within a narrow area in and around spots I and J in the upper right corner of the map, giving rise to a strong correlation between the expression of the signature and that of these spots. Deactivation of associated cellular functions such as immune response, telomere maintenance and/or ribosomal and mitochondrial activities with age apparently proceeds homogeneously, presumably driven by mechanisms such as DNA hyper-methylation (Figure S 12). In contrast, ageing_up genes distribute much more heterogeneously between different spot regions, each of which shows a specific profile of expression gaining with age (see curves of spots A, O, N, M, L and H in Figure 6A). Ageing is apparently accompanied or even driven by the activation of a multitude of inflammatory mechanisms involving different molecular and cellular components (see spot characteristics), which combine in a patient-specific fashion, giving rise to a relatively heterogeneous ageing_up signature.
The mean ageing portrait (‘all ages’ in Figure 6C) corresponds to the distribution of ageing_up and ageing_dn genes of the ageing signature [11] (compare the respective gene set maps with the red and blue areas in Figure 6B, respectively). Moreover, the ageing portrait can be roughly interpreted as the superposition of increasing RBC- and NE-like (positive correlation in red) and decaying LY-like (negative correlation in blue) contributions (compare with the cell count portraits in Figure 6E), in agreement with the increase and decrease of the expression of the respective landmark spots C, O and I, J, respectively. Inspection of gender- and age (decade)-stratified portraits reveals that elderly women and men (> 60 years) are similarly affected by an increase of NE- and IFN-related characteristics (found especially for subtype M.3), while the RBC-like pattern (typical for subtype 1.3) is more pronounced for mid-aged men (40 – 60 years). The mean BMI portrait (‘all BMI’ in Figure 6D) shows characteristics of type 1 transcriptomes without the NE-like patterns and the elevated expression of spot L observed in the respective ageing portrait. Interestingly, the BMI-stratified portraits ‘switch’ from type 2 into type 1 for obese women and men (BMI > 30 kg/m²), due to gained (positive) correlations between BMI and inflammatory (spot A), RBC (spot C) and platelet (spot N) characteristics on the one hand, and decaying immune response (spots I, J) expression signatures on the other. This behaviour possibly associates with the so-called obesity paradox, which claims that an intermediate BMI of about 25 kg/m² associates with minimum health risk [58] and thus with a switch from a positive to a negative effect of increasing BMI on health.
For further comparison, we generated phenotype (correlation) portraits of four selected serum protein markers (Figure 6E). The portraits of hsCRP (high-sensitivity C-reactive protein) and of cystatin C reflect footprints of inflammation (spot O) and IFN response (spot L) in the blood transcriptome associated with NE-like patterns of the blood counts. The portrait of ferritin closely resembles that of RBC, reflecting the correspondence between the level of stored iron and erythrocyte expression (spot C). The transferrin portrait reveals a different pattern, associating with diminished spots O (inflammation) and especially L (IFN response) and enhanced spot N (thrombocytes), possibly due to the role of platelets in iron transport [59]. In summary, ageing and obesity associate with characteristic alterations of the blood transcriptome, reflecting a fine interplay between inflammation and iron physiology as mediated by molecular (e.g., IFN response), cellular (e.g., WBC and RBC) and serum protein components.
Figure 6