Portrayal of the human blood transcriptome of 3,388 adults and its relation to ageing and health

Background: The blood transcriptome is expected to provide a detailed picture of an organism’s physiological state with potential impact for applications in medical diagnostics and molecular and epidemiological research. We here present the first systematic analysis of blood specimens of 3,388 adult individuals collected in the Leipzig Research Center for Civilization Diseases, together with phenotype characteristics such as disease history, medication status, lifestyle factors and body mass index (BMI). Methods: Multidimensional self-organizing maps (SOM) portrayal was applied to study transcriptional states on a population-wide scale. The method permits a detailed description and visualization of the molecular heterogeneity of transcriptomes and of their association with different phenotypic features. Results: The diversity of transcriptomes is described by about one dozen modules of co-expressed genes of different functional context. We identify two major blood transcriptome types: type 1 accumulates more men, elderly and overweight people and upregulates genes associated with inflammation and increased heme metabolism, while type 2 accumulates women, younger and normal-weight participants and is associated with activated immune response and with transcriptional, ribosomal, mitochondrial and telomere-maintenance cell functions. We find a striking overlap of signatures shared by multiple diseases, ageing and obesity driven by an underlying common pattern, which reflects the increase in inflammatory processes. Conclusions: Our portrayal framework provides a holistic view on the diversity of the human blood transcriptome. It provides a tool for comparative analyses of


Background
Blood is the pipeline of the human organism's physiology. Its accessibility and the minimal invasiveness of sampling have made it the most feasible source of material in scientific research and clinical diagnostics, because blood tests can replace more invasive and risky procedures [1].
Although relatively simple to extract, blood is not a simple tissue. It accounts for about 8% of body weight and is composed of acellular fluid (plasma) and a mixture of multiple cell types that differ in lineage, function and life-cycle stage. At the transcript level, blood is a complex mixture in which changes in transcript abundance can be attributed either to altered transcriptional regulation in the different cell types or to relative changes in the composition of cell populations. Computational cell type deconvolution techniques such as CIBERSORT [18] (see [19] for an overview) attempt to deduce cellular composition from transcriptome data using statistical methodologies and cell-specific RNA signatures, which are usually obtained independently in calibration experiments and are assumed to remain invariant. This assumption is imprecise because blood cells can change their state, e.g. via transformation from naïve to specialized cell types involving biological processes such as chemotaxis, surface signalling and cell-cell adhesion, all governed by expression changes of associated genes [16]. Therefore, cell subset classification is to a greater or lesser extent artificial, reflecting our current ability to distinguish cells based on specific sets of available markers [11]. Since blood flows throughout the whole body, its cells implement their functions by recirculating between central and peripheral lymphoid organs as well as to and from different tissues and sites of inflammation and/or injury. The cells specifically recapitulate the influence of genetic, epigenetic, cellular and environmental factors, which can vary between individuals and, within an individual, over time and with age. In conclusion, the blood transcriptome is expected to provide a detailed picture of an organism's physiological state with individual resolution.
Most of the studies mentioned above focused on only one or a few diseases and/or environmental and lifestyle conditions and did not answer the question of whether the identified gene expression signatures are condition-specific or more widely applicable. Moreover, factors contributing to transcriptome variability among nominally healthy individuals have remained relatively unexplored so far.
We here present the first systematic analysis of the transcriptomes obtained from whole peripheral blood specimens of more than 3,000 adult individuals collected in LIFE (-adult), the Leipzig Research Center for Civilization Diseases, together with a large collection of phenotype characteristics such as disease history, medication status, lifestyle factors and body mass index (BMI) [9] (Table S 1). Our analysis aims (i) at characterizing the inter-individual variability of the blood transcriptomes in terms of a suitable classification scheme; (ii) at describing the diversity of transcriptome states using a collection of modules of co-expressed genes and at characterizing their biological functions; and (iii) at associating the transcriptome landscapes with a series of phenotype data collected in a participant-matched fashion. Overall, these analyses are expected to provide a holistic view on essential properties of the blood transcriptome, a methodical framework for its analysis and an outlook on its possible impact for future applications.
Previously, we have developed an omics 'portrayal' methodology based on self-organizing maps (SOM) machine learning [20,21]. It has been applied to a series of data types and diseases [22][23][24][25][26], among them a study about footprints of pneumonia in the blood transcriptome [8]. SOM-portrayal takes into account the multidimensional nature of gene regulation, pursues a modular view on co-expression, reduces dimensionality and supports visual perception by delivering 'personalized', case-specific transcriptome portraits. These portraits enable a straightforward and intuitive interpretation and mutual comparison of whole transcriptome landscapes between cases and classes. By applying this method to the blood transcriptomes of thousands of individuals we aim at demonstrating that multidimensional SOM-portrayal permits a detailed description and visualization of the molecular heterogeneity of transcriptional states and of their association with different phenotypes, with potential impact for applications in medical diagnostics and molecular and epidemiological research.
The LIFE (-adult) research project conducted one of the largest population studies in Germany, focusing on extensive phenotyping of urban individuals from the city of Leipzig [27]. It included more than 10,000 participants in order to discover the interplay between molecular, environmental and lifestyle factors and their impact on the health status of the population. The study was approved by the ethics board of the Medical Faculty of the University of Leipzig. In this publication we analyzed transcriptomic data of whole peripheral blood (WPB) samples, which were obtained from 3,388 adult participants of the LIFE-adult study. They divide roughly equally into women and men and cover an age range between about 20 and 80 years with a strong bias towards elderly persons. The LIFE-adult study overall collected a broad survey of lifestyle and health items (see [27] for details).
We made use of selected lifestyle characteristics of the participants such as smoking behaviour and alcohol consumption, medication according to ATC (Anatomical Therapeutic Chemical) indexing, disease history collected via questionnaires, blood count data from the clinical laboratory including selected serum markers, and body mass index (BMI) (Table S 1).

Blood transcriptome sampling, microarray measurements and data preprocessing
We made use of pre-processed gene expression data extracted from WPB samples of individuals as provided by the LIFE database. Participant recruitment, blood collection, storage and mRNA preparation, microarray measurements and primary data pre-processing were carried out by different groups of the LIFE center [27]. WPB was collected in Tempus Blood RNA Tubes (ThermoFisher, Waltham, MA, USA) and stored at -80 °C until further processing. RNA was isolated and then hybridized to

Self-organizing maps (SOM) transcriptome portrayal
Pre-processed expression values were analyzed using the oposSOM pipeline, available as the R package "oposSOM" [28]. It uses SOM neural network machine learning to translate the high-dimensional expression data of N = 19,049 gene transcripts into K = 10,000 metagene expression data per individual [20,29]. Each metagene represents a 'micro'-cluster of co-expressed genes showing mutually similar expression profiles across the samples. Metagenes were arranged in a 100 x 100 two-dimensional grid coordinate system and colored according to their expression level for each sample, thus providing a 'personalized' image of the blood
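To illustrate the kind of SOM training summarized above, the following is a minimal sketch in Python. It is not the oposSOM implementation (which is an R package), and the function name `train_som`, the grid size, the learning-rate and neighbourhood schedules and the toy data are all assumptions chosen for illustration; the study itself used a 100 x 100 grid. Each grid node carries a metagene profile, and each gene is assigned to the node whose profile is closest to its own.

```python
import math
import random


def train_som(profiles, grid=4, epochs=20, lr0=0.5, seed=0):
    """Train a small square SOM on gene expression profiles (illustrative only).

    profiles: list of per-gene vectors, one value per sample.
    Returns (weights, assignment): weights[k] is the metagene profile of grid
    node k, assignment[g] is the index of the node that gene g maps to.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    n_nodes = grid * grid
    # Initialise every metagene with a randomly chosen gene profile.
    weights = [list(rng.choice(profiles)) for _ in range(n_nodes)]
    coords = [(k // grid, k % grid) for k in range(n_nodes)]
    sigma0 = grid / 2.0

    def best_node(vec):
        # Best-matching unit: node with minimal squared Euclidean distance.
        return min(
            range(n_nodes),
            key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], vec)),
        )

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for g in order:
            vec = profiles[g]
            bx, by = coords[best_node(vec)]
            for k, (x, y) in enumerate(coords):
                d2 = (x - bx) ** 2 + (y - by) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))
                node = weights[k]
                for j in range(dim):
                    # Pull the node towards the presented gene profile.
                    node[j] += h * (vec[j] - node[j])

    assignment = [best_node(vec) for vec in profiles]
    return weights, assignment


# Toy data (hypothetical): two groups of genes with opposite profiles
# across four samples.
up = [[1.0 + 0.01 * i, 1.0, -1.0, -1.0] for i in range(10)]
down = [[-1.0 - 0.01 * i, -1.0, 1.0, 1.0] for i in range(10)]
weights, assignment = train_som(up + down, grid=4, epochs=20)
```

Colouring the grid nodes by the value of `weights[k][s]` for a fixed sample index `s` would give the portrait of that sample in this toy setting.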

Function mining
We applied gene set analysis to the lists of genes located in each of the spot modules to discover their functional context using the right-tailed Fisher's exact test [36,37]. In addition, the gene set enrichment z-score (GSZ) was used to evaluate the impact of the gene sets in the different transcriptomic strata [32,38]. The GSZ metric considers the mean expression of the gene set normalized by its variance, i.e. it provides high values for homogeneous gene sets reflecting activation of biological functions with high relevance for the respective transcriptional states.
Gene set maps complement this analysis by visualizing the positions of the genes of a set within the SOM grid. According to their degree of accumulation in or near the spots, one can deduce their potential functional context [20].
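The right-tailed Fisher's exact test used for the spot modules can be sketched as follows. For a 2 x 2 table of spot membership versus gene set membership, the right-tailed p-value equals the upper tail of a hypergeometric distribution. The function name `fisher_right_tailed`, its argument names and the example counts are hypothetical; only the background size of 19,049 transcripts is taken from the text.

```python
from math import comb


def fisher_right_tailed(overlap, module_size, set_size, n_genes):
    """Right-tailed Fisher's exact test for gene set enrichment.

    Returns P(X >= overlap), where X is the number of gene set members found
    in a random module of `module_size` genes drawn without replacement from
    `n_genes` background genes, `set_size` of which belong to the gene set.
    """
    upper = min(module_size, set_size)
    total = comb(n_genes, module_size)
    tail = sum(
        comb(set_size, i) * comb(n_genes - set_size, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: a spot of 50 genes shares 8 genes with a gene set
# of 40 genes, against a background of 19,049 transcripts.
p_value = fisher_right_tailed(8, 50, 40, 19049)
```

Exact integer arithmetic is used on purpose here, so that very small tail probabilities are not distorted by intermediate rounding; in practice a multiple-testing correction over all gene sets would follow.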

Phenotype portrayal
Phenotype information of the participants comprises their blood cell and marker counts, BMI and information about their lifestyle (smoking and alcohol consumption), medication and disease history (Table S 1). The enrichment of categorical phenotypic characteristics in each of the transcriptomic classes (types and subtypes) was estimated using one-tailed Fisher's exact test and visualized as enrichment heatmaps. Phenotype-to-metagene correlation maps were generated by correlating each of the phenotype parameter-profiles over all participants with each of the metagene expression profiles. The matrix of correlation coefficients obtained was then visualized in the SOM-grid as 'phenotype' portraits using a red-to-blue (maximum-to-minimum correlation) color-code. The metagene of maximum correlation coefficient was marked in the SOM-grid of a phenotype overview map.
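A phenotype portrait of this kind can be sketched in a few lines. The example below assumes a Pearson correlation coefficient (the text does not state which coefficient was used) and uses hypothetical names (`pearson`, `phenotype_portrait`) and toy values; it returns one correlation per metagene together with the index of the metagene of maximum correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def phenotype_portrait(phenotype, metagene_profiles):
    """Correlate one phenotype with every metagene profile.

    phenotype: one value per participant.
    metagene_profiles: list of metagene profiles, each with one value per
    participant, in SOM-grid order.
    Returns (correlations, top_node).
    """
    correlations = [pearson(phenotype, profile) for profile in metagene_profiles]
    top_node = max(range(len(correlations)), key=lambda k: correlations[k])
    return correlations, top_node


# Toy data (hypothetical): age of four participants and three metagenes.
age = [25, 40, 55, 70]
metagenes = [
    [0.1, 0.2, 0.3, 0.4],  # increases with age
    [0.4, 0.3, 0.2, 0.1],  # decreases with age
    [0.1, 0.3, 0.2, 0.4],  # partly follows age
]
correlations, top_node = phenotype_portrait(age, metagenes)
```

Reshaping `correlations` to the grid dimensions and applying a red-to-blue colour scale would give the portrait; `top_node` corresponds to the metagene marked in the overview map.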
Expression of each of the spots was fitted using multiple regression with the phenotype values of the participants of each of the categories as variables.
Standardized regression coefficients were then visualized as heatmaps.
represented in all types. A higher percentage of men was noted in type 1 (29% versus 19% for women), and the reverse relation was found for type 2 (37% of men versus 51% of women; Figure 2C). Type 1 is more frequent among elderly persons than type 2; however, in the latter, the age dependence differs between women and men (Figure 2D). The composition of types for women changes virtually monotonically with a steadily increasing percentage of type 1, in contrast to men, whose type composition shows a maximum in the age range of 50-55 years. Note also that the age dependence of type M resembles that of type 1 more than that of type 2, which suggests functional correspondence between types M and 1 (see below). The type composition of men and women is virtually independent of BMI (body mass index) except for very obese persons (BMI > 35 kg/m²), who seem to accumulate more type 1 transcriptomes (Figure 2D).
Taken together, we identify two major blood transcriptome types and an intermediate type partly resembling type 1. Type 1 accumulates more men and elderly participants and upregulates genes associated with inflammation and increased heme metabolism, while type 2 accumulates women and younger participants and is associated with activated immune response and transcriptional activity. The composition of types changes in a gender- and age-specific fashion. Genes with function in interferon (IFN) response accumulate in spot L without preferential upregulation in one of the three types.
Typically, each of the individual sample portraits shows more than one spot, which reflects the parallel activation of different transcriptional programs and/or their mutual couplings. We subsume frequently observed combinations of expressed spots as so-called combinatorial pattern types (cPATs). Overall we identified 33 cPATs, which were then used to sub-stratify each of the major transcriptomic types into three subtypes (STs, annotated by 1.1, 1.
In summary, the diversity of transcriptional states can be described by the combinatorics of about one dozen modules of co-expressed genes of different functional context, which decompose each of the transcriptional types into three subtypes. Profiling of these signatures splits them into two major clusters upregulated either in type 1 (marked with green color in the figures) or in type 2 (apricot color). Type 2 associates, for example, with activation of cell cycling, MYC-target genes and oxidative phosphorylation (oxphos), while inflammation, hypoxia, coagulation, reactive-oxygen species and pathway signaling of TNFalpha, TGFbeta, PI3K-Akt-MTOR and IL6-JAK-Stat3 are activated in type 1. A third cluster (blue color) accumulates signatures related to interferon (IFN) response, which possibly suggests an association with viral infections. We analyzed expression signatures derived recently to differentiate between bacterial and viral infections [36][37][38][39][40][41] (Figure 4B and C, respectively). The former signatures associate with the 'inflammatory' spots A, O and also M, which are upregulated in type 1 samples. In contrast, viral signature genes accumulate strongly in the IFN-response spot L, which is found upregulated in about 10% of all samples.
We were also interested in expression profiles of genes involved in telomere length maintenance (TM) via activation of telomerase. Mean telomere length in human leukocytes is negatively correlated with lifespan and BMI [42,43] and it associates with heart diseases, type 2 diabetes, cancer [44][45][46], lifestyle factors [47], diet [48] and psychological stress [49]. TM-genes are more active in type 2 transcriptomes, which suggests that they more strongly counteract telomere shortening in younger (and ). This asymmetry of the numbers of spots suggests that age_up involves a more heterogeneous collection of molecular mechanisms than age_dn (see below).
Another set of signatures was obtained recently in a study of the blood transcriptomes collected from patients with sepsis framed by CAP (community acquired pneumonia) [8] (Figure 4F). These signatures surprisingly correspond to signatures of nominally healthy individuals: patients with less severe CAP show signatures of type 2 transcriptomes, while more severe CAP cases show type 1 transcriptomes associating partly with activation of inflammatory and endotoxin tolerance characteristics [8].
Figure S 1), which is in agreement with the finding that altered methylation sites enrich in ageing genes [11]. Moreover, we find strong enrichment of 91 of these modules in at least one of our spots (Figure S 12A). Hence, the spots provide a sort of basis set of co-regulated genes, which further expands into a rich collection of functional annotations of different categories via a multitude of combinations as considered by our cPATs (see above).
Correlation analysis of different previous blood signature sets [8,11,52,53] and our spot profiles provides very similar patterns in support of this view on the modular structure of the blood transcriptome (Figure S 13). In summary, comparison of previous blood signatures with our data shows that our spot-modules represent a sort of minimum set describing co-expression of the blood transcriptome. It expands into a rich collection of functional annotations including molecular mechanisms, cellular programs, cell types but also lifestyle factors, diseases and ageing effects.

3.5. Blood cell signatures and seasonal effects
Gene sets implemented in blood cell deconvolution algorithms such as CIBERSORT [18] show the characteristic correlation patterns observed also in the other blood signatures (compare Figure 4H and Figure
(Table S 5). Overall, the seasonal changes of type compositions are relatively small (less than 3% in men and 1% in women) and are not explicitly considered further.

3.6. Phenotype portrayal: Blood cell counts, lifestyle, medication and disease history
Previous blood transcriptome studies also extracted gene signatures which associate with health-related features such as BMI (body mass index) and smoking status and also with the development of different diseases such as heart failure [56], dental caries [57], schizophrenia and neoplasms [52]. We find that they
We find that most blood count data correlate either with type 1 (e.g. erythrocytes, reticulocytes, platelets, neutrophils) or type 2 (lymphocytes) transcriptomes, in agreement with the blood cell transcriptomes analyzed above. Smokers, alcohol consumers (> 30 g/day), obese and elderly people, men, participants taking different categories of medication according to the ATC (Anatomical Therapeutic Chemical) classification and participants with different self-reported lifetime diseases show preferences for type 1 (and partly type M) transcriptomes, while younger, under- and normal-weight participants, women and non-consumers of medication associate preferentially with type 2. The degree of correlation with metagene expression is markedly higher for blood counts than for the other phenotypes (Figure 5C).

Some of the blood count portraits show fingerprint-like correlation patterns specific for the different blood components (Figure 5A, B, Figure S 18, Figure S 19 and Figure 4H). The portraits of the phenotypes of the other categories partly resemble those of blood counts, reflecting the close association between them.
For example, the 'ageing' portrait (visualizing the correlation between age and transcriptome) can be understood as a superposition of the red blood cell (RBC) and neutrophil (NE) phenotype portraits, indicating the increased levels of RBC and NE in elderly people (see next subsection). The 'alcohol consumption' portrait also resembles the RBC portrait, while smoking reveals an eosinophil (EO)-like pattern.
Some of the medication and disease history portraits can be interpreted similarly, reflecting, e.g., that some medications and diseases are more prevalent in elderly people (see the mean age data of each of the phenotypes listed in Table S 1)
Gene maps of previous ageing signatures [11] reveal an asymmetrical distribution of ageing_up and ageing_dn genes (Figure 6B). The latter accumulate within a narrow area in and around spots I and J in the right upper corner of the map, giving rise to a strong correlation between the expression of the signature and that of these spots.
Figure 6A). Ageing is obviously accompanied or even driven by the activation of a multitude of inflammatory mechanisms involving different molecular and cellular components (see spot characteristics), which combine in a patient-specific fashion, giving rise to a relatively heterogeneous ageing_up signature.
The mean ageing portrait ('all ages' in Figure 6C) corresponds to the distribution of ageing_up and ageing_dn genes of the ageing signature [11] (compare the respective gene set maps with the red and blue areas in Figure 6B, respectively).
associates with minimum health risk [58] and thus with a switch from positive to negative effect of increasing BMI on health.
For further comparison, we generated phenotype (correlation) portraits of four selected serum protein markers (Figure 6E).

Discussion
We 'portrayed' the diversity of the blood transcriptome of a cohort of more than 3,000 nominally healthy adult individuals included in the Leipzig Health Study 'LIFE-adult' in terms of intuitive SOM-images and classified them into three major transcriptome types. The expression patterns were decomposed into a minimum set of modules of co-regulated genes. Their functional impact was interpreted based on previous knowledge, including the results of previous blood transcriptome studies.
Finally, we associated the blood transcriptomes with a series of phenotype features collected in LIFE-adult for the same participants, such as age, obesity status, blood cell count, disease history and medication, by means of phenotype portraits. Overall, our study provides a comprehensive characterization of the blood transcriptome, taking into account the whole spectrum of transcriptional states on a population-wide scale.

SOM-portrayal reduces dimensions of the blood transcriptome
Dimension reduction and feature extraction are important issues in high-throughput data analysis. Our machine-learning approach reduces the dimensionality of the data (number of individuals x (number of genes + number of phenotypes)) to a handful of transcriptome types and subtypes [60]. Their expression is governed by about one dozen expression (spot-) modules in close correspondence with a previous modularization of the blood transcriptome [53]. Moreover, data portrayal transforms high-dimensional data landscapes into easy-to-interpret images. Their visual inspection strongly supports analytic tasks on different levels of stratification, ranging from individual 'personalized' to subtype- and type-averaged expression portraits. Our study thus provided a sort of album of transcriptomic 'faces' of the LIFE participants (Supplementary File 2). Importantly, the phenotype portrayal projects low-dimensional features such as age or BMI onto the high-dimensional transcriptome landscape, which generates highly granular correlation images serving as a 'fingerprint' of the respective phenotype. Their mutual comparison helps to identify associations between them and also singular patterns.

4.2. Typing reveals parallels between a series of biological functions and health phenotypes
The tree in Figure 7A illustrates the similarity relations between the subtype portraits. It reveals a virtually linear arrangement of subtypes along the backbone.
The portraits at the left and right margins (type 1 versus type 2) differ mainly in antagonistic expression of genes located in opposite corners of their portraits. Our analysis thus uncovered a striking simplicity of the transcriptome. It reflects a characteristic alteration of cell components, namely a decrease in signatures of myeloid-lineage cells and an increase in signatures of lymphocytes from the left to the right. This basic pattern is superimposed by transcriptional footprints of erythrocytes and thrombocytes, giving rise, e.g., to gender-specific differences; of cytotoxic T lymphocytes (CTLs), which play a role in longevity; and by patterns reflecting interferon response, whose amplitude increases, on average, in elderly people (especially above 65 years). The transcriptional (spot-) modules diversify these basic patterns in a subtype-specific fashion (Figure 7C).
Transcriptome typing and modularization also paved the way to describe the effect of age and BMI on the blood transcriptome and, in a wider context, on human physiology. For example, the percentage of type 1 transcriptomes in the population, which reflect more inflammatory characteristics, increases with age and, to a lesser degree, with BMI in a non-linear, gender-specific fashion. In agreement with previous results [61] we find a striking overlap of signatures shared by multiple diseases, ageing and obesity driven by an underlying common pattern. It reflects the increase of inflammation and the decrease of immune responsiveness (Figure 7B).
Obesity is associated with leukocytosis and may be regarded as a state of chronic low-grade inflammation [13,62], which, in turn, is considered a driver of many age-related disorders (inflammaging) [63].
Telomeres are protective nucleoprotein structures that cap the ends of linear chromosomes. They shorten slightly with each cell division. In consequence, telomere length reflects cell ageing [64]. Telomere maintenance mechanisms counteract this process, and thus their activation can be indicative of counteracting cell ageing [65,66]. Expression of genes involved in the telomere-maintenance pathway [67] correlates directly with T-cell and immune response signatures, suggesting that the cell's ability to maintain telomeres associates with better immune responsiveness and the overall health constitution observed especially in younger, non-obese, non-smoking and non-alcohol-consuming people. Interestingly, the decay of telomere length with age [68] partly resembles the decay of the amount of type 2 transcriptomes with age, and women, who have a higher fraction of type 2 transcriptomes with activated telomere maintenance mechanisms, possess on average longer telomeres than men [68,69].

Parallels between health and disease and asymmetry of transcriptomic changes
Our recent study on sepsis framed by community acquired pneumonia [8], a high-grade inflammatory disease, identified three major axes of the variation of the blood transcriptome, namely the inflammatory axis (endotoxin tolerance, cytotoxic cells), a 'blood-disturbance' axis including mostly erythrocyte and thrombocyte characteristics and the IFN-response axis. These axes of variation were also found in the blood transcriptomes of nominally healthy subjects, however with a decreased amplitude of inflammatory expression changes. Some of our modules change continuously along the subtypes (e.g. I and J: immune response; H: cytotoxic cells), others show subtype-specific activations (C, N; erythrocytes, platelets), while a third category spreads over almost all subtypes (L; IFN-response). Interestingly, the sepsis signatures also split into three type 1_up mechanisms and only one type 2_up mechanism (Figure 4). They reflect a similar asymmetry of the heterogeneity of mechanisms as found between ageing_up and ageing_dn signatures (see above). It reflects multifactorial activation mechanisms accompanying ageing and/or disease development (see [11] and Figure S 11). The phenotype portraits of ageing, obesity, disease history, medication, alcohol consumption and smoking also reflect this asymmetry, because the majority of them associate with type 1 and show more heterogeneous patterns than type 2 portraits.
We estimated ageing and obesity trajectories of spot expression by non-linear fits through our cross-sectional data, where stratification by types (and gender) reduces the variance of data points (Figure 6A). A recent longitudinal study revealed that individuals are more similar to their own expression profiles later in life than to profiles of other individuals of their own age [70]. Type-strata of the blood transcriptome thus possibly describe ageing effects more adequately.
Longitudinal follow-up studies over different age ranges are required to study individual 'life-courses' of the blood transcriptome and their impact on lifetime-risk prediction.

Epigenetic background and ageing clocks
Our analysis also underlines the importance of epigenetic mechanisms, particularly of chromatin (re-) organization, for changes of the blood transcriptome. Using gene expression of nominally repressed and activated chromatin states as an indicator of gene activity, we find a pronounced mutual switching between type 1 and type 2 transcriptomes, suggesting that active states in type 2 become repressed in type 1 and that repressed states in type 2 become activated in type 1. Differences of biological mechanisms behind transcriptomic [11] and epigenetic predictors [71] of ageing or disease progression are not completely clear [11]. We recently reported that transcriptomic and epigenetic mechanisms partly decouple in cancer development [72,73] and cell differentiation [50]. Thus, the expression changes observed might reflect changed chromatin organization leading to altered cell function in type 1 compared with type 2, as discussed, e.g., for epigenetic mechanisms accompanying ageing [74] and inflammation [75][76][77][78] and associating with changes of DNA methylation and histone marks governing gene activity.
The search for reliable indicators of biological age, rather than chronological age, has attracted large efforts in the last decades [79,80]. DNA methylation-derived epigenetic clocks are currently better at estimating chronological age than transcriptomic or telomere length measures [81]. Currently it is not entirely clear how molecular clocks work, what aspect(s) of physiological or cellular ageing they represent and whether age-related changes, such as telomere shortening or DNA methylation, contribute to the causes of ageing or are the results of it. There is only weak correlation between DNA-methylation and transcriptome age predictors, meaning that the transcriptomic age and the epigenetic clock describe different aspects of biological ageing [11]. The DNA-methylation clock was assumed to reflect the function of the epigenetic maintenance system [71]. In support of this we found that the DNA-methylation maintenance methyltransferase DNMT1 is part of spot gene cluster J (Table S 3), showing decaying expression with age (age_dn signature of [11]) and correlating with DNA-methylation signatures (Figure S 12). Hypomethylation accumulates with the number of cell divisions owing to insufficient re-methylation, and inflammatory factors that increase cell turnover also lead to increased methylation loss, which in turn disrupts DNA binding patterns of transcription factors and modifies their regulatory role in cell function, as discussed above. Coupled transcription and DNA-methylation epidemiological studies are required to better disentangle the relation between epigenetics and transcriptomics of the blood [81].

Consent to publish
Written consent of the participants to publish results of LIFE-adult was obtained.

Availability of data and materials
The data that support the findings of this study are available from the LIFE centre but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of LIFE.
Secondary data are available from 'The Leipzig Health Atlas' repository (https://www.health-atlas.de/; accession number will be available after acceptance of the manuscript).

Competing interests
All authors declare that they have no competing interests.