1. Overview of workflow
We aimed to construct a lncRNA-module network composed of modules associated with sepsis pathology and lncRNAs with prognostic potential. To construct the network, we started by collecting sepsis gene expression datasets. Two datasets GSE65682 and GSE69528 were used in this study and were served as the primary and validation datasets, respectively. Then the analysis was performed mainly on the primary dataset following the procedure in Fig. 1. (1) Preprocessing the raw data using RMA. (2) Establishing the gene expression matrix for the genes with high expression variance. (3) Constructing the gene coexpression network represented by the Pearson correlation coefficients of all gene pairs. (4) Identifying gene coexpression modules using WGCNA. (5) Filtering out unstable modules by another validation expression dataset. Only the modules detected in both datasets were retained for subsequent analysis. (6) Screening DEGs between the sepsis and normal samples for the primary dataset. (7) Identifying DEMs using hypergeometric test. (8) Calculating module eigengene (ME) and perform survival analysis. (9) Identifying survival associated modules (SAMs) by examining the correlation between ME and patient survival outcome. (10) Combing DEMs and SAMs and define them as sepsis modules. (11) Constructing the lncRNA-module interaction network. The interactions were established using hypergeometric test to assess whether a sepsis module significantly overrepresents the target genes of a lncRNA. (12) Select the hub lnRNAs connecting more than three sepsis modules as candidate sepsis lncRNAs. Five sepsis lncRNAs were ultimately identified, FENDRR, MALAT1, TUG1, CRNDE, and ANCR.
2. Coexpression network and modules
The primary results were based on the GSE65682 dataset as it has the largest sample size, 760 sepsis vs 42 normal samples. For this working dataset, we constructed a coexpression network consisting of 11,222 genes with expression variance over 0.5 across the 760 sepsis patient samples. The topological overlap matrix illustrates an apparent organizational structure of the sepsis gene coexpression network, demonstrating that sepsis configures an array of specific coexpression structure. In total 59 modules were detected with sizes ranging from 30 to 750 (Fig. 2A). Different coexpression modules are highlighted in distinct colors. The detailed procedure of module identification and the module dendrogram are shown in Material and Method section and Supplementary Fig. 1.
Using the same procedure, we also identified another set of gene module based on an independent microarray dataset GSE69528 for validation (Supplementary Figs. 2 and 3). Common genes detected in both datasets were used for the coexpression network construction. Only the reproducible modules were retained for the subsequent analysis to investigate the expression change of modules during disease progression. As shown in Fig. 2B, rows are modules identified from our primary dataset GSE65682, while columns are modules determined from the validation dataset GSE69528. Significance of pairwise module overlap was measured by the -log10 transferred hypergeometric test p-values. It is clear that a high reproducibility was achieved for the two module lists. 52 out of 59 modules have at least one significant (P < 0.01, hypergeometric test) overlapping modules in the validation dataset (Fig. 2C).
3. Establishment of sepsis modules
Using the t-test p-value of 0.01 and absolute fold change of 2 as thresholds, we screened 750 down-regulated DEGs and 391 up-regulated DEGs from the primary dataset (Fig. 3A). The down-regulated DEGs are significantly involved in biological processes like neutrophil mediated immunity, defense response to bacterium, platelet degranulation, etc. (Fig. 3B), while the up-regulated DEGs are enriched in the functional categories of T cell activation, Lymphocyte activation, T cell receptor signaling pathway, etc. (Fig. 3C).
To determine the expression difference of modules between the sepsis and normal samples, we adopted the hypergeometric test to evaluate whether a module significantly overrepresents up-regulated or down-regulated DEGs. A module is referred to as Differentially Expressed Module (DEM) if a substantial large fraction of genes is differentially expressed, indicating distinct expression pattern between the sepsis patients and the healthy samples. Thus, some modules are over expressed in sepsis whereas some others are low expressed. In total ten up-regulated and 13 down-regulated DEMs were detected from the sepsis coexpression network (Fig. 2D).
Moreover, to identify the modules associated with clinical outcome in sepsis, we performed multivariate Cox regression analysis to assess the significance of the correlation between patient overall survival and the Eigengene (EG) values of each module. As shown in Fig. 3E, the risk scores of the EG values were sorted with corresponding survival information for module 32. The dotted line in the middle of the figures corresponds to the median of EG value, which stratifies the sepsis patients into two subgroups with high and low risk. Figure 3D illustrates the Kaplan–Meier curves for the patients with clinical information according to the EG of M32. Patients with high EG values show much poorer prognostic than those with low EG values, indicating that the dysfunction or oncogenic of M32 is close related to the prognosis of sepsis patients. The expression profilings of the DEGs in module 32 are illustrated as a heatmap in Fig. 3F. In total, we identified 14 modules from sepsis samples whose EGs are substantially correlated with patient overall survival and we defined them as survival-associated modules (SAM).
4. Characteristic of sepsis modules
31 sepsis modules, including both SAM and DEM, were screened from the primary dataset, implying novel gene signatures associated with sepsis pathology. We note an overlap of six modules (around 20%) between the two sets of SAM and DEM. Three of them are down-regulated, i.e., M22, M32, and M4, while the other three are up-regulated, i.e., M15, M23, and M47. The down-regulated module M22, for instance, consists of 22 genes closely co-expressed with each other; nine out of them are down-regulated DEGs playing as hub genes in the module (Fig. 4A). Kaplan–Meier curves were plotted for the rank-ordered Eigen Module values of M22 to carry out the 28-day survival analysis (Fig. 4B). It is apparent that patients with high EG value have substantial shorter survival time than those with low EG value. M22 are mainly implicated in biological processes like T cell activation, regulation of lymphocyte activation, leukocyte cell-cell adhesion, etc. (Fig. 4C). For the up-regulated module M47, it has 26 gene members and nine of them are DEGs that up-regulated and more topologically important (Fig. 4D). The Kaplan–Meier curves show that patients with high EG value of M47 have a significantly worse prognosis than the low- EG ones (Fig. 4E). M47 are involved in function categories of neutrophil mediated immunity as well as neutrophil activation and degranulation (Fig. 4F). Some other sepsis modules and their corresponding Kaplan–Meier curves are shown in Supplementary Figs. 4 and 5.
Interestingly, we found that DEGs in the sepsis modules, either up-regulated or down-regulated, are prone to play a central role topologically in comparison to the non-DEGs. For instance, DEGs in M47 have an average correlation coefficient of 0.6 while the connectivity is merely 0.45 for the other genes (p < 3.96E-05, Mann–Whitney U test). Similar results can be observed for the other four modules of both SAM and DEM, including M15, M22, M23, and M45 (Fig. 4G). The DEGs of M32 are overall more correlated with module gene members in expression, although not significantly. Generally, the average correlation coefficient of module DEGs is significantly higher than that of the other genes, suggesting that genes differentially expressed may drive the biogenesis or dysfunction of the coexpression gene modules.
5. Sepsis lncRNA candidates
We constructed a lncRNA-module interaction network including 251 interactions between 23 sepsis modules and 201 lncRNAs (Fig. 5A). Although most of the lncRNAs regulate none or merely a single sepsis module (Fig. 5B), FENDRR, MALAT1, TUG1, CRNDE, and ANCR connect multiple sepsis modules with the connectivity of 14, 10, 10, 8, and 5, respectively, which are expected to have high potentials to be involved in the sepsis progress.
A subnetwork concentrating on the five candidate sepsis lncRNAs and their regulated sepsis modules is shown on the bottom panel of Fig. 5A. FENDRR, the FOXF1 adjacent non-coding developmental regulatory RNA, plays as a hub regulator mediating 14 sepsis modules in the lncRNA-module interaction network. Both MALAT1 and CRNDE regulate ten sepsis modules and they share seven common modules, six out of them are down-regulated. TUG1 interacts with six down-regulated and two up-regulated DEMs, indicating that TUG1 tend to involve in under expression pathways. In contrast, ANCR (Angelman syndrome chromosome region) links four up-regulated DEMs and only one down-regulated DEMs, suggesting that ANCR may mediate some over expression pathways implicated in sepsis.
Furthermore, we investigated the regulatory mechanism of how the lncRNAs regulate the modules in sepsis from the perspective of competing endogenous RNAs (ceRNAs), which impact the translation rate of mRNAs by competing for shared miRNAs [21, 22]. lncRNAs are able to share the same miRNA response elements with mRNAs transferred by the sepsis modules, thereby sponging miRNAs intended to bind to these mRNAs and depressing the overall expression level of sepsis modules (Fig. 5C). Several miRNAs have been previously validated as potential regulators in sepsis, such as miR-34a, miR-206, and miR-199b-5p. By these miRNAs, we found that CRNDE regulates module 5 and module 20 through miR-199b-5p (CRNDE ◊ miR-199b-5p ◊ M5/M20), indicating that CRNDE acts as a miRNA sponge of miR-199b-5p and thereby modulating the transcripts of genes in module 5 and module 20 (Fig. 5D). Similarly, MALAT1 regulates module 7 and module 20 through the miR-206-mediated lncRNA-mRNA interactions (MALAT1 ◊ miR-206 ◊ M7/M20). The in-detail information of these candidate lncRNAs including secondary structure and miRNA targets are provided in Fig. 5E and F.
6. Validation using independent lncRNA datasets
Following a computational pipeline (see Methods), we reannotated the probes from three array datasets to obtain the lncRNA expression profilings. We separately screened the differentially expressed lncRNAs (DELs) from the datasets of GSE95233, GSE57065, and GSE28750. Our finding shows that four out of the five sepsis lncRNA candidates are differentially expressed in at least one independent dataset except for ANCR, whose probes were not covered by the array platform (Fig. 6). Specifically, CRNDE is significantly up-regulated in the sepsis samples of all the three datasets, in which the (log2 transferred) fold changes are 0.43, 0.55, and 0.66, respectively. FENDRR and MALAT1 are significantly down-regulated in two datasets, while TUG1 is differentially expressed only in the dataset of GSE95233.
CRNDE, an oncogene that is usually overexpressed in tumor cells, contributes a lot to cellular proliferation, migration, invasion, and apoptosis [40]. More importantly, CRNDE can modulate the TLR3-NF-κB cytokine signaling pathway to trigger inflammation [41, 42], suggesting that CRNDE may serve as a regulator in sepsis. In sepsis, genes or gene modules inducted by MALAT1 may modulate their expression pattern in endothelial cells, which is critical as MALAT1 has been reported to mediate inflammation in traumatic brain injury [42]. Also, it was reported that TUG1 is able to affect the development of sepsis-associated acute kidney injury via modulating NF-κB pathway [43]. FENDRR has never been mentioned in the induction or progress of sepsis before, so it can be consider as a novel lncRNA regulator for sepsis.