Identification of distinct gene sets based on GSVA
The flowchart of this study is shown in Figure 1. All patients in GSE6535 were grouped according to infection status and analyzed by GSVA. The variation in gene set activity was estimated, and the resulting matrix of enrichment scores is depicted as a heatmap (Fig. 2). Next, the enrichment scores (ES) of gene sets were compared between Gram-positive and Gram-negative sepsis patients. In total, 373 differential gene sets were identified. The ES heatmap showed that the ES patterns may readily distinguish Gram-positive sepsis patients from Gram-negative sepsis patients (Fig. 3a). In addition, we identified 640 differential gene sets between Gram-negative sepsis patients and mixed infection patients and 682 differential gene sets between Gram-positive sepsis patients and mixed infection patients, which are also displayed as heatmaps (Fig. 3b, c). The top ten representative differential gene sets for each comparison are listed in Additional file 1: Table S1. After intersection analysis, two distinct immunologic gene sets, “GSE13522_CTRL_VS_T_CRUZI_Y_STRAIN_INF_SKIN_129_MOUSE_UP” and “GSE23308_WT_VS_MINERALCORTICOID_REC_KO_MACROPHAGE_CORTICOSTERONE_TREATED_DN”, were identified (Fig. 3d). The detailed expression in each infected patient is also shown in a heatmap, in which Gram-positive sepsis patients exhibit the highest expression of gene set “GSE13522” and the lowest expression of gene set “GSE23308” (Fig. 3e).
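As an illustration only, the intersection step can be sketched in a few lines of Python. The sketch assumes that the intersection was taken across the three pairwise comparisons; the gene set names are hypothetical placeholders, not the gene sets reported here, and `shared_gene_sets` is an illustrative helper rather than part of any published pipeline.

```python
# Hypothetical placeholder names; the real lists would come from the
# three pairwise GSVA comparisons described in the text.
differential_sets = {
    "gram_pos_vs_gram_neg": {"SET_A", "SET_B", "SET_C", "SET_D"},
    "gram_neg_vs_mixed": {"SET_B", "SET_C", "SET_E"},
    "gram_pos_vs_mixed": {"SET_B", "SET_C", "SET_F"},
}


def shared_gene_sets(comparisons):
    """Return the gene sets reported as differential in every comparison."""
    groups = list(comparisons.values())
    if not groups:
        return set()
    return set.intersection(*groups)


common = shared_gene_sets(differential_sets)
print(sorted(common))
```

With the placeholder input, only the names present in all three comparisons survive, which mirrors how a Venn diagram reduces several long lists to a small shared core.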
PPI network construction, module analysis and hub gene identification
Next, the PPI network of the two distinct gene sets (335 genes) was constructed using STRING. Based on the information from this public database, a total of 242 nodes and 479 protein pairs were obtained after isolated genes without interactions were removed. To identify hub genes, the plug-in “cytoHubba” was used to parse the network, and the top 5 hub genes were identified according to the “Degree” algorithm (Fig. 4a): SRC (degree = 33), IL1B (degree = 20), CD40 (degree = 20), TLR6 (degree = 16) and CCL2 (degree = 16). Module analysis was then performed with MCODE, and three modules were identified. Module 1, the most significant module, was located in the center of the entire PPI network and included 8 genes and 24 edges (Fig. 4b). Module 2 and module 3 had 11 nodes (Fig. 4c) and 6 nodes (Fig. 4d), respectively, and contained several hub genes such as IL1B, TLR6 and CCL2 (Fig. 4d).
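The degree-based ranking can be illustrated with a minimal Python sketch. It is not the cytoHubba implementation; it only assumes that the degree of a node is the number of distinct interaction partners in an undirected network. The function name `rank_by_degree` and the toy edges are hypothetical and are not taken from STRING.

```python
def rank_by_degree(edges, top_n=5):
    """Rank nodes of an undirected network by their number of distinct partners.

    Nodes without any edge never appear in `edges`, so isolated nodes are
    excluded automatically.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {node: len(partners) for node, partners in neighbours.items()}
    # Sort by decreasing degree, breaking ties alphabetically.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Toy edges with placeholder names (assumption, for illustration only).
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_D"),
    ("GENE_B", "GENE_E"),
]
top_hubs = rank_by_degree(toy_edges, top_n=3)
print(top_hubs)
```

Ties, such as the equal degrees reported for IL1B and CD40, are ordered alphabetically in this sketch; other tools may break ties differently.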
Screening differential gene sets with GSEA and GSVA
To further elucidate the pathways that differ between Gram-positive and Gram-negative sepsis, GSEA was performed between the two groups in GSE6535. GSEA evaluates microarray data by performing unbiased global searches for genes that are coordinately regulated within predefined gene sets; here, three collections were examined (hallmark, c2 and c7). The results showed significant differences in enrichment. Analysis of the hallmark gene sets revealed four significantly enriched gene sets: HALLMARK_APICAL_JUNCTION, HALLMARK_NOTCH_SIGNALING, HALLMARK_KRAS_SIGNALING_DN and HALLMARK_INTERFERON_ALPHA_RESPONSE. Enrichment analysis of c2 identified 226 differential gene sets, and that of c7 identified 199. Representative plots of the gene sets are shown in Figure 5. The gene sets identified by both algorithms, GSVA and GSEA, were then determined by Venn analysis. A total of 19 gene sets were obtained (Table 1), most of which are related to immunity.
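For readers unfamiliar with how GSEA scores a gene set, the running-sum statistic behind the enrichment score can be sketched as follows. This is a simplified illustration, assuming genes are already ranked by a per-gene statistic; it omits the permutation test and normalization that the GSEA software applies, and the function name, parameters and toy data are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene names ordered from most positive to most negative score.
    scores: ranking statistic for each gene, in the same order.
    gene_set: names of the genes belonging to the set being tested.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0:
        raise ValueError("all member genes have a zero score")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_total  # step up at a member gene
        else:
            running -= 1.0 / n_miss  # step down at a non-member gene
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Toy ranking (assumption, for illustration only).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
stats = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, stats, {"g1", "g2"})
es_bottom = enrichment_score(genes, stats, {"g5", "g6"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one, which is the sense in which member genes are "coordinately regulated".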
GO and KEGG enrichment analysis
To gain more biological insight into the screened gene sets, GO annotation and KEGG pathway enrichment analyses were conducted with the 19 gene sets. The top 10 enriched GO terms and KEGG pathways are presented in Figure 6. GO analysis showed that the most enriched MF terms were actin binding, cadherin binding, cytokine receptor binding and protein-macromolecule adaptor activity (Fig. 6a). In the CC category, the top five significantly enriched terms were cell-substrate junction, focal adhesion, collagen-containing extracellular matrix, cell leading edge and membrane region (Fig. 6b). In the BP category, the genes were mainly enriched in response to virus, defense response to virus, response to interferon-gamma, cellular response to interferon-gamma and NF-κB signaling (Fig. 6c). KEGG pathway analysis demonstrated that the genes were mainly enriched in MAPK signaling pathway, Pathogenic Escherichia coli infection, Salmonella infection, Epstein-Barr virus infection and Influenza A (Fig. 6d).
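Over-representation analyses of GO terms and KEGG pathways are commonly based on a hypergeometric test. The sketch below shows that calculation under this assumption; the tool and settings actually used are not stated in this section, the function name is hypothetical, and the numbers are toy values rather than results from this study.

```python
from math import comb


def hypergeom_pvalue(n_background, n_term, n_query, n_overlap):
    """Probability of observing at least `n_overlap` term genes in the query.

    n_background: number of genes in the background universe.
    n_term: number of background genes annotated to the term.
    n_query: number of genes in the query list.
    n_overlap: number of query genes annotated to the term.
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (assumption): 20 background genes, 5 annotated to the term,
# 4 genes in the query list, 2 of which carry the annotation.
p_value = hypergeom_pvalue(20, 5, 4, 2)
print(round(p_value, 4))
```

In practice the resulting P values are adjusted for multiple testing across all terms before the top terms are reported.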
Verification of differential gene sets with GSE13015
The differential gene sets between patients with Gram-positive sepsis and Gram-negative sepsis were further verified using dataset GSE13015. According to GSEA, there were 7 significantly enriched gene sets in the hallmark gene sets, 296 in the c2 gene sets and 404 in the c7 gene sets. Although no gene set was shared with dataset GSE6535, 31 differential gene sets were confirmed after Venn analysis based on GSVA and GSEA (Table 1), including 5 in the c7 gene sets and 26 in the c2 gene sets. Further analysis showed that 10 differential gene sets were common to GSE13015 and GSE6535 based on GSVA (Fig. 7a). The results of the gene set comparison and the corresponding P values are also shown (Fig. 7b). Compared with Gram-negative sepsis patients, the expression of most gene sets was increased in Gram-positive sepsis patients (6 in GSE6535 and 7 in GSE13015).
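The direction of change summarized above can be tallied with a short Python sketch. It assumes that direction is judged by comparing the mean per-sample score of each gene set between the two groups; the helper name, gene set names and scores are hypothetical placeholders, not values from GSE6535 or GSE13015.

```python
from statistics import mean


def sets_higher_in_first(scores_first, scores_second):
    """Return gene sets whose mean score is higher in the first group."""
    shared = set(scores_first) & set(scores_second)
    return sorted(
        name
        for name in shared
        if mean(scores_first[name]) > mean(scores_second[name])
    )


# Placeholder per-sample scores for each gene set (assumption).
gram_positive = {"SET_A": [0.4, 0.6], "SET_B": [-0.2, 0.0], "SET_C": [0.1, 0.3]}
gram_negative = {"SET_A": [0.1, 0.2], "SET_B": [0.3, 0.5], "SET_C": [-0.1, 0.0]}

higher = sets_higher_in_first(gram_positive, gram_negative)
print(f"{len(higher)} of {len(gram_positive)} gene sets higher in Gram-positive: {higher}")
```

Running the same tally separately on each dataset gives one count per dataset, which is how a pair of numbers like those reported for GSE6535 and GSE13015 would be obtained.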