Immune and stromal scores are associated with AML clinical parameters
We obtained gene expression profiles and clinical information of 138 AML patients from TCGA database (Supplementary Material 1). Among them, 74 (53.6%) were male and 64 (46.4%) were female, with a median age 55.5 (range 21-88) years old. According to the FAB classification, there were 13 cases of M0, 32 M1 cases, 34 M2 cases, 15 M3 cases, 27 M4 cases, 14 M5 cases, 2 M6 cases, and 1 M7 case. The ESTIMATE algorithm was used to calculate the immune scores and stromal scores of AML patients. The median immune score was 3027.01 (range from 1686.31 to 4291.79) and the median stromal score was -1024.97 (range from -1910.13 to 297.36).
We analyzed the relationship between immune scores and stromal scores and clinical parameters of AML patients. Cases with M4 subtype AML had the highest immune scores while cases with M3 subtype had the lowest immune scores (P < 0.0001; Figure 1A). Similarly, M4 cases had the highest stromal scores, whereas M0 subtypes had the lowest (P < 0.0001; Figure 1B). According to cytogenetics, AML patients were divided into three groups: favorable, intermediate/normal and poor. There was an obvious correlation between the cytogenetic risk and the immune scores (P =0.0141; Figure 1C), but no significant correlation between the cytogenetic risk and the stromal scores was observed (P =0.6192; Figure 1D).
Using the median immune or stromal score as a threshold, AML patients were divided into two groups with low immune/stromal score and high immune/stromal score. Survival analysis showed that the survival rate of AML patients with low immune scores was significantly higher than that of patients with high immune scores (P = 0.0224; Figure 1E). However, there was no significant difference in survival between patients with low stromal scores and those with high stromal scores (P = 0.4166; Figure 1F).
Identification of differentially expressed genes (DEGs) based on immune scores and stromal scores
Setting the cut-off criteria as |log FC|> 1.3 and FDR < 0.05, we identified 1399 mRNAs (Figure 2A) and 258 lncRNAs (Figure 2B) based on immune scores, and 1166 mRNAs (Figure 2C) and 212 lncRNAs (Figure 2D) based on stromal scores. Setting the cut-off criteria as |log FC|> 2 and FDR < 0.05, we identified 26 and 17 miRNAs based on immune scores (Figure 2E) and stromal scores (Figure 2F), respectively. The DEGs of the low vs. high immune score or stromal score groups were illustrated in the heat map (Figure 2).
Functional enrichment analysis of DEGs
Based on the DAVID (The Database for Annotation, Visualization and Integrated Discovery) gene annotation tool, we performed gene ontology (GO) analyses of both upregulated and downregulated DEGs. The top 25 GO biological process indicated that the upregulated DEGs based on immune or stromal scores were primarily enriched in neutrophil degranulation, regulation of immune response, signal transduction and inflammatory response (Figure 3, A-B), while the downregulated DEGs based on immune/stromal scores were primarily enriched in rRNA processing, regulation of translation, regulation of transcription and cell differentiation (Figure 3, C-D). Subsequently, we performed KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment and interrelation analysis. KEGG analysis revealed that the upregulated DEGs were mainly enriched in infection, osteoclast differentiation, NOD-like receptor signaling pathway, hematopoietic cell lineage and cell adhesion molecules (CAMs) pathways (Figure 4, A-B), while the downregulated DEGs were mainly enriched in ribosome, metabolism, PI3K-Akt signaling pathway, transcriptional dysregulation in cancer and miRNAs in cancer (Figure 4, C-D). Above analyses revealed that these DEGs play a vital role in the development of AML and thus require further research to determine their biological contribution.
Construction of weighted correlation network analysis (WGCNA) and identification of key modules
Based on the results of survival analysis, DEGs based on immune scores were selected for subsequent analysis. The best β value in the lncRNA/mRNA coexpression network was 5, which was calculated using the 258 differential lncRNAs, 1399 differential mRNAs, and their expression data in leukemia samples. Next, the method of dynamic tree cutting was used to produce coexpression modules. Finally, 8 modules of lncRNA-mRNA coexpression networks were generated and the heat map plot of topological overlap matrix (TOM) was shown (Figure 5A). Each module was calculated and plotted with its corresponding clinical characteristics. Correlation analysis showed that Turquoise module displayed the highest relationship with AML immune scores (r=0.77), which included 760 mRNAs and 86 lncRNAs (Figure 5B). These 760 mRNAs were further used to perform the gene enrichment analysis. The genes were most related to neutrophil degranulation, immune response, inflammatory response, signal transduction and toll-like receptor signaling pathway (Figure 5C). In addition, genes were highly enriched in infection, osteoclast differentiation, NOD-like receptor signaling pathway, metabolic pathways and hematopoietic cell lineage by KEGG analysis (Figure5D).
ceRNA network construction
Since the Turquoise module showed the highest relationship with AML immune scores, we selected lncRNAs and mRNAs in the Turquoise module and 26 differentially expressed miRNAs based on immune scores to construct a ceRNA network. Firstly, based on the PITA and miRcode online database that matches potential miRNAs with lncRNAs, a total of 315 lncRNA-miRNA pairs contained 63 lncRNAs and 25 miRNAs. Then, we searched for the mRNAs targeted by the differentially expressed miRNAs using three target gene prediction websites, miRanda, miRWalk, and TargetScan. Using these websites, we detected 664, 671, and 655 target mRNAs, respectively. Based on the Venn intersection analysis, 289 target mRNAs were selected. Subsequently, we matched the predicted target genes with the mRNAs in the Turquoise module. Then, we constructed the ceRNA network by integrating the miRNA-lncRNA-mRNA interactions. At last, a final lncRNA-miRNA-mRNA ceRNA network was constructed with 33 lncRNAs, 21 miRNAs and 135 mRNAs (Figure 6A).
Protein-protein interaction (PPI) network analysis
To further explore the interplay among the mRNAs in ceRNA, we constructed a PPI network based on the STRING (The Retrieval of Interacting Genes) online database (Figure 6B). In the network, TLR8 (Toll Like Receptor 8), ICAM1 (Intercellular Adhesion Molecule 1), TLR6 (Toll Like Receptor 8), and IL10RA (Interleukin 10 Receptor Subunit Alpha) had higher degrees (16, 13, 10, and 10, respectively) (Supplementary Table 1). The genes encoding these proteins have been confirmed to be associated with immune microenvironment and leukemia progression[24-27].
Association between mRNAs, miRNAs and lncRNAs in ceRNA and overall AML survival
We further analyzed the prognostic values of mRNAs, miRNAs and lncRNAs in the ceRNA network. Subjects were divided into high-expression and low-expression cohorts according to the median value of these genes. For overall survival, the high-expression and low-expression cohorts were split for the log-rank test through the package survival in R software. 15 out of 33 lncRNAs (AC009948.5, CMAHP, CTA-331P3.1, FCGR2C, GRK6P1, LINC00539, LINC01272, MIPEPP3, PSMB8-AS1, RP11-266L9.5, RP11-320G10.1, RP11-421F16.3, RP11- 439E19.10, RP11-792A8.3 and STAG3L2), 2 out of 21 miRNAs (hsa-miR-125b-5p and hsa-miR-338-3p) and 31 out of 135 mRNAs (AGPAT3, ANKRD27, CBX2, CCND2, CD300LB, CYTH1, ERG, GDF11, IGF1R, KIAA0513, KIAA0930, LARP1, LFNG, LPCAT1, NREP, NUDT16, POU2F2, PPM1H, PTAFR, QSOX2, RAB3D, RALGPS2, SIGLEC7, SLC43A2, SRSF6, TNFAIP2, TNS3, TRIB1, ZBTB5, ZNF70, and ZNRF1) were associated with overall survival according to the log-rank test (P < 0.05). Representative genes are shown in Figure 7.