Identification of a prognostic gene signature associated with microenvironment in acute myeloid leukemia

Background ： Acute myeloid leukemia (AML) is a common hematologic malignancy with poor prognosis. Accumulating reports have indicated that the tumor microenvironment (TME) performs a critical role in the progress of the disease and the clinical outcomes of patients. To date, the role of TME in AML remains clouded due to the complex regulatory mechanisms in it. In this study, We identified key prognostic genes relate to TME in AML and developed a novel gene signature for individualized prognosis assessment. Methods: The expression profiles of AML samples with clinical information were obtained from the Cancer Genome Atlas (TCGA). The ESTIMATE algorithm was applied to calculate the TME relevant immune and stromal scores. The differentially expressed genes (DEGs) were selected based on the immune and stromal scores. Then, the survival analysis was applied to select prognostic DEGs, and these genes were annotated by functional enrichment analysis. A TME relevant gene signature with predictive capability was constructed by a series of regression analyses and performed well in another cohort from the Gene Expression Omnibus (GEO) database. Moreover, we also developed a nomogram with the integration of the gene signature and clinical indicators to establish an individually quantified risk-scoring system. Results: In the AML microenvironment, a total of 181 DEGs with prognostic value were clarified. Then a seven-gene ( IL1R2, MX1, S100A4, GNGT2, ZSCAN23, PLXNB1 and DPY19L2 ) signature with robust prediction was identified, and was validated by an independent cohort of AML samples from the GSE71014. Gene set enrichment analysis (GSEA) of genes in the gene signature revealed these genes mainly enriched in the immune and inflammatory related processes. The correlation between the signature-calculated risk scores and the clinical features indicated that patients with high risk scores were accompanied by adverse survival. Finally, a nomogram with clinical utility was constructed. Conclusion: Our study explored and identified a novel TME relevant seven-gene signature, which could serve as a prognostic indicator for AML. Meanwhile, we also establish a nomogram with clinical significance. These findings might provide new insights into the diagnosis, treatment and prognosis of AML. common clinical indexes to provide guidelines for individually clinical decisions.


Introduction
Acute myeloid leukemia (AML) is a hematologic malignancy with severe lethality. It is one of the most common types of blood leukemia in adults [1] and characterized by impaired hematopoiesis and extensive infiltration of immature myeloid hematopoietic cells, especially in the bone marrows [2]. The conventional chemotherapy for the majority of patients with de novo AML includes the "7+3" induction regimen [3,4], comprising 7 days of cytarabine and 3 days of anthracycline, and the following consolidation chemotherapy or haemopoietic stem cell transplantation (HSCT) [5,6]. Although many patients received initial remission with this standard strategy, either drug resistance or disease relapse occurred frequently and resulted in final treatment failure.
The treatment effects of AML are influenced by confounding factors.
Preceding researches have demonstrated that poor prognosis correlated intensively with advanced age, problematic physical status [7], medical comorbidities [8] and genetic characteristics, which refers to cytogenetics [9][10][11] and molecular abnormalities [12,13]. Based on these, targeted therapy specifically against cytogenetics and molecular aberrations are in vigorous clinical development currently [14][15][16]. Additionally, targeted treatment for tumor microenvironment(TME) has also been proposed and studied for the convincing evidence that the bone marrow microenvironment acts as a vital role in the initiation, progression, and relapse of AML [17,18].
The TME is defined as a complex regulatory network comprised of tumor cells and surroundings such as immune cells, mesenchymal stem cells, fibroblast cells, endothelial cells, inflammatory cells and extracellular matrix [19,20]. Among these non-tumor components in the TME, immune cells and stromal cells are two kinds of major participants that work together to impact tumorigenesis, tumor progression, tumor metastasis and therapeutic efficacy [20][21][22]

. The Estimation of Stromal and Immune cells in Malignant
Tumour tissues using Expression data (ESTIMATE) [23] method is an algorithm used to calculate immune and stromal scores of tumor samples to predict the infiltration of immune and stromal cells.
Previous studies have widely applied the ESTIMATE algorithm and clarified the meaningful relationship between microenvironment and multiple tumors, including glioblastoma [24], lung adenocarcinoma [25], gastric cancer [26], ovarian cancer [27],etc. The role of TME in AML has been explored and become a hotspot as well. Components in the bone microenvironment have been demonstrated to increase the proliferation of AML cells [28,29] and significantly impeded the therapy efficiency [30,31]. Targeted drugs against the TME of AML have been investigated and displayed good potential for clinical applications [32]. Considering the complex regulatory mechanisms of TME, we therefore believed that elucidating a definite impact of microenvironment on AML is still warranted and might provide new thoughts for the targeted treatment in AML.
In this study, the gene expression profiles and corresponding clinical data of AML samples were obtained from the Cancer Genome Atlas (TCGA) [33] and Gene Expression Omnibus (GEO) [34]. Then the immune and stromal scores for AML cases from TCGA were calculated by ESTIMATE algorithm based on gene expression data. A list of microenvironment relevant differentially expressed genes (DEGs) were screened based on the immune and stromal scores. Several prognostic DEGs were subsequently identified with survival analyses. We further constructed a seven-gene signature with predicting value and validated its robustness using another cohort of AML patients from GEO. Finally, a nomogram with common clinical indexes were generated and revealed favorable predictive performance.

Database and processing
The Construction of the TME-related gene signature with the prognostic DEGs The mRNA data from the TCGA cohort were used as the training set.
DEGs with predictive capability that determined by the K-M analysis were incorporated into the univariate Cox regression analysis to identify the prognosis relevant genes. We then performed the least absolute shrinkage and selection operator (LASSO) regression with the glmnet package in R to optimize the COX model. The selected genes were later used for multivariate Cox regression analysis to construct a TME-related gene signature. The risk score of individual patient in the gene signature were calculated as follows: risk score = Gene1 expression*β1 + Gene2*β2 +...+ Genen*βn, in which β represents the regression coefficient of each variable.
Verifying the robustness of the gene signature

Gene set enrichment analysis
We also explored principal pathways that were up-or down-regulated ranged from different expression levels of genes in the risk model by gene set enrichment analysis [39] (GSEA)(GSEA 4.0.3). The javaGSEA Desktop Application was downloaded from https://www.gsea-msigdb.org/gsea/index.jsp.
The enrichment FDR value was identified based on 1,000 permutations, and FDR < 0.05 was considered to be statistical significance.
Deciphering the immune landscape via the CIBERSORT Patients were separated into high and risk groups based on the seven-gene signature, we then deciphered the unique immune infiltration characteristics in the two groups through the CIBERSORT [40] algorithm. demonstrated that patients with low immune scores showed longer overall survival than patients with high immune scores, although this difference showed no statistical significance (Fig. 1G). As for the stromal groups, no revealing difference was noted in the two groups (Fig. 1H).

Identification of differentially expressed genes (DEGs) related to TME in AML patients
To identify differentially expressed genes (DEGs) associated with TME in AML patients, differential expression analysis was conducted after classifying AML samples into low and high immune/stromal score groups. The cut-off criteria was set as FDR<0.05 and |log2fold change (FC)|>1.0. In all, we identified 885 upregulated genes and 450 downregulated genes in the high immune score group, compared with the low immune score group ( Fig. 2A).
Likewise, comparison between the high and low stromal score groups revealed 776 upregulated genes and 369 downregulated genes (Fig. 2B). The DEGs retrived from low and high immune/stromal score groups were presented in the heatmaps respectively (Fig. 2C, D). The intersected genes from both stromal and immune groups were achieved through Venn diagrams for further analysis, including 662 jointly upregulated genes and 196 commonly downregulated genes ( Fig. 2E,F). These crossed DEGs were chosen for further discussion.

Survival analysis and functional annotation of crossed DEGs
To determine the prognostic impact of intersective DEGs on OS, we firstly Establishment and validation of a seven-gene signature based on TME analysis Firstly, we conducted univariate Cox proportional hazard regression analysis among these 181 prognostic DEGs. The resulting 88 genes altogether showed remarkable significance on predictive role (P<0.05; Table S1). These genes were subsequently incorporated into the LASSO regression to avoid overfitting problems (Fig. 5A). The multivariate Cox regression analysis were finally constructed with the screened genes from LASSO regression analysis to establish a gene signature with prognostic value (Fig. 5B, Table S2). The risk score for each patient was calculated as follows: risk score = 0.  (Fig. 5D), indicating the robust ability of the gene signature to distinguish inferior survival events in patients with high risk scores. As the risk score increased, the expression level of IL1R2, MX1, S100A4 and GNGT2 were upregulated while ZSCAN23, PLXNB1 and DPY19L2 were decreased (Fig. 5E). Concomitantly, patients with shorter OS were accumulated in the high risk score group (Fig. 5F).
Another cohort of AML patients from the GSE71014 data was treated as testing group. The AUCs of the ROC curves in the GE71014 for predicting the 1-, 3-and 5-year survival were 0.662, 0.665 and 0.679 respectively (Fig. 5G).
Likewise, as the risk score rises, the expression of IL1R2, MX1, S100A4 and GNGT2 were increased while ZSCAN23, PLXNB1 and DPY19L2 declined (Fig.   5H ). Meanwhile, patients with high risk scores usually kept pace with worse clinical outcomes (Fig. 5I, J). These results verified the efficient performance of the seven-gene signature in prognosis prediction.
Gene set enrichment analysis of genes in the gene signature GSEA was implemented to explore the potential mechanisms of the prognostic effects of the TME-related gene signature. KEGG sets analysis revealed that high expression of IL1R2, MX1, S100A4, GNGT2 and low expressed ZSCAN23, PLXNB1 and DPY19L2 mainly participated in the activation of immune and inflammatory related pathways, such as B cell receptor signaling pathway, T cell receptor signaling pathway, TOLL like receptor signaling pathway and chemokine signaling pathway. Moreover, highly expressed IL1R2 was observed to be closely correlated with cancer pathways, including pancreatic cancer and renal cell carcinoma process (Fig. 6).
CIBERSOT algorithm was further applied to better understand the discrepancy of the immune landscape between the high and low risk score groups.
Interestingly, the result indicated that activated CD4 memory T cells, monocytes presented a higher proportion in the high risk score group, although this dominance revealed no survival influence. As for the low risk score group, there existed elevated plasma cells, CD8 T cells, CD4 memory resting T cells, follicular helper T cells, resting mast cells, activated mast cells and eosinophils (Fig. 7A).
And patients with higher proportion of resting mast cells or activated mast cells were considered to deserve better survival (Fig. 7B).

Correlation of gene signature and clinical characteristics
Patients were divided into high and low risk score groups based on the risk model, and the significance test was subsequently applied to investigate the correlation of the gene signature and clinical factors. As demonstrated in the Table1, more of the elderly patients (P=0.000) or patients with normal or poor karyotype (P=0.002) were classified into the high risk score group. Moreover, the OS of patients in the high risk score group were significantly shorter than that of patients in the low risk group, whether for the median OS or range consideration (P=0.000). We were interested to find that patients with low risk score were observed with higher bone marrow blasts (P=0.003). No significance differences in the gender distribution, FAB monocyte, peripheral blasts percentage and the molecular abnormalities were identified between the low and high risk groups.

Nomogram deduction by integrating gene expression with clinical indicators
We further constructed a nomogram via the Cox regression analyses for clinical utility. This nomogram was comprised of the seven gene transcriptome and several clinical variables, including age, gender, cytogenetics, bone marrow blasts, peripheral blasts, FAB subtypes and molecular aberrations (Fig. 8A).
Both the univariate and multivariate Cox regression models revealed that the the seven-gene signature could act as a powerful biomarker for predictive application (Fig. 8B, C).

DISCUSSION
Accumulating evidence indicates that abnormal actions in the tumor microenvironment are essential mechanisms that impact the development, progression and relapse of AML [20][21][22]. Thus, drugs that specifically target the TME have been studied recently and some of which have already been investigated to improve the clinical efficacy of AML patients [32]. In this study, we aimed to identify the prognostic biomarkers of TME in AML to further enrich the insights of target therapy and provide feasible strategy for clinical practice by analyzing the transcriptome and clinical data retrieved from the TCGA and GEO databases. Previous study have demonstrated that overexpressed S100A4 was highly prevalent in AML and caused disappointing clinical events [50,51]. Although several genes among these still retained limited evidence regarding the impacts on AML, the correlation between these genes and other tumors or immune response are experiencing a universal investigation. High IL1R2 expression was observed in the Hodgkin lymphoma (HL) and was deemed as a novel risk factor in HL [52]. Additionally, previous study have already pinpointed the contribution of IL1R2 in breast tumorigenesis, demonstrating its potential as a drug target [53]. MX1, namely MxA, a kind of GTPase protein that accumulates in the cytoplasm in response to interferon alpha/beta stimulation, displaying antiviral activity against several RNA viruses [54]. MX1 was considered to involve in the tumor invasion and lymphatic metastasis in colorectal carcinoma [55]. GNGT2 has been shown to promote the proliferation of esophageal cancer cells [56]. In our study, the role of PLXNB1 has been viewed as a protective factor.
Conversely, one research revealed that the crosstalk operated by the CD100+ leukemic cells and PLXNB1 might facilitate the proliferation and survival of chronic lymphocytic leukemia cells [57]. The controversial performance might be brought about due to the unique features in the different disorders, and explicit exploration is needed in the future.
As expected, GSEA indicated that high expression of IL1R2, MX1, S100A4, In this study, we firstly explored and identified the TME-related DEGs in AML patients from the TCGA database. Then we investigated the prognostic role of these DEGs. A gene signature with predictive value was later constructed and its efficacy and accuracy was validated by another cohort from the GEO datasets. We finally established a nomogram combing the seven-gene signature with clinical features in favor of supplying diagnostic and prognostic relevant information effectively. Our results suggested that genes involved in the signature showed significant correlation with clinical outcomes of AML and displayed robust potential as biomarkers for targeted therapy in AML.

Conclusion
In conclusion, we explored the prognostic value and pathogenic mechanisms of microenvironment in AML via the transcriptome analysis. A seven-gene signature with precise predictive capability, including IL1R2, MX1, S100A4, GNGT2, ZSCAN23, PLXNB1 and DPY19L2, was established and was combined with common clinical indexes to provide guidelines for individually clinical decisions.
Our work might also refined the insights of AML microenvironment theoretically.   Table 1 Relationship between risk scores and clinical characteristics.
Supplement Table S1 A total of 88 genes showed predictive value via univariate Cox proportional hazard regression analysis.
Supplement Table S2 The coefficents of genes in the prognostic gene signature.