Construction of Immune-Related Prognostic Signatures in Acute Myeloid Leukemia

Background: Acute Myeloid Leukemia (AML) is characterized as a type of hematological malignancy with poor survival. Accumulated evidence showed that dysregulated immune activities contribute to the pathogenesis of AML and accelerate the development of chemotherapy resistance. Thus, we aimed to construct prognostic signatures based on patients’ immune features to sort out the high-risk group and to identify survival-related checkpoint molecules as potential therapeutic targets. Methods: In the current study, we developed two prognostic signatures based on immune genes and inltrated fraction of immune cells, respectively, using a least absolute shrinkage and selection operator model, and Cox regression analysis on 415 samples obtained from TCGA and GEO databases. Results: We found the optimum strategy for predicting patients’ survival is combined using these two prognostic immune-related signatures. Through our established signatures, we classied patients into Favorable Risk group and Poor Risk group, who showed signicantly different OS and DFS. We further demonstrated the checkpoint molecules’ prole in different risk groups. Conclusions: we constructed a powerful prognostic tool here to help classify high-risk patients in early-stage, who may benet from additional immune therapies by targeting identied checkpoint molecules.


Background
Acute Myeloid Leukemia (AML), the most frequent acute leukemia in adults, is a type of hematological malignancy, characterized by proliferation and accumulation of abnormal hematopoietic progenitor cells in the bone marrow. These proliferated nonfunctional leukemia cells interfere with the normal hematopoietic system and then result in life-threatening symptoms such as immunode cient, anemia, and thrombocytopenia [1,2,3]. Although considerable efforts have been devoted, very limited advances have been observed in AML treatment during the past few decades, with < 50% overall 5-year survival rate in AML patients. The most common clinical chemotherapy in AML, consisting of sequential courses of cytarabine and daunorubicin, showed a high rate of resistance and relapse [4].
Although the cause of leukemia hasn't reached an agreement, the proposal can be divided into three groups: (1) the instinct gene mutation and translocation of AML cells, including epigenetic dysregulation; (2) immune dysregulation; (3) bone marrow microenvironment dysregulation [5,6].With blossoming studies revealed that cancer immunity contributes to tumor pathogenesis, many immunotherapy approaches are introduced into AML clinical management and have been proved a promising outcome [7,8,9]. In contrast to solid tumors, AML cells express checkpoint inhibitor receptors and ligands, which can be potential targets in clinical therapies [10]. For better improving the outcome of AML patients, there is an urgent need for sorting out high-risk AML patients and providing them additional therapeutic schemes. Therefore, in this study, we investigated 415 AML cases with survival data from two independent cohorts, The Cancer Genome Atlas (TCGA) and GSE 12417, and then used ImmPort database and CIBERSORT method to generate two different immune-related prognostic signatures to predict the outcome of AML patients and sort out the most survival-related immune checkpoint molecules [11,12]. These newly constructed signatures will help clarify AML patients into different prognosis and those high-risk patients may bene t from additional treatments targeting selected checkpoint molecules.

Study population and immune gene sets
We obtained RNA expression and associated AML patients' clinical information from two public datasets, The Cancer Genome Atlas (TCGA, https://gdc.nci.nih.gov) and Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). TCGA data consists of 173 cases while GSE12417 data contains 242 samples. GSE12417 data was obtained using the GPL96 platforms (Affymetrix Human Genome U133A Array, 163 samples), GPL97 platforms (Affymetrix Human Genome U133B Array, 163 samples) and GPL570 platforms (Affymetrix Human Genome U133 Plus 2.0 Array, 79 samples). The genes detected with more than one probe in GSE12417 were calculated by mean expression. The immune-related gene sets were obtained from the ImmPort database (http://www.immport.org/).

Immune genes related Signature Construction
We rst matched RNA expression of TCGA samples with obtained immune-related gene sets under the standard that all genes' expression should > 0 in at least half samples. Those low expressed genes which unmeet the standard were ltered out. Then we generated a univariate Cox proportional regression model to analyze the relevance between each immune gene and AML patients' overall survival time (OS). Those signi cant OS-related immune genes were then analyzed by the least absolute shrinkage and selection operator (LASSO) method to generate most survival predictive immune genes in AML patients. Next, we further conducted a stepwise Cox proportional hazards regression model on those LASSO selected genes and generated a risk score formula on basis of genes' expression and Cox regression coe cients: Risk Score = (gene1 expression *gene1 coef) + (gene2 expression *gene2 coef) +… (geneN expression *geneN coef).

Immunotype Signature Construction
In this study, we used the CIBERSORT method to analyze the abundance of 22 types of tumor-in ltrating immune cells in the TCGA cohort. CIBERSORT is a bioinformatics tool that employs a deconvolution algorithm to quantify the cellular composition of in ltrated immune cells by RNA expression. Then the LASSO Cox regression model was conducted to determine the most OS-related immune cells in AML patients. The immunotype related prognostic formula was generated based on the LASSO model: Immunotype Index = Cell1 fraction × Cell1 coef + Cell2 fraction × Cell2 coef+…+ Cell5 fraction × Cell5 coef.
Finally, the GSE12417 cohort was used to validate the prognostic ability of the constructed immunotype signature.

Survival analysis
To assess the prognostic value of each constructed model, we plotted receiver operating characteristic (ROC) curves and obtained area under the curve (AUC) of ROC using SPSS software. The maximum Youden Index corresponded risk score was set as the cut-off value, with which patients were categorized as high risk or low risk. Then Kaplan-Meier plots were employed to show patients' Overall Survival (OS) and Disease-Free Survival (DFS) between different risk groups. The statistical signi cance was tested by the log-rank test. P < 0.05 was considered a signi cant difference in all statistical settings.

Function enrichment analyses
We used Database for Annotation, Visualization and Integrated Discovery (DAVID, http://davidd.ncifcrf.gov/) to analyze the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of AML signi cant OS-related immune genes.

Construction of immune genes signature
We rstly constructed an immune gene signature based on the TCGA cohort. We obtained all gene expression data and clinical information of 173 patients and summarized their clinicopathologic characteristics in Table 1. This TCGA cohort is comprised of 92 males and 81 females, whose median age is 58 years old. As for cytogenetic risk classi cation, 31 patients (18.5%) are identi ed as "Good", 101 patients (58.38) are "Intermediate" and 37 patients (21.395) are "Poor", and with three patients are "unclear". Among whom, 73 patients (42.2%) underwent different types of transplant therapy. The median overall survival time is 17.1 months, ranging from 0 to 118.1 months and 114 patients (65.9%) die during the follow-up. For exploring the prognostic value of immune genes in AML patients, we employed a univariate Cox proportional regression model to assess the correlation of immune genes that are acquired from the ImmPort database, with OS in the TCGA cohort. Figure 1a shows the expression pro le of 207 immune genes which were statistically signi cant related to OS. We then conducted GO and KEGG analysis to enrich those signi cant immune genes to related biological processes and pathways. Figure 1b and 1c showed that they primarily activate cytokine-cytokine receptor interaction and immune response. Then we used the LASSO Cox regression method to further select the most predictive immune genes in AML patients. With one SE above the minimum criteria in LASSO Cox regression, we got 15 genes in the model: ANGPTL3, CALCRL, CANX, CLEC11A, CRLF3, CSRP1, FAM19A5, FGF11, FGFR1, IL1RAP, MPO, PLXNB1, PPP3CB, SOCS1, WFIKKN1 (Fig. 1d, 1e).
2. Immune genes signature's predictive value on OS and DFS in the TCGA cohort Next, we used a stepwise Cox proportional hazards regression to further assess the prognostic value of these 15 immune genes. Five genes were nally identi ed and a formula was derived on the basis of the ve genes: Immune Risk = (CALCRL expression × 0.000398) -(CLEC11A expression × 0.000070) -(CRLF3 expression × 0.001053 ) -(PLXNB1 expression × 0.000757) + (SOCS1 expression × 0.003623). ROC curve was generated to evaluate the prognostic ability of this derived formula. As shown in Fig. 2a, the AUC index for the integrated model was 0.776 (p < 0.0001). Then, the maximum Youden Index corresponded Immune Risk, -2.201, was set as the cut-off value. We classi ed patients of TCGA cohort into two groups: High Immune Risk (HIR), and Low Immune Risk (LIR). Kaplan-Meier survival analysis showed that the OS of patients in LIR was signi cantly longer than HIR (p < 0.001) (Fig. 2b). In addition, LIR patients had longer Disease-Free-Survival (DFS) time (p < 0.001) (Fig. 2c).
As shown in Fig. 2d, the mRNA expressions of ve prognostic immune genes were signi cantly different (p < 0.001) between HIR and LIR groups. We then further analyze the correlation between Immune Risk scores and FAB subtype (Fig. 2e) and cytogenetic risk (Fig. 2f). What's interesting is that the Immune Risk score was signi cantly positively associated with cytogenetic risk(p < 0.001). Then we further evaluated the prognostic value of Immune Risk in different cytogenetic risk patients. Similarly, Kaplan-Meier survival analysis showed that patients in the LIR group have signi cantly longer OS than HIR, in which Immune Risk showed the best prognostic value in the "good" cytogenetic risk group (p < 0.001. Figure 2g).

Constructed Immune Risk signature poorly predicts the outcome of GSE cohort
To validate the prognostic ability of Immune Risk, we calculated the Immune Risk score of each patient in the GSE12417 cohort (the detailed clinicopathologic characteristics was summarized in Table 2). With the same formula, patients were classi ed into the LIR group and HIR group base on the maximum Youden index (corresponded Immune Risk = -1.28). ROC curve was generated and as shown in Fig. 3a, the AUC index for the integrated model was only 0.52 with p = 0.6216. There was also no signi cant difference in OS between the two Immune Risk groups (p = 0.22) (Fig. 3b). According to these results, the formula solely derived from survival-related immune genes exhibited the poor potential to predict patients' outcomes in the GSE cohort. Considering the immune system, which consists of various immune cells and factors, may play a complicated role in AML pathogenesis and disease progression, combined analyzing survival-related immune genes with AML microenvironment in ltrated immune cells may show greater prognostic ability.

Construction Of Immunotype Related Signature
Then we used the CIBERSORT method to evaluate the fraction of immune cell types of AML patients in the TCGA cohort. The immune cells pro le was summarized in Fig. 4a. Every patient showed different immunotypes, which can be regarded as an instinct feature of their tumor environment and may exert prognostic potentials. Next, we conducted a univariable Cox analysis to assess the relevance of immune cells fraction with patients' overall survival. As shown in Table 3, the ve most relevant immune cell types are T cells CD4 memory activated, Eosinophils, Macrophages M2, Mast cell resting, and T cells regulatory (Tregs). Then we generated Immunotype Risk Index by the LASSO Cox regression method on the basis of the fraction of these immune cells. With one SE above the minimum criteria in LASSO Cox regression, the coe cients were calculated when Logγ = 2.375 (Fig. 4b, 4c). A formula was generated based on the ve in ltrated immune cells: Immunotype Risk Index = Mast resting ×35.36-T CD4 memory activated ×463.51-Tregs × 253.99-Macrophages M2 ×47.64-Eosinophils × 92.53. In addition, we compared the in ltrated immune cells between former classi ed HIR and LIR group, Fig. 4d showed a signi cant difference in the fraction of those ve in ltrated immune cells.

Immunotype Signature Associates With Aml's Os And Dfs
Next, ROC curves were employed to determine the predictive e ciency of the constructed immunotype model. As shown in Fig. 5a, the AUC index was 0.703 with p = 0.0001. Kaplan-Meier survival analysis showed patients in the Low Immunotype Risk (LITR) group have longer OS (p = 0.02) and DFS (p = 0.047) than High Immunotype Risk (HITR) group (Fig. 5b, 5c). Then we also validated the immunotype model in the GSE cohort. Figure 5d showed ROC curves which indicated that the AUC index was 0.6021 with p = 0.0079. Kaplan-Meier survival analysis was generated and showed patients in the LITR group lead signi cantly longer OS than the HITR group (p < 0.001, Fig. 5e). All the results indicated that the immunotype model possessed better prognostic ability than the former constructed Immune Risk model.
6. Combination analysis of immune genes signature and immunotype signature to maximum their predictive e ciency As in ltrated immune cells and immune process activation all count for tumor pathogenesis, we wonder whether combine the two models can maximize their predictive abilities. First, we respectively calculated each patients' risk score according to the Immune Risk model and Immunotype Risk Index model. Then we selected patients who rated low risk in both models and labeled them as Favorable Risk. Meanwhile, we selected those who rated high risk in both models, and labeled them as Poor Risk in TCGA cohort and GSE cohort (Fig. 6a, 6d). Kaplan-Meier survival analysis was generated to assess the prognostic ability of the merged analysis model. As shown in Fig. 6b, 6c, and 6e, patients in Favorable Risk group lead signi cantly better outcomes than Poor Risk group on the basis of OS time and DFS time in both cohorts.

Discussion
AML is a type of hematological malignancy with poor clinical outcome [15]. Although the therapy success depends on patients' age and leukemia type, the limited advances in treatment regime largely resulted in the unsatis ed outcome of AML patients [16]. AML patients bene t from these non-targeted cytotoxic drugs, but chemotherapy intolerance also is a big problem [17]. Besides, AML cells sheltered in the tumor microenvironment, where immune activities were dysregulated, resulting in cancer cells' self-renewal and developing drug-resistant [18,19,20,21]. Thus, early identify high-risk patients and provide them additional immune targeting therapies may have promising results in clinical management. So, in this study, we constructed two different immune-based prognostic signatures to sort out high-risk patients and identi ed their most survival-related checkpoint molecules. Those dysregulated expressed immune genes may have therapeutic potential in AML clinical practice.
In the current study, we included 415 AML patients' mRNA expression and corresponded clinical information obtained from TCGA and GEO databases to construct prognostic predicting signatures. Firstly, based on the TCGA cohort, we constructed a ve-immune genes survival signature, including CALCRL, CLEC11A, CRLF3, PLXNB1, and SOCS1. A majority of these genes are suggested to contribute to leukemia pathogenesis [22,23]. For example, CALCRL expression was positively associated with the engraftment capacity of primary AML patient samples in mice and higher SOCS1 expression was an independent poor prognostic factor for non-M3 and CN-AML patients [24,25].
With this immune-genes related signature, we divided TCGA cohort patients into high risk and low risk groups, who showed signi cantly different OS and DFS. What's more, we found patients' Immune Risk score is signi cantly and positively associated with their cytogenetic risk classi cation. Under different cytogenetic risk conditions, the Immune Risk signature statistically correlated with patients' survival. However, when we validated this signature in the GEO cohort, ROC curves demonstrated this signature has very low predictive e ciency. And there is no survival difference between patients who were classi ed into different risk groups by the signature.
The above results indicated that analyzing AML patients' survival sorely relies on dysregulated immune genes expression and subsequently impaired immune activation is not that convincing. The role immunity system played in the AML tumor microenvironment is too complicated to be simpli ed into different gene [26,27,28]. So, we took different in ltration of various immune cells into concern as many studies have reported that the fraction of in ltrated immune cells in the tumor microenvironment is so unique that could be regarded as patients' instinct feature to predict tumor progression [29,30].
Thus, we used the CIBERSORT method to evaluate the fraction of in ltrated immune cells and generated an immunotype-related signature, which included ve immune cells: T cells CD4 memory activated, Eosinophils, Macrophages M2, Mast cell resting and T cells regulatory (Tregs). With the immunotyperelated signature, we classi ed patients into high risk and low risk. Different risk group patients, both in the TCGA cohort and the GEO cohort, showed signi cantly different outcomes.
For comprehensively understanding the role of immunity in predicting AML survival, we further classi ed patients into Favorable Risk and Poor Risk by combining analysis of dysregulated expressed immune genes and in ltrated immune cells. Patients' OS and DFS showed signi cant differences between Favorable Risk group and Poor Risk group, which indicates that the combined analysis of these two prognostic signatures optimized their predicting ability.
Finally, we compared the different expression of checkpoint molecules in Favorable Risk and Poor Risk groups. We found only CD274 and PDCD1 expression signi cantly correlated with AML patients' OS in the TCGA cohort, while in the GEO cohort, neither of those dysregulated expressed checkpoint molecules was associated with patients' survival. Based on the fact that the pathogenesis of AML is very complicated, we considered that combined analysis of different risk factors to predict patients' prognosis is more reliable and all those differently expressed checkpoint molecules may serve better as potential therapeutic targets.
Although predicting AML patients' survival with these constructed signatures is promising, there are also some limitations for implying them into clinical management. As those signatures are generated from two retrospective cohorts and enrolled amount is limited as well, larger prospective cohorts are needed to validate the prognostic value of signatures. What's more, whether these identi ed checkpoint molecules could act as potential targets need to be further explored by mechanism studies.

Conclusion
In summary, we generated two immune-associated signatures to predict the prognosis of AML patients and sort out associated checkpoint molecules as potential therapeutic targets. Our study provides new insight into the relevance between the immune pro le and AML patients' survival. The constructed signatures help identify high-risk patients who may bene t from additional immune-based therapy.

Declarations
Ethics approval and consent to participate The data used in the study were obtained from public databases TCGA and GEO, therefore, ethical approval was not required.

Consent for publication
Not applicable.

Availability of data and material
The datasets supporting the conclusions of this article are available in TCGA at https://gdc.nci.nih.gov and GEO at https://www.ncbi.nlm.nih.gov/geo/. These data were derived from the following resources available in the public domain: GSE12417.