Identification of IRGs and TFs in TCGA cohort
We obtained gene expression profiles from TCGA, and after analysis, a total of 10375 DEGs were identified (Fig. 2A). Cancer is a multi-step process and requires the activation of transcription factors (TFs) for growth and survival. Many of the TFs reported so far are critical for carcinogenesis(Vishnoiet al., 2020). Transcription factors have been reported to play an important role in regulating different aspects of cancer immune function by regulating the expression of immune-related genes(Palazonet al., 2014;Yuet al., 2009༛Sun 2017). Therefore, we identified transcription factors and immune-related genes in the differential genes. 607 IRGs were selected by matching IRGs to DEGs (Fig. 2B). After that, 129 TFs were identified as significantly different by matching TFs to DEGs. Among these, 83 TFs were upregulated (Fig. 2C). To discover the potential prognostic significance of each upregulated TFs, we performed a univariate Cox hazard regression analysis. The expressions of 15 TFs were found to be significantly associated with AML patient survival (Fig. 2D). The DEGs constructed a PPI network based on the STRING database using Cytoscape software to explore the underlying mechanism between the TFs and IRGs (Fig. 2E).
Development Of Prognostic Signature
Then, to avoid potential overfitting, LASSO Cox regression analysis was used to select the key OS-related TFs for modeling (Fig. 3A). As a result, a total of four TFs (BATF, PRDM1, RFX5, STAT4) were identified and selected to develop a prognostic signature. According to the median risk score, patients were divided into high-risk and low-risk groups (Fig. 3B). As the risk score of patients increased, the number of deaths increased (Fig. 3B). The heatmap of gene expression between high-risk and low-risk groups in the TCGA cohort was shown (Fig. 3B). The AUC of our signature for 1-year, and 3-year and 5-year OS were 0.79, 0.81 and 0.95, respectively, indicating the high predictive capacity of the signature (Fig. 3C). The Kaplan-Meier analysis showed that patients in the high-risk group had significantly shorter OS than patients in the low-risk group (Fig. 3D).
Figure 3. Prognostic analyses for AML patients. (A) LASSO coefficient profiles of the genes. (B) Development of the prognostic signature based on 4-OS-related TFs. (C)The AUC curves of the signature for 1,3 and 5 years. (D) OS of AML patients in high- and low-risk groups.
Degs Between High-risk And Low-risk Groups Were Involved In Regulating Immune Responses
The results above show the difference in OS between patients in the high-risk and low-risk groups. The two groups were further analyzed to obtain DEG by using the limma package, 499 genes were up-regulated and 179 genes were down-regulated (Fig. 4A). The DEGs existed in the KEGG pathway and biological processes related to immunity, such as cytokine-cytokine receptor interaction, leukocyte migration, and positive regulation of cytokine production. These data suggest that these DEGs may play a role in regulating the immune response in the progression of AML.
Figure 4. DEGs between the high-risk and the low-risk groups. (A) Volcano plot of the DEGs. (B) Significantly enriched KEGG pathway of the DEGs. (C) Significantly enriched gene ontology biological process of the DEGs. (D) Significantly enriched gene ontology cellular component of the DEGs. (D) Significantly enriched gene ontology molecular function of the DEGs.
A High Level Of Tregs In The High-risk Group Was Associated With Stat4
To determine the relationship between the TFs and the immune infiltrating cells, we used the CIBERSORT algorithm to estimate the difference between the low-risk and high-risk groups. A barplot shows the proportion of 22 kinds of immune infiltrating cells in AML cases (Fig. 5A). We found that the high-risk group was positively related to Tregs, which are related to immunosuppression, and are negatively related to gamma delta T cells (Fig. 5B), which are related to strong cytotoxic and kill a broad range of tumor cells (Sebestyenet al., 2020). We performed Spearman correlation analyses to explore the relationships between immune infiltrating cells (Fig. 5C) and TFs (Fig. 5D). Significant results are marked with color for the degree of correlation, the results showed a significant correlation between these STAT4 and Tregs (Fig. 5D), and STAT4 may be the main contributor to the level of Tregs.
Validation Of Prognosis-related Genes In External Aml Cohorts
GEPIA was used to validate the expression and impact on survival of these four prognosis-related genes, BATF, STST4, RFX5 and PRDM1 in AML (Fig. 6A, B). The effect of genes on patient prognosis was verified by testing the prognosis of patients' OS in external AML cohorts (GSE5122 and GSE12417). The results confirmed that patients in the gene high expression group had a worse prognosis (Fig. 6C).
Analysis Of The Target Irgs Regulated By Stata4
The correlation test between STAT4 expression and immune cells is presented, and it was found that macrophage and NKT were negatively correlated with STAT4 expression, while inducible Treg (iTreg) was positively correlated with STAT4 expression (Figs. 7A). The above results suggest that STATA4 affects immunity, and the downstream target genes of STATA4 will be identified next. We obtained a set of immune genes regulated by STATA4 through the ChEA3 website, which integrates a collection of gene set libraries generated from multiple sources including TF-gene co-expression and TF-target associations(Keenanet al., 2019), combined with co-expression analysis to identify possible transcriptional target genes of STATA4. The results of the enrichment analysis of these target genes demonstrate their involvement in immune system processes, immune response, defense response, and natural killer cell-mediated cytotoxicity (Fig. 7B, C). In addition, we performed a survival analysis of these genes showed that most of them were unfavorable for patient survival (Fig. 7D). These genes, which are involved in tumor immunity and are unfavorable to patient survival, may be downstream genes of STAT4 affecting patient prognosis.
It is well known that the transcription level of genes is related to the methylation level of promoter regions, to further infer the reason for the high expression of STAT4 in AML the TCGA database was used to analyze the expression of STAT4 and its methylation status of promoter regions. We found that STAT4 methylation expression in the low-risk group was significantly higher than that in the high-risk group (Fig. 7E), and Pearson correlation analysis showed that STAT4 DNA methylation was negatively correlated with mRNA expression (Fig. 7F).
Validation In Clinical Tissue Samples
The STAT4 expression in different FAB classification was showed in Figrue8A. To validate the bioinformatics analysis results, 10 healthy controls and 22 AML patient samples were studied to analyze the clinical relevance of STAT4 expression. The STAT4 expression in AML patients was significantly higher than that in healthy control group (Fig. 8B). And the expression of STAT4 was positively correlated with the proportion of abnormal cells in AML (Fig. 8C) which were similar to the bioinformatics results. WT1 (Wilms tumor 1) /ABL was defined as an indicator of AML recurrence (Polaket al., 2012). We observed that STAT4 expression levels positively correlated with WTI/ABL, suggesting that STAT4 promotes AML recurrence (Fig. 8D).