Comprehensive Gene Expression Analysis Reveals Factors that Influence the Progression, Occurrence and Development of AML-M4

Acute myelomonocytic leukemia (M4), a special type of acute myeloid leukemia (AML), is a clonal malignant disease with a poor prognosis and a low overall survival rate. Here, we aim to explore the molecular mechanism of AML-M4 development and progression using integrative bioinformatic analysis. We used an integrative method to identify potential driver genes in AML-M4. We first identified DEGs using the limma package and annotated them with GO and KEGG. To avoid individual bias, GSEA was adopted to confirm the results. Furthermore, we constructed the PPI network using WGCNA, which was superimposed onto the STRING database. We also assessed the correlation and mutation among hub genes to further explore the biological mechanism of AML-M4. Finally, we confirmed our results experimentally. The results show that FLT3, WT1 and TP53, which are involved in transcriptional misregulation, were upregulated, while PPBP and CCR7, which regulate cytokine-cytokine receptor interaction, as well as CD24, which acts as a protein marker of AML, were downregulated. Twelve hub genes were found through the TCGA and Oncomine analysis, and the results also show that FLT3, CCR7 and MMP-9 may be potential targets for the detection and treatment of AML-M4. The results of the GO analysis show that neutrophil activation, neutrophil degranulation, neutrophil activation involved in immune response, neutrophil mediated immunity and leukocyte migration were the most enriched terms. The results of the GSEA and KEGG analyses show that transcriptional misregulation in cancer and the related signaling pathways of the hematopoietic cell lineage are important factors that influence the occurrence and development of AML-M4. The results of the GO, KEGG and GSEA analyses are consistent in showing that neutrophil-related signaling pathways, immune responses and transcriptional misregulation are the key factors related to AML-M4.
The results of the PPI network analyses are consistent with the results of the GO, KEGG and GSEA analyses. They indicate that the dynamically balanced regulation of functional modules plays an important role in the progression of AML-M4; these modules include leukocyte migration, T cell receptor signaling pathway, neutrophil mediated immunity, neutrophil activation, platelet degranulation, RNA splicing, autophagy, regulation of transcription and translational initiation. qRT-PCR and western blotting assays were used to verify those DEGs. The results show that pathways involved in immune response, neutrophil mediated immunity, leukocyte migration and transcriptional misregulation may be important factors that affect the occurrence and development of AML-M4. These results also show that FLT3, c-KIT and MMP-9 may act as potential targets for the detection and treatment of AML-M4, although further in-depth research is needed. In general, the results of this study may help to identify critical pathways and genes associated with AML-M4 and provide potential targets and new research ideas for the treatment and early detection of AML-M4.

Background
AML is one of the most common hematological malignancies. AML cells not only proliferate and accumulate in bone marrow and other hematopoietic tissues, but also infiltrate non-hematopoietic tissues and organs and inhibit normal hematopoietic function through uncontrolled proliferation. The clinical manifestations of AML are anemia, hemorrhage, fever caused by infection, enlargement of the liver, spleen and lymph nodes, and bone pain [1]. A variety of chemotherapy regimens, biological agents, and stem cell transplantation are the main treatment options for AML [2][3][4]. However, the toxicity of chemotherapy drugs may lead to acute and life-threatening complications. Compared with standard chemotherapy, allogeneic stem cell transplantation is a suitable method to reduce the risk of AML recurrence, but it also increases the risk of serious complications. Although continuously improved, traditional treatment does not lead to a complete cure or an ideal duration of survival for AML-M4 in clinical practice [5]. Genomic, proteomic and bioinformatic analysis methods have been used to develop new personalized treatment strategies; studying the functions of related biomolecules and collecting information on emerging trends in matching genomic data with clinical data are effective ways to improve the prognosis of patients [6][7][8]. Although many studies have analyzed genome variation in AML, the association between genome variation and the molecular mechanism of AML-M4 is still unclear. Therefore, a comprehensive study of AML-M4 is urgently needed.
In this study, the relationship between phenotype and functionally differentiated genes of AML-M4 was studied. The DEGs between 154 AML-M4 specimens and 69 normal non-leukemia specimens were studied through GO, KEGG and GSEA analysis of microarray data. The signaling pathways and physiological functional modules that are highly relevant to AML-M4 were analyzed, and the differentially expressed genes were verified and preliminarily analyzed. The objective of this study is to use bioinformatic analysis to identify critical pathways and genes associated with AML-M4 and to provide potential targets and new research ideas for the treatment and early detection of AML-M4.

Microarray data source and pre-processing

Sample collection
The gene expression profiles of AML-M4 were obtained from three datasets, GSE6891, GSE10358 and GSE15061, in the NCBI GEO database, which are based on the Affymetrix HT HG-U133A and HG-U133A 2.0 arrays. A total of 223 microarrays were analyzed, including 154 AML tumor samples and 69 non-leukemia samples. The raw data of the three datasets were downloaded from GEO, and the R package simpleaffy was used for Affymetrix quality control and data analysis [9].
Probes were annotated with gene symbols from the respective platform annotations. Then, using Perl scripts, data from all 223 samples were merged into a single gene expression matrix. When multiple probe sets mapped to a single gene symbol, the mean of their expression values was used. Normalization and batch correction were performed before further analysis.
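The probe-to-gene averaging step described above can be sketched as follows. This is a minimal Python illustration, not the Perl scripts used in the study; the function name, the probe IDs and the expression values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def collapse_probes_to_genes(probe_expr, probe_to_symbol):
    """Average probe-level expression into one row per gene symbol.

    probe_expr:      {probe_id: [expression value per sample]}
    probe_to_symbol: {probe_id: gene symbol}; unannotated probes are skipped
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # drop probes without a gene symbol
            grouped[symbol].append(values)

    gene_expr = {}
    for symbol, rows in grouped.items():
        # zip(*rows) walks the samples column by column across the probe rows
        gene_expr[symbol] = [fmean(column) for column in zip(*rows)]
    return gene_expr


# Toy input: two probe sets for FLT3, one for CCR7, two samples each.
# Probe IDs and values are made up for illustration.
probe_expr = {
    "probe_a": [8.0, 6.0],
    "probe_b": [10.0, 8.0],
    "probe_c": [5.0, 7.0],
}
probe_to_symbol = {"probe_a": "FLT3", "probe_b": "FLT3", "probe_c": "CCR7"}
gene_matrix = collapse_probes_to_genes(probe_expr, probe_to_symbol)
print(gene_matrix)
```

Averaging is only one way to resolve multiple probe sets per gene; it is shown here because it is the rule stated in the text.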

Functional analysis
The limma package was used to identify DEGs [10], and GO and KEGG analyses were then performed using clusterProfiler [11]. An FDR value of <0.01 was considered to be statistically significant. GSEA was used to analyze in depth the variation in biological functions and pathways between AML-M4 and non-leukemia samples [12]. The MSigDB gene sets file (c5.bp.v7.0.symbols.gmt) was selected as the reference gene set, with 10^4 permutations. The threshold was set at P < 0.05.
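The FDR threshold applied to the enrichment results can be illustrated with a short sketch. The text does not state which correction procedure was used; the code below assumes the Benjamini-Hochberg procedure, and the term names and raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four enrichment terms (illustrative only)
raw_p = {"term_1": 0.0005, "term_2": 0.04, "term_3": 0.03, "term_4": 0.002}
adjusted = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))

# Keep the terms that pass the FDR < 0.01 threshold stated in the text
significant = [term for term, q in adjusted.items() if q < 0.01]
print(significant)
```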
PPI Network analysis with WGCNA
The 1,084 DEGs between AML-M4 and control samples were retrieved for co-expression analysis, which integrated the WGCNA R package with default parameters [13] and the PPI network from the STRING database [14]. We used the dynamic tree cut package to lay out the co-expression clusters, with the minimum peak at 0.2 for every module [15]. The reliability of every module was assessed on the basis of its eigengene, and each element in every module was assembled using the Pearson correlation among DEGs and their interactors, with a cutoff of r of at least 0.7 [16]. Furthermore, we identified and characterized the system-level features of the PPI network using the topological overlap matrix [17]. Finally, every unique module was annotated using the clusterProfiler package [11] and visualized in Cytoscape [18].
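The correlation filter mentioned above (Pearson r of at least 0.7) can be sketched in a few lines. This is a simplified illustration only, not the WGCNA workflow itself; the gene labels and expression profiles are hypothetical, and the cutoff is applied to the signed r as the text states it.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def correlated_pairs(expr, cutoff=0.7):
    """Return (gene1, gene2, r) for every pair with r >= cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson_r(expr[g1], expr[g2])
        if r >= cutoff:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four samples
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
edges = correlated_pairs(expr)
print([(g1, g2) for g1, g2, _ in edges])
```

Only positively correlated pairs pass this filter; a variant that also keeps strong negative correlations would compare the absolute value of r against the cutoff.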

Survival analysis
The prognostic relevance of the hub genes in AML was studied using survival analysis. We downloaded the 187 TCGA-LAML expression profiles and clinical data from the TCGA database using TCGAbiolinks [19]. The 187 AML specimens were divided into two groups by the expression value of each hub gene, using the survminer package to find the best separation. Furthermore, the association between each hub gene and overall survival (OS) was assessed using univariate Cox regression. To exclude hub genes that might not be independent prognostic factors, multivariate Cox regression was performed. An adjusted p value of <0.05 was considered to be statistically significant and is indicated in the results.
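The text does not say how the survival curves of the two expression groups were estimated. Assuming the commonly used Kaplan-Meier estimator, the calculation for one group can be sketched as follows; the follow-up times and event flags are hypothetical, and the group split itself (done with survminer in the study) is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for one group.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns [(time, survival probability)] at each time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical group: four patients, the second one censored
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Computing one such curve per expression group gives the pair of curves that a log-rank test or Cox model would then compare.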

Quantitative Real-time PCR (qRT-PCR)
To confirm the results of the preceding analysis, we performed qRT-PCR to verify them at the mRNA level. Total RNA from 24 paired AML-M4 and non-leukemia samples was obtained using TRIzol reagent. The target genes of interest were then assessed by qRT-PCR using a One-Step qPCR Kit (Invitrogen, USA). All RT reactions were run in a CFX Connect™ Real-Time System (BIO-RAD, USA), in accordance with the manufacturer's instructions. The results were analyzed using the 2^-ΔΔCT method, with GAPDH as the internal control gene [20]. The primer sequences of the target genes are shown in Supplementary Table 1.
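The 2^-ΔΔCT calculation can be written out as a short sketch. The function name and the CT values below are hypothetical; the reference gene plays the role that GAPDH has in the text.

```python
def relative_expression(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the 2^-ddCT method.

    ct_*_case: CT values measured in the AML-M4 sample
    ct_*_ctrl: CT values measured in the non-leukemia control sample
    ct_ref_*:  CT values of the internal control gene (GAPDH in the text)
    """
    delta_ct_case = ct_target_case - ct_ref_case  # normalize to the reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_case - delta_ct_ctrl  # case relative to control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: the target amplifies two cycles earlier in the
# case sample than in the control, with identical reference CT values.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

A fold change above 1 indicates higher expression in the case sample than in the control, and a value below 1 indicates lower expression.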

Western blotting analysis
The tissues were lysed, and total protein was quantified using the Pierce™

Statistical analysis
All experiments in this article were repeated at least three times. Student's t test was used to compare the two groups. Data are shown as mean ± SD, unless indicated otherwise. A p value of <0.05 was considered to be statistically significant.

Results
Identification of differentially expressed genes (DEGs) and functional
To further characterize the DEGs, we analyzed the functional variation between the two groups using the clusterProfiler package. A total of 107 GO terms were identified with an adjusted FDR of <0.01. We also used the GOSemSim package to remove redundant terms, retaining a single representative term for each group, which resulted in 49 unique GO terms [9]. The results of the GO analysis show that the most enriched GO terms were involved in neutrophil activation, neutrophil degranulation, neutrophil mediated immune response and leukocyte migration (Fig. 2A). The KEGG pathway enrichment analysis showed that transcriptional misregulation in cancer, hematopoietic cell lineage, cell cycle, and Th1, Th2 and Th17 cell differentiation were the most significantly affected pathways in AML-M4 (Fig. 2B). These results complemented the results of the GO enrichment analysis. To further examine the relationship between the two phenotypes and the functional DEGs, we performed a GSEA analysis on the gene expression matrix at the whole-transcriptome level. The transcripts of AML-M4 were found to be prominently associated with downregulated genes related to three pathways (Fig. 2C).

Integrative module analysis revealed new regulatory mechanisms
To analyze AML-M4 comprehensively, we simulated the kinetics of proteome changes as previously described [21]. We integrated WGCNA and PPI network analysis among the DEGs to group related proteins that are involved in similar molecular functions or biological processes [22]. As a result, we identified 143 modules, with the number of proteins in each ranging from 2 to 25 (Fig. 3B), and 122 of these modules were highly interconnected through their members (Fig. 3A). Each module was annotated using known functional terms or signaling pathways. We found that many modules, including modules 3, 10, 15, 22 and 24 (Fig. 3C), were notably enriched in processes related to the hematopoietic system. In addition, module 27 was found to be involved in RNA splicing, module 38 in autophagy, module 54 in the regulation of transcription, and module 83 in translational initiation (Fig. 3D). In summary, the progression of AML-M4 involves the balanced regulation and extensive reprogramming of mutually connected functional modules.

Comprehensive analysis of hub genes in AML
Based on the expression profiles and clinical data of 187 AML samples from the TCGA database, the clinical outcomes associated with the hub genes were determined using survival analysis. Univariate and multivariate Cox regression analyses were performed to identify candidate genes that are prognostic predictors applicable to AML patients. As a result, 6 of the 53 hub genes were found to be significantly associated with poor prognosis, as indicated by a positive or negative correlation with a higher risk when upregulated or downregulated, respectively, in AML (Fig. 4). To further explore the biological mechanism in AML, we assessed the correlation and mutation among hub genes (Figs. 5 and 6). These genes may be involved in the regulation of AML progression, and may also have the potential to serve as candidate biomarkers or drug targets for the disease.

Protein level validation using western blotting
Finally, we confirmed the DEGs at the protein level. One of the most widely studied transcription factors in the hematopoiesis of AML-M4 is the leucine zipper CCAAT/enhancer binding protein alpha (C/EBPα), which is mainly involved in bone marrow cell differentiation. AML patients often present with mutations or decreased expression of the C/EBPα gene [23]. CD11b is a member of the family of CAM integrins, which are mainly found in the M1, M2, M4 and M5 subtypes of AML [24]. Its role is integral to the development of AML. In this paper, we emphasize the importance of the downregulation of C/EBPα and CD11b expression in the neutrophil-related signaling pathway in AML, as shown in Fig. 8. FLT3, c-KIT and CSF1R are class III receptor tyrosine kinases that play a crucial role in hematopoiesis [25]. The pathogenesis of several malignant tumors is associated with the overexpression of CSF1R, c-KIT and FLT3 [26,27]. In particular, the FLT3 and c-KIT genes have been intensively studied in childhood AML [28,29].
In this study, FLT3, c-KIT and CSF1R protein expression levels were all found to be upregulated in AML, which is consistent with the results of the previous analysis, as shown in Fig. 8.

Discussion
AML is a clonal malignant disease with a poor prognosis and a low overall rate of survival. It originates from hematopoietic bone marrow primordial cells. Immature leukocytes grow rapidly and affect the production of normal blood cells. The median survival time of AML patients is only 5-10 months [1].
Acute myelomonocytic leukemia (M4) and acute monocytic leukemia (M5) are special types of AML that are distinguished by their cytogenetic, immunological and clinical features [30]. The overall survival (OS) rate achieved with traditional treatment (chemotherapy and stem cell transplantation) for AML-M4 is low; chemotherapy is often accompanied by complications, while stem cell transplantation is costly and carries a risk of rejection [31]. The molecular mechanism of AML-M4 development and progression is not fully understood, and it is particularly important to find new targets and strategies for individualized therapy.
In recent years, genomic bioinformatics analysis of differentially expressed genes in pathological samples has been used to study potential targets and to provide an early theoretical basis for the treatment of AML-M4 [32][33][34]. In this study, DEGs between 154 AML-M4 specimens and 69 normal specimens were analyzed using GO, KEGG and GSEA.
As shown in Figure 6, only two genes, DNMT3A and FLT3, have a mutation rate greater than 10% in the TCGA-LAML cohort, and DNMT3A was not differentially expressed. Both the GSEA analysis and the TCGA analysis show that FLT3 is upregulated in AML-M4, which suggests that it plays an important role in the development of AML-M4. FLT3 is considered to be a treatment target for AML, and the development of clinical targets related to FLT3 is currently very active. FLT3 is characterized by the presence of five immunoglobulin-like motifs within its extracellular section. These motifs are exclusively expressed in hematopoietic cells [35]. FLT3 mutations occur as secondary events during AML clonal evolution [36]. The FLT3-ITD mutation has a negative impact on the prognosis of AML, and only a minority of patients with the FLT3-ITD mutation in leukemic blasts are cured through chemotherapy.
The GSEA analysis found that CCR7 is downregulated in AML. CCR7 is one of the most important chemokine receptors for adaptive immune cell migration. CCR7 also mediates the expression of signaling molecules, such as Th1 and Th2, that can affect the homeostasis, activation and polarization of T cells [37]. Interestingly, the KEGG analysis found that Th1, Th2 and Th17 cell differentiation is one of the most significantly affected pathways in AML-M4. It has been reported that the dysregulation of Th subset cytokines contributes to the pathogenesis of AML, while accumulating evidence also indicates that many diseases, including hematological malignancies, involve an imbalance in the Th1/Th2 ratio. The Th1/Th2 balance plays an important role in normal immunity [38]. The study by Georgios Leandros Moschovakis shows that early T cell activation and Th1 differentiation may be promoted by the activation of CCR7 signaling, while Th2 polarization increased after CCR7 signaling was inhibited [39]. Xu reported that decreased CCR7 expression results in an increase in Th2 responses and increased expression of IgE and IL-13 in the lung [40]. It has been established that Th17 cells participate in certain autoimmune diseases and tumors [41][42][43]. Le Dieu et al. have demonstrated that T cells had an impaired ability to form the immune synapses that are critical for T cell activation [44]. Following their activation, dendritic cells (DCs) upregulate the expression of CCR7 [44]. CCR7 signaling has also been reported to inhibit apoptosis in mature DCs through the PI3K/Akt pathway [45][46]. The balanced regulation between CCR7 and DCs affects T cell activation. CCR7 may be used as a treatment target for AML-M4 through the regulation of DC and T cell activation. CCR7 may also be a therapeutic target for AML-M4 through its influence on T cell activation, polarization and the imbalance in the Th1/Th2 ratio; the role of CCR7 signaling in T cell homeostasis remains to be further verified.
We also found that most PPI modules are related to CCR7 and MMP-9, indicating that MMP-9 plays an important role in the occurrence and development of leukemia. MMP-9 belongs to the matrix metalloproteinase (MMP) family and plays an important role in cell growth, inflammation, angiogenesis and migration [47]. The Amir-Foroushani research team confirmed that MMP-9 is inadequately expressed and highly methylated in AML [48]. Many studies have shown that MMP-9 can be further applied to the detection, prognosis or treatment of AML. For example, Amir-Foroushani reported a method to distinguish AML from MDS using MMP-9 in 2017 [48]. In our analysis, MMP-9 downregulation was found to be correlated with a higher risk of AML-M4. MMP-9 participates in the neutrophil activation and leukocyte migration module, which is related to CXCL12, ITGAM, neutrophil mediated immunity and platelet degranulation. It has been suggested that the expression of MMP-9 may be related to the progression and invasiveness of AML-M4. As an antimicrobial gene, chemokine ligand 12 (CXCL12) is involved in many diverse cellular functions, including embryogenesis, immune surveillance, inflammatory response, tissue homeostasis, and tumor growth and metastasis. The relationship between CXCL12 and MMP-9 has been demonstrated to be important for normal hematopoietic cell migration in in vitro studies of normal human hematopoietic cells [49]. MMP-2 and MMP-9 in bone marrow cells can be induced by CXCL12 and then increase cell migration [50]. MMP-9, MMP-2 and CXCL12 act as regulators of bone marrow cell migration and infiltration. CXCL12 and its receptor together were reported to be important regulators of AML cell migration in the bone marrow [51]. Several MMPs are involved in the processing of CXCL12; MMP-2 and MMP-9 can also inhibit the chemoattractant activity in normal CD34+ HSCs and immature AML cell subsets [52].
Extracellular matrix protein 1 (ECM1) is a secreted glycoprotein that may promote tumor development [53]. MMPs have the ability to degrade essentially all components of the extracellular matrix (ECM). By breaking down the ECM, MMPs may remove physical barriers, thus allowing cells to migrate and invade other tissues [54]. ECM1 has been shown to associate directly with EGFR, which is followed by EGF-dependent EGFR and ERK activation. EGF was in turn shown to activate ERK signaling at the transcriptional level, with the activation of MMP-9 expression proceeding downstream of these events [53]. ITGAM encodes CD11b and is involved in many inflammatory biological processes; its remarkable feature is its ability to bind various ligands. MMP-9 is associated with ITGAM and participates in neutrophil-mediated immunity. CXCL8 is activated by MMP-9 through proteolytic cleavage [55]. It was reported that the relationship between these two mediators can regulate the development of human AML [56]. CXCL8, which may be a potential therapeutic target, may regulate the angiogenesis of AML in bone marrow [57]. MMP-9 is also anti-angiogenic; intratumoral delivery of MMP-9 decreases tumor growth, and in particular angiogenesis, in mouse models of breast cancer [58]. However, there are few reports of its involvement in leukemia. Therefore, further research is needed to understand the molecular mechanisms of these molecules.

Conclusion
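Overall, we studied the factors that influence the occurrence and development of AML-M4 at the genome level using bioinformatics methods, and qRT-PCR and western blotting assays were used to verify those DEGs. The results show that pathways involved in immune response, neutrophil mediated immunity, leukocyte migration and transcriptional misregulation may be important factors that affect the occurrence and development of AML-M4.

Survival analysis. A. TCGA database was used to select 612 hub genes which were correlated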