Identification of Acute myeloid leukemia-driven genes by multi-dimensional approach

Background Acute myeloid leukemia (AML), which is characterized by the uncontrolled proliferation of myeloid leukemia cells in the bone marrow and other hematopoietic tissues, and is highly heterogeneous. While with the progress of sequencing technology, understanding of the AML-related biomarkers are still incomplete. The purpose of this study is to identify potential biomarkers for diagnosis of AML. Methods Based on WGCNA analysis of gene mutation expression, methylation level distribution, mRNA expression and AML-related genes in public databases, were employed for investigating potential biomarkers for prognosis of AML. This study screened a total of 6,383 genes by analyzing various changes in 103 acute myeloid leukemia (AML) samples, including gene mutation expression, methylation level distribution, mRNA expression, and AML-related genes in public databases. Moreover, seven AML-related co-expression modules were mined by WGCNA analysis and twelve biomarkers associated with the AML The AML samples were then classified into two subgroups, the prognosis of which is significantly different, based on the expression of these twelve genes. The differentially expressed genes are mainly involved in glucose metabolism, glutathione biosynthesis, small G protein-mediated signal transduction, and the Rap1 signaling pathway. With the utilization of WGCNA mining, seven gene co-expression modules were identified from TCGA database and there are unreported genes that may as potential driven genes of AML and may be the direction to identify the possible molecular signatures to predict survival of AML patients and help guide experiments for potential clinical drag targets.


BACKGROUND
Leukemia is a malignant clonal hematopoietic stem cell disease, characterized by the uncontrolled proliferation of leukemia cells that accumulate in the bone marrow and other hematopoietic tissues, thereby inhibiting the normal hematopoietic function and infiltrating tissues such as lymph nodes, liver and spleen. [1]. Based on the pathogenic cellular origin, the acute leukemia can be divided into acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). AML is the most common adult type, it is characterized by uncontrolled proliferation of primary and immature myeloid cells in bone marrow and peripheral blood, and is highly heterogeneous.
Hematopoietic transformation leads to changes in many key genes during the proliferation and differentiation of bone marrow [2]. Important regulatory gene changes may be associated with the pathogenesis of AML, which are potential target for AML treatment [3]. At present, the French-American-British classification is still widely used clinically, which classifies AML into eight subgroups (M0-M7) according to the degree of differentiation and morphological characteristics of myeloid progenitor cells [4,5]. AML is characterized by genetic heterogeneity, with different clinical behaviors observed even in the same subgroup [6]. With the progress, the complete response (CR) rate and disease-free survival (DFS) rate have slightly improved in AML patients, but the prognosis remains poor, especially in refractory or recurrent AML [7]. The 5-year survival rate of total AML patients is 21.4%, with median survival time of 5-10 months, however, it remarkably reduces to 1% if the patients older than 75 years. The CR rate after initial standard induction therapy is about 60-70% in AML patients younger than 60 years. About 30-40% of patients eventually relapse, and their average overall survival (OS) is approximately 3.8 months [8]. The recurrence rate is significantly increased in AML patients with abnormal chromosomes or genes, and the prognosis for those patients is particularly poor. Moreover, chemotherapy resistance and minimal residual disease after remission also affect the prognosis of AML patients [9].
Currently, AML treatments include chemotherapy, hematopoietic cell transplantation and target therapy, the induction therapy is mainly based on cytarabine combined with anthracyclines, and the consolidation treatment is mainly based on high-dose cytosine combined with hematopoietic cell transplantation. Although extensive studies of AML in the past forty years, the outcome of patients has not improved yet [10] . In recent years, with the progress of sequencing technology, we have a deeper understanding of the gene mutations in AML, and have found a variety of AML-related genes [11,12]. It has been reported that 20% and 30% patients have Fms-like tyrosine kinase-1(FLT3) and isocitrate dehydrogenase 1/2( IDH1/2) mutations, respectively [13,14].
The first-generation FLT3 inhibitors include sunitinib (SU11248), midostaurin(PKC412), lestaurtin(CEP701) and tandutinib. In a phase I trial, the old AML patients with FLT3 mutation have achieved high CR rate by sunitinib treatment [15], another four AML patients with FLT3/ITD mutation have achieved pathologic CR or pathologic partial response (PR) by sunitinib monotherapy [16]. However, the first-generation of ELT3 inhibitors is not perfect, such as high recurrence rate, high adverse reactions and poor tolerance, which have limited their application [17][18][19]. The second-generation FLT3 inhibitors include quizartinib, sorafenib and midostaurin, although the efficacy is significantly improved, the high drug resistance cannot be ignored. Moreover, a study showed that the AML patients had 50% probability of recurrence with 3 months after remission. Therefore, the efficacy of FLT3 inhibitor monotherapy in AML patients with FLT3/ITD mutation is not satisfactory, as well as in recurrent or refractory AML(R/R AML) patients [20]. IDH1/2 mutation sites include IDH1-R132, IDH2-R140 and IDH2-R172 [21]. However, current clinical trials suggested unsatisfied effect of IDH1/2 inhibitors in AML. The outcomes of phase I/II clinical trial showed 41.6% and 40.3% objective remission rate (ORR) in relapsed or refractory AML patients with IDH1/2 mutations by Ivosidenib(AG-120) and Enasidenib(AG221), respectively [22,23]. Among them, 20.2% patients achieved CR, the median duration of remission was 3.7 months, the adverse reactions included hyperbilirubinemia (8%) and IDH-inhibitor-associated differentiation syndrome (7%).
The above studies have suggested that small molecule inhibitors had great therapeutic potential in treatment of AML, it is characterized by strong specificity, low toxicity and good tolerance. However, so far, the studies of AML-related genes are insufficient. The patients with FLT3 and IDH1/2 mutations accounts for 50% of total AML population, and not all patients response positively to these treatments, in another word, a small number of patients benefit from FLT3 and IDH1/2 inhibitors. Therefore, further investigation of AML pathogenesis is urgently needed in order to find more potential target genes, providing the theoretical basis for development of new drugs.
According to cytogenetic abnormalities, the prognosis of AML patient is generally classified as favorable, intermediate or unfavorable [24]. However, approximately 40% of AML patients have no identifiable cytogenetic abnormalities detected by molecular cytogenetics and fluorescence in situ hybridization (FISH) methods [25], they are classified as intermedium risk, but the overall survival of these patients varies widely. AML is a highly heterogeneous disease, investigation of prognostic value of biomarkers is one of the research hotspots. Based on the analysis of single-omics data, researchers identified numerous AML-related factors. As a complex regulatory system, disease occurrence usually involves genetic variation, epigenetic changes, and abnormal gene expression. Therefore, it is very important and meaningful to screen prognostic biomarkers of AML through integrated analysis of multi-omics data.

MATERIALS AND METHODS
1. TCGA data download AML mutation data was downloaded via the R package TCGA biolinks [26] https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html.The AML data was downloaded from the websitehttp://firebrowse.org/, including SNP6 CopyNumber segment data and methylation chip data (platform for Illumina 450K chip). The mRNA and miRNA expression profile data as well as sample clinical data were downloaded from the TCGA website https://portal.gdc.cancer.gov/. In total, 103 patients' data from five data sets (mutation, CNV, methylation, mRNA, miRNA) were analyzed.
2. Healthy blood sample expression profile data download All normal tissue sample gene expression profiles were taken from 407 whole blood samples from the GTEx release version V7https://gtexportal.org/home/datasets.

Mutation gene analysis
MutSigCV was used to analyze high-frequency mutation gene, with most mutations occurring at the conserved sites. The analysis was performed by the corresponding MutSigCV module in the online analysis tool GenePattern [27] https://cloud.genepattern.org/gp/pages/index.jsf developed by the Broad Institute.

3.2Analysis of mutation signature
The following six basic mutation types appear in cancer: C>A, C>G, C>T, T>A, T>C, T>G. To investigate the different types of mutations, the researchers proposed the concept of mutational signatures [28], which is based on six basic types of mutations, followed by one base upstream and one base downstream of the mutation. There are four cases of A, T, C, and G for each base, so a total of 4*6*4=96 mutation types may be obtained. The frequency of these 96 mutation types in different cancers is different. The mutational signatures of AML can be obtained after non-negative matrix decomposition of the frequencies of 96 mutation types in each sample. Herein, the R package maftools [29] https://bioconductor.org/packages/release/bioc/html/maftools.html and SomaticSignatures [30] https://bioconductor.org/packages/release/bioc/html/SomaticSignatures.html were used to perform mutation signature analysis and the samples were then unsupervised hierarchically clustered based on the contribution of each label to observe the clinical features of the subset of samples with different mutation characteristics.

4.Somatic cell copy number variation (CNV) analysis
The GISTIC method was used to detect common CNV regions in all samples, including chromosome arm horizontal CNV and the smallest common region between samples. The GISTIC method parameters were set as: Q ≤ 0.05 as the change significance criterion; when determining the peak interval, the confidence level was 0.95; when analyzing the chromosome arm horizontal variation, the region was greater than the chromosome arm length 0.98 as the standard. The analysis was performed using the corresponding GISTIC module in GenePattern.

5.Difference analysis
Data correction, standardization and differential gene analysis were performed on the expression profiles of AML samples and healthy whole blood samples using DESeq [31] in the R algorithm.
6. Heterogeneity analysis The mean and standard deviation of each methylation site or miRNA expression value in the methylation expression profile and the miRNA expression profile were calculated, followed by calculation of the CV value to screen for sites or miRNAs with high heterogeneity. The calculation method of the CV value is: CV=Standard deviation / Average.

Identification of candidate gene sets
The differentially expressed genes and the AML-related genes identified from the GENE database, the OMIM database and the KEGG database were jointly examined to screen at least one gene for the study. The gene annotated in the high heterogeneity methylation site analysis and the significant mutation gene obtained by MutSigCV analysis were further added to expand the scope of the study, and finally obtain the candidate gene. The expression profile data of the candidate gene set in AML samples in TCGA was selected for the next step of weight co-expression network analysis.
8. WGCNA analysis WGCNA [32] is a systematic biology method that uses gene expression data to construct a scale-free network. First, the similarity matrix is constructed using gene expression data, that is, the absolute value of the Pearson correlation coefficient between the two genes(i and j) is calculated using Eq. (1), where x i and y j represent the expression values of the i-th and j-th genes, respectively: Then, Eq. (2) is used to transform the gene expression similarity matrix into adjacency matrix, where β is the soft threshold, which is actually the Pearson correlation coefficient β power of each gene pair: This step can strengthen the strong correlation and weaken the weak correlation from the index level. The next step is to convert the adjacency matrix into a topological matrix using Eq. (3). The topological overlap measure (TOM) is used to describe the degree of association between genes: min( 1-TOM indicates the degree of dissimilarity between gene i and gene j. Hierarchical clustering of genes using 1-TOM as a distance is performed, then using the method of dynamic cut tree for module identification.
Using the WGCNA package of the R algorithm, a weighted co-expression network was constructed on the expression profile data of the candidate gene set obtained in the previous step. The modules related to AML survival time were screened, and the inter-subnets were constructed for the co-expressed gene sets in the module, then candidate core genes of AML were screened according to the network node degree.

9.Cluster analysis
All cluster heat maps were completed using the R package pheatmap, and the clustering method selects ward.D.

10.Identification of miRNAs that regulate the module
The miRNA-mRNA interaction relationship was included in the starBase v2.0 database http://starbase.sysu.edu.cn/targetSite.php. The database was used to screen the interaction between differential miRNAs and module genes, and the interaction network between differential miRNAs and module genes was constructed to observe differential miRNAs and module core genes regulating the relationship. The miRNAs with significant regulatory effects on the modules were then screened by hypergeometric analysis based on the interaction between the miRNA and the module gene.
11. Survival analysis Based on the clinical data compiled by TCGA, Kaplan-Meier analysis of candidate core genes of AML was performed to identify genes that had a significant impact on patient prognosis (P<0.05).
12.Function and pathway enrichment analysis GO and KEGG enrichment analysis was performed on the gene set of interest using the R package clusterprofiler [33] https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.

Sample and clinical data
The study included a total of 103 patients with clinical information (Supplementary Table 1) and multiomics analysis data (including mutation results, CNV results and methylation expression profile data, mRNA expression profile data and miRNA expression profile data). The mRNA expression profile data of 407 healthy whole blood samples was also downloaded from the GTEx database, and the flowchart was shown in Figure1.

Identification of significant variant genes
Numerous somatic mutations occur during the development of cancer, and mutations in a few genes, so called 'driving genes' directly promote tumorigenesis and development. Driving gene prediction was performed using MutSigCV for mutation data from 103 tumor samples. At the significance threshold q<0.05, a total of 378 candidate driving genes were screened, including DNMT3B, NPM1 and other genes with a high proportion of mutations in AML (see Supplementary Table 2).

Mutation characteristics
We studied the point mutation information of 103 samples. According to the mutation site of each sample, we considered the bases 1 bp upstream and downstream of the mutation site and divided the mutations into 96 types according to the above classification method. The frequency distribution in the 103 tumor samples was counted and the distribution is shown in Figure 2A.
The frequencies of the 96 mutation types appearing in each sample were further subjected to non-negative matrix decomposition to extract the existing mutation characteristics. The four mutation characteristics extracted are shown in Figure 2B, which are highly similar to signature27, signature1, signature6 and signature19 in the cosmic database. Signature6 is associated with a DNA mismatch repair system, and signature1 is associated with spontaneous deamination of 5-methylcytosine.
The distribution of each mutation feature in different samples (see Supplementary Table 3 for analysis results) was then performed for unsupervised hierarchical clustering (see Figure 2C). In general, the three subgroups in the AML sample have different mutation characteristics, with the ratio of signature 1 being the highest in one subgroup, and signature 3 being the highest in the other subgroup (see Figure 2D). The sample classification was performed from the mutation characteristics alone. The FAB classification results were significantly different in the distribution of M2-M5 (p-value = 0.04715) in the subgroups. The distribution of clinical features is shown in Figure 2E.

Somatic cell CNV
The GISTIC analysis of 103 samples detected a total of two copies and eight copy number deletions of concentrated copy number variants (MCRs) (see Supplementary Table 4; Figure  3A-3B). Among them, the most significant areas of amplification were 11q23.3，21q22.2 etc.; the most significant of the missing areas were 5q31.3，12p13.2 and so on. The unsupervised hierarchical clustering for the CNV distribution of these samples can divide AML samples into two sub-categories (see Supplementary Table 5 and Figure 3C for classification results), and one of them has significant CNV changes.
According to the classification of SNP and CNV data, the classification results of each sample are shown in following Figure 3D, revealing that the classification effects of different platforms have certain consistency. The classification results are shown in Supplementary Table 6. 4. DNA methylation CV values were calculated for each methylation site of the 103 cancer samples. A total of 1,441 highly heterogeneous methylation sites were screened with CV values greater than >2.5, 779 genes were annotated, and 143 genes were differentially expressed, including SMAD1，SLC1A2 and the like. The heterogeneity analysis results are shown in Supplementary Table 7. The distribution of these heterogeneous methylation sites is shown in Figure 3E.

Differential expression gene analysis
Differential gene analysis was performed on the mRNA expression profiles of 103 cancer samples and 407 control samples. When the significance threshold was 0.05, |logFC|>6, 4734 differentially expressed genes were screened and the results are shown in Supplementary Table 8.

Differential gene expression and mutation relationship
The effect of each mutation site on the gene is different, so the mutations of each gene in all samples and their effects on the genes were counts and categorized as Frame_Shift_Del， Frame_Shift_Ins， In_Frame_Del， Missense_Mutation， Nonsense_Mutation， Nonstop_Mutation， Splice_Site，Translation_Start_Site. Statistical analysis using t-tests showed that there were significant differences in the distribution of differential and non-differential genes in Splice_Site (p-value = 0.01), indicating that SNV has a certain effect on gene expression. The results of the mutation classification are shown in Supplementary Table 9, and the statistical results are shown in Supplementary Table 10.

Candidate gene set
The differentially expressed genes and the GENE database and OMIM database, as well as the AML-related genes identified in the KEGG database were jointly examined (genetic results are shown in Supplementary Table 11). The relationship between the genes included in the database and the differential genes is shown in Figure 4A, of which only 4,504 genes were identified in differential analysis, which may be associated with the onset of AML.
In order to further expand the scope of investigation, 779 genes annotated in the analysis of highly heterogeneous methylation sites and 378 significant mutations obtained by MutSigCV analysis were added to obtain a candidate gene set of 6,383 genes (see Supplementary Table12). The AML sample expression profile data of the 6,383 genes in TCGA was then used for the next weight co-expression network analysis.

Weighted co-expression network construction of candidate gene sets
The construction of a weighted co-expression network for candidate gene sets was performed using R's WGCNA software package. The co-expression network conforms to the scale-free network, that is, the logarithm log(k) of the node with the connection degree k is negatively correlated with the logarithm log(P(k)) of the probability of the node, and the correlation coefficient is greater than 0.8. To ensure that the network is a scale-free network, the optimal β=10 was used (see Figure4B) . The average-linkage hierarchical clustering method was used to cluster genes(see Figure4C), obtaining fourteen modules (see Supplementary Table 13).
To determine the association of gene modules with prognostic risk ( Favorable ， Intermediate/Normal，Poor), we calculated the Pearson correlation coefficient (higher for the module) and the significance P for the corresponding correlation for each module and sample prognosis risk, respectively. The results of gene module characteristics and phenotypic correlation coefficients are shown in Table 1. Subsequently, the gene significance (GS) value of each gene module was calculated, the larger the GS value, the more relevant the module is to the prognosis. Through the correlation analysis and GS value calculation, the last seven modules related to AML were selected, which are tan, red, salmon, black, green, purple, greenyellow (see Figure4D and  Supplementary Table 14 ). Based on the expression relationship of the genes in the seven co-expression modules analyzed above, we constructed a co-expression network for each module separately (see Supplementary Table 15 for the analysis results), then counted the degree of each network node, the degree of node indicates that there are connections with several genes in the network, the higher the degree, the more important the node is in the network (see Supplementary Table16). A small number of genes in the co-expression network have obvious concentration. After the expression changes, the expression of these genes will interact with other nodes adjacent to them to produce co-expression, which will affect downstream biological functions. Therefore, these have higher node degrees and the gene is likely to be a key gene in the development of AML. Finally, we chose the node top10 genes as the key gene in the co-expression network, which is the key gene related to AML. 6. miRNA expression profiling 6.1 Heterogeneous miRNA analysis CV values were calculated from miRNA expression profiles of 103 cancer samples. A total of 568 highly heterogeneous miRNAs were screened at CV > 2.5 (see Supplementary Table 17).

miRNA-mRNA analysis
The miRNA-mRNA interaction pairs of the number of supporting experiments ≥ 3 and number of cancer types ≥1 from the starBase v2.0 database http://starbase.sysu.edu.cn/starbase2/mirMrna.phpwere screened. Then, the hypergeometric test method was used to identify the miRNAs that regulate the AML-related modules, revealing that there were 72 miRNAs that have significant interaction with the above four modules (see Supplementary Table 18).
Regarding the regulatory relationship between highly heterogeneous miRNAs and the core genes of the modules and modules, a total of ten highly heterogeneous miRNAs interacted with the above seven modules (see the Supplementary Table 19), of which two were high. Heterogeneous miRNAs regulate two core genes of the module, which are miR-346 and ID4， miR-449a and FIP1L1-PDGFRA, respectively.

Survival analysis
To determine whether the 7 key genes associated with AML (7 modules of the top 10 gene) were significantly associated with prognosis, we combined the clinical data of 103 samples in the TCGA to determine the expression of these genes in the sample. Kaplan-Meier survival analysis was performed, when the significance threshold was set to 0.05, a total of twelve module core genes were screened for prognosis (survival curves see Figure5A and Supplementary Table 20 for results).Survival analysis was also performed on the previously analyzed genes that are both significant mutations and seven modules combined with their mutations in the sample, when the significance threshold was set to 0.05, no significant mutations and prognosis were related.

Driving gene-mediated AML molecular typing
Unsupervised clustering of cancer samples was performed using the expression of twelve prognostic-related genes screened in the previous step in 103 AML samples (see Figure5B and  Supplementary Table 21). As shown, all AML samples can be divided into two categories (see Figure5C and Supplementary Table 22), and the prognosis of the two subgroups is significantly different.
To further analyze the differences between these key genes in different subpopulations, the mean expression of each gene in each sample type was used as the expression value of the gene in this category. After t-test analysis, these twelve genes were associated with prognosis, of which seven (HOXB-AS3， HOXB3， SLC9C2， CPNE8， MEG8， S1PR5， MIR196B) were differentially expressed in the two subgroups. The distribution of seven genes see Figure6A and the analysis results in Supplementary Table 23.
According to the drug-target interaction relationship included in the drugbank database, the seven genes were separately labeled with drugs, revealing that one gene (S1PR5)is a target gene of one drug (Fingolimod) (see Supplementary Table 24).

AML subgroup functional differences
Differential genetic analysis was performed on the two types of AML samples obtained in the previous step. When the significance threshold was 0.05, 2293 differentially expressed mRNAs were screened out (see Supplementary Table 25 for the analysis results). Functional and pathway enrichment analysis of these genes (GO and KEGG enrichment analysis see Figure6B and Figure  6C;see Supplementary Table 26), showed that these genes are mainly involved glucose metabolism, glutathione biosynthesis, small G protein-mediated signal transduction, and the Rap1 signaling pathway.

DISCUSSION
We applied WGCNA to analyse public dataset comprising 103 AML patients to identify the genes associated with prognosis indicators. In this study, seven distinct gene modules and twelve biomarkers associated with the AML prognosis from 6383 genes that satisfied standard of co-expression analysis were identified. The 103 AML samples could be classified into two subgroups which is significantly different in prognosis.
To further analyze the differences between these key genes in different subpopulations, the mean expression of each gene in each sample type was used as the expression value of the gene in this category. After t-test analysis, these twelve genes were associated with prognosis, of which seven (HOXB-AS3， HOXB3， SLC9C2， CPNE8， MEG8， S1PR5， MIR196B) were differentially expressed in the subgroups. Some studies have shown that several of these seven genes are associated with the development or prognosis of AML. It has been previously reported that HOX genes have an important role in the development of AML malignant phentype and inhibition of HOX/PBX dimer formation leads to necroptosis in AML [34]. In addition, down regulation of HOXB-AS3 can inhibit cell proliferation. In AML, patients with high HOXB-AS3 expression have shorter survival than patients with low HOXB-AS3 expression [35]. More than 30% of AML patients have a type III receptor tyrosine kinase FLT3 mutation, and HOXB2 and HOXB3 are novel regulators of oncogenic FLT3-ITD-driven AML [36]. Hsa-miR-196b has been shown to directly target HOXA9 / MEIS1 oncogenes and FAS tumor suppressors in MLL rearranged leukemia [37].Recent analysis showed that HOXB3 is an important homeobox protein which contributes to leukemogenesis by a novel miR-375-HOXB3-CDCA3/DNMT3B regulatory circuitry in AML [38]. In addition, SLC9C2，CPNE8，MEG8，S1PR5 have not been found to be associated with the prognosis of AML. This indicates that among the seven genes we analyzed, there are unreported genes that can be used as potential driven genes for AML.
Our study identified the miRNAs that regulate the AML-related modules, heterogeneous miRNAs regulate two core genes of the module, which are miR-346 and ID4，miR-449a and FIP1L1-PDGFRA. It has been reported that FIP1L1-PDGFRA is a kinase translocation which could be cancer driver, resulting in eosinophilias and leukemias [39]. However, the role of miR-449a in leukemogenesis has yet to be elucidated. Previous research has shown that as an independent prognostic biomarker in AML, aberrant miR-335/ID4 expression contributed to AML development by activation of PI3K/Akt signaling pathway [40]. It would be worth exploring the interaction and biological functions of miR-346/ID4 and miR-449a/FIP1L1-PDGFRA in AML. More work will be necessary to explore these possibilities.
According to the drug-target interaction relationship included in the drugbank database, the seven genes were separately labeled with drugs, revealing that one gene (S1PR5)is a target gene of one drug (Fingolimod). Current treatment options for AML include protein phosphatase 2A (PP2A) activators [41]. An earlier meta analysis indicate that with caution of infections or cardiac toxicity, fingolimod could be a potential treatment for elderly, low tumour burden AML patients in combination with hydroxyurea or subcutaneous cytarabine [42]. Fingolimod (FTY720, Gilenya; Novartis, Basel, Switzerland) is a known PP2A activator that inhibits multiple signaling pathways like Ras/ERK, PI3K/Akt, JNK, JAK/STAT and Wnt/β-catenin pathways [43] and is currently used in patients with multiple sclerosis [44]. As a tumour suppressor [45], fingolimod demonstrated inhibition of tumor proliferation and in-vivo leukaemogenesis [46]. However, how fingolimod activates PP2A remains unclear, it has been reported that fingolimod might interacts with PP2A inhibitory proteins such as SET oncogene directly [47]. Our results may suggest a possible drag-target interaction.

CONCLUSION
With the utilization of WGCNA mining, seven gene co-expression modules were identified from TCGA database and there are unreported seven genes( HOXB-AS3，HOXB3，SLC9C2， CPNE8， MEG8， S1PR5， MIR196B) that could be used as potential driven genes for AML. Indeed, in vivo/vitro experiments and multicenter randomized controlled clinical trials should be performed to evaluate the possible application of molecular signatures to predict survival of AML patients and further research of these hub genes may contribute to molecular targeted therapy of AML and help guide experiments for potential clinical drag targets.