Identication of prognostic biomarkers and potential regulatory axes for acute myeloid leukemia based on a competitive endogenous RNA network

Acute myeloid leukemia (AML) is a common heterogeneous hematological malignancy with unclear pathogenesis and high mortality. Non-coding RNAs have recently received extensive attention. This study explored the potential mechanisms of interaction during the development of AML by analyzing a competitive endogenous RNA (ceRNA) network. To obtain differentially expressed genes (DEGs), the transcripts and clinical AML data were downloaded from The Cancer Genome Atlas (TCGA) database. Bioinformatic analysis was used to predict the function annotation and RNA interaction. The data were used to construct a ceRNA regulatory network and survival analysis was used to discover key genes and predict potential regulatory axes. Quantitative Real-time PCR (qRT-PCR) was used to detect DEGs in HL-60 and THP-1 cell lines to verify the prediction.


Introduction
Acute myeloid leukemia (AML) is an aggressive hematological malignancy with a complex heterogeneous background, characterized by rapid proliferation of immature myeloid leukemia cells [1,2].
The prognosis of AML is poor with its complex pathogenesis, which is not completely clear until now. Therefore, identi cation of related genes and clari cation of pathogenesis and potential therapeutic targets of AML is critical.
Long non-coding RNA (lncRNA) is a series of non-coding RNA with a length of more than 200 nucleotides [3]. Increasing number of studies have shown that these transcription "noises", which were previously regarded as having no biological function, play critical roles in several biological activities of eukaryotes, such as chromatin modi cation, post-transcriptional processing and nuclear transport [3][4][5]. MicroRNA (miRNA), a class of endogenous non-coding RNAs, maintain the balance of gene regulatory networks by binding to speci c 3'UTR sites of messenger RNA (mRNA) [6]. It has been demonstrated that miRNAs play an important role in the pathogenesis of many diseases [7][8][9]. In addition, the ceRNA hypothesis suggests that the miRNA response element can act as a sponge for competitive binding of different types of RNA molecules, including lncRNA and mRNA, to regulate their interactions.
Recently, high-throughput sequencing technology has been widely used to identify cancer-related genes, providing a new perspective for exploring the genetic mechanisms of cancer [10]. The competitive endogenous RNA (ceRNA) mechanism proposed by Salmena et al. [11] suggests that transcription products such as lncRNA and mRNA can be used as ceRNAs to regulate gene expression levels and affect cell function by competing with miRNAs. Subsequently, several studies have reported that lncRNA, as a molecular sponge of miRNA, regulates mRNA expression and plays a role in tumorigenesis. For example, LncRNA SMAD5-AS1 can upregulate APC expression through sponging miR-135B-5p and inhibit proliferation of diffuse large B-cell lymphoma through the Wnt/β-catenin pathway [12]. Lnc-SNHG1 can promote the progression of non-small cell lung cancer by acting as a sponge of miR-497 [13]. Therefore, the discovery of the lncRNA-miRNA-mRNA network may provide a new language for RNA communication and a comprehensive understanding of the etiology and pathogenesis of cancer.
In this study, through comprehensively integrating the gene and RNA expression data of AML, differentially expressed genes (DEGs) were screened out and the AML-related ceRNA regulatory network was established. Six genes signi cantly associated with survival were identi ed. Finally, we predicted the CTB-193M12.1/hsa-mir-206/NFAT5 regulatory axis. Experimental veri cation was performed to validate the key genes. This study may increase understanding of the molecular mechanism and provide new therapeutic concepts for AML.

Materials And Methods
The Cancer Genome Atlas (TCGA) data retrieval The transcriptome datasets of 324 AML patients and 150 normal subjects were retrieved from the TCGA database. Data types for AML patients and normal subjects included read counts of Gene Expression Quanti cation, FPKM, and Transcriptome Pro ling miRNA Expression Quanti cation. The TCGA database includes standardized RNA expression pro le data. The Ensemble o cial website was applied to annotate and separate lncRNA, miRNA, and mRNA, and the controversial RNAs were subsequently discarded.

Identi cation of DEGs
The transcriptome and miRNA data from the TCGA dataset were combined into a matrix. The differences of RNAs in AML were analyzed using the edgeR software package, which is based on the R language. A P-value lower than 0.01 (P < 0.01) and a fold change lower than 2 were used as thresholds to lter and screen differential genes. The "ggplot2" and "pheatmap" packages in the R software were used to generate volcano maps and thermal maps, respectively. This was done to demonstrate the differential expression of three gene sets.

Functional annotation
To elucidate the biological characteristics and processes of DEGs, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway were analyzed via the Over-Representation method of the "clusterPro ler" package in R software [14]. The hypergeometric test method was adopted with a P < 0.05 as the threshold; the functional pathways that met the conditions were then sequenced.
Afterward, the 10 signi cant biological pathway results were visualized.
The "cytoHubba" plugin was used to identify the central genes and their sub-network. The R package "ggalluvial" was used to con rm the ceRNA axes.
Survival and prognosis analysis TCGA provided clinical information and survival data for patients with AML. The data were analyzed with the "survival" package in the R software. The "survminer" package was used for determining the most suitable cut-off value of the gene expression levels in the survival analysis. P < 0.05 was signi cantly.
Differential genes with a signi cant impact on survival were screened. Moreover, the GEPIA website was used to query the expression of DEGs, the relationship between different lncRNAs and NFAT5 was analyzed using R software.
Cell Culture THP-1 and HL-60 cells were generously gifted by Professor Wenjiao Chang from the First A liated Hospital of China University of Science and Technology. Cells were maintained in RPMI1640 medium containing 10% heat-inactivated fetal bovine serum at 37°C in a 5% CO2 atmosphere. The inoculation concentration was 2-3×10 5 cells/ml, which was transmitted every 2-3 d to maintain logarithmic growth.

Human peripheral blood mononuclear cells (PBMCs) from 15 normal persons were collected from the Health Examination Center of the First A liated Hospital of Anhui University of Traditional Chinese
Medicine as the control group [20,21]. Written informed consent was obtained from all patients. This study was approved by the Hospital Ethics Committee.

GO and KEGG pathway analysis of differently expressed mRNAs
The biological characteristics and functional annotations of differentially expressed mRNAs were described by enriching on GO and KEGG, respectively. GO enrichment analysis indicated that differentially expressed mRNAs were signi cantly enriched in extracellular structure organization, extracellular matrix and receptor regulator activity (Figure 4a). Moreover, KEGG enrichment analysis revealed that multiple enrichment pathways, such as neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, and axon guidance (Figure 4b).
Construction of a ceRNA network in AML A total of 720 potential lncRNA-miRNA regulatory axes were obtained by intersecting the screened differentially expressed miRNAs with the predicted results. After screening the predicted results, 1535, 21068, and 2031 regulatory axes were found in TargetScan, miRDB, and miRTarBase, respectively. The acquired results were intersected to obtain 37 high con dence miRNA-mRNA regulatory axes. Finally, the ceRNA regulatory network was constructed with 164 lncRNA-miRNA-mRNA axes and the RNA interaction relationships in AML were predicted ( Figure 5). Sanky diagram drawn by "ggalluvial" package in R software was used to show the regulatory relationship of ceRNA ( Figure 6).  (Figure 7). Thus, the survival-related lncRNA-miRNA-mRNA regulatory axes were obtained by searching the target genes of lncRNAs in the ceRNA regulatory network. Notably, among the aforementioned regulatory axes, a strong correlation was found between CTB-193M12.1 and NFAT5, which provided a direction for the next prediction of potential ceRNA regulatory axes.
Identi cation of a potential regulatory axis CTB-193M12.1/hsa-mir-206/NFAT5 was predicted as a potential regulatory axis of the ceRNA network for the following reasons: analysis in the GEPIA website showed that CTB-193M12.1 and NFAT5 is a luxury gene speci cally expressed in AML patients (Figure 8a, b). A total of 31 tumor types with speci c expression of this gene were identi ed by the GEPIA website, and the conclusion showed that CTB-193M12.1 is speci cally overexpressed in AML patients. Then, the interaction between CTB-193M12.1 and hsa-mir-206 was demonstrated using an extremely dependable database (LncBase). Similarly, a total of three highly reliable databases (miRTarBase, miRDB, and TargetScan) were used to predict the interaction between hsa-mir-206 and NFAT5. Furthermore, survival analysis proved that the high expression level of CTB-193M12.1 was signi cantly correlated with the low survival rate. And the expression level of NFAT5 was extremely high in AML patients (Figure 8b). Ultimately, the GEPIA website was used to determine the correlation between CTB-193M12.1 and NFAT5, and the obtained results were compared with the data analysis result. The outcome demonstrated that the expression of CTB-the correlation analysis result and the ceRNA theory (Figure 8c). In summary, the data suggested that CTB-193M12.1 and NFAT5 expression were signi cantly increased and positively correlated with each other, while hsa-mir-206 was signi cantly decreased in AML patients. These ndings are consistent with the "ceRNA hypothesis", which suggests that lncRNAs indirectly regulate the expression of mRNAs via modulating the expression of miRNAs. Therefore, we conclude that the CTB-193M12.1/hsa-mir-206/ NFAT5 axis might become a potential regulatory axis of the ceRNA network and may have a profound effect on the progression of AML (Figure 9).

Experimental validation
The expression levels of hsa-mir-206 and NFAT5 in HL-60 cells, THP-1 cells, and 15 normal PBMC samples were detected by qRT-PCR. We discovered that the expression level of hsa-mir-206 was signi cantly decreased while the expression level of NFAT5 was moderately increased in HL-60 cells and THP-1 cells ( Figure 10). The results of qRT-PCR validated the TCGA data and also provided basic evidence for our conjecture.

Discussion
It has been elucidated that several lncRNAs, miRNAs, and mRNAs are differentially expressed in AML, and the ceRNA network might be an instrumental knot in the regulation of tumorigenesis [23]. Therefore, it is urgent to identify key genes and understand the pathogenesis of AML. This study used a series of methods to identify and analyze the data downloaded from TCGA and GTEx databases and verify the obtained results.
Here, the differentially expressed genes were obtained by analyzing the downloaded data. GO and KEGG enrichment analysis revealed that differentially expressed RNAs and notable pathways of enrichment were all associated with the progression of AML. A ceRNA network constructed by 164 regulatory axes was screened. A total of six lncRNAs (ZBTB20-AS4, ZMYM4-AS1, RP11-402L5.1, CTB-193M12.1, KIZ-AS1, and CTD-2260A17.1) were identi ed from the constructed ceRNA network through survival analysis, which were signi cantly associated with overall survival rate (P < 0.05) (Fig. 7). Patients with high expression of CTB-193M12.1 indicated poor prognosis (Fig. 7a). The correlation analysis showed that the interaction between CTB-193M12.1 and NFAT5 was signi cantly strong (r = 0.82, P < 0.01) (Fig. 8c). A potential CTB-193M12.1/hsa-mir-206/NFAT5 regulatory axis with the highest correlation was constructed ultimately, which was basically in line with the ceRNA theory (Fig. 9). The results of qRT-PCR validated the differentially expressed of hsa-mir-206 and NFAT5 (Fig. 10). The analysis aimed to provide new ideas for a comprehensive and deeper understanding of the molecular mechanism of AML.
LncRNA, a functional RNA molecule that plays a role in various biological functions of the human body, contributes signi cantly to the aforementioned regulatory network. Since studies focused on lncRNAs are still limited, only three have o cial human genome nomenclature committee symbols (ZBTB20-AS4, ZMYM4-AS1, and KIZ-AS1) and none have been reported. However, cluster analysis showed that CTB-193M12.1 and NFAT5 expression were increased in AML patients while hsa-mir-206 was decreased (Fig. 3). Our bioinformatics analysis results also revealed a strong correlation between CTB-193M12.1 and AML (Fig. 7a). Besides, several studies indicate that abnormal expression of lncRNAs is often associated with the development of AML [3,24]. Therefore, it is reasonable to suggest CTB-193M12.1 as a regulatory gene of AML.
As a binding site for multiple RNAs, miRNA are also indispensable in tumorigenesis. Interestingly, our results revealed that of 163 lncRNA-miRNA-mRNA regulatory axes, 145 were regulated by hsa-mir-206, which indicated that hsa-mir-206 may be the strongest regulator in the pathogenesis of AML. Recently, studies on hsa-mir-206 have become a hot topic, but studies on AML are still lacking. Notably, we found that a recent study by Chen et al. [25] on occupational asthma (OA) proposed a regulatory axis similar to that in this study. They pointed out that miR-206-3p, as a key factor in calcineurin/NFAT signaling in macrophages and bronchoalveolar lavage cells, affects the transcription of iNOS and thus regulates the development of OA. This proven hsa-mir-206/NFAT regulatory axis correlates with our ndings. Further, Chen et al. also used the THP-1 cell line. Therefore, it is reasonable to suggest that hsa-mir-206 can act as a key site in regulating the development of AML by mediating the expression of NFAT5. Moreover, it has been reported that low expression of hsa-mir-206 is closely associated with the poor prognosis of pediatric AML patients [26]. This is consistent with our nding that the downregulation of hsa-mir-206 predicts poor prognosis in AML. It is noteworthy that our study was conducted in adults, which not only veri ed the above view but also provided powerful evidence that hsa-mir-206 may have an impact on the pathogenesis of AML. Also, hsa-mir-206 plays an important role in other cancers. For example, downregulation of miR-206 leads to poor prognosis in endometrial cancer [27] and promotes the occurrence and development of laryngeal cancer [28]. High expression of miR-206 inhibits osteosarcoma cell proliferation [29], as well as the proliferation and migration of prostate cancer cells [30]. Overall, these results suggested that hsa-mir-206 can regulate multiple cancer processes and might play an important role in AML through the hsa-mir-206/NFAT5 regulatory axis.
Similarly, NFAT5, which is a target gene of hsa-mir-206 in the ceRNA network, has been shown to regulate multiple tumor processes. For example, NFAT5 can act as a ceRNA regulated by circFOXO3, plays a role in glioblastoma through sponging miR-138-5p and miR-432-5p [31], and may also regulate the progression of colon cancer through the MALAT1/miR1295p/NFAT5 axis [32]. Here, results from our investigation and the GEPIA website both showed that NFAT5 has a strong relationship with CTB-193M12.1 (Fig. 8c) and that the low expression of NFAT5 can lead to a higher survival rate. Moreover, one study also found that the downregulation of NFAT5, which is regulated by an upstream gene, led to a more optimistic prognosis in lung cancer [33]. This demonstrates that high expression of NFAT5 can lead to poor prognosis in cancer and may also participate in the progression of multiple cancer types as a ceRNA. Therefore, it is reasonable to suggest that NFAT5 may regulate the progression of AML through the CTB-193M12.1/hsa-mir-206/NFAT5 ceRNA regulatory axis proposed in this study.
Recently, studies on the association of ceRNA with AML have emerged, but most of them only focused on the association between lncRNA dysregulation and AML [34,35]. Among a study aimed at pediatric AML, although the biomarkers based on the ceRNA network have been analyzed [36], whether the biomarkers suggested in the study applies to adult AML still need further validation. Another research proposed by Wang et al. [37] constructed a coexpression ceRNA network and identi ed several cancer-related genes, but the interactional relationships between differentially expressed genes still not been analyzed. Our current research aimed to establish a more integrated regulatory network and select the most relevant regulatory axis to an in-depth study, in order to identify several prognostic biomarkers and understand the new "language" of communication between RNA transcripts from multiple dimensions in AML.
However, there are still some limitations in our study. The interaction between RNAs has not been experimentally con rmed. It is necessary to create experiments to verify the CTB-193M12.1/hsa-mir-206/NFAT5 regulatory axis in future studies.

Conclusions
In conclusion, a potential regulatory axis was constructed, which revealed the correlation between genes and may ful ll valuable functions in the biological process of AML. Several new prognostic and predictive biomarkers were identi ed, which may provide a new direction for revealing the pathogenesis of AML.

Consent for publication
The recruited personnel have signed written informed consent.

Availability of data and materials
The RNA and miRNA expression pro les and clinical follow-up data used during the present study are obtained from a public database of TCGA. Moreover, the datasets used to support the ndings of this study are included within the article, and are available from the corresponding author upon reasonable request.          The regulatory axis CTB-193M12.1/hsa-mir-206/NFAT5, which consistent with the ceRNA hypothesis.