Identication of an EMT-related lncRNA signature in glioma

Background Epithelial-mesenchymal transition (EMT) has been implicated in the invasion and progression of gliomas, the present study investigated EMT-related long noncoding (lnc)RNAs in gliomas. Methods Candidate lncRNAs screened from a glioma dataset in The Cancer Genome Atlas were used to construct EMT-related lncRNA coexpression networks in order to identify EMT-related lncRNAs; these were validated using the Chinese Glioma Genome Atlas (CGGA). Gene set enrichment analysis and principal component analysis (PCA) were used to annotate lncRNA functions. We established an EMT-related lncRNA signature composed of 9 lncRNAs (HAR1A, LINC00641, LINC00900, MIR210HG, MIR22HG, PVT1, SLC25A21-AS1, SNAI3-AS1, and SNHG18) that could distinguish between low- and high-risk glioma patients. The functions of the identied lncRNAs were evaluated by constructing a competing endogenous RNA network. dehydrogenase, The signature was an independent hazard ratio=1.806). These ndings were validated in the CGGA dataset. PCA also revealed differences in EMT status between low- and high-risk patients. Competing endogenous RNA network showed that complex lncRNA–microRNA–mRNA interactions potentially contributing to EMT progression in glioma.


Introduction
Gliomas are the most common type of malignant brain tumor in adults [1]. Especially, the glioblastoma (WHO grade IV) is the most invasive and aggressive glioma subtype with the highest mortality rate [2; 3].
It is said that epithelial-mesenchymal transition (EMT) is closely related to glioma malignancies [4; 5], and we have known that lncRNAs orchestrate multiple cellular processes by modulating EMT in a variety of tumors cells [6]. Even though the detailed mechanisms of lncRNAs interact with EMT to promote glioma's development and progression are not fully understood, their elucidation is critical for developing effective diagnosis and treatments.
The transdifferentiation of epithelial cells into motile mesenchymal cells-a process known as epithelialmesenchymal transition (EMT)-plays an important role in development, wound healing, and stem cell behavior, and also contributes to the pathology of brosis and cancer progression [7]. Tumor cells undergoing EMT acquire resistance to apoptosis and antitumor drugs while disseminating to distant sites [8]. Thus, EMT is a hallmark of carcinogenesis, and strategies targeting EMT pathways have therapeutic potential for cancer treatment [9].
Long noncoding (lnc)RNAs are RNA molecules longer than 200 bases that do not encode proteins [10; 11] and have been shown to function as master regulators of gene expression in normal biological processes and in diseases such as cancer [12]. As an important component of competing endogenous (ce)RNA networks, lncRNAs act as micro (mi)RNA sponges to regulate protein encoding and translation in cancer [13; 14]. For example, Upregulated TMPO antisense transcript 1 is a ceRNA that sponges miR-140-5p in gastric cancer cells, thereby alleviating inhibition of SRY-box transcription factor 4, an EMT regulator [15].
Although lncRNAs related to EMT have been identi ed [16], little is known about their role in glioma. The lncRNA-microRNA-mRNA ceRNA networks link the function of protein-coding mRNAs with that of noncoding RNAs such as microRNA, long non-coding RNA, pseudogenic RNA, and circular RNA. We learned that LINC01133 inhibits gastric cancer progression and metastasis by acting as a ceRNA for miR-106a-3p to regulate adenomatous polyposis coli (APC) gene expression and the Wnt/β-catenin pathway [17]. And the SMAD5-AS1 target miR-135b-5p to inhibits diffuse large B cell lymphoma proliferation in the same pathway [18]. So, the deregulation of ceRNA networks may an important factor in lead to human diseases including cancer [13].
In this study, we used data from The Cancer Genome Atlas (TCGA) to establish an EMT-related lncRNA signature for glioma that was subsequently validated using data from the Chinese Glioma Genome Atlas (CGGA). We also established a lncRNA-mediated ceRNA network to investigate the downstream targets and mechanisms of EMT-related lncRNAs (Fig. 1).

Patients and datasets
Clinical information of glioma patients was obtained from TCGA (https://cancergenome.nih.gov/) and CGGA (http://www.cgga.org.cn) websites; the 2 datasets were used for training and validation, respectively. Clinicopathologic information of the study population is shown in Fig. S1.

LncRNA pro ling
The lncRNA pro le was determined using an established mining method [19]. Brie y, genes were identi ed as protein-coding or noncoding based on their Refseq IDs, and only long noncoding genes in the NetAffx annotation les were retained. A total of 14,142 lncRNAs were obtained from the TCGA dataset, among which 215 were EMT-related as determined by gene set enrichment analysis (GSEA) (http://www.broadinstitute.org/gsea/index.jsp) [20]. Ultimately, 162 EMT-related lncRNAs from TCGA and 152 EMT-related lncRNAs from CGGA were retained to construct EMT lncRNA coexpression networks (P ≤ 0.001) using Cytoscape v3.7.0 software (The Cytoscape Consortium, San Diego, CA, USA).

Establishment of a lncRNA signature
To establish a lncRNA signature for predicting survival, univariate Cox analysis was carried out with the 2 datasets. We screened lncRNAs with prognostic signi cance and identi ed 16 EMT-related lncRNAs that were closely associated with glioma patient survival (P < 0.01). We then assigned a risk score using the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm [21; 22; 23] and de ned the inclusion criteria for the 9 lncRNAs and their constants, selecting the optimal penalty parameter β for a minimum 10-fold cross validation in the training set. The risk score was calculated with the following formula: Risk score = βgene1*exprgene1 + βgene2*exprgene2+……+ βgenen × exprgenen where exprgenen is the expression level of a lncRNA. Low-and high-risk groups were distinguished based on median risk score.

Construction of a ceRNA network
The interaction of lncRNAs, miRNAs, and mRNAs in glioma was predicted based on overlapping miRNA seed sequence binding sites in lncRNAs and mRNAs. The interaction between the 9 identi ed EMT-related lncRNAs and miRNAs was rst predicted using the MiRcode database, which provides wholetranscriptome human miRNA target predictions based on the GENCODE annotation and includes > 10,000 long noncoding RNA genes. miRTarBase, TargetScan, and miRDB databases were then used to identify aberrantly expressed miRNA-mRNA pairs. We analyzed only target mRNAs that matched the databases. Finally, lncRNA-miRNA and miRNA-mRNA pairs were merged based on shared miRNAs into a ceRNA network of EMT-related lncRNAs, which was visualized using Cytoscape v3.7.0 software.

Statistical analysis
The expression of EMT-related lncRNAs between the two risk groups and their relationship with WHO grades were analyzed using one-way ANOVA. To compare glioma characteristics with different clinical pathology risk scores, a one-way ANOVA or t-test was conducted to compare risk scores for patients grouped by clinical or molecular pathology characteristics. Uni-and multivariate Cox regression analyses and principal component analysis (PCA) were performed using R v3.6.3 and SPSS v22 software applications (SPSS Inc, Chicago, IL, USA). GSEA was performed to investigate the functions of DEGs between the 2 groups of gliomas. Survival status was determined by univariate Cox regression analysis; the Kaplan-Meier curve was generated using Prism v6 software (GraphPad, La Jolla, CA, USA). Operating characteristic (ROC) curves were used to study the prediction e ciency of the risk signature, age, grade IDH-mutant status, and 1p/19q codeletion status. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the "clusterPro ler" package of R software (P adj <0.05 and Q < 0.05). GSEA was used for functional annotation of genes. A 2-sided P value < 0.05 was considered signi cant for all statistical tests.

Establishment of an EMT-related lncRNA signature in glioma
To identify prognostic lncRNAs, expression data for each lncRNA in TCGA and CGGA were analyzed with a univariate Cox proportional hazards regression model (P < 0.01). We selected 137 and 92 lncRNAs related to glioma patient prognosis from TCGA and CGGA, respectively. We screened lncRNAs with prognostic signi cance in both datasets and identi ed 16 that were signi cantly associated with overall survival (OS) of glioblastoma patients (P < 0.01). LASSO Cox regression was used to identify the optimal prognostic lncRNAs in the TCGA data set and establish the risk signature. Nine of the 16 candidate lncRNAs retained their prognostic signi cance and were thus selected as independent prognostic markers ( Fig. 2a-c). All the 9 EMT-related lncRNAs, LINC00900, MIR210HG, MIR22HG, PVT1, and SNHG18 were found to be risky lncRNAs with HR > 1, and HAR1A, LINC00641, SLC25A21-AS1, and SNAI3-AS1 were protective lncRNAs (HR < 1) ( Table 1). Based on the median risk score, we divided glioma patients into low-and high-risk groups (Fig. S2); OS was longer in the former than in the latter (P < 0.0001; Fig. 2d). Then, we use the CGGA(n = 508) dataset as a validation dataset to calculate the risk scores. We found a signi cant difference in OS between the two risk groups (P < 0.0001; Fig. 2e). Validation of the 9-lncRNA prognostic risk signature We ranked the risk scores of patients in the TCGA training set and analyzed their distribution and plotted survival status (Fig. 3a). The heatmap revealed differences in the expression patterns of the prognostic lncRNAs between low-and high-risk patient groups. The expression of 4 protective lncRNAs was upregulated whereas that of 5 risk-associated lncRNAs was downregulated in patients with low risk scores; the opposite trends were observed in patients with high risk scores. These results were con rmed in the CGGA validation set (Fig. 3b).
Expression of 9 EMT-related lncRNAs and relationship with patient prognosis We evaluated the expression of the 9 EMT-related lncRNAs according to tumor grade in the TCGA training set and CGGA validation dataset. LINC00900, MIR210HG, MIR22HG, PVT1, and SNHG18 increased whereas HAR1A, LINC00641 SLC25A21-AS1, and SNAI3-AS1 decreased with tumor grade (Fig. 4a, c). LINC00900, MIR210HG, MIR22HG, PVT1, and SNHG18 levels were higher whereas those of HAR1A, LINC00641 SLC25A21-AS1, and SNAI3-AS1 were lower in the high-risk group than in the low-risk group (Fig. 4b, d). The results from the two data are consistent. We also examined the signi cance of these 9 lncRNAs for patient survival using the TCGA dataset. Based on the expression level of each lncRNA, the survival time associated with each gene differed signi cantly between high-and low-risk groups (Fig. S3). Uni-and multivariate Cox regression analyses showed that the 9-lncRNA signature was an independent factor signi cantly associated with OS (TCGA: P = 0.041, hazard ratio [HR] = 1.806; CGGA: P < 0.001, HR = 1.855) (Fig. 5c-f). and IDH mutation status, AUC = 0.726) (Fig. 5b), demonstrating that the prognostic signature was reliable and effective. Uni-and multivariate Cox regression analyses performed using the TCGA dataset showed that risk score, age, WHO grade, and IDH mutation status were all related to OS (Fig. 5c, d); this was con rmed using the CGGA validation dataset (Fig. 5e, f). These results suggest that risk score derived from EMTrelated lncRNAs is an independent factor that predicts the prognosis of glioma patients.

PCA and functional annotation of EMT-related lncRNAs
We carried out PCA to investigate differences between low-and high-risk groups based on EMT-related and overall gene expression pro les (Fig. 6a-d). Low-and high-risk glioma patient groups were distinguished based on EMT gene expression levels. We also found that EMT status was associated with a speci c lncRNA signature. We performed GO and KEGG pathway analyses of the top 3000 genes that were differentially expressed between low-and high-risk groups and found that differentially expressed genes (DEGs) were enriched in the following GO terms: M phase of cell cycle, Apoptosis, Tumor necrosis factor-mediated signaling pathway, and Regulation of cell adhesion. Additionally, KEGG pathway analysis revealed that Focal adhesion, Adherens junction, Pathways in cancer, and Cell adhesion molecules were signi cantly enriched (Fig. 6e, f). Functional annotation by GSEA showed that DEGs between the 2 groups were enriched in Epithelial-mesenchymal transition, E2F targets, P53 Pathway and IL6/JAK/STAT3 signaling (Fig. 6g). These results indicate that the two risk groups strongly correlated with the malignancy of glioma cells.

Discussion
In this study, we identi ed a 9-lncRNA signature that could distinguish between low-and high-risk glioma patients based on the median risk score in TCGA. These 9 EMT-related lncRNAs have prognostic signi cance in glioma: patients in the low-risk group had longer OS than those in the high-risk group. Deletion of chromosomal arms 1p and/or 19q and IDH mutations are considered as useful biomarkers in glioma [2; 24; 25]; we evaluated the risk scores of IDH and 1p/19q codeletion in the TCGA dataset and found that both were related to OS. The results were con rmed using the CGGA dataset, indicating that the 9-lncRNA signature is reliable and effective for predicting prognosis. We also established a lncRNA-miRNA-mRNA ceRNA network to predict the regulatory relationships of the 9 lncRNAs; functional enrichment analysis of downstream target genes revealed that the lncRNAs are involved in tumorigenesis and cancer progression.
As an intricate genetic procedure, EMT can be utilized by both normal and tumor epithelial cells to enable them to separate from neighboring cells and migrate [26], and have in uenced invasive, cancer progression and therapy resistance property of cancer cells [27; 28]. Previous reports indicate that Snail [29; 30], ZEB [31; 32], Twist [33] families and some other signaling pathways genes [34; 35; 36] have been identi ed as master regulators in EMT progress. More and more research shows lncRNAs epigenetically orchestrate EMT, and have promote our understanding of the molecular mechanisms that lncRNA-mediated in cancer initiation and malignancy progression [6]. LncRNAs play a critical role in diverse cellular processes in both normal and disease states [12; 37]. Speci cally, lncRNAs promote or act as decoys to repress transcription, or function as epigenetic regulators or scaffolds that interact with various proteins to form ribonucleoprotein complexes [38; 39]. LncRNAs are involved in multiple signaling pathways including p53, nuclear factor κB (NF-κB), phosphatidylinositol 3-kinase/protein kinase B, and Notch, which have been linked to EMT. Among the 9 EMT-related lncRNAs identi ed in this study, LINC00900, MIR210HG, MIR22HG, PVT1, and SNHG18 were associated with increased risk of glioma whereas HAR1A, LINC00641, SLC25A21-AS1, and SNAI3-AS1 were protective. PVT1 is upregulated in various carcinomas, and its overexpression is associated with poor survival in patients [40]. PVT1 functions act as a ceRNA to protect mRNAs from miRNA-mediated repression and associated with the development of resistance to common chemotherapeutic agents [41; 42]. MiR210HG sponges miR-1226-3p to facilitate breast cancer cell invasion and metastasis and EMT via regulation of mucin-1c; and MIR210HG upregulation was shown to be associated with advanced International Federation of Gynecology and Obstetrics (FIGO) stage, metastasis, and poor prognosis in cervical cancer [43; 44].
SNHG18 promotes resistance of glioma to radiotherapy by repressing semaphorin 5A [45]. LINC00641 expression level was found to be decreased in breast cancer tissue, which was negatively related to tumor size, lymph node metastasis, and clinical stage [46]. SNAI3-AS1 could bond up-frameshift protein 1 (UPF1), regulate Smad7 expression, and activate TGF-β/Smad pathway, to induce EMT in hepatocellular carcinoma [47], and it also acts an important factor in the ceRNA network in Adipose Tissue From Type 2 Diabetes [48]. Therefore, the close relationship between the lncRNA and EMT we screened and the important role of lncRNA in the ceRNA network have been further con rmed in other tumors or diseases.
Although some of the identi ed lncRNAs have not been previously reported as associated with EMT, our results suggest that they may directly or indirectly be involved in this process. There were some limitations to our study. Firstly, our analyses were based on publically available datasets and require con rmation in other patient populations. Moreover, experiments should be carried out to validate the roles of the identi ed lncRNAs and their targets and the lncRNA-miRNA-mRNA ceRNA network in glioma.

Conclusions
The 9-EMT-related lncRNA signature established in this study has prognostic value for glioma patients, while the ceRNA network provides insight into the molecular basis of glioma progression as well as potential therapeutic targets for its treatment.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.