Screening of independent prognostic long noncoding RNA for gastric cancer in TCGA

Background: The global incidence of gastric cancer (GC) ranks fourth among cancers, and its 5-year survival is less than 25%. LncRNAs are vital regulators involved in the pathological processes of cancer. Prognostic lncRNAs in GC urgently need to be identified. Method: Expression files and clinical data of GC were downloaded from TCGA. Differentially expressed lncRNAs were calculated with the edgeR R package, followed by prognosis analysis. Cox analysis was conducted to identify independent prognostic factors of GC. Potential signaling pathways in which the screened lncRNAs were enriched were evaluated by gene set enrichment analysis (GSEA). Finally, Pearson analysis was conducted to predict the possible mechanism of the lncRNA in GC progression. Result: According to Cox regression analysis, ENSG00000224363 was an unfavorable prognostic factor for OS (overall survival) and DFS (disease-free survival) of GC. GSEA indicated that ENSG00000224363 may regulate the cell cycle, apoptosis and autophagy of GC cells. Conclusion: LncRNA ENSG00000224363 is overexpressed in GC, serving as an independent unfavorable prognostic factor.


Data collection
Using the R package TCGAbiolinks (Bioconductor), RNA-Seq raw counts and clinical data of GC patients (age, gender, tumor grade, TNM staging, OS and DFS) were downloaded from the TCGA database (https://tcga-data.nci.nih.gov/tcga/). A total of 407 expression files, comprising 32 normal tissues and 375 tumor tissues, were collected. Among them, OS was recorded for 373 patients and DFS for 303 patients.
Downloaded datasets containing complete clinical data of GC patients were subjected to survival analysis or correlation analysis. Based on the median level of ENSG00000224363 in GC samples, patients were classified into a high-expression group (> median of ENSG00000224363) and a low-expression group (≤ median of ENSG00000224363).
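The median-based grouping described above can be sketched as follows. This is a minimal illustration only; the sample IDs and expression values are hypothetical and not taken from the TCGA data.

```python
from statistics import median

def split_by_median(expression):
    """Map each sample ID to 'high' (> median) or 'low' (<= median)."""
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}

# Hypothetical expression values for five samples (illustration only)
example = {"S1": 0.5, "S2": 2.0, "S3": 3.5, "S4": 1.0, "S5": 8.0}
groups = split_by_median(example)
```

Samples equal to the median fall into the low-expression group, matching the "≤ median" rule stated above.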

Statistical methods
Statistical Product and Service Solutions 22.0 (SPSS 22.0) was used for statistical analysis. Differentially expressed genes in the downloaded TCGA datasets were calculated using the edgeR package. The χ2 test and Fisher's exact probability test were used to analyze the correlation between ENSG00000224363 and the pathological features of GC patients. Kaplan-Meier (K-M) curves and log-rank tests were adopted for survival analysis.
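For readers unfamiliar with the survival methods named above, the following is a minimal pure-Python sketch of the Kaplan-Meier estimator and the two-group log-rank test. It is not the SPSS or R implementation used in the study; the function names are illustrative, and the p value assumes a chi-square distribution with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    surv, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    pooled = list(zip(times_a + times_b, events_a + events_b))
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # the variance term is undefined when one subject remains
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In practice, established implementations (for example the R survival package mentioned in the Results) should be preferred; the sketch only makes the calculation explicit.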
Univariate and multivariate Cox proportional hazards models were used to analyze risk factors for GC survival. The cor function in R was used to calculate the Pearson correlation of all genes with ENSG00000224363. p<0.05 indicated a statistically significant difference. In GSEA, gene sets with p<0.05 and a false discovery rate (FDR) <0.25 were considered significantly enriched.
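The Pearson correlation and the GSEA significance thresholds stated above can be illustrated with a short sketch. The helper names and the dictionary layout of the gene-set results (`name`, `p`, `fdr`) are assumptions made for this example, not the output format of the GSEA software.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def enriched_gene_sets(results, p_cut=0.05, fdr_cut=0.25):
    """Keep gene sets with p < 0.05 and FDR < 0.25 (thresholds from the text)."""
    return [r["name"] for r in results if r["p"] < p_cut and r["fdr"] < fdr_cut]

# Hypothetical GSEA results (illustration only)
results = [
    {"name": "SET_A", "p": 0.01, "fdr": 0.10},
    {"name": "SET_B", "p": 0.03, "fdr": 0.40},  # fails the FDR threshold
    {"name": "SET_C", "p": 0.20, "fdr": 0.20},  # fails the p threshold
]
significant = enriched_gene_sets(results)
```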

Results

2480 lncRNAs were up-regulated and 707 were down-regulated in GC
A total of 50,455 genes were detected in GC tissues from the downloaded TCGA data, of which 14,464 were lncRNAs. Using cut-off values of fold change ≥2 and p<0.05, 2480 up-regulated lncRNAs and 707 down-regulated lncRNAs were obtained (Fig 1A).
The survival R package was used to screen lncRNAs prognostic for DFS and OS of GC. First, GC patients were divided into two groups based on the median expression of each candidate lncRNA. Survival curves were drawn using the K-M method, and lncRNAs with a log-rank p value <0.05 were retained. In total, 234 lncRNAs were identified as prognostic factors for OS, of which 48 were also prognostic factors for DFS (not all data are shown; Fig 1B, 1C, 1D and 1E).
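The cut-off rule (fold change ≥2 and p<0.05) can be written as a small filter. This sketch assumes that the fold change is the tumor-to-normal expression ratio and that down-regulation is defined symmetrically as a ratio ≤ 1/2; the text does not state either point explicitly. The statistical testing itself would be done by edgeR, and this code only applies the thresholds to its output.

```python
def classify(fold_change, p_value, fc_cut=2.0, p_cut=0.05):
    """Label a gene 'up', 'down' or 'unchanged' from its fold change and p value.

    fold_change is assumed to be the tumor/normal ratio (an assumption of this sketch).
    """
    if p_value >= p_cut:
        return "unchanged"
    if fold_change >= fc_cut:
        return "up"
    if fold_change <= 1 / fc_cut:
        return "down"
    return "unchanged"

# Hypothetical lncRNAs: (fold change, p value)
genes = {"lnc1": (4.0, 0.001), "lnc2": (0.25, 0.01), "lnc3": (3.0, 0.20), "lnc4": (1.5, 0.01)}
labels = {name: classify(fc, p) for name, (fc, p) in genes.items()}
```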

ENSG00000224363 was an independent unfavorable prognostic factor for DFS and OS of GC
Univariate Cox analyses of DFS and OS were conducted for each of the 48 screened lncRNAs. Eleven lncRNAs were associated with OS and 16 were associated with DFS (Table 1). Only 4 lncRNAs were associated with both DFS and OS of GC. These were subjected to multivariate Cox analyses of DFS, in which 2 were associated with DFS of GC (Tables 2-5). Finally, these 2 lncRNAs were entered into multivariate Cox analyses of OS, and only lncRNA ENSG00000224363 remained, making it an independent prognostic risk factor for both DFS and OS of GC (Tables 6-7). The chi-square test showed that ENSG00000224363 was associated with lymph node metastasis of GC (Table 8). However, ENSG00000224363 was not correlated with age, tumor grade, TNM staging or other indicators of GC patients.
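To make the univariate Cox step concrete, the sketch below fits a proportional hazards model with a single covariate by Newton-Raphson maximization of the partial likelihood (Breslow handling of ties). It is an illustrative implementation under those assumptions, not the software used in the study, and the example data are hypothetical.

```python
import math

def cox_univariate(times, events, x, max_iter=25, tol=1e-9):
    """Fit a one-covariate Cox model; returns (beta, hazard ratio).

    events: 1 = event observed, 0 = censored. Ties use the Breslow approximation.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: subjects still under observation at this event time
            risk = [j for j in range(n) if times[j] >= times[i]]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            grad += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Hypothetical data: x = 1 marks the high-expression group
times = [2, 3, 5, 7, 8, 10, 12, 15]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta, hazard_ratio = cox_univariate(times, events, x)
```

A hazard ratio above 1 for the high-expression group corresponds to an unfavorable prognostic factor; a multivariate model extends the same likelihood to several covariates at once.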

ENSG00000224363 could predict DFS and OS of GC
To analyze the prognostic potential of ENSG00000224363 in GC, we divided GC patients in the TCGA database into 14 groups according to tumor stage, grade, tumor remnant, depth of local tumor infiltration, lymph node metastasis, gender and age. The correlation of ENSG00000224363 expression with DFS and OS was calculated in each group. ENSG00000224363 was a risk factor for DFS in the female group (Fig 2A), male group (Fig 2B), low-grade group (Fig 2C), high-grade group (Fig 2D), early-stage group (Fig 2E), advanced-stage group (Fig 2F), no distant metastasis group (Fig 2G), no lymph node metastasis group (Fig 2H), lymph node metastasis group (Fig 2I), no tumor residual group (Fig 2J), deep local tumor infiltration group (Fig 2K) and older group (Fig 2L). In addition, ENSG00000224363 expression was a risk factor for OS in the female group (Fig 3A), male group (Fig 3B), older group (Fig 3C), no distant metastasis group (Fig 3D), high-grade group (Fig 3E), lymph node metastasis group (Fig 3F), no tumor residual group (Fig 3G), deep local tumor infiltration group (Fig 3H) and advanced-stage group (Fig 3I).
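The subgroup construction can be sketched as below, assuming each of the seven clinical variables is dichotomized so that 7 × 2 = 14 groups result. The field names and category labels are hypothetical; survival analysis would then be run separately within each group.

```python
from collections import defaultdict

def stratify(patients, variables):
    """Group patient IDs by every (variable, level) pair."""
    strata = defaultdict(list)
    for patient in patients:
        for var in variables:
            strata[(var, patient[var])].append(patient["id"])
    return dict(strata)

# Hypothetical patients with two of the seven dichotomized variables
patients = [
    {"id": "P1", "gender": "female", "stage": "early"},
    {"id": "P2", "gender": "male", "stage": "advanced"},
    {"id": "P3", "gender": "female", "stage": "advanced"},
]
strata = stratify(patients, ["gender", "stage"])
```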

Correlation between ENSG00000224363 and genes involved in GC progression
The correlation between all genes in the genome and ENSG00000224363 was analyzed using the cor function in R. Cyclin dependent kinase 3 (CDK3) (Fig 5A), CDK15 (Fig 5B), cyclin dependent kinase-like 3 (CDKL3) (Fig 5C) and CDKL4 (Fig 5D) were positively correlated with ENSG00000224363 expression.

Discussion
LncRNAs are generally transcribed in eukaryotic cells and have little or no protein-coding ability (23,24). They regulate gene expression in the form of RNA at the pre-transcriptional, transcriptional, and post-transcriptional levels (4,25). Recent studies have shown that lncRNAs are involved in many important processes, such as X chromosome silencing, genomic imprinting, chromatin modification, transcriptional activation, transcriptional interference and intranuclear transport. LncRNAs are also closely related to tumors and non-neoplastic diseases (26,27,28,29). Previous studies have shown that differentially expressed lncRNAs are related to the occurrence, development, invasiveness, metastasis and prognosis of GC, and may serve as diagnostic markers and therapeutic targets (30,31,32).
In the present study, dysregulated lncRNAs between normal gastric tissues and GC tissues in TCGA were analyzed first, yielding 3187 lncRNAs. To identify the relationship between dysregulated lncRNAs and the prognosis of GC, K-M analysis of DFS and OS was performed. Forty-eight lncRNAs were identified as associated with both OS and DFS of GC. Among them, 47 were unfavorable factors and the remaining one was protective. The Cox proportional hazards model has been widely used to assess the clinical outcome of patients since 1972. It has the advantage of analyzing the prognostic value of multiple factors simultaneously (33).
After the prognostic lncRNAs were obtained, they were entered into univariate and multivariate Cox analyses of OS and DFS. The results showed that lncRNA ENSG00000224363 was an independent prognostic risk factor for DFS and OS of GC. Its high level predicted earlier recurrence and a worse outcome in GC patients. The chi-square test then revealed that ENSG00000224363 was positively associated with lymph node metastasis. To assess the prognostic potential of ENSG00000224363 in GC, patients were divided into 14 groups. ENSG00000224363 predicted DFS in 12 groups and OS in 9 groups. GSEA indicated that ENSG00000224363 was mainly enriched in the cell cycle, apoptosis and autophagy of GC (34,35,36).
Correlation analysis showed that ENSG00000224363 was positively correlated with key tumor-driving genes and negatively correlated with tumor-suppressor genes. For example, members of the CDK family, which are key regulators that promote the cell cycle and modulate transcription (37,38), were positively correlated with ENSG00000224363 expression. In addition, CDKN1A (p21), an inhibitor of the CDK family, can induce cell cycle arrest and eventually inhibit cell growth (39,40). The caspase family members that are capable of inhibiting cell apoptosis were also positively correlated with ENSG00000224363 (41,42). Invasiveness is a vital trigger for tumor progression (43,44). The expression levels of MMP family members were consistent with that of ENSG00000224363. Meanwhile, the expression levels of key regulators in the ErbB4, MAPK and Wnt families were also coordinated with ENSG00000224363 (45,46,47).
This study demonstrated for the first time that lncRNA ENSG00000224363 was up-regulated in GC and was an independent prognostic factor for DFS and OS of GC. It also revealed the possible mechanisms by which ENSG00000224363 regulates GC progression. However, in vitro experiments are lacking, and our findings should be validated at the cytological level in the future.

Conclusion
LncRNA ENSG00000224363 is up-regulated in GC, serving as an independent unfavorable prognostic factor.