A long non-coding RNA signature predicts survival for glioblastoma as prognostic biomarkers


 Background: Glioblastoma (GBM) is one of the most fatal tumors in the central nervous system. Its prognosis is very poor. There is increasing evidence that long noncoding RNA (lncRNA) participates in the biological process of glioblastoma. Nevertheless, the role of lncRNA in predicting the prognosis of GBM is still uncertain. Methods: In this study, using RNA-Seq and clinical follow-up data of GBM patients from The Cancer Genome Atlas (TCGA), we performed differential analysis of lncRNA, univariable and multivariable Cox regression analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and Gene Ontology (GO) analysis.Results: We identified four lncRNAs closely interrelated with survival and prognosis of GBM patients. This lncRNA signature was effective in both the training set and the testing set, and it was independent to clinical factors.Conclusions: Our data suggested that the four lncRNAs could be used as promising biomarkers for predicting prognosis in GBM patients.


Background
Glioma is the most common malignant form of primary tumors in central nervous system, accounting for nearly 80% of primary lethal brain tumors [1]. GBM is the most invasive type of glioma. The median survival time of patients diagnosed with GBM is less than 15 months following standard treatment [2]. Even with significant improvement of the therapeutic regimen in GBM, including surgery, radiotherapy as well as chemotherapy, the prognosis of patients is still extremely dismal.
This failure of therapy is owing to the complex biological process of GBM, including the existence of refractory glioma stem cells, as well as chemotherapy protection caused by the blood-brain/tumor barrier [3]. Since there has no substantial breakthrough in treatment of GBM, it is essential to provide patients with valuable prognostic information. To date, widespread and effective prognostic biomarkers for GBM have not been available.
LncRNA is a kind of transcript without protein coding function, its length exceeds 200 nucleotides [4].
LncRNA can be assorted into five types according to their mode of action and functions: sense, antisense, intronic, intergenic and bidirectional [5]. Last few years, increasing evidence reveals that lncRNA is involved in a variety of functional activities [6], including chromatin remodeling, transcriptional and post-transcriptional regulation, et al. Dysregulated lncRNAs exert tumor facilitator or suppressor function in various cancers [7,8] Moreover, several studies indicate that numbers of lncRNAs are aberrantly expressed in glioma and correlate with glioma pathogenesis, such as initiation and progression [9,10], which have significant clinical values in diagnosis and prognosis of glioma [11,12]. Therefore, lncRNAs exhibit the potential to be promising survival predictors for patients with GBM.
In our study, analysis of GBM RNA-Seq data from TCGA was performed and differentially expressed lncRNAs were discerned. Then we identified a four-lncRNA signature which had important prognostic value for GBM patients based on differentially expressed lncRNAs Furthermore, we evaluated the performance of the four-lncRNA signature, and its effectiveness, generalization ability and independence were verified. Thus, this four-lncRNA signature could provide valuable prognostic information to GBM patients.

Methods
The GBM patient information The GBM data and matching clinical information were downloaded from TCGA database. We excluded samples without overall survival information, and in total, 159 patients with GBM were analyzed in our study. 159 samples were randomly classified into training set(n = 80) and testing set(n = 79). lncRNA expression profile GBM RNA-seq data were obtained from TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). After annotation using ensemble database, the expression levels of lncRNAs and mRNAs were calculated by Reads Per Kilobase per Million mapped reads (RPKM). We screened out lncRNAs according to the following three criteria [13]: 1) transcripts were not detected in any protein-coding domain; 2) transcript sequences had been annotated in GENCODE project [14]; 3) transcripts were expressed in at least half of the GBM samples. The lncRNA expression profiles were defined as those with an average RPKM ≥ 0.1 in 159 GBM samples. Finally, a total of 6315 lncRNAs were identified. Expression difference was analyzed by edgeR package [15] in R.

Statistical analysis
According to the training group, the interrelation between lncRNA expression level and patients' overall survival was computed by univariate Cox regression analysis. Those lncRNAs with P-values less than 0.05 were regarded to be statistically significant. Then, the chosen lncRNAs were fitted in a multivariate Cox regression analysis in the training set. Risk scores were assessed by incorporating these chosen lncRNAs, which were weighted according to their estimated regression coefficients in the multivariable Cox regression model. The risk score can be computed for each GBM patient based on prognostic four-lncRNA signature. According to the median risk score, GBM patients could be classified into high-risk and low-risk groups, respectively. Survival differences between the two groups were assessed by the Kaplan-Meier survival analyses. To further test whether the risk score was independent of age and gender, multivariate Cox regression and data stratification analyses were performed. The time-dependent receiver operating characteristic (ROC) curve within 2-5 years were used to evaluate the sensitivity and specificity of the survival prediction in accordance with the risk score. All analyses were completed using R (version 3.6.1).

Functional enrichment analyses
In order to further understand the function of the four lncRNAs, we computed the expression relevance between the four lncRNAs and protein-coding genes by spearman correlation coefficients.
GO and KEGG enrichment analyses for these co-expressed genes were executed with the cutoff value of P-value < 0.05 and enrichment score > 1.0.
Four prognostic lncRNAs were detected from the training set The 159 GBM patients were allocated into two sets at random, a training set (n = 80) and a testing set (n = 79). Based on the training set, univariable Cox regression model of lncRNAs was carried out, and a set of four lncRNAs were identified to be considerably correlated with the patients' overall survival (P-value < 0.05; Table 1). All of them had positive coefficients, indicating that their aberrant expression was related to GBM tumorigenesis.

Expression level of the four lncRNAs
Here, we compared the expression levels of the four prognostic lncRNAs between normal brain tissue and GBM and found that AC096772.6 and UNC5B-AS1 were downregulated while HOTAIR and CTD-2263F21.1 were upregulated in GBM (Fig. 2). Except for HOTAIR, the other three lncRNAs were all firstly reported in GBM.
The four-lncRNA signature and patients' survival in the training set We designed a risk-score formula for prognostic prediction of GBM patients, based on the expression level of the four lncRNAs. The formula was as follows: Risk score= (0.91 × expression level of AC096772.6) + (0.14 × expression level of HOTAIR) + (0.36 × expression level of CTD-2263F21.1) + (0.31 × expression level of UNC5B-AS1). Afterwards, the four-lncRNA signature risk score was computed for each patient in the training set, and these patients were allocated into high-risk (n = 40) and low-risk groups (n = 40) by the median risk score value defined as the cutoff value. The Kaplan-Meier analysis indicated that patients in the high-risk group had lower overall survival than those in the low-risk group (P-value = 2.35e-05; Fig. 3A). We also performed time-dependent ROC analysis to assess the sensitivity and specificity of the four-lncRNA signature, and the area under the curve (AUC) score was 0.888 (2 years) (Fig. 3B). Moreover, ROC analysis from 3 to 5 years was performed (S- Fig. 1A-C). The result demonstrated that the four-lncRNA signature displayed a better prognostic predictive capability in the training set. Univariate Cox regression analysis showed that the four-lncRNA risk score was remarkably correlated with patients' survival (P-value = 6.57E-08, HR = 1.409, 95% CI = 1.244-1.597; Table 2). The scatter plot of the four lncRNAs risk score and survival status in training set were showed in Fig. 3C. We found that the mortality rate of patients with high risk scores was higher than that of patients with low risk scores.
Confirmation of the four-lncRNA signature for the survival prediction in testing set and the entire TCGA data set Subsequently, to validate our results, we verified the four-lncRNA signature in the testing set. Based on the same risk score method, the risk score of each patient in the testing set was computed, and GBM patients were divided into a high-risk (n = 32) and a low-risk group(n = 47) by the same threshold. In line with the results in the training set, survival analysis suggested that the prognosis of the high-risk group was remarkably worse than that of the low-risk group (P-value = 0.0497; Fig. 4A).
Similar result was yielded in the entire TCGA cohort (P-value = 5.87e-05; Fig. 4B). The ROC curves of the four-lncRNAs signature had AUC score of 0.782 in the testing set (Fig. 4C), and 0.746 in the entire set (2 years) (Fig. 4D). ROC analysis of the testing set and entire set from 3 to 5 years was also performed (S- Fig. 2A factors? To answer this question, we executed multivariate Cox regression analysis based on lncRNA risk score and clinical characteristics such as age and gender ( Table 2). The result suggested that when adjusted by clinical factors, four-lncRNA risk score remained to be closely related to overall survival in training set and entire set besides testing set. We considered that this may be attributed to small sample size. However, this did not affect the performance and predictive value of the four-lncRNA signature in the overall set. Moreover, we found that the age was also significantly related to prognosis. Then, stratification analysis was used, and all patients could be classified into younger stratum (age ≤ 60, n = 81) and older stratum (age > 60, n = 78).The analysis revealed that the four-lncRNA risk score could further subdivide GBM patients into high-risk and low-risk group within each age stratum (Fig. 5). These results indicated that prognostic value of the four-lncRNA signature was independent of age.

Functional properties of the four prognostic lncRNAs
Many lncRNAs involve in tumorigenesis via communicating with protein-coding genes. To deduce the biological function of the four lncRNAs in GBM, co-expression correlation between four lncRNAs and protein-coding genes were calculated. Protein-coding genes whose spearman correlation coefficients were more than 0.45 were chose as co-expressed genes of four prognostic lncRNAs. Finally, a total of 1621 protein-coding genes were notably associated with at least one of the four lncRNAs. Then we performed functional enrichment analysis, and the results showed that there were 479 GO terms and 11 KEGG pathways. Among the GO terms, Wnt signaling pathway, RNA catabolic process, chromatin modification and histone modification were mainly enriched. The KEGG pathway included Ribosome, thermogenesis, regulation of actin cytoskeleton and so on (Fig. 6).

Discussion
Last several years, an extensive amount of lncRNAs have been discovered and become a novel class of gene regulators. Numerous studies have proved that lncRNAs act as vital parts in various biological functions by regulating gene expression at the transcriptional, post-transcriptional and epigenetic levels [16]. LncRNAs are also thought to serve as either tumor suppressors or oncogenes in different kinds of cancer [17]. In addition, a large number of lncRNAs are aberrantly expressed in glioma, which are closely concerning with diagnosis, prognosis, metastasis and tumorigenesis [9]. lncRNAs exhibit the potential to become ideal indicators for early diagnosis and prognostic prediction of cancers [18,19]. Therefore, we speculated that differential expression of lncRNAs may serve as prognostic biomarkers for patients with GBM.
The purpose of our study was to systematically identify lncRNAs that served as prognostic biomarkers for GBM by measuring the expression of lncRNAs in GBM. Thus, we obtained lncRNA expression profiles data and clinical data from TCGA database, and differential expression analysis of lncRNAs was performed. A four-lncRNA prognostic signature notably associated with the overall survival was identified on the basis of differentially expressed lncRNAs. Subsequently, the manifestation of this four-lncRNA signature was assessed by ROC analysis. And the results suggested that the four-lncRNA signature had significant predictive value for prognosis in two years, however, it performed unsatisfactorily in three to five years. [20]. Meanwhile, we validated the predictive capability of the four-lncRNA signature was independent of clinical factors. lncRNAs can exert its biological functions in various levels including interacting with protein-coding genes. A spearman correlation analysis that incorporated lncRNAs and protein-coding genes was performed to identify co-expressed genes.
Functional enrichment analysis was carried out to expose the function of the four prognostic lncRNAs in GBM. And the results increased our understanding in how these lncRNAs affected prognosis of GBM.
For example, a great deal of studies have shown that Wnt signaling pathway is abnormally activated in GBM, promoting the proliferation and invasion of GBM, and inducing resistance to temozolomide in GBM [21,22].
To our knowledge, in these four prognostic lncRNAs, HOTAIR and UNC5B-AS1 have been investigated in cancers, while AC096772.6 and CTD-2263F21.1 have not been reported in any cancers. HOTAIR was firstly discovered in 2007 by Chang et.al as a trans-acting intergenic lncRNA that is transcribed from the homeobox gene C cluster and inhibit the expression of distal HOXD locus [23]. And increasing evidences reveal that overexpressed HOTAIR is related with proliferation, invasion, metastasis and shorter overall survival in cancers via many mechanisms [24,25]. UNC5B-AS1 Notably, contrary to reported results above, our data showed that UNC5B-AS1 was downregulated in GBM. We speculated that this may be induced by tissue specificity of lncRNA.
Thus, it is a convincing speculation that the four prognostic lncRNAs may be implicated in pathological processes of GBM and their aberrant regulation may lead to occurrence and development of GBM.
Nevertheless, except for HOTAIR, the relationship between the other three lncRNAs and GBM is reported for the first time, further biological and molecular experiments should be executed to help us to explore and understand their functional roles and underlying mechanisms in GBM.

Conclusion
A four-lncRNA signature which could predict prognosis of GBM patients was identified in our work.
Additionally, we also validated the effect of the four-lncRNA signature. Our data greatly determined that the prognostic predictive value of the four-lncRNA signature was powerful, and GBM patients may benefit from it.

Consent to publish
Not applicable.

Availability of data and materials
All datas are available. Please contact us to access if it is needed.

Competing interests
There are no conflicts of interest in this study.

Funding
The study did not accept any funding.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.