Expression of lncRNAs Associated with GPCRs and Clinical Prognosis in GBM

Background: Glioblastoma Long non-coding RNAs (lncRNAs) play a variety of key regulatory roles in a variety of biological processes, and have an important inuence on the occurrence and development of tumors by regulating the expression of target genes. However, their role in the prognosis of GBM is still lacking in accurate prognostic markers. This study aims to establish an effective lncRNAs model to evaluate the prognosis of GBM. We used of lncRNA and clinical follow-up from The Cancer Genome Atlas (TCGA) to conduct univariate analysis, clustering analysis, coding gene difference analysis, Kyoto of and analysis and Gene Ontology for GBM lncRNAs that are closely related to the survival and prognosis of GBM were found and a multiple regression model was constructed to calculate the risk score of the samples, so as to accurately predict the clinical prognosis of GBM patients. new perspective for better evaluation of the role of lncRNAs in the clinical prognosis of GBM.


Introduction
Non-coding RNA, including short RNA represented by miRNA, siRNA and piRNA and long non-coding RNA (lncRNA), among which lncRNA is the most transcribed. In recent years, studies on epigenetic regulation of glioblastoma have attracted great attention from scholars, especially the lncRNA closely related to cell regulation, which may provide a new treatment method at the molecular level for inhibiting the growth and recurrence of glioblastoma. In the past 30 years, lncRNA has been found to regulate gene expression through interaction with DNA, RNA, protein or chromatin remodeling [16]. For example, lncRNA can act as cell "address codes" to transfer protein complexes to appropriate positions on chromosomes and lead to subsequent activation or deactivation [17]. Compared with small interfering RNAs and microRNAs, lncRNAs have greater potential in target recognition and can be folded into more advanced structures, contributing to chromatin remodeling and transcription and post-transcription regulation [18].
The Cancer Genome Atlas (TCGA) is an open public data platform, including 38 kinds of 11, 000 cases of Cancer patient information. The mining and analysis of these huge data will help researchers comprehensively study the molecular mechanisms of various cancers, identify new therapeutic targets and tumor-related molecular markers, and thus improve the diagnostic ability and therapeutic effect of various human malignant diseases [19]. In this study, we selected glioblastoma brain samples from the TCGA database for differential expression analysis of gene transcriptional levels, including the data of coding RNA (namely mRNA) and lncRNA. For mRNA, GO and KEGG were used for enrichment analysis of the differentially expressed genes, respectively. By functional analysis of the differentially expressed genes in glioblastoma, it was found that the G protein receptor pathway was enriched signi cantly. For lncRNAs, we performed univariate COX regression analysis on glioblastoma samples with prognostic survival data in the TCGA database to screen out lncRNAs with signi cant prognostic differences. Next, lncRNAs with signi cant differences in expression and prognosis were selected, which were further screened by stepwise COX multivariate regression analysis as independent prognostic factors, and a risk model that could be used to predict the prognosis of patients was constructed, and lncRNA targets that might be related to the prognosis of glioblastoma were extracted. In the end, we selected GO and KEGG to enrich signi cantly different mRNA and expression data in glioblastoma samples that can act as independent prognostic factor lncRNA, and evaluated the association between them by constructing a coexpression network.
Methods And Materials 2.1 Data collection and processing RNA-SEQ raw data can be downloaded from TCGA (https://tcga-Data.nci.nih.gov/tcGA/). Then, the counts expression of the extracted RNA. According to the gene encoding proteins, the genes are divided into coding genes and non-coding genes, and corresponding counts are extracted for subsequent difference analysis.

Differential analysis
We also used the R package (R-3.6.3 https://www.r-project.org/) of "DESeq2" to analyze the differences between mRNA and lncRNA from the GBM sample and the normal sample. In addition to R distribution itself bring R package, other packages we used need to download and install on Bioconductor packages website (http://www.bioconductor.org/). First, based on the R packages of "GGplot2", "Pheatmap", "Gplot", "Stringr", "Tidyverse", "Tibble", "FactoMineR", "Factoextra", "String", "Factoiner", "Factoextra", etc., the RLE box plot, principal component plot (PCA) and cluster analysis plot were used to detect the quality of RNA sequencing data and remove the large discrete samples from the deviated groups. Then, all samples were standardized by using the VST function of DEseq2 package, and the differentmrna and lncRNA were screened by using the "DESeq" function. Finally, the mRNA annotation of the coding gene was carried out by using the "org.hs.eg.db" and the "clusterPro ler" R software package, and the lncRNA was annotated by the non-coding le gencode. V34, where the difference threshold was FDR < 0.01 & P value < 0.01 & |log2FC| > 1.

Biological process and pathway enrichment analysis
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using Toppgene (https://toppgene.cchmc.org/). The selection criteria are the number of genes that fall on a term/GO difference≥4 &P value < 0.05. The term/GO is drawn according to an enrich factor value sorted in descending order of size, taking the rst 30 results.
Enrich factor = (the number of different genes in a term/the total number of different genes) /(the total number of genes in database term/the total number of genes in the database). The bar graph results of Top30 GO and KEGG were plotted by SigmaPlot 14.0.

univariate Cox proportional regression analysis
We chose 167 tumors samples from TCGA with control-survival information and began with the function rowSums in the R package (rowSums >1). A univariate Cox Proportional regression model was used to calculate the association between the expression of each lncRNA and OS. Pvalue < 0.05 & HR1.5(HR > 1.5 or HR < 0.67) were used to analyze the effect of LncRNA on the prognosis of GBM. R package of GGplot2 is used for the volcanic diagram.

Muti Cox proportional regression analysis and stepwise Cox proportional hazards regression
Combined with the overall survival rate of GBM patients in TCGA, the R package of survival was used to perform multivariate COX regression analysis and stepwise COX multivariate regression analysis on the 12 lncRNAs to obtain lncRNAs (pvalue < 0.05). To make our model more optimized and practical, Muti Cox proportional regression analysis and a stepwise Cox proportional hazards regression model was used. Finally, a risk score formula was calculated by taking into account of the expression of optimized lncRNAs and correlation estimated Cox regression coe cients: Risk Score =1.43*AC021594.1+3.92*AC105046.1+0.27*CRNDE+0.92*LINC01574+1.11*LINC02320. Patients with GBM were classi ed into the high-or low-risk group by ranking the given risk score. Differences in overall survival (OS) between the high-and low-risk group in thel cohorts were assessed using Kaplan-Meier method and two-tailed log-rank test. A Cox proportional hazards regression model was used to identify independent prognostic factors. P < 0.05 was set as signi cant difference in all statistical methods.

Construction of the LncRNA/mRNA Coexpression Network
To construct the LncRNA and mRNA coexpression network(CNC network), we calculated the Pearson correlation coe cient and R value to evaluate LncRNA-mRNA correlation [20].
The network construction procedure includes: (1) Preprocess data: 93 differentially encoded genes and expressions of the above 5 LncRNA in 160 samples were selected from 3 KEGG and 2 GO enrichment analyses related to GPCRs.
(2) R package of Psych was used to calculate the correlation between each LncRNA and 93 encoding genes Pearson, and top20 relationship pairs with signi cant FDR < 0.01&P < 0.01 were selected.
(4) Pearson correlation between the encoding genes involved in relation pairs was calculated, and Top100 relation pairs with signi cance FDR < 0.01&P < 0.01 were selected.
(5) Cytoscape (3.5.1) software was used to construct the network.

Statistical analysis
Differential analysis and normalization were mainly conducted by "DEseq2" package. Cox regression model or Kaplan-Meier analysis with log-rank test was performed by "survival" package. Student's t test was used for continuous variables, while categorical variables were compared by χ2 test. The Wilcoxon rank-sum test was a nonparametric statistical test mainly utilized for comparing two groups and the Kruskal-Wallis test was suitable when it comes to two or more groups. All statistical analysis was achieved in R software (Version 3.6.3). We considered p <0.05 to be statistically signi cant.

Extraction of RNA sequencing data
The RNA sequencing data, clinical information and survival information les of 155 GBM primary cancer tissues and 5 normal brain samples are all downloaded in TCGA. The TCGA includes data of 38 types of tumors, including 6 types of data (copy number, DNA methylation, gene expression RNAseq, phenotype, Somatic mutation, miRNA expression). In this study, we only took transcriptome data, including coding and non-coding RNA data.

Difference analysis of GBM samples
Violin diagram and box diagram were used to show the overall distribution of mRNA and LncRNA of all samples, and volcanic diagram and heat diagram were used to statistical and visual display the results of different mRNA and LncRNA. R package of pheatmap was used for the heatmap, and R package of ggplot2 was used for the box diagram, volcano diagram and violin diagram.
A heatmap can intuitively show the expression levels of different genes in each sample, cluster them according to the expression data of different genes, and obtain the relationship between the expression patterns of samples and genes, so as to draw a correlation conclusion with biological signi cance. In the heat map, each column represents one sample, including 155 GBM samples and 5 normal samples. Each line represents a gene (mRNA and LncRNA). Firstly, statistical software R was used to conduct statistical analysis of the differences between the mRNA and LncRNA of GBM samples and normal samples (FDR < 0.01 & P value < 0.01 & |log2FC| > 1). Finally, 547 differentially expressed mRNAs were selected from 19261 mRNAs, and 471 differentially expressed LncRNA s were selected from 11307 lncRNAs. Genes expressing similar expressions (mRNA and LncRNA) were clustered together with the samples. Black represents 0, indicating no change in gene expression level, red represents increased expression level, and green represents decreased expression level. The brightness of the color represents the degree to which gene expression levels rise or fall. It can be seen from the results that mRNA and LncRNA are mostly upregulated in the 5 normal samples compared with GBM (Fig 1E,F).
The violin diagram and box diagram can be combined to show the distribution of gene expression abundance in each sample. The results showed that the abundance difference of mRNA and LncRNA in these 160 samples was less than 5%, that is to say, the abundance difference of coding and non-coding genes in normal and abnormal samples was not signi cant (Fig 1 A,C).
To be more speci c, we also found by barplot that 430 of the 547 mRNAs were down-regulated and 117 were up-regulated, compared with the normal sample. Among the 471 differentially expressed lncRNAs, 422 were down-regulated and 49 were up-regulated (Fig 2A).
The volcano plot is used to show the signi cantly different expressions of the two sets of samples. The p-value and Fold change values obtained by accurate T-test statistical analysis were used to draw the volcano map between the two groups. In the volcano plot, one of the coordinates shows the negative log of P-values computed by the T-test, and the other shows the converted log of P-values compared to the two conditions. Similarly, we can also nd through the volcano map that compared with the normal sample, the GBM sample down-regulated more in the differentially expressed mRNA and LncRNA ( Fig  1B,D).

Enrichment analysis of GO and KEGG pathways of DEGs
We conducted GO and KEGG enrichment analysis of the differentially expressed genes respectively, and selected the GO and KEGG results of Top30. In the results of BP, we found that the main difference between gene enrichment in embryonic bone system development and formation, chemical synapses and synaptic transmission, eating behaviors, G protein coupled receptor pathway and G protein coupling the second messenger cyclic nucleotide receptor signaling pathways, neuropeptide signaling pathways, and other functions. In the results of MF, we found that the differentially expressed genes were mainly concentrated in neurotransmitter binding and receptor activity, G protein-coupled peptide receptor activity, G protein-coupled receptor activity and other functions. In the results of the KEGG pathway, we found that the differentially expressed genes were mainly concentrated in the interaction in stimulating nerve tissues, G-protein-coupled receptor ligand binding, G-protein-coupled receptor downstream signaling, and G-protein-coupled receptor pathways, among which G-protein-related pathways accounted for three of the rst ten pathways. This indicates that there is a close relationship between GPCRs and GBM. More

Univariate COX survival regression analysis
In order to better classify the samples, there were 167 GBM tumor samples with prognostic survival data in TCGA, including 155 GBM primary cancer tissues and 12 GBM metastatic cancer tissues (one of which has no survival data), including 15110 lncRNAs in total, and 14480 lncRNAs were left after the deletion of some lncRNAs that appeared only a few times. Finally, a total of 642 lncRNAs with signi cant prognostic correlation were calculated by univariate COX survival regression analysis (Fig 2B).

LncRNA-based prognostic biomarkers in GBM patients
In the signi cance analysis results of GBM samples, we obtained a total of 471 lncRNAs with signi cant differences, and 642 lncRNAs with signi cant prognostic differences in univariate COX survival regression analysis. The analysis software was used to take the intersection of the two (Venn diagram) to obtain 12 lncRNAs, which were not only differentially expressed in normal tissues and cancer tissues, but also had prognostic differences in cancer tissues (Fig 2C).

LncRNA with independent prognostic factor
Prognostic biomarkers based on LncRNA in GBM patients, namely 12 lncRNAs obtained by intersection, were used for prognostic survival curves respectively. Univariate COX survival regression curve analysis showed that among the 12 lncRNAs (Fig 4), only AC097641.1 and AC117453.1 were HR< 1, is a protective factor. While other LncRNA HR>1, is a risk factor. The prognosis of patients in the high-risk group was signi cantly worse than that of patients in the low-risk group. Then, stepwise COX multivariate regression was used for the 12 lncRNAs obtained, and 5 lncRNAs (P<0.05) (Fig 5).

Multiple regression model risk Score survival analysis
We constructed a multiple regression model with 5 lncRNAs that could act as independent prognostic factors. The risk score of each sample was calculated based on a multiple regression model consisting of ve lncRNAs (Risk Score =1.43*AC021594.1+3.92*AC105046.1+0.27*CRNDE+0.92*LINC01574+1.11*LINC02320) Then, according to the median risk score, all cases were divided into the high-expression group and the low-expression group, and the correlation between risk score and prognosis was calculated. P < 0.0001 was signi cantly correlated with prognosis.
Survival analysis using scatter plot (Fig 6.A, B), heat map (Fig 6.C), and Kaplan-Meier plot (Fig 6.D), all showed that the prognosis of the high-risk group was signi cantly lower than that of the low-risk group. It is proved that the multiple regression model can accurately predict the clinical prognosis of GBM patients.

Coexpression network of coding gene/non-coding gene
Although more and more studies are trying to reveal the functional signi cance of lncRNAs, the biological role of most lncRNAs is still unknown. Biological processes and cellular regulatory networks are very complex and involve the interaction of various molecules, such as proteins, RNA and DNA. Therefore, we constructed a coexpression network of coding gene/non-coding gene based on 155 GBM samples and 5 normal brain samples in TCGA database to study the potential interaction between GPCR-related mRNAs and lncRNAs in each sample. The co-expression network consisted of 93 GPCR related mRNAs and 5 lncRNAs (Fig 7). The gure showed that 5 lncRNAs (AC021594. showed that lncRNAs and GPCR-related mRNAs in GBM patients interregulated very closely. In addition, external data sets in CGGA were also used to verify the above results. In the CGGA database, one of the lncRNAs, namely CRNDE, could be found, and the results were consistent with the previous ones. In other words, the prognosis of the high-risk group was signi cantly lower than that of the low-risk group, which was signi cantly correlated with the prognosis (Fig 8).
Discussion GBM is one of the most common primary malignant tumors of the central nervous system. The etiology of this disease is unknown [21]. The standardized treatment options of GBM include surgical excision within the maximum safety range, postoperative adjuvant radiotherapy, and chemotherapy. The most common is the cytotoxic alkylating agent Temozolomide. Almost all GBM eventually relapsed [22].
Despite great progress in recent years, the mortality rate of GBM remains high. GBM survival rate is inversely proportional to age. Speci cally, about 5% of GBM patients survive ve years after diagnosis. The median survival time of untreated GBM patients is only about 3 months [23]. A better understanding of the genetic and molecular pathogenesis of GBM could lead to more effective treatments.
LncRNA does not participate in protein coding, many recent studies have found that LncRNA is involved in a variety of life activities in vivo, such as the diversity of embryonic stem cells, the regulation of cell cycle, and the occurrence and development of cancer [24]. Nowadays, more and more studies have con rmed that LncRNA can be used as a diagnostic marker for cancer and even as a therapeutic option [25].
The human genome contains more than 50,000 LncRNA genes [26]. lncRNAs play an important role in cell biology by regulating gene expression at different levels, including chromatin remodeling, transcription and post-transcription processing [27,28]. More and more evidence indicates that lncRNAs dysfunction may play a role in the occurrence and development of tumors [29,30]. Some studies have shown that abnormal LncRNA expression is related to the prognosis of GBM and can be used as a tumorrelated predictor. Compared with protein-coding genes, lncRNAs have signi cant advantages as diagnostic and prognostic biomarkers [31]. In the global analysis of GBM samples, 584 of them are related to poor prognosis. 282 lncRNAs were associated with a better survival rate in GBM patients and were con rmed as prognostic biomarkers for GBM [32]. We attempted to use mRNA data, LncRNA data and clinical follow-up data from the tumor genome atlas to identify LncRNA related to GBM prognosis and provide potential therapeutic targets for treatment. lncRNAs have become novel biomarkers for the diagnosis and prognosis of various human cancers [33][34][35][36]. We speculated that the differential expression of lncRNAs might be used as biomarkers to determine the prognosis of GBM patients, and previous studies have shown that several lncRNAs, including MEG3, H19 and Gas5, are abnormally expressed in all GBM patients [37][38][39][40].
First, we downloaded all GBM-related transcriptomic sequencing data, clinical information and survival information les from TCGA, including 155 GBM primary cancer tissues and 5 normal samples. Then we conducted a signi cance analysis of the differentially expressed mRNA and LncRNA in GBM primary cancer tissues and normal sample tissues respectively. Among 19261 mRNAs, 547 mRNAs showed signi cant differences, 430 were down-regulated, and 117 were up-regulated, that is to say, in the normal sample, most of them are up-regulated . Among 11307 lncRNAs, 471 lncRNAs showed signi cant differences, 422 were down-regulated and 49 were up-regulated, that is to say, most of them were also upregulated in the normal sample. Therefore, univariate COX survival regression analysis was performed on 167 GBM patients in TCGA, including 155 GBM primary cancer tissues and 12 GBM metastatic cancer tissues (one of which has no survival data), containing a total of 14480 lncRNAs, and nally 642 lncRNAs with signi cant prognostic differences were screened out. Among 471 lncRNAs with signi cant difference and 642 lncRNAs with different prognosis, 12 lncRNAs (AC005632.5, AC021594.1, AC097641.1, AC105046.1, AC117453.1, CRNDE, HOTAIR, HOxC-AS2, HoxC-AS3, LINC01111, LINC01574, and LINC02320) were obtained from the intersection of the two (Venn diagram).
The 547 differentially expressed mRNAs obtained above were enriched by GO and KEGG respectively, and the GO and KEGG results of Top30 were selected. In the results of GO enrichment analysis, we found that the main difference between gene enrichment in embryonic bone system development and formation, chemical synapses and synaptic transmission, G protein coupled receptor pathway and G protein coupling the second messenger cyclic nucleotide receptor signaling pathways, neuropeptide signaling pathways, as well as the combination of neurotransmitters and receptors on activity, G protein coupling peptide receptor activity, G protein coupled receptor activity, etc. In the results of KEGG pathway analysis, we found that the differentially expressed genes were mainly concentrated in the interaction in stimulating neural tissue, G-protein-coupled receptor ligand binding, G-protein-coupled receptor downstream signaling, and G-protein-coupled receptor pathways, among which G-protein-related pathways accounted for three of the rst ten pathways. Based on GO enrichment analysis and KEGG pathway analysis, we found that there was a close relationship between GBM and G protein-coupled receptors. This had not been found in previous studies.
G protein coupled receptors (GPCRs) is arguably the most common receptor in research, and also an important target in drug development. According to statistics, over 30% of clinical prescription drugs directly act on GPCR, which is the most widely used targeted gene family among drugs approved by The Food and Drug Administration (FDA). The structural variability of GPCR makes it a very speci c protein, which makes members of this family the most important drug targets. GPCRs are also the largest family of cell membrane receptors involved in signal transduction [41], and GPCR-mediated signaling pathways include nerve conduction, embryonic development, regulation of the immune system, control of sight and smell, and other physiological activities [42]. GPCR can transfer various extracellular signals such as neurotransmitters and hormones into cells, thus playing an important role in the regulation of cell function, which is also closely related to the occurrence, development and metastasis of a variety of human tumors. The regulation of tumor by GPCR mainly involves the alteration of the level of the second messenger in the cell and then the realization of this process through the signal transduction pathway associated with it [43].
In order to fully understand how LncRNA affects GBM, we constructed a tumor-speci c lncRNAs/mRNAs co-expression network based on biological prediction based on correlation analysis. In GO enrichment analysis and KEGG pathway, there were 93 mRNAs associated with GPCRs, and 5 lncRNAs (AC021594.1, AC105046.1, CRNDE, LINC01574, LINC02320) were all related to a large number of the GPCRs mRNAs. Then, stepwise COX multivariate regression was used and then we obtained ve lncRNAs (AC021594.1, AC105046.1, CRNDE, LINC01574, LINC02320) that could act as independent prognostic factors (P<0.05).
The results of survival analysis showed that the prognosis of patients in the high-risk group was signi cantly lower than that of patients in the low-risk group, which proved that the multiple regression model established by us could predict the clinical prognosis of GBM patients more accurately. Studies have shown that colorectal neoplasia differentially expressed (CRNDE) play a role in promoting cancer in glioblastoma [44]. Compared with normal brain tissue, THE CRNDE of GBM patients was up-regulated 32 times, and the expression level of CRNDE was associated with glioblastoma grade [45]. Meanwhile, CRNDE can also down-regulate miR-186 and inhibit the expression of downstream target gene XIAP (Xlinked apoptotic inhibitor) and PAK7 [P21 protein (Cdc42/Rac) -activated kinase 7], thus promoting the proliferation of human GSC [46]. In addition to these observations, Wang et al. [47] showed that CRNDE overexpression could promote the growth of glioblastoma cells through mammalian target in vitro and in vivo rapamycin (mTOR) signaling pathways.
In addition, in order to verify the predictive effect of these ve lncRNAs on the prognosis of GBM patients, we used the data obtained from Chinese Glioma Genome Atlas (CGGA) as the validation set, downloaded and analyzed the corresponding CRNDE RNA-SEQ data and clinical information in CGGA, and obtained comparable results. Therefore, we speculated that the expression of these lncRNAs might become effective predictors of survival and potential prognostic biomarkers for GBM patients.
GBM is a primary malignant tumor of the central nervous system with a poor prognosis. Analysis of global GBM samples has found a variety of prognostic related lncRNAs, however, accurate prognostic markers are still lacking. We used mRNA data, LncRNA data and clinical follow-up data from TCGA to conduct univariate analysis, clustering analysis, coding gene difference analysis, KEGG analysis, GO analysis, etc., for GBM patients. Through multiple systematic analysis, we found 5 lncRNAs that are closely related to the survival and prognosis of GBM, and these 5 lncRNAs can be used as independent prognostic factors. In addition, we also constructed a multiple regression model with these 5 lncRNAs.
The multiple regression model composed of these ve lncRNAs was able to calculate the risk score of each sample to accurately predict the clinical prognosis of GBM patients. So we constructed an effective multiple regression model, which can effectively predict the clinical prognosis of GBM patients.

Conclusion
In this study, GBM-speci c mRNAs and lncRNAs were identi ed through bioinformatics analysis of tens of thousands of candidate RNAs. Finally, a multiple regression model related to lncRNAs was successfully constructed and it was found that these lncRNAs could be used as independent prognostic biomarkers for GBM. Our study provides a new perspective to better evaluate the role of lncRNAs in the clinical prognosis of GBM.

Declarations
We declare that we have no nancial and personal relationships with other people or organizations that can inappropriately in uence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as in uencing the position presented in, or the review of, the manuscript entitled "Expression of lncRNAs associated with GPCRs and clinical prognosis in GBM".
We con rm that we would like support with depositing and managing our data , and we can also submit datasets to Springer Nature as part of our publisher's Data Support Services. All data generated or analysed during this study are included in this published article.
Ethics approval and consent to participate There are no animal or human trials involved.
Consent for publication Not applicable.
Competing interests The authors declare that they have no competing interests. Funding Not applicable