A signature related to lung adenocarcinoma prognosis: A study based on TCGA database

Background: The intrinsic molecular subtypes of lung adenocarcinoma (LUAD) impact clinical treatment decision-making, but the molecular mechanisms are still unclear. Therefore, we aimed to identify sensitive biomarkers to evaluate LUAD patient prognosis. Methods: Differentially expressed RNAs from LUAD patients were obtained from The Cancer Genome Atlas (TCGA) database and they were used to construct a competitive endogenous RNA (ceRNA) network. Based on the examination of clinical data, long noncoding RNAs (lncRNAs) and mRNAs in the network were selected by univariate and multivariate Cox regression analysis. Finally, functional enrichment analysis was used to reveal prognostic signatures based on the classi�cation into high and low-risk groups, survival analysis, and an independence test. Results: The ceRNA network consisted of 21 mRNAs, 53 lncRNAs, and 8 miRNAs that were selected from the differentially expressed RNAs identi�ed. Next, based on univariate and multivariate Cox regression analysis, a prognostic signature, including two mRNAs (HOXA10 and CBX2) and four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132) was constructed. Eventually, survival analysis showed that signi�cant differences in survival rates between high and low-risk groups and the area under the curve (AUC) for three ‐ year survival was 0.714. Compared with clinical risk factors, including age, pathological stage, and TNM stage, our risk score had a higher prognostic value. Conclusion: By screening from a ceRNA network, we constructed a signature, including two mRNAs (HOXA10 and CBX2) and four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132), that can be utilized as a prognostic biomarker in LUAD. This signature may provide options for clinical treatment.


Introduction
There are dozens of molecular pathways and mechanisms that have been proven to affect the occurrence and development of tumors [1].Gene expression patterns identi ed in LUAD cover different functional pathways and exhibit variable outcomes for patients [2], which effect treatment choices [3].The American Joint Committee on Cancer (AJCC) staging system is at the forefront of guiding clinical decision making and is, by far, the best way to predict LUAD patient prognosis.However, more than 20% of early-treatment patients ultimately develop cancer recurrence and metastasis [4].Therefore, it is necessary to identify the molecular pathways involved in LUAD growth in order to design novel diagnostic strategies and targeted therapies.
Gene expression pro les play a unique role as predictors for prognosis and treatment [5].The identi cation of risk-related biomarkers for LUAD will also provide an opportunity to understand cancer development and progression.Identi cation of sensitive biomarkers to evaluate LUAD prognosis at an early stage is essential for successful disease treatment.
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), RNA subtypes without protein-coding functions, are widely expressed in the tissue of organisms [6].In recent years, lncRNAs have been a hotspot for research because of their vital role in chromatin organization, transcription, RNA processing, and nuclear domains [7][8][9][10][11].A growing number of studies have demonstrated that aberrant lncRNA expression is related to prognosis or response to therapy in various types of cancer.For example, Wang et al. found that lncRNA miR503HG inhibits tumor metastasis as a prognostic indicator in liver cancer [12] and Xie et al. revealed that circulating lncRNAs are valuable, novel biomarkers for the diagnosis and prognosis of non-small cell lung cancer [13].Consequently, there may be additional lncRNA prospects as biomarkers or therapeutic targets due to their genome-wide expression pattern in various tissues as well as their tissue-speci c expression characteristics.
In 2011, Pandol proposed the competing endogenous RNA (ceRNA) hypothesis which suggests that lncRNAs and mRNAs compete for miRNA binding as a regulatory mechanism.Xia W. et al. revealed that the TWIST1-centered ceRNA network facilitates the development process of LUAD [14].Through this mechanism, miRNAs bind to target mRNA molecules using miRNA response elements (MREs), which reduce the stability of target mRNAs and inhibit translation at the transcriptional level and, conversely miRNAs can be affected.Various types of RNA transcripts, including mRNAs, lncRNAs, and circular RNAs (circRNAs), compete for miRNA binding by mutating MREs as a means of mutual regulation.This confers new and broader biological functions for both mRNA and lncRNA.In addition, a previous study showed that a predictive signature combining the tumor immune microenvironment with chemosensitivity-related features can improve the prognosis and therapeutic outcome for patients with stage II to III gastric cancer [15].Based on the RNA network and signature, we may nd novel prognostic biomarkers or therapeutic targets for LUAD.
Taken together, we need to investigate the trends associated with the development of LUAD at the molecular level.In this study, we used the transcriptome sequencing data from LUAD patients in TCGA to identify differentially expressed genes that could be used to build related transcriptional regulatory networks, constructed a prognostic signature, and comprehensively analyzed the prognostic value of indicators.

Data acquisition and differential expression analysis
The gene expression pro les were obtained from the Genomic Data Commons (GDC) database (http://gdc--portal.nci.nih.gov/), which contains all TCGA data for analysis.The data included RNA-seq data from 539 LUAD patient samples and 59 tumor-adjacent normal tissues, as well as all clinical information from these cases.Also, 521 tumor samples and 46 healthy samples from miRNA-seq data were downloaded.To explore possible lncRNA-mRNA-miRNA interactions, RNA-seq analysis was utilized.
The edgeR package in R was used to identify differentially expressed genes or mRNAs (DEGs), lncRNAs (DELs), and miRNAs (DEMs) from the raw sequencing data in TCGA.Referencing previous literature, adjusted p values were used in the false discovery rate (FDR) to ensure more accurate results.The thresholds were set as FDR < 0.01 and |logFC (fold change)| > 2.

CeRNA network building
Based on the use of mircode (http://www.mircode.org/index.php),we predicted the interaction between DELs and DEMs for building a transcriptional regulatory network.Next, three online tools, including miRDB (http://www.mirdb.org),miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/php/index.php), and TargetScan (http://www.targetscan.org/vert_72/),were used for predicting the target genes of signi cant DEMs.To obtain more accurate results, we constructed a Venn diagram to identify a common set from the three target gene sets as well as the intersection of the above results and 2296 DEGs.To build the lncRNA-mRNA-miRNA interaction network, the expression of lncRNAs and mRNAs was considered to have a negative correlation when the miRNAs were expressed.Cytoscape (version 3.7.0)was used to visualize the results and obtain the ceRNA network graph.The CentiScaPe2.2APP was used to select central genes from the ceRNA network (degree > 16).

Prognosisrelated RNA selection and prognostic signature construction
After excluding samples containing two or more tumor tissues and incomplete clinical data, we performed univariate Cox regression analysis of lncRNAs and mRNAs in the ceRNA network to screen for prognostic genes, with p < 0.05 considered signi cant.Then, multivariate Cox regression analysis was conducted to establish a prognostic signature for LUAD.The signature was de ned as a collection of highly prognostic and connected genes within the network.

Functional enrichment analysis
DAVID (https://david.ncifcrc.gov/),the database for annotation, visualization, and integrated discovery, is based on all genes in the human genome database and provides systematic and comprehensive annotation information of biological functions.We used DAVID to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of mRNAs in the ceRNA network, with p < 0.05 considered signi cant.Visualization of the enrichment analysis results was performed using ggplot R software.

High and low-risk group classi cation and survival analysis
A risk score, which could better determine the performance of the lncRNAs and mRNAs in the signature, was calculated.It is the product of the Cox regression coe cient for each gene multiplied by the gene expression value.The calculation formula is as follows: Risk score = ∑β × expression value gene Here β refers to the regression coe cient of multivariate Cox analysis; Expression value gene is the gene expression value level of core lncRNAs and mRNAs.
The risk score median was regarded as the threshold.We divided the sample into the high-risk and lowrisk groups.Survival curves, distribution of risk scores, and patient survival status were determined with the R software package.By receiver-operating characteristic (ROC) curve analysis, the signature's performance was assessed, optimizing the critical expression threshold (condition: AUC > 0.7 and p < 0.05) and exploring the sensitivity of prognostic signatures for LUAD patient three-year survival rates.
Furthermore, it was essential to use Kaplan-Meier plots (http://kmplot.com/analysis/)to explain LUAD patient overall survival in relationship to the prognostic signature gene expression, with p < 0.05 considered statistically signi cant.

Independent prognostic analysis and forestplot construction
For verifying the independence of the predictive signature, the prognostic values of clinical risk factors (age, pathological stage, TNM stage and risk score) were evaluated using the survival package in R and forestplots were constructed using R.

CeRNA network
To investigate possible lncRNA-mRNA-miRNA interactions in LUAD patients, we successfully predicted 92 DELs and 19 DEMs with interactions by MiCode.In addition, 33 target genes were later found and the results predicted matched with the 2296 DEGs.Based on the 92 DEL-DEM pairs and 22 DEM-DEG pairs, Cytoscape was used to build a visual network (Fig. 2), which involved 53 DELs (43 upregulated and 10 downregulated), 21 DEGs (15 upregulated and 6 downregulated), and 8 DEMs (5 upregulated and 3 downregulated) in Table 1.Within the network, the node degree indicates the number of links with other nodes and nodes with a higher degree usually play a pivotal role in the system being modeled by the network.Three miRNAs were identi ed, namely hsa-mir-144, hsa-mir-195, and hsa-mir-143 with degrees greater than 16.
Table 1 LncRNAs, miRNAs and mRNAs in the CeRNA network In the above network, we discovered two DEGs and four DELs signi cantly related to LUAD prognosis using univariate Cox regression analysis (p < 0.05).Hereafter, multivariate Cox regression analysis was conducted on the two mRNAs (HOXA10 and CBX2) and the four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132), identifying this combination as the optimal prognosisrelated signature (Table 2).

Functional enrichment analysis
It is important to understand that the functions of lncRNAs relate to their target genes rather than their protein-encoding capabilities.The functional enrichment analysis resulted in genes that were involved in protein digestion and absorption as well as the cell cycle (p < 0.05 and gene count > 2).Three GO terms were identi ed out of the 19 terms (Fig. 3), including biological process (BP), cellular component (CC), and the molecular function (MF), which were associated with protein binding (MF), nucleoplasm(CC), and DNA replication (BP), respectively.

High-low risk groups and survival analysis
For a better evaluation of the signature's ability to predict LUAD patient prognosis, a risk score was built, where the coe cients for the samples were weighted.The risk score was calculated as follows (Fig. 4a): Risk score = (0.07460 × expression level of HOXA10) + (-0.19060 × expression level of CBX2) + (-0.08709 × expression level of LINC00460) + (-0.09837 × expression level of LINC00330) + (-0.08088 × expression level of DGCR5) + (-0.15842 × expression level of C14orf132).With the median risk score set as the threshold, a total of 480 patients meeting the standard was equally divided into the high-risk and low-risk groups.The survival status of each group is shown in Fig. 4b.Meanwhile, Survival analysis showed that there was a signi cant difference in the ve-year overall survival between the high-risk group and the lowrisk group (p = 5e-05, Fig. 4c).Over time, the survival rate of the two groups gradually decreased, yet the survival time of the low-risk group was signi cantly higher than that of the high-risk group.The AUC suggested superior prognostic value since the ROC curve indicated that the AUC of the lncRNA-mRNA signature for predicting three-year survival was 0.714 (Fig. 4d).We assessed the link between gene expression in the prognostic signature and LUAD patient survival by means of a Kaplan-Meier curve and log-rank test, where p < 0.05 was considered signi cant.Subsequently, two mRNAs (HOXA10 and CBX2) and two lncRNAs (DGCR5 and C14orf132) were selected.In the ceRNA network of LUAD patients, we determined that upregulation of CBX2, DGCR5, HOXA10 (HR: 1.55%, 1.16%, and 1.49%, p < 0.05) and downregulation of C14orf132 (HR:0.7%,p < 0.05) (Fig. 5) were signi cantly correlated with survival time and risk-group determination.A high level of C14orf132 or low levels of CBX2, DGCR5, HOXA10 were associated with higher patient survival rates.

Independence test
We evaluated the prognostic value of clinical risk factors (age, pathological stage, and TNM stages) and risk score with the survival package in R. The results showed that risk score was the superior factor affecting the survival rate (Fig. 6).

Discussion
During the last few years, along with the emerging role of high-throughput sequencing technology, molecular characteristics associated with LUAD has been gradually recognized and are becoming the focus of various studies.In the present study, we selected 53 DELs, 21 DEGs, and 8 DEMs to build a ceRNA network that centered on three core miRNAs that were downregulated, including miR-195, miR-143, and miR-144.Then, a signature consisting of two mRNAs (HOXA10 and CBX2) and four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132) was ultimately constructed.Our results showed that there were signi cant differences in the survival rates between the high and low-risk groups according to the risk scores of the identi ed signature.In addition, survival analysis revealed that the presence of HOXA10, CBX2, and DGCR5 as a risk factor led to a signi cant decrease in LUAD patient survival time, while C14orf132 was regarded as a protective factor, extending survival.Therefore, this signature is a strong predictor of overall survival.
Since miRNAs play a role in the molecular regulation of transcription [14], changes in these may impact prognosis.It has been widely recognized by scholars that miRNAs with high node degrees have shown an important inhibitory effect on LUAD and miR-195 can suppress the course of lung adenocarcinoma by regulating CD4 + T cell activation [16].In LUAD, oncogenic networks and oncogenes guided by miR-143 were identi ed and may help with tumor suppressor effects, establish new prognostic indicators, and nd therapeutic targets [17].The propagation, migration, and invasion of LUAD cells are obstructed by overexpression of the miR-144 family target, EZH2 [18].These studies all indicate that miRNAs in the ceRNA network may act as suppressors in LUAD.Meanwhile, these data provide strong clues that mRNAs and lncRNAs in the network are potential candidates for LUAD prognosis.
We relied on the intrinsic correlation between epigenetic characteristics and transcriptional regulation to identify an RNA-based signature as a reliable prognostic tool.We have also identi ed previous research that supports their importance in cancer.High expression of HOXA10 can promote tumor development in gastric cancer, hepatocellular carcinoma, and ovarian cancer [19][20][21].More importantly, downregulated LINC00483 suppresses tumor cell invasion, migration, and epithelial to mesenchymal transition.It can also bind to miR-144 and encourage the radiosensitivity of LUAD by inhibiting the expression of HOXA10 [22], which promotes LUAD progression directly by enhancing Wnt/β-catenin signaling [23].Studies suggest that we can improve the clinical radiotherapy effect on LUAD by inhibiting the transcriptional regulation of HOXA10, thus prolonging survival.The other mRNA in the signature is CBX2, which is overexpressed in cancers, including breast cancer and hepatocellular carcinoma, and is signi cantly related to poor prognosis [24][25], indicating that its activity provides an advantages for the growth and development of cancer [26].In addition, targeted therapy studies in non-small cell lung cancer clearly show that SMARCE1 inhibits EGFR expression, in part by modulating the level of the polycomb repressive complex component CBX2.Speci cally, SMARCE1 interacts directly with SWI/SNF and the EGFR oncogenic signal, as an important regulator of the drug response to MET and ALK inhibitors in non-small cell lung cancer cells [27].We speculate that CBX2 may have strong performance by regulating its own expression CBX2 and affecting target therapies.Moreover, survival analysis in the current study revealed that HOXA10 and CBX2 were risk factors for reduced survival and this was veri ed in the above studies.
Furthermore, lncRNAs act as ceRNA or miRNA sponges, representing an extensive form of gene expression regulation at the post-transcriptional level [28].Out of four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132) in the signature, C14orf132 and LINC00330 were found to be related to the prognosis of colorectal cancer and bladder cancer, respectively [29][30].In this study, high expression of C14orf132 could prolong the survival of LUAD patients, but its speci c mechanism of action needs to be explored.Via targeting of the miR-302c-5p/FOXA1 axis, LINC00460 can facilitate tumor growth in LUAD [31].Similarly, LINC00460 has also been con rmed to promote the progression of multiple cancers through different target genes [32][33][34].The last lncRNA, DGCR5, is slightly different because when it is highly expressed it exerts anticancer activity in various cancers, including gastric cancer, hepatocellular carcinoma, cervical cancer, and bladder cancer, suggesting a good prognosis [35][36][37][38].Interestingly, DGCR5 promotes LUAD progression by inhibiting hsa-mir-22-3p [39], which means higher expression DGCR5 implies a worse prognosis for LUAD patients and this nding was consistent with our current survival analysis results.DGCR5 may be considered a unique prognostic indicator for LUAD.However, the underlying function remains unclear and is worth investigating.
By GO and KEGG functional enrichment analysis, the biological functions of the hub DEGs in the network were identi ed.Downregulated DEGs were mainly involved in the cell cycle, while upregulated DEGs were linked with protein digestion and absorption.A previous study showed that cell cycle pathway disruption can cause cell cycle arrest, which is related to the prognosis of human cancers [40].Moreover, the GO terms were related to protein binding, the nucleoplasm, and DNA replication.The rst enrichment function, protein binding, is important during tumor metastasis when tumor cells interact with the microenvironment through binding of cell surface receptors to protein ligands [41].The transcriptionally active genes, especially those involved in developmental regulation and the cell cycle, interact in the nucleus with the nuclear porin [42].It is also known that DNA replication disorders cause genomic instability and confer genetic diversity during tumorigenesis [43], explaining how the above pathways may identify effective treatment strategies for LUAD.
For further veri cation of the signi cance of this study's ndings, we compared our risk score results with clinical risk factors (age, pathological stage, and TNM stage), and determined that our risk score showed a superior effect on survival, although we cannot exclude the possibility that clinical risk factors affect patient prognosis.Different from previous studies, we constructed a ceRNA network centered on prognosis-related miRNAs and used univariate and multivariate analysis to select an effective mRNA-lncRNA signature from the network.We then used clinical risk factors to validate the independent predictive signi cance of the signature.We believed that the use of the various statistical tests may make the results more reliable.However, this study is not without aws.First, this study was based on an online database, which has limitations.Second, this study is somewhat simple, so additional in-depth functional analysis of the four lncRNAs in the signature is necessary.Even so, our research may provide guidance for future studies, which may help in the selection of clinical treatment targets.

Conclusion
In conclusion, from a ceRNA network, we constructed a signature consisting of two mRNAs (HOXA10 and CBX2) and four lncRNAs (LINC00460, LINC00330, DGCR5, and C14orf132), which could be regarded as a prognostic biomarker in LUAD and may provide options for clinical treatment.

Table 2
Signi cant genes in univariate and multivariate Cox regression analysis.