A Three-Gene Prognostic Signature Based on Circular RNA-Associated Competing Endogenous RNA Network for Patients with Lung Adenocarcinoma

Background: Evidence is increasingly indicating that circular RNAs (circRNAs) are closely involved in tumorigenesis and cancer progression. However, functions of circRNAs in lung adenocarcinoma (LUAD) are still unknown. It is necessary to investigate the regulatory mechanism of circRNAs based on competing endogenous RNA (ceRNA) network in LUAD procession and further construct a prognostic signature for predicting overall survival of LUAD patients. Methods: Differentially expressed circRNAs (DEcircRNAs), differentially expressed miRNAs (DEmiRNAs) and differentially expressed mRNAs (DEmRNAs) were selected to construct the ceRNA network based on TargetScan prediction tool and Pearson correlation coecient. Functional and pathway enrichment analysis were performed using DAVID database. A PPI network was constructed and then visualized by Cytoscape software. Finally, we constructed a prognostic signature for LUAD patients using LASSO method and assessed the prognostic performance in the validation cohort. Results: A total of 38 DEcircRNAs, 56 DEmiRNAs, and 960 DEmRNAs were identifed. Based on the interactions predicted by TargetScan, we constructed a circRNA-associated ceRNA network including 11 DEcircRNAs, 8 DEmiRNAs and 49 DEmRNAs. GO and KEGG pathway analysis indicated that the circRNA-associated ceRNA network might be involved in regulation of GTPase activity and endothelial cell differentiation. After removing the discrete points, a PPI network containing 12 DEmRNAs was constructed. Univariate cox regression analysis showed that three DEmRNAs were signicantly associated with overall survival. Therefore, we constructed a three-gene prognostic signature for LUAD patients using LASSO method. By applying the signature, patients in the training cohort could be categorized into high-risk or low-risk subgroup with signicant survival difference (HR: 1.62, 95% CI: 1.12-2.35, log-rank p = 0.009). The prognostic performance was conrmed in an independent GEO cohort (HR: 2.59, 95% CI: 1.32-5.10, log-rank p = 0.004). Multivariate cox regression analysis proved that the three-gene signature was an independent prognostic factor for LUAD. Conclusions: Our ndings provided a deeper understanding of the circRNA-associated ceRNA regulatory mechanism in LUAD pathogenesis and constructed a prognostic signature that could be a useful guide for personalized treatment of LUAD patients. miRNAs and target genes based on the constructed ceRNA network. For example, in circRNA-associated ceRNA network constructed in present study, hsa_circ_0001936 interacted with has-miR-142-5p and further might regulate its target genes, such as CAV2, FGD5 and S1PR1, which were involved in positive regulation of GTPase activity process. Such results indicated that hsa_circ_0001936 might play key roles in LUAD progression by regulating GTPase activity. Thus, the regulatory mechanism of these circRNAs speculated in ceRNA network requires further experimental investigation in future studies.


Introduction
Lung cancer is the leading cause of malignancy-related death worldwide. Lung adenocarcinoma (LUAD) accounts for ~ 50% of all types of lung cancer and has been on the rise year after year, especially in women and young adults. Despite recent advances in diagnostic and therapeutic approaches, the 5-year survival rate for LUAD is < 20% [1]. In addition, the underlying molecular mechanisms of LUAD still remain unclear. Therefore, identifying the key molecular mechanisms of LUAD and establishing effective prognostic models for individualized treatment are urgent.
As the boom of sequencing technology and bioinformatics methods, increasing noncoding RNAs (ncRNAs) were identi ed, one of which is circular RNA (circRNA). CircRNA, featuring stable structure and high tissue-speci c expression widely expressed, has a covalently closed loop structure in which the 3′ and 5′ ends are linked in a non-collinear way by a process termed "back-splicing" [2][3][4]. In recent years, thousands of circRNAs have been identi ed in various organisms and found being closely associated with many diseases, particularly with cancer [5][6][7]. For example, Tian et al. reported that has_circ_0003159 expression was signi cantly down-regulated in gastric cancer and associated with lymphatic node metastasis and distal metastasis [8]. Gao et al. showed that has_circ_0018289 was upregulated in cervical cancer and promotes proliferation, migration and invasion of tumor cells [9]. Such evidences strongly support that circRNAs play critical roles in tumor progression. Hence, exploring the regulatory mechanism of circRNAs in tumor is necessary and may be bene cial for tumor therapy.
There is increasing evidence that circRNA contains microRNA response elements (MREs) and inhibits the function of microRNA (miRNA) as miRNA sponges through the competing endogenous RNA (ceRNA) network [10]. The ceRNAs are implicated in many biological processes and the equilibrium of ceRNAs and miRNAs can be critical for promotion of diseases. For example, Sang et al. reported that hsa_circ_0025202 could act as a miRNA sponge for miR-182-5p and further regulate the expression and activity of FOXO3, regulating tamoxifen sensitivity and tumor progression in breast cancer [11]. Fang et al. showed that circRNA-100290 could promote colorectal cancer progression through miR-516b-induced downregulation of FZD4 expression and Wnt/β-catenin signaling [12]. In addition, Chen et al. found that has_circ_100395 regulates miR-1228/TCF21 pathway to inhibit lung cancer progression [13]. Collectively, these ndings show that dysregulation of important circRNAs in the ceRNA network disrupt the miRNAmediated circRNA/mRNA interactions and therefore contribute to cancer initiation and progression.
However, systematic analysis of circRNA-associated ceRNA network in LUAD remains insu cient and requires further exploration.
In this study, based on the public cohorts downloaded from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), we aimed to detect the aberrant key circRNAs and explore the potential regulatory mechanism by constructing a circRNA-associated ceRNA network for LUAD patients. Furthermore, using LASSO method, we established an mRNA-based prognostic signature derived from ceRNA network and validated the prognostic performance in an independent cohort. This new approach of constructing prognosis signature based on ceRNA networks can elucidate the circRNA-mediated regulatory mechanisms in LUAD progression and serve as a guide toward more effective individualized treatment decisions for LUAD patients.

Material And Method
Data processing Two circRNA expression pro les (GSE101586 [14] and GSE101684 [15]) were obtained from the GEO (https://www.ncbi.nlm.nih.gov/geo/) database, including 5 and 4 pairs of LUAD and adjacent normal lung tissues respectively. These raw data were processed by background correction and normalization using the "affy" package of R/Bioconductor. The expression data of mRNA (513 LUAD tissues and 59 normal tissues) and miRNA (513 LUAD tissues and 46 normal tissues) were retrieved from TCGA (https://portal.gdc.cancer.gov/). The normalized count values of level 3 gene expression data derived from Illumina HiSeqV2 were extracted as gene expression measurements. Clinical information of 513 LUAD patients was obtained. Forty-seven LUAD patients were excluded because of unknown survival time, age, and tumor stage. Ultimately, 466 patients were retained in our study. An independent cohort (GSE42127 [16]) collected from GEO contained 133 LUAD patients, which was used to test the prognostic ability. The Robust Multi-array Average algorithm was used for preprocessing the raw data. All cohorts and clinical information were described in Table 1 and Supplementary Table S1. Differentially expressed circRNAs (DEcircRNAs) were identi ed by student t-test with p < 0.05 between LUAD and adjacent normal lung tissues. The common DEcircRNAs overlapped between two cohorts were selected to construct ceRNA network. The edgeR package was used to identify the differentially expressed miRNAs (DEmiRNAs) and differentially expressed mRNAs (DEmRNAs). The cutoff values were set at the FDR < 0.05 and |log2FC|>2.
Constructing the circRNA-associated ceRNA Network A circRNA-associated ceRNA network was constructed based on the interactions between DEcircRNAs, DEmiRNAs and DEmRNAs. TargetScan prediction tool was used to identify interactions between DEcircRNAs with the target miRNAs. To obtain high-quality circRNAs acting as miRNA targets and distinguish those circRNAs acting as miRNA decoys, the circRNAs that had perfect nucleotide pairing between the 2nd and 8th positions of the 5' end of miRNA sequences were selected [17]. The miRNAs within circRNA-miRNA interactions were further screened by DEmiRNAs identi ed from TCGA. The remained DEmiRNAs were further used to screen the interactions between miRNA and mRNA predicted by Targetscan. The mRNAs targeted by DEmiRNAs were intersected with the DEmRNAs identi ed from TCGA. The correlation between DEmiRNAs and DEmRNAs was further calculated by Pearson Correlation Coe cient. Only the interactions with signi cant negative correlation were retained. Finally, removing the nodes that could not form a circRNA-miRNA-mRNA axis, a circRNA-associated ceRNA network was established and further visualized by Cytoscape software (version 3.7.0; www.cytoscape.org).

Functional Enrichment
In order to explore the biological functions which the circRNA-associated ceRNA network might be involved in, we selected the DEmRNAs within ceRNA network to make the functional enrichment analysis using Database for Annotation, Visualization and Integrated Discovery (DAVID; http://www.david.abcc.ncifcrf.gov/). Gene oncology involved three categories: biological processes, molecular function and cellular components. Pathway enrichment was carried out using the Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/), and it contains information about genomes, chemical substances, biological pathways and diseases.

Construction of the PPI Network PPI information can be evaluated by an online tool, The Search Tool for the Retrieval of Interacting
Genes/Proteins (STRING) database (https://string-db.org/). This database has a comprehensive score for each PPI relationship pair that is distributed between 0 and 1; the higher the score, the more reliable the PPI relationship. In this study, we applied STRING to explore potential correlation between DEmRNAs within ceRNA network with a medium con dence criterion (con dence score ≥ 0.4). The Cytoscape software was used to visualize the PPI network.

Establishment of prognostic signature
Based on DEmRNAs within PPI network, univariate cox regression analysis was utilized to test the association between gene expression and overall survival (OS). Genes with signi cant association were selected to construct the prognostic signature by LASSO method. The model was applied to the expression matrix of candidate genes and the optimal value of the penalty parameter λ was selected to calculate the coe cient of each gene constituting prognostic signature. Using the combination of weighted genes expression values, a risk scoring model was established and the risk scores were calculated as shown in the following equation: Risk score = expression of Gene 1 * β1 + expression of Gene 2 * β2 +…expression of Gene n * βn. βi is the regression coe cient of Gene i, which represents the contribution of Gene i to the prognostic risk score. Using the median risk score as the cutoff point, patients in each cohort were divided into low-risk or high-risk group correspondingly.

Statistical analysis
T-test was used to observe the gene expression difference between metastasis samples and nonmetastasis patients. The multivariate Cox proportional-hazards regression model was used to evaluate independent association between prognostic signature and patient survival after adjusting for stage, age and gender. Hazard ratios (HRs) and 95% con dence intervals (CIs) were computed based on the Cox regression analysis. Survival curves were estimated using the Kaplan-Meier method and were compared using the log-rank test. Fisher's exact test was used to observe the differences in mortality rate and lymph node metastasis rate between different risk groups. The signi cance was de ned as a P value of < 0.05. All statistical analysis were performed using the R3.4.0.

Identi cation of differentially expressed RNAs
The owchart for this study is shown in Fig. 1. We initially performed differential expression analysis by comparing circRNA expression between LUAD and adjacent normal lung tissues in the two cohorts from GEO. With cut-off criteria of p < 0.05, a total of 38 DEcircRNAs (including 31 up-regulated circRNAs and 7 down-regulated circRNAs) were concordance between GSE101586 and GSE101684 ( Fig. 2A). The expression data of miRNAs and mRNAs download from TCGA were analyzed using the "edgeR" package in R. DEmiRNAs and 49 DEmRNAs, and subsequently visualized using Cytoscape (Fig. 3).

Functional enrichment of DEmRNAs
To further investigate the biological processes which the circRNA-associated ceRNA network might be involved in, we performed GO annotation and KEGG pathway enrichment using DAVID database. Functional enrichment results were shown in Fig. 4. For GO annotation, ten GO terms were signi cantly enriched. For example, positive regulation of GTPase activity, as the most signi cantly enriched GO term, had been reported in multiple articles related to the progression and metastasis of cancer [18][19][20].
Besides, endothelial cell differentiation and regulation of transcription from RNA polymerase II promote, were also related to cancer progression and chemoresistance validated in many researches [21][22][23][24]. KEGG pathway enrichment analysis indicated that two pathways, including pathways in cancer and focal adhesion, were signi cantly enriched. Focal adhesion, regulating cell proliferation, survival, migration, and invasion, plays an essential role in tumour invasiveness and metastasis [25,26]. Such results showed that DEmRNAs within network played key roles in multiple cancer-related pathways, and further indicated that the circRNA-associated ceRNA network might be involved in LUAD carcinogenesis and progression.

Construction of the PPI network and evaluation of relationship with clinical characteristics
To explore the interactions between the 49 DEmRNAs, we constructed a PPI network using the STRING database. After removing unconnected nodes, the PPI network comprised 12 nodes and 27 edges (Fig. 5A). Furthermore, we investigated the correlation between the gene expression and clinical variables including lymphatic node metastasis, distant metastasis and overall survival. Univariate cox regression analysis results showed that the high expression of three genes (including CEP55, KIF14 and PRR11) predicted poor prognosis in LUAD patients (Table S2). Notably, we found that CEP55 and PRR11 were signi cantly up-regulated in patients with lymphatic node metastasis (Fig. 5B, 5D). The expression of KIF14 also had an upward trend in lymphatic node metastasis patients (Fig. 5C). Although no statistical signi cance was observed, the average expression levels of all three genes increased in distant metastasis patients (Fig. 5E-5G). These results indicated that these three genes might play key roles in the progression and prognosis of LUAD.

Construction and validation of the prognostic signature
Based on the importance of three genes in the progression and prognosis of LUAD, we tried to construct a prognostic signature for LUAD patients using LASSO method. Combining the regression coe cients with gene expression values, the risk-score formula was created as follows: Risk score = (0.058*expression level of CEP55) + (0.065*expression level of KIF14) + (0.202*expression level of PRR11). We then calculated the risk score for each patient and ranked them based on increasing score, after which patients were classi ed into a high-risk (n = 233) or a low-risk (n = 233) group based on the median risk score. We observed the overall survival between two risk groups with signi cantly different survival rate (log-rank p = 0.009; Fig. 6A). Patients with high risk score had signi cantly shorter OS than patients with low risk score. Mortality rate was 30.5% (71/233) in the high-risk group, as compared to 20.6% (48/233) in the low-risk group (p = 0.021, Fisher exact test, Fig. 6B). The rate of lymphatic node metastasis in highrisk group was signi cantly higher than that in low-risk group (p = 0.005, Fisher exact test, Fig. 6C). The risk score distribution, survival status, and expression pro le of the three prognostic genes are shown in Fig. 6D. Taking into the patients' clinical features, including age, gender and stage, the multivariate Cox regression analysis showed that the three-gene signature also had statistical signi cance as an independent prognostic factor in the training cohort (HR: 1.45, 95% CI: 1.01-2.13, P = 0.050) ( Table 2). We test the prognostic performance of the three-gene signature in an independent cohort. Similar to the training cohort ndings, patients in high-risk group had a shorter survival time than patients in low-risk group (log-rank p = 0.004, Fig. 7A). The risk score distribution, survival status, and expression pro le of the three prognostic genes are shown in Fig. 7B

Discussion
LUAD is a major lung cancer that is in a locally advanced or metastatic stage at the time of diagnosis, which leaves no time for early detection or treatment [27]. The accurate diagnosis and prognosis may warrant timely treatment to potentially decrease the mortality. Therefore, it is essential to explore the molecular mechanisms of LUAD progression and identify the effective biomarkers contributing to better treatment and better overall prognosis for LUAD patients.
Growing experimental evidences have shown that circRNAs play important roles in many complicated human diseases, including malignant tumors [28][29][30]. Lately, lots of studies indicated that circRNAs could function as a tumor regulator in lung cancer [31][32][33]. As a type of high e ciency ceRNA, it could inhibit the binding of miRNAs to target genes and regulate the expression level of target genes by exerting a miRNA sequestering effect [34]. However, the expression pattern and biological function of circRNAs in LUAD remain largely elusive. In present study, we constructed a circRNA-miRNA-mRNA ceRNA network to explore the regulatory mechanism of circRNAs involved in tumor progression. An integrated circRNAassociated ceRNA network, including 11 circRNAs, 8 miRNAs and 49 mRNAs, was established. The 11 cicRNAs identi ed in the ceRNA network were hsa_circ_0002191, hsa_circ_0002727, hsa_circ_0049271, hsa_circ_0050395, hsa_circ_0001974, hsa_circ_0004006, hsa_circ_0000641, hsa_circ_0079929, hsa_circ_0007788, hsa_circ_0001936, hsa_circ_0015278. We found that two of eleven circRNAs had been reported to be related to disease progression and pathogenesis. Peng et al. reported that hsa_circ_0015278 was signi cantly down-regulated in papillary thyroid carcinoma, showing interactive potential with two cancer-related miRNAs [35]. Furthermore, several promising cancer-related genes that may be targets of the dysregulated hsa_circRNA_0015278/miR-141-3p/miR-200a-3p axis were identi ed to explore the pathogenesis of papillary thyroid carcinoma. Another circRNA, named hsa_circ_0079929, has been reported by Zou et al [36]. They found that elevated expression of hsa_circRNA_0079929 might inhibit the expression of hsa-miR-26a-3p to increase aortic smooth muscle cells phenotype or apoptosis in thoracic aortic dissection. Although other circRNAs have not been reported previously, we could detect the interactive potential with miRNAs and target genes based on the constructed ceRNA network. For example, in circRNA-associated ceRNA network constructed in present study, hsa_circ_0001936 interacted with has-miR-142-5p and further might regulate its target genes, such as CAV2, FGD5 and S1PR1, which were involved in positive regulation of GTPase activity process. Such results indicated that hsa_circ_0001936 might play key roles in LUAD progression by regulating GTPase activity. Thus, the regulatory mechanism of these circRNAs speculated in ceRNA network requires further experimental investigation in future studies.
Limited by small sample size or lack of corresponding clinical data, it is not feasible to construct diagnostic or prognostic signatures for clinical application based on circRNAs at the current stage. However, extensive transcriptome data with corresponding clinical data provide us the possibility to construct clinically available diagnostic or prognostic signatures. In present study, we identi ed the key mRNA participating in the circRNA-associated ceRNA network based on STRING database and further constructed a three-gene prognostic signature for LUAD patients. The three-gene prognostic signature is composed of centrosomal protein 55 (CEP55), kinesin family member 14 (KIF14) and proline rich 11 (PRR11). CEP55 localizes to the centrosome of interphase cells and to the midbody during cytokinesis [37]. Many studies have demonstrated that CEP55 was highly expressed in a variety of cancers and could be used as a diagnostic and prognostic marker for several cancers [38][39][40]. For LUAD, Fu et al. had reported that CEP55 was signi cantly up-regulated in LUAD patients and could be an independent prognostic factor [41]. As a mitotic kinesin, KIF14 has been reported to serve oncogenic roles through the regulation of the cell cycle, DNA replication and DNA repair biological processes in a variety of malignancies, such as colorectal cancer [42], ovarian cancer [43], gastric cancer [44], prostate cancer [45], and so on. Zhang et al. had con rmed that KIF14 was notably up-regulated in tumor tissues of LUAD and the expression levels of the KIF14 exhibited a strongly correlation with OS [46]. PRR11, located on chromosome 17q22, has been reported to be closely associated with cell cycle progression and was also demonstrated to participate in various biological processes in tumor cells, including cell invasion, migration and proliferation, by acting as an oncogene [47][48][49][50]. Ji et al. found that PRR11 was periodically expressed in a cell cycle-dependent manner. RNAi-mediated silencing of PRR11 caused S phase arrest and suppressed cellular proliferation, colony formation ability in lung cancer cells, demonstrating that PRR11 had a critical role in both cell cycle progression and tumorigenesis [51]. All three genes are associated with tumor progression and prognosis, indicating that the three-gene signature could be served as clinically available prognostic signature to provide potential novel targets and promote individualized treatment. In addition, we found that all three genes were regulated by has-miR-144-3p interacting with hsa_circ_0002191 and hsa_circ_0002727, indicating that hsa_circ_0002191 and hsa_circ_0002727 might serve as potential regulators to affect gene expression and further in uenced the progression and prognosis of LUAD patients. Thus, further experimental investigation needs to be implemented to explore the regulatory mechanism and prognostic ability of hsa_circ_0002191 and hsa_circ_0002727.

Conclusion
In summary, the present study constructed and analyzed a circRNA-associated ceRNA network via comprehensive bioinformatics analysis, which may provide meaningful evidences to future studies focused on the molecular mechanisms of LUAD. Furthermore, a three-gene signature derived from circRNA-associated ceRNA network was constructed and could be applied for clinical prognostic evaluation, which provide a useful guide for personalized treatment of LUAD patients. The owchart for this study.

Figure 2
Heat maps of the 38 differentially expressed circular RNAs (A) and volcano plots of differentially expressed mRNA (B) and differentially expressed miRNA (C). Red and green dots represent signi cantly up-regulated and down-regulated RNAs, respectively (FDR < 0.05 and |log2FC| >2.0).

Figure 3
The circRNA-associated competing endogenous network in lung adenocarcinoma. Triangles represent circRNAs, diamonds represent miRNAs, ellipses represent mRNAs, and black lines represent circRNA-miRNA-mRNA interactions.
Page 18/21     Validation of the three-gene prognostic signature in an independent cohort. Patients in GSE84437 cohort were strati ed into high-risk group and low-risk group based on median of risk score. (A) Kaplan-Meier curve of the overall survival for high-risk and low-risk scores ranking by the three-gene prognostic signature. (B) Risk score distribution and survival status for each patient. (C) Expression heat map of three genes corresponding to each sample above.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. SupplementaryTables.docx