Identification of differentially expressed RNAs
We initially performed differential expression analysis by comparing circRNA expression between LUAD and adjacent normal lung tissues in the two cohorts from GEO. With cut-off criteria of p < 0.05, a total of 38 DEcircRNAs (including 31 up-regulated circRNAs and 7 down-regulated circRNAs) were concordance between GSE101586 and GSE101684 (Figure 1A). The expression data of miRNAs and mRNAs download from TCGA were analyzed using the “edgeR” package in R. With cut-off criteria of FDR < 0.05 and |log2FC| >2.0, 56 differentially expressed miRNAs (including 37 up-regulated miRNAs and 19 down-regulated miRNAs) and 960 differentially expressed mRNAs (including 653 up-regulated mRNAs and 307 down-regulated mRNAs) were identified respectively (Figure 1B, 1C).
Construction of the circRNA-associated ceRNA Network
Using the TargetScan prediction tool to identify DEcircRNAs with the target miRNAs, we identified 494 circRNA-miRNA pairs based on 38 DEcircRNA and 221 miRNAs. After screened by the DEmiRNAs identified from TCGA-LUAD cohort, only 39 circRNA-miRNA pairs remained, including 18 DEcircRNAs and 18 DEmiRNAs. We further identified mRNAs targeted by these DEmiRNAs from TargetScan database and selected those that were overlapped with the DEmRNAs identified from TCGA-LUAD cohort. A total of 305 miRNA-mRNA pairs were identified, including 11 DEmiRNAs and 191 DEmRNAs. After calculating the expression correlation between miRNA and mRNA by Pearson correlation coefficient, only 53 miRNA-mRNA pairs with significant correlation were retained. Finally, combining the circRNA-miRNA pairs with miRNA-mRNA pairs, a circRNA-miRNA-mRNA network was constructed, including 11 DEcircRNAs, 8 DEmiRNAs and 49 DEmRNAs, and subsequently visualized by Cytoscape (Figure 2).
To further investigate the biological processes which the circRNA-associated ceRNA network might be involved in, we performed GO annotation and KEGG pathway enrichment using DAVID database. Functional enrichment results were shown in Supplementary Figure S1. For GO annotation, ten GO terms were significantly enriched. Most terms, such as positive regulation of GTPase activity, endothelial cell differentiation and regulation of transcription from RNA polymerase II promote, had been reported in multiple articles related to the progression and metastasis of cancer. KEGG pathway enrichment analysis indicated that two pathways, including pathways in cancer and focal adhesion, were significantly enriched. Such results showed that DEmRNAs within network played key roles in multiple cancer-related functional pathways, and further indicated that the circRNA-associated ceRNA network might be involved in LUAD carcinogenesis and progression.
Construction of the PPI network and evaluation of relationship with clinical characteristics
To explore the interactions between the 49 DEmRNAs, we constructed a PPI network using the STRING database. After removing unconnected nodes, the PPI network comprised 12 nodes and 27 edges (Supplementary Figure S2). Furthermore, we investigated the correlation between the gene expression and clinical variables including lymphatic node metastasis, distant metastasis and overall survival. Univariate cox regression analysis results showed that the high expression of three genes (including CEP55, KIF14 and PRR11) predicted poor prognosis in LUAD patients (Supplementary Table S1). Notably, we analyzed the expression difference between different lymphatic node metastasis status and distant metastasis status respectively (Supplementary Figure S2). The results showed that CEP55 and PRR11 were significantly up-regulated in patients with lymphatic node metastasis. The expression of KIF14 also had an upward trend in lymphatic node metastasis patients. Although no statistical significance was observed, the average expression levels of all three genes increased in distant metastasis patients. These results indicated that these three genes might play key roles in the progression and prognosis of LUAD.
Construction and validation of the prognostic signature
Based on the importance of three genes in the progression and prognosis of LUAD, we tried to construct a prognostic signature for LUAD patients using LASSO method. Combining the regression coefficients with gene expression values, the risk-score formula was created as follows: Risk score = (0.058*expression level of CEP55) + (0.065*expression level of KIF14) + (0.202*expression level of PRR11). We then calculated the risk score for each patient and ranked them based on increasing score, after which patients were classified into a high-risk (n = 233) or a low-risk (n = 233) group based on the median risk score. We observed the overall survival between two risk groups with significantly different survival rate (log-rank p = 0.009; Figure 3A). Patients with high risk score had significantly shorter OS than patients with low risk score. Mortality rate was 30.5% (71/233) in the high-risk group, as compared to 20.6% (48/233) in the low-risk group (p = 0.021, Fisher exact test, Figure 3B). The rate of lymphatic node metastasis in high-risk group was significantly higher than that in low-risk group (p = 0.005, Fisher exact test, Figure 3C). The risk score distribution, survival status, and expression profile of the three prognostic genes are shown in Figure 3E. Taking into the patients’ clinical features, including age, gender and stage, the multivariate Cox regression analysis showed that the three-gene signature also had statistical significance as an independent prognostic factor in the training cohort (HR: 1.58, 95% CI: 1.07-2.3, P = 0.021, Figure 3D).
We test the prognostic performance of the three-gene signature in an independent cohort. Similar to the training cohort findings, patients in high-risk group had a shorter survival time than patients in low-risk group (log-rank p = 0.004, Figure 4A). The risk score distribution, survival status, and expression profile of the three prognostic genes are shown in Figure 4C. Multivariate Cox regression analysis showed that, after adjusting for age, gender and stage, the prognostic signature remained significantly associated with patient OS (HR: 2.3, 95% CI: 1.17-4.6, P = 0.016, Figure 4B).
Development and validation of the nomogram based on three-gene signature
Multivariate Cox proportional hazards regression analysis indicated that two clinical variables, including age and stage, were independent risk factors for OS. Combing clinical risk factors and three-gene signature, we constructed a nomogram that is able to predict 3- and 5-years OS based on the multivariate Cox proportional hazards regression analysis (Figure 5A). The primary and external verification C-indexes were 0.717 and 0.716 in terms of predicting OS, respectively. The calibration plots for the probability of survival at 3- and 5-year showed good agreement between the predicted OS by nomogram and actual OS of LUAD patients in TCGA-LUAD and GSE42127, respectively (Figure 5B and 5C).