Construction of the cancer stemness-related gene signature
First, to identify the cancer stemness-related genes in LUAD, we calculated the enrichment scores of the gene set of cancer stemness for 526 tumorous samples from the TCGA cohort of LUAD patients using the ssGSEA algorithm, and obtained 5253 cancer stemness-related genes (P < 0.05, R > 0; Figure 1 A).
Next, we performed a paired differential expression analysis using R package “edgeR” based on RNA-seq data of 57 tumorous samples and 57 matched normal tissues we previously selected from the TCGA cohort of LUAD patients; 3393 DEGs were obtained, including 1224 upregulated DEGs and 2169 downregulated DEGs (|logFC| > 1, FDR < 0.05; Figure 1 B). We also investigated the prognostic value of all genes using RNA-seq data and survival data for the TCGA cohort of LUAD patients, and obtained 4237 relevant genes (P < 0.05; log-rank test). Finally, we intersected 5253 cancer stemness-related genes, 1224 upregulated DEGs and 4237 prognostically relevant genes, and obtained 47 overlapping genes (Figure 1 C), which were defined as qualified cancer stemness-related genes for subsequent construction of the stemness-related gene signature.
The 47 qualified cancer stemness-related genes were analyzed as the input of the univariate Cox regression analysis for the training set. Then, 35 eligible genes in the univariate Cox regression analysis (P < 0.05) were further screened using LASSO regression analysis. Eventually, eight genes (PLEK2, GPX8, TCN1, PERP, CD79A, PAX5, FSCN1, and CNNM1) were selected to develop a stemness-related gene signature using the multivariate Cox regression model (Figure 1 D and 1 E).
Evaluation of predictive capability of the cancer stemness-related gene signature
The predictive performance of the cancer stemness-related gene signature was assessed in the training set (n = 400), the validation set (n = 100) and an external test set (GSE8894; n = 61). AUCs for 5-year overall survival were 0.681, 0.673 and 0.774 in the training, validation and test sets, respectively (Figures 2 A – 2 C), suggesting its robust predictive ability for predicting prognosis.
Next, we calculated risk score of each LUAD patient from the TCGA cohort based on RNA-seq data of the eight genes using multivariate Cox regression model, and classified all patients into the low-risk and high-risk groups based on the median of risk scores of all patients. Principal component analysis of genes consisting the cancer stemness-related signature revealed a distinct expression pattern between the low-risk and the high-risk groups in dimensionality 1 (Figure 2 D – 2 F), indicating the cancer stemness-related gene signature has a good distinguishing ability.
To demonstrate the relationship between the risk score and the expression of eight genes comprising the cancer stemness-related signature and the survival of patients with LUAD, we displayed the risk score, survival status, and heatmap for gene expression in a picture with the same abscissa. The findings showed that the high-risk scores were associated with poor survival outcomes and high expression levels of six genes (PLEK2, GPX8, TCN1, PERP, FSCN1, and CNNM1) in the training set (Figure 2 G), the validation set (Figure 2 H) and the external test set (GSE8894; Figure 2 I).
Clinical significance of the cancer stemness-related gene signature
Since the gene signature manifested a robust predictive ability, we next investigated its survival value in the training set, the validation set, and the external test set. As previously described, we calculated risk score of each LUAD patient from the TCGA cohort based on RNA-seq data of the eight genes using multivariate Cox regression model, and classified all patients into the low-risk and high-risk groups based on the median of risk scores of all patients. In line with the results generated from the previous step, the low-risk group had a better survival outcome as compared with the high-risk group in the training set, the validation set, and the external test set (P < 0.001; Figure 3 A – 3 C).
To further investigate the association between the risk score and the TNM classification of LUAD, we compared the risk score across different TNM stages based on the TCGA cohort. Consistent with the findings of the survival analysis, we found that the risk score was significantly augmented in patients with higher TNM staging (Figure 3 D – 3 F), indicating a significant impact of the cancer stemness-related gene signature on TNM staging. Moreover, we investigated the association between the risk score and the immune score in LUAD. The immune score was computed using R package “estimate” based on RNA-seq data of patients with LUAD. The results showed that risk score was negatively correlated with immune score (P = 0.0005, R = -0.1552; Figure 3 G).
To explore the biological processes associated with the cancer stemness-related gene signature, we performed functional annotation of genes that were correlated with the cancer stemness -related gene signature. We first investigated the relationship of the gene signature with all genes using RNA-seq data from the TCGA cohort of patients with LUAD. 332 genes were selected as the gene signature-related genes (P < 0.01, R > 0.3). Afterwards, these 553 genes were functional annotated using R package “clusterProfiler”. The findings showed that several biological processes related to cancer invasion and metastasis were significantly enriched, including cell adhesion molecules (CAMs), extracellular structure organization, focal adhesion (Figure 3 H – 3 I)
Identification of the hub genes related with cancer stemness
Considering the previously established eight-gene signature not only can reflect overall survival, but also was associated with TNM staging and immune score, we speculated that the eight genes consisting of the gene signature were potential critical genes in the development and progression of LUAD. To identify potential hub genes, we investigated their survival significance in two independent cohorts (TCGA and GSE8894). The results demonstrated that six genes were associated with poor survival (PLEK2, GPX8, TCN1, PERP, FSCN1, and CNNM1; log-rank test; P < 0.05; Figure 4 B – 4 D and Figure 4 F – 4 H; Figure S1), while two genes were linked to better survival (CD79A and PAX5; log-rank test; P < 0.05; Figure 4 A and Figure 4 E; Figure S1).
To evaluate the expression levels of these eight genes in LUAD, we compared their expression levels between LUAD tissue and normal lung tissue using RNA-seq data from the TCGA cohort. As expected, the results showed that the expression levels of the eight genes were significantly enhanced in tumor tissue compared with normal tissue (P < 0.01; Figure 4 I – 4 L).
Establishment of a prognostic nomogram for precision medicine
Considering the prognostic significance of the eight-gene signature, we sought to combine it with several common clinical characteristics to better predict survival of LUAD patients. We first conducted a multivariate Cox regression analysis to examine the prognostic significance of the risk score based on the gene signature, age, immune score, pathological T, pathological N, and pathological M. The results showed that the risk score, pathological T, and pathological N could be used as effective prognostic characteristics for LUAD (P < 0.01; Figure 5 A). We noticed that pathological M showed no significant prognostic value (P = 0.172), which seems to be contrary to the current knowledge that patients with pathological M have a poor survival. Thus, we further analyzed the clinical data and found that there were 229 patients with M0 while 14 patients with M1. In other words, the sample size between the two groups was extremely imbalanced, which possibly affected the accuracy of the statistical results.
Afterwards, we constructed a nomogram combining the risk score, pathological N, and pathological T (Figure 5 B), which can be used to accurately predict the patients’ survival risk for precision medicine.
Impact of the stemness-related gene signature on cancer behaviors
Considering the robust predictive ability of the stemness-related gene signature, we wondered whether this gene signature could reflect the malignant degree of lung cancer. To this end we estimated the enrichment score of proliferation, invasion and metastasis of lung adenocarcinoma using ssGSEA algorithm based on RNA-seq data of patients with lung adenocarcinoma from the TCGA cohort. Then we analyzed the relationship of the enrichment score of proliferation, invasion and metastasis with the risk scores based on the stemness-related gene signature; the findings showed that the risk scores were significantly correlated with these malignant tumor behaviors including proliferation, invasion and metastasis (P < 0.05, Figure 6 A – 6 C).
To investigate the underlying mechanisms of the effects of the stemness-related gene signature on cancer behaviors, we further explored the relationship between the risk score and hypoxia, epithelial-mesenchymal transition (EMT) and angiogenesis, which are all the potential causes of tumor metastasis. As expected, the risk scores were positively correlated with the levels of hypoxia, EMT and angiogenesis, strongly suggesting that the stemness-related gene signature could predict the malignant degree of lung cancer (P < 0.05, Figure 6 D – 6 F).