Identification of exosome-derived genes strongly associated with prognosis in HCC
We screened 5775 DEGs in the exoRbase database. Through WGCNA, we divided the DEGs in the exoRbase database into trait modules to collect genes with similar traits (Figure S1), thereby determining HCC exosome-specific modules (Fig. 1A-B). Through gene cluster analysis in Cytoscape software, we identified 168 hub genes. Finally, a total of 19 HCC-related exosome-derived hub genes were incorporated in the subsequent analysis through integrated analysis. The correlation between genes is shown in Fig. 1D. GO analysis performed on these genes indicated that the genes were mainly enriched in "cytosolic ribosome", "polysomal ribosome" and "ribosomal subunit" (Fig. 1E). Through univariate Cox regression, we found that seven genes were closely related to the OS of HCC patients (P < 0.05). Subsequent multivariate Cox regression analysis determined the optimal gene set consisting of two genes (MYL6B and THOC2) to predict the prognosis of HCC.
A prognostic model integrating the two exosome-derived genes was constructed and validated to have great predictive performance
Based on the two exosome-derived genes, we constructed a prognostic model. The prognostic index was calculated as PI = (0.0273 * expression level of MYL6B) + (0.1931 * expression level of THOC2). The optimal cutoff value determined by X-tile software was used as the threshold to divide patients into a high-risk group and a low-risk group. The optimal cutoff was determined to be 1.331 in HCC cohort in TCGA database, 3.104 in ICGC database and 1.863 in GSE14520. The HCC cohort in TCGA was adopted as a training cohort, and the HCC cohorts in ICGC and GSE14520 were used as validation cohorts to evaluate the predictive performance of the prognostic model. Patients in the high-risk group showed a worse prognosis than patients in the low-risk group in both the training cohort (P < 0.001) (Fig. 2A) and validation cohorts (P < 0.01) (Fig. 2D, 2G). Figure 2B, 2E and 2H show the expression levels of the two exosome-derived genes in patients and the corresponding risk scores. The area under the curve (AUC) at 0.5, 1, 3, and 5 years was 0.66, 0.68, 0.64, and 0.59 in the training cohort, respectively, indicating a great predictive performance of the prognostic model (Fig. 2C). The AUCs of the two validation cohorts at 0.5, 1, 3 and 4 years reached 0.72, 0.72, 0.68 and 0.63 (Fig. 2F) and 0.66, 0.61, 0.62 and 0.59 (Fig. 2I), respectively.
Subsequently, the correlation between the prognostic risk scores of the prognostic model and clinical variables was explored. Figure S2A shows the distribution of prognostic risk scores of patients stratified by traditional clinical variables and the expression profiles of the two exosome-derived genes. Patients with pathological grade G3-G4 had a higher prognostic risk index than patients with G1-G2 (Figure S2B) (P < 0.05). The prognosis of patients with vascular tumor invasion was also significantly worse than the prognosis of those without vascular tumor invasion (P < 0.05) (Figure S2C), consistent with clinically accepted diagnostic criteria. However, there was no significant difference in the prognostic risk among patients with different TNM stages (Figure S2D).
Chemotherapy response of patients with different prognostic risk scores
Chemotherapy is one of the common treatment options for HCC patients. It is a suitable treatment option for HCC patients who cannot undergo surgery to reduce tumor volume and prolong survival time. It is also commonly used in postoperative adjuvant chemotherapy to suppress tumor recurrence and progression[26]. The chemotherapy resistance of tumor cells is a key issue affecting the efficacy of anticancer drugs. We adopted the prognostic model to evaluate the therapeutic response of HCC patients to 266 traditional chemotherapeutic drugs and molecular targeted drugs on the Genomics of Drug Sensitivity in Cancer (GDSC) website. The half-maximum inhibitory concentration (IC50) was used as the criterion for tumor sensitivity to drugs. Figure 3A-L shows the difference between the sensitivity of high-risk patients and low-risk patients to drugs such as methotrexate, erlotinib and vinorelbine. High-risk patients were more resistant to chemotherapy than low-risk patients (P < 0.05). Through GSEA, we explored the signaling pathways that the exosome-derived prognostic model mainly interferes with. Figure 3M presents the Top5 positively regulated pathways, and Fig. 3N shows the Top5 negatively regulated pathways.
Characteristics of the immune microenvironment between high- and low-risk HCC patient groups
Interfering with immune checkpoints or pathways as an important form of immunotherapy and a novel hotspot in current antitumor therapy[27]. In this study, the relative proportion of infiltration and corresponding prognostic risk scores of 22 immune cell types from the TCGA HCC cohort are shown in Fig. 4A. Compared with that in the low-risk group, a higher proportion of memory B cells, M0 macrophages and follicular helper T cells (Fig. 4B-D) and a lower proportion of NK resting cells (Fig. 4E) were found in the high-risk group (P < 0.05). Figure 4F shows the correlation between the prognostic model and the immune checkpoints. In the high-risk group, the expression levels of the immune checkpoints PD1, B7H3, B7H5, CTLA4 and TIM3 were significantly higher than those in the low-risk group (P < 0.05) (Fig. 4G-K).
Establishment and evaluation of a corresponding nomogram in the TCGA HCC cohort
TCGA HCC patients with available clinical information were included in univariate and multivariate Cox regression analysis to assess the independent predictive performance of the prognostic model compared to clinical variables. The results of Cox regression determined that age (HR = 1.687, P < 0.05), TNM stage (HR = 1.947, P < 0.05) and the prognostic model (HR = 1.792, P < 0.05) could independently predict the prognosis of HCC patients (Fig. 5A). A predictive nomogram integrating these independent prognostic factors was then constructed to quantitatively predict the prognostic risk of HCC patients (Fig. 5B). The calibration curves for 1, 3 and 5 years are shown in Fig. 5C-E. The C index of the nomogram reached 0.63, indicating a better prediction consistency with the actual results compared to the use of the single independent prognostic factor age (0.57), TNM stage (0.53) and the prognostic model (0.59). The AUC of ROC curves at 1, 2, 3, and 5 years further confirmed that the predictive value of the nomogram was superior to any single independent prognostic factor (Fig. 5F-H). The results of DCA suggested that compared with the use of a single independent prognostic factor, the best clinical benefit was obtained by using the nomogram to predict the prognosis of HCC patients (Fig. 5I-K). The above results determined that the nomogram is suitable for clinically predicting the prognosis of HCC patients.
Construction and validation of a recurrence model based on the two exosome-derived genes
A recurrence model integrating the same two exosome-derived genes was constructed. The relapse index was defined as RI = (0.0236 * MYL6B expression level) + (0.0899 * THOC2 expression level). The TCGA HCC set with recurrence status was divided into a high-risk group and a low-risk group using the optimal cutoff value which was determined to be 1.272 using X-tile software. The recurrence probability of patients in the high-risk group was significantly higher than that of patients in the low-risk group (HR = 2.44, P < 0.001) (Fig. 6A). Figure 6B shows the expression levels of the two exosome-derived genes in patients and the predicted recurrence risk. ROC curves indicated great specificity and sensitivity of the recurrence model (Fig. 6C). The HCC cohort with recurrence status in GSE14520 was adopted to validate the predictive performance of the recurrence model as a validation cohort. The optimal cutoff value was 1.403. A significantly higher probability of recurrence was found in the high-risk group than in the low-risk group (HR = 1.54, P < 0.05) (Fig. 6D). The expression profiles of the two exosome-derived genes in HCC patients and prediction of recurrence risk are shown in Fig. 6E. The AUCs of ROC at 0.5, 1, 3, and 5 years were 0.68, 0.64, 0.58 and 0.57, respectively (Fig. 6F).
The characteristics of clinical variables and expression of the two exosome-derived genes of TCGA HCC patients and corresponding recurrence risk scores predicted by the recurrence model are presented in Figure S3A. HCC patients with pathological grades in G3-G4 were more likely to experience recurrence than HCC patients in G1-G2 (P < 0.05) (Figure S3B). The probability of recurrence of HCC patients with vascular tumor invasion was also significantly higher than that of HCC patients without vascular tumor invasion. (P < 0.05) (Figure S3C), while Figure S3D shows that there were no significant differences in recurrence probability between patients with TNM stage III-IV and stage I-II.
A nomogram integrating independent recurrence factors was constructed and validated
Performing independence analysis through Cox regression determined that age (HR = 2.253, P < 0.05), TNM stage (HR = 2.433, P < 0.05) and the recurrence model (HR = 4.907, P < 0.05) were independent predictive factors of recurrence in HCC patients (Fig. 7A). Based on these independent predictors, a corresponding nomogram was generated (Fig. 7B). The calibration curves at 1 year, 3 years, and 5 years indicated that the predicted results of the nomogram were consistent with the actual results (Fig. 7C-E). The C index of nomogram (0.70) > age (0.61) > recurrence model (0.60) > TNM stage (0.56) indicated that the clinical value of the nomogram to predict the recurrence probability of HCC patients was superior to that of a single independent predictive factor. The results of the ROC curves were also consistent with the above conclusions (Fig. 7F-H). DCA confirmed that the clinical benefit from the nomogram for recurrence prediction was better than that of a single independent predictor (Fig. 7I-K).
Diagnostic models were built and demonstrated to have superior predictive power
A diagnostic model based on the two exosome-derived genes was constructed to accurately distinguish between HCC patients and normal subjects during early diagnosis. The diagnostic equation determined by the stepwise logistic regression method was defined as follows: logit (P = HCC) = -23.0370 + (2.2566 × expression level of MYL6B) + (3.8247 × expression level of THOC2). Applying the diagnostic model to the paired HCC samples from TCGA (consisting of 50 normal samples and 50 HCC samples) reached a sensitivity of 76.00% and a specificity of 82.00% (Fig. 8A). Figure 8B shows the consistency between the disease outcome predicted by the diagnostic model based on the expression levels of MYL6B and THOC2 and the actual result. The AUC of the diagnostic model reached 0.8792, confirming the superior performance of the diagnostic model in identifying HCC (Fig. 8C). The HCC cohort from ICGC (consisting of 202 normal samples and 243 HCC samples) was adopted to validate the diagnostic performance. The sensitivity of the diagnostic results was 76.95%, the specificity was 88% (Fig. 8D), and the AUC was 0.9077 (Fig. 8F), indicating that the diagnostic model can accurately distinguish HCC patients from normal subjects. The consistency between the predicted results based on the expression of MYL6B and THOC2 and the actual results is presented in Fig. 8E.
The early diagnosis of HCC is mainly based on imaging and pathological examination, but small nodules < 2 cm often lead to missed diagnoses because of difficulties in accurate characterization[28]. We also constructed a diagnostic model based on MYL6B and THOC2 to distinguish HCC and dysplastic nodules. The diagnostic formula was determined as follows: logit (P-HCC) = -45.5786 + (2.6004 x MYL6B expression level) + (4.0578 x THOC2 expression level). We adopted patient information from GSE6764 as the training set and GSE89377 as the validation set to evaluate the predictive value of the diagnostic model. The diagnostic model in the training set reached 91.43% sensitivity and 76.47% specificity (Fig. 8G), and the AUC was 0.9328 (Fig. 8I). The sensitivity and specificity in the validation set were 82.50% and 50.00% (Fig. 8J), respectively, and the AUC was 0.7864 (Fig. 8L), indicating that the diagnostic model had great predictive performance. The gene expression profile of HCC patients and the consistency between the predicted disease state and the actual disease state are shown in Fig. 8H and 8K.
The expression characteristics of exosome-derived genes were validated to accurately predict the OS of patients with HCC
To evaluate whether MYL6B and THOC2 were suitable for constructing prognostic, recurrence and diagnostic models, we examined the expression profiles of these genes. MYL6B and THOC2 were obviously more highly expressed in the TCGA HCC samples than in the paired normal samples (Fig. 9A-B). The expression levels of MYL6B and THOC2 in HCC patients from GSE6764 were significantly higher than those in dysplastic nodule samples (Fig. 9C-D). The K-M curves indicated that patients with high gene expression had a worse prognosis than patients with low gene expression (Fig. 9E-F), and ROC curve analysis confirmed the specificity of MYL6B and THOC2 in predicting the prognosis of patients (Fig. 10G-H). We further explored the association between gene expression and patient prognosis and recurrence in the Kaplan-Meier Plotter database. The results confirmed that high gene expression was closely related to shorter overall survival time and progression-free survival time (Fig. 9I-L). Figure 9M-N shows the correlation between the expression profiles of MYL6B and THOC2 and immune cell infiltration. The above results determined the effective predictive value of the two exosome-derived genes MYL6B and THOC2.