Based on Exosome-Derived Genes for Constructing Diagnostic, Prognostic and Recurrence Models and Predicting Therapeutic Response for Hepatocellular Carcinoma


Background: In this study, we performed a comprehensive analysis to identify an optimal set of exosome-derived genes with strong prognostic value in hepatocellular carcinoma (HCC), used it to construct prognostic, recurrence and diagnostic models, and predicted the therapeutic response of HCC patients. Methods: Differential expression analysis and Weighted Gene Co-Expression Network Analysis (WGCNA) of RNA-sequencing data were used to identify differentially expressed exosome-derived genes related to HCC. Univariate and multivariate Cox regression analyses were performed to determine exosome-derived genes associated with the prognosis of HCC and to construct a prognostic model; recurrence and diagnostic models were built from the same genes. Results: MYL6B and THOC2 were identified as closely related to prognosis in HCC. The prognostic and recurrence models built from these two genes were confirmed to be independent predictive factors with superior predictive performance. Patients with high prognostic risk had a poorer prognosis than patients with low prognostic risk in all HCC cohorts, namely the TCGA cohort (HR=2.5, P<0.001), the ICGC cohort (HR=3.15, P<0.001) and the GSE14520 cohort (HR=1.85, P=0.004). Patients with high recurrence risk had a higher probability of recurrence than patients with low recurrence risk in the TCGA cohort (HR=2.44, P<0.001) and the GSE14520 cohort (HR=1.54, P=0.025). High prognostic-risk patients had higher expression of immune checkpoints such as PD1, B7H3, B7H5, CTLA4 and TIM3 (P<0.05) and showed greater resistance to some chemotherapeutic and targeted drugs, such as methotrexate, erlotinib and vinorelbine (P<0.05).
The diagnostic models based on the same two genes accurately distinguished HCC from normal subjects and from dysplastic nodules. Conclusions: Prognostic, recurrence and diagnostic models based on the two exosome-derived genes (MYL6B and THOC2) showed superior predictive performance. Our findings lay the foundation for identifying molecular markers to increase the early detection rate of HCC, improve disease outcomes, and determine more effective individualized treatment options for patients.


Introduction
As the most common primary malignant tumor of the liver, hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death worldwide, with high morbidity and mortality (approximately 841,000 new cases and 782,000 deaths annually) [1]. HCC is highly malignant, has an insidious onset and progresses rapidly. Patients are often at an advanced stage (BCLC stage B or C) at the time of diagnosis and therefore cannot undergo surgery or transplantation [2; 3]. HCC is characterized by molecular heterogeneity, and its molecular pathogenesis is highly complex [4]. Although many new treatments for HCC have been proposed in recent years, little progress has been made in significantly improving disease outcomes [5; 6]. Therefore, there is an urgent need to better understand the pathogenesis and molecular mechanisms of HCC and to identify novel key marker molecules that can help determine more effective treatment strategies for patients.
Exosomes are vesicles (30-100 nm) derived from endosomes that are released by cells during cellular housekeeping and communication [7]. Exosomes are important mediators of cell interactions and transmit information between cells through their membrane-enclosed structure and cytoplasmic content [8]. Several studies have shown that the release rate and content of exosomes derived from cancer cells differ significantly from those of exosomes originating from normal cells, and that multiple types of cancer cell lines produce more exosomes than normal cells [9]. During the progression of HCC, cancer cell-derived exosomes have been shown to promote tumorigenesis, enhance angiogenesis and epithelial-mesenchymal transition, induce drug resistance and interfere with antitumor immune mechanisms, thereby promoting tumor progression and metastasis [10; 11; 12].
Exosomes play an important role in HCC carcinogenesis and contribute to poor disease outcomes, but most current studies have focused on the role of a single exosomal factor in HCC progression [13; 14]. To our knowledge, no study has performed a combined analysis of exosome-derived molecules and assessed their association with the development and prognosis of HCC.
HCC is an immunogenic tumor that develops in a persistently immunomodulatory environment caused by chronic viral inflammation, such as hepatitis B virus (HBV) and hepatitis C virus (HCV) infection [15], and once HCC is recognized by the immune system, tumor cells are vulnerable to attack by immune cells [16]. Immunotherapy has received widespread attention in the antitumor field as a novel treatment option for HCC. However, tumor cells can enhance immunosuppression through changes in immune checkpoint expression and immune cell aggregation, which can lead to immune escape and promote HCC progression [17; 18; 19]. The immune activity of exosomes can affect the immunoregulatory mechanisms of tumor cells, including antigen presentation, immune activation, immune suppression, immune surveillance and immune escape [20; 21]. Further exploration of exosomal marker molecules and their relationship with the immune microenvironment of HCC may provide important insights for immune recognition and therapeutic intervention.
The emergence of high-throughput sequencing technology makes it feasible to identify marker molecules that strongly influence the immune microenvironment of HCC and disease prognosis [22]. In this study, we comprehensively analyzed HCC transcriptomic data and identified exosome-derived genes closely related to the prognosis of HCC to construct diagnostic, prognostic and recurrence models. We evaluated the clinical relevance of the models and the immune infiltration characteristics of patients, and used the prognostic model to predict the response of HCC patients to immune checkpoint inhibitor therapy, with the aim of helping determine effective individualized treatment options and improving outcomes for HCC patients.

Materials and Methods
Identification of differentially expressed genes (DEGs) derived from exosomes between HCC and adjacent nontumor tissues
The mRNA sequencing data of HCC patients with available clinical information were obtained from the TCGA database (370 HCC tissue samples and 50 normal tissue samples) and the ICGC database (243 HCC tissue samples and 202 normal tissue samples). DEGs were identified with the limma R package using thresholds of absolute log2 fold change (FC) > 1 and adjusted P value < 0.05. WGCNA was used to identify the HCC exosome-specific module in the exoRbase database. The genes in the HCC exosome-specific module, the DEGs in the TCGA database and the hub genes were combined to determine key genes derived from HCC-related exosomes.
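The DEG thresholds above can be illustrated with a minimal Python sketch. It assumes that per-gene log2 fold changes and raw P values have already been produced (in the study, by limma, whose moderated statistics are not reproduced here) and that P values are adjusted with the Benjamini-Hochberg procedure, which is limma's default adjustment in topTable. The function names are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: walk from the largest P value down, keeping running minima
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def select_degs(genes, log2fc, pvals, fc_cut=1.0, alpha=0.05):
    """Keep genes with |log2FC| > fc_cut and BH-adjusted P < alpha."""
    adj = benjamini_hochberg(pvals)
    keep = (np.abs(np.asarray(log2fc, dtype=float)) > fc_cut) & (adj < alpha)
    return [g for g, k in zip(genes, keep) if k]
```

For example, select_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.001, 0.01, 0.0001, 0.2]) keeps A and B: C fails the fold-change threshold and D fails the adjusted-P threshold.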
Construction and validation of a predictive prognostic model based on the identified key exosome-derived genes
Univariate Cox regression analysis was performed to identify exosome-derived genes related to the prognosis of patients with HCC; genes with P < 0.05 were considered statistically significant. These genes, together with genes that were not significant in the univariate analysis but have been clinically reported to be prognostic in HCC patients, were entered into multivariate Cox regression analysis. The optimal gene set was obtained by forward selection and backward elimination and used to construct a prognostic model. The prognostic index (PI) was calculated as PI = (β1 × expression level of MYL6B) + (β2 × expression level of THOC2), where β1 and β2 are the multivariate Cox regression coefficients of MYL6B and THOC2. HCC patients with known survival status were divided into high- and low-risk groups based on the optimal cutoff value determined by X-tile software. Kaplan-Meier (K-M) curves and time-dependent receiver operating characteristic (ROC) curves were used to evaluate the predictive performance of the prognostic model.
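X-tile's internal algorithm is not described in this section, so the following Python sketch only illustrates one common way to choose an "optimal" cutoff: scan candidate thresholds of the risk score and keep the one that maximizes the two-group log-rank chi-square statistic. The function names and the min_group guard (which excludes very unbalanced splits) are assumptions for illustration. Data-driven cutoff searches of this kind tend to overstate significance, so the resulting groups are best checked in independent cohorts.

```python
import numpy as np

def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0

def best_cutoff(score, time, event, min_group=0.1):
    """Scan cutoffs (high risk = score > cutoff); return (cutoff, chi2)."""
    score = np.asarray(score, dtype=float)
    best_c, best_chi2 = None, -1.0
    for c in np.unique(score)[:-1]:           # the largest value would leave no high group
        high = score > c
        frac = high.mean()
        if frac < min_group or frac > 1 - min_group:
            continue                          # skip very unbalanced splits
        chi2 = logrank_chi2(time, event, high)
        if chi2 > best_chi2:
            best_c, best_chi2 = float(c), chi2
    return best_c, best_chi2
```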

Establishment and evaluation of a predictive nomogram
Univariate and multivariate Cox regression analyses were performed to assess whether the prognostic model could predict the prognosis of HCC patients independently of traditional clinical variables (age, AFP, weight, vascular tumor invasion, sex, pathological grade and TNM stage). The hazard ratio (HR) and 95% confidence interval (CI) were calculated for each variable, and variables with P < 0.05 were considered statistically significant independent prognostic factors. The rms R package was used to integrate these independent prognostic factors into a predictive nomogram and to draw the corresponding calibration plots. The nomogram was validated in terms of calibration and discrimination. The closer the calibration curve was to the 45° reference line, which represents perfect prediction, the better the predictive performance of the nomogram. The concordance index (C-index), calculated using a bootstrap method with 1000 resamples, was also used to assess the agreement between predicted and observed outcomes. In addition, ROC curve analysis was performed to compare the predictive value of the nomogram with that of each single independent prognostic factor, and decision curve analysis (DCA) was adopted to measure the difference in clinical benefit between using the nomogram and using a single prognostic factor for prognosis prediction [23].
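Harrell's concordance index, used above to compare the nomogram with single factors, can be sketched in plain Python: among usable pairs in which one patient has an event before the other patient's follow-up time, it counts how often the patient with the earlier event has the higher predicted risk (ties in risk count as one half). The percentile bootstrap below is a simplified, assumed stand-in for the resampling described above; rms's validate() function, for example, estimates optimism-corrected indices instead. All function names are hypothetical.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # only an observed event can anchor a pair
        for j in range(n):
            if time[j] > time[i]:         # j was still event-free when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def bootstrap_c(time, event, risk, n_boot=1000, seed=0):
    """Percentile bootstrap of Harrell's C (simplified, not optimism-corrected)."""
    time, event, risk = map(np.asarray, (time, event, risk))
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(time), len(time))   # resample patients with replacement
        c = harrell_c(time[idx], event[idx], risk[idx])
        if not np.isnan(c):
            stats.append(c)
    return float(np.mean(stats)), np.percentile(stats, [2.5, 97.5])
```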
Internal and external validation of the expression profiles of exosome-derived genes
Using Wilcoxon signed-rank tests in Prism 7.0 (GraphPad, San Diego, CA, USA), we compared the expression of each gene between HCC tissues and normal tissues in the TCGA HCC cohort and confirmed the differences in gene expression between HCC tissues and dysplastic nodule samples in the GSE6764 cohort. P < 0.05 was considered statistically significant. K-M curves were used to evaluate the influence of the expression of each gene on prognosis, and ROC curve analysis was performed to validate the predictive performance of each gene. For external validation, we further confirmed the effect of single-gene expression on prognosis and recurrence in the Kaplan-Meier Plotter database [24]. The effect of single-gene expression on immune cell infiltration in tumors was also evaluated.
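The K-M curves used here rest on the product-limit estimator. A minimal Python sketch, with a hypothetical function name, shows how the survival probability is updated at each observed event time; patients censored at an event time are conventionally kept in the risk set for that time.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of S(t) at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv, times, probs = 1.0, [], []
    for t in np.unique(time[event]):          # sorted distinct event times
        at_risk = np.sum(time >= t)           # still under observation just before t
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        times.append(float(t))
        probs.append(float(surv))
    return times, probs
```

For example, follow-up times [1, 2, 2, 3, 4] with event indicators [1, 1, 0, 1, 0] give survival estimates of 0.8, 0.6 and 0.3 at times 1, 2 and 3.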
Estimating the fractions of different types of immune cells
CIBERSORT analysis was used to quantify the absolute abundance of 22 immune cell types in heterogeneous tissue samples and thereby evaluate immune cell infiltration [25]. The R package "CIBERSORT" was used to convert bulk mRNA expression data into estimated infiltration levels of the non-tumor (immune) cells in the tumor microenvironment.
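The idea behind this kind of deconvolution is that a bulk expression profile is modeled as a non-negative mixture of reference cell-type expression profiles. CIBERSORT itself fits this with ν-support vector regression against its LM22 signature matrix; the Python sketch below deliberately swaps in a simpler technique, non-negative least squares solved by projected gradient descent, and uses a small hypothetical signature matrix. It illustrates the concept only and does not reproduce CIBERSORT's estimates.

```python
import numpy as np

def deconvolve_nnls(signature, mixture, n_iter=5000):
    """Estimate non-negative cell-type fractions w with signature @ w ~= mixture.

    signature: genes x cell-types matrix of reference expression profiles
    mixture:   bulk expression vector for one sample (same gene order)
    """
    S = np.asarray(signature, dtype=float)
    m = np.asarray(mixture, dtype=float)
    # Step 1/L, where L (largest eigenvalue of S^T S) is the gradient's Lipschitz constant
    step = 1.0 / np.linalg.norm(S.T @ S, 2)
    w = np.full(S.shape[1], 1.0 / S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - m)             # gradient of 0.5 * ||S w - m||^2
        w = np.maximum(w - step * grad, 0.0) # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w     # report relative fractions
```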

Results
Identification of exosome-derived genes strongly associated with prognosis in HCC
We screened 5775 DEGs in the exoRbase database. Using WGCNA, we divided these DEGs into trait modules containing genes with similar traits (Figure S1) and thereby determined the HCC exosome-specific modules (Fig. 1A-B). Through gene cluster analysis in Cytoscape, we identified 168 hub genes. Finally, integrated analysis yielded 19 HCC-related exosome-derived hub genes, which were included in the subsequent analysis. The correlations between these genes are shown in Fig. 1D. GO analysis indicated that the genes were mainly enriched in "cytosolic ribosome", "polysomal ribosome" and "ribosomal subunit" (Fig. 1E). Univariate Cox regression showed that seven genes were closely related to the OS of HCC patients (P < 0.05). Subsequent multivariate Cox regression analysis determined the optimal gene set, consisting of two genes (MYL6B and THOC2), for predicting the prognosis of HCC.
A prognostic model integrating the two exosome-derived genes was constructed and validated to have great predictive performance
Based on the two exosome-derived genes, we constructed a prognostic model. The prognostic index was calculated as PI = (0.0273 × expression level of MYL6B) + (0.1931 × expression level of THOC2). The optimal cutoff value determined by X-tile software was used as the threshold to divide patients into a high-risk group and a low-risk group; it was 1.331 in the TCGA HCC cohort, 3.104 in the ICGC cohort and 1.863 in GSE14520. The TCGA HCC cohort was used as the training cohort, and the ICGC and GSE14520 HCC cohorts were used as validation cohorts to evaluate the predictive performance of the prognostic model. Patients in the high-risk group had a worse prognosis than patients in the low-risk group in both the training cohort (P < 0.001) (Fig. 2A) and the validation cohorts (P < 0.01) (Fig. 2D, 2G). Subsequently, the correlation between the prognostic risk scores and clinical variables was explored. Figure S2A shows the distribution of prognostic risk scores of patients stratified by traditional clinical variables, together with the expression profiles of the two exosome-derived genes. Patients with pathological grade G3-G4 had a higher prognostic risk index than patients with G1-G2 (P < 0.05) (Figure S2B). The prognosis of patients with vascular tumor invasion was also significantly worse than that of patients without vascular tumor invasion (P < 0.05) (Figure S2C), consistent with clinically accepted diagnostic criteria. However, there was no significant difference in prognostic risk among patients with different TNM stages (Figure S2D).
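The prognostic index and risk grouping described above can be written out directly. This Python sketch uses the reported coefficients and cohort-specific cutoffs; it assumes the expression values are on the same normalized scale used to fit the model (not specified here) and, as an arbitrary choice, assigns a PI exactly equal to the cutoff to the low-risk group. The function and dictionary names are hypothetical.

```python
# Coefficients and X-tile cutoffs as reported in the text above.
PI_COEF = {"MYL6B": 0.0273, "THOC2": 0.1931}
PI_CUTOFF = {"TCGA": 1.331, "ICGC": 3.104, "GSE14520": 1.863}

def prognostic_index(myl6b, thoc2):
    """PI = 0.0273 * MYL6B + 0.1931 * THOC2."""
    return PI_COEF["MYL6B"] * myl6b + PI_COEF["THOC2"] * thoc2

def risk_group(myl6b, thoc2, cohort="TCGA"):
    """Label a patient 'high' or 'low' risk using the cohort-specific cutoff."""
    # Assumption: a PI equal to the cutoff is labeled low risk.
    return "high" if prognostic_index(myl6b, thoc2) > PI_CUTOFF[cohort] else "low"
```

With hypothetical expression values MYL6B = 10 and THOC2 = 6, the PI is about 1.43, which is high risk under the TCGA cutoff but low risk under the ICGC cutoff; this is why the cutoff has to be chosen per cohort.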

Chemotherapy response of patients with different prognostic risk scores
Chemotherapy is a common treatment option for HCC. It is suitable for patients who cannot undergo surgery, as it can reduce tumor volume and prolong survival, and it is also commonly used as postoperative adjuvant therapy to suppress tumor recurrence and progression [26]. Chemoresistance of tumor cells is a key issue affecting the efficacy of anticancer drugs. We used the prognostic model to evaluate the response of HCC patients to 266 traditional chemotherapeutic and molecular targeted drugs from the Genomics of Drug Sensitivity in Cancer (GDSC) database, using the half-maximal inhibitory concentration (IC50) as the measure of drug sensitivity. Figure 3A-L shows the differences in sensitivity between high-risk and low-risk patients to drugs such as methotrexate, erlotinib and vinorelbine: high-risk patients were more resistant than low-risk patients (P < 0.05). Using GSEA, we further explored the signaling pathways associated with the exosome-derived prognostic model. Figure 3M presents the top five positively regulated pathways, and Fig. 3N shows the top five negatively regulated pathways.
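The section does not state which test compared IC50 values between the risk groups. Assuming a Wilcoxon rank-sum (Mann-Whitney U) test, a common choice for such comparisons, and assuming per-patient estimated IC50 values are already available, the comparison can be sketched in plain Python. The sketch uses average ranks for ties and a two-sided normal approximation without tie correction of the variance; the function names are hypothetical.

```python
import math

def rank_average(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                            # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(x, y):
    """U statistic for x, z score and two-sided P (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = rank_average(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return u1, z, p
```

Here x would hold the IC50 values of high-risk patients and y those of low-risk patients; a positive z indicates higher IC50 values, that is, greater resistance, in the high-risk group.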
Characteristics of the immune microenvironment between high- and low-risk HCC patient groups
Interfering with immune checkpoints or related pathways has become an important form of immunotherapy and a current focus of antitumor therapy [27]. In this study, the relative infiltration proportions of 22 immune cell types and the corresponding prognostic risk scores in the TCGA HCC cohort are shown in Fig. 4A. Compared with the low-risk group, the high-risk group had higher proportions of memory B cells, M0 macrophages and follicular helper T cells (Fig. 4B-D) and a lower proportion of resting NK cells (Fig. 4E) (P < 0.05). Figure 4F shows the correlation between the prognostic model and the immune checkpoints. The expression levels of the immune checkpoints PD1, B7H3, B7H5, CTLA4 and TIM3 were significantly higher in the high-risk group than in the low-risk group (P < 0.05) (Fig. 4G-K).
Establishment and evaluation of a corresponding nomogram in the TCGA HCC cohort
TCGA HCC patients with available clinical information were included in univariate and multivariate Cox regression analyses to assess the independent predictive performance of the prognostic model relative to clinical variables. Age (HR = 1.687, P < 0.05), TNM stage (HR = 1.947, P < 0.05) and the prognostic model (HR = 1.792, P < 0.05) independently predicted the prognosis of HCC patients (Fig. 5A). A nomogram integrating these independent prognostic factors was then constructed to quantitatively predict the prognostic risk of HCC patients (Fig. 5B). The calibration curves for 1, 3 and 5 years are shown in Fig. 5C-E. The C-index of the nomogram was 0.63, indicating better agreement with the observed outcomes than that of the single independent prognostic factors age (0.57), TNM stage (0.53) and the prognostic model (0.59). The AUCs of the ROC curves at 1, 2, 3 and 5 years further confirmed that the predictive value of the nomogram was superior to that of any single independent prognostic factor (Fig. 5F-H). DCA suggested that, compared with any single independent prognostic factor, the nomogram provided the greatest clinical benefit for predicting the prognosis of HCC patients (Fig. 5I-K). These results indicate that the nomogram is suitable for clinically predicting the prognosis of HCC patients.
Construction and validation of a recurrence model based on the two exosome-derived genes
A recurrence model integrating the same two exosome-derived genes was constructed. The relapse index was defined as RI = (0.0236 × MYL6B expression level) + (0.0899 × THOC2 expression level). TCGA HCC patients with recurrence status were divided into a high-risk group and a low-risk group using the optimal cutoff value of 1.272 determined with X-tile software. The recurrence probability of patients in the high-risk group was significantly higher than that of patients in the low-risk group (HR = 2.44, P < 0.001) (Fig. 6A). Figure 6B shows the expression levels of the two exosome-derived genes and the predicted recurrence risk of each patient. ROC curves indicated good specificity and sensitivity of the recurrence model (Fig. 6C). The HCC cohort with recurrence status in GSE14520 was used as a validation cohort, with an optimal cutoff value of 1.403. The probability of recurrence was significantly higher in the high-risk group than in the low-risk group (HR = 1.54, P < 0.05) (Fig. 6D). The expression profiles of the two exosome-derived genes in HCC patients and the predicted recurrence risk are shown in Fig. 6E. The AUCs of the ROC curves at 0.5, 1, 3 and 5 years were 0.68, 0.64, 0.58 and 0.57, respectively (Fig. 6F).
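The relapse index and the time-dependent AUCs reported above can be illustrated together. The Python sketch below encodes the RI with the reported coefficients and cutoffs, and computes a naive cumulative/dynamic AUC at a horizon t: cases are patients with recurrence by t, controls are patients still recurrence-free after t, and patients censored before t are simply dropped. That last simplification is an assumption; packages such as the R package timeROC instead correct for censoring with inverse-probability-of-censoring weights, so the values here will generally differ. Function names are hypothetical.

```python
# Coefficients and X-tile cutoffs as reported in the text above.
RI_CUTOFF = {"TCGA": 1.272, "GSE14520": 1.403}

def relapse_index(myl6b, thoc2):
    """RI = 0.0236 * MYL6B + 0.0899 * THOC2."""
    return 0.0236 * myl6b + 0.0899 * thoc2

def naive_time_auc(time, event, score, t):
    """Cumulative/dynamic AUC at horizon t, ignoring patients censored before t."""
    cases = [s for ti, e, s in zip(time, event, score) if ti <= t and e]
    controls = [s for ti, s in zip(time, score) if ti > t]
    if not cases or not controls:
        return float("nan")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else (0.5 if c == k else 0.0)
    return wins / (len(cases) * len(controls))
```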
The clinical variables and the expression of the two exosome-derived genes of TCGA HCC patients, together with the corresponding recurrence risk scores predicted by the recurrence model, are presented in Figure S3A. HCC patients with pathological grade G3-G4 were more likely to experience recurrence than those with G1-G2 (P < 0.05) (Figure S3B). The probability of recurrence was also significantly higher in HCC patients with vascular tumor invasion than in those without (P < 0.05) (Figure S3C), whereas there was no significant difference in recurrence probability between patients with TNM stage III-IV and stage I-II (Figure S3D).

A nomogram integrating independent recurrence factors was constructed and validated
Independence analysis by Cox regression showed that age (HR = 2.253, P < 0.05), TNM stage (HR = 2.433, P < 0.05) and the recurrence model (HR = 4.907, P < 0.05) were independent predictors of recurrence in HCC patients (Fig. 7A). Based on these independent predictors, a corresponding nomogram was generated (Fig. 7B). The calibration curves at 1, 3 and 5 years indicated that the predicted results of the nomogram agreed with the observed results (Fig. 7C-E). The C-index ranking, nomogram (0.70) > age (0.61) > recurrence model (0.60) > TNM stage (0.56), indicated that the nomogram predicted the recurrence probability of HCC patients better than any single independent predictor. The ROC curves were consistent with this conclusion (Fig. 7F-H). DCA confirmed that the nomogram provided greater clinical benefit for recurrence prediction than any single independent predictor (Fig. 7I-K).
Diagnostic models were built and demonstrated to have superior predictive power
A diagnostic model based on the two exosome-derived genes was constructed to distinguish HCC patients from normal subjects for early diagnosis. The diagnostic equation determined by stepwise logistic regression was defined as logit(P(HCC)) = -23.0370 + (2.2566 × expression level of MYL6B) + (3.8247 × expression level of THOC2). Applied to the paired TCGA samples (50 normal samples and 50 HCC samples), the diagnostic model reached a sensitivity of 76.00% and a specificity of 82.00% (Fig. 8A). Figure 8B shows the agreement between the disease status predicted by the diagnostic model from the expression levels of MYL6B and THOC2 and the actual status. The AUC of the diagnostic model was 0.8792, confirming its strong performance in identifying HCC (Fig. 8C). The ICGC HCC cohort (202 normal samples and 243 HCC samples) was used to validate the diagnostic performance. The sensitivity was 76.95%, the specificity was 88% (Fig. 8D), and the AUC was 0.9077 (Fig. 8F), indicating that the diagnostic model can accurately distinguish HCC patients from normal subjects. The agreement between the predicted results based on the expression of MYL6B and THOC2 and the actual results is presented in Fig. 8E.
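The logistic diagnostic equation above can be evaluated directly, and sensitivity and specificity follow from a confusion matrix. In this Python sketch, the 0.5 probability threshold used to call a sample HCC is an assumption (the threshold is not stated here), expression values are assumed to be on the same scale used to fit the model, and the function names are hypothetical.

```python
import math

def diag_probability(myl6b, thoc2, intercept=-23.0370, b_myl6b=2.2566, b_thoc2=3.8247):
    """P(HCC) from logit(P) = intercept + b_myl6b * MYL6B + b_thoc2 * THOC2."""
    z = intercept + b_myl6b * myl6b + b_thoc2 * thoc2
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(myl6b, thoc2, threshold=0.5):
    """Call a sample HCC if P(HCC) >= threshold (threshold is an assumption)."""
    return diag_probability(myl6b, thoc2) >= threshold

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)
```

As a quick arithmetic check, the reported 76.00% sensitivity and 82.00% specificity in the 50 paired TCGA HCC and normal samples correspond to 38 of 50 tumors and 41 of 50 normal samples being classified correctly.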
The early diagnosis of HCC relies mainly on imaging and pathological examination, but small nodules (< 2 cm) are often missed because they are difficult to characterize accurately [28]. We therefore also constructed a diagnostic model based on MYL6B and THOC2 to distinguish HCC from dysplastic nodules. The diagnostic formula was logit(P(HCC)) = -45.5786 + (2.6004 × MYL6B expression level) + (4.0578 × THOC2 expression level). Patient data from GSE6764 were used as the training set and GSE89377 as the validation set to evaluate the predictive value of the diagnostic model. In the training set, the diagnostic model reached 91.43% sensitivity and 76.47% specificity (Fig. 8G), with an AUC of 0.9328 (Fig. 8I). In the validation set, the sensitivity and specificity were 82.50% and 50.00%, respectively (Fig. 8J), and the AUC was 0.7864 (Fig. 8L), indicating that the model retained reasonable discriminative ability, although its specificity in the validation set was limited. The gene expression profiles of HCC patients and the agreement between the predicted and actual disease states are shown in Fig. 8H and 8K.
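The AUC values reported for both diagnostic models can be read as the probability that a randomly chosen HCC sample scores higher than a randomly chosen control (normal tissue or dysplastic nodule). The Python sketch below computes this rank-based AUC, counting ties as one half, and encodes the linear predictor of the HCC-versus-nodule model reported above. Because the logistic transform is monotonic, ranking samples by the logit gives the same AUC as ranking by predicted probability. The function names are hypothetical.

```python
def nodule_logit(myl6b, thoc2):
    """Linear predictor of the HCC-versus-dysplastic-nodule model reported above."""
    return -45.5786 + 2.6004 * myl6b + 4.0578 * thoc2

def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(scores_pos) * len(scores_neg))
```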
The expression characteristics of exosome-derived genes were validated to accurately predict the OS of patients with HCC
To evaluate whether MYL6B and THOC2 were suitable for constructing prognostic, recurrence and diagnostic models, we examined their expression profiles. MYL6B and THOC2 were markedly more highly expressed in TCGA HCC samples than in paired normal samples (Fig. 9A-B), and their expression levels in HCC samples from GSE6764 were significantly higher than those in dysplastic nodule samples (Fig. 9C-D). K-M curves indicated that patients with high gene expression had a worse prognosis than patients with low gene expression (Fig. 9E-F), and ROC curve analysis confirmed the prognostic value of MYL6B and THOC2 (Fig. 9G-H). We further explored the association between gene expression and patient prognosis and recurrence in the Kaplan-Meier Plotter database, which confirmed that high expression of each gene was closely related to shorter overall survival and progression-free survival (Fig. 9I-L). Figure 9M-N shows the correlation between the expression profiles of MYL6B and THOC2 and immune cell infiltration.
These results confirm the predictive value of the two exosome-derived genes MYL6B and THOC2.

Discussion
As one of the most common malignant tumors, HCC imposes a heavy health and economic burden on society [29]. HCC involves a series of complex genetic and epigenetic alterations [30]. Although a variety of staging systems have been proposed and used in clinical decision-making to predict patient prognosis, these systems are mainly based on clinicopathological characteristics and do not consider the key role of complex molecular mechanisms in the carcinogenesis and progression of HCC [31; 32], and patient outcomes have not improved significantly [33]. Identifying important clinical predictive marker molecules is essential to improve the early detection of HCC, determine effective treatment strategies, and improve prognosis and reduce recurrence.
Exosomes are one of the main types of extracellular vesicles (EVs); they mediate the transfer of proteins, DNA and various forms of RNA and participate in cell-to-cell communication [34]. Dysregulation of exosomes, together with multiple complex molecular mechanisms, is one of the features that promote the development of HCC [35]. Tumor cells release more exosomes, which mediate communication between HCC cells and non-HCC cells. Increased levels of oncogenic molecules such as TUC339 [36] and miR-1247-3p [37] can promote tumor cell proliferation and metastasis, while tumor suppressors such as miR-122 [38] and miR-9-3p [39] are downregulated, losing their inhibitory effects on tumors. Identifying key exosome-derived marker molecules and analyzing them in combination may help accurately assess the progression of HCC and predict its prognosis and recurrence.
High-throughput sequencing technology makes it possible to search for key exosomal marker molecules that are closely related to HCC development and prognosis. In our study, integrated analysis followed by univariate and multivariate Cox regression showed that MYL6B and THOC2 have good prognostic value for HCC. MYL6B is an essential light chain of non-muscle myosin II (NMII) and is involved in cell adhesion, migration, material transport and the control of endocytosis [40; 41]. MYL6B has been shown to bind the p53 protein, accelerating p53 degradation and promoting the development of HCC [42]. THOC2 is involved in mRNA export from the nucleus to the cytoplasm, chromosomal arrangement, mitosis and genome stability [43], and its expression has been found to promote the proliferation and invasion of melanoma [44]. The prognostic and recurrence models based on the two genes were confirmed to independently and accurately predict the prognosis and recurrence of HCC patients, and the diagnostic models constructed from the same two genes distinguished HCC from normal subjects and dysplastic nodules with high specificity and sensitivity. Our findings lay a foundation for adopting MYL6B and THOC2 as biomarkers for the early diagnosis and prognostic evaluation of HCC.
In addition, exosomes are involved in regulating the tumor immune response [45]. Tumor cell-derived exosomes can promote the production of prostaglandin E2, IL-6 and TGF-β by myeloid-derived suppressor cells (MDSCs), thus creating a strongly immunosuppressive environment in tumor lesions [46]. Exosomes are also considered important mediators of antitumor immune responses and of the evasion of immune surveillance by tumor cells. Some studies have shown that HCC-derived exosomes can reduce the cytotoxicity of T cells and NK cells and promote the aggregation of immunosuppressive M2 macrophages and N2 neutrophils [47; 48]. Characterizing the expression of exosomal marker molecules may therefore help assess the response to immunotherapy and identify more effective treatments for patients. In this study, the prognostic and recurrence models were used to evaluate immune infiltration in HCC patients with different risk scores. High prognostic-risk patients showed higher infiltration levels of memory B cells, M0 macrophages and T follicular helper cells and lower infiltration levels of resting NK cells, and high recurrence-risk patients likewise showed higher infiltration fractions of memory B cells, M0 macrophages and T follicular helper cells and lower infiltration fractions of monocytes. In addition, patients with high prognostic risk expressed higher levels of immune checkpoints such as PD1, B7H3 and CTLA4 in tumor tissues than patients with low prognostic risk, suggesting that high-risk patients may be suitable candidates for immunotherapy and may obtain greater clinical benefit from it. Because chemotherapy is one of the main treatments for patients with advanced HCC and drug resistance is an important factor limiting its efficacy [49], we also used the prognostic model to evaluate patient sensitivity to common chemotherapeutic and molecular targeted drugs. High-risk patients showed greater resistance than low-risk patients to drugs such as methotrexate, erlotinib and vinorelbine.
Our study has several limitations. First, the mRNA sequencing data and corresponding clinical information were mainly obtained from the TCGA and ICGC databases. When evaluating the clinical relevance and independent predictive performance of the prognostic and recurrence models, we mainly included factors traditionally recognized as important in HCC development, such as TNM stage, pathological grade and vascular invasion. Other variables with potentially similar contributions, such as internal and external chemical exposures, genetic factors and geographical environment, were not included because of insufficient sample size. Second, the small sample size of the validation set for the model distinguishing HCC from dysplastic nodules may have affected its performance evaluation. In addition, the clinical feasibility of using the prognostic model to predict immunotherapy response needs to be tested prospectively in clinical trials. We will also further explore the role of MYL6B and THOC2 in HCC progression in the future.

Conclusion
Prognostic and recurrence models based on two exosome-derived genes (MYL6B and THOC2) showed predictive performance with high specificity and sensitivity, generated risk scores closely related to traditional clinical variables, and helped identify candidates suitable for immunotherapy. Patients with different risk scores differed significantly in immune infiltration, and high prognostic-risk patients may obtain greater clinical benefit from immune checkpoint inhibitor therapy. The diagnostic models constructed with the same two genes accurately distinguished HCC patients from normal subjects and dysplastic nodules. Integrated analysis of exosome-derived genes and the genomic data of HCC patients may help improve the early detection of HCC, improve disease outcomes and provide novel insights for developing individualized treatment strategies.

Figure 5 Construction and validation of a nomogram. A The independence of the prognostic model and clinical variables for prognosis prediction. B A nomogram was constructed to predict 1-, 3- and 5-year OS in HCC patients. C-E Calibration curves of the nomogram. F-I ROC curves validating the prediction specificity of the nomogram. J-M DCA indicated that the nomogram achieved the greatest clinical benefit.

Figure 6 K-M curve analysis, risk score distribution and time-dependent ROC curve analysis of the recurrence model in the HCC cohorts from TCGA (A-C) and GSE14520 (D-F). A and D The high-risk group was more likely to experience recurrence than the low-risk group (P<0.05). B and E Expression of MYL6B and THOC2 and the corresponding recurrence risk scores of patients. C and F ROC curves validating the predictive value of the recurrence model.

Figure 7 Establishment and evaluation of a nomogram. A Independence analysis of the recurrence model compared with clinical characteristics. B A nomogram was constructed to quantitatively assess the probability of recurrence. C-E Calibration curves of the nomogram. F-H ROC curves showing that the predictive performance of the nomogram was better than that of any single independent predictor. I-K DCA showing that the nomogram achieved the greatest clinical benefit.

Figure 9 The expression characteristics of MYL6B and THOC2 and their correlation with patient prognosis, recurrence and immune infiltration. A-B MYL6B (A) and THOC2 (B) expression was higher in HCC tissues than in paired normal tissues in TCGA. C-D Higher expression of MYL6B (C) and THOC2 (D) was found in HCC tissues than in dysplastic nodules in GSE6764. E-F Survival analysis of MYL6B (E) and THOC2 (F). G-H ROC curves validating the prognostic value of MYL6B (G) and THOC2 (H). I-L Shorter overall survival (I and J) and progression-free survival (K and L) were found in patients with high expression of MYL6B and THOC2 than in patients with low expression in the Kaplan-Meier Plotter database. M-N Correlation between the expression of MYL6B (M) and THOC2 (N) and immune infiltration.

Supplementary Files
This is a list of supplementary files associated with this preprint.