Development and validation of a survival model based on autophagy-associated genes for predicting prognosis of hepatocellular carcinoma

Background: Hepatocellular carcinoma (HCC) is one of the devastating tumors with increasing incidence. Autophagy-associated genes (ARGs) are widely participated in the cellular processes of HCC. This study proposed to identify the novel prognostic gene signature based on ARGs in HCC. Methods: We downloaded the RNA sequencing data and clinical information of HCC and normal tissues from The Cancer Genome Atlas (TCGA) database. The differentially expressed ARGs were screened by the Wilcoxon signed-rank test. Functional enrichment analyses were conducted to explore the biological implications and mechanisms of ARGs in HCC. Cox regression analysis and Lasso regression analysis were performed to screen the ARGs which related to overall survival (OS). The OS-related ARGs were then used to establish a prognostic prediction model. Kaplan-Meier curves and receiver operating characteristic (ROC) curves were both applied to evaluate the accuracy of the model. GSE14520 dataset was downloaded as the testing cohort to validate the prognostic risk model in TCGA. A nomogram based on the clinical features and risk signature was established to predict the 3-year and 5-year survival rate of HCC patients. Results: Totally 27 differentially expressed ARGs were screened in this study. Then, 3 OS-related ARGs (SQSTM1, HSPB8, and BIRC5) were identied via the Cox regression and Lasso regression analyses. Based on these 3 ARGs, a prognostic prediction model was constructed. HCC patients in high-risk group presented poorer prognosis than those with low risk score in TCGA cohort (3-year OS, 53.7% vs 70.2%; 5-year OS, 42.0 % vs 55.2%; P=4.478e-04) and in the testing group (3-year OS, 57.7% vs 73.5%; 5-year OS, 43.2% vs 63.0%; P=1.274e-03). The risk score curve showed a well feasibility in predicting the patients’ survival both in TCGA and GEO cohort with the area under the ROC curve (AUC) of 0.756 and 0.672, respectively. Besides, the calibration OS-related ARGs (SQSTM1, HSPB8, and BIRC5) were screened. Based on these genes, we established a prognosis prediction model, which presented a good ecacy in guiding the personalized medicine for HCC patients. The above results suggested that ARGs signature can act as the effective and promising prognostic indicator for HCC patients. Moreover, further studies for these ARGs may also improve the cancer management of HCC.


Background
Hepatocellular carcinoma (HCC), which is the most frequent liver malignancy, ranks as the third leading cause for cancer deaths in the world [1]. According to the studies, hepatitis B virus, hepatitis C virus, and alcoholism are the most common risk factors for HCC [2,3]. Despite the advances made in the diagnosis and treatments for HCC patients, this disease still is a challengeable threat to the health of individuals. Moreover, due to the tumor recurrence, metastasis, and drug resistance, the 5-years survival rate of HCC patients is unsatis ed [4]. Therefore, it is pressing to identify the novel and speci c biomarkers and targets for diagnosis, prognostic analysis, and targeted therapy in HCC.
Autophagy is an important process which allows lysosomes to degrade the damaged and non-functional proteins or organelles [5]. Recently, increasing number of investigations demonstrated that the abnormal autophagy is involved in many types of tumors, including esophageal cancer, gastric cancer, and breast cancer [5][6][7]. What's more, some autophagy genes have shown the potential to serve as the ideal biomarkers or therapeutic targets for cancer management [5,6]. Recent studies also revealed the relationships between pathophysiological processes of HCC and autophagy. For example, Fang et al. found that suppression of autophagy can inhibit the hepatitis C virus replication in human hematoma cells [8]. Pan et al reported that up-regulation of p62/SQSTM1 can decrease the sensitivity of HCC cells towards sorafenib [9]. LC3, a vital marker for autophagy, was also demonstrated to be a promising indicator for predicting the prognosis of HCC patients [10]. In addition, some studies have given evidence that small molecules that regulate the autophagy regulatory mechanisms may provide new clues for targeted therapy in advanced HCC [11]. For instance, ATG5 siRNA could suppress autophagy and enhance norcantharidin-induced apoptosis in HCC [12]. Xue et al. found that ULK1 may act as a novel target for HCC treatment [13]. However, little literature is available regarding the prediction of HCC prognosis using the prognostic models based on autophagy associated genes (ARGs). Our study proposed to explore the impact of ARGs on the prognosis of HCC patients, thus helping to improve the prognostic judgement and targeted therapy.
In this study, we screened 3 ARGs which closely related to the overall survival (OS) of HCC patients from the TCGA database. A prognosis prediction model was constructed based on these 3 ARGs and demonstrated to perform well for HCC patients in both training and testing cohort. Moreover, we established a clinical nomogram combining the OS-related ARGs and clinicopathological factors (age, gender, grade, stage, T (primary tumor), N (lymph nodes), M (metastasis), and risk score) and showed its good performance for predicting the 3-year and 5-year survival rate of HCC patients.

Data collection and procession
The Human Autophagy Database (http://www.autophagy.lu/index.html) was used to download the entire 232 ARGs [14]. RNA sequencing data and the corresponding clinical data for 374 HCC and 50 non-tumor tissues were acquired from the TCGA database (https://tcga-data.nci.nih.gov/tcga/) [15]. The Data procession was conducted as the previous study [16]. The cBio Cancer Genomics Portal (http://cbioportal.org) was utilized to explore the genetic alterations and clinical information of the selected ARGs in HCC [17].

Differentially expressed ARGs in HCC
The Wilcoxon signed-rank test in package "limma" in R software (version 3.6.3) was applied to screen the differentially expressed ARGs between HCC and normal tissues, with the criteria of |log 2 fold change (FC)|>1.5 and adjusted P-value < 0.05 [16,18]. Then, we combined the expression data of the differentially expressed ARGs with the corresponding clinical data. Univariate Cox regression and multivariate Cox regression analyses were applied to pick out the ARGs that were closely related to OS in HCC [18]. We also performed the Lasso regression to remove ARGs that might be closely correlated with others [14,16].

Functional enrichment analyses
To explore the biological implications and potential mechanisms of ARGs in HCC, GO annotation and KEGG pathway analyses were conducted for ARGs via R software with "GO plot", "ggplot2", "Cluster Pro ler" and "DOSE" packages, etc [16]. Top enriched terms with p-value < 0.05 and q-value < 0.05 were regarded noteworthy.

Establishment of prognostic model for HCC
A linear combination of the expression of ARGs and regression coe cient which based on the multivariate Cox regression was used to construct the risk signature [14]. The OS-related predictive formula was calculated as following: risk score = (Expression gene1 × Coe cient gene1 ) + (Expression gene2 × Coe cient gene2 ) +…+ (Expression genen × Coe cient genen ) [14]. Then, each patient with HCC was assigned a risk score according to the formula.

Assessment of the prognostic model
Kaplan-Meier survival curves and time-dependent ROC curves were performed to evaluate the e ciency of the prognostic model [19].
2.6 External validation of the prognostic gene signature GSE14520 dataset with 221 HCC patients was downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) [20,21]. Each patient received a risk score which was calculated using the same prognostic gene-signature based risk model in TCGA dataset. Then, the Kaplan-Meier curves and ROC curve were performed to evaluate the predictive performance of the prognostic gene signature.

Construction of the clinical nomogram
We downloaded the clinical data of HCC patients from the TCGA database. Then, a clinical nomogram based on several factors (age, gender, grade, stage, T (primary tumor), N (lymph nodes), M (metastasis), and risk score) was built for predicting the 3-year and 5-year survival rate of HCC patients by using the "survival" and "rms" packages in R software [19]. Moreover, the concordance index (C-index) and calibration curves were both applied to evaluate the accuracy of the nomogram.

Validation of the expression of ARGs at protein level and mRNA level
The Human Protein Atlas database (https://www.proteinatlas.org/) contains more than 11,200 unique proteins [22], we therefore utilized it to evaluate the proteins levels of the prognostic ARGs in HCC tissues and normal tissues. TIMER database (https://cistrome.shiny apps.io/timer/) was used to validate the mRNA levels of ARGs in HCC tissues and normal samples [23].
2.9 Construction of transcription factors-genes networks and miRNA-genes networks The NetworkAnalysis (http://www.networkanalyst.ca) database was conducted to predict the transcription factors (TFs) and miRNAs of SQSTM1, HSPB8, and BIRC5 [24]. The TFs prediction was based on the ENCODE database with ChIP-seq data [24]. Only the results with a peak intensity signal value < 500 and a potential score value < 1 were identi ed for further study [24]. The miRNA-genes network was predicted and constructed by the TarBase and miRTarBase in the NetworkAnalysis database.

Statistical analysis
The Perl language and R software 3.6.3 were applied to conduct all the statistical tests and graphics. P < 0.05 was regarded as statistical signi cance.

Function enrichment analyses for ARGs
To investigate the biological functions and molecular mechanisms of the above 27 ARGs in HCC, GO enrichment analysis and KEGG pathway analysis were conducted (Fig. 2, Fig. 3). The GO enrichment analysis can be divided into three parts: biological processes (BP), cellular components (CC) and molecular function (MF). As showed in Fig. 2, the top enriched terms for BP included autophagy, process utilizing autophagic mechanism, and regulation of apoptotic signaling pathway. As for CC, the most signi cant terms involved in autophagosome, chaperone complex, and integrin complex. In MF group, these differentially expressed ARGs were mainly associated with BH domain binding, chaperone binding, and heat shock protein binding. Besides, KEGG enrichment analysis showed that the selected differentially expressed ARGs were mainly participated in pathways in cancer, human papillomavirus infection, apoptosis, measles, and platinum drug resistance ( Fig. 3a-3b).

Prognostic gene signature for HCC cohorts
Using a univariate Cox regression analysis, totally 13 differentially expressed ARGs (BAX, SQSTM1, PEA15, CDKN2A, HSPB8, HGS, IKBKE, HSP90AB1, RAB24, BIRC5, DDIT3, BAK1, and TMEM74) were found to be markedly related to OS in HCC patients (Fig. 4a). All these 13 survival-related ARGs were regarded as risk factors (HRs, 1.154-1.715; P < 0.05) and their overexpression may cause worse prognosis. Then, we validated the prognostic roles of these 13 survival-related ARGs in Kaplan Meier-plotter website (http://kmplot.com/analysis/index.php). The results are consistent with the ndings in TCGA dataset ( Fig. S1; The OS curve of BAX presented similar trend to these of other genes, but not statistically signi cant). Finally, the 13 differentially expressed ARGs were entered into a Lasso regression analysis. Figure 4b presented the regression coe cient of these 13 ARGs in HCC. While 7 ARGs (SQSTM1, PEA15, CDKN2A, HSPB8, RAB24, BIRC5, and TMEM74) were included, the model reached the best performance (Fig. 4c). The biological functions and risk coe cients of 7 ARGs were listed in Table 1, which mainly associated with the formation of autophagosome, regulation of autophagy, as well as regulation of apoptosis.  S3a). Moreover, HCC patients in genes-altered group presented poorer PFS (progression-free survival) (Fig.S3b) and DFS (disease-free survival) (Fig.S3c) than these in genesunaltered group. The OS curves presented similar trend to that of PFS, but not statistically signi cant (Fig.S3d). These results suggested that the 7 ARGs play a crucial role in hepatocellular carcinogenesis.
The ROC curves were also plotted using the risk factors related to OS (age, gender, grade, stage, T (primary tumor), M (metastasis), N (lymph nodes), and risk score). We then evaluated the area under curve (AUC) values for each risk factor, and the results showed that risk score curve has a better feasibility in predicting the individuals' survival with AUC of 0.756 (Fig. 6b). In addition, Figure. 6c-6e showed that the survival time of HCC patients decreases along with the rising of risk score.
The prognostic role of the three-gene risk model was then validated in the GEO validation cohort (GSE14520; n = 221). According to the median value of risk score, the patients were also divided into two groups (high risk group and low risk group). The Kaplan-Meier survival result showed that HCC patients in high risk group have a signi cantly worse OS compared to cases with low risk in the validation dataset (3-year OS, 57.7% vs 73.5%; 5-year OS, 43.2% vs 63.0%; P = 1.274e-03) (Fig. 7a). Moreover, Fig. 7b demonstrated that the risk score presents a well predictive ability with AUC of 0.672. Figure 7c-7e showed that patients in high risk group have shorter survival time than patients with low risk score in validation cohort. Taking together, the three-gene signature has good feasibility for predicting the prognosis in HCC.

Construction of a nomogram for predicting survival rate of HCC
The clinical nomogram was constructed to quantitatively assess the patients' survival rate through combining several risk factors (age, gender, grade, stage, T (primary tumor), N (lymph nodes), M (metastasis), and risk score). As shown in Fig. 8a, the total points of risk factors were utilized to evaluate the individuals' 3-year and 5-year survival rates. The concordance index (C-index) was 0.68 (95% CI: 0.63-0.73). In addition, calibration curves presented good concordance between actual survival and nomogram-predicted survival (Fig. 8b-8c), especially for the 3-year survival.

The relationships between ARGs and clinical factors
The Student's t test was applied to investigate the relationships between the expression levels of 3 ARGs (SQSTM1, HSPB8, and BIRC5) and clinical factors. The results showed that the risk scores increased along with the T (primary tumor) and tumor grade ( Fig. 9a-9b). In addition, the expression of SQSTM1 was higher in the groups of patients aged > 65, male, and patients with higher tumor grade ( Fig. 9c-9e).
We also found the level of BIRC5 was related to the grade, tumor stage and T (primary tumor) in HCC patients ( Fig. 9f-9 h).

Validation of the expression of ARGs at protein level and mRNA level
The Human Protein Atlas database (https://www.proteinatlas.org/) was used to evaluate the protein levels of 3 prognostic ARGs (SQSTM1, HSPB8, and BIRC5) in HCC tissues with their expression in normal tissues (Fig. 10a-10c). As expected, the protein levels of 3 prognosis-related ARGs (SQSTM1, HSPB8, and BIRC5) were markedly higher in HCC tissues as compared with normal samples. Then, the TIMER database (https://cistrome.shinyapps.io/timer/) was applied to validate the mRNA expression levels of SQSTM1, HSPB8, and BIRC5. The results showed that the mRNA levels of these 3 ARGs were obviously higher in HCC compared to normal controls (Fig. S4).

Discussion
HCC is a common and frequently occurring malignancy worldwide. Due to the lack of ideal prognostic biomarkers, HCC patients usually can't receive reasonable treatment immediately [25]. Traditional prognostic risk factors (such as tumor size, histological type, stage, and grade) could only be adopted and evaluated post-surgery. Some scholars even believe that the present TNM stage system should be perfected for its insu cient to accurately predict the prognosis of cancer patients [26][27][28]. Moreover, different patients can present different treatment responses. Thus, more speci c and effective markers are required to be identi ed for evaluating prognosis and screening potentially HCC patients with high risk. To date, increasing number of biomarkers have been screened for the prediction of prognosis in HCC [29,30]. For example, Dai et al. con rmed that high expression of HIF-1α is an independent prognostic factors for OS and DFS of HCC patients [31]. Similarly, our lab previously found that high levels of SOX12 is an independent and important risk factor for HCC patients [32]. However, the translation of these biomarkers into clinical application still leaves much to be desired. Firstly, researchers need to investigate the molecular mechanisms behind the dysregulation of marker. What's more, these biomarkers are expected to be evaluated in large samples. In addition, the single gene study may provide inaccurate conclusion since the expression levels of single gene can be affected by many factors. The prognostic gene signature may solve these problems and eventually provide more convincing results since it is based on the statistical model and comprise of multiple related genes.
Numerous researches have reported that autophagic dysfunctions were associated with several pathophysiological processes, including in ammation, metabolic disorder, neurodegeneration and cancer [33,34]. Autophagy can sever as both tumor suppresser role and oncogenetic role in tumorigenesis, which may depend on the tumor microenvironments and tumor heterogeneity [6,35]. For example, Fan et al. reported that autophagy can promote the metastasis and glycolysis of HCC cells through Wnt/β-catenin pathway [36]. Conversely, exenatide induced autophagy can inhibit the cell proliferation of HCC cells [37].
Considering the emerging role of autophagy in cancers, autophagy-related studies may provide us a better understanding on the pathogenesis and prognosis of HCC. Importantly, the risk gene signature obtained from the entire set of ARGs could be superior to single gene for predicting the survival.
So far, the rapid development of gene chip assays and second-generation gene sequencing have greatly facilitated the development of personalized medicine and precision medicine. Increasing numbers of biomarkers have been identi ed by integratedly analyzing the genomic data form individual specimens. It is indisputable that these methods can nally consolidate and improve the current clinical setting for cancer management. To our knowledge, this study rstly explored the prognostic roles of entire reported ARGs in HCC. Here, 27 differentially expressed ARGs was identi ed from 374 HCC tissues and 50 normal tissues. Then, the functional enrichment analyses were conducted to explore the roles and mechanisms of the differentially expressed ARGs in HCC. Though the Cox regression and Lasso regression analyses, we established a risk model based on 3 OS-related ARGs (SQSTM1, HSPB8, and BIRC5). The HCC patients were then divided into two groups (high risk group and low risk group) according to the risk score derived from this model. Kaplan-Meier curves and ROC curves suggested that the risk model performed well in both training cohort and testing cohort. What's more, the clinical nomogram combining the clinicopathological features and risk score was applied to predict the 3-year and 5-year survival rate of HCC patients. And the calibration plots and C-index illustrated a good performance of the nomogram to predict patient' survival.
However, there are still some de ciencies in the study. First, our study mainly focused on the prognostic role of selected ARGs and we didn't deeply study the other ARGs. Second, we failed to validate the expression levels of prognosis-related ARGs by in vitro or in vivo assays. Third, the prognostic model should be evaluated in large clinical samples. Finally, in-depth studies on the 3 prognosis-related ARGs may improve the targeted therapy of HCC patients.

Conclusions
In conclusion, 3 OS-related ARGs (SQSTM1, HSPB8, and BIRC5) were screened. Based on these genes, we established a prognosis prediction model, which presented a good e cacy in guiding the personalized medicine for HCC patients. The above results suggested that ARGs signature can act as the effective and promising prognostic indicator for HCC patients. Moreover, further studies for these ARGs may also improve the cancer management of HCC.        The relationships between ARGs and clinical factors. a. The risk scores increase along with the T (primary tumor). b. The risk scores increase along with tumor grade. c. The expression of SQSTM1 is higher in the groups of patients aged >65. d. The expression of SQSTM1 is higher in the groups of male. e. The expression of SQSTM1 is higher in the groups of patients with higher tumor grade. f-g. The level of BIRC5 is associated with the grade, tumor stage and T (primary tumor) in HCC patients.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Supplementary.docx