Construction and Validation of a Prognostic Risk Model Based on Autophagy-Related Genes in Hepatocellular Carcinoma Cells

Hepatocellular carcinoma (HCC) is a prevalent primary liver cancer and the main cause of cancer mortality. Its high complexity and dismal prognosis bring dramatic diculty to treatment. Due to the disclosed dual functions of autophagy in cancer development, understanding autophagy-related genes devotes into seeking novel biomarkers for HCC. Methods Differential expression of genes in normal and tumor groups was analyzed to acquire autophagy-related genes in HCC. GO and KEGG pathway analyses were conducted on these genes. Genes were then screened by univariate regression analysis. The screened genes were subjected to multivariate Cox regression analysis to build a prognostic model. The model was validated by ICGC validation set. Altogether, Enrichment analysis showed that they were mainly in including autophagy cell Genes were screened by univariate analysis and multivariate Cox regression analysis to build a prognostic model. The model was constituted by 6 feature genes: EIF2S1, BIRC5, SQSTM1, ATG7, HDAC1, FKBP1A. Validation conrmed the accuracy and independence of this model in predicting HCC patient’s prognosis.


Introduction
Hepatocellular carcinoma (HCC) is a multi-step and complex disease involved in a series of genetic and epigenetic alterations, including genomic deletion, ampli cation, mutation and insertion 1 . So far, certain therapeutic strategies like radical excision, liver transplantation, radiofrequency ablation (RFA) and arterial embolization (TAE) are expected to be applied in the treatment of this lethal disease [2][3][4] . Due to early diagnosis, intervention therapy, and development of therapies and surgery, the treatment for this cancer has been progressed greatly. However, most patients are diagnosed in the advanced stage due to lack of available biomarkers 5,6 . High recurrence rate, rapid progression, short overall survival (OS) of HCC patients fail to give rise to a satisfactory outcome 2,7 . Thus, there is an urgent need to in-depth decipher pathways and mechanism of HCC progression.
Autophagy is a basic process to deliver damaged organelles and misfolded proteins to lysosomes for degradation to main intracellular homeostasis 8 . It is a conserved evolutionary process that is involved in all mammalian cells under basic conditions and generates building block molecules that support basic cellular processes 9 . The effect of autography is complex and varies from organ to organ. For instance, muscle and liver organs need autography to remove redundant protein aggregation, lipid accumulation, and damaged mitochondria in avoidance of oxidative stress arising from excessive reactive oxygen species (ROS) 10,11 . Pathogens of various human diseases from neurodegenerative disease, metabolic disease to cancers have shown defects in autophagy 12 . Autophagy dysregulation becomes increasingly important in liver diseases, non-alcoholic fatty liver disease (NAFLD), fatty degeneration, hepatomegaly, and primary malignant liver tumor (such as HCC) 13,14 . Thus, we thought that autophagy-related genes may participate in regulation of HCC or even most cancer development.
In this study, we obtained the expression data and clinical data of HCC and autophagy-related genes via The Cancer Genome Atlas (TCGA) database and human autophagy database. Autophagy-related genes in HCC were obtained through differential expression analysis. A prognostic model was constructed and validated through univariate regression analysis and multivariate Cox regression analysis. Altogether, a 6gene based prognostic risk model was determined. The model casts a light on the interplay between autophagy-related genes and prognosis of HCC. It also fuels a new tool for the diagnosis and treatment of HCC.  Table 2). The expression of these genes was extracted from mRNA expression data in TCGA-LIHC. Differential expression analysis (|logFC|>1, FDR < 0.05) was undertaken on normal group and tumor group using "limma" package. Differentially expressed autophagy-related genes were therefore obtained. Clinical data (like survival status and time) in Liver Cancer -Riken, Japan (LIRI-JP) were accessed from International Cancer Genome Consortium (ICGC) (https://icgc.org/) as the validation set (Supplementary Table 3) to validate the multivariate prognostic model.

Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG)
GO and KEGG enrichment analyses were conducted on the above autophagy-related genes in LIHC using "clusterPro ler" package. "digest" and "GOplot" packages were applied for visualization.

Construction and validation of a prognostic prediction model
Univariate regression analysis was undertaken on differentially expressed genes (DEGs) using "survival" package (p < 0.0001). Next, multivariate regression analysis was undertaken on the above screened genes using "survival" package for establishment of a prognostic risk model. Patient's risk score was calculated according to the expression level of each gene. Median value of risk score was deemed as a cut-off to distinguish high-risk and low-risk groups. Patient's survival curve was drawn by "survival" package. R package "survivalROC" was used to draw 3-year and 5-year OS receiver operating characteristic (ROC) curves. Area under the curve (AUC) was calculated.

Validation of the risk model with clinical information
We validated to assure whether the predictive performance of the model was independence of other clinical variables (including age, sex, T stage, clinical stages). Univariate and multivariate Cox regression analyses were undertaken on clinical data in TCGA-LIHC as well as risk score. ROC curves of clinical characteristics and risk score were drawn.

Drawing and validation of the nomogram
A nomogram was established with clinical information and risk score to predict the possibility of 3-year and 5-year OS of HCC patients. Effectiveness of the nomogram was evaluated by calibration curve.

Data preprocessing and differential expression analysis
In this study, mRNA expression data in TCGA-LIHC were downloaded and autophagy-related genes were isolated according to data from Human Autophagy Database. Afterwards, "limma" software package was applied to analyze the differential expression of autophagy-related genes in normal group and tumor group. Altogether, 42 autophagy-related DEGs in LIHC were found (upregulated: 37; downregulated: 5) (Fig. 1A). Boxplot of the expression of these genes in samples was shown in Fig. 1B.

Enrichment analysis of autophagy-related DEGs
Some basic signaling transduction pathways and biological processes regulated by autophagy-related DEGs in HCC were further analyzed. GO and KEGG enrichment analyses were undertaken on 42 obtained DEGs. GO enrichment analysis revealed the main enrichment of DEGs in regulation of autophagy, neuronal death, regulation of apoptotic signaling pathway, and that sort of biological processes ( Fig. 2A).
KEGG illuminated that DEGs were mainly enriched in signaling pathways like cell apoptosis, cellular senescence and PI3K-Akt signaling pathway (Fig. 2B).

Construction and validation of a prognostic risk model
Univariate Cox regression analysis was undertaken on autophagy-related DEGs based on TCGA-LIHC (training set) (p < 0.0001). A total of 21 genes signi cantly associated with patient's prognosis were obtained to draw a frost plot (Fig. 3A). Genes were screened by multivariate Cox regression analysis to build a prognostic model. Finally, a 6-gene based prognostic risk model was determined (Fig. 3B). LIHC patients were divided into high-risk and low-risk groups. Risk score distribution plot (Fig. 3C) and survival status plot (Fig. 3D) of patients in TCGA-LIHC were obtained. Moreover, the drawn ROC curves exhibited that AUC values of 3-year and 5-year OS were 0.717 and 0.733, respectively (Fig. 3E). Kaplan-Meier cumulative curve suggested the OS of low-risk score patients was remarkably longer than that of highrisk score patients (Fig. 3F). Universality of the model was validated by ICGC validation set LIRI-JP. ROC curves showed that AUC values of 3-year OS and 5-year OS were respectively 0.822 and 0.772 (Fig. 3G). Survival curves showed a longer survival of patients with low-risk score (Fig. 3H). Taken together, the model held high accuracy.

Validation of the independence of the risk model with clinical data
To further evaluate the performance of the prognostic model, univariate regression analysis was undertaken on risk value and clinical information of TCGA-LIHC patients. Risk score, clinical stages and T stage all showed signi cant in uence on patient's prognosis (Fig. 4A). While multivariate regression analysis exhibited that only risk score held signi cant effect on patient's prognosis (Fig. 4B). ROC curves based on clinical characteristics and risk score showed that AUC of risk score (0.78) was higher than that of all clinical characteristics (Fig. 4C). All the above, the prognostic model was a favorable prognostic prediction indicator better than patient's clinical characteristics and independent from clinical characteristics themselves.

Drawing and validation of the nomogram
Nomogram has been widely employed in predicting cancer patient's prognosis. It is mainly because that it can simplify statistical prediction model into a single value evaluation of individual's OS possibility. The nomogram generated by clinical characteristics (age, sex, T stage and clinical stages) and risk score could be used to predict the possibility of 3-year and 5-year OS of HCC patients (Fig. 5A). 3-year and 5year calibration curves were drawn to validate the prediction performance of the nomogram, and a high tting level was observed (Fig. 5B-C). On the whole, the nomogram is capable of accurately predicting patient's prognosis, fueling an opportunity for the following consultant, decision and arrange.

Discussion
HCC is a global cancer health concern. Its complexity and bad prognosis preclude effective access to treatment. Finding a helpful biomarker and building a reliable model to predict patient's prognosis are of general interest for HCC therapy.
Regulation of these 6 genes were proved in assorted cancers except HCC. Sequestosome 1 (SQSTM1) encodes a multifunctional protein that binds to ubiquitin and activates the nuclear factor kappa B (NF-kB) signaling pathway. P62 was con rmed to inhibit autophagy ux and promote epithelial-mesenchymal transformation in metastatic prostate cancer by maintaining the level of HDAC6 17 . Sequestosome 1 is an effective prognostic factor associated with cell proliferation in human colorectal cancer 18 . In addition, p62 is upregulated in the prophase of HCC and induces cancer by maintaining the survival of stressinduced HCC initiating cells 19 . HDAC1 plays a key role in regulating eukaryotic gene expression. It has been reported to promote glycolysis in gastric cancer, and its high expression is an independent adverse prognostic factor for OS and disease-free survival 20 . Silencing of HDAC1 enhances the sensitivity of ovarian cancer to chemotherapy 21 . HDAC1 restrains Snail2-mediated epithelial-mesenchymal transition (EMT) in the process of metastasis of HCC (Yue Hu et al. 22 ). Viewed in toto, these genes participate in progression of HCC, which is consistent with this paper.
Besides, 4 feature genes were not yet well de ned in HCC. EIF2S1 catalyzes the rst regulatory step in the initiation of protein synthesis to promote the binding of the initial tRNA to the 40S ribosome subunit. Phosphorylated eIF2α has been found to predict disease-free survival in triple-negative breast cancer patients 23 . Estrogen-induced apoptosis of breast cancer cells takes place by blocking dephosphorylation of eIF2α protein 24 . In addition, Li Suzhen et al. 25 demonstrated that TIPRL enhances survival of lung cancer by inducing autophagy through the eIF2α-ATF4 pathway. BIRC5 is a member of the IAP gene family, which encodes negative regulatory proteins that prevent the death of apoptotic cells. It has been shown that the long non-coding RNA (lncRNA) nR2F1-AS1 promotes the malignant phenotype of osteosarcoma cells through the miR-485-5p/miR-218-5p/BIRC5 axis 26 . Inhibition of BIRC5 improves cervical cancer cells sensitivity to radiotherapy 27 . Nine genes, including BIRC5, may be biomarkers for HCC 28 . FKBP1a encodes proteins that are members of the immunomodulatory family, and it plays a role in immunomodulatory and fundamental cellular processes involved in protein folding and transport. Ten genes, including FKBP1A, were identi ed as biomarkers for breast cancer 29 . LncRNA SNHG15 promotes prostate cancer cell proliferation, migration and EMT by regulating miR-338-3p/FKBP1A axis 30 . Luo Yue et. al 31 reported FKBP1A overexpression in HCC. ATG7encodes an E1-like activator that is critical for autophagy and cytoplasmic transport to vacuoles. Studies have shown that ATG7 adjusts three negative breast cancer tumor progression 32 . MiR-154 exerts suppressor role by directly targeting ATG7 in bladder cancer 33 . In the context, the investigation of mechanism of SQSTM1 and HDAC1 in the 6 feature genes was in its infancy. EIF2S1, BIRC5 and FKBP1A were only identi ed as biomarkers of HCC, and ATG7 was not reported to be associated with HCC. Hence, this paper may provide theoretic basis for studying these genes in HCC.
On the whole, 6 autophagy-associated genes were identi ed via bioinformatics analysis (EIF2S1, BIRC5, SQSTM1, ATG7, HDAC1, FKBP1A), and corresponding prognostic risk model was constructed. Our nding will yield valuable insight into early diagnosis, prognosis, and development of new therapies. However, application of these 6 feature genes requires validation by incremental clinical experiments and animal experiments.

Declarations
Ethics approval and consent to participate Not applicable.

Availability of data and materials
The data and materials in the current study are available from the corresponding author on reasonable request.