Development of a Survival Model Based on Autophagy-Associated Genes for Predicting Prognosis of Gastric Cancer

Background: Gastric cancer (GC) is one of the most lethal diseases worldwide. Autophagy-associated genes play a crucial role in the cellular processes of GC. Our study aimed to identify autophagy-associated genes with prognostic value in GC and to assess the prognostic potential of an autophagy-associated gene signature. Methods: RNA-seq data and clinical information for GC and normal control tissues were downloaded from The Cancer Genome Atlas (TCGA) database. The Wilcoxon signed-rank test was then used to identify differentially expressed autophagy-associated genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to investigate the potential roles and mechanisms of autophagy-associated genes in GC. Cox proportional hazards regression and Lasso regression analyses were carried out to identify autophagy-associated genes related to overall survival (OS), which were then used to construct a predictive model. The Kaplan-Meier method and receiver operating characteristic (ROC) curves were used to assess the accuracy of this model. Finally, a clinical nomogram was established by combining clinical factors with the autophagy-associated gene signature. Results: A total of 28 differentially expressed autophagy-associated genes were identified. GO and KEGG analyses revealed that several important cellular processes and signaling pathways were associated with these genes. Through Cox regression and Lasso regression analyses, we identified 4 OS-related autophagy-associated genes (GRID2, ATG4D, GABARAPL2, and CXCR4) and constructed a prognostic prediction model. GC patients in the high-risk group had worse OS than those in the low-risk group (5-year OS, 27.7% vs 38.3%; P = 9.524e-07). The area under the ROC curve (AUC) of the prediction model was 0.67. The nomogram performed well in predicting 3-year and 5-year survival probability for GC patients, with a concordance index (C-index) of 0.70 (95% CI: 0.65-0.72).
The calibration curves also showed good concordance between nomogram-predicted and actual survival. Conclusions: We constructed and evaluated a survival model based on the autophagy-associated


Background
Gastric cancer (GC) is one of the most lethal diseases, ranking second among causes of cancer death and fifth in cancer incidence worldwide [1]. Although the incidence and mortality of GC have declined slightly over the past decades, GC remains a serious threat to health, especially in East Asian countries [2]. Gastric carcinogenesis is generally a multistep and multifactorial process involving Helicobacter pylori infection, genetic alterations, and epigenetic regulation [3,4]. Despite significant advances in the diagnosis and treatment of GC, drug resistance, metastasis, and recurrence remain the major problems underlying the low 5-year survival rate [5].
Therefore, novel and specific biomarkers for early diagnosis and prognostic assessment in GC are urgently needed.
Autophagy is an essential cellular process that allows lysosomes to degrade nonfunctional or damaged proteins and organelles [6,7]. Previous studies have shown that autophagy participates widely in cellular pathophysiology, including inflammation, metabolism, and carcinogenesis [7,8]. The role of autophagy in GC has been explored previously [9]. For example, Xiong et al. found that suppressing autophagy enhances cinobufagin-induced apoptosis in GC cells [10], and Hu et al. showed that drug-resistant GC cells have higher autophagy levels than their parental cells [11]. Conversely, other studies have shown that high levels of autophagy can suppress GC and act as a tumor suppressor in GC cells [9,12,13]. For instance, Yang et al. found that 5-FU suppresses GC cells by inducing autophagy [13], and other pharmacological treatments can also induce autophagic cell death in GC cells [14,15]. However, prognostic models based on autophagy-associated genes have not been reported in GC. Given these contradictory findings and research needs, this study aimed to investigate the impact of the entire set of autophagy-associated genes on the clinical outcomes of GC patients, with the goal of improving personalized medicine.
In this study, we performed Cox regression analysis to screen for autophagy-associated genes significantly correlated with overall survival (OS) in GC patients and developed a prognostic model.
We also used least absolute shrinkage and selection operator (Lasso) analysis to construct an optimal risk model. The Kaplan-Meier method and receiver operating characteristic (ROC) curves were used to assess the performance of the model. In addition, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were applied to explore the potential roles and mechanisms of autophagy-associated genes in GC. We also built a nomogram to predict patients' survival rates.
Compared with studies that examined only one or a few genes, our study may improve the accuracy of prognostic assessment by integrating multiple autophagy-associated genes.

Data acquisition and processing
A total of 232 autophagy-associated genes were obtained from the Human Autophagy Database (http://www.autophagy.lu/index.html). RNA-seq data and the corresponding clinical information for 375 GC and 32 adjacent non-tumor tissues were obtained from the TCGA database (https://tcga-data.nci.nih.gov/tcga/). The GENCODE database (https://www.gencodegenes.org/human/releases.html) was then used to convert Ensembl gene IDs into gene symbols [16], and the expression data of the autophagy-associated genes were extracted. The cBio Cancer Genomics Portal (http://cbioportal.org) was used to search for genetic alterations of the selected autophagy-associated genes in GC [17].
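The ID-conversion and gene-selection step can be sketched in Python as below. It is a minimal illustration, not the authors' pipeline, which the text does not show. The function names, the toy Ensembl IDs and the expression values are hypothetical. In a real run, the ID-to-symbol table would be parsed from a GENCODE annotation file.

```python
# Sketch: map versioned Ensembl IDs to gene symbols and keep only genes on an
# autophagy list. All IDs and values are hypothetical placeholders.

def strip_version(ensembl_id: str) -> str:
    """Drop the '.N' version suffix, e.g. 'ENSG_TOY1.3' -> 'ENSG_TOY1'."""
    return ensembl_id.split(".", 1)[0]


def select_autophagy_genes(expr, id_to_symbol, autophagy_genes):
    """Return {symbol: per-sample values} for IDs whose symbol is on the list.

    expr: {versioned Ensembl ID: [expression value per sample]}
    id_to_symbol: {unversioned Ensembl ID: gene symbol}
    autophagy_genes: iterable of gene symbols (e.g. the 232 database genes)
    """
    wanted = set(autophagy_genes)
    selected = {}
    for eid, values in expr.items():
        symbol = id_to_symbol.get(strip_version(eid))
        if symbol in wanted:
            # If several IDs map to one symbol, keep the first one seen
            # (a simplifying choice; averaging is a common alternative).
            selected.setdefault(symbol, values)
    return selected


expr = {"ENSG_TOY1.3": [1.0, 2.0], "ENSG_TOY2.1": [0.5, 0.1], "ENSG_TOY3.7": [3.0, 3.0]}
id_to_symbol = {"ENSG_TOY1": "CXCR4", "ENSG_TOY2": "GAPDH", "ENSG_TOY3": "ATG4D"}
print(select_autophagy_genes(expr, id_to_symbol, ["CXCR4", "ATG4D", "GRID2"]))
# {'CXCR4': [1.0, 2.0], 'ATG4D': [3.0, 3.0]}
```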

Differentially expressed autophagy-associated genes in GC
Differentially expressed autophagy-associated genes between GC and normal tissues were identified with the Wilcoxon signed-rank test using the "limma" package in R (version 3.6.3) [18]. The thresholds were |log2 fold change (FC)| > 1 and a false discovery rate (FDR) < 0.05 [18]. The expression data of the differentially expressed autophagy-associated genes were then matched with the corresponding clinical characteristics.
Univariate and multivariate Cox regression analyses were conducted to screen for autophagy genes significantly associated with OS in GC [19]. Lasso regression was performed as described previously to remove highly correlated survival-related autophagy-associated genes [19,20].
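The two thresholds can be made concrete with a short Python sketch, assuming per-gene p-values from the rank test are already available. The Benjamini-Hochberg adjustment shown here is one standard way to obtain an FDR. The text does not say which correction was used, so that choice, the pseudocount and the helper names are assumptions.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 of mean tumor over mean normal expression; the pseudocount avoids log(0)."""
    mean_t = sum(tumor) / len(tumor)
    mean_n = sum(normal) / len(normal)
    return math.log2((mean_t + pseudocount) / (mean_n + pseudocount))


def select_degs(genes, log2fc, pvalues, fc_cut=1.0, fdr_cut=0.05):
    """Keep genes with |log2 FC| > fc_cut and FDR < fdr_cut."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, log2fc, fdr) if abs(lfc) > fc_cut and q < fdr_cut]


# Toy numbers only.
print(log2_fold_change([7.0, 7.0], [1.0, 1.0]))  # 2.0
print(select_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.01, 0.04, 0.03, 0.20]))  # ['A']
```

Gene B passes the fold-change cut but its adjusted p-value (about 0.053) misses the FDR cut, which is why both filters are applied together.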
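To show what the univariate screening step estimates for each gene, here is a minimal pure-Python Cox fit for one covariate. It uses Newton-Raphson on the partial likelihood with the Breslow handling of ties. It is an illustrative sketch under those assumptions, not the R "survival" implementation the authors would have used. The function name and the toy data are hypothetical.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial
    likelihood (Breslow approximation for tied event times).

    Returns (beta, standard error, hazard ratio exp(beta)).
    Assumes x is not constant and the groups are not perfectly separated;
    otherwise the information is zero or beta diverges.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # patient j is still at risk at t_i
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean_x = s1 / s0
            score += x[i] - mean_x          # first derivative of the log partial likelihood
            info += s2 / s0 - mean_x ** 2   # minus its second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # SE from the information evaluated just before the last (negligible) step.
    se = 1.0 / math.sqrt(info)
    return beta, se, math.exp(beta)


# Toy data (hypothetical): x = 1 marks high expression of some gene.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]
beta, se, hr = univariate_cox(times, events, x)
low, high = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"HR = {hr:.3f} (95% CI {low:.3f}-{high:.3f})")
```

A gene would be carried forward when its Wald P value (from beta / SE) falls below the chosen cut-off. The Lasso step then penalizes the coefficients jointly, which this one-gene sketch does not attempt.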

Construction of prognostic model for GC
The prognostic model was built from the OS-related genes with the "glmnet" package in R, based on multivariate Cox regression [19,20], and an individual risk score was calculated for each GC patient.
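The text does not spell out the risk-score formula. A common form in signature studies is the sum of each gene's Cox coefficient multiplied by its expression, with patients split at the median score. The Python sketch below assumes both of these. The coefficients are hypothetical placeholders, not the values in Table 2.

```python
from statistics import median


def risk_scores(expr_by_patient, coefficients):
    """Risk score = sum over signature genes of coefficient * expression."""
    return {
        pid: sum(coef * expr[gene] for gene, coef in coefficients.items())
        for pid, expr in expr_by_patient.items()
    }


def split_by_median(scores):
    """Label patients 'high' if their score is above the median, else 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}, cut


# Hypothetical coefficients for the four signature genes (illustration only).
coefficients = {"GRID2": 0.5, "ATG4D": -0.4, "GABARAPL2": 0.3, "CXCR4": 0.2}
patients = {
    "P1": {"GRID2": 1.0, "ATG4D": 1.0, "GABARAPL2": 1.0, "CXCR4": 1.0},
    "P2": {"GRID2": 0.0, "ATG4D": 0.0, "GABARAPL2": 0.0, "CXCR4": 0.0},
    "P3": {"GRID2": 2.0, "ATG4D": 0.0, "GABARAPL2": 0.0, "CXCR4": 0.0},
    "P4": {"GRID2": 0.0, "ATG4D": 2.0, "GABARAPL2": 0.0, "CXCR4": 0.0},
}
scores = risk_scores(patients, coefficients)
groups, cut = split_by_median(scores)
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Because ATG4D has a negative coefficient in this toy example, high ATG4D expression pulls a patient toward the low-risk group.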

Validation of the prognostic model
The Kaplan-Meier method was applied to compare OS between the high-risk and low-risk groups defined by the predictive signature. In addition, the accuracy of the prognostic model was assessed with ROC curves using the "survivalROC" package in R [21].
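The Kaplan-Meier comparison can be sketched as a product-limit estimator plus a two-group log-rank test. The log-rank test is the usual source of a Kaplan-Meier P value, but the text does not name the test, so that is an assumption. These are illustrative pure-Python helpers, not the R survival package API.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1). Returns (chi2, p) with 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Toy data (hypothetical): group 1 = high risk.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
print(kaplan_meier([t for t, g in zip(times, groups) if g == 1], [1, 1, 1]))
print(logrank_test(times, events, groups))
```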

Construction of nomogram
The detailed clinical data of GC patients were obtained from the TCGA database. Age, sex, grade, stage, T, M, N, and risk score were integrated to build a clinical nomogram for predicting the 3-year and 5-year survival probability of GC patients, using the "survival" and "rms" packages in R [19,22]. Calibration curves and the concordance index (C-index) were used to assess the performance of the nomogram.
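Harrell's C-index, used to assess the nomogram, is the share of comparable patient pairs in which the patient predicted to be at higher risk dies first. The plain-Python sketch below illustrates the statistic under a simplifying assumption: pairs with tied times are skipped. It is not the rms implementation, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: fraction of usable pairs where the higher-risk patient fails first.

    A pair (i, j) is usable when patient i has an observed event and
    times[i] < times[j]. Tied risk scores count as half-concordant.
    Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a usable pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Toy data (hypothetical): higher predicted risk, earlier death.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported value of 0.70 should be read.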

Functional enrichment analysis
To investigate the molecular mechanisms of autophagy-associated genes in GC, GO and KEGG enrichment analyses were carried out and visualized in R with the "ggplot2", "GOplot", "DOSE", and "clusterProfiler" packages, among others [21]. Terms with p < 0.05 and q < 0.05 were considered significantly enriched.
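Over-representation analysis of the kind clusterProfiler performs is typically based on a one-sided hypergeometric test. It asks how likely it is to see at least k annotated genes among the n selected genes by chance. The minimal Python sketch below computes that p-value. The function name and the background size in the example are hypothetical, and the q-value adjustment is not shown.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    k: selected genes annotated to the term
    n: size of the selected gene list (e.g. 28 differentially expressed genes)
    K: background genes annotated to the term
    N: background size
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical term: 3 of the 28 genes fall in a 40-gene term, 18000-gene background.
print(hypergeom_enrichment_p(k=3, n=28, K=40, N=18000))
```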

Statistical analysis
All statistical analyses and graphics were performed using Perl and R (version 3.6.3). P < 0.05 was considered statistically significant.

Characteristics of GC patients
The TCGA GC cohort includes a total of 375 stomach adenocarcinoma patients. The clinical characteristics of these patients are presented in Table S1. Kaplan-Meier curves by tumor (T), lymph node (N), metastasis (M), and stage are shown in Figure S1.

GO and KEGG analyses of autophagy-associated genes
To explore the potential mechanisms of the above 28 autophagy-associated genes in GC, GO and KEGG enrichment analyses were performed in R (Figures 3 and 4). In the GO analysis, the top enriched biological process (BP) terms included cell growth, neuron death, regulation of neuron death, regulation of response to cytokine stimulus, and regulation of cytokine-mediated signaling pathway (Figure 3a-3b). In the KEGG analysis, the differentially expressed autophagy-associated genes were mainly associated with platinum drug resistance, bladder cancer, apoptosis, the p53 signaling pathway, pancreatic cancer, hepatitis B, the ErbB signaling pathway, apoptosis - multiple species, the IL-17 signaling pathway, and endocrine resistance (Figure 4a-4b).

Prognostic signature for GC cohorts
In univariate Cox regression analysis, a total of 10 autophagy-associated genes (MAP1LC3B, IRGM, GRID2, ATG4D, GABARAPL1, HSPB8, CTSL, GABARAPL2, CXCR4, and DLC1) were significantly associated with OS in GC patients (Table 1). Of these 10 survival-related genes, 9 (MAP1LC3B, IRGM, GRID2, GABARAPL1, HSPB8, CTSL, GABARAPL2, CXCR4, and DLC1) were identified as risk factors (HRs, 1.112-5.214; P < 0.05), with higher expression predicting worse outcomes. In contrast, higher ATG4D expression was associated with better OS (HR = 0.654, 95% CI: 0.475-0.903; P < 0.05). These 10 genes were then entered into a Lasso regression analysis. Figure 5a illustrates the regression coefficients of these 10 autophagy-associated genes in GC. As shown in Figure 5b, the model performed best when 8 genes (MAP1LC3B, IRGM, GRID2, ATG4D, CTSL, GABARAPL2, CXCR4, and DLC1) were included. Table 2 lists the functions and coefficients of these 8 genes, which are mainly involved in the formation of autophagosomal vacuoles, apoptosis regulation, and autophagosome maturation.  the GC survival in two groups.
Patients in the high-risk group had lower survival than those in the low-risk group (5-year OS, 27.7% vs 38.3%; P = 9.524e-07) (Figure 6a). The ROC curve of the predictive model is shown in Figure 6b, with an AUC of 0.67. In addition, Figure 6c-6e shows that mortality among GC patients increased with increasing risk score.
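As a worked check of how the reported hazard ratios relate to the Cox coefficients, the ATG4D estimate can be converted back to the coefficient scale. Values are rounded. The standard error is back-calculated from the reported interval, assuming a Wald-type 95% CI.

```latex
\mathrm{HR} = e^{\hat\beta}, \qquad
95\%\ \mathrm{CI} = \left(e^{\hat\beta - 1.96\,\mathrm{SE}},\; e^{\hat\beta + 1.96\,\mathrm{SE}}\right)

\hat\beta_{\mathrm{ATG4D}} = \ln 0.654 \approx -0.425, \qquad
\mathrm{SE} \approx \frac{\ln 0.903 - \ln 0.475}{2 \times 1.96}
           \approx \frac{-0.102 + 0.744}{3.92} \approx 0.164
```

The negative coefficient (HR < 1) marks ATG4D as protective. The other nine genes have positive coefficients (HR > 1).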

The nomogram for predicting survival rate of GC
The clinical nomogram quantitatively evaluates individual risk by integrating several risk factors. Each patient's 3-year and 5-year survival probabilities are read from the total points of the risk factors (Figure 7a). The C-index was 0.70 (95% CI: 0.65-0.72). Moreover, calibration curves showed good concordance between nomogram-predicted and actual survival (Figure 7b and 7c), especially for 3-year survival.
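For a Cox-based nomogram of the kind built with the rms package, the total points map linearly onto the model's linear predictor. Survival at t = 3 or 5 years then follows from the standard Cox relationship below. It is shown as an aid to interpretation, not as the authors' exact computation.

```latex
\mathrm{LP}(x) = \sum_{k} \hat\beta_k x_k, \qquad
\hat S(t \mid x) = \hat S_0(t)^{\exp(\mathrm{LP}(x))}
```

Here x collects the predictors (age, sex, grade, stage, T, M, N, and risk score), the \hat\beta_k are their fitted coefficients, and \hat S_0(t) is the baseline survival at the reference covariate values. A higher total point score means a larger LP and therefore a lower predicted survival probability.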

Discussion
Autophagy is a fundamental cellular process regulated by several pathways [23]. Abnormal autophagy has been shown to contribute to gastric carcinogenesis [24]. Li et al. reported that autophagy regulates the drug sensitivity of GC cells through the Notch signaling pathway [25]. Helicobacter pylori infection, a recognized risk factor for gastric carcinogenesis, has also been found to be closely associated with the modulation of autophagy in GC cells [26]. Moreover, several studies have reported relationships between autophagy genes and GC survival. For example, high levels of LC3B and cytoplasmic SQSTM1 were associated with poor prognosis in GC patients [27]. Ge et al. reported that upregulated ATG5 may be an independent prognostic biomarker for GC [28], and Beclin-1 has also been regarded as an independent prognostic factor in GC [29]. Given the emerging role of autophagy in GC, it is worthwhile to explore the prognostic value of autophagy-associated genes in GC. Importantly, a gene signature derived from the entire set of autophagy-associated genes could be superior to a single gene in predicting GC survival.
To date, GC remains one of the most challenging tumors and imposes a heavy economic and medical burden. Moreover, owing to the lack of specific and reliable prognostic biomarkers, GC patients often cannot receive appropriate treatment promptly. Many studies have shown that existing biomarkers lack sufficient sensitivity and specificity for cancer diagnosis and prognosis [30,31]. Some researchers also argue that the current TNM staging system needs to be improved to predict individual survival effectively [32,33]. With advances in the life sciences and technology, an increasing number of prognostic markers have been identified in GC. For example, Zhang et al. found that the positive lymph node ratio can be a more precise indicator of survival in GC patients [34]. Liquid biopsy shows high specificity and accuracy in predicting prognosis and drug response in GC patients [35,36], and non-coding RNAs are also regarded as promising prognostic markers in GC [30]. However, the translation of these markers into clinical application still leaves much to be desired. Researchers need to explore the mechanisms behind the dysregulation of these biomarkers, and their use needs to be evaluated in large cohorts.
The last few decades have seen rapid advances in gene chip assays and second-generation sequencing, which have greatly promoted the development of personalized and precision medicine. A growing number of studies have focused on genomic and proteomic data analyses to find ideal markers and targets for cancer management. We believe these methods can ultimately consolidate and improve the current prognostic assessment system for GC patients. To the best of our knowledge, our study is the first to examine the full set of reported autophagy-associated genes in GC and investigate their prognostic roles. We identified 28 differentially expressed autophagy-associated genes from 375 GC and 32 normal tissues. GO and KEGG analyses were then performed to investigate the potential roles and underlying mechanisms of these genes. Through Cox survival analysis and Lasso regression analysis, a risk model based on 4 autophagy-associated genes (GRID2, ATG4D, GABARAPL2, and CXCR4) was established. GC patients were divided into two groups according to the risk score obtained from this model, and the Kaplan-Meier method and ROC curves suggested that the model performed well. Moreover, we constructed a nomogram combining the risk score and clinicopathological characteristics to predict the 3-year and 5-year survival rates of GC patients. The C-index and calibration plots verified the good performance of the nomogram in predicting individual survival.
Nonetheless, this study has several limitations. First, it focused primarily on the prognostic role of the selected autophagy-associated genes and did not investigate other autophagy-associated genes in depth. Second, although we evaluated the predictive model using the Kaplan-Meier method and ROC curves, it should be further validated in large clinical cohorts. Finally, further studies on the 4 OS-related autophagy-associated genes are needed and may facilitate targeted therapy in GC.

Conclusions
In summary, this study evaluated the expression profiles of 232 autophagy-associated genes and developed an OS-related predictive model with good potential for guiding personalized medicine in patients with GC. A total of 4 OS-related autophagy-associated genes (GRID2, ATG4D, GABARAPL2, and CXCR4) were identified. These findings indicate that an autophagy-associated gene signature can serve as a specific and effective prognostic biomarker for GC patients. Moreover, in-depth studies of these genes may facilitate targeted therapy for GC.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data used in the study were downloaded from The Cancer Genome Atlas (TCGA) database.

Competing interests
The authors declare that they have no conflicts of interest.

Funding
This study was supported in part by a grant from the Scientific Foundation of Shaanxi Province (S2019ZDCXL-01-02-01) and a grant from the National Clinical Research Center for Digestive Diseases (2015BAI13B07).

Authors' contributions
Conceived and designed the study: YH, DF, and LH. Collected the literature: XW, YL, and YZ. Wrote the manuscript: WY, LD, XZ, and LN. Revised the manuscript: QZ, YH, DF, and LH. Statistical analysis: WZ and JL. All authors approved the final version of the manuscript.

Figure 1
Differentially expressed autophagy-associated genes in GC and non-tumor tissues. a. Heatmap of differentially expressed autophagy-associated genes between normal and GC tissues; b. Volcano plot of autophagy-associated genes. Blue dots indicate downregulated genes and red dots indicate upregulated genes.

Figure 2
The expression patterns of 28 autophagy-associated genes in GC and normal controls. The red and blue box plots represent GC and normal controls, respectively.