Construction and validation of prognostic model based on autophagy-related lncRNAs in gastric cancer

Gastric cancer (GC) is one of the most common cancer worldwide. Although emerging evidence indicates that autophagy-related long non-coding RNA (lncRNA) plays an important role in the progression of GC, the prognosis of GC based on autophagy is still deficient. The Cancer Genome of Atlas stomach adenocarcinoma (TCGA-STAD) dataset was downloaded and separated into a training set and a testing set randomly. Then, 24 autophagy-related lncRNAs were found strongly associated with the survival of the TCGA-STAD dataset. 11 lncRNAs were selected to build the risk score model through the least absolute shrinkage and selection operator (LASSO) regression. Every patient got a risk score (RS), and patients were separated into a high-risk group and a low-risk group due to the median RS. The multivariate Cox analysis showed that the RS could be an independent prognosis predictor. The Kaplan-Meier survival analysis and the Receiver Operating Characteristic (ROC) curve indicated the model had an excellent prediction effect. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the mRNAs in the prognostic network were mainly involved in the autophagy and ubiquitin-like protein ligase binding. Gene Set Enrichment Analysis (GSEA) analysis uncovered that the differentially expressed genes (DEGs) in the high-risk group partially participated in the ECM receptor interaction and other signaling pathways. Our results indicated that the risk score model based on the autophagy-related lncRNAs performed well in the prediction of prognosis for patients with GC.


Introduction
Gastric cancer (GC) is the fifth most prevalent tumor and one of the deadliest carcinomas worldwide (Bray et al., 2018), among which 90% are adenocarcinomas (Crew and Neugut, 2006). Although there has been a stable decrease in GC occurrence and mortality rates globally over several decades after the beginning of the 19th century (Bosetti et al., 2013;Kamangar et al., 2006), GC remains a high case fatality rate of 75% throughout the world (Fock, 2014). Due to the insufficient understanding of the molecular mechanism (Rocken, 2017) and the lack of relevant clinical prediction systems (Jemal et al., 2011), most patients with GC have already been in the advanced stage when they are diagnosed (Schmidt et al., 2005), which brings great trouble to the patient's survival (Verdecchia et al., 2007) and clinical therapy. Therefore, it is critical to construct an accurate prediction system for GC, which has the ability for early diagnosis and to prevent the disease before premalignant lesions have developed.
Autophagy is a highly conserved and evolutionarily ancient catabolic process that can degrade the misfolded proteins and damaged organelles (Mizushima, 2007). For the past few years, knowledge about the function of autophagy has developed. It has been found that autophagy participated in plenty of physiological processes in mammals, including quality control of proteins and organelles, immunity, nutrient deprivation, hypoxia, drug stimuli, stress, and prevention of neurodegeneration (Mizushima and Komatsu, 2011). Autophagy can also regulate biological process including apoptosis, protein synthesis, cell growth, and proliferation through the AMPK/mTOR pathway, PI3K/Akt/mTOR pathway, P53 pathway, and other signaling pathways (Chang et al., 2017;Ren et al., 2016;Zhao et al., 2015). Furthermore, the role of autophagy in the progression of carcinoma also has several breakthroughs.
The effect of autophagy is considered controversial in tumorigenesis, which can both promote and suppress cancer development under different cell types or stress modes (Glick et al., 2010). It is thought that autophagy prevented carcinogenesis (Aita et al., 1999;Liang et al., 1999). Nevertheless, once the tumor is established, increased autophagic flux often promotes tumor cell growth by providing energy and vital compounds upon various stress stimuli (White, 2012). Similarly, growing studies have also proved that autophagy had a significant effect on the GC.
The role of autophagy in GC is also complex and contradictory. Some research supported that autophagy was a tumor promoter in GC. Cause autophagy inhibitor will destroy the protective mechanism of GC cell and promotes cell death induced by therapeutic drugs (Li et al., 2017b;Maes et al., 2014;Maes et al., 2013). However, PD-L1 expression was also found enhanced by autophagy inhibition in GC (Wang et al., 2019b). Moreover, a study has shown that autophagy inducers such as AMPK inducers could be used in the GC treatment, which will lead to autophagic cell death of tumor cells .
In recent years, the relationship between long non-coding RNA (lncRNA) and autophagy has several breaks in the diagnosis, treatment, and prognosis of GC. Pieces of evidence showed that lncRNA is vital for the occurrence, prognosis, and chemoresistance of GC by regulating autophagy-related mRNA (Hu et al., 2017;Zhao et al., 2014;Zhu et al., 2019). Some studies have demonstrated that silencing of LINC01419 and CCAT2 promotes autophagy through constraining the PI3K/Akt1/mTOR pathway, thus inhibiting the invasion and migration of GC cells Yu et al., 2018). Another study also revealed that autophagy was associated with the proliferation of GC cells, partially due to the MALAT1 promoted by downregulating miR-204 (Shao et al., 2020). However, most of these experiments only explored the role of one or a few lncRNAs in the autophagy of GC and could not fully explain the relevant mechanisms.
To be concluded, the role of autophagy-related lncRNA in GC is complicated and controversial, which involves hundreds of molecules in this process. Therefore, a model consisted of multiple autophagy-related lncRNAs will have a better prognosis predicting accuracy than the single. For this purpose, we built a risk score model based on the multiple autophagy-related lncRNAs, which also performed well in the prediction of prognosis for the GC patients in the clinic.

Materials and Methods
Screening for the autophagy-associated lncRNAs Autophagy-associated genes were obtained through the Human Autophagy Database (HADb, http://www. autophagy.lu/). Pearson correlation coefficient was calculated using the "limma" package from R (http://www. bioconductor.org) to determine the correlation between the expression of the lncRNAs and the corresponding autophagy-related genes. Then, lncRNAs with Pearson correlation coefficient > 0.3 and p < 0.001 were filtered out to be the autophagy-related lncRNAs.

Data acquisition
All 350 patients' data of STAD were downloaded from The Cancer Genome Atlas (TCGA) website (https://portal.gdc. cancer.gov/). All the data was submitted to R software and randomly separated into the training set and testing set.

Identification of the prognostic autophagy-related lncRNAs
Kaplan-Meier and univariate Cox regression analysis was applied to select the prognostic autophagy-related lncRNA in the TCGA-STAD dataset (P < 0.05).

Survival analysis
Least absolute shrinkage and selection operator (LASSO) regression was adopted to build the model using the R package "glmnet" with prognostic autophagy-related lncRNAs selected before. In this process, only the most potent prognostic markers were chosen and constituted the optimal panel of prognosis that achieved the best performance. Each patient got a risk score (RS) incorporating the expression of the lncRNAs ðExp i Þ and the corresponding LASSO coefficients ðCoef i Þ, Risk Score = P n i¼1 Exp i ÃCoef i . The median RS was chosen to divide patients into a high-risk group and a low-risk group in the training set, testing set and TCGA-STAD dataset, respectively. Multivariate Cox regression was used to assess whether the RS could be an independent predictor. Kaplan-Meier curves were used to check the significant difference in survival probability for patients between two types of risk groups. An area under the Receiver Operating Characteristic (ROC) curve (AUC) was used to examine the accuracy. All statistics were processed through R version 3.6.3 software (https://www.r-project.org/) (p < 0.05).

Network construction and pathway analysis
The mRNA expression data of these GC patients were downloaded from the TCGA database. The mRNAs co-expressed with the lncRNAs in the risk score model were selected from the mRNA expression data by using the correlation test through the "limma" package (http://www. bioconductor.org) with R software (Pearson correlation coefficient > 0.3, p < 0.001). The network was constructed and visualized through Cytoscape software (https:// cytoscape.org/). The Sankey diagram was built using the R software. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed using the "clusterProfiler" package of R software (Gene ratio > 0.05, p < 0.05). The signaling pathways that the differentially expressed genes (DEGs) mainly participated in between two types of risk groups were distinguished through Gene Set Enrichment Analysis (GSEA).

Development of nomogram
A nomogram was constructed through the "rms" package for R (https://cran.r-project.org/web/packages/rms/), combining the risk type and other clinical features including age, gender, stage, and TNM stage to predict the individual's survival probability.

Results
Characteristics of patients TCGA-STAD dataset was patients diagnosed with stomach adenocarcinoma, which consisted of a total of 350 patients. Kaplan-Meier survival curves were plotted for the TCGA-STAD dataset regarding major clinical features (Suppl. Fig. S1).

Selecting of prognostic autophagy-related LncRNA for TCGA-STAD dataset
The autophagy-related lncRNAs were obtained by using the correlation test of autophagy-related genes (Pearson correlation coefficient > 0.3, p < 0.001). A total of 24 autophagy-related lncRNAs were significantly correlated with the survival for TCGA-STAD through Kaplan-Meier and univariate Cox regression analysis (p < 0.05) (Fig. 1).
Construction of prognostic risk score model based on autophagy-related lncRNAs All the patients from TCGA -STAD cohort were randomly separated into a training set and testing set (p > 0.05) (Tab. 1). Then all these 24 identified prognostic lncRNAs were analyzed with the LASSO regression analysis ( Fig. 2A) and 3-fold positive cross-validation (Fig. 2B) in the training set. The regression coefficient was computed to select the optimal panel of prognosis. The model achieved the best performance while 11 autophagy-related lncRNAs were included. Combining the expression of the autophagyrelated lncRNAs and LASSO coefficients (Tab. 2), each patient got an RS as a measure of survival risk.
The correlation between RS and survival time and status in STAD patients was also plotted in the training set (Fig. 3D), testing set (Fig. 3E), and TCGA-STAD dataset (Fig. 3F). In all these datasets, patients with higher RS tended to face a shorter survival time and worse survival status.
Multivariate Cox regression analysis was used to evaluate the independent prognostic indicators. The results of the multivariable analysis showed the RS was a robust and independent prognostic indicator in the training set ( Fig. 4A), testing set ( Fig. 4B), and TCGA-STAD dataset ( Fig. 4C) (p < 0.05). Kaplan-Meier analysis was also conducted on the training set ( Fig. 4D), testing set ( Fig. 4E), and TCGA-STAD dataset (Fig. 4F). Distinctly, there is a strong and significant difference between the two types of risk groups (p < 0.0005). Furthermore, a ROC curve was adopted to appraise the accuracy of the model in the training set (Fig. 4G), testing set (Fig. 4H), and TCGA-STAD dataset (Fig. 4I). The area under the curve (AUC) was applied to evaluate the accuracy of the model's   prediction in 1-year, 3-year, and 5-year. The value of AUC prompts the model to predict well in prognostic prediction.

Construction of the prognostic mRNA-LncRNA network and identification of involved signaling pathways
The mRNAs co-expressed with the lncRNAs in the model were obtained by using the correlation test with R (Suppl. Tab. S1). Then all these mRNAs and lncRNAs were used to construct the prognostic mRNA-lncRNA interaction network (Figs. 5A-5B). The most significant GO terms for biological process (BP) (Fig. 5C), cellular component (CC) (Fig. 5D), and molecular function (MF) (Fig. 5E), as well as KEGG pathways (Fig. 5F), were analyzed to reveal potential biological functions of the mRNAs in the network (Gene ratio >0.05, p < 0.05). The results showed that the mRNAs in the network were mainly involved in autophagy both in GO and KEGG analysis. GSEA analysis uncovered that the DEGs were significantly enriched in several common signaling pathways (Tab. 3).

Nomograms for personalized prognostic prediction in STAD patients
A nomogram incorporating the risk type and clinical characters was built to estimate individual survival probability quantitatively of STAD patients. The concordance index (C-index) of the nomogram was 0.682. The plot showed the 1-year, 3-year, and 5-year survival probabilities in the TCGA-STAD dataset (Fig. 6).

Discussion
Autophagy is a self-degradative process that plays an essential role in equilibrating energy, eliminating misfolded or aggregated proteins, and reacting to stimuli (Ravanan et al., 2017). To date, three types of autophagy have been found: macroautophagy, microautophagy, and chaperonemediated autophagy (Mizushima and Komatsu, 2011). Macroautophagy is considered the main form of autophagy, which is extensively studied compared to the other two types (Munz, 2017;Nishida et al., 2016;Wen and Klionsky, 2016). The process of autophagy is also classified into four critical steps: initiation, nucleation, maturation, and degradation (Feng et al., 2014). In the past decades, researchers have discovered 32 autophagy-related genes (Atgs) in yeast, most of which are also highly conserved in mammals significantly (Nakatogawa et al., 2009). In recent years, some studies have reported that autophagy has a strong relationship between the prognosis and survival in GC. Qu et al. (2017) found that Beclin1 (protein homolog of the yeast ATG6) was much overexpressed in malignant tissues than the nonmalignant tissues in GC. Moreover, they also found that overexpression of Beclin1 was related to a poor prognosis for GC. Liao et al. (2014) discovered that the "stone-like" structure pattern of LC3A (an autophagosomal biomarker) was correlated with increased recurrence and worse survival possibilities in gastric carcinoma.
As recognized previously, lncRNAs such as H19 and HOTAIR played a key role as primary regulators in the carcinogenesis of GC (Cheng et al., 2018;Endo et al., 2013;Okugawa et al., 2014). Until now, numerous studies have proved that lncRNAs were highly active in plenty of pathological processes of GC, such as proliferation and metastasis. Among them, some lncRNAs were defined as protective factors while others were risk factors (Fanelli et al., 2018;Guo et al., 2014;Li et al., 2017a;Ma et al., 2016;Zhu et al., 2019). Furthermore, several studies have proved that lncRNAs participated in the progression, especially malignant progression of GC through regulating autophagyrelated mRNAs (Chen et al., 2018;Shao et al., 2020;Wang et al., 2019a;Hu et al., 2017;Yu et al., 2018;Zhao et al., 2014;Zhu et al., 2019).
Although numerous studies have been performed and much is known about lncRNAs and autophagy in GC, previous studies mainly focused on the single lncRNA. The prognostic system that relied on the multiple autophagyrelated lncRNAs is still not clear. More importantly, as one of the deadliest cancers all over the world, prognosis evaluation of patients with GC still depends too much on the pathological analysis currently, which also facing many challenges and inconvenience in the clinic.
In this study, we systematically analyzed the prognostic value of autophagy-related lncRNAs in GC using bioinformatics and statistical tools with R software. A total of 24 lncRNAs was found strongly linked with the survival of TCGA-STAD based on Kaplan-Meier and univariate Cox regression analysis. We used the LASSO regression analysis to build the model in the training set and found 11 prognostic signatures of lncRNAs. The RS was calculated by integrating lncRNAs expression levels and corresponding LASSO coefficients for each patient. AC005586.1, AL353804.1, IPO5P1, AP003392.1, AL355574.1 and AC092574.1 were considered as protective lncRNA while LINC01705, AP001528.2, AC009948.1, HAGLR, and AP001033.2 were risk lncRNA. The accuracy of the model was tested in the testing set, and the TCGA-STAD dataset and the RS were found to significantly correspond with patient outcomes in both the testing set and TCGA-STAD dataset.
GO analysis revealed that the mRNAs in the prognostic network were mainly involved in autophagy, which is consistent with the expected results. The MF of GO analysis uncovered that these mRNAs also had a link with ubiquitin or ubiquitin-like protein ligase binding. Ubiquitin (Ub) is a protein highly conserved in all eukaryotes and bears many potential sites for additional post-translational modifications (Komander and Rape, 2012). Ub was one of the most prominent factors in modifying protein substrates and degradation (Zheng and Shabek, 2017). The proteolytic system based on ubiquitin and autophagy is two prime systems in eukaryotic cells (Ciechanover, 2015;Ciechanover and Kwon, 2017). Studies have shown that ubiquitin, as a capital regulator, has participated in all processes in the autophagy flux (McEwan and Dikic, 2011). Atg8 was a ubiquitin-like protein, which was also found crucial for the autophagosome formation and consisted of the lipid conjugation system in autophagy (Nakatogawa et al., 2007).
KEGG analysis uncovered that the mRNAs were primarily involved in the apoptosis, mTOR pathway, p53 pathway, and PI3K-Akt pathway. There are two types of autophagy-related signaling pathways. One is the mTOR-dependent pathway, such as the AMPK/mTOR and PI3K/Akt/mTOR pathways, and the other is the non-mTOR-dependent pathway, such as the p53 pathway (Cao et al., 2019). The PI3K/AKT/mTOR pathway regulates many biological processes, including autophagy, and is frequently activated in various human cancers (Aoki and Fujishita, 2017). The molecular changes in the PI3K/Akt/mTOR signaling pathway were also found could increase the clinical stage and promote the recurrence in carcinoma (Aoki and Fujishita, 2017;Morgan et al., 2009). In GC, PI3K/Akt/mTOR activation was found significantly upregulated in GC cells, which results in the inhibition of autophagy of GC cells (Morgan et al., 2009). Additionally, inhibition of the PI3K/Akt/mTOR pathway increased the autophagic flux and promoted the apoptosis of cancer cells (Saiki et al., 2011).
The results of the GSEA analysis showed that the highrisk group was much active in the ECM receptor interaction. Some studies indicated that autophagy affected the extracellular matrix (ECM), thus participating in the invasiveness and metastasis of cancer cells. In a rapidly growing and aggressive tumor, biosynthesis is obviously highly demanded. And in this process, detachment-induced autophagy will help the cancer cells get rid of ECM contact and promoting the metastasis subsequently during the advanced cancer stage (Guadamillas et al., 2011;Lock and Debnath, 2008). Autophagy will also induce metastatic cancer cells to hibernation if a steady connection was not established between the new ECM microenvironment and the cancer cells (Lu et al., 2008;Su et al., 2015).
At last, we constructed a nomogram combining the risk type and other clinical features, including age, gender, stage,  and TNM stage, which can predict an individual's clinical outcome quantitatively. By using the nomogram, every patient will get total points based on his or her various indicators, respectively. The total points will predict the patient's survival probability in 1, 3-, and 5-year. Obviously, the higher the total points are, the lower the survival probability is.

Conclusion
In conclusion, we identified a prognostic risk score model consisted of 11 autophagy-related lncRNAs based on the TCGA-STAD dataset. This model was an independent predictor and performed well for prognosis in the training set, testing set, and TCGA-STAD dataset. A nomogram incorporating the model and clinical features could accurately predict the survival rate for individual GC patients. Our finding suggests that the 11-autophagy lncRNA risk score model may help facilitate individual prediction of a patient's prognosis with GC in the clinic.
Availability of Data and Materials: The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. SUPPLEMENTARY FIGURE S1. Kaplan-Meier survival curves for TCGA-STAD dataset. Kaplan-Meier survival analysis regarding prime clinical features in GC. *p < 0.05.

SUPPLEMENTARY TABLE S1
The list of co-expressed mRNAs with the autophagy-related lncRNAs in the risk score model