Identi ﬁ cation and Validation of Two Prognostic Autophagy-related Models in Clear Cell Renal Cell Carcinoma

Background: Clear cell renal cell carcinoma (ccRCC) is one of the most frequent malignancies. Increasing evidence has highlighted the critical roles of autophagy-related genes and autophagy-related long non-coding RNA (lncRNA) in ccRCC development and progression. Therefore, it is necessary to identify novel biomarkers associated with autophagy in ccRCC. Methods: A total of 507 ccRCC patients were included in our study and then randomly divided into a training cohort (n=255) and testing cohort (n=252). Univariate Cox regression models, Lasso regression analyses and multivariate Cox regression models were successively used for constructing gene model and lncRNA model. Receiver-operating characteristic (ROC) curve analysis, Kaplan-Meier (K-M) analysis and more functional analyses were applied for verifying the accuracy of the two models. Results: The autophagy-related genes (ARGs) prognostic model was constructed based on the six ARGs (EIF4EBP1, IFNG, BID, BIRC5, CX3CL1 and RAB24) and ve autophagy-related lncRNAs (AC093278.2, AC010326.3, AC099850.3, AC016773.1 and AC009549.1) and then ccRCC patients were signicantly stratied into high- and low-risk groups in terms of overall survival (OS). K-M survival analyses indicated that low-risk group had a lower mortality rate than high-risk group in the six-gene prognostic risk model (training cohort: P=4.138e-07; testing cohort: P=1.125e-03) and the same results were obtained in the case of the ve-lncRNA prognostic risk model (training cohort: P=4.564e-09; testing cohort: P=2.485e-03). The results of time-dependent ROC curves revealed six-gene prognostic risk model had a higher area under curve (AUC) of 0.765 than the ve-lncRNA prognostic risk model at an AUC of 0.759. Therefore, the gene model is an indicator as good as the model constructed by lncRNAs. In addition, further functional analysis indicated these genes were functionally involved in regulation of endopeptidase activity, regulation of peptidase activity, autophagy, human cytomegalovirus infection, shigellosis, autophagy-animal and HIF-1 signaling pathway. Conclusions: A total of six OS-related ARGs and ve autophagy-related lncRNA were identied in our current study. The two autophagy-related prediction models including genes and lncRNAs are reliable prognostic and predictive biomarkers for ccRCC. the two gene sets (go positive regulation of autophagy and kegg regulation of autophagy) in group for six-gene prognostic risk model while another two gene sets (go selective autophagy and go regulation of autophagy) indicated a better in the ve-lncRNA risk model


Introduction
Clear cell renal cell carcinoma (ccRCC) is one of the most aggressive and common types in the urinary system with a steadily rising rate of 2-4% each year, which accounts for 70-85% of renal cell carcinoma (RCC) [1]. In 2020, it is reported that in the US there will be approximately 73,750 new cases and 14,840 deaths from kidney cancer [2]. Not only African Americans but also Chinese have both a higher incidence and mortality rates for ccRCC [3]. Despite the rapidly evolving of experimental technologies and therapeutic guidelines in this eld, such as tyrosine kinase inhibitors (TKI) [4], optimized surgery [5] and immunotherapy [6], there is no curative treatment for advanced ccRCC. Therefore, it is warranted to develop candidate therapeutic biomarkers for improving survival of ccRCC patients. Autophagy is a signi cant self-digestion process of long-lasting proteins, misfolded proteins and damaged intracellular organelles which is increasingly considered as a protective mechanism against many tumor diseases [7]. In the past few years, considerable amount of evidence suggested that autophagy played a pivotal role in the growth and progression of RCC, mainly in maintaining cellular homeostasis and providing material and energy for cell survival [8,9].
Autophagy-related genes (ARGs) and long non-coding RNA (lncRNA) have traditionally been the hotspot of scienti c research. Kong et al. revealed the expression levels of autophagy-related genes, BECN1 and ATG5, were higher in chronic lymphocytic leukemia (CLL) compared with healthy controls [10]. Liu et al. found a novel CAIF-p53-myocardin axis as a potential therapeutic target in treatment of defective autophagy-associated cardiovascular diseases [11]. To better understand the impact of ARGs and lncRNAs on ccRCC, it is necessary to construct more prediction models to improve prediction accuracy so as to help the clinical application.
In this study, the expression of ARGs and lncRNAs that related to the prognosis in ccRCC patients from The Cancer Genome Atlas (TCGA) were elected and two corresponding prognostic prediction models were built by using a series of bioinformatic tools for the rst time. Receiver-operating characteristic (ROC) curve analysis, Kaplan-Meier (K-M) methods and independent prognostic analysis were used to assess the speci city and sensitivity of these models to determine the prognostic signi cance. Besides, further function and pathway enrichment analyses were conducted to provide hints concerning the roles of the prognostic genes in ccRCC progression. These ndings demonstrated an effective multi-dimensional biomarker strategy that would be effective in monitoring autophagy alteration and predicting the prognosis in ccRCC patients.

Methods
ccRCC data source and patient information RNA-seq data of 611 ccRCC tissue samples (539 primary ccRCC tissue samples, 72 normal controls) were altogether downloaded and extracted from TCGA. Among these patients, a total of 507 primary ccRCC patients with gene expression data and clinical follow-up information was involved in the present study after removal of patients with unknown clinical information and patients whose ID numbers did not match with their corresponding expression pro le and clinical data. All 507 ccRCC patients were randomly divided into a training cohort (n = 255) for identifying prognostic model and a testing cohort for validating its prognostic value of the model (n = 252). A sum up of 232 ARGs were obtained from the Human Autophagy Database (HADb, http://www. autophagy.lu/index.html) which is an autophagydedicated database aiming to reserve human genes involved in autophagy. Four autophagy gene sets (go positive regulation of autophagy M15852, go regulation of autophagy M10281, go selective autophagy M24317 and kegg regulation of autophagy M6382) were extracted from the Molecular Signatures Database v4.0 (http://www.broadinstitute.org/gsea/msigdb/index.jsp).
Prognostic Args And Lncrnas In Two Models 'limma' package in R software was applied to estimate differentially expressed ARGs between ccRCC and their non-tumor counterparts with cutoff of |log2 Fold Change (FC)|>1 and adjusted P-value < 0.05. Similarly, the autophagy-related lncRNAs in ccRCC patients were also screened using package 'limma' in R with thresholds of|cor|>0.6 and P < 0.001. Further, the univariate Cox regression models were constructed to determine prognosis-related ARGs and lncRNAs that were signi cantly linked to overall survival (OS) using the data in the training group (P < 0.001). In order to identify optimal predictive ARGs and lncRNAs, Lasso regression analyses were performed by package 'glmnet' in R via the cross-validation likelihood (cvl) method for 1000 times. Multivariate Cox regression models were used to construct the OS prognostic risk models. A total of six differentially expressed ARGs and ve autophagy-related lncRNAs were brought into nal prognostic models. The whole process is shown in Fig. 1.

Construction Of Prognostic Risk Model
Two prognostic risk models were respectively established based on the regression coe cients of the individual genes derived from the multivariate Cox regression model and the expression level of genes.
According to the median risk score of training group as the cutoff, each set was classi ed into a high-risk group and a low-risk group. A lower risk score indicates a better prognosis for the ccRCC patients. To assess the e ciency of the two models, the survival analyses were conducted by K-M methods to compare the high-risk and low-risk groups by package 'survival' in R and the time-dependent ROC curves for 3-year and 5-year were plotted by package "survivalROC" in R.

Functional Analysis
Gene set enrichment analysis (GSEA, http://www.broadinstitute.org/gsea/msigdb/index.jsp) was performed to identify the autophagy-related gene sets between high-risk and low-risk groups for two models. Normalized enrichment score (NES) and false discovery rate (FDR) were de ned as statistical differences. Principal component analysis (PCA) of each stage of selected genes was performed by 'limma' and 'scatterplot3d' package in R. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were carried out using the DAVID (https:// david.ncifcrf.gov/) to nd the major biological attributes of these genes. Signi cant enrichment results were visualized for high-dimensional information using R package 'Goplot'. The Correlation circle maps were drawn using R language.
The Cox regression models were used to evaluate the association between expression pro les and survival status of ccRCC patients. Univariate, LASSO and multivariate analyses were performed using the Cox proportional hazards regression model. The Kaplan-Meier curve was determined by GraphPad Prism 7. ROC curve and the corresponding area under the ROC curve (AUC) calculated from the ROC curves were used to evaluate the accuracy of the models that predicted prognosis and estimate the diagnostic value of gene expression using the package of "survivalROC" in R. Hazard ratios (HR) and 95% con dence intervals (CI) were calculated. P value < 0.05 was considered statistically signi cant.

Six ARGs and ve autophagy-related lncRNAs were screened out from the training cohort
There were 507 ccRCC patients with raw expression data and corresponding clinical follow-up information extracted from TCGA project in the present study which were randomly divided into a training dataset (n = 255) and a testing dataset (n = 252). A total of 222 ARGs were integrated from HADb database. Considered as the criteria of FDR < 0.05 and |log2(FC)|>1, 45 differentially expressed genes were nally obtained between tumor and non-tumor tissues including 36 up-regulated and 9 downregulated ARGs (Fig. 2). In addition, we identi ed 467 autophagy-related lncRNAs by co-expression networks (corFilter = 0.6, P < 0.001). Univariate Cox regression analyses were utilized to single out the prognostic ARGs and lncRNAs in the training group, resulting in 12 ARGs (P < 0.001) and 79 autophagyrelated lncRNAs (P < 0.001) associated with the ccRCC patients' overall survival. Lasso regression analyses of those screened genes (  (Table 1, Table 2). There were ve ARGs (EIF4EBP1, IFNG, BID, BIRC5, and RAB24) and four autophagy-related lncRNAs (AC010326.3, AC099850.3, AC016773.1 and AC009549.1) associated with shorter survival which were regarded as prognostic risky factors whereas the remaining one ARG (CX3CL1) and one autophagyrelated lncRNA (AC093278.2) with negative coe cient of multivariate Cox regression analyses may be protective factors due to the close association between their high expression and longer patients' survival.    (Fig. 6C, D). The preliminary analyses demonstrated the ve-lncRNA model had a better competitive performance than the another in the training dataset.
With the increase in the risk scores, Fig. 5G and Fig. 6G showed the distribution of the risk score, the number of patients in two risk groups and prognostic expression pro les in the training dataset. The expression levels of ARGs (EIF4EBP1, IFNG, BID, BIRC5, and RAB24) and lncRNAs (AC010326.3, AC099850.3, AC016773.1 and AC009549.1) were up-regulated and the expression levels of remaining ARG (CX3CL1) and lncRNA (AC093278.2) were down-regulated.
To further validate the prognostic power of the two models, patients in the testing dataset were marked as a high-risk group and a low-risk group by the same cutoff derived from the training dataset. In consistent with the ndings in the training dataset, K-M survival analysis (Fig. 5B, 6B) and ROC curves analysis (Fig. 5E, F; 6E, F) compared with the training dataset were summarized in Table 3. The heatmap of expression pro les, the risk scores and distribution and patients' survival status in the TCGA database were also performed in testing dataset (Fig. 5H, 6H). The partial analyses demonstrated the six-gene model had a better competitive performance than the latter in the testing dataset.

Comparison of the different status between high-risk and low-risk in two prognostic risk model
In order to further explore these differences in statuses, PCA were performed for all genes extracted from TCGA, ARGs and autophagy-related lncRNAs and six genes and ve lncRNAs selected nally under the analyses of two models, respectively. Figure 8 indicated the divider line between the red dots and the green dots became increasingly obvious with the small number of candidate genes selected. Four typical autophagy-related gene sets were employed for gene functional enrichment analysis to identify the lowrisk group is enrich in autophagy (P < 0.01) except for 'kegg regulation of autophagy' in the lncRNA model (P = 0.4). The higher the normalized enrichment score (NES) score is obtained, the greater the enrichment of the gene set. The results showed that the two gene sets (go positive regulation of autophagy and kegg regulation of autophagy) had a greater enrichment in the low-risk group for six-gene prognostic risk model while another two gene sets (go selective autophagy and go regulation of autophagy) indicated a better enrichment in the ve-lncRNA prognostic risk model (Fig. 9).
6. Further functional analysis of the six-gene prognostic risk model and ve-lncRNA prognostic risk model To explore the primary mechanism of the gene model, GO enrichment and KEGG analysis were performed according to the differentially expressed ARGs. The ARGs were found signi cantly related to top GO biological processes according to the results of DAVID, including regulation of endopeptidase activity, regulation of peptidase activity, autophagy and process utilizing autophagic mechanism (Fig. 10, P < 0.05). The heatmap of the relationship between ARGs and GO enrichment analysis was also displayed. Moreover, 5 KEGG signaling pathways functionally involved ARGs, consisting of human cytomegalovirus infection, shigellosis, autophagy-animal, HIF-1 signaling pathway and Kaposi sarcoma-associated herpesvirus infection (Fig. 11, P < 0.05). In addition, the expression correlation between prognostic ARGs and lncRNAs and all genes extracted from ccRCC in TCGA were examined. Correlation circle maps were plotted based on genes which had both positive correlation and negative correlation genes (Fig. 12, Table 2). Color intensity is proportional to the correlation coe cients. These results suggested that the two models might affect autophagy-related biological processes and pathways involved in ccRCC.

Discussion
Although there is a growing interest to use new rst-line treatment options as non-operative interventions or therapy protocols for ccRCC, including TKI cabozantinib, the immunotherapy combination of nivolumab /ipilimumab, and the combination of atezolizumab /bevacizumab, the individual prognosis of ccRCC is commonly less than satisfactory [4,6]. A number of lines of evidence suggested autophagy played an important role in cell survival, development, and cancer progression. Liu et al. found ccRCC was associated with mono-allelic loss and/or mutation of the critical autophagy-related gene ATG7, and autophagy had an anticancer role in ccRCC tumorigenesis [12]. Messai et al. suggested EPAS1 or the autophagy sensor ITPR1 may serve as a potential therapeutic target in the future because of the improvement of NK cell responses in patients with RCC [13]. LncRNAs are a type of functional non-coding transcripts that are longer than 200 nt. Recent studies have revealed lncRNAs played crucial roles in various biological processes and regulated gene expression via diverse mechanisms in eukaryotes [14]. Genes could produce a polypeptide chain or functional RNA which consist a series of nucleotide sequences and support the basic structure and performance of life. Therefore, this is de nitely an area where additional research input is needed in investigating the relevance of the expression levels of ARGs or autophagy-related lncRNAs for ccRCC.
In our research, six ARGs (EIF4EBP1, IFNG, BID, BIRC5, CX3CL1 and RAB24) and ve autophagy-related lncRNAs (AC093278.2, AC010326.3, AC099850.3, AC016773.1 and AC009549.1) were considered closely associated with patients' survival in ccRCC. IF4EBP1 acts as a signi cant effector in mTOR signaling pathway and the protein it encodes competitively binds to eukaryotic translation initiation factor 4E (EIF4E) to inhibit EIF4E complex assembly and thus blocking translation [15,16]. IF4EBP1 is involved in the occurrence and progression of a variety of malignancies and is associated with poor survival. Cha et al. identi ed EIF4EBP1 were anomaly up-regulated in hepatocellular carcinoma, and the protein overexpression indicated a short survival and poor prognosis [17]. An autophagy-related prognostic signature including EIF4EBP1 also has correlation with distant metastasis-free survival for breast cancer [18]. Additionally, IF4EBP1 was found correlated with poor prognosis of ccRCC and was a risk factor for ccRCC survival in the model we constructed. Interferon gamma (IFNG) has been known as the regulator for both tumor immune surveillance and tumorgenesis. Dong et al. showed that immunityrelated GTPase family member 1 was important for IFNG-mediated regulation of tumor cell growth in melanoma [19]. Also, speci c IFNG was close association with autophagy and immune responses against Mycobacterium tuberculosis [20,21]. Overexpression of the BH3-interacting domain death agonist (BID) protein sensitizes certain cancer cell lines to apoptosis induced by anticancer agents so that BID fused with TAT cell penetrating peptide and doxorubicin may be considered as a potential therapeutic method for advanced prostate cancer treatment [22]. However, the mechanisms underlying the resistance of tumor cell to BID have yet been fully understood. BIRC5 is a well-known cancer therapeutic target which has remained a central goal of survivin studies in the cancer eld. It has been used in the eld of gastric cancer [23], breast cancer [24], Leukemic [25] and so on. Besides, some previous studies had found the expression levels of BIRC5 was higher in RCC tissues than in adjacent normal tissues and was potential therapeutic targets for personalized medicine [26,27]. CX3CL1 is a multifunctional in ammatory chemokine with a single receptor CX3CR1. Yao et al. revealed the expression of CX3CR1 was associated with the process of cellular migration in vitro and tumor metastasis of ccRCC in vivo by clinical and molecular cellular evidence [28]. RAB24 is an atypical RAB protein that promotes the process of late autophagy for long time. RAB24 could terminate the autophagic and endocytic pathways under nutrientrich conditions [29]. Taken together, these ndings suggest the six genes might be important regulators in ccRCC progression and might function via the involvement of the above processes and pathways.
AC099850.3 has once been used to construct the lncRNA-miRNA-mRNA ceRNA network in squamous cell carcinoma of the tongue (SCCT) in order to bene t the diagnosis and treatment of SCCT [30]. Similarly, it was reported that AC016773.1 could clarify the pathogenesis of ccRCC and might provide potential therapeutic targets in laryngeal squamous cell carcinoma (LSCC) [31,32]. The rest of three lncRNAs above are either poorly investigated or have not been reported, which means our ndings suggested further research for them is imperative.
In the current study, the expression levels of ARGs and autophagy-related lncRNAs were extracted from TCGA by performing a genome-wide analysis to identify some molecular biomarkers associated with autophagy for detecting the prognosis of ccRCC patients. In order to improve its accuracy and credibility of the models, all patients were randomly divided into two equal portions, training cohort (n = 255) and testing cohort (n = 252). A total of 12 ARGs and 79 autophagy-related lncRNAs were found in the univariate Cox regression analysis. six ARGs (EIF4EBP1, IFNG, BID, BIRC5, CX3CL1 and RAB24) and ve autophagy-related lncRNAs (AC093278.2, AC010326.3, AC099850.3, AC016773.1 and AC009549.1) were used for constructing two OS-related prediction models via further Lasso regression analyses and multivariate Cox regression, respectively. The two prediction models were then classi ed ccRCC patients into the high-risk group and low-risk group with signi cantly different overall survival. K-M curve analysis demonstrated the competitive performance of the two models for predicting 5-year survival in the training dataset and testing dataset. Next, the prognostic power of the prediction models was further validated in ROC curves and independent prognostic analyses (age, gender, grade, T-staging and M-staging), demonstrated good reliability and reproducibility of the two models and even independent prognostic indicators in predicting ccRCC patients' survival. Finally, GO and KEGG analyses suggested that these genes were functionally associated with regulation of endopeptidase activity, regulation of peptidase activity, autophagy, human cytomegalovirus infection, shigellosis, autophagy-animal and HIF-1 signaling pathway. Consequently, further functional characterization of the six ARGs and ve autophagy-related lncRNAs would be bene cial to unraveling the underlying mechanism of ccRCC.
To the best of our knowledge, this is the rst study that compares gene and lncRNA models for predicting outcome of ccRCC patients. The two models were capable to stratify ccRCC patients into two risk groups with signi cantly different OS, showing greater prognostic power and having a chance to be routinely used as candidate biomarkers in the clinical work. However, our results indicated the six-gene prognostic risk model is superior to the lncRNA one through a series of veri cation methods in a way. The main probable reason is that the gene encode proteins directly involved in biological process whereas the lncRNA's main function is to regulate the coding potential of the genome. It is also the reason why the cancer genes were rst discovered and described in 1991 but lncRNAs have just been studied for a few years [33]. In any case, all the two models are particular excellent so that we suggest ccRCC patients can determinate the ve ARGs and six lncRNAs to improve the accuracy and operability of prognosis evaluation.
However, there were some de ciencies needed to be corrected in the future. Firstly, the data only came from TCGA instead of combination of information from multiple databases. Secondly, the deeper mechanisms why ARGs modulate the initiation and progression of ccRCC have not been elucidated. Therefore, additional work is needed to make up for these shortcomings and assess the e ciency of the two models.
In brief, a total of six OS-related ARGs (EIF4EBP1, IFNG, BID, BIRC5, CX3CL1 and RAB24) and ve autophagy-related lncRNAs (AC093278.2, AC010326.3, AC099850.3, AC016773.1 and AC009549.1) were identi ed in our current study. The two models can be used as independent prognostic biomarkers in stratifying risk subgroups in terms of survival for patients with ccRCC. The gene model had similarly super performance for risk strati cation compared to the lncRNA-related model. Moreover, further research of these pivotal genes and lncRNAs may guide the tailored therapy of ccRCC. Figure 10 The ARGs were found signi cantly related to top GO biological processes according to the results of DAVID, including regulation of endopeptidase activity, regulation of peptidase activity, autophagy and process utilizing autophagic mechanism Page 27/28 Figure 11 The heatmap of the relationship between ARGs and GO enrichment analysis was also displayed.

Declarations
Moreover, 5 KEGG signaling pathways functionally involved ARGs, consisting of human cytomegalovirus infection, shigellosis, autophagy-animal, HIF-1 signaling pathway and Kaposi sarcoma-associated herpesvirus infection Figure 12 Correlation circle maps were plotted based on genes which had both positive correlation and negative correlation genes