An Autophagy-Related Gene Signature Predicts Prognosis of Early Stage Colorectal Cancer

Background and aims A certain number of early-stage colorectal cancer (CRC) patients suffer tumor recurrence after initial curative resection. In this context, an effective prognostic biomarker model is constantly in need. Autophagy exhibits a dual role in tumorigenesis. Our study aims to develop an autophagy-related gene (ATG) signature based on high-throughput data analysis for disease-free survival (DFS) prognosis of patients with stage I/II CRC. Methods Gene expression proles and clinical information of CRC patients extracted from four public datasets were divided into training cohort (GSE39582, n=566), validation cohort (TCGA CRC, n=624), and meta-validation cohort (GSE37892 and GSE14333, n=420). Autophagy genes signicantly associated with prognosis were identied. Among 655 autophagy-related genes, a 10 gene ATG signature, which was signicantly associated with DFS in the training cohort (HR, 2.76[1.56–4.82]; P=2.06×10 -4 ), was constructed. The ATG signature, stratifying patients into high and low autophagy risk groups, was validated in the validation (HR, 2.29[1.15–4.55]; P=1.5×10 -2 ) and the meta-validation cohorts (HR, 2.5[1.03–6.06]; P=3.63×10 -2 ), and proved to be prognostic in a multivariate analysis. Functional analysis revealed enrichment of several immune/ inammatory processes in the high autophagy risk group, where signicantly higher inltration of T regulatory cells (Tregs) was observed. autophagy also participates in the polarization of monocytes into TAMs[29]. More new strategies targeting TAM polarization as well as autophagy await exploration and further studies are needed to clarify the above speculations. However, our prognostic ATG signature relies on the gene expression proles from microarray platforms, which make it too expensive and time-consuming to popularize in clinical application. In addition to the dataset limitations from retrospective studies, further prospective clinical tests are recommended to validate our results. Despite the limitations, our research proposes a novel perspective to predicting the prognosis of early CRC patients and offers valuable insights into the relationship between autophagy, immune/ inammatory response, and tumorigenesis.

calibrate the gene expression levels in the TCGA cohort. The 'combat' algorithm of the R package 'sva' and the z-scores were used to correct the batch effects, to standardize microarray data across multiple experiments, and compare them independent of the original hybridization intensities.

Construction of a prognostic autophagy-related gene (ATG) signature
To construct a prognostic ATG signature, we focused on 655 ATGs from 8 gene sets identi ed via MSigDB (version 6.2) [9][10][11] with the keyword "autophagy". Only 617 genes measured on all platforms involved in this study with high variation (determined by median absolute deviation (MAD) >0.5) were selected. After 1000 times of random Cox univariate regressions, genes with repeated signi cance, which indicated a strong relationship between ATGs and patients' DFS, were selected as candidates for the signature. A Cox proportional hazards regression model on CRC samples together with the least absolute shrinkage and selection operator (LASSO) were applied to minimize the risk of over-tting as well as to generate a risk model.
Patients were divided into low and high autophagy risk groups in accordance with the optimal ATG signature cutoff, which was de ned by the time-dependent receiver operating characteristic (ROC) curve analysis at 5 years of disease-free survival (DFS) in the training cohort.
The ATG signature by the shortest distance between the ROC curve and the point at 100% true positive rate and a 0% false-positive rate was deemed as the cutoff value.

Validation of ATG signature
C-index was employed in the training, independent validation, and meta-validation cohorts respectively to assess the predictive capability of the model. For further validation, the prognostic value of the ATG signature was evaluated in CRC patients with early stages (stage I&II) and all stages in different cohorts through survival analysis. Univariate and multivariate analyses of the ATG signature and available clinical parameters was performed to assess whether the ATG signature is an independent risk factor.

Functional analysis
Enrichment of potential pathways of the ATG signature by gene ontology (GO) analysis was performed on gPro ler (https://biit.cs.ut.ee/gpro ler/), and gene set enrichment analysis (GSEA) [12] was conducted using the Bioconductor package'fgsea'. Gene sets of cancer hallmarks from MSigDB with statistical signi cance (FDR-adjusted P<.05) were selected [13]. CIBERSORT [14] was used to dissect immune cell in ltration in different risk groups.

Statistical analysis
All statistical analyses were performed in R software (version 3.5.1). Categorical variables were reported as count. Continuous data were reported as mean with standard deviation (SD) and compared with the student's t-test. The LASSO regression was plotted using the 'glmnet' R package (version: 2.0-16). Time-dependent ROC curve analysis was done by the R package 'survivalROC' (version: 1.0.3).
Survival analysis was conducted using the Kaplan-Meier method and compared with the log-rank test. Univariate and multivariate analysis of ATG signature and clinical parameters were performed using the log-rank test. The statistical signi cance level was set at α=0.05, two-sided.

ATG signature establishment
After ltration with MAD >0.5, 617 genes measured on all platforms were selected for this study. By 1000 times of random Cox univariate regressions, 58 ATGs were identi ed to be strongly relevant to DFS and considered as candidates for the signature (Figure 1). A LASSO Cox regression in stage I and II patients in the training cohort (Table 1)  Based on the time-dependent ROC curve analysis at 5 year-DFS in the training cohort, the optimal cutoff of ATG signature that divided the patients into high and low autophagy risk groups was de ned as -0.009 ( Figure 3). The risk scores of all patients are shown in Supplementary Table 1. Survival analysis showed the DFS rate was higher in the low autophagy risk group compared to the high autophagy risk group for patients with early stages (stage I and II) in the training cohort (GSE39582) ( Figure 4A, 4B, 4C, HR, 2.76 [1.56-4.82]; P=2.06×10 -4 ). So was indicated for patients in all stages in the GSE39582 dataset ( Figure 5A, 5B, 5C, HR, 1.7[1.25-2.31]; P=5.21×10 -4 ).

Validation of the ATG signature
To assess the predictive capability of the risk model, C-index was rst applied to various cohorts which turned out to be 0.74 (95%CI, 0.63 -0.85) in the GSE39582 cohort, 0.70 (95%CI, 0.54 -0.85) in the TCGA cohort and 0.70 (95%CI, 0.51 -0.89) in the meta-validation cohort (Table 3), higher than those of Oncotype DX colon. To further verify whether the ATG signature predicted well in various populations, we employed the same formula to the independent validation cohort (TCGA) and the meta-validation cohort (GSE37892 and GSE14333).  (Table 4).  3.3 Functional analysis of the ATG signature GO analysis and GSEA were carried out to explore the biological function and signaling pathways of genes from the ATG signature. GO analysis revealed that the genes within the ATG signature were mostly involved in the regulation of autophagy and catabolic processes ( Figure 6A and Supplementary Table 2). GSEA was performed between different risk groups to further investigate the pathways that were signi cantly affected. We found a signi cant enrichment in multiple immune/ in ammatory pathways in the high autophagy risk group, including the IL6/JAK/STAT3 signaling pathway, the IL2/STAT5 pathway, the IFN-αpathway, the IFN-γ pathway, the in ammatory response pathway ( Figure 6B, P value<0.005). Some cell cycle/ metabolism associated pathways, including the G2-M, oxidative phosphorylation, E2F, MYC were also signi cantly enriched in the high autophagy risk group, in addition to a few classic pathways like mTORC1 and epithelial-mesenchymal transmission (EMT; P value<0.005).
As there was a signi cant enrichment in the immune/ in ammatory pathway through GSEA analysis, we conducted immune in ltration analysis. ESTIMATE algorithm displayed signi cant differences in the immune score (P=0.02) and ESTIMATE score (P=0.027) between high and low autophagy risk groups in the TCGA CRC cohort ( Figure 7A). In ltration of plasma cells and Tregs were enriched signi cantly in the high autophagy risk group compared with the low autophagy risk group in the GSE39582 and TCGA cohort ( Figure 7C). By contrast, M1 macrophage in ltration turned out to be signi cantly lower in the high autophagy risk group ( Figure 7C).

Adjuvant chemotherapy effects on different autophagy risk groups
Survival analysis conducted for different autophagy risk groups with and without adjuvant chemotherapy respectively, showed that for early stage CRC patients without adjuvant chemotherapy in the high autophagy risk group displayed poorer DFS in the GSE39582 cohort

Discussion
Thanks to the improved awareness of cancer screening, CRC is now detected at an early stage, resulting in a better rate of survival.
Surgery without chemotherapy, which was deemed as the curative treatment, was carried out on the majority of patients with stage I/II colon cancer and in some cases of stage I/II rectal cancer [2]. Indeed, it enabled prevention from unnecessary side effects of chemotherapy. Nevertheless, more than 20% of patients with stage I/II CRC who underwent surgical resection still suffered recurrence [13].
Quite a few multigene prognostic signatures have been developed for CRC, but none of them graduated to widespread application due to the uncertainty of prognostic accuracy. Accordingly, an effective prognostic model composed of multiple biomarkers to distinguish earlystage patients with a high risk of recurrence is crucial and necessary for elective adjuvant chemotherapy or other targeted treatments.
Emerging studies have revealed that autophagy functions diversely in the development, maintenance, and progression of tumors. While autophagy may prevent cellular cancerous transformation in normal tissue, it acts as a survival mechanism in established tumors, especially under stress conditions and in response to chemotherapy [14]. Several autophagy inhibitors and activators have been brought up as improved chemotherapeutic options for cancer treatment [15], but without su cient clinically signi cant results, especially in CRC.
Accordingly, further investigation on the biological mechanism of autophagy in the tumor microenvironment deserves attention, and more targets associated with autophagy awaits to be found. As we looked over the genes within the ATG signature, some of them have been found correlated to CRC, but mostly bear contextdependent biological functions in cancers, similar to autophagy. For example, the cytosolic histone deacetylase 6 (HDAC6) served as a tumor suppressor in hepatocellular carcinogenesis [16], while another study revealed HDAC6 inhibitor signi cantly suppressed the proliferation and viability, and induced apoptosis in CRC cells, where autophagy activation was observed [17]. Elevated Sirtuin 2 (SIRT2) was found to be associated with poor prognosis in CRC patients via its participation in tumor angiogenesis [18]. Meanwhile, in a separate study SIRT2 was found to be downregulated in colon cancer biopsies compared to adjacent noncancerous tissues, and overexpression of SIRT2 inhibited the proliferation and metastatic progression of SW480 cells [19]. In terms of autophagy-related functions, a previous investigation reported that in response to oxidative stress or serum starvation, SIRT2 dissociated as acetylated FOXO1, which later bound to autophagy protein 7 (ATG7) and induced autophagy in tumors [20]. As these inconsistencies make it di cult to clarify the role of autophagy in CRC, we further investigated the biological functions of the ATG signature, expecting to nd some clues in the autophagyrelated functions in tumors.
GSEA revealed that the ATG signature included genes that were robustly involved in multiple immune/ in ammatory pathways including IL6/JAK/STAT3, IL2/STAT5, IFN-α, IFN-γ, TNF-α/NF-κB, and the in ammatory response presented a particular relation to CRC proliferation or prognosis as previous studies revealed [21][22][23][24][25]. As our ndings suggested that a high autophagy risk score correlated with the downregulation of these immune/ in ammatory pathways, we speculated that autophagy might play a role in CRC tumorigenesis and tumor proliferation via an anti-immune/ anti-in ammatory response. Moreover, increased in ltration of Tregs and decreased in ltration of M1 macrophages observed in the high autophagy risk group during immune in ltration analysis seemingly catered to the anti-immune/ antiin ammatory response. No signi cant difference in cytotoxic T lymphocyte was found in the immune in ltration analysis. Tregs are known to suppress both antibody-mediated and cell-mediated immune responses [26]. The pro-in ammatory M1 macrophages, a phenotype of tumor-associated macrophages (TAMs), correlated to a better prognosis in CRC patients for its tumor-suppressing function [27,28]. By triggering an anti-immune or anti-in ammatory response, autophagy might promote polarization or re-polarization towards the M2 phenotype spontaneously and thus lead to the decrease of M1 in ltration observed. Previous studies have described the link between autophagy and macrophage polarization in tumor microenvironment. For example, mTOR which functions as a conserved kinase protein in the regulation of autophagy also participates in the polarization of monocytes into TAMs [29]. More new strategies targeting TAM polarization as well as autophagy await exploration and further studies are needed to clarify the above speculations.
However, our prognostic ATG signature relies on the gene expression pro les from microarray platforms, which make it too expensive and time-consuming to popularize in clinical application. In addition to the dataset limitations from retrospective studies, further prospective clinical tests are recommended to validate our results. Despite the limitations, our research proposes a novel perspective to predicting the prognosis of early CRC patients and offers valuable insights into the relationship between autophagy, immune/ in ammatory response, and tumorigenesis.

Conclusion
In conclusion, our study established a prognostic ATG signature that can effectively predict DFS for early-stage CRC patients. Meanwhile, we also revealed the possible correlation between ATG signature and immune/ in ammatory pathways.   Discovered ATG signature for prognostic prediction of colorectal cancer. (A). Ten ATGs signature found from the LASSO Cox regression.
(B). Heatmap of the identi ed 10-gene signature.  the TCGA cohort (F) and meta-validation cohort (I), respectively. P values comparing risk groups were calculated with the log-rank test.

Figure 5
The association of the ATG signature with DFS in CRC patients with all stages. Patients with CRC of early stage were ranked by autophagy risk scores in the training cohort (A), the TCGA cohort (D) and the meta-validation cohort (G). Time-dependent ROC curves for DFS in CRC patients achieved with different duration in the training cohort (B), the TCGA cohort (E) and the meta-validation cohort (H). Kaplan-Meier curves showed DFS of CRC patients in low and high autophagy groups in training cohort (C), the TCGA cohort (F) and meta-validation cohort (I), respectively. P values comparing risk groups were calculated with the log-rank test. Hazard ratios (HRs) and 95% CIs are for low vs high autophagy risk. CRC = colorectal cancer, ATG = autophagy-related gene, DFS = disease-free survival.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.