Effect of Aging-Related Genes on the Prognosis of Colon Cancer

Background: Colon cancer is a common malignant tumor with high incidence and poor prognosis. Cell senescence and apoptosis are important mechanisms of […], and aging-related genes (ARGs) play an important role in both. This study aimed to establish a prognostic risk model based on ARGs for the diagnosis and prognosis prediction of colon cancer.
Methods: We downloaded transcriptome data and clinical information of colon cancer patients from The Cancer Genome Atlas (TCGA) database and the microarray dataset GSE39582 from the Gene Expression Omnibus (GEO) database. Univariate Cox regression, the least absolute shrinkage and selection operator (LASSO) regression algorithm and multivariate Cox regression analysis were used to construct a 6-ARG prognostic model and to calculate the riskScore. The prognostic signature was validated in an internal validation cohort and an external validation cohort (GSE39582). In addition, the enriched functional pathways and the immune microenvironment associated with ARGs were analyzed. We also analyzed the correlation between riskScore and clinical features and constructed a nomogram based on riskScore. We are the first to construct a prognostic nomogram based on ARGs.
Results: Through univariate Cox, LASSO and multivariate Cox regression analysis, 6 prognostic ARGs (PDPK1, RAD52, GSR, IL7, BDNF and SERPINE1) were identified and used to construct the riskScore. We verified that the riskScore has good prognostic value in both the internal and the external validation cohorts. Pathway enrichment and immune analysis of ARGs suggest directions for the treatment of colon cancer patients. We also found that riskScore was closely related to the clinical characteristics of patients. Based on riskScore and related clinical features, we constructed a nomogram with good predictive performance.
Conclusion: The 6-ARG prognostic signature we constructed has a certain clinical predictive ability. Its riskScore is also closely related to clinical characteristics, and the nomogram based on it has stronger predictive ability than any single indicator. ARGs and the nomogram we constructed may provide promising guidance for the treatment of colon cancer patients.

reduce the risk of cancer (8,9). Cell apoptosis refers to programmed cell death controlled by genes. Aging-related genes (ARGs) play an important role in both cell senescence and apoptosis. By promoting tumor cell senescence and apoptosis, the expression of senescence genes can inhibit tumor cell proliferation, activate the body's specific immune response, and promote the killing and elimination of tumor cells (5,10,11). However, high expression of some ARGs can also inhibit tumor cell senescence and apoptosis (12,13). Abnormal activation of multiple signaling pathways, such as PI3K/AKT, TGFβ/SMADs, RAS, p53/p21 and p16/Rb, is known to be involved in tumor senescence or apoptosis (14,15).
Because of the tumor heterogeneity of colon cancer, diagnosis, treatment and prognosis differ between patients. Treatment plans should therefore be tailored to individual risk factors and genetic factors, and studying molecular characteristics at the genomic level can provide further insight into treatment and prognosis. Moreover, the prognosis and molecular mechanisms of colon cancer differ by tumor site (16,17), so understanding the molecular changes in colon cancer and finding new biomarkers is of great significance for prognosis. To study the prognosis of colon cancer in more depth, we constructed an aging-related gene (ARG) signature based on The Cancer Genome Atlas (TCGA) database and validated it internally, and we also validated it externally on GSE39582 from the Gene Expression Omnibus (GEO) database. Finally, we constructed a clinical risk prognosis model based on the ARG signature to further improve predictive performance.

Data sources and processing
We downloaded transcriptome data and clinical information of colon cancer patients from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov). The microarray dataset GSE39582, containing 585 cases, was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Patients without overall survival (OS) time were excluded, leaving 339 cases from TCGA and 557 cases from GSE39582. To identify ARGs affecting OS in colon cancer patients, we used the sample function in R to randomly divide the TCGA patients in a 1:1 ratio into a development cohort (n=170) and an internal validation cohort (n=169). The cases from GSE39582 were used as the external validation cohort. A total of 307 human ARGs were obtained from the Human Aging Genomic Resources (https://genomics.senescence.info/genes/); the full list is shown in Table S1.
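The 1:1 random split can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the seed value and the patient identifiers are hypothetical, and giving the extra patient of an odd total to the development cohort is an assumption chosen to reproduce the reported 170/169 sizes.

```python
import random

def split_cohort(patient_ids, seed=2021):
    """Randomly split patients 1:1 into development and internal validation cohorts.

    The seed is a hypothetical choice; the study does not report one.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(patient_ids, k=len(patient_ids))  # random permutation
    n_dev = (len(shuffled) + 1) // 2  # an odd total puts the extra patient in development
    return shuffled[:n_dev], shuffled[n_dev:]

# Hypothetical identifiers for the 339 TCGA patients with OS data
ids = [f"TCGA-{i:04d}" for i in range(339)]
dev, val = split_cohort(ids)
print(len(dev), len(val))  # 170 169
```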

Screening of prognostic ARGs
All ARGs were included in our study. To identify prognosis-related ARGs (P<0.05), univariate Cox regression (18) was performed in the development cohort with the 'survival' package in R. Because too many prognostic ARGs remained, the least absolute shrinkage and selection operator (LASSO) regression algorithm (19), which adds a penalty term, was used to remove ARGs with multicollinearity.
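The univariate screening step can be illustrated with a one-covariate Cox model fitted by Newton-Raphson on the partial likelihood, with a Wald test supplying P. This Python sketch assumes Breslow handling of tied times and uses simulated, hypothetical genes (GENE_A, GENE_B); it is not the 'survival' package's implementation (coxph defaults to the Efron method for ties), and the subsequent LASSO step is not shown.

```python
import math
import random

def univariate_cox(time, event, x, max_iter=25, tol=1e-9):
    """One-covariate Cox model by Newton-Raphson (Breslow ties).

    Returns (coefficient, hazard ratio, two-sided Wald p-value).
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(len(time)):
            if not event[i]:
                continue
            # Risk set at time[i]: patients whose follow-up has not yet ended
            risk = [j for j in range(len(time)) if time[j] >= time[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the last iteration (converged)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, math.exp(beta), p_value

def screen_prognostic_genes(time, event, expression, alpha=0.05):
    """Keep genes whose univariate Cox p-value is below alpha."""
    hits = {}
    for gene, values in expression.items():
        coef, hr, p = univariate_cox(time, event, values)
        if p < alpha:
            hits[gene] = {"coef": coef, "HR": hr, "p": p}
    return hits

# Hypothetical simulated cohort: GENE_A raises the hazard (true log-HR = 1.0), GENE_B is noise
rng = random.Random(1)
n = 200
gene_a = [rng.gauss(0, 1) for _ in range(n)]
gene_b = [rng.gauss(0, 1) for _ in range(n)]
event_time = [rng.expovariate(math.exp(a)) for a in gene_a]
censor_time = [rng.expovariate(0.3) for _ in range(n)]
time = [min(e, c) for e, c in zip(event_time, censor_time)]
event = [e <= c for e, c in zip(event_time, censor_time)]

hits = screen_prognostic_genes(time, event, {"GENE_A": gene_a, "GENE_B": gene_b})
print({g: round(v["HR"], 2) for g, v in hits.items()})
```

The Wald test is one of several possible choices; likelihood-ratio or score tests would give similar screening results in cohorts of this size.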
Construction of the prognostic ARG signature and generation of the riskScore
The 6-ARG prognostic model was established through multivariate Cox regression analysis (20,21) with bidirectional stepwise elimination, using the 'glmnet' package. We performed an independent prognostic analysis of the riskScore in the development cohort. The riskScore was calculated as (22):
$$\text{riskScore} = \sum_{i=1}^{n} \text{Coef}_i \times \text{Exp}_i$$
where n is the number of ARGs in the signature, Exp_i is the expression level of the i-th ARG, and Coef_i is its regression coefficient estimated by the Cox proportional hazards (Cox-PH) model. Patients were divided into high-risk and low-risk groups according to the median riskScore.

Validation of the prognostic ARG model
Kaplan-Meier (K-M) survival analysis was performed to validate the predictive power of the model, and receiver operating characteristic (ROC) curves (23) were plotted to evaluate its discriminative performance. For the 6 ARGs in the model, we compared their expression between the normal group and the cancer group. In addition, we visualized their expression in the high- and low-risk groups with the 'pheatmap' package (24) in R. All analyses were performed in the development cohort, internal validation cohort and external validation cohort.
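The K-M comparison between risk groups rests on the product-limit estimator and, typically, a log-rank test. The Python sketch below illustrates both on a hypothetical six-patient example; the study does not state which test produced its group comparison P value, so the log-rank test here is an assumption.

```python
import math

def kaplan_meier(time, event):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(time, event))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve

def logrank_test(time, event, high_risk):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    event_times = sorted({t for t, e in zip(time, event) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(time, high_risk) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(time, event, high_risk) if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected events in the high-risk group
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical follow-up times (years), event flags (1 = death) and risk groups (1 = high)
time = [1, 2, 2, 3, 4, 5]
event = [1, 1, 0, 1, 0, 1]
group = [1, 1, 1, 0, 0, 0]
print(kaplan_meier(time, event))  # approx [(1, 0.833), (2, 0.667), (3, 0.444), (5, 0.0)]
print(logrank_test(time, event, group))
```

In practice one K-M curve is drawn per risk group, and the log-rank statistic summarizes whether the curves differ.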

Gene Ontology analysis of ARGs
To explore the pathways associated with ARGs, Gene Ontology (GO) enrichment analysis (25) was performed using a free online platform for data analysis and visualization (http://www.bioinformatics.com.cn). The most significantly enriched terms in the Molecular Function (MF), Biological Process (BP) and Cellular Component (CC) categories were visualized.
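GO over-representation is commonly assessed with a one-sided hypergeometric test: given N background genes, K of which carry a GO term, the probability of seeing at least k term genes in a list of n. The online platform's exact method is not described, so this Python sketch is an assumption illustrating the general technique; the background size, term size and overlap are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb

def hypergeometric_enrichment_p(overlap, list_size, term_size, background_size):
    """One-sided over-representation p-value, P(X >= overlap), X ~ Hypergeometric."""
    upper = min(list_size, term_size)
    numerator = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(background_size, list_size)

def fold_enrichment(overlap, list_size, term_size, background_size):
    """Observed overlap divided by the overlap expected by chance."""
    expected = list_size * term_size / background_size
    return overlap / expected

# Hypothetical numbers: 307 ARGs, 20,000 background genes, a 200-gene term, 30 shared genes
p = hypergeometric_enrichment_p(overlap=30, list_size=307, term_size=200, background_size=20000)
fe = fold_enrichment(30, 307, 200, 20000)
print(f"fold enrichment = {fe:.2f}, P = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the overflow that factorial-based formulas hit at genome scale.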

Immune microenvironment analysis
Immune-related gene sets covering 29 immune cell types and immune-related functions were obtained from previous studies (26-31). We used the single-sample gene set enrichment analysis (ssGSEA) algorithm in the 'GSVA' package in R (32) to score these 29 immune cell types and functions in each sample. To further analyze immune infiltration in the high- and low-risk groups, we also calculated the immune score, stromal score, ESTIMATE score and tumor purity using the 'estimate' package. Differences between the two groups were compared with the Mann-Whitney U test, and the results were visualized with the 'pheatmap' package (24).
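The group comparison can be illustrated with a minimal Mann-Whitney U test using the normal approximation with tie correction. Unlike R's wilcox.test, this Python sketch applies no continuity correction and never computes exact p-values, so small-sample results will differ slightly; the stromal scores in the example are hypothetical.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U for sample a, p-value).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical stromal scores for the low- and high-risk groups
low_risk_stromal = [-310.5, -120.2, 15.8, 80.1, 140.0]
high_risk_stromal = [60.3, 210.7, 305.9, 410.2, 520.4]
print(mann_whitney_u(low_risk_stromal, high_risk_stromal))
```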

Nomogram based on riskScore and clinical factors
We compared the riskScore and the expression of the signature ARGs among patients with different clinical characteristics to assess the correlation between riskScore and clinical characteristics. A nomogram was constructed in the development cohort to predict 1-, 3- and 5-year OS. To evaluate the clinical utility of the model, decision curve analysis (DCA) was performed and calibration curves of the nomogram were plotted. These analyses were performed in the development cohort, internal validation cohort and external validation cohort.
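DCA compares models by their net benefit across threshold probabilities p_t, where net benefit = TP/n - (FP/n) × p_t/(1 - p_t), against the "treat all" and "treat none" (net benefit 0) strategies. The Python sketch below uses this standard formula for a binary outcome (death by a fixed horizon); it ignores censoring, which survival-specific DCA implementations account for, and the predicted risks and outcomes are hypothetical.

```python
def net_benefit(predicted_risk, died_by_horizon, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(died_by_horizon)
    tp = sum(1 for r, y in zip(predicted_risk, died_by_horizon) if r >= threshold and y)
    fp = sum(1 for r, y in zip(predicted_risk, died_by_horizon) if r >= threshold and not y)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by the threshold
    return tp / n - fp / n * odds

def net_benefit_treat_all(died_by_horizon, threshold):
    """Reference strategy: treat every patient."""
    prevalence = sum(died_by_horizon) / len(died_by_horizon)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical 5-year mortality risks from a model and observed outcomes (1 = died)
risk = [0.9, 0.8, 0.3, 0.2, 0.6]
died = [1, 1, 0, 0, 1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(risk, died, pt), net_benefit_treat_all(died, pt))
```

A model is clinically useful at a given threshold when its curve lies above both reference strategies.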

Results
Establishment of the prognostic model based on ARGs
To screen ARGs associated with survival in colon cancer patients, we applied univariate Cox regression (P<0.05) to the development cohort. Twenty-six prognostic ARGs were identified (Table 1). We then used the LASSO algorithm for further screening (Figure 1A,B), which retained 8 ARGs. Finally, multivariate Cox regression analysis with bidirectional stepwise elimination was used to construct the 6-ARG prognostic signature (Figure 1C, Table 2). The riskScore was calculated as:
$$\text{riskScore} = -1.450 \times \text{PDPK1} + 1.478 \times \text{RAD52} - 0.665 \times \text{GSR} - 0.708 \times \text{IL7} + 1.413 \times \text{BDNF} + 0.430 \times \text{SERPINE1}$$
Patients were assigned to the high-risk or low-risk group according to the median riskScore.
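Applying the signature to new samples amounts to a weighted sum followed by a median split. This Python sketch uses the coefficients reported above; the sample identifiers and expression values are hypothetical, the expression values are assumed to be on the same normalized scale used to fit the model, and assigning patients exactly at the median to the low-risk group is an assumption.

```python
from statistics import median

# Coefficients of the 6-ARG signature as reported in the text
COEF = {
    "PDPK1": -1.450, "RAD52": 1.478, "GSR": -0.665,
    "IL7": -0.708, "BDNF": 1.413, "SERPINE1": 0.430,
}

def risk_score(expression):
    """Weighted sum of signature-gene expression (expression: gene -> value)."""
    return sum(coef * expression[gene] for gene, coef in COEF.items())

def assign_risk_groups(samples):
    """Split samples at the median riskScore; returns (groups, scores, cutoff)."""
    scores = {sid: risk_score(expr) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: "high" if s > cutoff else "low" for sid, s in scores.items()}
    return groups, scores, cutoff

def profile(**overrides):
    """Hypothetical expression profile: all genes 0.0 unless overridden."""
    base = {gene: 0.0 for gene in COEF}
    base.update(overrides)
    return base

# Hypothetical samples (not study data)
samples = {
    "S1": {gene: 1.0 for gene in COEF},
    "S2": profile(),
    "S3": profile(RAD52=2.0),
    "S4": profile(PDPK1=2.0),
}
groups, scores, cutoff = assign_risk_groups(samples)
print(groups)  # {'S1': 'high', 'S2': 'low', 'S3': 'high', 'S4': 'low'}
```

In a validation cohort the median cutoff would normally be fixed from the development cohort or recomputed per cohort; the text does not say which, so either choice is an assumption.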

Prognostic value of the 6-ARG prognosis signature
To further analyze the ARGs in the model, we compared their expression between the normal and cancer groups (Figure 2), with P<0.05 defined as statistically significant. All ARGs except RAD52 were differentially expressed. According to Naccarati A et al. (33), the TT genotype of RAD52 rs11226 is associated with longer survival in colon cancer patients; given the limited size of our dataset, our RAD52 result may be biased. Patients in the development cohort were divided into high-risk and low-risk groups based on the median riskScore. Kaplan-Meier (K-M) survival analysis showed that patients in the high-risk group had significantly worse survival than those in the low-risk group (Figure 3A). The area under the curve (AUC) of the riskScore for 1-, 3- and 5-year OS was 0.87, 0.81 and 0.80, respectively (Figure 3B). Scatter plots show the distribution of patients in the high- and low-risk groups (Figure 3C), and the distribution of patient riskScores is shown in Figure 3D. The heatmap shows the cluster analysis of the ARGs in the model (Figure 3E). The same analyses were performed in the internal validation cohort (Figure S1) and the external validation cohort (Figure S2). In the internal validation cohort, the AUCs of the riskScore for 1-, 3- and 5-year OS were 0.65, 0.59 and 0.67, and in the external validation cohort they were 0.61, 0.58 and 0.59, respectively.
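A time-dependent AUC at a given horizon can be read as the probability that a patient who dies by that time has a higher riskScore than a patient still alive after it. The Python sketch below computes this naive cumulative/dynamic estimate on hypothetical data; it simply drops patients censored before the horizon, whereas packages such as R's timeROC weight observations by the inverse probability of censoring, so values will generally differ.

```python
def cumulative_dynamic_auc(time, event, score, horizon):
    """Naive time-dependent AUC at `horizon` (no censoring weights).

    Cases: event at or before the horizon. Controls: follow-up beyond the horizon.
    Patients censored before the horizon are excluded.
    """
    cases = [s for t, e, s in zip(time, event, score) if t <= horizon and e]
    controls = [s for t, s in zip(time, score) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0  # case ranked above control
            elif c == k:
                concordant += 0.5  # ties count half
    return concordant / (len(cases) * len(controls))

# Hypothetical follow-up (years), event flags and riskScores
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 0, 1, 0, 0]
score = [0.9, 0.8, 0.7, 0.15, 0.2, 0.1]
print(cumulative_dynamic_auc(time, event, score, horizon=2.5))  # 1.0
print(cumulative_dynamic_auc(time, event, score, horizon=4.5))
```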
To explore the functions and pathways associated with ARGs, GO enrichment analysis was performed (Figure 4). The term with the highest enrichment score was aging in BP, transcription regulator complex in CC, and DNA-binding transcription factor binding in MF. Mole DJ et al. (34) showed that a transcription regulatory complex (TRC) regulates osteopontin, which is implicated in colorectal cancer dissemination, suggesting that ARGs may be a key link between TRCs and colorectal cancer. DNA-binding transcription factor binding also plays an important role in protecting telomeres against cell aging through the activation of telomerase (35).

Immune microenvironment landscape
We scored 29 immune cell types and immune-related functions with the ssGSEA algorithm and calculated the immune score, stromal score, ESTIMATE score and tumor purity to explore differences in immune infiltration between the high- and low-risk groups (Figure 5). iDCs, NK cells and Th2 cells showed higher levels in the low-risk group, and the stromal score differed significantly between the two groups. Previous studies (36) have shown that the stromal score is related to the survival of colon cancer patients, suggesting that the role of ARGs in the tumor microenvironment deserves attention.

Construction of nomogram
Univariate Cox (Figure 7A) and multivariate Cox regression analysis (Figure 7B) showed that riskScore was consistently statistically significant. Considering the correlation between riskScore and clinical features, we constructed a nomogram (Figure 7C) and evaluated its predictive power. The AUCs of the nomogram for 1-, 3- and 5-year OS were 0.85, 0.90 and 0.83 in the development cohort (Figure 7D), 0.84, 0.79 and 0.73 in the internal validation cohort (Figure 7E), and 0.72, 0.67 and 0.65 in the external validation cohort (Figure 7F). Compared with riskScore alone, the nomogram based on riskScore has stronger predictive ability and may provide useful guidance for clinical practice. We also plotted calibration curves and performed DCA in the development cohort (Figure 8), internal validation cohort (Figure S3) and external validation cohort (Figure S4). Compared with stage, the nomogram based on riskScore showed higher predictive performance in all three cohorts. Therefore, the nomogram we constructed has more predictive power than any single indicator.
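A Cox-based nomogram converts each predictor's contribution to the linear predictor (lp) into points and maps the total to survival probabilities through S(t | x) = S0(t)^exp(lp). The Python sketch below illustrates this mechanism only: the predictors, coefficients, ranges, reference patient and baseline survival values are hypothetical placeholders, because the fitted nomogram parameters are not reported in the text.

```python
import math

# Hypothetical model for illustration only: NOT the study's fitted nomogram.
COEFS = {"riskScore": 0.9, "age_decades": 0.3, "stage": 0.6}
REFERENCE = {"riskScore": 0.0, "age_decades": 6.0, "stage": 2.0}  # baseline patient's covariates
BASELINE_SURVIVAL = {1: 0.95, 3: 0.85, 5: 0.78}  # S0(t) of the reference patient at years 1, 3, 5

def linear_predictor(patient):
    """Cox linear predictor relative to the reference patient."""
    return sum(beta * (patient[name] - REFERENCE[name]) for name, beta in COEFS.items())

def predicted_os(patient):
    """Predicted 1-, 3- and 5-year OS via S(t | x) = S0(t) ** exp(lp)."""
    hazard_ratio = math.exp(linear_predictor(patient))
    return {year: s0 ** hazard_ratio for year, s0 in BASELINE_SURVIVAL.items()}

def nomogram_points(patient, ranges):
    """Points per predictor; the predictor with the widest effect range spans 0-100."""
    spans = {name: abs(COEFS[name]) * (hi - lo) for name, (lo, hi) in ranges.items()}
    max_span = max(spans.values())
    points = {}
    for name, (lo, hi) in ranges.items():
        # Zero points at the end of the range that contributes least to lp
        zero_point = lo if COEFS[name] >= 0 else hi
        points[name] = 100 * abs(COEFS[name]) * abs(patient[name] - zero_point) / max_span
    return points

# Hypothetical predictor ranges and patient
ranges = {"riskScore": (-2.0, 2.0), "age_decades": (3.0, 9.0), "stage": (1.0, 4.0)}
patient = {"riskScore": 2.0, "age_decades": 6.0, "stage": 3.0}
print(nomogram_points(patient, ranges))
print(predicted_os(patient))
```

Because total points are a linear rescaling of lp, reading survival off the nomogram's bottom axes is equivalent to evaluating `predicted_os` directly.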

Discussion
Colon cancer is one of the most common malignant tumors worldwide, with high mortality and poor prognosis. In 2018 there were about 1.1 million new cases, accounting for about 6.1% of all cancer cases, and about 550,000 deaths, accounting for about 5.8% of all cancer deaths; in 2020 there were 930,000 deaths, or 10% of all cancer deaths (1,37). Therefore, finding suitable treatment strategies and improving the prognosis of colon cancer patients is particularly important. Because of the tumor heterogeneity of colon cancer, predicting patient prognosis is difficult, and prediction based on traditional factors does not meet clinical needs (38). Many ARGs have been reported as prognostic markers for colon cancer. In this study, we used ARGs to construct a clinical risk prognostic model for colon cancer.
In this study, we collected 41 normal samples and 473 colon cancer samples from the TCGA database, of which 339 cancer samples had complete clinical data. From GSE39582 in the GEO database, we obtained 19 normal samples and 566 colon cancer samples, of which 557 had complete clinical data. First, we extracted ARG expression from the TCGA data and divided the samples 1:1 into a training set (170 cases) and a test set (169 cases); the test set was used for internal validation. We performed univariate Cox analysis on the training set, followed by LASSO dimensionality reduction and multivariate Cox analysis to screen prognostic genes. Finally, we constructed a clinical risk prognosis model based on these genes and validated it externally with the GSE39582 microarray dataset. We also constructed a nomogram based on the prognostic model and evaluated its predictive performance with 1-, 3- and 5-year calibration curves and DCA.
Our results show that the constructed model has good prognostic performance and can support treatment decisions. The genes in the prognostic model are PDPK1, RAD52, GSR, IL7, BDNF and SERPINE1, all of which have previously been reported to be associated with the prognosis of colon cancer or other tumors. Among the six genes, RAD52, BDNF and SERPINE1 are risk genes, whereas PDPK1, GSR and IL7 are protective genes for prognosis. PDPK1 mainly interacts with the MAPK/AKT and PI3K/AKT/mTOR signaling pathways (39,40). Its product, 3-phosphoinositide-dependent protein kinase 1, phosphorylates downstream proteins in these pathways and is closely related to cell proliferation and survival (40,41). PDPK1 is highly expressed in invasive breast cancer cell lines and in ovarian cancer (41) but lowly expressed in colon cancer cells. Inhibition of PDPK1 expression can promote cell senescence and reduce cell migration (13,42). GSR is expressed at low levels in colon cancer; it encodes glutathione reductase, which participates in the reductive metabolism of glutathione. Glutathione peroxidation has been reported to be related to the degree of malignancy of colon cancer (43). In addition, inhibition of GSR and Fas expression can promote colon cancer metastasis and tumor cell proliferation (44). IL7 encodes interleukin-7, and the binding of IL7 to its receptor IL7R is essential for T cells (45). IL7 has been reported to inhibit the development of colon cancer, and IL7 levels are significantly reduced in colon cancer samples (10), which is consistent with our results. IL7 may exert an anti-tumor effect through the apoptosis pathway (48). In addition, IL7 can enhance anti-tumor immunity by acting with other factors to strengthen immune cell function (46).
The protein encoded by the […] mitosis (48). High expression of RAD52 is related to poor prognosis in colon cancer patients. BDNF has been linked to a variety of signaling pathways and promotes colon cancer cell metastasis through the ERK, PI3K/Akt and p38 pathways (49). Activation of the PI3K/Akt and ERK pathways can increase the resistance of colon cancer cells to chemotherapeutic drugs, and BDNF-Akt-Bcl2 and BDNF/TrkB signaling are also associated with reduced apoptosis of colon cancer cells (12,50). In addition, BDNF can promote the expression of vascular endothelial growth factor (VEGF), and the VEGF pathway plays a key role in tumor-induced angiogenesis (2,12). SERPINE1 encodes plasminogen activator inhibitor 1, which inhibits fibrinolysis. SERPINE1 is significantly associated with poor prognosis in head and neck cancer, glioma, gastric cancer and colon cancer (51-53). In colon cancer, SERPINE1 is up-regulated, which is related to tumor aggressiveness. SERPINE1 is also considered a member of the epithelial-mesenchymal transition (EMT) pathway (54). When SERPINE1 expression is inhibited, the EMT process is also inhibited, tumor cell apoptosis increases and tumor cell invasiveness decreases (52).
In addition, we analyzed immune cell infiltration in colon cancer specimens. The results showed statistically significant differences in iDCs, Th2 cells and NK cells between the high-risk and low-risk groups, with the differences in iDCs and NK cells being the most pronounced; all three cell types were more abundant in the low-risk group. Colorectal tumor antigens can recruit dendritic cells (DCs) and promote their maturation and cytokine release, which favors an effective Th1 immune response (55). Patients with high DC infiltration survive significantly longer than those with low DC infiltration, and the more immature DCs in the tumor stroma and the more mature DCs at the infiltrating margin, the longer the overall survival (55,56). Moreover, immature DCs are significantly positively correlated with regulatory T cells (Tregs), which may reflect interactions between DCs and Tregs (56). NK cells are important innate immune cells involved in anti-tumor immunity, antiviral defense and immune regulation. They can directly recognize and kill tumor cells and are therefore of great significance to tumor immunity. The degree of NK cell infiltration is positively correlated with a good prognosis in cancer patients (57). Cytokines regulate the cytotoxicity and recruitment of NK cells and thereby modulate tumor immunity (58). Studies have shown that Th1/Th2 cytokines are imbalanced in patients with colon cancer, which may be related to immune evasion by tumor cells and promotes tumor occurrence, development and metastasis (59,60).
This study identified ARGs related to the prognosis of colon cancer patients and constructed a prognostic model, providing a new method for prognosis assessment in colon cancer. However, this study has some limitations. First, the small sample size may affect the accuracy and reliability of the prediction model, so the model needs further validation in independent large cohorts. In addition, the potential roles and mechanisms of these ARGs in colon cancer progression require further investigation in basic experiments.

Conclusion
Our results indicate that the 6-ARG prognostic signature has a certain clinical predictive ability. Its riskScore is closely related to clinical characteristics, and the nomogram based on it has stronger predictive ability than any single indicator. ARGs may offer promising directions for the treatment of colon cancer patients.

Supplementary Files
This is a list of supplementary files associated with this preprint.