A Tumor-Infiltrating Lymphocyte-Based Risk Model Predicts Prognosis of Colon Cancer


 Tumor-infiltrating lymphocytes are relevant to the tumor prognosis and response to immunotherapy in colon cancer. The gene expression data of colon cancer was obtained from the cancer genome atlas (TCGA) database and the components of immune cell types were analyzed by CIBERSORT. Selection operator (LASSO) and multivariate Cox regression filtered immune cells and selected the most significant cell types to construct an immune risk model including memory B cells, plasma cells, T follicular helper (Tfh) cells, M0 macrophages and resting dendritic cells. Receiver operating characteristic (ROC) curves were used to verify the sensitivity and specificity of the model, which was validated in Gene Expression Omnibus (GEO) dataset. Combined with the clinical traits, a nomogram was established to predict the prognosis of colon cancer. According to function analyses through weighted correlation network analysis (WGCNA) and gene set enrichment analysis (GSEA), tumor-infiltrating immune cells showed significant importance in tumor immune-associated regulation, especially the adhesion, migration and invasion in colon cancer.


Introduction
As one of the most common cancer in the world, colon cancer is the fth most signi cant causes of cancer death according to GLOBOCAN 2018 statistics [1] . Although the prognosis of colon cancer is gradually improved with the widespread colonoscopy and development of diagnosis [2] , biomarkers or indications of prognosis still remain uncertain.
Tumor microenvironment (TME), composed of heterogeneous cancer cells, cancer-associated broblasts (CAFs), stromal myo broblasts, endothelial cells, diverse immune cells and tissue cells, is critical in initiation and progression of colon cancer [3] . As one of the critical components of TME, research has shown that immune cell in ltration can predict the prognosis of colon cancer [4] . Immune cells are both associated with good and poor prognosis in colon cancer, through which a complex immune network is shaped [5] . On account of the immune-mediated anti-tumor immune responses, immunotherapy becomes more and more important in treatment of cancer. As a kind of immunotherapy, immune checkpoint inhibitor therapy (ICI) shows superiority in the treatment of cancers such as lung cancer and melanoma [6,7] . However, ICI in colon cancer is not commonly recommended due to the hyporeactivity [8,9] . Nonetheless, the recent research found that tumor-in ltrating CD4 + and CD8 + T-cell subsets co-expressed targetable activating and inhibitory receptors, which could enhance PD-L1 blockade therapy and improve prognosis [10] . Therefore, the application of ICI is not worthless but need further investigation of immune microenvironment in colon cancer.
Based on the quanti cation of lymphocyte populations, immunoscore is a prognostic factor as well as a marker to predict the sensitivity of biotherapies targeting key immune checkpoints [11] . In addition to immunoscore, the evaluation of immune cell types is also associated with prognosis in cancer. A new computational deconvolution algorithm (CIBERSORT) is recommended to analyze RNA mixtures for cellular biomarkers and therapeutic targets and identify the immune cell subsets. [12] CIBERSORT can assess the tumor-in ltrating immune cells in tumor, through which prognosis models of immune cells in ltration are constructed in several types of cancer including breast, gastric and lung cancer [13][14][15] . As a signi cant part of tumor-in ltrating immune cells, tumor-in ltrating lymphocytes (TILs) have been shown to indicate the clinical prognosis in colon cancer [16][17][18] .
Here, CIBERSORT was used to identify TILs of dataset from the cancer genome atlas (TCGA) database.
Then we constructed a new prognosis model using selection operator (LASSO) and multivariate Cox regression. The model was con rmed in testing group from the Gene Expression Omnibus (GEO) database. Weighted correlation network analysis (WGCNA) and gene set enrichment analysis (GSEA) analyzed the function of genes related to risk type [19,20] , which paved the way for the further study of TILs and potential immune treatments in colon cancer.

Flow chart
Colon cancer samples were obtained from TCGA and the further analysis process of this study was shown in the ow chart ( Figure 1). "High risk" and "low risk" groups were distinguished by median risk scores.

Validation of the immune risk model
To evaluate the immune risk model, we contrasted OS of two risk groups respectively in training and testing cohort (cases from GEO database). In both cohorts, the high-risk group showed worse OS signi cantly compared with the low-risk group (p < 0.0001 and p < 0.05; Figure 3A

Establishment of clinical prognosis model
The factors independently predicting disease outcomes were identi ed by univariate and multivariate Cox regression analyses, according to which a new clinical model was constructed. We integrated the immune risk type and clinical characteristics and establish a nomogram including stage, gender, risk type and age to predict the 1-, 2-, and 3-year OS ( Figure 3E). The C-index of the nomogram was 0.748 and 0.693 respectively in training and testing cohort.

The weighted gene co-expression network construction
The gene expression data of clinical samples was obtained from TCGA database and we constructed a weighted gene co-expression network through WGCNA. We clustered the samples and detected the outliers ( Figure 4A). After deleting the outliers, the clustering tree was drawn with trait heatmap ( Figure   4B). We chose the soft threshold to 16 (R 2 = 0.967) to construct the scale-free network for further analysis ( Figure 4C, D).

Identi cation of signi cant modules
Based on the dissimilarity of genes, 31 modules were identi ed in clustering dendrogram ( Figure 5A). To investigate the modules associated with prognosis of colon cancer, we analyzed the relationships between module eigengenes and clinical traits ( Figure 5B). The tan and dark orange modules was signi cantly related to risk in colon cancer and furthermore the tan module was highly involved in risk type (p < 0.0001). Therefore, the two modules were identi ed as signi cant modules for further analysis.

Functional analysis of eigengenes
According to the signi cant modules selected, GO analysis showed the top-level biological processes of eigengenes ( Figure 6A, B). In both tan and dark orange modules, the pathways were enriched including response to stimulus, multicellular organismal process, localization, metabolic process, cellular process, positive regulation of biological process, developmental process, regulation of biological process, negative regulation of biological process, biological regulation, signaling, immune system process and biological adhesion. Next, the enrichment of pathways was visualized by Cytoscape to show the relationships among biological processes ( Figure 6C, D).

Gene enrichment of high-risk group
In high-risk group of training and testing cohort, the gene set was enriched by GSEA ( Figure 7A, B). In training cohort, genes showed enrichment in adheres junction, focal adhesion, PI3K-Akt signaling pathway and proteoglycans in cancer. And in testing group, the genes were enriched in cell adhesion molecules, ECM-receptor interaction, focal adhesion and NF-kappa B signaling pathway.

Discussion
Tumor immune microenvironment (TIME) is crucial for tumor initiation, response and therapy and the analyses of TIME can promote the progress of immune therapy and even detect novel therapeutic targets [21] .According to analysis of immune cell in ltration in colon cancer through CIBERSORT, our research established a novel immune score system including memory B cells, plasma cells, Tfh cells, M0 macrophages and resting dendritic cells. The LASSO and multivariate Cox regression analyses and testing cohort validation increased the reliability of our risk model. Furthermore, the sensitivity and speci city of the model were evaluated by ROC curves. Finally, combined with clinical characteristics, a nomogram was constructed to predict the prognosis of colon cancer.
As more attention paid to the tumor immunology and immunotherapy, investigation of TIME in colon cancer is developing. Earlier research has shown that CD3+, CD8+ and CD45RO+ memory T cells are associated with improved OS of colon cancer [22] . Our risk model advanced novel immune cell types related to prognosis of colon cancer. Plasma cells was reported to show a positive prognostic effect in colon and non-small cell lung cancer [23,24] . Tumor-speci c antibodies produced by plasma cells could bind to tumor cells and inhibit their target proteins, activate complement and nally promote antibodydependent cellular cytotoxicity [25] . Tfh cells could stimulates the differentiation of B cells into Ab-forming cells through producing IL-21 [26] . Plasma and Tfh cells have shown to mediate the anti-tumor response when ICI applied to breast cancer [27] . Similarly, the coe cients of plasma and Tfh in our immune risk model were negative, indicating the parallel tendency to existing research. The prognosis associated with M0 macrophages in colon cancer was controversial. According to an analysis of colon adenocarcinoma immune in ltration, M0 macrophages were related to poor prognosis and the distant metastasis [28] . While in another research, M0 macrophages showed a better prognosis [29] . Tumor-associated macrophages (TAMs) are signi cant component of TIME. Polarization of M1 and M2 macrophages depends on the growth factors and chemokines in TIME [30] . M1 macrophages can secret in ammatory cytokines, including TNF-α, IL-6, and IL-12, and engulf and kill target cells [31,32] . On the contrary, M2 macrophages are associated with angiogenesis and immune suppression, which contribute to tumorigenesis and metastasis [32][33][34] . Therefore, although our analysis showed M0 macrophages effected prognosis positively, the impact was still debatable. As for dendritic cells, research in hepatocellular carcinoma and melanoma reported that it could enhance the tumor response to ICI [35,36] . All in all, the effects of immune cells on tumor initiation and progress still need further investigation of function and molecular mechanism.
The application of WGCNA enabled us to analyze the function of eigengenes related to our risk model. The immune cells played an important role in pathways including response to stimulus, metabolic process, signaling and immune system process, which were highly relevant to TME [37] . The enrichment of localization and biological adhesion indicated association between immune cells and tumor migration and invasion [38] . Meanwhile, the function analyses in high-risk group also demonstrated the similar enrichment of adhesion process through GSEA. In addition, PI3K-Akt signaling pathway and NF-kappa B signaling pathway were both enriched in high-risk group. Recent research showed that M2 macrophages could stimulate PI3K-Akt signaling pathway and promote migration and invasion of tumors [39,40] . NFkappa B signaling pathway mediates in ammation-related tumor growth and progression and is associated with metastatic gene expression [41][42][43] . Therefore, we speculated that the adhesion, migration and invasion were signi cant processes that tumor-associated immune cells regulated tumorigenesis, progression and metastasis. The function of immune cells still remains mysterious and needs to be further explored in TIME.
Despite the above analyses and discussion, our risk model also had limitations. The analysis of immune cell types was based on datasets through bioinformatics. The results were restricted to clinical samples from database and lack of forward research for further validation of the ability to predict prognosis in colon cancer. In addition, the functional analyses were not quite reliable due to the insu cient experimental con rmation. The clinical application of the model and the correlated molecular mechanism still need further discussion and exploration.
To summarize, our research advanced a novel prognosis risk model including immune risk type and clinical characteristics, which provided the possibility to predict prognosis of colon cancer. The analyses of immune cell types and function through CIBERSORT, WGCNA and GSEA revealed new insights of TIME and immunotherapy and paved way for further research.

Colon cancer datasets and processing
The gene expression data of colon cancer were obtained from TCGA database. The data with overall survival (OS) time >1 month and su cient clinical information including age, gender, T, N, M stage were contained. The testing dataset (GSE29623) from the GEO database also followed the principles.

Analysis of immune cell in ltration
To explore the immune cells in ltration in colon cancer, the normalized gene expression data were analyzed by CIBERSORT and LM22 signature. These data were uploaded to the CIBERSORT web portal (http://cibersort.stanford.edu/) and the deconvolution algorithm was run using the LM22 gene signature matrix and 1000 permutations. The relative expression of tumor-in ltrating immune cells was normalized to sum up to one. Signi cant results (P < 0.05) were selected for further analysis.

Immune cell and clinical prognosis model construction and veri cation
The gene expression and clinical data were combined. To identify the most important immune cells related to prognosis, LASSO regression and cross-validation were used. Multivariate Cox regression was used to test associations between the proportions of immune cell types and survival. The nal coe cient of immune cell populations was also determined by multivariate Cox regression to construct the immune risk model. "High-risk" and "low-risk" groups were distinguished by risk scores. Kaplan-Meier curves were used to analyze the relation between OS and in ltrating immune cells, which was estimated by cohort receiver operating characteristic (ROC) curves to verify the sensitivity and speci city of the model. Then we applied the model to testing dataset from GEO to assess the reliability.

Riskscore = ∑ n i = 1 Coef i *Fraction i
The factors independently predicting disease outcomes were identi ed by univariate and multivariate Cox regression analyses, which were used to construct a new clinical prognosis model. To assess the association between the variables in the clinical model and prognosis, a nomogram was used to visualize the data and the discrimination was measured by Harrell's concordance index (C-index). The prognostic models were calibrated using the R survival package.

WGCNA
The gene co-expression network was constructed by the R package "WGCNA". Modules with signi cance (P < 0.05) were identi ed to be clinical traits associated modules. The topological property was con rmed by scale-free R 2 . Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) analysis were performed to estimate the functions of genes in the modules. And the network of enriched terms in certain modules was visualized by Cytoscape. The threshold of signi cance set as P<0.05.

GSEA
Based on GO, KEGG pathway and immunologic signatures analysis, differences in gene enrichment between the low-and high-risk cohorts were analyzed.

Statistical analysis
Statistical analysis was conducted using R software (version 3.6.1; R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-sided, with p-values < 0.05 considered statistically signi cant.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
Xixian Yuan and Shuqiu Wang designed the study and reviewed the literature. Ziqi Sui performed the data analyses and wrote the manuscript for the study. Yanli Zhu downloaded the data from GEO database and participated in the drafting of the manuscript. Kejia Wuand Shuxiang Wang contributed to the reviewing of the literature. All authors read and approved the nal manuscript for submission.