Comprehensive analysis of the immune effect of TGFβ1 in colorectal adenocarcinoma: A TGFβ1-related prognostic model of tumor microenvironment

: Colorectal cancer is one of the most common cancer worldwide. Recently, tumor microenvironment (TME), especially its remoulding, is thought to control the colorectal cancer genesis and progression. In this study, we use ESTIMATEscore to make out the proportion of immune and stromal components in colorectal adenocarcinoma (CRA) samples from The Cancer Genome Atlas (TCGA) database. The differentially expressed genes (DEGs) were found by COX regression analysis and protein-protein interaction (PPI) network, among which TGFβ1 was supposed to be a prognosis factor and tumor environment indicator. Continuous analysis showed that TGFβ1 expression is positively correlated with lymph node metastasis (N stage) but negatively correlated with survival. Gene Set Enrichment Analysis (GSEA) revealed that the genes of the high-expression TGFβ1 group were mainly enriched in immune-related activities. Cluster analysis divided the samples into 2 subgroups. 24 HLA-related genes and 8 immune checkpoint genes were found upregulated in the high immunity group as well as TGFβ1, which suggests the possibility of novel therapies targeting immune checkpoints combined with TGFβ1. Tumor-infiltrating immune cell (TIC) profile of CRA patients was described by CIBERSORT analysis. Further analysis showed that the infiltration of Tregs and Neutrophils were positively correlated with TGFβ1 high expression. Then 3 TGFβ1-related genes were picked out to construct a prognostic model, which matches the survival data well. evaluate specificity prognostic model using the R software package. The area under the curve (AUC) was calculated.

cell cycle and promoting apoptosis in the early cancer stage. However, TGFβ exerts stimulative effects in tumor by prompting tumor invasiveness and metastasis in the late stage. TGFβ1 is a critical immunoregulatory cytokine that functions in the process of immune self-tolerance and T-cell homeostasis [22], which is secreted by various cells [23,24]. Treg cells are the primary source and target of TGFβ1 in some autoimmune diseases [25]. Here we pick up differentially expressed genes (DEGs) between the immune element and stromal element of CRA samples and find that TGFβ1 might be a potential marker for the alteration of TME status in CRA. Figure1|flow chart of this study.

ImmuneScore is related With the Clinic-Pathological Staging of CRA Patients
We matched the clinical information of CRA cases with ESTIMATEScore、 StromalScore and ImmuneScore. The results are shown in Figure2, only ImmuneScore showed negative correlation with T classification ( Figure 3A, p = 0.017) and M classification ( Figure 3D, p=0.0035) of TMN stage. It suggests that the ratio of immune components was mainly associated with the progress of CRA, such as invasion and metastasis. Residual analysis in this part gain no significant results.

Common DEGs between ImmuneScore and StromalScore were mainly enriched to Immune-Related Genes
To ascertain the exact alterations of gene profile in TME both in immune and stromal components, samples were divided into high-score group and low-score group by comparing to the median. 1603 DEGs were found by ImmuneScore (samples with high score vs. low score). Among them, 1553 genes were up-regulated, 50 genes were down-regulated (Figure3C,D). Similarly, 1901 DEGs were found by StromalScore, consisting of 1889 up-regulated genes and 14 down-regulated genes (Figure3C,D). The Venn plot showed 1103 up-regulated genes shared by high ImmuneScore group and high StromalScore group. 5 down-regulated genes were shared by low score group both in ImmuneScore and StromalScore. These DEGs (total 1108 genes) were potential determinate factors of TME. Results of gene ontology (GO) enrichment analysis suggest that the DEGs mostly map to the immune-associated GO terms, such as immune response−regulating cell surface receptor signaling pathway and lymphocyte mediated immunity ( Figure  3E,supplement1). The Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis also showed the enrichment of immune response−regulating cell surface receptor signaling pathway, cytokine−cytokine receptor interaction and phagosome ( Figure 3F,supplement2). Thus, the overall functions of DEGs seemed to mostly fall in immune-related activities, which indicate that the participance of immune factors was a critical feature of TME in CRA.

Corelation Analysis by PPI Network and Univariate COX Regression
PPI network was carried out based on the STRING database using Cytoscape software. The correlation among 1108 genes are presented in Figure 4A, and the bar graph showed the top 30 genes ranked by the quantity of nodes ( Figure  4B). Univariate COX regression analysis was performed to pick up the significant survival factors among 1108 DEGs ( Figure 4C). Next, the intersection analysis between the top genes in PPI network and the 21 significant factors of univariate COX regression was performed, and only 1 factors, TGFβ1, were selected from all above analyses ( Figure 4D).

The Expression Of TGFβ1 Influence the Survival and Clinical Characteristics in CRA Patients
TGFβ1 exert vital influence on the intracellular signaling of T lymphocytes. In this study, all CRA samples were distributed into TGFβ1 high-expression group or TGFβ1 low-expression group compared with the median. The survival curve indicated that CRA patients in low-expression group had better survival than that in high-expression group( Figure  5A).And then , the correlation of TGFβ1 expression with clinical characteristics was described with Wilcoxon rank sum test, showing that the expression of TGFβ1 in the tumor tissue was significantly higher than that in the normal tissue( Figure 5B) in pairing sample gained from the same patient ( Figure 5B). All above results explicitly suggest that the expression of TGFβ1 is higher in tumor tissue than in normal tissue, and that this higher expression in TME bring worse outcomes. Besides, the expression of TGFβ1 increased along with the advance of N stage in TNM stages ( Figures 5C-F), which probably indicates that the expression of TGFβ1 prompt metastasis in lymph node.

New classification divide the samples into 2 subgroups
Transcriptome data of colon cancer patients was downloaded from TCGA database and calculated with ssGSEA method. These samples were divided into 2 subgroups (low immunity: 186 samples, high immunity: 212 samples) ( Figure 6A) by consensus cluster analysis, and ESTIMATEscore, Immunescore and Stromalscore were also involved in the heat map. Tumor purity describes the cancer cell proportion in the tumour, which is significantly higher in the immunity low group ( Figure 6C). Meanwhile, 29 immune-related pathways and infiltrating immune cells were integrated to evaluate immunity level of TME ( Figure 6B), which seems all gathered in the immunity high group. The RNA level comparison of HLA genes between the two immune subgroups was conducted ( Figure 6D), showing notable preponderance in immunity high group. All above shows that the new classification divided the samples into two groups according to its immunity capability. 8 immune checkpoint genes (TIGIT, Tim-3, VISTA, BTLA, LAG-3, IDO1, PD-L1, PD-1) were found upregulated in the high immunity group (P<0.001) by univariate analysis ( Figure 6A-L), which mostly inhibit the immune response of TIL. The strongly responsive microenvironment might improve the efficiency of immune checkpoint inhibitor in turn. Moreover, TGFβ1 is similarly up-regulated in the same group( Figure 6M), which suggests the possibility of novel therapies targeting immune checkpoint combined with TGFβ1. Figure 6| Identification and validation of immune-related subgroups of colon cancer. A. cluster analysis divided samples into 2 subgroups, that are immunity low group and immunity high group. B.29 immune-related gene sets are enriched in ssGSEA with colon cancer,. These gene sets are composed of immune cells and immune processes. The heat map also shows tumor purity, ESTIMATE score, immune score and matrix score. Heatmap of sample clustering at consensus k = 2.C. Tumor purity of samples from the two immune subgroups (*** P < 0.001). D. RNA expression levels of HLA genes in samples from the two immune subgroups. E-L. 8 immune checkpoint genes are significantly up-regulated in immunity high group (***P < 0.001). M. TGFβ1 is similarly upregulated in the immunity high group.

TGFβ1 Can Be an Prospective indictor For TME Modulation
GSEA was conducted in the TGFβ1 high-expression s divided by the median. As shown in Figure 7, the genes in high-expression group were mainly enriched in immune-related activities, such as allograft rejection, complement, adhesion molecules cams, T cell activation pathway and typical tumor pathways, which suggests that TGFβ1 might be a prospective indicator for the status of TME.

Correlation of TGFβ1 With the Proportion of TICs
To further demonstrate the correlation of TGFβ1 expression with the immune microenvironment, the ratio of tumor-infiltrating immune subsets was calculated by CIBERSORT algorithm. 21 types of immune cell profiles in CRA samples were constructed ( Figure 8A-B). Then, correlation analysis showed that Neutrophils were significantly correlated with the expression of TGF1 ( Figure 10C). Further analysis showed that Tregs and Neutrophils were positively correlated with TGFβ1 expression out of 21 types of immune cell (Figure10D-E). These results strongly suggest that the expression of TGFβ1 definitely influence the immune activity of TME.  (C10of54,CD27,CD48,CD70,CD80,CD86,CXCR4,IL2RA,LTA,TNFRSF18,TNFRSF4,TNFRSF8,  TNFRSF9,TNFSF13B,TNFSF14,TNFSF4,CD276,CD28,CD40,CD40LG,CXCL12,ENTPD1,ICOS,IL6,IL6R,KLRC1,KLRK1,MICB,PVR,T  EME173,TNFRSF13B,TNFRSF13C,TNFRSF17,TNFSF9). Among these genes, 18 significant differential expression genes between tumor samples and normal counterpart were picked out by limma R package, shown in figure 9A-B. TGFβ1-related prognosis signature was built with risk scores based on the expression of genes multiplied by regression coefficients in the multivariate Cox regression analysis( P < 0.01 is significant). Samples were distributed into high risk group and low risk group according to the risk score , and the cut-off value derive from the "survminer" R package. This prognosis risk score model match with the survival data well ( Figure 9C-E). The risk score was significantly connect with survival in CRC in the stepwise multivariate COX regression analysis (P < 0.001). And the area under the curve (AUC) values of the risk score were 0.622, 0.613 and 0.643 respectively of 1 year, 3years and 5 years( Figure 9D).

Discussion:
In this study, we try to find TME relevant gene that influence the survival and immunity therapy with data from TCGA database. TGFβ1 was identified to be involved in immune reaction, and continuous bioinformatics analysis suggest that TGFβ1 could be a signal for the status of TME in CRA patients. The tumor microenvironment is mainly consist of tumor stroma, infiltrating inflammatory cells and various associated supporting cells [9],and is fabricated and modified all the time by tumor itself. It has been fully demonstrated that the tumor microenvironment plays a critical role in tumor progression, which indicates remoulding the tumor microenvironment could be a promising therapic explore in turn. Besides, the condition of the tumor environment, which falls in immunosuppression or active can also suggest the patients' prognosis and therapy response.
Our results from the transcriptome analysis with CRA data from TCGA database implied that the immune components in TME is significantly correlated with the progression of CRA, such as metastasis. This highlighted the significance of exploring the association between tumor cells and immune cells, which provided a novel sight for new effective treatment. Recently, a great advance has been made in immunotherapy, as pleasant successes has been seen in immune checkpoint inhibitors. While numerous cancers have seen preliminary evidence of efficacy from immune checkpoint inhibitors, colorectal cancers remain the firm exception [26].It is previously acknowledged that colorectal cancer benefited rarely from immunotherapeutics [27].However, increasing data demonstrates that subsets of colorectal cancer patients, those with hypermutated tumors, which is called microsatellite instability (msi) tumor, may benefit from immune checkpoint inhibitors.
Transforming growth factor β 1 (TGFβ1) is a member of the large family of a series of structurally and functionally relevant proteins, which contains TGFβ1, TGFβ2 and TGFβ3 and also activins, nodal, inhibins, Mullerian-inhibiting substance (MIS), growth and differentiation factors (GDF) and bone morphogenetic proteins (BMPs) [28]. The TGFβ1 gene codes a raw peptide of 390 amino acids, which is subsequently disposed by furin proteases into the mature bioactive one [29]. TGFβ1 plays a suppressive role in the pre-malignant phase and accelerate tumor progress in the late stage by TGF-β/SMAD and Non-SMAD Signaling [25].In the progressive stage, TGFβ1 may affect T-cell activation, regulatory T (Treg) cell and effector-cell function and tumorigenesis to boost the malignant behavior [30][31][32].
Epithelial-mesenchymal transition (EMT) is stimulated by various cytokines, leading to decreased adhesion of cells, compelling motility, susceptibility to invasion and metastasis and resistance to chemotherapy [33] [34], among them TGF-β1 being a very powerful factor, which influences the expression and liveness of many transcription factors called EMT-TFs (Snail1/Snail, Snail2/Slug, Twist1/Twist etc.) [35]. Evaluation of gene expression profiles disclosed that TGF-β1 signaling is one of the most important gene pathway in liver metastases of colorectal cancer [36]. And EMT-TFs are the initiation of EMT in cell differentiation [37]. In fetal hepatocytes, TGF-β1 is associated with EMT and resistance to apoptosis [38,39] [40].
TGFβ1 also inhibit tumour immunosurveillance to foster the neoplasm, particularly aiming at T cell and natural killer (NK) lymphocytes by block the generation and function of it [41][42][43] [44][45][46]. Moreover, TGF-β1 drive the TME against the cancer by induicing lymphocytes into suppressive subtypes, such as CD4+ regulatory T cells (Treg cell) [47]. Tregs participate in self-tolerance and immune suppression, discouraging the proliferation of CD4+ and CD8+ T cells as well as the production of IFN-γ [48,49]. TGFβ1 in the tumor microenvironment turned pretumor (M2) type of tumor-associated macrophages (TAMs) into the antitumor (M1) phenotype, and similarly drove the tumor-associated neutrophil (TAN) phenotype from N1 to N2, increasing the propotion of protumor TANs [50]. Some TGFβ signaling inhibitors have been discovered to stop the aberrant behavior of TGF-β signaling in tumors, such as ligand traps, antisense oligonucleotides (AONs), neutralizing antibodies, receptor domain-immunoglobulin fusions and receptor kinase inhibitors, which are in pre-clinical development.
TGFβ1 has increasingly been arranged as an novel druggable target in several preclinical experiment [51]. As TGFβ1 signaling pathway is fundamental to the formation of tumor microenvironment and exert crucial influence on the immune inflammatory responses such as the Treg cells , combining modern immunotherapy trials with TGFβ1 signalling-related biomarkers is reasonable and necessary. The role PD-1/PD-L1 paly in immune activities is not fully understood, and TGFβ1 signaling pathway might enhance this process, which may a novel strategy deserving further investigation.

Materials and Methods
Raw Data Transcriptome RNA-seq data of 437 CRA cases (39 normal samples; 398 tumor samples) and the corresponding clinical data were downloaded from TCGA database.
Generation of ImmuneScore, StromalScore, and ESTIMATEScore ESTIMATE algorithm of R language version 3.5.1 loaded with estimate package was used to evaluate the proportion of immune and stromal component in TME for each sample, presenting three kinds of scores: ImmuneScore, StromalScore, and ESTIMATEScore, which respectively positive correlated with the ratio of immune, stromal, and the sum of both. The higher the respective score is, the larger the ratio of the corresponding component is in TME.
Survival Analysis R package survival and survminer was used for the survival analysis. 398 tumor samples had a specific survival time record, were used for survival analysis. Kaplan-Meier method was used in the survival curve, with log rank as the statistical significance test; p < 0.05 was considered significant.
Generation of DEGs Between High-Score and Low-Score Groups According to ImmuneScore and StromalScore tumor samples were divided into high-score group and low-score group comparison to the median score in according to ImmuneScore and StromalScore, respectively. Package limma was used to identify the different expression of the gene, and DEGs were found by the comparison between the high-score group vs. the low-score group. DEGs with fold change larger than 1 after transformation of log2 (high-score group/low-score group) and false discovery rate (FDR) <0.05 were considered significant.
GO and KEGG Enrichment Analysis GO and KEGG enrichment analysis of 1108 DEGs were performed with clusterProfiler, enrichplot, and ggplot2 packages of R language. Only terms with both p-and q-value of <0.05 were considered significantly enriched.
Differential gene expression analysis Heatmaps were produced by pheatmap R package. volcano plots were displayed by "ggplot2" R package, with an adjusted false discovery rate (FDR) < 0.05 and |log2(fold change)| > 1 as the thresholds. Volcano plots were displayed by "ggplot2" R package, with an adjusted false discovery rate (FDR) < 0.05 and |log2(fold change)| > 1 as the thresholds.

Difference Analysis of Scores With Clinical Stages
The clinicopathological data of the CRA samples were downloaded from TCGA database. The analysis was accomplished by R language, with Wilcoxon rank sum or Kruskal-Wallis rank sum test as the significance test.
PPI Network Construction PPI network was produced by STRING database, and reconstruction with Cytoscape of version 3.6.1. Nodes with confidence of interactive relationship over 0.95 were used for network.
Univariate COX Regression Analysis Survival package of R language was used for univariate COX regression. The top 21 genes ordered by p value from small to large in univariate COX were shown in the plot.
Multivariate COX Regression Analysis multivariate cox regression analysis were conducted to estimate the prognostic value of the risk score, clinicopathological features including age, clinical stage, grade, and TNM stage with R software (version 3.5.1).
Gene Set Enrichment Analysis Hallmark, KEGG ,GSE and GO gene sets collections were downloaded from Molecular Signatures Database as the target sets with which GSEA performed using the software gsea-3.0 downloaded from Broad Institute. The whole transcriptome of all tumor samples was used for GSEA, and only gene sets with NOM p < 0.05 and FDR q < 0.06 were considered as significant.
TICs Profile CIBERSORT package was used to evaluate the TIC profile in all tumor samples, followed by quality check that 154 tumor samples with p < 0.05 were chosen for the following analysis. ssGESA A set of gene signatures of immune cell populations from previously reported articles were applied to individual cancer samples. The calculation method used in this study includes immune cell types and immune pathways related to innate immunity and adaptive immunity, a total of 29 immune gene sets, including immune cell types and functions, regulatory T (Treg) cells, immune checkpoint, cytokine and cytokine receptor (CCR), human leukocyte antigen (HLA) , proinflammatory, para-inflammation (PI) , tumor-infiltrating lymphocytes (TILs). R package "gsva" is used to quantified the infiltration levels of immune cell types.
hierarchical agglomerative clustering In order to study the correlation between the immunity of colon cancer and the clinical phenotype, we divided the samples from TCGA into 2 different groups (high and low immunity) based on "TCS genetic clustering" (50 iterations, 80% resampling rate) with ssGSEA scores. R package "estimate" is included to calculate the Immunescore, Stromalscore and ESTIMATEscore of each sample. Tumor purity is sescribed by Mann-Whitney U test.

ROC curve
The time-dependent receiver operating characteristic (ROC) curve was produced to evaluate sensitivity and specificity of the prognostic model using the "timeROC" R software package. The area under the curve (AUC) was calculated.