Identification of cell cycle related subtypes in colon cancer
We first merged the colon samples from TCGA and GSE17538 to increase the sample size via removing the batch effect (Fig. 1A-B). We conducted kaplan-meier analysis of cell cycle genes and proved that 56 of 125 genes were related to the overall survival of colon cancer patients (Sup-Figure 1–2; Sup-Table 1). By performing univariate regression analysis, we screened 9 cell cycle genes associated with overall survival of colon cancer patients (p < 0.05), including GADD45B, CDKN2A, CDKN1C, TGFB1, CDKN2B, GADD45A, CDC25C, MAD1L1, and TGFB2 (Fig. 1C). For example, CDKN2A and CDKN2A had a positive correlation and were both harmful factors in colon cancer. In contrary, patients with high expression of CDC25C have a better survival status.
Based on the expression of 9 cell cycle genes, we identified two subtypes (cluster A and cluster B) of colon cancer (Fig. 1D). There were significant survival differences in the two clusters (Fig. 1E). We displayed the distribution of clinical features and expression of cell cycle genes in the two clusters (Fig. 1F-G). We further conducted GSVA to reveal the difference between two clusters. For HALLMARK pathways, the scores of immune-related pathways were higher in cluster B, including TGF beta signaling and inflammatory response (Fig. 2A). KEGG results indicated that vascular smooth muscle contraction and cell adhesion molecules cam were enriched in cluster B (Fig. 2B). For Reactome pathways, the scores of malignant pathways, such as crosslinking of collagen fibrils, were higher in cluster B (Fig. 2C).
By performing PCA analysis, we found a clear distinction between clusters A and B (Fig. 3A). To explore the difference of tumor microenvironment between two clusters, we also estimated the immune infiltration information of colon cancer samples. The stromalscore, immunescore, and ESTIMATEscore were all higher in cluster B (Fig. 3B). The immune cell levels were highly infiltrated in cluster B, except activated CD4 T cell, CD56bright tural killer cell, and Type 17 T helper cell (Fig. 3C).
Identification of gene subtypes
We screened 79 DEGs between two clusters A and B (Fig. 4A). GO analysis revealed that DEGs were enriched in extracellular matrix organization in Biological Process (BP), collagen-containing extracellular matrix and endoplasmic reticulum lumen in Cellular Component (CC), and collagen binding in Molecular Function (MF) (Fig. 4B). For KEGG, DEGs were enriched in ECM-receptor interaction and TGF-beta signaling pathway (Fig. 4C). Figure 4D displays the the association of top 5 KEGG results with DEGs.
We then conducted univariate Cox regression analysis to identify the prognostic value of 79 DEGs and screened out 28 genes related to overall survival (p < 0.01) (Fig. 5A, Supplementary Table 2). To further validate the above results, unsupervised clustering was performed to divide patients into two subtypes based on 28 prognostic genes; namely, geneCluster A-B (Fig. 5B). Colon cancer patients in geneCluster B have worse survival than patients in geneCluster A (Fig. 5C). We also displayed the distribution of clinical features and expression of 28 prognostic genes in two geneClusters (Fig. 5D-E). All 28 genes were over-expressed in geneCluster B.
Genetic alteration of prognostic genes
We further explored the expression and genetic alteration of 28 prognostic genes. We found 8 genes were up-regulated in tumor tissues compared to normal tissues,including ARL4C, AEBP1, OLFML2B, SERPINE1, THBS2, BGN, SPP1, and CTHRC1 (Fig. 6A). We next explored the SNV these genes.The SNV frequencies of genes were generally low, in which FN1 had the highest SNV frequency (Fig. 6B-C). We further assessed the CNV of these genes. AEBP1 had highest amplification CNV frequency, while ISLR showed significant CNV decreases (Fig. 7A). We also represented the percentage of heterozygous CNV and homozygous CNV of each gene in colon cancer, including heterozygous amplification, heterozygous deletion, homozygous amplification, and homozygous deletion (Fig. 7B-C). The linear CNV level was not correlated with mRNA expression of all genes (Fig. 7D). Figure 7E displayed the methylation difference between tumor and normal tissues. The expression of these genes were generally negatively correlated with their DNA methylation level (Fig. 7F).
Construction of cell cycle score (CCscore)
The CCscore was constructed based on the 28 prognostic genes via PCA algorithm. Patients with high cell cycle scores have worse survival status (Fig. 8A). The sankey diagram visualized the correlation between cluster, geneCluster, CCscore, and survival status of colon cancer patients (Fig. 8B). The CCscore was higher in cluster B and genecluster B (Fig. 8C-D). The correlation of CCscore with immune cell infiltration indicated that CCscore was positively associated with immune cell infiltration level (Fig. 8E). Besides, these genes were also positively associated with immune infiltration score and most immune cells (Sup-Figure 3).
In addition, alive patients have lower CCscore than dead patients (Fig. 9A). The percentage of alive patients in the high CCscore group (65%) was lower than that in the low CCscore group (77%) (Fig. 9B). The CCscore was not associated with the gender of patients. In addition, higher CCscore indicated worse recurrence, metastasis, and stage status of colon cancer patients (Fig. 9C-H).
The correlation of CCscore with immune microenvironment and efficacy of immunotherapy
We next assessed the correlation of CCscore with the immune microenvironment. Figure 10A displayed that the expression of chemokines and its receptors were significantly higher in the high CCscore group. CCscore was positively linked to immune-related pathways, such as epithelial mesenchymal transition, angiogenesis, and interferon gamma response (Fig. 10B). These results indicated that patients with high CCscore may be sensitive to immunotherapy. We also observed that the gene mutation frequency was higher in the high CCscore group (Fig. 11A-C). Moreover, the immune checkpoints, such as CD274, CTLA4, LAG3, PDCD1, and TIGIT, were highly expressed in high CCscore group (Fig. 12A). These results all support our hypothesis: tumor patients with high CCscore may be sensitive to immunotherapy.
In order to verify our hypothesis, we performed further analyzes using immunotherapy datasets. We proved that patients with high CCscore were sensitive to ICI treatment in GSE61676 (Fig. 12B, late stage non-squamous non-small cell lung cancer), IMvigor210 (Fig. 12C, Urothelial carcinoma), and Checkmate (Fig. 12D, Kidney renal clear cell carcinoma) cohorts
In addition to immunotherapy, we also analyzed conventional anti-tumor drugs. Based on the predicted results by R package “pRRophetic”, we displayed 6 anti-tumor drugs that may be resistant to patients with high CCscore and 6 anti-tumor drugs that may be sensitive to patients with high CCscore (Fig. 13).