Background

Genomic alterations constitute crucial elements of colorectal cancer (CRC). Accumulating evidence has elucidated their clinical significance in predicting outcomes and therapeutic efficacy. However, a comprehensive understanding of CRC genomic alterations from a global perspective is lacking.

Methods

A total of 2778 patients from 15 public datasets were enrolled. Tissues and clinical information from 30 patients were also collected. Consensus clustering based on mutation signatures was performed for sample classification.

Results

We identified two distinct mutation signature clusters (MSC), characterized by massive mutations and dominant somatic copy number alterations (SCNA), respectively. MSC-1 was associated with defective DNA mismatch repair and exhibited more frequent mutations in genes such as ATM, BRAF, and SMAD4. The mutational co-occurrences of BRAF-HMCN and DNAH17-MDN1, as well as methylation silencing of MLH1, were found only in MSC-1. MSC-2 was linked to the carcinogenic processes of age and tobacco chewing, and exhibited dominant SCNA such as MYC (8q24.21) and PTEN (10q23.31) deletion as well as CCND3 (6p21.1) and ERBB2 (17q12) amplification. MSC-1 displayed higher immunogenicity and immune infiltration. MSC-2 had a better prognosis and significant stromal activation. Based on the two subtypes, we identified and validated the expression relationship of FAM83A and IDO1 as a robust biomarker for prognosis and distant metastasis of CRC in 15 independent cohorts and a qRT-PCR assay.

Conclusions

We identified two subtypes with heterogeneous molecular alterations and functional status, which might advance precise treatment and clinical management in CRC. A robust biomarker for predicting prognosis and distant metastasis of CRC was identified and validated.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

This is a list of supplementary files associated with this preprint. Click to download.

- FigureS1.pdf
Somatic mutation landscape in the TCGA-CRC cohort.

- FigureS2.pdf
The extraction of mutation signatures and generation of the mutation signature-relevant subtypes in CRC. A. Based on the cophenetic correlation coefficients and the RSS curve, rank = 8 was chosen as optimal in the NMF analysis. B. Correlation analysis of de novo mutational signatures and curated COSMIC signatures using cosine similarity. Rows are de novo mutational signatures and columns are curated COSMIC signatures. C. The cumulative distribution functions (CDF) of the consensus matrix for each k (k = 2 to 9, indicated by colors). D. Proportion of ambiguous clustering (PAC) score. A low PAC value implies a flat middle segment of the CDF, so the optimal k (k = 2) is inferred from the lowest PAC. E. Recommended number of clusters using 26 criteria of the NbClust package. F. The relative proportion of eight mutation signatures between MSC-1 and MSC-2.

- FigureS3.pdf
The mutation drivers and MMR genes in CRC. A. The distribution of tumor mutation burden (TMB) between the two subtypes. B. The mutational co-occurrence and mutual exclusivity relationships of 28 candidate driver genes. Co-occurrence, green; exclusion, brown. C. Kaplan-Meier survival analysis of APC-TP53 co-occurrence. D. Mutational oncoplot of nine MMR genes between the two subtypes.

- FigureS4.pdf
The driver segments identified by the GISTIC algorithm in CRC. A. The distribution of gain and loss load at the arm level and focal level. B. Oncoplot for the CNA of 39 driver segments in the two subtypes, including 14 amplification segments (orange) and 25 deletion segments (purple). C. The expression difference of CNA-relevant oncogenes and tumor suppressor genes between gain (red) and no-gain (blue) groups or between loss (dark blue) and no-loss (orange) groups. ns, P > 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001. D-E. Univariate Cox regression analysis of 16 CNA-relevant oncogenes and tumor suppressor genes for OS (D) and DFS (E). F. Kaplan-Meier survival analysis of MAP2K2 gain, as well as CNTN6, DKK1, APC, MCC, and SMAD4 loss.

- FigureS5.pdf
Differences in the methylation and expression levels of 13 ssMDGs between the two subtypes. A. The expression difference of 13 ssMDGs between the two subtypes. B. The methylation difference of 13 ssMDGs between the two subtypes. ns, P > 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001.

- FigureS6.pdf
The difference of 10 immunogenicity relevant indicators between two subtypes. A-K. The distribution of 10 immunogenicity relevant indicators in two subtypes, including CTA score (A), aneuploidy score (B), intratumor heterogeneity (C), number of segments (D), HRD (E), number of segments with LOH (F), fraction of segments with LOH (G), BCR Shannon (H), BCR richness (I), TCR Shannon (J), and TCR richness (K).

- FigureS7.pdf
The expression and regulation of immune checkpoint molecules (ICMs) in MSC-1 and MSC-2. A. The expression difference of 37 stimulatory ICMs between the two subtypes. B. The expression difference of 23 inhibitory ICMs between the two subtypes. ns, P > 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001. C. The expression difference of ITGB2 between the mutant and wild-type groups. D. The expression difference of CD40 between the gain and no-gain groups. E-F. The expression difference of ITGB2 (E) and TNFRSF18 (F) between the loss and no-loss groups.

- FigureS8.pdf
The prognostic value of FABP4|GBP5 in seven cohorts. A. Forest plot of GBP5-high versus FABP4-high groups in seven cohorts. B-H. Kaplan-Meier survival analysis of FABP4-high and GBP5-high in the GSE17536 (B), GSE17537 (C), GSE29621 (D), GSE38832 (E), GSE39084 (F), GSE39852 (G), and GSE71187 cohorts (H).

- SupplementaryTable.xlsx
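
The legend of FigureS2 (panels C and D) selects the number of clusters from the consensus-matrix CDF using the proportion of ambiguous clustering (PAC). The following is a minimal sketch of that calculation, not the authors' code. It assumes PAC is the fraction of sample pairs whose consensus index lies in an intermediate interval, and the thresholds 0.1 and 0.9 as well as the function names `pac_score` and `best_k` are illustrative assumptions rather than values or names taken from the preprint.

```python
import numpy as np

def pac_score(consensus, lower=0.1, upper=0.9):
    """PAC for one consensus matrix: CDF(upper) - CDF(lower).

    `lower` and `upper` are assumed thresholds, not values from the preprint.
    """
    m = np.asarray(consensus, dtype=float)
    # Use each unordered sample pair once (strict upper triangle).
    pairs = m[np.triu_indices_from(m, k=1)]
    # Empirical CDF of the consensus indices at the two thresholds.
    cdf_upper = np.mean(pairs <= upper)
    cdf_lower = np.mean(pairs <= lower)
    return float(cdf_upper - cdf_lower)

def best_k(consensus_by_k, lower=0.1, upper=0.9):
    """Return the k with the lowest PAC, plus all PAC scores."""
    scores = {k: pac_score(m, lower, upper) for k, m in consensus_by_k.items()}
    return min(scores, key=scores.get), scores

# Toy input: a clean two-block matrix versus a fully ambiguous one.
clean = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
fuzzy = np.full((4, 4), 0.5)
np.fill_diagonal(fuzzy, 1.0)

k, scores = best_k({2: clean, 3: fuzzy})
print(k, scores)
```

In the toy input, every pair in the clean matrix is either always or never co-clustered, so its PAC is low, while the fuzzy matrix places every pair in the ambiguous interval. A flat middle segment of the CDF corresponds to the first case.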

Posted 19 Nov, 2020
