Gastric cancer is a heterogeneous disease with poorly understood genetic and microenvironmental factors. Mutations in collagen genes are associated with genetic diseases that compromise tissue integrity, but their role in tumor progression has not been extensively reported. In contrast, aberrant collagen expression has long been associated with malignant tumor growth, invasion, chemoresistance, and patient outcomes. We hypothesized that somatic mutations in collagens could functionally alter the tumor microenvironment, including the extracellular matrix.
We used publicly available datasets, including The Cancer Genome Atlas (TCGA), to interrogate somatic mutations in collagens in stomach adenocarcinomas. To demonstrate that collagens were significantly mutated above background mutation rates, we used a moderated Kolmogorov-Smirnov test along with combination analysis, with a bootstrap approach to define the background. Association between mutations and clinicopathological features was evaluated by Fisher's exact or chi-squared tests. Association with overall survival was assessed by Kaplan-Meier analysis and the Cox proportional hazards model. Gene Set Enrichment Analysis was used to interrogate pathways. Immunohistochemistry and in situ hybridization were used to assess expression of COL7A1 in stomach tumors.
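As an illustration of what a bootstrap-defined background can look like, the sketch below compares the mean mutation count of a gene set with the means of same-sized sets resampled with replacement from all other genes. It is a minimal example under stated assumptions, not the moderated Kolmogorov-Smirnov procedure used in the study: the function name `bootstrap_background_pvalue`, the gene labels, and the counts are hypothetical toy values, not TCGA data.

```python
import random

def bootstrap_background_pvalue(counts, gene_set, n_boot=2000, seed=0):
    """Empirical p-value that a gene set's mean mutation count exceeds background.

    counts   -- dict mapping gene label to mutation count (toy values here)
    gene_set -- set of gene labels to test against the remaining genes
    """
    rng = random.Random(seed)
    k = len(gene_set)
    observed = sum(counts[g] for g in gene_set) / k
    # Background pool: every gene that is not in the tested set.
    background = [c for g, c in counts.items() if g not in gene_set]
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(background, k=k)  # resample with replacement
        if sum(sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_boot + 1)

# Hypothetical toy data: 200 background genes with 0-4 mutations each,
# plus three placeholder "collagen" genes with higher counts.
counts = {f"GENE{i}": i % 5 for i in range(200)}
counts.update({"COL_A": 12, "COL_B": 9, "COL_C": 11})
observed_mean, p_value = bootstrap_background_pvalue(
    counts, {"COL_A", "COL_B", "COL_C"}
)
print(observed_mean, p_value)
```

A real analysis would also need to account for gene length and sample-level mutation burden, which this sketch ignores.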
In stomach adenocarcinomas, we identified individual collagen genes and sets of collagen genes harboring somatic mutations at a high frequency compared to background in both microsatellite-stable and microsatellite-instable tumors in The Cancer Genome Atlas (TCGA). Many of the missense mutations resemble the types of loss-of-function mutations found in collagenopathies, which disrupt tissue formation and destabilize cells, providing guidance for interpreting the somatic mutations. We identified combinations of somatic mutations in collagens associated with overall survival, with a distinctive tumor microenvironment marked by lower matrisome expression and immune cell signatures. Truncation mutations were strongly associated with improved outcomes, suggesting that loss of expression of tumor cell-secreted collagens has a large impact on tumor progression and treatment response.
These observations highlight that many minor collagens, which are expressed under non-physiological conditions in tumors and secreted from tumor cells, harbor impactful somatic mutations, suggesting new approaches for classification and therapy development in stomach cancer. In sum, these findings demonstrate how classification of tumors by collagen mutations identifies strong links between specific genotypes and the tumor environment.
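The survival comparisons summarized above rest on the Kaplan-Meier estimator, which multiplies the conditional survival fractions at each observed death time. The following is a minimal sketch of that estimator in plain Python; the function name `kaplan_meier` and the follow-up times are hypothetical and are not taken from the TCGA cohort.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up durations (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Fraction surviving this time point among those still at risk.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up data for one group (months, event flag).
curve = kaplan_meier([5, 8, 12, 20, 24], [1, 0, 1, 0, 0])
print(curve)
```

Comparing two such curves, for example mutant versus wild-type tumors, would additionally require a log-rank test, which is not shown here.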

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5
This is a list of supplementary files associated with this preprint.
Additional File 1 Table S1. MutSig2CV v3.1 analysis of significantly mutated collagen genes in the STAD TCGA cohort. Data downloaded from FireBrowse.
Additional File 2 Table S2. Average expression level of each collagen gene in the STAD TCGA cohort. Values are RSEM.
Additional File 3 Table S3. Collagen gene combinations identified from combinatorics approach.
Additional File 4 Table S4. Summary of COL7A1 protein expression in stomach tumors from Rhode Island Hospital as assessed by immunohistochemical staining.
Additional File 5 Supplemental Figure Legends
Figure S1. Alteration frequencies of collagens in ACRG and HK/Pfizer datasets. A. Alteration frequencies of sequenced collagens in other stomach cancer cohorts. B. Kaplan-Meier analysis of COL11A1, COL5, COL4, and COL6 mutations in the ACRG targeted sequencing dataset compared to the same set of collagen genes in the TCGA cohort.
Figure S2. Survival analysis of somatic mutations in each collagen gene. A. Kaplan-Meier analysis of tumors with any type of mutation in each collagen gene across the whole STAD TCGA cohort. Tumors with the designated collagen mutation are in red. Wild-type tumors are in blue. P-values determined by log-rank test. B. Kaplan-Meier analysis of tumors with truncation mutations in each collagen gene across the whole STAD TCGA cohort. C. Kaplan-Meier analysis of tumors with any type of mutation in each collagen gene in MSIH cases. D. Kaplan-Meier analysis of tumors with truncation mutations in each collagen gene in MSIH cases.
Figure S3. Identification of combinations of collagen genes associated with overall survival relative to background. A. All mutations across the whole TCGA cohort. B. Representative examples of combinations of 2 collagens associated with overall survival. C. Combinations of all mutations in MSS tumors only. D. Combinations of all mutations in MSIH tumors only. E. Example of collagen genes with truncation mutations most frequently associated with overall survival that, when combined, classify MSIH tumors into high and low overall survival risk.
Figure S4. Collagen mutations have MSIH and MSS context-dependent differences in overall survival. Mutations in COL5A3 and COL14A1 have different associations in MSIH and MSS tumors even though the total number of mutations is similar.
Figure S5. MSIH and MSS tumors have distinct microenvironments in TCGA. A. MSI status was associated with outcome in ACRG but not in TCGA. B. Comparison of MSIH and MSS stomach tumors by pre-ranked GSEA reveals differences in expression. Each heatmap plots the Normalized Enrichment Scores (NES) from the GSEA. NABA ECM gene sets were expressed higher in MSS tumors compared to MSIH tumors. Many immune cell expression signatures, including cytotoxic cells, were expressed higher in MSIH tumors compared to MSS tumors. B cell signatures were expressed higher in MSS tumors. The majority of cancer hallmark expression signatures were expressed significantly higher in MSIH tumors compared to MSS tumors.
Figure S6. Pre-ranked GSEA of collagen mutation combinations in Table S3 for the whole TCGA STAD cohort shows consistent impact for each mutation combination. A. Hallmarks for combinations with both missense and truncation mutations. B. The NABA and immune signature gene sets for combinations with both missense and truncation mutations. C. Hallmark, NABA, and immune signature gene sets for combinations with just truncation mutations. D. Clustering of hallmark gene sets for tumors with missense mutations only in the whole TCGA cohort showed a significant difference for the EMT hallmark relative to overall survival. P-value calculated by Kolmogorov-Smirnov test.
Figure S7. In MSS cases, pre-ranked GSEA of tumors with either a missense or truncation mutation combination as listed in Table S3 shows the impact of collagen mutations on pathways, some of which are correlated with overall survival. A. For all mutations in MSS cases, some hallmarks such as EMT were associated with overall survival, as shown in the heat map and box plot. P-value calculated by Kolmogorov-Smirnov test. B. NABA ECM and immune signature gene sets in MSS tumors. Basement membrane and macrophage signature gene sets were among the gene sets most associated with overall survival, showing consistent downregulation in tumors with mutant collagens and higher expression in wild-type tumors. P-value calculated by Kolmogorov-Smirnov test.
Figure S8. In MSIH cases, pre-ranked GSEA of tumors with either a missense or truncation mutation combination as listed in Table S3 shows the impact of collagen mutations on pathways, some of which are correlated with overall survival. A. Clustering of hallmark gene sets partitions tumors with collagen combinations by overall survival. Box plot shows the significant difference in the EMT hallmark as defined by combinations associated with high or low risk of overall survival. B. NABA gene sets showing large differences in Basement Membrane and ECM Affiliated gene sets relative to overall survival. C. Immune cell signature gene sets showing large differences in Treg and macrophage expression signatures. P-value calculated by Kolmogorov-Smirnov test.
Figure S9. In MSIH cases, pre-ranked GSEA of tumors with only missense mutation combinations as listed in Table S3 shows the impact of collagen mutations on pathways, some of which are correlated with overall survival. A. Hallmark gene sets. B. NABA ECM sets. C. Immune cell gene signatures. P-value calculated by Kolmogorov-Smirnov test.
Figure S10. In MSIH cases, pre-ranked GSEA of tumors with only truncation mutation combinations as listed in Table S3 shows the impact of collagen mutations on pathways, some of which are correlated with overall survival. A. Hallmark gene sets. B. NABA ECM and immune cell signature gene sets. P-value calculated by Kolmogorov-Smirnov test.
Figure S11. COL7A1 somatic mutations resemble inherited germline mutations found in collagenopathies. A. Distribution of somatic variants in TCGA STAD is similar to the germline variants observed in DEB as determined by Kruskal-Wallis test. B. Lollipop plot showing the distribution of variants on the COL7A1 protein domain map. A recurring truncation variant is found in the collagen domain in exon 73. Other variants were observed only once or twice, but have redundant impacts in each domain.
Figure S12. COL7A1 is expressed in some tumor cells in STAD. Representative images of COL7A1 protein and RNA expression in stomach adenocarcinoma. A. Immunohistochemistry (A, C, E) and in situ hybridization (B, D, F) for COL7. Stromal localization in C, E, D, and F, and mixed stromal and carcinoma localization (at white arrows; A, B). B. Higher magnification of panels A and B from S7A showing expression by IHC in panel A and ISH in panel B of COL7A1 in epithelial regions. The arrow shows ISH signal in tumor cells. C. Representative images at higher power of COL7A1 protein expression by IHC in the epithelium and stroma.
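Several of the legends above refer to pre-ranked GSEA, whose enrichment score is a Kolmogorov-Smirnov-like running-sum statistic over a ranked gene list. The sketch below shows the unweighted form of that statistic, in which every gene-set member contributes an equal step; the GSEA software by default weights steps by the ranking metric, so this is a simplified illustration and not the analysis pipeline used for these figures. The function name `enrichment_score` and the gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list.

    Walk down the ranking, stepping up for each gene in gene_set and
    down for each gene outside it; return the largest deviation from 0.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1 / n_hit if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, highest-ranked gene first.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
es_top = enrichment_score(ranked, {"G1", "G2", "G3"})   # set at the top
es_bottom = enrichment_score(ranked, {"G7", "G8"})      # set at the bottom
print(es_top, es_bottom)
```

A positive score indicates that the set is concentrated near the top of the ranking and a negative score near the bottom; normalized enrichment scores additionally require permutation-based scaling, which is omitted here.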
Posted 01 Apr, 2021
Invitations sent on 28 Feb, 2021