Patient cohort for whole exome sequencing (discovery cohort)
The clinicopathological characteristics of the discovery cohort are summarized in Table 1. A total of 45 samples were obtained from the primary tumors (3 to 7 samples per case; Table 1; Figure 1). In a single case, three samples were collected from three separate lymph node metastases. Including the non-neoplastic stomach mucosa, we finally obtained whole exome sequences from 57 tissue samples (4 to 11 samples per case). The median sequencing coverage was 163x for tumor samples and 170x for non-neoplastic mucosa (Suppl. Table 3). Median tumor purity (computationally assessed by Sequenza) was 33% (12%-98%) (Suppl. Table 7). Raw sequencing data were deposited at the European Genome Archive (EGAS00001004525).
Somatic signatures
We analyzed the observed somatic signatures in their 5’ and 3’ base context thus resulting in 96 possible mutation types.12 Based on the exome data, we retrieved the individual mutational signature from each tumor sample in relation to the non-tumorous tissue. We found that the somatic signatures of GCs were consistent with previous reports: signature 1 was found in 48 tumor samples (100%), signature 15 in 48 (100%), signature 18 in 29 (60.4%), signature 10 in 3 (6.3%) and signature 29 in 2 samples (4.2%).12
Non-synonymous somatic mutations
The results of the whole exome sequencing of the discovery cohort are summarized in Suppl. Table 4-10. Results from the independent validations of SNVs using Sanger sequencing, pyrosequencing or ddPCR™ demonstrating concordance are presented in Suppl. Table 2.
The discovery cohort harbored 16,537 non-synonymous mutations (i.e. missense, nonsense and frameshift; Suppl. Table 4), which were unevenly distributed among patients and patient samples (Figure 2). In the individual patient, the number of non-synonymous mutations ranged from 181 to 3111 (median: 369), and in the individual tumor sample, it ranged from 49 to 2348 (median: 159). The highest mutational burden was found in the MSI GC, with regard to patient (n=3111) and sample (n=2348).
The non-synonymous mutations were not evenly distributed among the patient samples. Between 8.7% and 46.8% of the non-synonymous mutations were found in all samples of the same patient (= “homogenous” distribution; Table 1). The vast majority of non-synonymous mutations (53.2-91.3%) was not present in each sample of the patient (= “heterogeneous” distribution). With regard to all non-synonymous mutations (n=4413 genes), mutations in 1642 (37.2%) genes were homogenously distributed and for 3272 (74.1%) genes they were heterogeneously distributed (Suppl. Table 8). Interestingly, mutations in 501 (11.3%) genes were both, homo- or heterogeneously distributed, albeit in separate cases (Suppl. Table 8).
The “spatial” heterogeneity of the mutational landscape was further diversified by genetic heterogeneity: 398 genes harbored two to five different non-synonymous mutations in the same patient (Suppl. Table 9). Again, the highest number of “multiple” non-synonymous mutations/genes was found in the MSI GC (320 genes with ≥2 mutations). The prevalence of genes with ≥2 non-synonymous mutations varied from 3 to 320 per patient and was also not evenly distributed among the different samples (Suppl. Table 9). The different mutations were present in the same as well as in different patient samples.
In a single case sequence data were obtained from three separate lymph node metastases. Similarly, the number and type of non-synonymous mutations varied between each lymph node metastasis and between the primary tumor and the three lymph node metastases. Interestingly, 172 mutations present in the primary were not detected in the lymph node metastases, while 58 mutations found in the lymph node metastases were absent in the samples obtained from the primary tumor (Suppl. Table 10).
Collectively, these data provide evidence that adenocarcinomas of the stomach and gastroesophageal junction show substantial genetic and spatial intratumoral heterogeneity.
Copy number variation
Next, we sought CNVs, i.e. homozygous or heterozygous deletions and amplifications. A total of 219 genes showed copy number variations, the majority of which were heterogeneous (Suppl. Table 13; Figure 3). The number of genes with CNVs per case ranged from 0 to 98 (Table 1). The highest number was found in case #5. Interestingly, this case did not have any obvious drivers in terms of SNVs and showed strong amplifications in MDM2, CD274 (PD-L1), JAK2, MYC, NOTCH2 as well as deletions in POLE1 and TGFBR2 in individual samples, which would point towards heterogeneity. Across all samples, we found examples for HER2 (validated independently by chromogenic in situ hybridization; case #3 and #4), MYC (case #5) and CDK12 (case #3 and #4) amplifications as well as CDKN2A (case #2, #3, #4 and #8), TP53 (case #3) and PTEN (Case #3 and #4) losses (Suppl. Table 13).
The homogeneous amplification of MDM2 and the heterogeneous amplification of CD274 (PD-L1) in case #5 were validated independently by fluorescence in situ hybridization (MDM2) and immunohistochemistry (Figure 3).
Clonality analyses
In view of the marked intratumoral heterogeneity we then assessed clonality and cancer cell fraction (CCF) on a per-sample basis and on a per-patient basis (Table 1). On a per sample basis, the percentage of clonal SNVs (non-synonymous or synonymous) ranged from 1 to 99% for non-synonymous mutations (Suppl. Table 6) and from 0 to 98.6% for synonymous mutations (data not shown). However, when clonality was assessed on a per-patient basis (all data for each patient were combined into a “single sample”), the number of clonal mutations ranged from 0 to 80 (median 1.0) for non-synonymous mutations and from 0 to 32 (median 1.0) for synonymous mutations (Table 1). These data show that a single sample analysis cannot reliably assess the clonality status of a somatic mutation and that the vast majority of the mutational landscape of GC is subclonal. However, the TP53 mutations in case #1, #2 and #3 were classified as clonal in the per-sample and in the per-patient analysis.
To further assess the reliability of the clonality estimations in our discovery cohort, we applied another approach, which was described by Opasic et al.13 and Werner et al.14 The number of samples required for correct identification of clonal mutations within the cancer with high certainty (>95%) was estimated. First, the “balance factor” g was assessed for every individual tumor by fitting the information gain with each multi-region sample to a theoretical curve. Next, the information from the fully reconstructed multiregional trees and the branch-defining subclonal alterations were used to perform the estimation (Figure 4A). Six (case #1, #2, #3, #4, #8, #9) tumors were considered to be highly unbalanced, which implies that the number of “truly” clonal mutations was indeed low. The identification of these mutations would require the sequencing of many additional tumor samples. In one patient (case #6) with a fairly balanced phylogenetic tree (g=0.56), five samples were sufficient for the identification of clonal mutations with a probability >90%. For two other patients (case #5 and #7) with low estimated values of g (g=0.2 and g=0.01) we could be fairly certain (>98% probability) that mutations from the root of the phylogenetic tree were indeed clonal using the existing number of samples (Figure 1).
Collectively, these data show that the assessment of clonality depends on the number of samples studied and the extent of intratumoral heterogeneity.
Evolutionary trajectory
Next, in order to investigate the variability in the mode of evolution in our discovery cohort we compared the variant allele frequency (VAF) histograms with the neutral model of cancer evolution as described initially by Williams et al.11 using their neutralitytestr package (Figure 4B; Suppl. Figure 2).
To reduce the probability that apparent deviation from neutrality is caused by increase in allelic frequency due to gene duplication events, all variant alleles that had likely undergone gene doubling were removed, as shown by Williams et al.15 All detected non-synonymous and synonymous mutations were included since a larger number of passenger mutations whose frequency had been increased by advantageous mutation supports the detection signs of selection.16
We conducted our analysis on VAFs without the purity correction, assuming that sample purity affects all variant frequencies equally. Therefore, a correction is unlikely to increase the resolution of our analysis. In addition, spatial constraints can introduce sampling bias into patterns of the clonal selection of the tumor,17 therefore the analyses also included the average frequency of mutations from all available samples.
The VAF-histograms (Figure 4B; Suppl. Figure 2) show that the majority of VAF distributions found in individual samples is compatible with a neutral expectation, i.e. no selection. However, there is a high degree of heterogeneity in deviation from neutrality within each tumor, and VAF profiles collected from individual samples can lead to very different results for the evolutionary dynamics of cancer compared to the combination of all samples (Figure 4C).
This could be a result of distinct evolutionary trajectories within different parts of the tumor, or spatial effects of tumor growth and consequential sampling effects. Although we reject the neutral hypothesis in many cases, we do not observe clear subclonal peaks in our VAF-histograms that would imply strong selection in parts of the tumor.
Pathway analysis including copy number variation
Next, we assigned non-synonymous mutations and CNVs to pathways, which have been linked to GC, i.e. the SWI/SNF, TGF-b, Hippo, sonic hedgehog, NOTCH, WNT and JAK-STAT (Figure 2). The MSI GC showed the highest number of mutations (n=10) in a single pathway, while the remainder showed alterations in 1-5 genes per pathway (Suppl. Table 14). Again, a homogenous distribution of mutations was the exception and a heterogeneous distribution was the rule (Suppl. Table 14). Interestingly, homogeneous mutations occurred most commonly in the WNT signaling pathway (case #2, #3, #6, #8), the SWI/SNF pathway (case #3, #4, #6, #8) and the JAK/STAT-pathway (case #2, #5, #6). No homogenous mutation was found in the Notch pathway.
Recently, Park et al.18 demonstrated in animal models that SMAD4 cooperates with p53 loss to promote the development and metastatic progression of GC. In support of the findings made in their animal model, our pathway analysis shows that four of five cases with TP53 mutations (including all three cases with clonal TP53 mutation) also had alterations in the TGFb-pathway. Furthermore, SMAD4 mutations and losses were only found in TP53 mutant cases (Suppl. Table 14). In addition, all TP53 mutant cases showed alterations in the SWI/SNF pathway. He et al.19 provided evidence that members of the SWI/SNF pathway regulate cellular senescence via the p53/p21 and p16/pRB pathways.
Collectively, these findings lend support to the hypothesis that different pathways are effective during cancer development and progression, pointing towards interdependency: cancer progression in TP53 mutant GCs is linked to subsequent (putative subclonal) alterations in the TGFb- and SWI/SNF pathway.
Phylogenetic analysis
Next, we generated multiregional trees based on driver SNVs and CNVs. As shown in Figure 1 all tumors of our cohort provided evidence of branched somatic evolution, with the most complex being the MSI GC, confirming data published recently by von Loga et al.7 In addition, we inferred maximum parsimony trees in accordance with Lee et al.5 (Suppl. Figure 4). The authors observed a common phylogeny pattern of five cases with GC in which the primary genome is branched from a trunk while all the lymph node genomes (n=3 for each of the 5 cases) cluster in a separate branch. We did not observe this pattern for the single sample (case #5) for which lymph node data was available. In contrast to Lee et al.,5 in case #5 the lymph node metastases did not cluster together in a separate, distinct branch, but rather clustered with different individual samples from the primary tumor (Figure 1). Comparing the multiregional trees with the maximum parsimony trees following the Lee et al.5 methodology, one can observe similar trees with differences attributed to methodology as well as the additional CNV data included in the multiregional trees. Generating LICHeE based multiregional trees including passenger mutation and/or synonymous mutations generated similar relationships in the trees (data not shown).
With regard to the TGFb-pathway, it was interesting to note that the SMAD4 mutations of four cases with TP53 mutations (i.e. cases #1, #2, #3, and #4) were sublconal and that different mutations of SMAD4 aligned with different subclones providing circumstantial evidence for parallel somatic evolution (Figure 1).
Decreased or lost expression of SMAD4 is associated with worse patient outcome
Finally, we assessed the putative effect of SMAD4 alterations on patient prognosis. Using immunohistochemistry, we studied the expression of SMAD4 in case #1, #2, #3, and #4. Interestingly, SMAD4-immunostaining was heterogeneous and, occasionally, even a black-and-white staining pattern was noted, i.e. areas with complete loss of SMAD4 expression (both nuclear and cytoplasmic) were clearly demarcated from areas with retained expression (Figure 5).
Using a validation cohort (described in detail in Supplemental Materials and Methods), we aimed to test the hypothesis that a decreased or lost expression of SMAD4 would correlate with clinicopathological patient characteristics and patient outcomes. SMAD4 expression was studied using whole tissue sections and a validation cohort of 487 GCs (Figure 5; Suppl. Data 8; for further details see Supplemental Results).
A decreased expression of cytoplasmic SMAD4 (Q1-3 vs. Q4) correlated significantly with local tumor growth (p=0.003; significant after correction for multiple testing) and UICC stage (p=0.007; lost significance after multiple testing). No significant correlation was found between nuclear SMAD4 expression and any clinicopathological patient characteristic (Table 2).
The entire validation cohort showed a median overall survival (OS) of 14.1 months and a median tumor specific survival (TSS) of 15.5 months. Patient prognosis significantly depended on the Laurén-phenotype, T-, N-, M-, L-, V-, Pn- and R-category, UICC-stage, lymph node ratio and cytoplasmic SMAD4 expression. Patients with SMAD4 loss showed significantly lower median overall survival (OS; 13.4 months, 95% C.I. 11.2 - 15.6; significant after multiple testing correction; p=0.001) and tumor specific survival (TSS; 14.9 months, 95% C.I. 12.1 - 17.7; significant after multiple testing correction; p=0.003) compared with retained SMAD4 expression (OS: 22.2 months, 95% C.I.10.0 - 34.5; TSS: 25.0 months, 95% C.I. 11.5 - 38.5)(Figure 5).