Ethics statement
Ethical approval was obtained from the local ethical review board (D 453/10 and D 525/15) of the University Hospital Schleswig-Holstein, Kiel, Germany.
Study population and histology
Discovery cohort (Table 1)
Between 2016 and 2017, we prospectively enrolled patients with an adenocarcinoma of the stomach or esophagogastric junction into the discovery cohort at the University Hospital Schleswig-Holstein, Campus Kiel. All patients were Caucasian patients from Northern Germany treated in a single center. The inclusion criteria were appropriate size of the primary tumor (diameter >3 cm) to enable multiregional tissue sampling without compromising the surgical pathological evaluation of the resection specimen. Immediately after the tumor was resected, the specimens were delivered on ice to the Department of Pathology. Depending on the size of the primary tumor, between 3 and 6 samples were punched out of the primary tumor using a core needle biopsy and frozen at -80°C until further use. Macroscopic pictures were taken from the surgical resection specimens before and after tissue sampling in order to facilitate anatomical reconstruction of the sampling procedure (Suppl. Figure 1).
Validation cohort
The validation cohort was collected from the archive of the Institute of Pathology, University Hospital Schleswig-Holstein, Campus Kiel. The cohort included all patients who had undergone either a total or partial gastrectomy for adenocarcinoma of the stomach or esophagogastric junction between 1997 and 2009. All tissue samples originated from routine therapeutic surgeries, for all of which the patients had given written informed consent. The following patient characteristics were retrieved: type of surgery, age at diagnosis, gender, tumor size, tumor localization, tumor type, depth of invasion, number of lymph nodes resected and number of lymph nodes with metastases. Patients were included if an adenocarcinoma of the stomach or esophagogastric junction was histologically confirmed. Exclusion criteria were defined as 1) histology identified a tumor type other than adenocarcinoma, and 2) patients had undergone perioperative or neoadjuvant chemo- or radiotherapy. Each resected specimen had undergone gross sectioning and histological examination by trained and board-certified surgical pathologists. For outcome analyses, the dates and causes of patients’ deaths were obtained from the Epidemiological Cancer Registry of the state of Schleswig-Holstein, Germany, thereby distinguishing between tumor-related deaths and deaths from other causes. Follow-up data of those patients who were still alive were retrieved from hospital records and general practitioners. All patient data were pseudonymized after study inclusion.
DNA sequence analysis (discovery cohort)
Genomic DNA was extracted from frozen tissue using the QIAamp DNA mini kit (Qiagen, Hilden, Germany). Cryosections were prepared prior to DNA isolation to guarantee tumor cell content. DNA Exome libraries were prepared using the Nextera Rapid Capture Enrichment Kit, CEX version (Coding Exome Oligos; Illumina, San Diego, USA). Sequencing was performed on a Hiseq4000 instrument (Illumina) with 1% phiX (v3, Illumina) spike-in at 2*75 bp paired-end settings with the 150bp SBS chemistry.
DNA isolation from formalin-fixed and paraffin embedded tissue specimens
Genomic DNA was extracted from formalin-fixed and paraffin-embedded tissue using the QIAamp DNA mini kit (Qiagen, Hilden, Germany). Tissue sections were manually microdissected prior to DNA isolation to ensure a tumor cell content of higher than 80%. The integrity and amplifiability of the isolated DNA was evaluated by a qualitative size range PCR assay.
Primary data analysis
Raw fastq data were quality-trimmed, and adapter sequences were removed using bbduk from the BBTools suite version 36.32 (http://sourceforge.net/projects/bbmap) with the following parameters: minlen=25 qtrim=rl trimq=10 ktrim=r k=25 mink=11 hdist=1 overwrite=true tbo=t tpe=t. The Burrows-Wheeler aligner 0.7.15 (https://arxiv.org/abs/1303.3997) with default parameter settings was used to align the sequencing reads to the human reference genome (hs37d5). Duplicate reads were marked with sambamba 0.6.3 (31) and indel realignment was performed using ABRA version 0.97 (32). FastQC 0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Qualimap 2.2 (33) were used to perform quality checks on the fastq and bam files respectively.
Somatic mutation calling
Paired tumor-normal variant calling was performed using VarDict 1.5.1 (34) with the following parameters: mapping quality Q=10, base quality phred score q=20, minimum allele frequency f=0.01, and number of nucleotides to extend for each segment x=2000. Additionally, the read position filter P=0.9 and maximum number of reads with mismatches m=4.25, were supplied to var2vcf_paired.pl with further downstream filtering steps to improve detection of low frequency variants as described by Brad Chapman (http://bcb.io/2016/04/04/vardict-filtering/). ANNOVAR(35) was used to annotate variants utilizing refGene, cosmic84, clinvar_20170905, icgc21, nci60, exac03, exac03nontcga, snp142, avsnp150, 1000g2015aug_all, ljb26_all, dbnsfp33a, and intervar_20180118 databases. Homopolymer regions were marked using vcfpolyx from the jvarkit suite (https://github.com/lindenb/jvarkit). Variants were retained if the following criteria were met: allele frequency (AF)≥5%, total read depth ≥10 in either the tumor or the normal sample, variant depth in tumor sample ≥2, no strand bias, AF>10% in homopolymer regions. Variants in blacklisted regions described by Fuentes et al. (36) as well as ENCODE (37) were filtered out. In addition, annotation-based filtering was utilized to retain only coding, non-synonymous variants with ExAC AF<0.01 (38). Variants unknown to either COSMIC or ICGC were retained only if they occurred within one of the 719 genes of the COSMIC Cancer Gene Census (https://cancer.sanger.ac.uk/census, downloaded June 12, 2018)(39)(https://dcc.icgc.org/). An additional variant recovery process on the list of filtered variants was implemented on a per patient basis to recover variants from individual samples with AF<5% if they were present in at least one other sample of the same patient with AF≥5%. Tumor purity and ploidy were computationally estimated using Sequenza (40). Clonality and cancer cell fraction (CCF) for each variant was determined using Palimpsest (41). In brief, CCF is computed by adjusting the variant allele fraction for the tumor purity and the absolute copy number at each locus in tumor and normal cells. Mutations were classified as subclonal if the upper boundary of the 95% confidence interval was below the threshold of 0.95. COSMIC Mutational Signatures Version 2 were inferred using the R package deconstructSigs (42)
Copy number profiling
Allele-specific copy number calling was done using CNVkit 0.9.5 (43). A pooled normal reference was created from the nine matched non-neoplastic stomach mucosa samples. The initial segments were called with a conservative significance threshold of t = 1e-6 and low coverage segments were dropped. Segments were then used along with raw variant calls from VarDict, the estimated tumor purity, to call major and minor copy number variants and annotated by ANNOVAR based on the refGene database. Calls were considered as deletions when total copy number was 0, and as amplifications when total copy number was at least 6.
Tumor mutation burden, microsatellite instability and viral sequence analysis
Tumor mutation burden was calculated for each sample in terms of the number of non-synonymous variants per 1 Mb and scaled according to the exome panel size. Microsatellite instability (MSI) status was determined by MSIsensor(44) with a threshold of <10% for MSS (microsatellite stable), <10% and >30% for MSI-L (low), and >30% for MSI-H (high). To screen for viral integration events, unmapped reads were aligned against a sequence database of 198 human viruses (EBV, human papilloma virus, herpes simplex virus, among others).
Phylogenetic analysis
Patient-specific multiregional trees from CCF data were constructed using LICHeE.(45) Since LICHeE is limited to constructing trees based on single nucleotide variations (SNV) only, copy number variants (CNV) were manually incorporated into the SNV-based trees. Driver CNVs as identified by Cancer Genome Interpreter, were added to the SNV-based trees; for some patients this did not result in any changes to the tree nodes, whereas for some patients the internal tree nodes were redrawn to reflect the additional CNVs. In addition, we constructed maximum parsimony trees based on a binary matrix of SNVs per patient using a branch-and-bound algorithm with PHYLIP (Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle.).
Assessment of neutrality
Variant allele frequency (VAF) histograms were used to test the neutral model of cancer evolution as described by Williams et al. (17) using their neutralitytestr package.
To reduce the probability that apparent deviation from neutrality is caused by increase in allelic frequency due to gene duplication events, all variant alleles that had likely undergone gene doubling were removed, as shown by Williams et al. (46). All detected non-synonymous and synonymous mutations were included since a larger number of passenger mutations whose frequency had been increased by advantageous mutation supports the detection signs of selection (47).
We conducted our analysis on VAFs without the purity correction, assuming that sample purity affects all variant frequencies equally. Therefore, a correction is unlikely to increase the resolution of our analysis. In addition, spatial constraints can introduce sampling bias into patterns of the clonal selection of the tumor (48), therefore the analyses also included the average frequency of mutations from all available samples.
Validation analyses using Sanger sequencing, pyrosequencing and digital polymerase chain reaction
In order to validate of the heterogeneous mutational patterns, Sanger sequencing analysis of ASXL3, TP53 (exon 5 and 8) and SMAD4 (exon 2, 9 and 11) was done using the PyroMark PCR Kit (Qiagen). PCR products were purified using the NucleoSpin® Gel and PCR Clean-up (Machery-Nagel, Düren, Germany) and sequenced by dye terminator cycle sequencing (BigDye Terminator v1.1 Cycle Sequencing kit, Applied Biosystems, Darmstadt, Germany) with universal M13- or PCR Primers. The sequencing products were purified using the DyeEx 96 Kit (Qiagen) and analyzed on a Genetic Analyzer 3500 (Applied Biosystems). Pyrosequencing, using the PyroMark PCR Kit (Qiagen) and the PyroMark Gold Q24 Reagents (Qiagen) were done to detect SNPs in BRCA1, BRCA2, CDH1, CTNNB1, KRAS, MLH1, MUTYH, PIK3CA, POLE, and RNF43. The PyroMark Q24 System and PyroMark analysis software (both Qiagen) were used for analysis. To validate low frequent mutations in ARID1A, ARID1B, AKT1, CLOCK, FLT4, IKBKB, IKZF3, LRP1B, MAP2K4, MCM8, PAX5, PRRC2A, and TP53BP digital PCR were done using the ddPCR™ Supermix for Probes (No dUTP) and the QX200™ Droplet Digital™ PCR System (both Biorad) following the manufacturer’s instructions. The primer sequences used are listed in Suppl. Table 1. Suppl. Table 2 summarizes the validated mutations.
Histology
Tissue specimens used for histology and immunohistochemistry were fixed in formalin and embedded in paraffin. Deparaffinized sections were stained with hematoxylin and eosin. Histological examination of primary tissue sections was carried out for all cases (discovery and validation cohort) to assure if inclusion criteria were met. Tumors were classified according to the Laurén classification (49). pTNM-stage of all study patients was determined according to the eighth edition of the UICC guidelines (50).
Immunohistochemistry, scoring of SMAD4-immunostaining and virtual microscopy
Immunohistochemistry was carried out with antibodies directed against SMAD4 (dilution 1:50; monoclonal rabbit; 50, Cell Signaling Technology Europe, Frankfurt am Main, Germany) and PD-L1 (dilution 1:100, E1L3N, Cell Signaling Technology) using whole tissue sections. Immunostaining was performed with the autostainer Bond™ Max System (Leica Microsystems GmbH, Wetzlar, Germany). The immunoreaction was visualized with the Bond™ Polymer Refine Detection Kit (brown labelling; Novocastra; Leica Microsystems, Wetzlar, Germany).
Scoring of each tumor was assessed by determining a histoscore (H-score), following a semi-quantitative approach combining both the immunostaining intensities (subsequently referred to as IHC-scores) and the percentages of positive cells of the tumor. The IHC-score was based on tumor cells showing either strong (3+), intermediate (2+), weak (1+) or no (0) staining of SMAD4 in the cytoplasm and nucleus, respectively. Tumor cells without detectable cytoplasmic or nuclear staining were scored with 0. The percentage of positive tumor cells (approximated to the nearest 10) showing the defined staining intensities (3+, 2+, 1+, 0) was gauged with respect to all tumor cells visible on each tissue specimen and always added up to a total of 100% tumor cells. Finally, a H-score was calculated according to the formula: H-score = [0 x percentage of immunonegative tumor cells] + [1 x percentage of weakly stained tumor cells] + [2 x percentage of intermediately stained tumor cells] + [3 x percentage of strongly stained tumor cells]. The maximum possible H-score was 300, if all cells of a given tumor sample showed a strong staining: [0 x 0%] + [1 x 0%] + [2 x 0%] + [3 x 100%] = 300. The multipliers within the formula yielded an improved stratification of the H-scores: tumor samples with a predominantly high staining intensity and such samples with a predominantly low staining intensity were more distinctively separated.
For virtual microscopy with area analysis for tumor heterogeneity, tissue slides were scanned using a Leica SCN400 microscopic scanner (Leica Biosystems, Nussloch, Germany).
MDM2 fluorescence in situ hybridization
Analysis of MDM2 amplification was done by fluorescence in situ hybridization using the Vysis MDM2/CEP 12 FISH Probe Kit (Abbott Diagnostika MediSense, Wiesbaden, Germany) following standard procedures. The results of FISH were evaluated by screening the entire tissue sections. Subsequently, MDM2 and centromer 12 signals were counted in at least 20 representative adjacent cancer cell nuclei within the invasive region. The presence of FISH clusters was noted and the ratio of MDM2/centromer 12 signals was calculated. The gene count was calculated by dividing the number of MDM2 gene signals by the number of cancer cell nuclei studied.
Assessment of further clinicopathological characteristics
pylori- (51), Epstein-Barr virus- (4), microsatellite (MSI)(6), and HER2 status (9) were assessed as described previously.
Statistical methods
SPSS version 24.0 (IBM Corp., Armonk, NY, USA) was used for statistical analyses. The correlation between non-ordinal clinicopathological patient characteristics and SMAD4 was tested with Fisher’s exact test. T category, N category, UICC stage and tumor grading as variables of ordinal scale were tested with Kendall’s tau-test. Median survival with 95% confidence intervals was determined by the Kaplan-Meier method. Differences between median survivals were tested with the log-rank test. A multivariate survival analysis (Cox regression) was performed. A p-value of ≤0.05 was considered to be significant. All p-values are given uncorrected. The Siemes (Benjamini-Hochberg) procedure was applied to compensate for false discovery rate. Any P-values that lost significance are marked.