Across the 5 patients, the mean sequencing depths of the tumor tissue (primary and LNM) and the normal tissue were 96x and 76x, respectively (Table S1).
SNV pattern
In total, we identified 7202 nonsynonymous somatic mutations, including 2808 (range: 400-688) in the primary tumor group and 4394 (range: 239-2344) in the LNM group. The main variant type was insertion (ins) and the main SNV class was the C>T substitution (Figure S1). We also compared the nonsynonymous somatic mutations between the primary tumor group and the LNM group (Figure S2) and found no significant difference between the two groups.
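SNV classes such as C>T are conventionally reported on the pyrimidine strand, so that every single-base substitution falls into one of six classes. The following is a minimal sketch of that folding step, not the pipeline used in this study; the function name `snv_class` is hypothetical.

```python
# Collapse a single-base substitution onto the pyrimidine strand so that
# every SNV falls into one of six classes (C>A, C>G, C>T, T>A, T>C, T>G).
# `snv_class` is an illustrative helper, not part of any published tool.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref: str, alt: str) -> str:
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A purine reference base (A or G) is reported via its complement.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"
```

Under this convention, a G->A change on the reference strand is counted in the same class as C>T.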
Driver mutations identified in the 10 tumors by Varcode were in TP53, APC, ERBB3, FBXW7 and SMARCB1 (Figure 1). For further analysis, these driver mutations were classified as shared (present in both the primary and metastatic tumors) or private (present in only the primary or only the metastatic tumor). TP53 was the only gene with shared mutations, which were detected in 2 patients (GBC-1 and GBC-2). In GBC-1, the TP53 mutation was an in-frame deletion at the same locus in both tumors (17:7577514-7577517, GTGA->G). Similar mutations are reported in COSMIC (17:7577514-7577516, p.T256del, Deletion - In frame; and 17:7577515-7577516, p.L257Gfs*6, Deletion - Frameshift). In GBC-2, it was a missense mutation at the same locus in both tumors (17:7577539, G->A), which is not reported in COSMIC. Private mutations were identified in 3 patients, either in addition to the shared mutation (GBC-1 and GBC-2) or only in the primary tumor (GBC-5). No known driver mutation was found in GBC-3 or GBC-4, which may imply other mechanisms of oncogenesis [21, 22]. Theoretically, three driver genes are required to convert a normal cell into a cancer cell in solid tumors, and tumors have been reported to harbor approximately four driver genes on average [23]. In this study, none of the 10 tumor samples harbored more than 2 known driver mutations. This result implies that not all driver mutations can currently be recognized.
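The shared/private classification can be expressed as set operations on the mutation calls of a matched tumor pair. The sketch below is illustrative only. It assumes each mutation is represented by a string key (for example gene plus locus plus change), and the function and key names are hypothetical rather than taken from the study's pipeline.

```python
from typing import Dict, Set


def classify_drivers(primary: Set[str], metastasis: Set[str]) -> Dict[str, Set[str]]:
    """Split the driver mutations of one patient into shared and private sets."""
    return {
        # Present in both the primary tumor and the metastasis.
        "shared": primary & metastasis,
        # Present in only one of the two lesions.
        "private_primary": primary - metastasis,
        "private_metastasis": metastasis - primary,
    }


# Hypothetical input: one mutation found in both lesions, plus a placeholder
# mutation (GENE_A) found only in the primary tumor.
example = classify_drivers(
    primary={"TP53 17:7577539 G>A", "GENE_A placeholder"},
    metastasis={"TP53 17:7577539 G>A"},
)
```

Keying on the exact locus and change, rather than on the gene name alone, avoids calling two different mutations in the same gene "shared".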
We also calculated the tumor mutational burden (TMB). The median TMB was 5.82 per Mb in the primary tumor group and 5.49 per Mb in the LNM group. The median non-silent TMB was 1.84 per Mb and 1.80 per Mb in the primary tumor group and the LNM group, respectively (Figure S3). A considerably higher TMB (24.65 per Mb) was found in the LNM lesion of GBC-3. There was no significant difference between the two groups.
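TMB is a mutation count normalized to the size of the sequenced region in megabases. The sketch below shows that arithmetic and the per-group median. The size of the callable region is not stated in this section, so `callable_bases` is an assumed input, and all numbers in the example are hypothetical rather than study data.

```python
from statistics import median
from typing import Iterable


def tmb_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


def group_median_tmb(mutation_counts: Iterable[int], callable_bases: int) -> float:
    """Median TMB across the samples of one group (e.g. primary or LNM)."""
    return median(tmb_per_mb(n, callable_bases) for n in mutation_counts)


# Hypothetical example: 300 mutations over an assumed 30 Mb callable region.
example_tmb = tmb_per_mb(300, 30_000_000)  # 10.0 mutations per Mb
```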
Mutational Signatures
Annotated against COSMIC V2, the major signatures were similar in both groups. Signatures 3, 1, 6, 12, 11, 22, 23, and 7 were detected in the primary group (Figure S4). Signature 1 is age related and is characterized by a large number of C>T mutations. Signature 3 is strongly associated with germline and somatic BRCA1 and BRCA2 mutations in breast, pancreatic, and ovarian cancers. In pancreatic cancer, responders to platinum therapy usually exhibit signature 3 mutations [24]. Our patients did not exhibit BRCA1 or BRCA2 mutations, although BRCA mutations have been reported in GBC patients [25]. Additionally, signature 9 was enriched in the LNM group. This signature is characterized by a pattern of mutations that has been attributed to polymerase η, which is implicated in the activity of activation-induced cytidine deaminase (AID) during somatic hypermutation. Signatures 12, 22, and 23 were also enriched in both groups (20% and 28%), which implies some degree of similarity between GBC and liver cancer.
Phylogenetic tree
Based on SNVs, we identified two types of phylogenetic tree with Treeomics, in line with the earlier classification of driver mutations as shared or private. The lengths of the trunk and branches correspond to the numbers of nonsynonymous somatic mutations (Figure 2). All known driver mutations were mapped onto the trees. Because the VAF (variant allele frequency, calculated as the number of reads carrying the variant at a position divided by the read depth at that position) of the TP53 mutation in GBC-2 was very low (1/36), Treeomics judged this mutation to be unreliable. We re-checked this mutation and manually included it in the analysis, since it is shared by the primary tumor and the LNM and occurs at the same locus (17:7577539, G->A). Both linear and parallel progression models of metastasis have been identified in other tumors [26, 27]. The trees of our patients suggest that both patterns of metastasis exist: GBC-1 and GBC-2 showed a linear pattern, and GBC-5 showed a parallel pattern.
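The VAF definition above is a simple ratio, sketched here for the reported GBC-2 read counts. The function name is hypothetical, and the sketch does not reproduce the criteria that Treeomics itself applies when judging a variant to be unreliable.

```python
def variant_allele_frequency(alt_reads: int, depth: int) -> float:
    """Reads carrying the variant divided by total read depth at the position."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")
    return alt_reads / depth


# TP53 in GBC-2: 1 variant read out of 36 reads at the position.
vaf_gbc2 = variant_allele_frequency(1, 36)
print(round(vaf_gbc2, 3))  # 0.028
```

With a single supporting read, such a call is hard to tell apart from a sequencing error on read counts alone.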
CNV
We then analyzed copy number variations (CNVs). When our samples were compared against previously reported CNVs [9, 28], losses outnumbered gains. In total, there were 3 gains and 11 losses in the samples (Figure 3). Losses of 8p23.3, 9q21.11, 14q32.13, and 16q23.1 were detected in several samples. Some CNVs were detected in both the primary tumor and the LNM. In GBC-1, a gain of 17q21.1 and losses of 9q21.11 and 8p23.3 were detected. In GBC-2, a loss of 9q21.11 was detected. Losses of 8p23.3, 9q21.11 and 15q23.1 were detected in GBC-3.
Apart from the reported CNVs, we newly identified large-segment alterations in both the primary tumor and the LNM in 2 patients. In GBC-5, the gain proportion was 0.99 and 0.75 in chromosome 7, and 0.69 and 0.65 in chromosome 20, for the primary tumor and the LNM, respectively. In GBC-3, the loss proportion in chromosome 10 was 0.75 in both the primary tumor and the LNM. Thus, whole-arm gain and loss could be inferred in these two patients.
We further calculated the CNV burden by dividing the copy number losses or gains by all copy number calls found with GATK (Figure S6). The median CNV loss burden was 0.013 in the primary tumors and 0.043 in the LNMs. The median CNV gain burden was 0.0009 and 0.0671 in the primary tumors and the LNMs, respectively. Although the differences were not statistically significant overall, owing to the limited number of patients, a higher CNV burden (gain or loss) was observed in the LNM than in the matched primary tumor in 3 patients.
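One way to read the burden definition above is as the fraction of copy number calls in a sample that are gains or losses. The sketch below implements that count-based reading under stated assumptions: the labels "gain", "loss", and "neutral" are hypothetical stand-ins for the per-segment calls, and a length-weighted fraction would be an equally plausible alternative that this section does not rule out.

```python
from collections import Counter
from typing import Dict, Iterable


def cnv_burden(calls: Iterable[str]) -> Dict[str, float]:
    """Fraction of copy number calls labeled as gain and as loss.

    `calls` holds one label per segment; the labels "gain", "loss" and
    "neutral" are assumed here for illustration.
    """
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no copy number calls supplied")
    return {"gain": counts["gain"] / total, "loss": counts["loss"] / total}


# Hypothetical sample with four segments: one gain, two losses, one neutral.
example_burden = cnv_burden(["gain", "loss", "loss", "neutral"])
```

Computing the burden per sample in this way allows each LNM to be compared directly with its matched primary tumor.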