Mutational landscape of paired primary and synchronous metastatic lymph node in chemotherapy naive gallbladder cancer

Comprehensive genomic analysis of paired primary tumors and their metastatic lesions may provide new insights into the biology of metastatic processes and therefore guide the development of novel strategies for intervention. To date, our knowledge of the genetic divergence and phylogenetic relationships in gallbladder cancer (GBC) is limited. We performed whole exome sequencing for 5 patients with primary tumor, metastatic lymph node (LNM) and corresponding normal tissue. Mutations, mutation signatures and copy number variations were analyzed with state-of-art bioinformatics methods. Phylogenetic tree was also generated to infer metastatic pattern. Five driver mutations were detected in these patients. Among which, TP53 was the only shared mutation between primary tumor and LNM. Although tumor mutational burden was comparable between primary tumor and LNM, higher mutation burden was observed in LNM of one patient. Copy number variations (CNVs) burden was higher in LNM than their primary tumor. Phylogenetic analysis indicated both linear and parallel progression of metastasis exist in these patients. TP53 mutation and CNVs were homogenously between primary tumor and LNM. High consistence of genetic landscape were shown between primary tumor and LNM in GBC. However, heterogenicity still exist between primary tumor and LNM in particular patients in term of driver mutation, TMB and CNV burden. Phylogenetic analysis indicated both Linear and parallel progression of metastasis were exist among these patients.


Introduction
Gallbladder cancer is the most common malignant tumor of the biliary tract worldwide. GBC exhibits wide geographical and ethnical variations in its incidence. The incidence rates are very high in Latin America (27.3/100,000 in female Mapuche Indians) and Asia (8.1/100,000 in male Koreans), moderate in some countries of eastern and central Europe (Hungary, Germany and Poland), low in the United States and most western and Mediterranean European countries (France, Norway and UK) [1]. China is among median rate of incidence [2]. However, based on the population scale, the annual cases are tremendous, which increases the burden of already overwhelming medical resources. Most GBCs, unfortunately, are discovered incidentally at routine cholecystectomy or present as advanced stage disease. Less than 20% of gallbladder cancer is eligible for potentially curative surgical resection at diagnosis [3]. The overall prognosis of GBC is very poor, despite recent improvement of chemotherapy, molecular targeted therapy and aggressive surgical resection with advances in perioperative care [4]. For advanced stage disease, the 5-year survival is only about 5% [5]. Regional nodal status and the depth of tumor invasion (T status) are the two most important prognostic factors [5]. Genetic landscape studies facilitated by next generation sequencing (NGS) has improved our understanding of this disease in terms of oncogenesis, metastasis and therapeutic [6][7][8]. However, only primary tumor of GBC was genetic profiled to date [9][10][11]. Emerging evidence suggest the importance of mutational profile between primary tumor and metastasis [12][13][14][15][16][17].
In this study we investigated the primary tumor and their synchronous lymph node metastasis (LNM) with whole exome sequencing (WES) in terms of somatic mutations, mutational signatures [18], and copy number variations (CNV) in order to define if there are distinct mutational landscapes between the two sites.

Material and methods
This study included five trios of samples, including primary tumor, metastatic lymph node (LNM), and matched normal tissue of gallbladder. Tumor cell contents from haematoxylin and eosin (H&E)-stained formalin fixed and paraffinembedded (FFPE) sections were estimated by a pathologist (W.Y.H).

Whole exome library preparation and sequencing
For each individual, the genomic DNA of cells from primary tumor, LNM and matched normal tissue sample was sequenced. DNA was extracted from FFPE tissue blocks using QIAGEN QIAAmp DNA extraction kits. The exome of each sample was captured using SureSelect Exon V6 (Agilent Technologies) according to the manufacturer's protocol. The product was quality checked and sequenced with illumina NovaSeq 6000, generating 2 × 150 bp pairedend reads.

Sequencing alignment and detection of somatic variants
The FastQC package (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc) was applied to assess the quality-score distribution of the sequencing reads. Paired-end reads were aligned to human reference genome (GRCh37) using the Burrows-Wheeler Aligner (BWA) with default parameters [19]. We then used the Picard-tools 1.76 to fix mate pairs and mark PCR duplicates (http:// Picard. Sourc eforge. net). The resultant aligned BAM files were then sorted and merged (if needed) using Samtools v0.1.19 [20]. After sorted by Samtools, the reads were subjected to local realignment and recalibration using the Genome Analysis Toolkit (GATK v3.8). Somatic substitutions and indels were called using MuTect2 mode in GATK on the GRCh37 genome with genomic DNA of white blood cells as the germline comparator following the best practices for somatic SNV/indel calling (https:// softw are. broad insti tute. org/ gatk/ best-pract ices/). Briefly, the algorithms compared the tumor with the matched normal sample to exclude germline variants. The algorithms applied the following criteria to increase calling reliability: (1) somatic mutations were not found in a panel of normal controls assembled from matched normal tissues; (2) somatic mutations were not located in the segmental duplication region marked by the UCSC browser (http:// genome. ucsc. edu/); (3) somatic mutations were not found in the 1000 Genomes Project (the Phase III integrated variant set release, across 2, 504 samples) with the same mutation direction. Mutations (SNVs/indels) were annotated with Varcode (https:// github. com/ openv ax/ varco de).

Mutational signature analysis
The R package: signature estimation was applied to infer the proportion of each known mutational signature proposed by Alexandrov et al. [18]. In total, 30 signatures reported by COSMIC (https:// cancer. sanger. ac. uk/ cosmic/ signa tures_ v2) were included in the analysis.

Phylogenetic tree construction
Treeomics v1.7.13, a Bayesian inference model, was applied to reconstruct the phylogeny of the tumor with the multiregion sequencing data for each case [21], which took the numbers of variant reads, read depth, chromosomal coordinates, gene symbol, and substitution pattern into account. Each phylogeny was rooted at the matched patient's normal sample and the leaves represented the primary or metastasis samples.

Estimation of the copy number variations
The GATK best practices for somatic copy number variations (CNVs) in exomes was applied to detect CNVs from the whole-exome sequencing data (https:// softw are. broad insti tute. org/ gatk/ best-pract ices/). We utilized ReCapSeg to estimate the somatic copy number, which is implemented as part of GATK (v4). Briefly, the read counts for each of the exome targets were divided by the total number of reads to generate proportional coverage. A panel of normal (PON) controls was built using proportional coverage from 5 normal samples and each of the tumor samples was compared with the PON, after which tangent normalization was applied. Circular binary segmentation (CBS) was then applied to segment the normalized coverage profiles. Sex chromosomes (X and Y) were excluded from this analysis. Referred to other studies [22,23] and took computational errors of the software into consideration, we defined copy number more than 2.5 as gain, and less than 1.5 as loss.

Statistics analysis
All statistical analyses were performed in R V4.0.2. Comparisons of continuous variables were performed using Mann-Whitney U. All statistical tests were two-tailed and P < 0.05 was considered statistically significant.

Results
Of the 5 patients, mean sequencing depth of the tumor (primary and LNM) and normal tissue were 96 × and 76 × respectively (Table S1).

SNV pattern
In total, we identified 7202 nonsynonymous somatic mutations, including 2808 (range: 400-688) mutations in primary tumor group, and 4394 (range: 239-2344) mutations in LNM group. The main variant type is ins and the main SNV class is C->T substitution ( Figure S1). We also compare the nonsynonymous somatic mutations between primary tumor group and LNM group ( Figure S2), and discovered no significant difference between the two groups.
Driver mutations identified in the 10 tumors by Varcode were TP53, APC, ERBB3, FBXW7 and SMARCB1 (Fig. 1). These driver mutations could be classified as shared (present in both primary and metastatic tumors) and private (present only in the primary or metastatic tumors) for further analysis. TP53 was the only shared mutation and was detected in 2 patients (GBC-1 and GBC-2). The mutation pattern of TP53 in GBC-1 was in frame deletion at the same locus (17:7577514-7577517, GTGA->G). Similar mutation was reported in COSMIC (17:7577514-7577516, p.T256del, Deletion-In frame; and 17:7577515-7577516, p.L257Gfs*6, Deletion-Frameshift). And in GBC-2, it was missense mutation at same locus (17:7577539, G->A), which was not reported in COSMIC. Private mutation was identified in 3 patients either additional to shared mutation (GBC-1 and GBC-2) or merely in primary tumor (GBC-5). There were no known driver mutation in GBC-3 and GBC-4, which may imply other mechanism of oncogenesis [24,25]. Theoretically, three driver genes are required to convert a normal cell to a cancer cell in solid tumors, and an average of approximately four driver genes were actually harbored per tumor [26]. In this study, none of the 10 tumor samples were recognized more than 2 known driver mutations. This result implies not all driver mutations could be recognized currently.
We also calculated tumor mutational burden (TMB), the median TMB was 5.82 per Mb and 5.49 per Mb in primary tumor group and LNM group respectively. Median nonsilent TMB was 1.84 per Mb and 1.80 per Mb in primary tumor group and LNM group respectively ( Figure S3). Significant higher TMB (24.65 per MB) was found in the LNM lesion of GBC-3. There were no significant difference between the two groups.

Mutational signatures
Annotated with COSMIC V2, major signatures are similar in both groups. Signature 3, 1, 6, 12, 11, 22, 23, and 7 were detected in primary group ( Figure S4). Signature 1 is age related represent a large numbers of C > T mutations. Signature 3 is strongly associated with germline and somatic BRCA1 and BRCA2 mutations in breast, pancreatic, and ovarian cancers. In pancreatic cancer, responders to  [27]. These patients did not exhibit BRCA1 and BRCA2 mutations, although BRCA mutations have been reported in GBC patients [28]. Additionally, signature 9 was also enriched in LNM group. This signature is characterized by a pattern of mutations that has been attributed to polymerase η, which is implicated with the activity of Activation-induced cytidine deaminase (AID) during somatic hypermutation. Signature 12, 22 and 23 were also enriched in both groups (20% and 28%) implies there were some extent of similarity between GBC and liver cancer.

Phylogenetic tree
Based on SNVs, we identified two types of phylogenetic tree with Treeomics, according to previous classification of driver mutations as shared or private. The length of trunk and branch was corresponding to nonsynonymous somatic mutation numbers (Fig. 2). All known driver mutations were mapped on the tree. Since the VAF (Variant allele frequencies, calculated by total reads at the position carrying the variant/read depth at the position) of TP53 in GBC-2 was very low (1/36), Treeomics judged this mutation was unreliable. We then re-checked this mutation and manually included into analysis, since this mutation is shared in both primary tumor and LNM, and mutate at same locus (17:7577539, G->A). Linear or parallel progression models of metastasis were both identified in other tumors [29,30]. The trees of our patients inferred both pattern of metastasis were existed. GBC-1 and GBC-2 inferred a linear pattern and GBC-5 inferred a parallel patten.

CNV
We then analyzed copy number variations. Compared to reported CNVs [10,31], there were more losses than gains. In total, there were 3 gains and 11 losses in the samples (Fig. 3). Losses of 8p23.3, 9q21.11, 14q32.13, 16q23.1 were detected in several samples. Some CNVs were detected in both primary tumor and LNM. In GBC-1, gain of 17q21.1, and losses of 9q21.11 and 8p23.3 were detected. In GBC-2, Apart from the reported CNVs, we newly identified large segments alterations in both primary tumor and LNM in 2 patients. In GBC-5, the gain proportion was 0.99 and 0.75 in chromosome 7, and 0.69 and 0.65 in chromosome 20 for the primary tumor and LNM, respectively. In GBC-3, the loss proportion was 0.75 both in primary tumor and LNM in chromosome 10. Thus, whole arm gain and loss could be inferred in these two patients.
We further calculated CNV burden by dividing the copy numbers of loss or gain by all copy numbers founded with GATK ( Figure S5). Median CNV loss burden was 0.013 and 0.043 in primary tumor and LNM respectively. Median CNV gain burden was 0.0009 and 0.0671 in primary tumor and LNM respectively. Though, there was no statistical significance in general due to limited patient number, higher CNV burden, either gain or loss was observed in LNM than their primary tumor in 3 patients.

Discussion
Despite recent improvement of chemotherapy, molecular targeted therapy and aggressive surgical resection with advances in perioperative care has markedly improved outcomes, the prognosis of GBC is still very poor. Molecular profiling of the primary tumor has significantly improved our understanding of this disease in terms of oncogenesis, prevention and treatment [6,[32][33][34][35][36]. However, the metastasis comprises a major obstacle to treatment and long-term survival. It is believed that a better understanding of mutational landscape and evolutionary pattern between primary tumor and metastasis would have profound clinical implications, as such findings could ultimately result in novel strategies to prevent and control metastases [13,14,[37][38][39], especially in the era of personalized medicine [6,7,40].
Although, a variety of driver mutations have been reported in GBC [41], TP53 is the most frequent one in current study (20%). The mutation patterns of LNM is homogenous with their primary tumors, which is consistent with studies on other tumors, including breast, colorectal, and hepatocellular cancer [17,38,42,43]. A recent analysis of genetic heterogeneity in untreated cancers also found 100% of the driver mutations were homogeneous among metastases from the same primary tumor, indicating the high homogeneity of functional driver mutations between paired primary and metastatic lesions [37]. These findings together reached an opinion of using the sequencing data of primary tumors to guide the treatment of metastatic lesions. However, this is not always the rule. There were 4 mutations only detected in either primary or LNM. In the view of personalized therapeutics, the different mutations may imply particular clinical significance.
TMB is increasingly accepted as a biomarker for immunotherapy. High TMB is correlated with good response. Although the definition of "TMB-high" is not consistent among literature, the cutoff value is usually set to 10 [44] or 20 [45][46][47]. In this study, TMB is generally less than 10 both in primary tumor and LNM (5.92/5.48 mutations per MB). This is consist with previous studies [9,10]. TMB is generally considered as a result of DNA damage repair deficiency, and correlates with mismatch repair genes and microsatellite instability (MSI) [48]. The only TMB high sample (LNM in GBC-3, TMB = 24.65 per MB) showed no such genetic feature. Further analysis of mutation signatures showed signature 9 may contribute to this. Although in most patients there is an opinion that a single biopsy of a primary tumor captures the information necessary for therapeutic choices about the treatment of extant or presumptive metastases [37], in terms of TMB, metastasis and primary tumor may not represent mutually. Differences of TMB were also detected in other tumors. Pulmonary metastasis of osteosarcoma [49] and peritoneal metastasis of colorectal cancer [50] indicated higher mutation burden than their primary tumor. If only the primary tumor was examined, the TMB of the metastasis would be underestimated, leading to a misjudgment of the choice of immunotherapy. In a practical point of view, when immunotherapy is to be considered, at least the significant lesion should be examined. Since CNV is an important indicator of chromosomal instability (CIN) and previous publications suggested that CIN is associated with metastasis [51][52][53]. Advanced tumors typically contain frequent gains and losses of focal genomic regions, chromosome-arms, and whole chromosomes [37]. Our analysis of CNV in the two groups showed consistent alterations in primary tumor and LNM. Even chromosome arm aneuploidies were consistent between primary tumor and LNM. A study of breast cancer also showed ERBB2/HER2 amplification and/or mutation frequency was consistent between primary tumor and metastases [54]. Our pilot analysis of CNV burden exhibited difference between primary tumor and LNM, though there is no related study. The clinical significance is to be tested.
Finally we built the phylogenetic tree to infer the evolution of the metastasis. Our patients showed both linear and parallel progression of metastasis. Studies of other tumor also inferred both models of metastasis could co-exist [29,30]. Due to the small sample size, we can not conclude which model is dominant.
There are several limitations of the study. Firstly, only 5 patients were included in this study, and this is surely underpower our analysis. Collecting larger scale cases is warranted to further validate our current finding or may generate new findings. Secondly, only metastatic lymph node was included, other metastases e.g. liver or lung metastasis were not included due to scarce of samples. Since patients with such metastasis were usually not surgical candidates. Thirdly, all the 5 patients were at late stage (stage III or stage IV). Our results of homogeneous CNV and driver mutation between primary tumor and LNM imply early metastasis with the evidences of. Recent evidence suggested metastasis could occur at early stage [39,55] and even 7% T1a tumor may have concomitant LNM [56]. Both indicate further analyses to generate robust data for this hypothesis.
Taken together, our study with WES showed high consistence of genetic landscape between primary tumor and LNM in GBC. However, heterogenicity still exist between primary tumor and LNM in particular patients in term of driver mutation, TMB and CNV burden. These differences may have clinical significances.
Author contributions XFX and BQF performed WES, critical data analyses, interpretation of results and wrote the manuscript. XFX performed DNA extraction, analyzed the data and carried out statistical analyses. BQF contributed samples and analyzed clinical data. BQF and XHW designed, implemented the study, wrote and critically reviewed the manuscript. This is to confirm that all authors read and approved the final manuscript.
Funding There is no funding for current work.
Data availability All raw data is available, and is ready to share.
Code availability All the software used in this manuscript is available online, and there is no custom code.

Conflict of interest
There is no conflicts of interest for all authors.
Ethical approval Since this is an retrospective diagnostic study, no previous agreement from our Ethical Committee was obtained.
Consent to participate Not applicable.

Consent for publication Not applicable.
Research involving human and animal participants Additional declarations for articles in life science journals that report the results of studies involving humans and/or animals.