Preparing exome libraries for NGS
We enrolled 14 patients with tumors in the PBTs invading the bile duct and requiring endoscopic intervention for biliary drainage. After biliary drainage and bile duct stenting, all patients subsequently underwent surgical resections and were pathologically diagnosed with cholangiocarcinoma, gallbladder carcinoma, ampullary carcinoma, PC, or intraductal papillary mucinous carcinoma in seven, one, three, one, and one patients, respectively (Table 1). This study was approved by the Institutional Review Board of our institution and the Ethics Committee of the Tohoku University Graduate School of Medicine (#2023-1-487) and conducted in accordance with the principles of the Declaration of Helsinki. The informed consents were obtained from all patients.
Genomic DNAs from the resected samples was obtained as previously described [38]. Briefly, genomic DNA was extracted from tumor and non-tumor portions of each patient’s FFPE tissue using a QIAGEN kit according to the manufacturer’s protocols. Genomic DNAs was quantified using a Nanodrop (Waltham, MA, USA). The WGS from library construction to variant calls was outsourced to Rhelixa (Tokyo, Japan). The extracted genomic DNAs were subjected to the library construction.
An Agilent SureSelect Human All Exon V6 (58 MB) kit (Agilent Technologies, Santa Clara, CA, USA) was used for DNA target enrichment. Each sample was sequenced using the 150 bp paired-end sequencing method on an Illumina NovaSeq 6000 platform (Illumina, Inc., La Jolla, CA, USA). The quality of raw paired-end sequence reads was assessed using FastQC (version 0.11.7; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Bioinformatics of exome sequencing: mapping, variant call, and filtration
Low-quality (< 20) bases and adapter sequences were trimmed using Trimmomatic software (version 0.38) with the following parameters: ILLUMINACLIP: path/to/adapter. fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:36 [39]. The trimmed reads were aligned to the indexed reference genome using bwa-mem (v0.7.17-r1188) (arXive. 2013: 1303.3997v1301 q-bio.GN). The generated SAM files were converted into BAM files and sorted using Picard SortSam (version 2.18.11; http://broadinstitute.github.io/picard/). The number of duplicates was calculated using the Picard Mark Duplicate. After removing duplicates, reads were realigned around known indels provided by the Genome Analysis Toolkit (GATK) group [40]. A base quality recalibration table was generated using the GATK BaseRecalibrator (Version 4.0.8.1) with known site files: dbSNP138 from the GATK resource bundle. The resultant base recalibration table was applied using GATK ApplyBQSR (Version 4.0.8.1). Somatic variants were called using GATK MuTect2 in either the tumor-only mode in 10 cases or the normal-tumor pair mode in 3 cases. Variants were filtered using the GATK FilterMutectCalls.
Bioinformatics of the exome sequencing: annotation
The identified variants were annotated using SnpEff (version 4.3t) and ANNOVAR [41] to identify the putative effects on protein translation and high-impact mutations.
The clinical significance of the genetic variants was determined by comparing with ClinVar (https://www.ncbi.nlm.nih.gov/clinvar). We annotated SNP-trait associations from gwasCatalog and 1000 Genome Project data, including African, American, East Asian, European, and South Asian individuals. We used domain information from Interpro and prediction scores from dbNSFP, including CADD, FATHMM, LRT, MetaSVM, MutationAssessor, MutationTaster, Provean, Polyphen-2, Sift, and PhastCons scores to predict the functional consequences of the observed amino acid substitutions.
To subtract potential germline variants from the tumor-only data, we employed the 54 KJPN dataset [21] as a normal panel. The 54 KJPN data were downloaded from jMORP (https://jmorp.megabank.tohoku.ac.jp/downloads: last visit 4th December 2023), and the bcftools norm –m–option was used to separate multiple allelic sites [42]. We then manipulated the dataset with bcftools query options and the ANNOVAR convert2annover.pl command. In the case of subtraction of variants using 54KJPN, we considered that some of the deleterious variants that drive carcinogenesis may be included in 54KJPN. For example, four pathogenic/likely pathogenic variants of TP53 (p.D281N, p.R273C, p.R248Q, and p.G245S) and frequent mutations in the TERT promoter (c.-124C>T) are present in 54KJPN. Therefore, we decided to include the variants found in 54KJPN with minor allele frequencies (MAFs) lower than 0.0002, which is the criterion used in the development of AlphaMissense as weak labelling for benign variants [43]. Furthermore, we annotated using ClinVer (version a 31st December 2022), the cosmic database ver. 91 [44], and the databases were prepared according to the descriptions on the ANNOVAR website (https://annovar.openbioinformatics.org/en/latest/: last visit, 4th December 2023). To further remove potential germline variants and platform-related biases [45], we generated another set of normal from three non-cancerous exome data in our cohort (#2, #6, and #10). After merging the three non-cancerous exome datasets with bcftools, we created a normal panel database using the convert2annovar command in the Annover package.
To calculate the tumor mutation burden, we removed annotated variants using the following databases: 54KJPN with MAF > 0.0002, intronic variants, benign variants in either ClinVar or AlphaMissense, and all synonymous variants divided by the target size of the exome in megabases.
Bioinformatics: Delineating mutational signature analysis for cholangiocarcinoma exome data
We also performed a deconstructSig analysis to trace the mutational signatures detected by Alexandrov et al [46]. To delineate the mutational signatures, we used deconstructSigs [47]. We removed potential non-cancerous variants by subtracting 54KJPN and three non-cancerous exome datasets.
Sample collection of cfDNA from bile juice and plasma
Endoscopic nasobiliary drainage (ENBD) tubes were placed endoscopically for biliary drainage or biopsy of cancerous lesions in the bile duct. We collected 5 mL bile juice from the ENBD tubes. Initially, the bile juice was centrifuged at 1600 g for 15 minutes at room temperature and the supernatants were carefully picked up in tube and stored at -80 ℃. In nine of all patients, we collected 5 mL plasma in the cfDNA collection tube (PAXgene Blood ccfDNA Tubes, Qiagen) and stored at -80 ℃ for analysis. Cell-free DNA was extracted using a QIAamp Circulatin Nucleic Acid Kit (Qiagen) according to the manufacturer’s instructions.
Library preparation, NGS, and bioinformatic analyses of the bile cell-free DNA
Cell-free DNA extracted from plasma or bile juice was subjected to LiquidPlex library construction according to the manufacturer’s instructions. Plasma from three non-cancer patients was used as a control to calculate the background noise. For library construction, 8.2–91.2 ng of Plasma cfDNA, 79.7–300 ng of bile cfDNA was subjected to end-repair, ligation, and addition of molecular barcodes. The cfDNA libraries were amplified using two separate PCR steps, and the index sequences were incorporated into the DNA fragments in the second PCR. The NGS libraries were quantified using the KAPA Universal Library Quantification Kit (KK4824; Roche, Basel, Switzerland) according to the manufacturer’s instructions. MiSeq has also been used for library quantification with an Illumina platform sequencer [48]. Sequencing data acquisition for mutation analysis was outsourced to Rhelixa, Inc. (Tokyo, Japan) using a NovaSeq 6000. A 150 bp paired-end protocol was used to obtain cfDNA sequence data. Sequence read mapping and variant calls were performed using the Archer Analysis platform, a web-based bioinformatics service for LiquidPlex, and other sequencing kits with default options. The GRCh37 coordinates served as the reference genome for mapping. After the variant call, annotations of the called variants were performed using ANNOVAR with a custom-made 54 KJPN variant dataset and COSMIC 94.
Droplet digital PCR of the cfDNAs
A ddPCR Mutation Assay (Bio-Rad, Hercules, CA, USA) was used to select commercially available probes for somatic mutations identified in the cfDNA samples using ddPCR. Two probes were selected to verify whether mutations detected by whole-exome sequencing in the excised sample DNA could be detected in the bile of the same patient. (for KRAS p.G12V: Thermo Fisher Scientific Inc Catalog #: A44177, Assay ID: Hs000000050_rm; for NRAS Q61R: Bio-Rad Laboratories, Inc Catalog #: 10049047, UniqueAssayID: dHsaMDS882187944), and ddPCR was performed using the QX200 AutoDG Droplet Digital PCR IVD system (Bio-Rad). Subsequently, 9.9 μL of the eluted cfDNA solution (average DNA content = 23.3 ng) was used for each ddPCR in a quadlicate manner for each assay points. Genomic DNA was extracted as previously described 15. The total reaction volume was 22 μL (20,000 drops were generated). The PCR conditions were as follows: initial denaturation at 95°C for 10 min; 40 cycle steps at 94°C for 30 s and 55°C or 60°C for 1 min; and final denaturation at 98°C for 10 min for enzyme deactivation. The probe fluorophores were FAM/HEX or VIC, and wild-type and mutant copy numbers were measured. All ddPCR reactions were performed at least twice. Allele frequencies were calculated using Quanta Software (Bio-Rad) according to the manufacturer’s instructions. Because the plasma cfDNAs were scarcely obtained and we used up to the cfDNA for construction of libraries of multi-gene panels, we could not test the plasma cfDNA with ddPCR. The results of plasma multi-gene panel liquid biopsies will be described elsewhere.
Sanger sequencing of cfDNA in the bile
We performed Sanger sequencing of the PCR amplicons at exon 5 of the TP53 gene for three cases (H21_10688_26, p.H179L; H22_161_19, p.P151R; and H23_1363_21, p.C135F) to confirm whether bile cfDNA included tumor-derived cfDNA. The primers for amplicon sequencing were as follows: TP53hg19_chr17_7578593F, 5′-GCCCTGACTTTCAACTCTGTCTC-3′ and TP53hg19_chr17_7578120R, 5′-GGCCACTGACAACCACCCTTAACC-3′ for exons 5 and 6. Sequencing analyses were performed by Fillgen (Nagoya, Japan).