Clinical samples
Primary tumor tissues and lymph node metastasis tissues were collected for sequencing during surgical resection from 15 patients diagnosed with PTC at Shanghai Jiao Tong University Affiliated Sixth People’s Hospital. Unilateral central cervical lymph node dissection was performed in all of the cases, and selective lateral lymph node dissection was performed for patients with clinical lateral lymph node metastasis.
To characterize highly invasive papillary thyroid carcinoma as thoroughly as possible, only neck specimens from patients with more than 3 pathologically confirmed lymph node metastases were selected for DNA extraction and sequencing.
All of the samples were collected and confirmed by postoperative pathology. This study was approved by the Ethics Committee of Shanghai Jiao Tong University Affiliated Sixth People’s Hospital. Informed consent was obtained.
DNA extraction, target genomic region capture, and sequencing
Genomic DNA was extracted using the QIAamp DNA Mini Kit (Cat. 51306, Qiagen). DNA quantity and integrity were assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Inc., Wilmington, DE, US) and 1% agarose gel electrophoresis, respectively. Whole exome capture libraries were prepared from the human genomic DNA samples using Agilent SureSelect reagents in accordance with the manufacturer’s protocol. Briefly, approximately 3 μg of genomic DNA in 130 μl was sheared into fragments of 150 to 220 bp using a sonicator (Covaris, Inc., Woburn, MA, US). The sheared DNA was purified and treated with the reagents supplied with the kit according to the protocol. Adapters from Agilent were ligated onto the polished ends, and the libraries were amplified by polymerase chain reaction (PCR). The amplified libraries were hybridized with the custom probes. DNA fragments bound to the probes were washed and eluted with the buffer provided in the kit. The libraries were then sequenced on the Illumina sequencing platform (HiSeq X-10, Illumina, Inc., San Diego, CA, US), generating 150 bp paired-end reads.
Bioinformatics pipeline for WES data filtering
Whole exome sequencing and analysis were conducted by OE Biotech Co., Ltd. (Shanghai, China). The raw data were generated in FASTQ format and filtered for quality to produce clean reads for subsequent analysis. Preprocessing was performed with fastp (version 0.19.5) in 4 steps: (1) removal of adapter sequences; (2) removal of reads with 5 or more N (non-AGCT) bases; (3) sliding-window trimming with a window size of 4 bases, in which bases were cut off when the average base quality within the window was less than 20; and (4) removal of reads that, after these filtering steps, were shorter than 75 bp or had an average base quality less than 15. Clean reads were aligned to the reference genome (hg19) using BWA (Burrows-Wheeler Aligner, version 0.7.12). The mapped reads were sorted and indexed using Samtools (version 1.4), and PCR duplicates were removed using Picard (version 4.1.0.0).
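The read-level filtering criteria above can be illustrated with a minimal Python sketch. This is not fastp itself and not the pipeline that was run; it only restates steps (2) to (4) on a single read, and it assumes that adapters have already been removed and that the sliding window scans from the 5′ end and cuts the read at the first failing window (the direction of trimming is not stated in the text). All function and constant names are hypothetical.

```python
from statistics import mean

# Thresholds taken from the filtering steps described in the text.
MAX_N = 5            # reads with 5 or more N (non-ACGT) bases are discarded
WINDOW = 4           # sliding window size in bases
WINDOW_MIN_Q = 20    # minimum mean quality within a window
MIN_LEN = 75         # minimum read length after trimming
MIN_MEAN_Q = 15      # minimum mean quality of the trimmed read


def trim_sliding_window(seq, quals, window=WINDOW, min_q=WINDOW_MIN_Q):
    """Cut the read at the first window whose mean quality is below min_q.

    Assumption: the window moves from the 5' end toward the 3' end.
    """
    for start in range(0, len(seq) - window + 1):
        if mean(quals[start:start + window]) < min_q:
            return seq[:start], quals[:start]
    return seq, quals


def filter_read(seq, quals):
    """Return the trimmed (seq, quals) pair, or None if the read is discarded."""
    # Step 2: discard reads with too many non-ACGT bases.
    if sum(base not in "ACGT" for base in seq.upper()) >= MAX_N:
        return None
    # Step 3: sliding-window quality trimming.
    seq, quals = trim_sliding_window(seq, quals)
    # Step 4: discard reads that are too short or of low overall quality.
    if len(seq) < MIN_LEN:
        return None
    if mean(quals) < MIN_MEAN_Q:
        return None
    return seq, quals
```

For example, a 100-base read whose last 20 bases have quality 2 is trimmed to 78 bases and kept, whereas the same read with 40 low-quality bases at the end falls below 75 bp after trimming and is discarded.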
GATK (version 4.1.0.0) was then used for base quality score recalibration and for calling single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs). SNPs and INDELs were annotated with ANNOVAR against multiple databases, including RefSeq, 1000 Genomes, the Catalogue of Somatic Mutations in Cancer (COSMIC), and OMIM.
Somatic mutation screening
The mutation sites were screened according to the following criteria: 1) removal of variants with a frequency higher than 1% in at least one of four databases (1000 Genomes Project, ESP6500, ExAC, and gnomAD); 2) retention of variants in exonic or splice-site regions and removal of synonymous variants; 3) retention of variants with a DP (the total number of reads supporting the site) greater than or equal to 10; and 4) removal of mutation sites that occurred in more than 90% of the samples.
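The four screening criteria can be expressed as a single predicate, sketched below in Python. The record layout, field names, and region and effect labels are hypothetical simplifications chosen for illustration; they are not the actual column names of the annotated variant tables used in the study.

```python
from dataclasses import dataclass

MAX_POP_AF = 0.01           # criterion 1: population frequency cutoff (1%)
KEPT_REGIONS = {"exonic", "splicing"}  # criterion 2: regions to retain
MIN_DEPTH = 10              # criterion 3: minimum DP
MAX_SAMPLE_FRACTION = 0.90  # criterion 4: maximum fraction of samples


@dataclass
class Variant:
    """Hypothetical, simplified variant record."""
    region: str             # e.g. "exonic", "splicing", "intronic"
    effect: str             # e.g. "nonsynonymous", "synonymous"
    depth: int              # DP: total reads supporting the site
    pop_af: dict            # database name -> frequency (absent if unreported)
    sample_fraction: float  # fraction of cohort samples carrying the variant


def passes_screen(v: Variant) -> bool:
    """Return True if the variant survives all four screening criteria."""
    # 1) Drop variants above 1% frequency in at least one database.
    if any(af > MAX_POP_AF for af in v.pop_af.values()):
        return False
    # 2) Keep exonic or splice-site variants; drop synonymous ones.
    if v.region not in KEPT_REGIONS or v.effect == "synonymous":
        return False
    # 3) Require a depth of at least 10.
    if v.depth < MIN_DEPTH:
        return False
    # 4) Drop sites present in more than 90% of samples.
    if v.sample_fraction > MAX_SAMPLE_FRACTION:
        return False
    return True
```

In this sketch a variant that is not reported in a database simply has no entry in `pop_af` and is therefore not excluded by criterion 1; how unreported frequencies were handled in the actual analysis is not stated in the text.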
Mutation validation by Sanger sequencing
Polymerase chain reaction (PCR) was used to amplify the target mutation sites in the genomic DNA samples. Each reaction was performed in a volume of 25 μl containing 12.5 μl of 2× Taq Master Mix, 2 μl of forward and reverse primers (10 μM), and 20 ng of template DNA (PCR reagents were purchased from Vazyme). Reactions were run on an ABI 9700 thermal cycler. The amplification profile consisted of an initial denaturation step for 5 min at 95°C, followed by 35 cycles of denaturation for 15 s at 95°C, annealing for 20 s at 60°C to 50°C, and elongation for 60 s at 72°C. The reactions were ended with a final extension step at 72°C for 7 min. The PCR products were separated on a 1% agarose gel and then subjected to Sanger sequencing.
Statistical analysis
Statistical analysis was performed using IBM SPSS Statistics 22.0 (SPSS, Chicago, IL, US). Demographic and clinicopathologic characteristics were compared using Fisher’s exact test for categorical variables and Student’s t test for continuous variables. A two-tailed P < 0.05 was considered statistically significant.
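As an illustration of the categorical comparison, the following Python sketch computes a two-sided Fisher’s exact test for a 2 × 2 table directly from the hypergeometric distribution. It is not the SPSS procedure used in the study; the function name is hypothetical, and the counts in the usage example are invented toy values, not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Toy example (invented counts): 3 of 4 versus 1 of 4.
p = fisher_exact_two_sided(3, 1, 1, 3)
significant = p < 0.05  # two-tailed threshold stated in the text
```

With such small counts the test has little power, which is one reason an exact test rather than a chi-square approximation is the usual choice for cohorts of this size.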