Gene mutations related to cell adhesion found in lymph node metastasis from papillary thyroid carcinoma by whole exome sequencing


 Background: Lymph node metastasis is the most common metastasis associated with papillary thyroid carcinoma (PTC). There are few data on gene mutations associated with lymph node metastasis from PTC. At present, the mechanism underlying lymph node metastasis of PTC is still poorly understood.Methods: Whole exome sequencing (WES) was performed for PTC of 10 primary tumor tissues and 10 lymph node metastasis tissues (non-match) for gene mutations. Gene ontology (GO) analysis was used to study biological functions and processes related to lymph node metastasis. Sanger sequencing was used to verify these genes in the lymph node metastasis group.Results: A total of 2772 gene mutations were identified in the primary tumor group. A total of 3033 gene mutations were enriched in the lymph node metastasis group. There were 50 gene mutations that were found to be present at higher frequency in the lymph node metastasis group than in the primary tumor group. Among 50 gene mutations, nonsynonymous mutations of 5 genes (LPP, MSLN, MAEA, VCAN, and MUC16) involved in cell adhesion were present. Three nonsynonymous mutations were also found in RP1L1, QSER1, and TLR4. These high-frequency genetic mutations that showed significant enrichment were targeted sequencing and validated by Sanger sequencing.Conclusion: Thyroid cancer and corresponding metastatic lymph nodes contain a large number of SNPs and 50 SNP genes associated with lymph node metastasis. Genes related to cell adhesion (LPP, MSLN, MAEA, VCAN, and MUC16) may play an important role in the process of lymph node metastasis. The contribution of BRAF in lymph node metastasis may not be high.


Abstract
Background: Lymph node metastasis is the most common metastasis associated with papillary thyroid carcinoma (PTC). There are few data on gene mutations associated with lymph node metastasis from PTC. At present, the mechanism underlying lymph node metastasis of PTC is still poorly understood.
Methods: Whole exome sequencing (WES) was performed for PTC of 10 primary tumor tissues and 10 lymph node metastasis tissues (non-match) for gene mutations. Gene ontology (GO) analysis was used to study biological functions and processes related to lymph node metastasis. Sanger sequencing was used to verify these genes in the lymph node metastasis group.
Results: A total of 2772 gene mutations were identified in the primary tumor group. A total of 3033 gene mutations were enriched in the lymph node metastasis group. There were 50 gene mutations that were found to be present at higher frequency in the lymph node metastasis group than in the primary tumor group. Among 50 gene mutations, nonsynonymous mutations of 5 genes (LPP, MSLN, MAEA, VCAN, and MUC16) involved in cell adhesion were present. Three nonsynonymous mutations were also found in RP1L1, QSER1, and TLR4. These high-frequency genetic mutations that showed significant enrichment were targeted sequencing and validated by Sanger sequencing.
Conclusion: Thyroid cancer and corresponding metastatic lymph nodes contain a large number of SNPs and 50 SNP genes associated with lymph node metastasis. Genes related to cell adhesion (LPP, MSLN, MAEA, VCAN, and MUC16) may play an important role in the process of lymph node metastasis. The contribution of BRAF in lymph node metastasis may not be high.

Background 3
The incidence of thyroid cancer has increased rapidly in many countries during recent decades [1]. This change is related to an increase in papillary thyroid cancer (PTC), which is the most common histological subtype of thyroid cancer accounting for approximately 80% of cases [2]. Most PTCs have a slow clinical course and favorable survival rates.
However, PTCs combined with lymph node metastasis may have a higher risk of recurrence and persistent decrease in quality of life [3,4]. Currently, the incidence of lymph node metastasis of PTC is reported to be about 10%-30% in different works [5,6].
Until now, there has been no reliable method of predicting lymph node metastasis in PTC patients. The mechanism underlying lymph node metastasis of PTC remains poorly understood.
Whole exome sequencing (WES) has become an important technical tool in cancer research in recent years [7]. Exon region of a gene encoding a protein, accounting for the whole 1% of the genome, is more than 80% single nucleotide polymorphism (SNP) and mutation. In recent years, the mutation sites like BRAF and RAS associated with papillary thyroid carcinoma have been reported [8,9]. The V600E somatic mutation of the BRAF gene is the most common genetic alteration in PTC patients and it plays an important role in the tumorigenesis of various thyroid tumors [10]. For example, Xu et al. [11] demonstrated that BRAF mutation occurred in 21 of 56 PTCs (38%). Furthermore, studies have shown a strong correlation between BRAF-V600E and poor clinical pathology of PTC, including invasive pathological features, increased recurrence, and lymph node metastasis [10,12,13]. The frequency of RAS mutations was greater in the follicular variant PTC [14].
The Cancer Genome Atlas (TCGA) has released a sequencing result for thyroid papillary carcinoma in 2014 [15]. The sequencing results involved more than 500 large-scale sequencing studies of patients not only demonstrated the existence of BRAF and RAS mutations in papillary thyroid carcinoma, but the results also revealed several new mutations, including eukaryotic translation initiation factor 1A, X-1inked (EIFlAX), protein phosphatase 1D (PPMID), and checkpoint kinase 2 (CHEK2). The study included a large number of patients, and the data were plentiful and complete, which has great significance for the study of papillary thyroid carcinoma. However, TCGA sequencing specimens do not cover patients with lymph node metastasis.
Currently, data covering the analysis of gene mutations with lymph node metastases for PTC are limited. If relevant SNPs or mutation sites that may be related to lymph node metastasis of papillary thyroid carcinoma can be identified by whole exome sequencing, the mechanism underlying metastasis can be explained. These molecular features can also be used to predict whether a PTC has lymph node metastasis in clinical settings.

Clinical samples
Primary tumor tissues and lymph node metastasis tissues were collected for sequencing during surgical resection from 15 patients diagnosed with PTC at Shanghai Jiao Tong University Affiliated Sixth People's Hospital. Unilateral central cervical lymph node dissection was performed in all of the cases, and selective lateral lymph node dissection was performed for patients with clinical lateral lymph node metastasis.
In order to examine highly invasive papillary thyroid carcinoma as thoroughly as possible, in this study, specimens obtained from the necks of patients with more than 3 lymph node metastases confirmed by pathology were selected for further DNA extraction and sequencing.
All of the samples were collected and confirmed by postoperative pathology. This study was approved by the Ethics Committee of Shanghai Jiao Tong University Affiliated Sixth People's Hospital. Informed consent was obtained.

Bioinformatics pipeline for WES data filtering
Whole exome sequencing and analysis were conducted by OE Biotech Co., Ltd. (Shanghai, China). The raw data were compiled in fastq format. Further filtering of raw data for quality is necessary to produce high-quality clean reads that can be used for subsequent analysis. The preprocessing software is fastp (Version: 0.19.5), and the quality filtering standard is carried out in 4 steps: (1) removal of adapter sequences; (2) removal of reads with 5 or more N (non-AGCT) bases; (3) a sliding window with a size of 4 bases with an average base quality value is less than 20, then cut off; and (4) after these filtering steps, the reads shorter than 75 bp or with an average base quality value less than 15 were removed. Clean reads were aligned to the reference genome (hg19) utilizing the BWA

Somatic mutation screening
The screening principles for total mutation sites are as follows: 1) removal of at least one mutation in the four databases of the 1000 Genome Project, ESP6500 database, Exac data, and gnomAD data with a frequency higher than 1%; 2) retention of the variation of the exon region or the splice site region and removal of the variation of the synonymous mutation; 3) retention DP (supporting the total number of reads of the site) a mutation greater than or equal to 10; and 4) removal of the mutation sites that occur in more than 90% of the samples.

Mutation validation by Sanger sequencing
Polymerase chain reaction (PCR) was used to amplify target mutation sites in the genomic DNA samples. Each PCR was performed in volumes of 25 µL containing 12.5 μl 2xTaq Master Mix, 2 μl of forward and reverse primers (10 μM), and 20 ng of template DNA (PCR reagents were purchased from Vazyme). PCRs were carried out in ABI 9700. The amplification profile involved a denaturation step with Taq DNA polymerase for 5 min at 95°C, followed by a 35-cycle PCR consisting of denaturation for 15 s at 95°C, annealing for 20 s at 60°C to 50°C, and elongation for 60 s at 72°C. The reactions were ended with a final extension step at 72°C for 7 min. The PCR products were separated on a 1% agarose gel and then subjected to Sanger sequencing.

Statistical analysis
Statistical analysis was performed using IBM SPSS Statistics 22.0 (SPSS, Chicago, IL, US).
Demographic and clinicopathologic characteristics were compared using Fisher's exact test for categorical variables, and a Student's t test was used for continuous variables. A two-tailed P < 0.05 was considered to be statistically significant.

Clinical and histopathological characteristics of PTC patients
We collected 10 PTC samples and 10 lymph node metastasis samples. In order to determine their genomic profile, the fresh-frozen primary tumor tissues (n = 10) and lymph node metastasis tissues (n=10) were evaluated with WES. Clinical and histopathological characteristics are shown in Table 1. Median age was 44 (range 25-57) and 48 (range 25-69) years for patients with primary tumors and patients with lymph node metastasis, respectively (P > 0.05). Female/male ratios were 6/4 in the primary tumor group and 8/2 in the lymph node metastasis group (P > 0.05).

Whole exome sequencing
A total of 319G of high-quality whole exon sequencing data were generated. About 99.9% of the sequence reads were mapped to the human reference genome and quantified according to their genomic location. The coverage of target region was more than 99.7% in each sample, and the average sequencing depth of target region was over 139X in each sample. The detailed information of exome capture in each sample is summarized in Table S1. In the mutation type spectrum, the frequency of transition was significantly higher than the transversion, mainly focusing on C>T/T>C and A>G/G>A, which is shown in Fig. 1c.

Identification of gene mutations in primary tumor group
WES was performed with all of the samples (10 primary tumors and 10 lymph node metastasis tissues). Due to the difficulty in collecting clinical samples, the lateral lymph node and the primary tumor samples were collected from different patients. In order to rule out false positives, we set the database as a normal control for the primary tumor group. Gene mutations in different genes were found by excluding germline mutations by comparing the databases of 1000 Genome Project, ESP6500 data, Exac data, and gnomAD data. Among the final variant genes after screening, 10 primary tumors showed 70% mutation frequency of BRAFV600E (7/10), which was higher than the Cancer Genome Atlas Research Network's report in PTCs (59.7% BRAFV600E). We also found many novel mutations in different genes occurring at high frequency (Fig. 2).

Identification of high-frequency lymph node metastasis gene mutations compared to primary tumor
In order to identify variant genes related to metastasis, we selected four databases as normal controls for the metastatic group to rule out false positives that were merely similar to the primary tumor group. The screening results produced the average ratio of the mutation site in the metastasis group and primary group are obtained. The average ratio in the metastasis group was high (high frequency), while the average ratio in the primary group was low (low frequency) to show the results of the mutation site that may lead to metastasis. We obtained 50 high frequency and potentially pathogenic SNPs related to metastasis, as shown in Fig. 3.

GO analysis of mutations in high frequency metastasis mutation genes
GO analysis of biological processes, cellular components, and molecular function for 50 high-frequency metastasis mutation genes indicated alterations in multiple additional biological functions (Fig. 4). Specifically, 5 genes related to cell adhesion (LPP, MSLN, MAEA, VCAN, and MUC16) were found to be mutated and significantly enriched.
Meanwhile, three genes related to tumor metastasis (RP1L1, QSER1, TLR4) were also identified. We analyzed 8 mutation sites of these genes with respect to gene mutation frequency and number (Fig. 5a and 5b). This altered their ability to function severely.

Driver gene mutations verified by Sanger sequencing
In 50 high frequency mutation genes related to metastasis, 8 of the high frequency mutational genes, namely, RP1L1, QSER1, TLR4, LPP, MSLN, MAEA, VCAN, and MUC16, were verified by Sanger sequencing (Fig. 6). The results showed the Sanger sequencing results to be basically consistent with the results of WES sequencing, which further confirmed the mutation of these genes. Among them, the mutations in LPP, MSLN, MAEA, VCAN, and MUC16, which are cell adhesion genes, were found in multiple lymph node metastasis specimens.

Discussion
Lymph node metastasis is a common mode of metastasis of papillary thyroid carcinoma.
We lack a method of predicting lymph node metastasis and understanding the underlying mechanisms. The TAGA database contains a number of PTC-related variant data, but it does not include data on lymph node metastasis. By understanding the molecular biological characteristics of lymph node metastasis, we can better its course and understand its underlying mechanisms. Here, we provide data on the genomic profile of key genes associated with lymph node metastasis.
Next-generation sequencing was used to assess the prevalence of a subset of lymph node metastatic mutations. WES was used for PTC of primary tumor and lymph node metastasis samples. In the primary tumor group, 2772 mutated genes were identified. Some known gene mutations were identified in PTCs, such as BRAF (7/10), ARID1B (3/10), and APC (1/10). These findings are consistent with previous work describing driver mutations of PTC in TCGA. Here, 3033 gene mutations were enriched in the lymph node metastasis group. To further screen for genes associated with lymph node metastasis, we compared genetic mutations in the primary tumor and lymph node metastasis groups. We obtained 50 high-frequency mutant genes in the lymph node group, and these genes were found to have low frequency of mutations or no mutations at all in the primary tumor group. Some novel gene mutations were found in lymph node metastasis samples such as RP1L1, QSER1, TLR4, LPP, MSLN, MAEA, VCAN, and MUC16. However, analysis of sequencing results in the lymph node metastasis group suggests that BRAF mutations may not play a major role in lymph node metastasis. These data suggest that the mechanism underlying lymph node metastasis may be different from pathways for genetic aberrations in PTC.
Previous studies have reported that BRAF mutations are associated with older age, high cell variability, extrathyroid expansion, positive surgical margins, lymph node metastasis, and stage IV disease [16,17]. Here, we found that BRAFV600E is present in seven tumor tissues (70%), all of which were found in the primary tumor group, suggesting that BRAF mutations may be unrelated to invasion or metastasis. Because our work is limited to a small number of samples, we need to verify our preliminary data.
As cancer progresses and metastasizes, it usually acquires new mutations. Through key pathways and biological function analysis, the cores involved in non-synonymous mutation genes (LPP, MSLN, MAEA, VCAN, and MUC16) associated with cell adhesion were identified in the metastatic group. Because the occurrence and development of tumors is dynamic and related to adhesions, especially tumor cell migration and metastasis, it may be associated with the metastasis of PTC [18]. For example, LPP, a gene related to metastasis, has a promoter region that can be recruited together with many genes (such as MMPs) in breast cancer as a transcriptional coactivator of PEA3 to regulate gene expression [19].
Targeted sequencing of the 8 high-frequency mutated genes identified by direct Sanger sequencing also verified enrichment. These 8 genes may have clinical applications for PTC and may be used to distinguish patients with metastatic tendencies from patients with inert tumors by using a genome classifier (GC) similar to ThyroSeq v3 [20]. However, large samples require further research to confirm that we can use preliminary data to predict PTC lymph node metastasis and allow for customized treatment.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article.

Competing interests
The authors declare that they have no competing interests.

Funding
This project was supported by the National Natural Science Foundation of China (No.

Authors' contributions
XL, MG conceptualized the study and designed experiments. PX, ZY, SZ and XX performed experiments. FZ analyzed the data. PX and ZY wrote the manuscript. All authors read and approved the final manuscript.