Structural Variation Detection of Adolescent Thyroid Cancer Using Optical Mapping

Structural variation (SV) is a fundamental genetic cause of cancer and has been correlated with disease progression and treatment response. Traditional sequencing methods cannot capture the full genomic landscape, especially large-scale and complex structural variation. To overcome these limitations, we adopted a combined approach, including optical mapping, single-molecule sequencing and short-read shotgun sequencing, to evaluate SV in thyroid cancer. The numbers, lengths and types of structural variants, together with the genes they affect, were examined. Integrating these results provides a comprehensive genomic view of thyroid cancer. We demonstrate that integrated approaches can provide a powerful tool for capturing a higher level of genomic SV, creating new interpretations of sequencing data of particular relevance to human cancer.


Introduction
Genomic variation, ranging from point mutation, deletion, duplication, insertion, inversion and translocation to complex structural variation (SV) [1], is associated with poor prognosis in tumors [2]. Specific genomic SVs alter gene function and thereby affect essential signaling pathways in tumor initiation, differentiation and progression [3,4]. Traditional tumor genomic research has favored second/next-generation sequencing (NGS) technology, which uses short reads and detects mainly small-scale genomic variation, including gene mutation, copy number alteration, methylation and transcription. However, as tumors progress, large genomic SVs and rearrangements arise, and superior detection methods are required to identify tumor-specific genomic changes comprehensively and accurately and to better understand tumor characteristics. Challenges remain.
We have focused on characterizing the thyroid cancer genome. Many studies have revealed the gene variation, biological signaling pathways, and relationship between molecular classification and clinical traits in papillary thyroid carcinoma (PTC). In contrast, large genomic SV and adolescent PTC remain poorly understood, and genomic variation in adolescents is more complicated than in adults [5,6]. A thorough and comprehensive understanding of tumor genomic variation and biology is therefore needed. We integrated multiple technologies, namely NGS, Oxford Nanopore Technologies (ONT) sequencing and Bionano optical mapping, to examine and complement the genomic characterization of adolescent PTC.

Genomic Structural Variation of thyroid cancer
One adolescent PTC sample (from a 15-year-old female) was used in this study. Generally speaking, we applied four technologies to reveal the characteristic genomic structural variation: Illumina NGS, ONT sequencing, and Bionano Genomics optical mapping with the Nt.BspQI nicking endonuclease and with the Direct Labeling Enzyme (DLE). Sequenced blood was used as the control. After initial tumor sample sequencing and bioinformatic analysis, we detected five different kinds of SV (deletion, duplication, insertion, inversion and translocation), and the total numbers of SVs detected by Bionano DLE, BspQI, ONT and NGS were 243, 127, 3120 and 2357, respectively (Table 1).
Examination of the numbers and lengths of the detected SVs showed large differences among the technologies (Figure 1). ONT and NGS clearly detected more SVs in total, with deletions and insertions dominating over the other SV types (Figure 1A). As for the length distribution of the different SV types, Bionano DLE and BspQI mapping had an advantage in detecting short SVs but yielded fewer SVs in total (Figure 1B, C). SVs detected by ONT were concentrated in the 0-1000 bp range, with 1361 deletions and 1631 insertions, whereas the other SV types were relatively rare, which indicates that ONT is well suited to detecting short deletions and insertions (Figure 1D). NGS detected a large number of SVs distributed across all lengths and can therefore complement long-read single-molecule sequencing (Figure 1E).
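The per-type length distributions summarized in Figure 1 amount to counting SV calls by type and length bin. The following Python sketch illustrates that tabulation; it is not the analysis code used in this study. Only the 0-1000 bp range comes from the text; the other bin edges, the function names and the toy calls are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical length bins in base pairs; only the 0-1000 bp range is named in the text.
BINS = [(0, 1_000), (1_000, 10_000), (10_000, 100_000), (100_000, float("inf"))]

def bin_label(length):
    """Return a label for the half-open bin [low, high) that contains length."""
    for low, high in BINS:
        if low <= length < high:
            return f"{low}-{high}"
    raise ValueError(f"negative length: {length}")

def length_distribution(calls):
    """Count SV calls per (type, length bin).

    calls: iterable of (sv_type, length_bp) pairs.
    """
    return Counter((sv_type, bin_label(length)) for sv_type, length in calls)

# Toy calls for illustration only, not data from the study.
toy_calls = [("DEL", 350), ("DEL", 820), ("INS", 120), ("INV", 45_000), ("DUP", 250_000)]
dist = length_distribution(toy_calls)
```

Running the same tabulation on each platform's call set would give one histogram per technology, comparable to the panels of Figure 1.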
We then integrated the deletions, insertions and inversions from the four methods into one genomic circle plot (Figure 2). Moving inward from the outer ideogram, the tracks are histograms of deletions (outer), insertions (middle) and inversions (inner) for the four detection methods (NGS, ONT, BspQI and DLE in black, brown, purple and red, respectively). SVs are dispersed across all chromosomes, and the methods differ in the SVs they detect, which demonstrates that they are complementary in cancer genome analysis.

Genes affected by genomic SV
Large-scale SVs affect wide regions of the genome, including numerous essential genes. All genes located within the detected tumor SV regions were screened. The total numbers of SV-affected genes for Bionano DLE, BspQI, ONT and NGS were 1218, 454, 1219 and 1112, respectively (Table 2). Next, we performed preliminary screening and summarization of the SV-affected genes by SV type for each technology, then took the intersection of these gene lists (Figure 3). For deletions, duplications and insertions, ONT and NGS yielded many more affected genes than Bionano DLE and BspQI. For inversions, Bionano DLE and BspQI yielded many more than ONT and NGS. For translocations, Bionano DLE, BspQI and ONT outperformed NGS, indicating that single-molecule reads have a perceptible advantage in detecting the genes affected by large-scale and complex genomic structural variation.
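The two steps described above, collecting the genes that fall within SV regions and intersecting the resulting gene lists across technologies, can be sketched in Python as follows. This is an illustration under assumptions rather than the procedure used in this study: the overlap rule (any overlap of closed intervals), the function name `genes_in_svs`, and all gene names and coordinates are hypothetical.

```python
def genes_in_svs(svs, genes):
    """Return the names of genes that overlap at least one SV interval.

    svs:   list of (chrom, start, end)
    genes: list of (name, chrom, start, end)
    Intervals are treated as closed. A plain nested scan is used for clarity;
    an interval tree would be preferable for genome-scale inputs.
    """
    hit = set()
    for name, g_chrom, g_start, g_end in genes:
        for s_chrom, s_start, s_end in svs:
            if g_chrom == s_chrom and g_start <= s_end and s_start <= g_end:
                hit.add(name)
                break
    return hit

# Hypothetical genes and SV calls, for illustration only.
toy_genes = [
    ("GENE_A", "chr1", 100, 200),
    ("GENE_B", "chr1", 500, 900),
    ("GENE_C", "chr2", 50, 80),
]
platform_svs = {
    "DLE": [("chr1", 150, 600)],
    "ONT": [("chr1", 850, 860), ("chr2", 10, 60)],
}

# One affected-gene set per platform, then the genes shared by all platforms.
affected = {p: genes_in_svs(svs, toy_genes) for p, svs in platform_svs.items()}
shared = set.intersection(*affected.values())
```

In this toy input a single large DLE call spans two genes, while two short ONT calls each touch one gene, so the per-platform sets differ and only one gene is shared.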

Gene enrichment functional analysis
Tumor occurrence and progression are often accompanied by a large number of genomic SVs that alter many key genes [2,3]. Each SV-affected gene was manually filtered by reviewing its known functions and the cancer-related literature, and 585 key tumor genes affected by SV were identified (supplementary file). GO/KEGG analysis of these genes showed that cancer-related metabolism and pathways were significantly enriched and highlighted cellular components, molecular functions and biological processes that may be essential in tumor proliferation and progression, such as gland development, DNA-binding transcription activator activity, DNA repair complex, and the PI3K-Akt, Ras and Rap1 signaling pathways (Figure 4).

Discussion
At present, research on tumor genomic variation is mostly devoted to single nucleotide polymorphisms, copy number variation, DNA methylation, mRNA, lncRNA, etc. However, large-scale variations and complex chromosomal rearrangements (such as those generated by double-strand breaks) are common and essential in the mechanisms of tumor differentiation and progression [3,4], and short-read data cannot describe genomic or chromosomal alterations completely. Therefore, superior long-read or optical mapping methods are required to identify tumor-specific genetic variation accurately and from a more comprehensive perspective. Here, we integrated Bionano optical mapping and ONT single-molecule sequencing, which extend the length of the DNA molecules analyzed, to comprehensively detect structural variation in a tumor genome, in comparison with traditional NGS technology.

Genomic Structural Variation in thyroid cancer
The onset and progression of cancer are often triggered by structural abnormalities accumulated genome-wide, especially by variation in dysregulated oncogenes. Many genomic variations have been found in thyroid cancer, including point mutations, SVs, gene fusions and other complex genomic variation [7][8][9], which have been reported to correlate with genomic instability and patient clinical traits [10][11][12][13].
BRAF V600E and RAS gene mutations are the main mutation types in thyroid cancer and are accompanied by activation of the MAPK signaling pathway through different mechanisms [7]. By summarizing the research published so far, we identified 95 genes mutated in papillary thyroid cancer (supplementary file).
Commonly mutated genes in PTC include BRAF, RAS and EIF1AX, and rarely mutated genes include PPM1D, CHEK2, TSHR, etc. Gene fusion is a diverse and common genomic variation in PTC; an example is RET fusion, which is related to abnormal gene function and alteration of signaling pathways in tumor progression [7,14]. Copy number alterations are also detected in PTC, with approximately 10% occurring on chromosome 22q, 15% on chromosome 1q, and 2% showing a high frequency of focal CNV [7]. Genomic structural variation in adolescent thyroid cancer is more complex than in adults [5,6], including several gene mutations or fusions of uncertain significance and other complex chromosomal rearrangements [5]. Moreover, some familial hereditary syndromes, such as Gardner, Cowden, DICER1, Werner and PPNAD syndromes, which are caused by particular gene mutations, can increase the risk of thyroid cancer incidence and progression [15][16][17][18]. Complex genomic structural variation and key oncogenes can be better detected with advanced sequencing technology [19], which can lead to improved diagnostics and therapies [20]. Using Bionano optical mapping, ONT single-molecule sequencing and NGS, we characterized the SVs of an adolescent thyroid cancer.

Discrepancy among Several Sequencing Technologies
Traditional next-generation sequencing has the advantages of mature technology, high quality and low cost. Nucleotide sequences are analyzed quantitatively by short-read sequencing by synthesis, which is limited in detecting repetitive sequences [21]. Today, with advances in bioinformatics, NGS can successfully identify different kinds of variation.
Bionano optical mapping is based on optical fluorescent labeling, in which DNA is labeled and linearized in a special nanochannel chip for detection. By scanning the optical signal, sequence motifs are measured directly on long, high-quality DNA molecules. Bionano optical mapping is more precise than traditional cytogenetic technologies such as FISH, CNV array and karyotyping for both simple and complex SVs [22] and may become a novel clinical genetic diagnostic method in the future [23,24]. It reveals a wide spectrum of SVs by visualizing DNA structure and is a good complement to short-read sequencing [25]. Because breakpoint resolution is limited by label density, additional enzymes such as DLE-1 or BspQI are needed to improve SV characterization.
Recent advances in single-molecule sequencing show promising utility for comprehensive structural variation analysis, particularly for duplication, deletion, insertion, inversion and translocation [26]. Being long-read based, the Pacific Biosciences and Oxford Nanopore Technologies platforms are capable of reconstructing the entire human genome, which is impossible with short-read sequencing. ONT single-molecule sequencing uses a motor protein to pass long, high-quality DNA through a protein nanopore and detects changes in electrical current to determine the nucleotide sequence. Because of the long reads, continuity and integrity are favorable for de novo genome assembly. However, owing to the sequencing technology itself, the mismatch rate is reported to be 15% higher than that of NGS, haplotype-resolved algorithms are weak at detecting complex genomic structural variation [27], and the technology has been either too expensive for sufficient coverage or limited by poor accuracy [28]. Nevertheless, combining the advantages of different sequencing technologies should greatly improve the accuracy of SV detection.

Mismatch between SV numbers and affected genes
Our results showed that the numbers of genomic SVs and of affected genes differed between sequencing technologies. For Bionano DLE, the total number of SVs was 243, whereas the total number of SV-affected genes was 1218. With Bionano optical mapping, each SV segment contained more genes, which is an advantage in detecting genes affected by extended SVs. For ONT and NGS, the total SV numbers were more than 2000 (3120 and 2357), whereas the numbers of SV-affected genes were approximately 1000 (1219 and 1112). Although ONT and NGS detected large numbers of SVs, the numbers of affected genes detected were approximately equivalent to that of Bionano DLE. Bionano BspQI was markedly less effective at detecting SV-affected genes than the other three methods.
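The contrast described above can be made concrete by dividing the number of affected genes by the number of SVs for each technology, using the totals reported in Table 1 and Table 2. The short Python calculation below is only a rough summary added for illustration: the ratio is not a statistic reported in this study, and it ignores the facts that one gene may be hit by several SVs and that many SVs may contain no gene.

```python
# Totals reported in the text (Table 1: SV counts; Table 2: SV-affected genes).
sv_counts = {"DLE": 243, "BspQI": 127, "ONT": 3120, "NGS": 2357}
gene_counts = {"DLE": 1218, "BspQI": 454, "ONT": 1219, "NGS": 1112}

# Crude summary ratio: affected genes per detected SV, per technology.
genes_per_sv = {tech: gene_counts[tech] / sv_counts[tech] for tech in sv_counts}

for tech, ratio in genes_per_sv.items():
    print(f"{tech}: {ratio:.2f} affected genes per SV")
```

The ratios are about 5.01 for DLE and 3.57 for BspQI, against about 0.39 for ONT and 0.47 for NGS, consistent with the observation that optical mapping calls are fewer but each spans more genes.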

Limitation
There are still some limitations in our study. Only one thyroid cancer sample was used for genomic SV detection with several technologies, and validation is still needed. Although the sequenced tissue was validated by the pathology group, cell heterogeneity remained.
A mixture of stromal and tumor cell DNA could affect the accuracy of tumor-specific SV detection. For comprehensive and precise detection of genomic alterations in the future, single-cell sequencing can be combined with advanced sequencing technologies in tumor genomic research.

Conclusion
We used four sequencing technologies to explore the genomic SV of this adolescent thyroid cancer. The numbers, lengths and types of structural variants, together with the genes they affect, were examined. Integrating these results provides a comprehensive genomic view of thyroid cancer. We demonstrated that integrated approaches can provide a powerful tool for capturing a higher level of genomic SV, creating new interpretations of sequencing data of particular relevance to human cancer.

Declarations
Ethical compliance
All methods were performed in accordance with the relevant guidelines and regulations based on the Declaration of Helsinki, and written informed consent to participate in this study was obtained from the parents of the subject prior to inclusion in the study, as approved by the Ethics Committee of Beijing Children's Hospital, Capital Medical University, National Center for Children's Health.

Consent for publication: Not Applicable.
Availability of data and materials: The datasets generated and analyzed during the current study are not publicly available due to privacy and ethical restrictions concerning the adolescent thyroid cancer patient, but are available from the corresponding author on reasonable request.
Competing interests: The authors declare no competing financial or non-financial interests.

Methods
Briefly, we used four methods, Bionano DLE, Bionano BspQI, Oxford Nanopore Technologies sequencing and next-generation sequencing, to detect SVs in the adolescent thyroid tumor genome, using the paired blood sample as the normal control.

Sample collection and Ethics
We collected thyroid tumor tissue and a paired blood sample from one fifteen-year-old patient with papillary thyroid cancer. Briefly, fresh tissue samples were collected at the time of resection of the abnormal thyroid mass. Samples were confirmed by H&E histology sections. The paired peripheral blood sample was drawn from the same patient and sequenced as the control. Written informed consent to participate in this study was obtained from the parents of the subject prior to inclusion in the study, as approved by the Ethics Committee of Beijing Children's Hospital, Capital Medical University, National Center for Children's Health.
Bionano optical mapping with DLE and BspQI enzyme: The tumor sample and blood were analyzed with the Bionano DLE and Bionano BspQI strategies. Briefly, high molecular weight (HMW) DNA was extracted using Bionano kits for sample preparation. The DNA was then labeled with DLE or BspQI in the two strategies. The labeled DNA was loaded onto Bionano chips and run automatically on the Bionano system. Genome mapping and assembly were performed with Bionano Access and Solve software. SV calling was performed with the Bionano algorithm by aligning to the hg38 genome, and tumor-specific SVs were selected after filtering out common SVs. More specific procedures and parameters are given in the Methods section of the supplementary file.
Oxford Nanopore Technologies sequencing: The tumor sample and blood were also sequenced using Oxford Nanopore Technologies (ONT) sequencing. Initial sample treatment and HMW DNA extraction followed the Bionano process. Library preparation followed the ONT protocol and was followed by MinION sequencing and basecalling. After trimming, the reads were mapped to the hg38 reference genome with NGMLR, and genomic structural variations were called with Sniffles and filtered to retain only precise SVs that were present in the tumor but not in the blood sample. More specific procedures and parameters are given in the Methods section of the supplementary file.
Next-Generation Sequencing: The tumor sample and blood were also sequenced using next-generation sequencing (NGS). DNA was extracted from the sample tissues and fragmented for library preparation following the NGS DNA preparation protocols. After PCR amplification, the enriched DNA library was sequenced on HiSeq, and standard bioinformatic procedures of base-call conversion, alignment and SV calling were performed with bcl2fastq, BWA and Delly. After the SV lists were trimmed to exclude normal variation, the final tumor-specific SVs were those present in the tumor but not in the blood sample. More specific procedures and parameters are given in the Methods section of the supplementary file.
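For each platform above, tumor-specific SVs are those present in the tumor but not in the paired blood sample. The Python sketch below shows one simple way such a comparison could be made once the calls have been parsed into records; it is an illustration, not the filtering implemented by the Bionano, Sniffles or Delly pipelines. The matching rule (same chromosome and type, both breakpoints within a tolerance), the 500 bp tolerance and all names and coordinates are assumptions.

```python
from typing import NamedTuple

class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    sv_type: str

def same_event(a, b, tol=500):
    """Treat two calls as the same event if chromosome and type match and
    both breakpoints agree within tol base pairs (tol is a hypothetical choice)."""
    return (
        a.chrom == b.chrom
        and a.sv_type == b.sv_type
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )

def tumor_specific(tumor, blood, tol=500):
    """Keep tumor calls that have no matching call in the blood control."""
    return [t for t in tumor if not any(same_event(t, b, tol) for b in blood)]

# Toy calls for illustration only, not data from the study.
tumor_calls = [
    SV("chr1", 10_000, 12_000, "DEL"),
    SV("chr2", 50_000, 50_300, "INS"),
    SV("chr7", 1_000_000, 1_400_000, "INV"),
]
blood_calls = [
    SV("chr1", 10_050, 12_100, "DEL"),   # same deletion, slightly shifted breakpoints
    SV("chr2", 50_000, 50_300, "DEL"),   # same locus but a different SV type
]

somatic = tumor_specific(tumor_calls, blood_calls)
```

Here the chr1 deletion is removed as germline because the blood call matches within tolerance, while the chr2 insertion and the chr7 inversion are kept. A fixed tolerance is a simplification; reciprocal-overlap criteria are a common alternative for large events.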
SV-affected gene identification and classification: