A previous study demonstrated that PacBio long reads can detect over 20,000 SVs in a typical human genome18. However, whole-genome third-generation sequencing is expensive, which has limited its clinical application. To address this issue, we applied what is, to the best of our knowledge, the first clinical TGS panel based on the PacBio HiFi platform to breast cancer samples. We conducted a comprehensive analysis of structural variations across 28 breast cancer-related genes through long-read genomic and transcriptomic sequencing of paired breast cancer tissue and blood. Our results suggest that germline and somatic SVs are common in the selected genes among breast cancer patients, although the majority of them occur in non-exonic regions. We also identified a potential hotspot region for somatic SVs. Taken together, our results suggest that SVs are potentially important in the tumorigenesis of breast cancer. Indeed, the International Cancer Genome Consortium (ICGC) previously showed that driver SVs are more prevalent than point mutations in breast adenocarcinomas (6.4 SVs compared with 2.2 point mutations on average)19.
Traditional NGS platforms map poorly to repetitive elements, including tandem repeats and interspersed repeats, which has made a substantial fraction of most genomes inaccessible and has limited the ability of NGS to detect SVs20. A representative type of interspersed repeat is the Alu element, which accounts for 11% of the human genome sequence on average. It belongs to a class of retroelements termed short interspersed elements (SINEs) and often causes SVs through homologous recombination21. An important reason we developed the 28-gene TGS panel to illuminate the full landscape of SVs in breast cancer was to overcome the limitations of NGS in detecting SVs around repetitive elements. Repetitive elements are abundant in the 28-gene panel, which contains most of the breast cancer-related genes. For instance, Alu family repetitive elements make up around 40% of the DNA sequence of the BRCA1 gene22, 23.
In this study, by acquiring paired blood, paratumor and tumor tissue from patients, we delineated germline and somatic mutations, both of which have been reported to contribute to carcinogenesis. Interestingly, we found a potential somatic SV hotspot in the AT-rich region of the ERBB2 gene. Although this region does not belong to the interspersed repeats that often cause SVs through homologous recombination, previous studies provide evidence that SV hotspots can exist in regions other than SINE elements and DNA transposons24. Hence, our method of fine-scale characterization of genomic structural variations using TGS holds great potential for elucidating the full landscape of SVs in breast cancer.
We also systematically examined the paratumor tissues, which were used as control samples to identify somatic mutations in the tumor. During carcinogenesis, somatic mutations continuously accumulate within the tumor tissue, making its genomic structure different from that of the surrounding paratumor tissue25. It is therefore important to determine how paratumor tissue differs from both blood and tumor. We showed that most SVs were shared between blood and paratumor tissues but differed from those in the breast cancer tissues. This is in accordance with a previous study demonstrating that copy number variations mostly occurred between paratumor and tumor26.
Our 28-gene TGS panel also showed great promise in identifying causal SVs of breast cancer. NF1 is one of the 12 breast cancer predisposition genes identified to date; however, virtually all previous studies have focused on evaluating the breast cancer risk associated with putative pathogenic SNVs and small InDels27, 28. We identified two exonic SVs in two breast tumor tissues, which indicates that our TGS panel is useful for detecting cancer-related SVs. Moreover, our TGS panel is robust in identifying SVs, as indicated by the concordant results between long-read genomic and transcriptomic sequencing in identifying fusion genes.
Our finding that somatic SVs are abundant in the cancer genome suggests that they may play an important role in tumorigenesis and tumor development. This is especially relevant for breast cancer, since the pan-cancer studies by the ICGC found that driver SVs are markedly more prevalent than driver point mutations in breast cancer19. Taken together, the clinical TGS panel presented here is an accurate and robust method for detecting SVs in breast cancer, which is important for breast cancer research and holds great potential for further clinical application.