Background: A long-standing question in cancer genomics is how to identify the variants that play a functional role in cancer and to distinguish them from the majority of genomic changes, which have no functional consequences. This is particularly challenging for complex chromosomal rearrangements, which often comprise multiple DNA breaks and are therefore difficult to classify and to interpret functionally. Despite recent efforts to classify structural variants (SVs), more robust statistical frameworks are needed to classify these variants and to isolate those that derive from specific molecular mechanisms.
Results: We present a new statistical approach for analyzing SV patterns in 2392 tumor samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium and for identifying significant recurrence, which can point to mechanisms relevant to tumor biology. The method is based on recursive kernel density estimation (KDE) clustering of 152,926 SVs, graph mining techniques and statistical measures. The proposed methodology was able not only to identify complex patterns but also to show that they are not random occurrences. Furthermore, we identified a class of pattern that has not previously been described.
Conclusions: We developed a new and unbiased methodology for clustering SVs and for subsequently searching for complex patterns with a cost-efficient graph mining method. Combined with in-depth statistical analysis and randomization techniques, the proposed framework distinguishes stochastic chromosomal rearrangements from complex patterns that may arise from specific underlying mechanisms in different cancer types.
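
As an illustration of the recursive KDE clustering step named above, the following minimal sketch groups one-dimensional breakpoint positions by cutting at local minima of a Gaussian kernel density estimate, and re-clusters any group that is still too wide with half the bandwidth. It is not the authors' implementation: the function names (`kde`, `split_at_minima`, `recursive_kde`), the bandwidth values, the `max_span` stopping rule and the toy coordinates are all assumptions made for illustration.

```python
import math


def kde(points, grid, bandwidth):
    """Unnormalised Gaussian kernel density of `points` evaluated on `grid`."""
    return [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def split_at_minima(points, bandwidth, n_grid=200):
    """Split non-empty 1-D positions into groups at local minima of their KDE."""
    pts = sorted(points)
    lo, hi = pts[0], pts[-1]
    if lo == hi:
        return [pts]
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = kde(pts, grid, bandwidth)
    # A grid point is a cut if the density stops falling there.
    cuts = [
        grid[i]
        for i in range(1, n_grid - 1)
        if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]
    ]
    clusters, current, cut_idx = [], [], 0
    for p in pts:
        # Close the current group each time a cut position is passed.
        while cut_idx < len(cuts) and p > cuts[cut_idx]:
            if current:
                clusters.append(current)
                current = []
            cut_idx += 1
        current.append(p)
    clusters.append(current)
    return clusters


def recursive_kde(points, bandwidth, min_bandwidth, max_span):
    """Re-cluster any group wider than `max_span` with half the bandwidth."""
    out = []
    for cluster in split_at_minima(points, bandwidth):
        span = cluster[-1] - cluster[0]
        if span > max_span and bandwidth / 2 >= min_bandwidth:
            out.extend(
                recursive_kde(cluster, bandwidth / 2, min_bandwidth, max_span)
            )
        else:
            out.append(cluster)
    return out


# Hypothetical breakpoint coordinates on a single chromosome (assumed data).
breakpoints = [100, 120, 150, 10_000, 10_050, 10_100, 500_000]
clusters = recursive_kde(breakpoints, bandwidth=1000, min_bandwidth=100, max_span=1000)
print(clusters)
```

Halving the bandwidth on each pass is one simple way to let dense regions be resolved at a finer scale than sparse ones; the stopping rule used here (a maximum cluster span) is only one of several plausible choices.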