Clustering and Graph Mining Techniques for Classification of Complex Structural Variations in Cancer Genomes

doi:10.21203/rs.3.rs-476852/v1


Research


https://doi.org/10.21203/rs.3.rs-476852/v1

This work is licensed under a CC BY 4.0 License

Version 1


Background: For many years, a major question in cancer genomics has been how to identify the variations that have a functional role in cancer and distinguish them from the majority of genomic changes, which have no functional consequences. This is particularly challenging for complex chromosomal rearrangements, which are often composed of multiple DNA breaks and are therefore difficult to classify and interpret functionally. Despite recent efforts towards the classification of structural variants (SVs), more robust statistical frameworks are needed to better classify these variants and to isolate those that derive from specific molecular mechanisms.

Results: We present a new statistical approach to analyze SV patterns in 2392 real tumor samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium and to identify significant recurrence, which can point to relevant mechanisms involved in tumor biology. The method is based on recursive kernel density estimation (KDE) clustering of 152,926 SVs, graph mining techniques and statistical measures. The proposed methodology was able not only to identify complex patterns but also to show that they are not random occurrences. Furthermore, it identified a new class of pattern that had not been described previously.
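As a rough illustration of what recursive KDE clustering can look like, the sketch below groups one-dimensional breakpoint positions by cutting at local minima of a Gaussian kernel density estimate and then re-splitting any cluster that is still too wide with a smaller bandwidth. This is not the authors' implementation: the function names, the `bandwidth`, `max_span`, `shrink` and `min_bandwidth` parameters, and the reduction of each SV to a single coordinate are all assumptions made for the example.

```python
import math
from bisect import bisect_right


def kde(points, grid, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated on `grid`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points) / norm
        for g in grid
    ]


def split_at_minima(points, bandwidth, grid_size=200):
    """Split sorted 1-D points at the local minima of their density estimate."""
    lo, hi = points[0], points[-1]
    if lo == hi:
        return [points]
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = kde(points, grid, bandwidth)
    # A cut is placed where the density stops falling (one cut per valley).
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]
    ]
    groups = {}
    for p in points:
        groups.setdefault(bisect_right(cuts, p), []).append(p)
    return [groups[k] for k in sorted(groups)]


def recursive_kde_clusters(points, bandwidth, max_span,
                           shrink=0.5, min_bandwidth=1.0):
    """Hypothetical recursive scheme: split, then refine clusters wider than max_span."""
    points = sorted(points)
    if not points:
        return []
    if points[-1] - points[0] <= max_span or bandwidth < min_bandwidth:
        return [points]
    parts = split_at_minima(points, bandwidth)
    if len(parts) == 1:
        # No valley at this resolution: retry with a narrower kernel.
        return recursive_kde_clusters(points, bandwidth * shrink, max_span,
                                      shrink, min_bandwidth)
    clusters = []
    for part in parts:
        clusters.extend(recursive_kde_clusters(part, bandwidth, max_span,
                                               shrink, min_bandwidth))
    return clusters


# Illustrative (made-up) breakpoint positions on one chromosome.
breakpoints = [100, 120, 150, 10000, 10050, 10100, 50000]
clusters = recursive_kde_clusters(breakpoints, bandwidth=500, max_span=1000)
```

Shrinking the bandwidth only when a cluster exceeds `max_span` keeps dense regions from being over-split while still separating breakpoints that a wide kernel would merge; a real analysis would likely treat both breakpoints of each SV rather than a single coordinate.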

Conclusions: We developed a new and unbiased methodology for clustering SVs and then searching for complex patterns with a cost-efficient graph mining method. Combined with in-depth statistical analysis and randomization techniques, the proposed framework can discern between stochastic chromosomal rearrangements and complex patterns that might have specific underlying mechanisms in different cancer types.
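To make the combination of graph mining and randomization more concrete, the following sketch counts triangles in a graph and compares the count against random graphs with the same number of nodes and edges. It is only an illustration under stated assumptions: treating breakpoint clusters as nodes and SVs as edges, the uniform random-graph null model, and the names `count_triangles` and `empirical_p_value` are choices made for this example, not details taken from the paper.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    adj = {}
    for u, v in unique:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in unique) // 3


def empirical_p_value(edges, n_random=1000, seed=0):
    """Share of random graphs (same node and edge counts) with at least as
    many triangles as observed; this null model is an assumption."""
    rng = random.Random(seed)
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    nodes = sorted({n for e in unique for n in e})
    all_pairs = list(combinations(nodes, 2))
    observed = count_triangles(unique)
    hits = sum(
        count_triangles(rng.sample(all_pairs, len(unique))) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical graph: nodes are breakpoint clusters, edges are SVs joining them.
sv_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
observed, p_value = empirical_p_value(sv_edges)
```

A small empirical p-value would suggest that the observed number of triangles is unlikely under the chosen null model; other null models, such as degree-preserving edge swaps, would give different and often more conservative answers.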

General Biochemistry

Molecular Biology

complex rearrangements

structural variants

cancer genomics

clustering

graph mining

motif finding

AdditionalFile1.pdf
Additional file 1 (PDF): Figure S1 shows the abundance values of the evaluated cycles for the 36 cancer types, Figure S2 shows the abundance values for the different triangle categories, and Figure S3 shows the common clusters between every pair of triangle types.
