Background: Chromosome organization plays an important role in biological processes such as replication, regulation, and transcription. One way to study the relationship between chromosome structure and its biological functions is through Hi-C studies, which use a genome-wide method to capture chromosome conformation. Such studies generate vast amounts of data. The storage burden is made worse by the fact that chromosome organization is dynamic: studying it requires snapshots at different points in time, which further increases the amount of data to be stored. We present a novel approach called the High-Efficiency Contact Matrix Compressor (HiCMC) for efficient compression of Hi-C data.
Results: By modeling the underlying structures found in the contact matrix, such as compartments and domains, HiCMC outperforms the state-of-the-art method CMC by approximately 8% and the other state-of-the-art methods cooler, LZMA, and bzip2 by over 50% across multiple cell lines and contact matrix resolutions. In addition, HiCMC integrates domain-specific information into the compressed bitstreams that it generates, and this information can be used to speed up downstream analyses.
Conclusion: HiCMC is a novel compression approach that exploits intrinsic properties of contact matrices, such as compartments and domains. It achieves better compression than the state-of-the-art methods. HiCMC is available at https://github.com/sXperfect/hicmc.