Inversion Symmetry is a generalization of the second Chargaff rule, stating that the count of a string of k nucleotides on a single chromosomal strand equals the count of its inverse (reverse-complement) k-mer. It holds for many species, both eukaryotes and prokaryotes, up to a maximal k that ranges from about 7 to 10 as chromosomal lengths range from 2 Mbp to 200 Mbp. Building on this formalism, we introduce the concept of k-mer distances between chromosomes. We formulate two distance measures, D1 and D2: the first takes into account k-mers appearing on single strands of the two chromosomes, whereas the second takes into account both strands.
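To make the counting concrete, the following is a minimal Python sketch of the ingredients described above: k-mer counts on one strand, the reverse-complement map, a simple measure of how far a sequence departs from Inversion Symmetry, and single-strand versus both-strand count tables. All function names are hypothetical, and `l1_distance` is only an illustrative stand-in: the formal definitions of D1 and D2 are not given in this summary, so this should not be read as the authors' formulas.

```python
from collections import Counter

# Complement table for the four DNA bases (upper case only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the inverse (reverse-complement) of a k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on a single strand, skipping windows
    that contain anything other than A, C, G, T (for example N)."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def inversion_asymmetry(counts: Counter) -> float:
    """Illustrative score: 0.0 when every k-mer count equals the count
    of its reverse complement, 1.0 when no k-mer is matched at all."""
    keys = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in keys)
    total = sum(counts[w] + counts[reverse_complement(w)] for w in keys)
    return diff / total if total else 0.0


def both_strand_counts(counts: Counter) -> Counter:
    """Combine single-strand counts with those of the opposite strand."""
    combined = Counter()
    for kmer, n in counts.items():
        combined[kmer] += n
        combined[reverse_complement(kmer)] += n
    return combined


def l1_distance(a: Counter, b: Counter) -> float:
    """ASSUMPTION: an L1 distance between normalized k-mer frequencies,
    used here for illustration only; it is not the paper's D1 or D2."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both count tables must be non-empty")
    keys = set(a) | set(b)
    return sum(abs(a[w] / total_a - b[w] / total_b) for w in keys)
```

As a toy usage example, the sequences `"AAAA"` and `"TTTT"` are as far apart as possible under the single-strand comparison, yet identical once both strands are counted, which is the kind of difference the two-measure design is meant to capture.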
We first define the various distance measures and summarize their properties. We also define distances that rely on the existence of synteny blocks between chromosomes of different strains. Studying E. coli and Salmonella strains, we evaluate the different distance measures and find correlations between synteny distances and k-mer distances, thus establishing the usefulness of the latter as measures of evolutionary proximity of chromosomes. Applying our measures to human genomes, we find that chromosomes 5 and 6 are the closest pair on the k-mer distance evolutionary scale.
The novel distances carry information about evolutionary proximity and provide useful tools for future studies. The finding of proximity between human chromosomes 5 and 6 is an example of a novel insight provided by these tools.