CGR—A genetic sequence X(k) can be considered as a string composed of A, G, C and T which represent Adenine, Guanine, Cytosine and Thymine, respectively.
$X(k) \in \{C, A, T, G\}$
We consider a unit square U and name its corners $C_i$ ($i = 1, 2, 3, 4$) C, A, T and G, respectively, so that each corner corresponds to one possible value of X(k). The initial point P(0) is the center of the square. The second point P(1) is the midpoint between P(0) and the corner $C_{X(1)}$. In general, P(k) is plotted as the midpoint between P(k-1) and $C_{X(k)}$ [14].
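The midpoint rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the figures: the corner coordinates in `CORNERS` and the function name `cgr_points` are assumptions, since the text names the corners but does not state where each one sits in the unit square.

```python
# Corner positions are an assumption: the text names the corners
# C, A, T and G but does not give their coordinates.
CORNERS = {"C": (0.0, 0.0), "A": (0.0, 1.0), "T": (1.0, 1.0), "G": (1.0, 0.0)}


def cgr_points(sequence, corners=CORNERS):
    """Return the CGR points P(1)..P(n) for a nucleotide string."""
    x, y = 0.5, 0.5  # P(0): center of the unit square
    points = []
    for base in sequence.upper():
        if base not in corners:
            continue  # skipping ambiguous symbols such as N is an assumption
        cx, cy = corners[base]
        # P(k) is the midpoint between P(k-1) and the corner for X(k)
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


# First eight members of the example sequence used in the text
points = cgr_points("GCTTATGT")
```

With a different corner layout the points move, but the construction is unchanged.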
After plotting the genetic sequence X in the unit square U, the square is divided into $2^k \times 2^k$ sub-squares; each sub-square represents a unique sub-sequence of length k (k-mer).
An example for movement of points in CGR is shown with the first eight members of the data sequence (GCTTATGT) in Supplementary Figure 1. An example of addresses of the sub-squares for nucleotides, di-nucleotides (2-mer), tri-nucleotides (3-mer) and tetra-nucleotides (4-mer) is given in Supplementary Figure 2.
PC-plots—To make these plots, the percentage of plotted points that fall in each sub-square is calculated. This percentage value represents the intensity of points in that sub-square. After plotting the points by CGR and dividing the unit square into $2^k \times 2^k$ sub-squares, each sub-square is color-filled according to its calculated intensity value. Supplementary Figure 3 shows the percentage plot (Y) made for SARS-CoV-2 and SARS-CoV-1 for k = 7. Similar plots were made for all the pathogens (see Supplementary Figs 4–33).
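A minimal sketch of how the percentage matrix Y could be computed is given below. The corner coordinates, the function name `percentage_matrix`, and the handling of the first k-1 points are assumptions: with P(0) at the center, those early points fall exactly on sub-square boundaries and do not encode a complete k-mer, so the sketch leaves them out.

```python
# Corner positions are an assumption; the text does not give coordinates.
CORNERS = {"C": (0.0, 0.0), "A": (0.0, 1.0), "T": (1.0, 1.0), "G": (1.0, 0.0)}


def percentage_matrix(sequence, k, corners=CORNERS):
    """Percentage of CGR points in each cell of a 2^k x 2^k grid.

    Row index grows with y and column index with x.
    """
    n = 2 ** k
    counts = [[0] * n for _ in range(n)]
    x, y = 0.5, 0.5  # P(0): center of the unit square
    step = 0         # number of points plotted so far
    total = 0        # number of points counted in the grid
    for base in sequence.upper():
        if base not in corners:
            continue  # skipping ambiguous symbols is an assumption
        cx, cy = corners[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        step += 1
        if step < k:
            continue  # assumption: point does not yet encode a full k-mer
        # Clamp guards against floating-point rounding up to 1.0
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        counts[row][col] += 1
        total += 1
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return [[100.0 * c / total for c in row] for row in counts]


Y = percentage_matrix("GCTTATGT", k=1)
```

The nested list can be passed to any heat-map routine to color-fill the sub-squares. For k = 7 the grid has 128 x 128 cells, one per 7-mer.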
SP-plots and k-mer proximity Index—The subtraction plot between genome 1 (g1) and genome 2 (g2) is computed as
$S_{g1-g2} = Y_{g1} - Y_{g2}$
For example, if the percentage density values of $Y_{g1}$ and $Y_{g2}$ in 4 × 4 matrices corresponding to di-nucleotides (2-mers) are
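From the subtraction plot S, the sum of all positive entries is a measure of similarity or dissimilarity between two genetic sequences. Because each percentage plot sums to 100, the entries of S sum to zero, so this value equals the sum of the absolute values of all negative entries. Two sequences with identical k-mer distributions give a value of zero.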
In the above example, the 2-mer proximity index (Pr) is calculated by adding all positive (4+4+7+11+12+8 = 46) or all negative (15+15+5+5+4+2 = 46) values.
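The subtraction plot and proximity index can be sketched as follows. The function names and the two small 2 x 2 matrices are hypothetical and chosen only for illustration; they are not the values from the example above.

```python
def subtraction_plot(y1, y2):
    """Element-wise difference S = Y_g1 - Y_g2 of two percentage matrices."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(y1, y2)]


def proximity_index(s):
    """Sum of the positive entries of the subtraction plot S."""
    return sum(v for row in s for v in row if v > 0)


# Hypothetical percentage matrices (each sums to 100); not data from the paper
y_g1 = [[40.0, 10.0], [20.0, 30.0]]
y_g2 = [[25.0, 25.0], [30.0, 20.0]]

s = subtraction_plot(y_g1, y_g2)
pr = proximity_index(s)

# The magnitudes of the negative entries add up to the same value,
# because both input matrices sum to 100.
negative_total = -sum(v for row in s for v in row if v < 0)
```

Here the positive entries of `s` are 15 and 10, and the negative entries are -15 and -10, so both totals are 25.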