Visual Variations between Pairs of SARS-CoV-2 Genomes on Integrated Density Matrix

This paper describes the B2 module of the MAS. A quantification matrix is formed from the arrangement of the four bases in a genome sequence, and the differences between SARS-CoV-2 genome sequences from different samples are demonstrated with the simplest available operations. Four primitive variable value measures are used to determine how the base arrangement of the viral genome sequence changes. When two long genome sequences differ only slightly, the integrated distribution of their difference closely resembles a Bose-Einstein distribution, while their sum shows a markedly complex distribution. The resulting maps give a macroscopic view that can distinguish 16 combinations of supersymmetric structures. Given the rich transformation structure of this kind of system, a detailed exploration awaits the systematic extension of the theory and of its medical applications.


Introduction
Since the end of 2019, SARS-CoV-2 has spread rapidly, and the international epidemic situation has become increasingly severe. Case numbers in Japan, South Korea, the United States and Italy are rising sharply, and European countries face a high risk of infection. The Director-General of the World Health Organization (WHO), Tedros Adhanom Ghebreyesus, pointed out that now is the time for all countries, communities, families and individuals to concentrate on controlling the epidemic and preparing for a possible "pandemic". The WHO will continue to conduct risk assessments and to monitor the development of the epidemic closely. At the beginning of 2020, the International Committee on Taxonomy of Viruses named the new coronavirus "SARS-CoV-2" (Severe Acute Respiratory Syndrome Coronavirus 2). SARS-CoV-2 is an enveloped RNA virus with a linear, single-stranded, positive-sense genome. Its particles are round or oval, approximately 60 to 140 nm in diameter. Because the genome is positive-strand RNA, it can be translated directly into protein once it enters the cell, and it replicates by using RNA polymerase to generate negative strands. People infected with SARS-CoV-2 may develop fever, blood-clotting abnormalities, "white lung" appearances on chest imaging and other symptoms, and severe cases may be life-threatening. Recent reports of asymptomatic carriers raise the possibility that the viral genome has mutated.
To study possible genome sequence variation, this article randomly selects sequences from 8 countries and applies the visualization method of variant construction to quantify the four bases of each viral genome sequence and to combine pairs of sequences by addition or subtraction. Based on vector logic, modern matrix theory, geometric measure theory, combinatorial algebra and discrete mathematics, variant construction starts from $n$ 0-1 variables, which form $2^n$ states and $2^{2^n}$ functions; vector permutation and complement operations on the state space then establish a variant logic framework containing $(2^n)! \times 2^{2^n}$ configurations as a variation space. Variant measurement, the core of quantitative measurement in this framework, starts from $m$ 0-1 variables and explores the relevant clustering conditions on the $2^m$ states. Over the past 40 years, variant construction has been used in many applications [1]-[7], such as content-based image retrieval, medical image processing, bat echo identification, DNA maps, hierarchical organization, phase space classification, feature extraction, filtering, combinations, projections and conjugate transformations [8]. From the perspective of overall invariance, this article compares the statistical distribution characteristics of the 2D and 3D diagrams to explore, at a macroscopic level, possible genome sequence differences between countries, laying the foundation for studying how the SARS-CoV-2 genome differs from those of typical coronaviruses.
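To make the sizes of the variant-construction spaces concrete, the short sketch below evaluates the counts quoted above for small $n$. It assumes the expression reads $(2^n)! \times 2^{2^n}$, i.e. all permutations of the $2^n$ states combined with all $2^{2^n}$ complement choices; the function name `variant_space_sizes` is hypothetical and only illustrates the arithmetic, not the authors' software.

```python
from math import factorial


def variant_space_sizes(n: int) -> dict:
    """Sizes of the objects in variant construction for n binary (0-1) variables.

    Hypothetical helper; assumes configurations = (2^n)! * 2^(2^n).
    """
    states = 2 ** n                      # 2^n states
    functions = 2 ** states              # 2^(2^n) Boolean functions
    configurations = factorial(states) * functions  # (2^n)! permutations x 2^(2^n) complements
    return {"states": states, "functions": functions, "configurations": configurations}


# Show how quickly the variation space grows with n.
for n in range(1, 4):
    print(n, variant_space_sizes(n))
```

The growth is steep: already at $n = 3$ there are $8! \times 256$ configurations, which is why the method works with measurements over states rather than enumerating the whole variation space.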

Data Sources
The genome sequence data used in this article were downloaded from the public databases NCBI (National Center for Biotechnology Information) and GISAID (Global Initiative on Sharing All Influenza Data) [11]-[12]. The data used are described in Table 1.

Distribution Characteristics and Method Description
Representative SARS-CoV-2 data and data related to various influenza viruses were downloaded from NCBI (National Center for Biotechnology Information) and GISAID (Global Initiative on Sharing All Influenza Data). First, each genome sequence was screened, cleaned and divided into segments, and the numbers of the four bases A, T, C and G were counted in each segment. Second, substitution and combination operations were applied to the counts of the same genome sequence, and segments carrying the same count information were tallied together. In parallel, difference or sum operations were performed between different genome sequences of each virus, and the results were recorded to form 256 quantization matrices. Finally, the visual analysis method of the variant logic system was used to display, from a macroscopic perspective, the characteristics of possible base-pair variations in the genome sequences, forming distinguishable classification maps with supersymmetric reflection characteristics. For the detailed derivation of the formulas, please refer to [10]. The flow of the method is shown in Fig. 1.
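The first step (cleaning a sequence and counting the four bases per segment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [10]: it assumes FASTA input, contiguous segments of near-equal length, and that ambiguous symbols such as N occupy positions but are not counted. The helper names `clean_sequence` and `segment_base_counts` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"  # fixed column order for the four bases


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines ('>...') and whitespace, and upper-case the rest."""
    lines = [ln.strip() for ln in raw.splitlines() if not ln.startswith(">")]
    return "".join(lines).upper()


def segment_base_counts(seq: str, n_segments: int) -> list[dict[str, int]]:
    """Split seq into n_segments contiguous, near-equal chunks and count A, C, G, T.

    Assumption: symbols other than A/C/G/T (e.g. N) are ignored in the counts.
    """
    length = len(seq)
    counts = []
    for i in range(n_segments):
        start = i * length // n_segments        # integer bounds partition the sequence exactly
        end = (i + 1) * length // n_segments
        c = Counter(seq[start:end])
        counts.append({b: c.get(b, 0) for b in BASES})
    return counts


seq = clean_sequence(">toy example\nacgt\nNNAC\n")
print(segment_base_counts(seq, 2))
```

Integer segment bounds are used so that every base falls into exactly one segment, which keeps the per-segment counts summing to the total base count of the cleaned sequence.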
Fig. 1 The flow of the method

Difference Comparison Results
Genome sequences of SARS-CoV-2 from eight countries were randomly selected, the number of each base was counted in every segment, and the difference operation was performed between pairs of sequences. Each sequence is approximately 30k bases long and is divided into 18 segments. The color scale runs from blue to red: blue marks the sparsest data distribution and red the largest number of scatter values. Taking the differences between Australia and Canada, Brazil and Japan, Italy and the United States, and China and France as examples, the four images have clearly different distribution characteristics. The Brazil-Japan difference shows the richest image colors and a distribution resembling the Bose-Einstein distribution. The Australia-Canada difference is clearly distinguished by color, with yellow areas dominating over a blue background. The Italy-United States difference is distinctive, with dense data distributed almost entirely in the yellow area. In contrast, the China-France difference lies mainly in the dark blue area with little color fluctuation, indicating almost no difference and a high degree of similarity. The overall distributions of the difference comparison maps are shown in Fig. 2.
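The pairwise comparison can be illustrated with a simplified sketch: compute per-segment base counts for two sequences (18 segments, as above), combine them element-wise by difference or sum, and tally how often each resulting value occurs, which is the quantity a blue-to-red color scale would encode. This is an assumed, much simpler stand-in for the integrated density matrix of [10], not the authors' formulation; the function names `segment_counts`, `pairwise_matrix` and `value_histogram` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"  # fixed column order for the four bases


def segment_counts(seq: str, n_segments: int) -> list[list[int]]:
    """Count A, C, G, T in n_segments contiguous, near-equal chunks of seq."""
    seq = seq.upper()
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    rows = []
    for start, end in zip(bounds, bounds[1:]):
        c = Counter(seq[start:end])
        rows.append([c[b] for b in BASES])  # Counter returns 0 for absent bases
    return rows


def pairwise_matrix(seq_a: str, seq_b: str, n_segments: int = 18,
                    op: str = "difference") -> list[list[int]]:
    """Element-wise difference (a - b) or sum (a + b) of two sequences' segment counts."""
    if op not in ("difference", "sum"):
        raise ValueError("op must be 'difference' or 'sum'")
    sign = -1 if op == "difference" else 1
    a = segment_counts(seq_a, n_segments)
    b = segment_counts(seq_b, n_segments)
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def value_histogram(matrix: list[list[int]]) -> Counter:
    """Frequency of each value in the matrix (low frequency -> blue, high -> red)."""
    return Counter(v for row in matrix for v in row)


# Toy sequences: identical inputs give an all-zero difference matrix.
print(value_histogram(pairwise_matrix("ACGT" * 9, "ACGT" * 9)))
print(value_histogram(pairwise_matrix("A" * 36, "C" * 36, op="sum")))
```

In this simplified reading, nearly identical genomes concentrate the difference values around zero (a single dominant color), while sums spread the values out, which is consistent in spirit with the contrast between the difference and sum maps described in the text.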

Sum Comparison Results
Again taking the sums of the data from Australia and Canada, Brazil and Japan, Italy and the United States, and China and France as examples, the Australia-Canada and Brazil-Japan sums show the richest colors, and the two images have very similar distributions, so the pairs are less clearly separated than in the difference maps. The Italy-United States sum is distinctive: the image is composed of many cones without very dense data, and is distributed mostly in the blue and green areas. The China-France sum image is almost evenly distributed, mainly in dark blue areas, with little color fluctuation. The overall distributions of the sum comparison maps are shown in Fig. 3.

Conclusion
Bioinformation data are increasing with the development of the life sciences. Because living organisms are complex, biological information often comes in large quantities, with high uncertainty and complicated relationships, and the patterns it contains are often hidden and complex. Research on biological information is still in a stage of exploration and discovery, so analyzing massive, complex biological data and developing new visual analysis tools are of practical value. This paper forms a quantization matrix based on the arrangement of the four bases in the genome sequence. Using the simplest calculations, difference and addition, it displays the differences between SARS-CoV-2 genome sequences from different samples. The analysis and visualization methods used
