Functional Group Decomposition of Multiple Coronaviruses on Variant Maps

Coronaviruses can be grouped into three categories: common coronaviruses, fatal coronaviruses, and domestic coronaviruses. It is convenient to generate various visual results for their RNA sequences on variant maps. In this paper, a functional group measurement method is proposed that combines discrete mathematics and computational technologies on the A2 module of the MAS. Various samples are processed by this scheme, and interesting results can be observed. The projections of the segmented groups of each coronavirus are compared with the projections of different coronaviruses on 2D maps, using statistical measures on the density matrix whose similarity and dissimilarity properties are left for further exploration.


Research Background
Coronaviruses are enveloped RNA viruses with a linear, single-stranded, positive-sense genome. The genome is approximately 1/100000 the size of the human genome and approximately 1/150 the size of the E. coli genome. Coronavirus genomes range from about 26 kb to about 30 kb. The length varies little between species, yet it is the largest genome among single-stranded RNA viruses [1]. Coronaviruses cause disease in humans, livestock and many other animals.
At present, seven coronaviruses are known to infect humans. Four of them, HCoV-NL63, HCoV-HKU1, HCoV-OC43 and HCoV-229E, cause infections of varying severity, but their mortality rate is almost zero [2]. The other three, SARS-CoV, MERS-CoV and SARS-CoV-2, are fatal coronaviruses. Besides these seven human coronaviruses, there are domestic coronaviruses such as porcine deltacoronavirus (PDCoV), whose data are also analyzed and processed in this paper. The mortality of this disease among domestic animals such as pigs is very high, 50%-100% [3].
Currently, mainstream coronavirus research does not define the viruses by morphological characteristics and biochemical reactions but analyzes them with the tools of bioinformatics. An important part of bioinformatics is to build mathematical models and combine them with powerful tools from computer science to answer biological questions. Two basic processes are involved, and they are equally important: acquiring the relevant data and analyzing it. Data are acquired mainly through molecular biotechnology, followed by high-throughput or similar sequencing methods that yield the gene sequence data.
Data analysis uses efficient computer processing to extract the essential characteristics of the biological data and to look for regularities across many analysis results. With the progress of sequencing technology, however, most algorithms can no longer keep pace with the growth of the data, so researchers must pay close attention to time complexity when designing methods. The goal remains to extract effective information from nucleotide and protein sequences. This task can be approached by studying the sequence itself, starting from the four bases.
In this paper, based on geometric statistics, we present two-dimensional images of coronaviruses and analyze their characteristics. The method belongs to the A2 function module of the MAS and describes a color visualization model for RNA sequences. This model is based on the variant construction [9], a new system composed of variant logic, measurement and visualization models, which can be used to analyze gene sequences under variation. The variant construction has been used for randomness detection in sequence ciphers and, in the medical field, to detect different diseases. Its visualization of human ECG data not only avoids information loss and degradation but also achieves dense visualization [4,5]. It needs less plane space to visualize larger DNA sequences. It mainly uses the relationship between A, T, G and C to process the whole DNA sequence and finally forms 2D maps. Through this module, we can see the different projection effects of different coronaviruses in the coordinate system. By decomposing a coronavirus gene sequence, we can also observe its functional gene segments, relate them to the statistical characteristics of the bases, and obtain the internal rules.

Data Sources
All coronavirus data in this paper were collected from NCBI (https://www.ncbi.nlm.nih.gov/) and GISAID (https://platform.gisaid.org/), and the corresponding accession numbers are shown in the

Main Methods
In the gene sequence, base pairing follows a strict complementary symmetry. From the 15 parameters of the probability measure (A, T, G, C, A + T, A + G, A + C, T + G, T + C, G + C, A + T + G, A + T + C, A + G + C, T + G + C, A + T + G + C), a variety of one-dimensional or multidimensional visualization modes can be formed [8]. Based on prior experiments and biological knowledge, and to obtain better graphical results, this paper selects a two-dimensional visualization framework that focuses on the statistics of the A + T and A + G parameters. In the variant model, a 2D map is then generated from the counts of the corresponding values [7].
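The 15 parameters listed above are the non-empty subsets of the four bases. The following is a minimal sketch of how they could be enumerated and counted for one subsequence. The function names `base_subsets` and `subset_counts` are hypothetical and are not taken from the A2 module.

```python
from itertools import combinations

BASES = "ATGC"  # base order used in the parameter list above


def base_subsets():
    """Return the 15 non-empty subsets of {A, T, G, C}, smallest first."""
    return [combo for size in range(1, 5) for combo in combinations(BASES, size)]


def subset_counts(segment):
    """For each subset, count how many bases of the segment belong to it."""
    segment = segment.upper()
    single = {base: segment.count(base) for base in BASES}
    return {"+".join(combo): sum(single[b] for b in combo)
            for combo in base_subsets()}


counts = subset_counts("ATGCGATTAC")
print(len(counts))                   # 15
print(counts["A+T"], counts["A+G"])  # 6 5
```

Only the `A+T` and `A+G` entries are needed for the two-dimensional framework chosen in this paper. The other entries correspond to the alternative visualization modes.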

Architecture
In this paper, we first obtain the relevant data from the NCBI website as the input and then filter the data into the required format. Finally, we process all sequences with the same segmentation and method, so that each sequence yields one 2D map as output. The whole frame structure of the RNA gene sequence is shown in Figure 1.
As the basic experimental data, the RNA gene sequence is divided into several equal-length subsequences, and the data in each subsequence are calculated according to the measurement parameters. The following example shows the projection results of the module to aid understanding of the subsequent visualization results. The range of both coordinate axes and of the color bar is (0,30). The larger the projection value is, the closer the color is to yellow, and the smaller the value is, the closer it is to blue. The points in the coordinate system represent (Number(A + T), Number(A + G)).
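The filtering step is not specified in detail. As an illustration only, the sketch below assumes that the input is a single-record FASTA text, that U is written as T (matching the A, T, G, C notation of this paper), and that ambiguous symbols such as N are discarded. The function name `read_fasta_sequence` is hypothetical.

```python
def read_fasta_sequence(text):
    """Join the sequence lines of one FASTA record, keeping only A, T, G, C."""
    lines = [line.strip() for line in text.splitlines()]
    # Header lines start with '>' and carry no sequence data
    body = "".join(line for line in lines if line and not line.startswith(">"))
    # Assumption: RNA written with U is mapped onto the A/T/G/C alphabet
    body = body.upper().replace("U", "T")
    # Assumption: ambiguous symbols (for example N) are dropped
    return "".join(base for base in body if base in "ATGC")


record = ">example record\nAUGn\nccgt\n"
print(read_fasta_sequence(record))  # ATGCCGT
```

Dropping ambiguous symbols shifts the boundaries of all later subsequences, so a different policy may be preferable when such symbols are frequent.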
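The projection described above can be sketched as follows. This is a minimal illustration, not the A2 module itself. The name `variant_map`, the parameter `segment_length`, and the choice to drop a shorter trailing remainder are assumptions.

```python
from collections import Counter


def variant_map(sequence, segment_length):
    """Count how many equal-length segments fall on each point
    (Number(A+T), Number(A+G)) of the 2D map."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    sequence = sequence.upper()
    density = Counter()
    # Assumption: only full segments are used; a shorter tail is ignored
    for start in range(0, len(sequence) - segment_length + 1, segment_length):
        segment = sequence[start:start + segment_length]
        a, t, g = segment.count("A"), segment.count("T"), segment.count("G")
        density[(a + t, a + g)] += 1
    return density


density = variant_map("ATGCATGCAAAATTTT", segment_length=4)
print(sorted(density.items()))
# [((2, 2), 2), ((4, 0), 1), ((4, 4), 1)]
```

Each key is a point in the coordinate system, and each value is the projection value that the color bar would encode. With a segment length of n, both coordinates lie between 0 and n.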

Fig. 3: A projection example
Visualization Results and Analysis
Fig. 4: Nine different coronaviruses
Figure 4 shows nine different coronaviruses: the first four (Figure 4(a)-(d)) are common coronaviruses, the fifth to the eighth (Figure 4(e)-(h)) are fatal coronaviruses, and the last one (Figure 4(i)) is a domestic coronavirus. In general, the distribution areas of all coronaviruses in the coordinate system are relatively similar, and the higher the mortality rate of a coronavirus, the higher its projection area. Figures 5 and 6 show the segmented functional genes (HCoV-229E in Figure 5, HCoV-NL63 in Figure 6), which makes their distribution characteristics convenient to observe. Segments of 100 lines of data are used. The coverage of the 100-200 segment of HCoV-229E (Figure 5(b)) is the closest to that of its whole gene sequence, and the same holds for HCoV-NL63 in Figure 6.
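The comparison between a segment and the whole sequence is made visually in this paper, and no formula for the coverage is given. One possible way to quantify it, shown here purely as an assumption, is the share of points occupied by the whole-sequence map that the map of a part also occupies. The names `occupied_cells` and `coverage_overlap` are hypothetical.

```python
def occupied_cells(sequence, segment_length):
    """Return the set of (Number(A+T), Number(A+G)) points hit by a sequence."""
    sequence = sequence.upper()
    cells = set()
    # Only full subsequences are projected; a shorter tail is ignored
    for start in range(0, len(sequence) - segment_length + 1, segment_length):
        segment = sequence[start:start + segment_length]
        a = segment.count("A")
        cells.add((a + segment.count("T"), a + segment.count("G")))
    return cells


def coverage_overlap(part, whole, segment_length):
    """Share of the whole map's occupied points also occupied by the part."""
    whole_cells = occupied_cells(whole, segment_length)
    if not whole_cells:
        return 0.0
    part_cells = occupied_cells(part, segment_length)
    return len(part_cells & whole_cells) / len(whole_cells)


whole = "ATGCAAAATTTTGGGG"
print(coverage_overlap(whole[:8], whole, segment_length=4))  # 0.5
```

A value close to 1 would indicate that the part covers nearly the same region of the map as the whole sequence. This measure ignores the projection values themselves, so it captures only where points fall, not how densely.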

Result Discussions
Because the influence of coronaviruses on humans cannot be ignored, their study plays an important role in bioinformatics. This paper analyzes the expression of coronavirus diversity and explores the existence of functional genes by using a probabilistic statistical model of variant values. The method describes the distribution characteristics of different kinds of coronaviruses quickly, conveniently, concisely and intuitively.