Mutational Analysis of SARS-CoV-2 Genomes in Key Cities of China

In December 2019, pneumonia caused by SARS-CoV-2 broke out in Wuhan and then spread rapidly, from multiple sources, to other provinces and cities in China. In this paper, genomes collected in four Chinese cities (Wuhan, Guangzhou, Shanghai, and Hangzhou) are analyzed with the A1 module of the MAS. Starting from the virus gene sequence itself, multiple probability statistics are applied to extract characteristics from the virus genomes, so that variations between genomes can be compared and visualized. After transformation, the grouped visualizations show both shared and differing properties. In this way, key mutation characteristics can be observed, and results of this type are helpful for further scientific research on COVID-19.


Research Background
SARS-CoV-2 is a new type of highly infectious coronavirus. Since the beginning of December 2019, Wuhan City, Hubei Province, China has reported cases of viral pneumonia of unknown cause, which became the focus of global attention. Chinese scientists quickly identified the novel coronavirus by genetic sequencing and virus isolation. SARS-CoV-2 is a coronavirus closely related to SARS-CoV and MERS-CoV. However, it did not evolve from either of them. The origin of the virus has therefore been one of the core questions for scientists in many countries since the beginning of the epidemic. Based on the similarity of virus gene sequences, researchers have suggested that the virus may have originated in bats, but the exact source requires further study [4].
At present, the outbreak in China has been effectively controlled, but the virus is spreading rapidly in other regions, and European and American countries have become the center of the pneumonia outbreak. Because of its high transmissibility and potential harm, SARS-CoV-2 poses a great threat to the health and safety of people all over the world.
As the virus spread from Wuhan to major cities in China, its gene sequence kept changing with the environment, so it is necessary to find where the sequence varies and by how much. We therefore focus on comparing virus gene sequences and select multiple sequences measured in the core cities of China. To study the sequence variation across cities, we use the A1 model of the MAS within a variation measurement framework to construct an effective sequence comparison model. The model can locate the intervals in which mutated virus sequences differ, identifying variation locations for more detailed research and exploration [2].

Aim of the Study
Using gene sequences from major cities in China, we analyze the variation process, concentrating on Wuhan, Guangzhou, Shanghai, and Hangzhou, with the following two aims.
(1) For the visualization of gene sequences, this paper focuses on the gene sequence itself and explores the distribution of base counts within the sequence in order to find its characteristics.
(2) During the current outbreak, a large amount of virus gene data is generated every day. For data of this volume, segmented measurement allows the characteristics and variation information of the virus to be extracted quickly and effectively, which supports faster and deeper study by researchers.

Variation Projection Model
The variation projection model for a virus sequence works as follows:
Input: viral gene sequence
Processing: segmentation, measurement
Output: variation projection graph
During processing, the total length of the measured sequence is denoted $N$. The whole sequence is divided into $M$ segments of length $m$ each, so that each segment contains $m = N/M$ bases. For each of the $M$ segments, the numbers of A, G, C, and T bases are counted in turn, giving four groups of counts $m_A$, $m_G$, $m_C$, $m_T$, which are then projected onto a one-dimensional graph.
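The segmentation and counting step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the function names `segment_bounds` and `segment_base_counts` are hypothetical, segment boundaries are placed at integer positions i*N//M so that a length N not divisible by M is still covered, and characters other than A, G, C, T are simply not counted.

```python
from collections import Counter

BASES = "AGCT"

def segment_bounds(n, num_segments):
    """Split positions 0..n-1 into num_segments nearly equal (start, end) ranges.

    Assumption: integer boundaries i*n//num_segments, so segment lengths
    differ by at most one base when n is not a multiple of num_segments.
    """
    return [(i * n // num_segments, (i + 1) * n // num_segments)
            for i in range(num_segments)]

def segment_base_counts(sequence, num_segments):
    """Return one dict of A/G/C/T counts per segment (the values m_A, m_G, m_C, m_T)."""
    seq = sequence.upper()
    counts = []
    for start, end in segment_bounds(len(seq), num_segments):
        tally = Counter(seq[start:end])
        # Counter returns 0 for bases absent from the segment.
        counts.append({base: tally[base] for base in BASES})
    return counts

# Toy sequence (not real genome data): N = 12, M = 3, so m = 4.
example = "AAGGCCTTAAGT"
projection = segment_base_counts(example, 3)
for index, segment in enumerate(projection):
    print(index, segment)
```

Plotting each of the four count series against the segment index would give a projection graph of the kind described above.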

Variation Measurement Model
The variation measurement model for a virus sequence works as follows:
Input: viral gene sequence
Processing: segmentation, measurement, comparison
Output: variation measurement graph
For each segment, sum the counts of the four bases, $\sum m_{A,G,C,T} = m_A + m_G + m_C + m_T$. Then take the absolute value of the difference between the sums of the corresponding segments of the two sequences. This yields one value per segment, and these values are projected onto a one-dimensional graph.
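The comparison step can be sketched as follows. This is a literal reading of the description above under stated assumptions, not the authors' code: the names `segment_sums` and `variation_measure` are hypothetical, segments use integer boundaries i*N//M, and characters other than A, G, C, T (for example the ambiguity code N) are not counted. Under this reading, two sequences of equal length that contain only A, G, C, T give zero in every segment, so nonzero values reflect length differences or uncounted characters.

```python
BASES = "AGCT"

def segment_sums(sequence, num_segments):
    """Return, for each segment, the total number of A, G, C, T bases it contains."""
    seq = sequence.upper()
    n = len(seq)
    sums = []
    for i in range(num_segments):
        # Assumption: integer segment boundaries i*n//num_segments.
        start = i * n // num_segments
        end = (i + 1) * n // num_segments
        sums.append(sum(1 for ch in seq[start:end] if ch in BASES))
    return sums

def variation_measure(seq_a, seq_b, num_segments):
    """Absolute difference of per-segment base sums between two sequences."""
    sums_a = segment_sums(seq_a, num_segments)
    sums_b = segment_sums(seq_b, num_segments)
    return [abs(a - b) for a, b in zip(sums_a, sums_b)]

# Toy sequences (not real genome data); the sample has one ambiguous
# base and two extra bases at the end.
reference = "AAGGCCTTAAGT"
sample = "AAGNCCTTAAGTAC"
differences = variation_measure(reference, sample, 2)
print(differences)
```

Plotting `differences` against the segment index gives a one-dimensional variation measurement graph; a per-base variant (comparing $m_A$, $m_G$, $m_C$, $m_T$ separately) would be a different, more sensitive measure than the one described here.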

Results and Discussion
Following the transmission timeline of the virus, we selected 20 sequences from 4 cities on GISAID for measurement: 5 from Wuhan, detected on December 23, 2019; 6 from Hangzhou, detected on January 22, 2020; 5 from Shanghai, detected on February 1, 2020; and 4 from Guangzhou, detected on February 25, 2020 [3].
The overall trend of the gene sequence visualization is the same in all regions, but differences are also visible, indicating that the virus changed in the process of transmission [5].
Although differences can be seen in the projection map above, their degree is not obvious enough, so we also compare the gene sequences between the four cities. This comparison clearly shows the location and degree of variation of the virus in the process of transmission. The figure above also shows that the similarity among sequences from the same city varies. The base counts of the gene sequences detected in Wuhan, Guangzhou, and Shanghai are approximately the same, and the difference between two sequences is less than 5, which indicates little or almost no variation while the virus spread within these cities. The curve for Hangzhou, however, changes considerably, from which it can be inferred that the virus circulating in Hangzhou may have been introduced from other areas and changed in the new environment.

Conflict of Interest
No conflict of interest has been claimed.