Genome-wide Mutation Landscape of SARS-CoV-2

DOI: https://doi.org/10.21203/rs.3.rs-47392/v1

Abstract

Background: SARS-CoV-2 has caused a pandemic, and researchers have built phylogenetic trees to trace the spread of the virus. However, the rate at which variations accumulate and the locations of mutational hotspots remain largely unclear.

Results: We collected more than 3,100 SARS-CoV-2 genome sequences from GISAID and profiled the landscape of whole-genome variation. We detected 2,096 single-nucleotide variants (SNVs) and seven short deletions. Of the SNVs, 1,224 (58.4%) are missense variants that alter the corresponding residues. We estimated the accumulation rate of SNVs during the current spread at 6.36e-2/day. Fifteen missense SNVs occur at extremely high frequency (present in more than 100 genome sequences, p < 1e-5), affecting ORF1ab, S, ORF3a, M, ORF8, and N. Moreover, one frequent substitution at locus 23,403 changes the 614th amino acid of the spike glycoprotein from D to G, potentially affecting the function of this key protein.
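
Two figures in the Results can be checked with simple arithmetic: the missense share, and the mapping of genomic locus 23,403 to residue 614 of the spike protein. The sketch below is illustrative only and is not the authors' pipeline. It assumes the 58.4% is computed over the 2,096 SNVs, and that the S gene begins at nucleotide 21,563 (1-based) as in the reference genome NC_045512.2; that coordinate is not stated in this abstract. The function names are hypothetical.

```python
# Assumption (not from the abstract): 1-based start of the S gene
# in the reference genome NC_045512.2.
S_GENE_START = 21563


def codon_of(genome_pos, gene_start):
    """Map a 1-based genomic position to (codon number, position in codon).

    Both returned values are 1-based. Assumes the gene is read on the
    forward strand without frameshifts or introns.
    """
    offset = genome_pos - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3 + 1


def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Locus 23,403 falls in codon 614 of S, at the second codon position.
print(codon_of(23403, S_GENE_START))

# Missense share among the SNVs reported in the Results.
print(percent(1224, 2096))
```

Under these assumptions the locus maps to codon 614, consistent with the D-to-G change described above, and 1,224 of 2,096 rounds to 58.4%.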

Conclusion: Our study provides the genome-wide mutation landscape of SARS-CoV-2. We found continent-specific mutational patterns and 15 high-frequency missense SNVs affecting six genes of the virus, which may promote the adaptation of the virus as it evolves.

Full Text

This preprint is available for download as a PDF.