Background
Because viruses do not fossilize, there are no ancestral references from which to infer their evolutionary processes; moreover, viruses have no cell structure or metabolism and cannot reproduce outside host cells. Most mutations conferring high pathogenicity are lost from the population, whereas adaptive mutations can spread epidemically and become fixed. Therefore, determining how viruses originated and diverged, and how infectious diseases are transmitted, remains a serious challenge.
Results
Surprisingly, some inter-species genetic distances within Coronaviridae were shorter than the intra-species distances, which may represent intermediate states between species or subspecies in the evolutionary history of Coronaviridae. Insertions and deletions in whole-genome sequences from different hosts were associated with new functions or evolutionary turning points, indicating their important roles in host transmission and host shifts of Coronaviridae. Furthermore, we believe that non-nucleosomal DNA may play a dominant role in the divergence of SARS-CoV-2 lineages in different regions of the world because it lacks nucleosome protection. We suggest that strong selective variation among SARS-CoV-2 lineages is required to produce strong codon usage bias. Interestingly, we found that an increasing number of other types of substitutions, such as those resulting from the hitchhiking effect, have accumulated, especially in the pre-outbreak phase, even though some earlier substitutions were replaced by other dominant genotypes. To predict potential epidemic outbreaks, we tested our strategy, Epi-Clock, which applies the ZHU algorithm to different SARS-CoV-2 datasets collected before outbreaks to search for significant mutation accumulation patterns correlated with outbreak events. Here, we hypothesize that specific amino acid substitutions act as triggers for outbreaks. In most validations, we accurately predicted the potential pre-outbreak phase a median of 5 days before the outbreaks. Updated results from our pipeline are available at https://bioinfo.liferiver.com.cn after a simple registration.
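The observation that some inter-species distances fall below intra-species distances can be checked mechanically once sequences are aligned. The abstract does not state which distance measure was used, so the sketch below assumes the uncorrected p-distance (proportion of differing sites, ignoring gap columns); the function names `p_distance` and `distance_overlap` and the toy sequences are hypothetical and purely illustrative, not the authors' pipeline or data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of differing sites between two
    aligned sequences, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)


def distance_overlap(seqs: dict[str, tuple[str, str]]):
    """seqs maps sequence ID -> (species label, aligned sequence).

    Returns (max intra-species distance, min inter-species distance,
    inter-species pairs whose distance is below the max intra-species one).
    A non-empty third element is the 'overlap' pattern described above.
    """
    intra, inter = [], []
    for (id1, (sp1, s1)), (id2, (sp2, s2)) in combinations(seqs.items(), 2):
        d = p_distance(s1, s2)
        # Sort each pair into the within-species or between-species bucket.
        (intra if sp1 == sp2 else inter).append((d, id1, id2))
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-species pair")
    max_intra = max(d for d, _, _ in intra)
    min_inter = min(d for d, _, _ in inter)
    overlapping = [(id1, id2, d) for d, id1, id2 in inter if d < max_intra]
    return max_intra, min_inter, overlapping


# Hypothetical toy alignment: two sequences of species A, one of species B.
toy = {
    "a1": ("A", "ACGTACGTAC"),
    "a2": ("A", "ACGTTCGTTC"),
    "b1": ("B", "ACGTACGTAG"),
}
print(distance_overlap(toy))
```

In this toy alignment, b1 differs from a1 at one site (distance 0.1) while the two species-A sequences differ at two sites (distance 0.2), so the pair (a1, b1) is reported as overlapping. Real analyses would typically use a substitution-model-corrected distance and many more sequences.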
Conclusions
Tracking the spread of infectious diseases to assist in their control has traditionally relied on analyzing case data gathered as the outbreak proceeds. Here, we propose Epi-Clock, a sensitive platform to help understand pathogenic disease outbreaks and facilitate the response to future outbreaks; like a clock, it can signal at any time when individuals at focal locations need assistance through diagnostics, isolation control, vaccines or therapy.