To understand the spread and evolving dynamics of CoV2, we mapped all the genomes available in the NCBI Virus database (www.ncbi.nlm.nih.gov/labs/virus). A total of 473 complete CoV2 genomes, comprising sequence entries from 20 different countries, were selected for analysis. Based on available reports, the Bat-CoV genome was used as the outgroup [5]. Our analyses are consistent with other reports showing that samples from Wuhan (MT291831) and Shenzhen/Hong Kong (MN975262) are closest to the source. The former sample branches into two clusters, A and B: three samples (MN997409-Arizona, MT106054-Texas and MN938384-Hong Kong/Shenzhen) connect it to cluster B, and one sample (MT304489-Texas) connects it to cluster A, sharing one and four mutations (Figure 1). For clarity, we classified the whole network into five clusters, of which the distant clusters U1 and U2 are rich in samples from the USA. Cluster B is mainly shared between China and the USA, while clusters A and C are diverse.
The center of cluster A is shared by samples from the USA, China and Taiwan, while the Chinese source shares ancestry with the Colombian (MT256924) and Indian (MT050493) samples (two mutations each). The sample from Taiwan (MN985325) provides the sole outgroup to cluster U1, which is densely populated with sequences from Washington (WA), USA. Cluster B is heavily centered on the USA and China and provides direct descendants to Vietnam, Israel, India, Pakistan, Italy, Nepal, Australia, Sweden and Korea, sharing one to four mutations. Interestingly, the Swedish sample descends from the Australian node rather than from a Chinese one. The second USA cluster, U2, is connected to cluster B by a rather small cluster C that contains European and South American samples from Spain, France and Peru. The French sample of cluster C provides an outgroup to the U2 cluster, which contains sequences from different states of the USA.
Collectively, our global-scale analysis of CoV2 spreading dynamics indicates that countries with multiple or different source entries are assisting viral evolution at a rapid pace.
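To illustrate how the mutation counts quoted for the network edges above can be obtained, the following minimal Python sketch counts pairwise differences between pre-aligned sequences and links samples with a minimum spanning tree. It is not the software used in this study: the function names and the toy sequences are hypothetical, and a minimum spanning tree is used here only as a simple stand-in for a full haplotype network method.

```python
def mutation_count(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an unknown base ('N')
    are skipped (an assumption of this sketch, not a rule from the study).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def minimum_spanning_edges(seqs: dict[str, str]) -> list[tuple[str, str, int]]:
    """Link samples with Prim's algorithm, starting from the first entry.

    Returns (parent, child, mutations) edges in the order they are added.
    Ties are broken by sample name so the result is deterministic.
    """
    names = list(seqs)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = min(
            (
                (u, v, mutation_count(seqs[u], seqs[v]))
                for u in in_tree
                for v in names
                if v not in in_tree
            ),
            key=lambda e: (e[2], e[0], e[1]),
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical toy alignment; real genomes are roughly 30 kb long.
toy = {
    "root": "ACGTACGTAC",
    "s1": "ACGTACGTAT",
    "s2": "ACGAACGTAT",
    "s3": "TCGTACGAAC",
}

for parent, child, n in minimum_spanning_edges(toy):
    print(f"{parent} -> {child}: {n} mutation(s)")
```

Placing the outgroup (or the sample assumed closest to it) first in the dictionary makes the tree grow outward from the presumed source, which mirrors how the clusters are read in Figure 1.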
Phylodynamics of the USA
As of April 13, 2020, more than 400 sequences from the USA were available. Here, we analyzed 355 complete genome samples from the USA, reported from seventeen different states, including 24 samples from the cruise ship Diamond Princess, which had 3,771 passengers on board, of whom more than 700 were confirmed CoV2 cases [14]. Since the cruise ship was carrying CoV2-positive patients from Hong Kong, we used the Bat-CoV genome as an outgroup. Interestingly, the cruise samples grouped next to the ancestor; we refer to this group as the Cruise-cluster. Along with the Cruise-cluster, one sample each from Oregon (OR, MT304487) and Texas (TX, MT276331) stayed close to the ancestor (Figure 2). The OR sample provides a base for one sample each from California (CA) and Georgia (GA) and for five samples from Washington (WA). The central base of the Cruise-cluster is shared with the Arizona sample that was directly infected from China (discussed above). Overall, the Cruise-cluster shares similarity with the majority of the CA samples and then bifurcates further. The left-side group of WA samples is the same group we previously described as U1 and is connected to the Cruise-cluster by an arbitrary ancestor, suggesting that the cruise samples are not the direct source of U1. This leaves the sample from Taiwan as the only valid source. A similar case can be observed in the right cluster, where the Cruise-cluster does not provide an actual ancestral link.