Nine genotypes were reclassified based on the VP1 sequences
According to the guidelines for genotype classification, which specify 15%-25% nucleotide difference between genotypes and less than 15% within a genotype [16], the 113 full-length VP1 sequences of E25 were classified into nine genotypes (A-I) (Figure 1). Nucleotide differences among the nine genotypes ranged from 15.7% to 24.7%; the nucleotide and amino acid differences between genotypes are detailed in Table 1.
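The threshold rule above can be illustrated with a minimal sketch. It assumes an uncorrected pairwise distance (p-distance) on aligned sequences; the distance model actually used for the classification is not stated here, and the function names and the 0.15 default are illustrative only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is an uncorrected distance (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def same_genotype(seq_a: str, seq_b: str, threshold: float = 0.15) -> bool:
    """True if the pairwise difference is below the within-genotype cutoff."""
    return p_distance(seq_a, seq_b) < threshold
```

In practice the comparison would be run over every pair of aligned VP1 sequences, and genotype boundaries would be read from the phylogenetic tree together with these distances rather than from a single pairwise test.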
The prototype strain (JV-4) was classified as genotype A, and the remaining eight genotypes were named sequentially in chronological order. Genotypes B, C, and I each consisted of a single sequence, isolated in Australia in 1995, China in 1998, and the United States in 2016, respectively. Genotype D included 22 sequences, most of which (16/22) were isolated in India between 2010 and 2012. Similarly, genotype H was composed mainly of Indian strains (6/8), together with two Chinese strains. Genotype E included six sequences from Australia, Germany, and Taiwan, China. The five sequences of genotype G were from Australia, France, and the United States. Genotype F included 68 sequences from four countries (China, the United States, Australia, and France), of which 55 were from China; it was also the predominant genotype circulating in the Chinese mainland.
Phylodynamic analysis of E25
Using BEAST software, 110 full-length E25 VP1 sequences were analyzed to infer their evolutionary origin. The mean evolutionary rate of E25 was 6.08 × 10⁻³ substitutions/site/year (95% HPD: 5.10 × 10⁻³ to 7.16 × 10⁻³), and the estimated time of origin was 1923 (95% HPD: 1901 to 1944) (Table 1, Figure 2A). Genotype F originated in approximately 1993, with an evolutionary rate of 6.44 × 10⁻³ substitutions/site/year (95% HPD: 5.53 × 10⁻³ to 7.34 × 10⁻³) (Table 1). The Bayesian skyline plot shows that the E25 population size was stable until 2004, fluctuated slightly from 2004 to 2008, declined and then increased slightly between 2012 and 2016, and remained steady after 2016 (Figure 2B). Genotype F showed a small fluctuation between 2008 and 2014 (Figure 2C). The other genotypes were not analyzed separately with BEAST because their small numbers of sequences would lead to large errors.
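As a rough sense of scale for the estimated rate, the following back-of-the-envelope sketch relates a substitution rate to the time two lineages need to reach a given pairwise divergence. It assumes a strict clock, equal rates on both lineages, and no repeated substitutions at the same site; these are simplifications of this sketch, not the model fitted in BEAST, and the function name is hypothetical.

```python
def years_to_divergence(divergence: float, rate: float) -> float:
    """Years for two lineages to differ by `divergence` (proportion of sites).

    Both lineages accumulate changes, so the pairwise distance grows at
    2 * rate per year under the simplifying assumptions stated above.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return divergence / (2 * rate)


# Mean E25 rate reported in the text (substitutions/site/year)
E25_RATE = 6.08e-3

# Time to reach the 15% between-genotype threshold under these assumptions
years_15 = years_to_divergence(0.15, E25_RATE)
```

Under these assumptions the 15% threshold corresponds to a little over a decade of independent evolution; saturation at fast-evolving sites would lengthen the true figure.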
Table 1 Information on 9 genotypes of E25 relying on full-length VP1 sequences
Six important global geographic transmission paths of E25
Based on the 110 full-length VP1 sequences of E25 described above, a sequence dataset covering six countries (the United States, China, Australia, France, Germany, and India) was assembled to analyze the global spatiotemporal dynamics of E25. Using a Bayes factor (BF) ≥ 10 and a posterior probability (PP) ≥ 0.5 as thresholds, six significant transmission pathways were identified: China to Germany (BF=23.88, PP=0.85); China to France (BF=37.23, PP=0.90); China to Australia (BF=21.56, PP=0.83); India to the United States (BF=12.24, PP=0.74); India to Australia (BF=400.25, PP=0.99); and France to the United States (BF=12.17, PP=0.74) (Figure 3A, Supplementary Table S3). These pathways indicate that E25 spread primarily from Asia to the rest of the world. In addition, the Markov reward analysis showed that China dominates the export of E25 worldwide, with a Markov reward value of 11.32, much higher than that of the other countries (Figure 3B, Supplementary Table S4). However, these results may be biased by the limited number of sequences available for each country.
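To show how a Bayes factor relates to the posterior probability of a transmission route, here is a minimal sketch of the usual calculation for Bayesian stochastic search variable selection. It assumes an asymmetric diffusion model over six locations and a truncated Poisson prior on the number of non-zero rates with mean ln 2 and offset K-1; none of these settings is stated in the text, and the function names are hypothetical.

```python
import math


def bssvs_prior_probability(n_locations: int, symmetric: bool = False) -> float:
    """Prior probability that any single rate indicator is switched on.

    Assumes a Poisson prior with mean ln(2) and offset K - 1 on the number
    of non-zero rates (an assumption of this sketch).
    """
    if n_locations < 2:
        raise ValueError("need at least two locations")
    n_rates = n_locations * (n_locations - 1)
    if symmetric:
        n_rates //= 2
    return (math.log(2) + n_locations - 1) / n_rates


def bayes_factor(posterior: float, prior: float) -> float:
    """Posterior odds divided by prior odds for one rate indicator."""
    if not (0 < posterior < 1 and 0 < prior < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return (posterior / (1 - posterior)) / (prior / (1 - prior))


def is_supported(bf: float, pp: float, bf_min: float = 10.0, pp_min: float = 0.5) -> bool:
    """Apply the BF >= 10 and PP >= 0.5 thresholds used in the text."""
    return bf >= bf_min and pp >= pp_min
```

With six locations and a posterior probability of 0.85, this sketch gives a Bayes factor of about 24, the same order as the value reported for the China to Germany route. Because the posterior probabilities are rounded to two decimals, the reported Bayes factors cannot be reproduced exactly this way.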
Analysis of recombination patterns of E25
To better understand the recombination patterns of E25, a total of 37 sequences representing seven genotypes (A and D-I) were used to construct phylogenetic trees based on the P1, P2, and P3 regions, together with the prototype strains of the other serotypes in the EVB group. Of these, 18 sequences were obtained in this study and 19 were downloaded from GenBank (Figure 4A, B, C). All 37 E25 strains clustered together in the P1 region but separated into different branches in the P2 and P3 regions. We defined 18 lineages to facilitate the analysis of their recombination patterns (Figure 4A, B, C, Supplementary Table S5); Genotype F could be divided into nine lineages (F1-F9), while Genotype E had one lineage (lineage E). Analysis using Simplot software revealed marked differences in nucleotide similarity between Genotype E and Genotype F, and the nine lineages of Genotype F differed substantially from one another in the P2 and P3 regions (Figure 4D). This is consistent with the phylogenetic trees constructed from the P2 and P3 regions and indicates that the recombination pattern differs between lineages. Based on the differences among these lineages, 17 recombination patterns were identified, nine of which belonged to Genotype F. For further validation, we randomly selected one strain from each of these 17 lineages as a reference sequence and performed recombination analysis with RDP4 software. The breakpoint positions differed for each of the 17 reference sequences, and the serotypes of the circulating strains that had recombined with the 17 reference strains also differed considerably (Figure 4E, Supplementary Table S6). This supports the conclusion that the recombination patterns of the 17 lineages are indeed different.
A positive selection site was detected in genotype F
Since P1 region sequences of the other genotypes were rare or absent (Supplementary Table S6), we selected Genotype F, which had a relatively large number of sequences, for analysis. Analysis of the selection pressure on Genotype F sequences worldwide showed that the average ratio of nonsynonymous to synonymous substitutions (dN/dS) in the P1 region was 0.207. One positively selected site was found at amino acid position 274 in the VP1 region, and it was identified by both the MEME and SLAC models. Moreover, the amino acid at this site differed among the reference strains of the nine lineages of Genotype F (Figure 5). Since the VP1 structural protein contains important antigenic sites, we speculate that changes at this site may increase the ability of the virus to invade the host and thereby enhance the pathogenicity and transmissibility of Genotype F in the population; further animal experiments are needed to verify this.