A Proposal for the Classication and Nomenclature of Hand, Foot and Mouth Disease-Related Enteroviruses

Background: Enterovirus has diverged into many types, some of which cause hand, foot and mouth disease (HFMD) in children. The predominant enterovirus types associated with HFMD are EVA71, CVA16, CVA6 and CVA10. Subtyping of these enteroviruses is crucial to HFMD surveillance. Because of lacking proper and uniform criteria and being based on partial VP1 sequences, however, current classication resulted in some confusing and conicting results. Method: We reclassied EVA71, CVA16, CVA6 and CVA10 using a combined criteria of phylogenic relationship and genetic distance. Results: Using the combined criteria, we classied EVA71 into seven genotypes of A–G, CVA16 and CVA6 into three subtypes of A-C, and CVA10 into nine subtypes/sub-subtypes of A-G, H1 and H2, and identied eight unclassied subtypes that lack genomic sequences. The mean genetic divergence was 15.5-33.8% between subtypes, 12-15% between sub-subtypes, and less than 12% within subtypes/sub-subtypes. In addition, we identied two new EVA71 inter-subtype recombinants RF01_CG and RF02_CG and demonstrated that EVA71 subtypes D and F and CVA10 subtype B experienced inter-subtype recombination events during early evolution. Conclusions: The new nomenclature proposal provides a reasonable framework proper classication

To detect potential recombination occurring in enteroviruses, bootscanning and similarity plot analyses were performed using SimPlot v.3.5.1 (21). The Kimura two-parameter substitution model with a transition/transversion ratio of 2 was selected in the analysis.

Results
In principle, a combination of phylogeny-and distance-based criteria in the analysis of genomic sequences is required for genotyping a virus (19,20,22). To de ne a subtype of enterovirus, we proposed a distance criterion of over 15% between subtypes at the genomic level and/or at least at the nearcomplete VP1 gene level (when genomic sequences are not available) together with a well-supported clade (over 75% bootstrap value support). If two or more well-supported clades are formed within the same subtypes, they can be further divided into sub-subtypes based on a cut-off of over 12% genetic distance. To reclassify the subtypes of four dominant HFMD-related enteroviruses EVA71, CAV16, CAV6 and CVA10, we preformed phylogenetic analyses of both full-length genomic sequences and nearcomplete VP1 gene sequences.

Phylogenetic classi cation of EVA71
All 922 near full-length genomic sequences of EVA71 available in GenBank were downloaded (on November, 2019). Of them, 913 with a length of more than 7000 nt were subject to sequence alignment. After removing highly homologous sequences with more than 97% sequence identity, 131 representative genomic sequences were used to construct maximum likelihood (ML) tree. The ML tree of the genomic sequences showed that the vast majority of EVA71 strains were clustered within three well-supported large clades (with 100% bootstrap value), and the others formed small clusters or independent phylogenetic branches (Fig. 1A). Together with the genetic distance data (Table 1), EVA71 was classi ed into seven subtypes, and designated as genotypes A to G (Fig. 1A), in keeping with previous nomenclature order of EVA71 (Fig. 1A). The three large clades correspond to subtypes B, C and G. Among the seven subtypes, subtypes E had only one available full-length genomic sequence. Interestingly, one strain (DL71/CHN/2012) that was clustered between subtypes C and B, was not de ned as an independent subtype because it had about 13.4% genetic distances with subtypes C, and was highly suspected to be an inter-subtype recombinant. Whether complete VP1 sequences generated consistent phylogeny of EVA71 with the full-length genomic sequences and met the distance criterion of subtyping is critical to determine the accuracy of VP1 sequence-based classi cation, albeit it was previously widely used. To assess the performance of VP1 sequence-based classi cation, complete VP1 sequences from 131 full-length genomic sequences were subjected to further phylogenetic analysis, together with 31 additional near-complete VP1 sequences retrieved from GenBank. According to phylogeny-based and genetic distance-based criteria, ve EVA71 subtypes A-D and G could also be classi ed based on complete VP1 sequences ( Fig. 1B and Table 1). Apart from ve identi ed subtypes, three well-supported small clusters, as well as a single branch at the root of the tree were observed in the VP1 tree (Fig. 1B), and met the genetic distance criterion of being new subtypes ( Table 1). Because of the lack of available full-length genomic sequences, they were de ned as unclassi ed subtypes 1-4. Their nomenclatures need to be determined in the future.
It is worth noting that the strains of subtypes D and F did not form independent subtype clades in the VP1 tree, but were clustered within the clade of subtype G (Fig. 1B). The inconsistency of phylogeny between full-length genomic and near-complete VP1 sequences suggests that recombination event might have occurred during the evolution of EVA71 subtypes D and F. Bootscan analyses con rmed that subtype D was involved in recombination between subtypes G and C, and subtype F was involved in recombination among subtypes C, D and G (supplementary Fig. S1A and B). In particular, the proportion of permuted trees appeared to be relative low (less than 50%) in the majority of the genomic regions of both subtypes D and F, implying that the recombination events occurred in the distant past. The suspected recombinant DL71/CHN/2012 was found to belong to subtype G in the VP1 region. Bootscan analyses illuminated that DL71/CHN/2012 originated recombination between subtypes G and C, and had a mosaic genome structure of G-C-G-C-G-C-G ( Fig. 2A). The recombination pattern was further con rmed by separate phylogenetic analyses ( Supplementary Fig. S2). Furthermore, a subtype G strain VR1432/China/2009 was found to be clustered between subtypes C and G in the VP1 tree, suggesting the presence of recombination between subtypes C and G at least in VP1 region (Fig. 1). Bootscan and separate phylogenetic analyses con rmed that VR1432/China/2009 was a recombinant between subtypes C and G with a mosaic genome structure of C-G-C-G-C-G ( Fig. 2B and Supplementary Fig. S3). The recombinant strains DL71/CHN/2012 and VR1432/China/2009 were de ned as EVA71 recombinant form RF01_CG and RF02_CG, respectively.
On the basis of full-length genomic and/or near-complete VP1 sequences, EVA71 was classi ed into seven subtypes from A to G, as well as two recombinant forms RF01_CG and RF02_CG. Furthermore, four unclassi ed subtypes 1-4 were identi ed based on near-complete VP1 sequences. The mean intersubtype genetic distances ranged from 18-24.2% at the genome level and 13.5-33.2% at VP1 gene level, and the mean within-subtype distance ranged from 6-11% at the genome level and 1-11% at VP1 gene level (Table 1).

Phylogenetic Classi cation Of Cva16
All 142 available near full-length genomic sequences of CVA16 were retrieved from GenBank (on November, 2019). Of them, 138 had a genomic sequence of more than 7000 nt, and were used in sequence alignment. After the removal of 36 identical sequences, 102 representative genomic sequences were used in the phylogenetic analysis. Based on the phylogenetic relationship and genetic distance criterion ( Fig. 3A and Table 2), CAV16 was classi ed into A, B, and C subtypes. Although subtype C was de ned, it had only one available genomic sequence each, and was called as single genomic sequence subtype. To nd more support for the classi cation of single genomic sequence subtype, and to identify more potential subtypes, a phylogenetic analysis using 102 VP1 sequences of the representative CVA16 strains and 19 additional VP1 sequences was performed. The additional VP1 sequences were selected if they closely clustered with any single genomic sequence subtype to form a well-supported clade, or they were unable to cluster with other cluster within any de ned subtypes (A-C) in a preliminary phylogenetic analysis with all available near-complete VP1 sequences (data not shown). In the VP1 tree, six additional VP1 sequences were found to form a well-supported clade together with the representative strain of subtype C with a 100% bootstrap value (Fig. 3B). The subtype C clade completely met the genetic distance criterion of subtyping (Table 2), supporting the classi cation of subtype C. Furthermore, another ve additional sequences formed an independent clade with 87% bootstrap value between the clades of subtypes B and C (Fig. 3B). The genetic distance analysis showed that the new clade had a mean distance of 12.2% to the subtype B strains ( Table 2). Because of the lack of genomic sequences, the new clade was marked as unclassi ed (Fig. 3). As a result, three subtypes A, B, and C were identi ed for CVA16. In addition, a potential subtype was identi ed based on the near-complete VP1 analysis. The subtype B had mean genetic distances of 29.4% and 16.3% to subtypes A and C, and the unclassi ed subtype had mean genetic distances of 29.8%, 0.122 and 16.4% to subtypes A, B and C, respectively ( Table 2). According to the classi cation of CVA16, the vast majority of the circulating strains belonged to subtype B.

Phylogenetic Classi cation Of Cva6
A total of 230 near full-length genomic sequences of CVA6 were available in GenBank (on November, 2019), and 229 of them had a genomic sequence of more than 7000 nt. After the removal of 160 identical sequences, 69 full-length genomic sequences were used in the phylogenetic analysis. Simultaneously, the VP1 sequences of the 69 representative strains were also subjected to the phylogenetic analysis together with 17 additional VP1 sequences that might support the subtype with single genomic sequence or form potential new subtypes. Based on the phylogenetic relationship and genetic distance criterion (Fig. 4A  and Supplementary Table 3), CAV6 was classi ed into three subtypes (A, B and C) at genomic sequence level. Of note, two independent small clusters formed by 5 strains with more than 99% bootstrap value support, and an independent branch by a strain (NIV43883/India/2004) were observed in the VP1 tree (Fig. 4B). These new clusters and branch were dispatched to unclassi ed subtypes 1, 2 and 3 ( Fig. 4B and Table 3). Therefore, CVA6 was classi ed into three subtypes A-C. The mean inter-subtype genetic distances were all over 15% regardless of whether being analyzed at genomic or VP1 sequence levels (Table 3).

Phylogenetic Classi cation Of Cva10
A total of 127 near full-length genomic sequences of CAV10 were downloaded from GenBank (on November, 2019), and 124 that had a genomic sequence of more than 6800 nt were used. After the removal of 22 identical sequences, 102 representative genomic sequences were used in the phylogenetic analysis. Nine subtypes/sub-subtypes (A-G and H1 and H2) were identi ed for CVA10 based on the phylogenetic and genetic distance analyses of genomic sequences ( Fig. 5A and Table 4). There was only one representative genomic sequence available for subtypes A-B, and D-G. To con rm the classi cation of CVA10, phylogenetic analyses of VP1 sequences were performed using those from the representative strains and 19 additional VP1 sequences retrieved from GenBank. The classi cation of CVA10 was well supported by the VP1 analyses regardless of using either phylogeny or genetic distances ( Fig. 5B and Table 4). In particular, additional representative VP1 sequences were found to support the de nitions of single genomic sequence subtypes A, B, G and F. According to the classi cation of CVA10, the vast majority of the circulating strains belonged to subtypes C and H (H1 and H2). The data obtained from full-length and near-complete VP1 sequences are shown in lower left and top right quarters, respectively. NA: not available. It means that there are only one, no sequence, or more than two completely identical sequences.
Interestingly, we found an inconsistent location of the clade of subtype B between the ML trees of genomic sequence and VP1 sequence (Fig. 5). The subtype B clade was located at the root of the ML tree of genomic sequences, but it was located at middle, and closely clustered with the clade of subtype H (including H1 and H2) in the VP1 sequence tree. This result implies that recombination events between subtype B and other subtypes have occurred during the evolution of subtype B strains. To con rm this hypothesis, we performed bootscan and phylogenetic analyses. The bootscan analysis showed that several genomic segments from subtypes H and F split the backbone of subtype A into seven segments (supplementary Fig. S1C), which was further con rmed by separate phylogenetic analyses (data not shown). Because subtype B was genetically closely related to subtype A, the subtype A strain used in the bootscan analysis can re ect the early strain of subtype B. Therefore, the bootscan analysis indicated that several subtypes H, F and C-related genomic segmented were inserted into the genomic backbone of subtype B by recombination during the evolution of subtype B.

Comparison Of The New Nomenclature With The Old Ones
According to the new classi cation, EVA71, CVA16, CVA6 and CVA10 were divided into 7, 3, 3 and 9 subtypes/sub-subtypes in alphabetical order, as well as 4, 1, 3 and 0 unclassi ed potential subtypes, respectively ( Table 5). The major difference between the new and old nomenclatures of EVA71 was that previous C1-C3 and C5 were rede ned as subtypes F and G, and C2 to subtype D. The dominant subsubtype C4 was rede ned as subtype C. Previous subtypes D, F and G were de ned as unclassi ed subtypes because of the lack of genomic sequence. For CVA16, the changes were that previous C and B1 strains were merged into new sub-subtype B, and previous subtype D was rede ned as subtype C. Previous nomenclature of CVA10 and CVA6 based on partial VP1 sequences seemed to be relatively confusing, and contained some misclassi cations. The corresponding relationship of new and old nomenclatures of both enteroviruses are detailed in Table 5. For the ease of use of the new nomenclature system, GenBank accession numbers of reference strains and sequence alignment for each enterovirus are provided in Table 5 and supplementary le, respectively.
A large number of molecular epidemiological investigations have suggested that EVA71, CVA16, CVA6 and CVA10 were the predominant enteroviruses responsible for HFMD in China and other countries (8).
Compared with previous nomenclatures of the four enteroviruses, the current version are simple and clear.
One of the major features between previous and current nomenclatures was that multiple subtypes in previous literatures were often merged into a single subtype according to the phylogeny-and distancebased criteria. For example, previous EVA71 subtypes C1-C3 and C5 were reassigned as subtype G, some strains of CVA6 subtypes A-B, E-G, and C-E were reassigned as subtypes A, C and unclassi ed 3, respectively, and CVA10 subtypes A/F, C-D, C/D/G, C/B/F and C/D were reassigned as A, B, C, F and H1, respectively. Another feature was that the same subtypes in previous literatures could be divided into multiple different subtypes. For example, previous EVA71 subtype C strains were re-divided into subtypes C, D, F and G, CVA6 subtype A strains were re-divided into subtypes A, and unclassi ed 1 and 2, and CVA10 subtype C strains were re-divided into subtypes B, C, F, G and H1. According the new nomenclatures of the four enteroviruses, the predominant circulating strains of the four enteroviruses were EVA71 subtype C, CVA16 subtype B, CVA6 subtype C and CVA10 subtypes C and H.
Three near full-length genomic sequences from epidemiologically unlinked individuals are preferred to de ne a new subtype of a virus (22). Because of the di culty of obtaining full-length genomic sequences, near-complete, especially partial VP1 sequences were often used to de ne the subtypes of enteroviruses identi ed two EVA71 recombinants (RF01_CG and RF02_CG), which originated from recombination between subtypes C and G.
On the other hand, inconsistent phylogenetic topology to full-length genomic sequences was often observed when too short VP1 sequences (less than 450 nt) were used. The vast majority of enterovirus VP1 sequences available in GenBank had length of less than 450 nt, and these short VP1 sequences were often used to de ne or determine subtypes (9,25). Because the short VP1 sequences carried too few informative sites to de ne or distinguish a subtype, they often resulted in misclassi cation of enteroviruses, which explained why there were much inconsistency between the current and previous nomenclatures of four dominant enteroviruses. On basis of our results, near-complete VP1 sequences with length of more than 870 nt are recommend to determine the subtypes of enteroviruses.
A cut-off of 25% genetic divergence in the complete VP1 region was previously suggested to distinguish or de ne novel types of enteroviruses (4,6). Subtypes and sub-subtypes are the distinct clades seen in the phylogenies of certain of enterovirus types. We found that the mean genetic distance in the complete VP1 region was 12.2-33.8% between subtypes (including unclassi ed subtypes), 12-15% between subsubtypes, and less than 12% within subtype/sub-subtypes. Inter-subtype genetic distance higher than 25% were commonly observed between subtype A and other subtypes in CVA16 and CVA10, and subtype C and other subtypes in CVA6. These indicate that enteroviruses are more divergent than previously thought, implying that there might have some new potential subtypes not yet discovered.
There are two limitations in our current study. First, some newly de ned subtypes had less than three representative full-length genomics sequences, which did not meet perfectly the rigorous nomenclature criterion to designate a new subtype/sub-subtype. In particular, 13 subtypes of the four dominant enteroviruses had only one available genomic sequence, albeit some of them were further supported by additional complete VP1 sequences. Therefore, these subtypes should be further con rmed in future as more genomic sequences have become available. Furthermore, there is an increasing need of obtaining genomic sequences to con rm these unclassi ed subtypes that lack supportive genomic sequences.
Besides the four HFMD-related enteroviruses, as well as CVB3 that was recently classi ed by us (20), all other enteroviruses need to be classi ed under the same criteria in the future.

Conclusion
We provided a new framework for the classi cation of enteroviruses on the basis of phylogeny-and distance-based criteria. Four dominant HFMD-related EVA71, CVA16, CVA6 and CVA10 were re-classi ed into 7(A-G), 3(A-C), 3(A-C), and 9(A-G, H1 and H2) subtypes/sub-subtypes, respectively. Furthermore, 2 recombinant forms RF01_CG and RF02_CG) of EVA71 and 8 unclassi ed subtypes of EVA71, CVA16 and CVA6 were identi ed. The clear and consistent genetic classi cation proposal for enteroviruses will be useful for epidemiological surveillance of HFMD, disease management, and vaccine development. The last number "1" indicates that there was only one unique genomic sequence.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. idpsupplementarymaterials.docx