Phylogenetic and diversity analyses revealed that leek yellow stripe virus population consists of two types: S and N


 Leek yellow stripe virus (LYSV, Potyvirus), a pathogen affecting Allium spp. worldwide, has been suspected to consist of two types: S and N, based on genetic and host species differences. In this study, phylogenetic and evolutionary analyses were performed to P1 and CP regions of genome of global LYSV isolates to clarify the variation among members of S-type and N-type. Constructed phylogenetic trees clearly divided isolates into S-type and N-type, with N-type was further split into L and N groups, according to hosts. Significant nucleotide (nt) and amino acids (aa) sequence variation were observed on full ORF, P1, HC-Pro, P3, VPg coding regions. The dN/dS values of P1 and CP confirmed that both genes are under strong negative selection pressure. Neutrality tests on Eastern Asian isolates argued that ancestor of current LYSV isolates may had evolved with garlic while they were in Asia before spread to other world regions through garlic propagative materials. The genetic differentiation and gene flow analysis showed that there was very frequent gene flow from S-type to L and N groups and these phylogroups differentiated from each other over time. Host differences, substantial nt and aa variation, inconsistent serological test results, and phylogenetic and diversity analyses results highly suggested that LYSV can be separated into two types: S and N.

Forty-six isolates with complete and 40 isolates with partial nt sequences of P1 (for a total of 86 isolates) and 121 isolates with complete nt sequences of CP were obtained from NCBI GenBank (Online Resource 1, Table 1). Each of P1 and CP sequences was aligned together using ClustalW with default parameters in MEGA X v.10.2.4 (megasoftware.net). Recombination signal on each of P1 and CP sequences alignment was scanned by Recombination Detection Program (RDP v.4.56). Recombination event should be supported by at least ve of RDP, GENECONV, BootScan, MaxChi, ChiMaera, SiScan, and 3Seq algorithms with Bonferroni-corrected P value of < 0.05 to be considered credible [9]. Only results of recombination analysis on CP gene sequences were reported since analysis on the highly variable P1 gene sequences did not give reliable result.
Construction of phylogenetic trees based on each of P1 and CP nt sequences were carried out in MEGA X software using Maximum Likelihood statistical method, with uniform rates among sites and use all sites treatment of gaps/missing data [23]. Kimura-2 parameter was applied for each data set, with 1000 bootstrap replicates to test statistically signi cance of isolate clusters [24]. Estimations of nucleotide (nt) and amino acid (aa) similarity percentages among phylogroups in different genome regions and full ORF were done by Sequence Demarcation Tool (SDT) v1.2 software [25].
The mean evolutionary distance of LYSV of P1 and CP genes within and between phylogroups were calculated using Kimura-2 parameter [24] in MEGA X software, estimates of standard error (SE) obtained by a bootstrap procedure (1000 replicates).
Differences in each of P1 and CP gene aa sequence were examined using BioEdit v.7.2.5 [26]. Forty-six isolates which have complete sequence were aligned together for P1 gene analysis. In CP gene alignment, up to three isolates were selected as representative of their respective group or subgroup.
Therefore, recombinant-free isolates were analyzed in this paper to learn more about molecular diversities and evolution of LYSV while also presenting evidences that its population consist of two distinct types.
Recombinant signals in CP gene of six isolates were detected by at least ve statistical methods which supported by Bonferroni-corrected P value of < 0.05. Recombinant events in isolate no. JX429967 occurred in two different regions (Online Resource 1, Table 2). These results presented the rst detection of recombinant events on LYSV genome. Isolates with multiple parents were omitted from further analyses, to give more accurate phylogenetic analysis.
The both constructed phylogenetic trees clustered all isolates into three major groups (S, L, and N). Members of S group belong to S-type whereas members of L and N groups belong to N-type ( Figure 1). P1 gene based phylogenetic trees by [8] also clustered isolates into two major lineages (S-type and Ntype). S-type was further split into S and O groups, while N-type into N, U, Sp and L groups. However, this phylogrouping was not used widely since P1 is not commonly targeted in molecular detection of LYSV, and the high variations in P1 would potentially generate new groups every time new isolates were characterized. So, since P1 gene based phylogenetic tree of our study clustered isolates in similar manners, a new phylogrouping was proposed in which it simply clustered isolates into S-type and Ntypes. N-type was further split according to host differences into N (garlic isolates) and L (leek isolates) groups. Since this new P1 phylogrouping is also applicable in CP (each of tested isolates was in identical group in both P1 and CP trees), future LYSV isolates phylogrouping should be based on CP, which is more conserved and much more frequently sequenced in LYSV characterization works.
S-type of CP phylogenetic tree was further divided into 11 subgroups (S1-S11) based on nt changes and deletions in compared sequences, but these subgroups cannot be applied in S-type of P1 since members of different CP subgroups formed same branch in the P1 tree ( Figure 1). S1 isolates experienced deletion in position 121-123 of their nt sequences, resulted in a single residue deletion in position 41 of their aa sequences. S7 and S9 isolates in position 124-126 of their nt sequences, resulted in deletion in position 42 of their aa sequences. S11 isolates in position 100-108 of their nt sequences, resulted in three residue deletions in position 44-46 of their aa sequences. DAG motif that is important in potyviruses transmission by aphids was conserved in CP of all tested isolates except no. AJ409307, which changed into DSG (Online Resource 1, Figure 1). Major aa changes in P1 gene happened throughout its coding sequence. Some isolates experienced large nt deletions which resulted in high aa length variations among compared isolates (Online Resource 1, Figure 2). Identity analysis revealed very large variation in nt and aa sequences of P1 gene even among S-type isolates. In CP comparison, nt and aa identity percentages among S-type isolates were shown to be lower than that of among N-type isolates. S-type isolates also had high nt and aa differences to N-type isolates. In 115 complete CP sequences comparison, the lowest nt similarity percentage among S-type and N-type isolates (76.6%) was very close to, and the lowest aa similarity percentages (75.4%) was well below than that of proposed for Potyvirus species demarcation: 76% and 80% for nt and aa, respectively. The further comparisons on 46 complete ORF sequences determined that nt similarity percentages among S-type and N-type isolates in P1, HC-Pro, P3, VPg coding regions, and full ORF were also lower than limits for Potyvirus species demarcation [7] (Table 1).
Besides that, Immunocapture-Reverse Transcriptase-Polymerase Chain Reaction (IC-RT-PCR) and standard RT-PCR has been shown to be 10 2 -10 4 times more sensitive than Double-Antibody Sandwich-Enzyme-Linked Immunosorbent Assay (DAS-ELISA) in LYSV detection [32]. [19] reported that DAS-ELISA results for detection of isolates no. GQ475411-13 (N-type) were erratic and unreliable, while DAS-ELISA results for onion yellow dwarf virus (OYDV, Potyvirus) and garlic common latent virus (GarCLV, Carlavirus) in the same study were consistent and reliable. DAS-ELISA was also suggested to be less sensitive than RT-PCR in detection of isolates no. MN070130-31, MN864794-95 (N-type) [4]. These phenomena could be caused by the antibodies for those DAS-ELISAs were developed from CP of S-type isolates which have large sequence variations to N-type isolates.
The overall evolutionary distance value of P1 was 0.468±0.022; and the main within lineages values were 0.411±0.020 (S-type) and 0.148±0.010 (N-type). Also, the value between phylogroups was 0.604±0.032 (S-type/N-type). The overall mean evolutionary distance value of CP was 0.203±0.011; and the values within main lineages were 0.190±0.010 (S-type) and 0.114±0.008 (N-type). In addition, the evolutionary distance between the main phylogroups was 0.237±0.013 (S-type/N-type).
Within all isolates, the mean evolutionary distance value of P1 was signi cantly higher than that of CP, which may indicate that P1 gene region plays active roles in the formation of new types/variants through host selection and environmental adaptation in evolutionary processes. In addition, the evolutionary distance values obtained for both gene regions are particularly high within S-type lineage. The distance value of S-type from N-type was also high, suggesting large genetic variation between the two types. The high evolutionary distance values of S-type from N-type may be related to host plant selection or geographic adaptations, and these results may con rm the emergence of phylogroups over time.
In general, π values were higher among phylogroups and continental populations in CP gene region compared to the values obtained from P1 gene region. In P1, the highest π were 0.29896 (S-type) and 0.36447 (Africa) among phylogroups and continents populations, respectively. The lowest π were 0.02358 (L group) and 0.24684 (America). However, the low π of L population may be due to this phylogroup having only two isolates. Hd values which provide important information for predicting evolutionary processes, reached 1.000 for all phylogroups and continental populations. S, k, and h values were all obtained in descending order for S-type, N-type, N group, and L group. Meanwhile, the values obtained for those parameters, except k, among different continents were East Asia, America, Africa, Europe, and Oceania populations in descending order (Online Resource 1, Table 3). These results may indicate that there are high genetic diversities among global LYSV populations.
In analysis for CP gene, π ranged from 0.05075 (N group) to 0.17521 (Oceania). Although the number of isolates in N group was large, its π was very low. The highest π for phylogroup and continental populations were 0.16286 (S-type) and 0.17521 (Oceania), respectively. For Hd metric, high values varying between 0.0972 and 1.000 were obtained for all phylogroups and continental populations (Online Resource 1, Table 3).
The obtained ω values were all < 1, showing that P1 and CP genes were both under negative (purifying) selection pressures. The dN/dS ratios between phylogenetic groups and continental populations ranged from 0.28332 to 0.49867 for P1 gene, and between 0.09605 and 0.16526 for CP gene (Online Resource 1, Table 3). The evolutionary rates of P1 and CP genes were estimated to be 0.42 and 0.15, respectively, which were similar to that of obtained by [5]: 0.49 and 0.19, respectively. The results con rmed that both genes are under strong negative (purifying) selection pressure, with CP gene is under stronger pressure compared to P1 gene.  Table 4). These positive with statically signi cant results indicated that they emerged from an excess of intermediate frequency alleles, and can evolve out of population balancing selection, bottlenecks, or structures.

S-type which consists largely of isolates from East Asian countries, and if it represents strained isolates, may have been endemic in East Asia and infected different hosts at different times in their life cycle.
Although S-type isolates showed host selection, so far, onion was not found to be a host for them, as they were detected mostly in garlic and some (no. DQ925452-53) in leeks (Online Resource 1, Table 1). It is assumed that the genetic center of garlic was in Central Asia, and then spread to two major destinations: China and India, and Europe and Russia [33]. In our study, neutrality tests performed on Eastern Asian populations (China, Japan, South Korea, Taiwan, and Vietnam) gave positive and statistically signi cant results. Ancestor of current LYSV isolates may have evolved with garlic while they were in Asia, and then spread to different continents of the world through garlic propagation materials, as also observed in a study by [8]. The extensive adaptation of S-type isolates to garlic as a host was also con rmed by phylogenetic analysis of our study.
Gene ow and genetic differentiation analysis of LYSV populations revealed that all phylogroups (S-type, L group, N group, and N-type) in both P1 and CP genes had statistically signi cant K S *, K ST * and Z* values. K ST * results indicated statistically signi cant genetic differences between lineages for both P1 and CP genes, and all of these values were greater than 0. In addition, K ST * values for both genes differed in continental populations. Statistically signi cant values were obtained between some continents, while some others were not signi cant, and some values were less than 0 (Online Resource 1, Table 5).
Among LYSV populations based on P1 and CP genes, signi cant Z* values were calculated between all main lineages (S-type, L group, N group, N-type) populations. Furthermore, the Z* values calculated for CP gene were signi cantly higher than P1 gene. In particular, Z* value of S-type was statistically signi cant and higher than other lineages, for both gene regions. Statistically signi cant results were obtained from Snn values for all phylogroups in the two genes, and these values mostly were 1 or close to 1. F ST ( xation index) values between main lineages for both gene regions were greater than 0.25 (Online Resource 1, Table 5).
The genetic differentiation and gene ow analysis performed between main nodes (S-type, L group, and N group) results may reveal that phylogroup populations for P1 and CP genes are genetically different from each other and also show that the reliability rates of dendograms created for both genes were high. They also showed that there was very frequent gene ow from S-type to L and N groups (N-type), and these phylogroups differentiated from each other over time. Therefore, these analyses may support conclusion by [8] that the sequences missing in P1 of S-type were present in the common ancestor of S-type and Ntype.
Host differences, substantial nt and aa variation, inconsistent serological test results, and phylogenetic and diversity analyses results strongly indicated that LYSV should be differentiated into two types: S and N. Recognizing these two types, and other new variants that might emerge among LYSV isolates, could lead to development of more accurate diagnostic tools which in turn could help to control the virus epidemiologically.