Among viruses affecting Allium spp. cultivation, leek yellow stripe virus (LYSV, Potyvirus) is of particular interest since it has been identified worldwide as an important virus on leek (Allium ampeloprasum L.), garlic (Allium sativum L.), and onion (Allium cepa L.) [1-4]. The virus genome sequence is ca. 10,131 nucleotide (nt) long which, typical to Potyvirus, contain a single open reading frame (ORF) of ca. 9,456 nt encoding a polyprotein of ca. 3,152 amino acid (aa) . P1 protein as the most variable among potyvirus-encoding proteins, is important in evolutionary study of Potyvirus species . CP is regarded as the most conserved region in Potyvirus genome and commonly used in phylogenetic studies of its species .
A small evolutionary study involving 12 fully sequenced isolates found that selection pressures on P1, CP and P3 were higher than other coding regions of LYSV genome . Still, very little is known about the virus genetic diversity. Therefore, older and latest addition of nt sequences from diverse regions of world and different Allium spp. into NCBI GenBank were analyzed in this paper to get the first detailed insight into global genomic diversity, evolution, and population structure of LYSV. The analyses on P1 and CP genes largely followed  study which clustered isolates into two major lineages (S-type and N-type). Similarity percentages on other genomic regions were also investigated here to determine wheather S-type and N-type isolates could also be differentiated based on genetic and host species variation.
Nucleotide sequence of P1 gene is 1182 nt long (encoding a protein of 362 aa), covering position 91-1086, while CP gene is 864 nt long (encoding a protein of 288 aa), covering position 8683-9546 in LYSV genome; reference isolate: accession no. AJ307057. However, large variation in nt and aa lengths exist in some tested isolates.
Forty-six isolates with complete and 40 isolates with partial nt sequences of P1 (for a total of 86 isolates) and 121 isolates with complete nt sequences of CP were obtained from NCBI GenBank (Online Resource 1, Table 1). Each of P1 and CP sequences was aligned together using ClustalW with default parameters in MEGA X v.10.2.4 (megasoftware.net). Recombination signal on each of P1 and CP sequences alignment was scanned by Recombination Detection Program (RDP v.4.56). Recombination event should be supported by at least five of RDP, GENECONV, BootScan, MaxChi, ChiMaera, SiScan, and 3Seq algorithms with Bonferroni-corrected P value of < 0.05 to be considered credible . Only results of recombination analysis on CP gene sequences were reported since analysis on the highly variable P1 gene sequences did not give reliable result.
Construction of phylogenetic trees based on each of P1 and CP nt sequences were carried out in MEGA X software using Maximum Likelihood statistical method, with uniform rates among sites and use all sites treatment of gaps/missing data . Kimura-2 parameter was applied for each data set, with 1000 bootstrap replicates to test statistically significance of isolate clusters . Estimations of nucleotide (nt) and amino acid (aa) similarity percentages among phylogroups in different genome regions and full ORF were done by Sequence Demarcation Tool (SDT) v1.2 software .
The mean evolutionary distance of LYSV of P1 and CP genes within and between phylogroups were calculated using Kimura-2 parameter  in MEGA X software, estimates of standard error (SE) obtained by a bootstrap procedure (1000 replicates).
Differences in each of P1 and CP gene aa sequence were examined using BioEdit v.7.2.5 . Forty-six isolates which have complete sequence were aligned together for P1 gene analysis. In CP gene alignment, up to three isolates were selected as representative of their respective group or subgroup.
Genetic diversity and polymorphism analysis were done using DnaSP v.6.12.03 . A gene is under positive (diversifying), neutral, and negative (purifying) selection when dN/dS (ω ratio) is > 1, = 1 and < 1, respectively . The three ‘Neutrality tests’ (Fu and Li’s F*, Fu and Li’s D*, and Tajima’s D) were carried out by DnaSP v.6.12.03 (default window length of 100 sites and step size of 25 sites; without out-group) . KS*, KST*, Z*, Snn and FST values [28-29] which determined genetic differentiation among each of P1 and CP populations were calculated by DnaSP v.6.12.03 . KST* will be near zero if there is no genetic differentiation (null hypothesis) . The smaller Z* means smaller genetic differentiation among population . The value of Snn describes a range of exact same population (value of 0.5) (null hypothesis) to distinctly differentiated population (value of 1) . The null hypothesis in KS*, KST*, Z*, and Snn is rejected by a significant P value [28-30]. FST ranges between exact same population (value of 0) to fully distinct populations (value of 1) [28, 30]. FST > 0.25, in most cases, is indicating a large gene flow and big genetic differentiation in the tested populations .
Studies on LYSV genetic variation mainly were limited on construction of phylogenetic trees [4, 10, 17, 19]. Moreover, hosts difference and large genetic variances were observed among its phylogroups [2, 8]. Therefore, recombinant-free isolates were analyzed in this paper to learn more about molecular diversities and evolution of LYSV while also presenting evidences that its population consist of two distinct types.
Recombinant signals in CP gene of six isolates were detected by at least five statistical methods which supported by Bonferroni-corrected P value of < 0.05. Recombinant events in isolate no. JX429967 occurred in two different regions (Online Resource 1, Table 2). These results presented the first detection of recombinant events on LYSV genome. Isolates with multiple parents were omitted from further analyses, to give more accurate phylogenetic analysis.
The both constructed phylogenetic trees clustered all isolates into three major groups (S, L, and N). Members of S group belong to S-type whereas members of L and N groups belong to N-type (Figure 1). P1 gene based phylogenetic trees by  also clustered isolates into two major lineages (S-type and N-type). S-type was further split into S and O groups, while N-type into N, U, Sp and L groups. However, this phylogrouping was not used widely since P1 is not commonly targeted in molecular detection of LYSV, and the high variations in P1 would potentially generate new groups every time new isolates were characterized. So, since P1 gene based phylogenetic tree of our study clustered isolates in similar manners, a new phylogrouping was proposed in which it simply clustered isolates into S-type and N-types. N-type was further split according to host differences into N (garlic isolates) and L (leek isolates) groups. Since this new P1 phylogrouping is also applicable in CP (each of tested isolates was in identical group in both P1 and CP trees), future LYSV isolates phylogrouping should be based on CP, which is more conserved and much more frequently sequenced in LYSV characterization works.
S-type of CP phylogenetic tree was further divided into 11 subgroups (S1-S11) based on nt changes and deletions in compared sequences, but these subgroups cannot be applied in S-type of P1 since members of different CP subgroups formed same branch in the P1 tree (Figure 1). S1 isolates experienced deletion in position 121-123 of their nt sequences, resulted in a single residue deletion in position 41 of their aa sequences. S7 and S9 isolates in position 124-126 of their nt sequences, resulted in deletion in position 42 of their aa sequences. S11 isolates in position 100-108 of their nt sequences, resulted in three residue deletions in position 44-46 of their aa sequences. DAG motif that is important in potyviruses transmission by aphids was conserved in CP of all tested isolates except no. AJ409307, which changed into DSG (Online Resource 1, Figure 1). Major aa changes in P1 gene happened throughout its coding sequence. Some isolates experienced large nt deletions which resulted in high aa length variations among compared isolates (Online Resource 1, Figure 2).
Identity analysis revealed very large variation in nt and aa sequences of P1 gene even among S-type isolates. In CP comparison, nt and aa identity percentages among S-type isolates were shown to be lower than that of among N-type isolates. S-type isolates also had high nt and aa differences to N-type isolates. In 115 complete CP sequences comparison, the lowest nt similarity percentage among S-type and N-type isolates (76.6%) was very close to, and the lowest aa similarity percentages (75.4%) was well below than that of proposed for Potyvirus species demarcation: 76% and 80% for nt and aa, respectively. The further comparisons on 46 complete ORF sequences determined that nt similarity percentages among S-type and N-type isolates in P1, HC-Pro, P3, VPg coding regions, and full ORF were also lower than limits for Potyvirus species demarcation  (Table 1).
Besides that, Immunocapture-Reverse Transcriptase‐Polymerase Chain Reaction (IC‐RT-PCR) and standard RT‐PCR has been shown to be 102-104 times more sensitive than Double‐Antibody Sandwich‐Enzyme‐Linked Immunosorbent Assay (DAS-ELISA) in LYSV detection .  reported that DAS-ELISA results for detection of isolates no. GQ475411-13 (N-type) were erratic and unreliable, while DAS-ELISA results for onion yellow dwarf virus (OYDV, Potyvirus) and garlic common latent virus (GarCLV, Carlavirus) in the same study were consistent and reliable. DAS-ELISA was also suggested to be less sensitive than RT-PCR in detection of isolates no. MN070130-31, MN864794-95 (N-type) . These phenomena could be caused by the antibodies for those DAS-ELISAs were developed from CP of S-type isolates which have large sequence variations to N-type isolates.
The overall evolutionary distance value of P1 was 0.468±0.022; and the main within lineages values were 0.411±0.020 (S-type) and 0.148±0.010 (N-type). Also, the value between phylogroups was 0.604±0.032 (S-type/N-type). The overall mean evolutionary distance value of CP was 0.203±0.011; and the values within main lineages were 0.190±0.010 (S-type) and 0.114±0.008 (N-type). In addition, the evolutionary distance between the main phylogroups was 0.237±0.013 (S-type/N-type).
Within all isolates, the mean evolutionary distance value of P1 was significantly higher than that of CP, which may indicate that P1 gene region plays active roles in the formation of new types/variants through host selection and environmental adaptation in evolutionary processes. In addition, the evolutionary distance values obtained for both gene regions are particularly high within S-type lineage. The distance value of S-type from N-type was also high, suggesting large genetic variation between the two types. The high evolutionary distance values of S-type from N-type may be related to host plant selection or geographic adaptations, and these results may confirm the emergence of phylogroups over time.
In general, π values were higher among phylogroups and continental populations in CP gene region compared to the values obtained from P1 gene region. In P1, the highest π were 0.29896 (S-type) and 0.36447 (Africa) among phylogroups and continents populations, respectively. The lowest π were 0.02358 (L group) and 0.24684 (America). However, the low π of L population may be due to this phylogroup having only two isolates. Hd values which provide important information for predicting evolutionary processes, reached 1.000 for all phylogroups and continental populations. S, k, and h values were all obtained in descending order for S-type, N-type, N group, and L group. Meanwhile, the values obtained for those parameters, except k, among different continents were East Asia, America, Africa, Europe, and Oceania populations in descending order (Online Resource 1, Table 3). These results may indicate that there are high genetic diversities among global LYSV populations.
In analysis for CP gene, π ranged from 0.05075 (N group) to 0.17521 (Oceania). Although the number of isolates in N group was large, its π was very low. The highest π for phylogroup and continental populations were 0.16286 (S-type) and 0.17521 (Oceania), respectively. For Hd metric, high values varying between 0.0972 and 1.000 were obtained for all phylogroups and continental populations (Online Resource 1, Table 3).
The obtained ω values were all < 1, showing that P1 and CP genes were both under negative (purifying) selection pressures. The dN/dS ratios between phylogenetic groups and continental populations ranged from 0.28332 to 0.49867 for P1 gene, and between 0.09605 and 0.16526 for CP gene (Online Resource 1, Table 3). The evolutionary rates of P1 and CP genes were estimated to be 0.42 and 0.15, respectively, which were similar to that of obtained by : 0.49 and 0.19, respectively. The results confirmed that both genes are under strong negative (purifying) selection pressure, with CP gene is under stronger pressure compared to P1 gene.
Fu and Li’s F* & D* assigned statistically non-significant negative or positive values in P1 and CP of all populations. Thus, the results demonstrated that global LYSV population is undergoing balancing selection or bottlenecks. Positive values with statistical significance were only predicted by Tajima's D test for overall populations, S-type, and East Asia populations in both P1 and CP (Online Resource 1, Table 4). These positive with statically significant results indicated that they emerged from an excess of intermediate frequency alleles, and can evolve out of population balancing selection, bottlenecks, or structures.
S-type which consists largely of isolates from East Asian countries, and if it represents strained isolates, may have been endemic in East Asia and infected different hosts at different times in their life cycle. Although S-type isolates showed host selection, so far, onion was not found to be a host for them, as they were detected mostly in garlic and some (no. DQ925452-53) in leeks (Online Resource 1, Table 1). It is assumed that the genetic center of garlic was in Central Asia, and then spread to two major destinations: China and India, and Europe and Russia . In our study, neutrality tests performed on Eastern Asian populations (China, Japan, South Korea, Taiwan, and Vietnam) gave positive and statistically significant results. Ancestor of current LYSV isolates may have evolved with garlic while they were in Asia, and then spread to different continents of the world through garlic propagation materials, as also observed in a study by . The extensive adaptation of S-type isolates to garlic as a host was also confirmed by phylogenetic analysis of our study.
Gene flow and genetic differentiation analysis of LYSV populations revealed that all phylogroups (S-type, L group, N group, and N-type) in both P1 and CP genes had statistically significant KS*, KST* and Z* values. KST* results indicated statistically significant genetic differences between lineages for both P1 and CP genes, and all of these values were greater than 0. In addition, KST* values for both genes differed in continental populations. Statistically significant values were obtained between some continents, while some others were not significant, and some values were less than 0 (Online Resource 1, Table 5).
Among LYSV populations based on P1 and CP genes, significant Z* values were calculated between all main lineages (S-type, L group, N group, N-type) populations. Furthermore, the Z* values calculated for CP gene were significantly higher than P1 gene. In particular, Z* value of S-type was statistically significant and higher than other lineages, for both gene regions. Statistically significant results were obtained from Snn values for all phylogroups in the two genes, and these values mostly were 1 or close to 1. FST (fixation index) values between main lineages for both gene regions were greater than 0.25 (Online Resource 1, Table 5).
The genetic differentiation and gene flow analysis performed between main nodes (S-type, L group, and N group) results may reveal that phylogroup populations for P1 and CP genes are genetically different from each other and also show that the reliability rates of dendograms created for both genes were high. They also showed that there was very frequent gene flow from S-type to L and N groups (N-type), and these phylogroups differentiated from each other over time. Therefore, these analyses may support conclusion by  that the sequences missing in P1 of S-type were present in the common ancestor of S-type and N-type.
Host differences, substantial nt and aa variation, inconsistent serological test results, and phylogenetic and diversity analyses results strongly indicated that LYSV should be differentiated into two types: S and N. Recognizing these two types, and other new variants that might emerge among LYSV isolates, could lead to development of more accurate diagnostic tools which in turn could help to control the virus epidemiologically.