Phylogenetic and diversity analyses revealed that leek yellow stripe virus population consists of three types: S, L, and N

Phylogenetic and evolutionary analyses were performed on the P1 and CP genes of global isolates to clarify the phylogrouping of leek yellow stripe virus (LYSV, genus Potyvirus), a pathogen affecting Allium spp. worldwide, into different types based on genetic variation and host species. The constructed phylogenetic trees divided the isolates into three major groups: S, L, and N. Low nucleotide (nt) and amino acid (aa) percent identities among the three groups were observed on full ORF (75.4–99.0 and 79.1–99.0), P1 (59.1–98.3 and 36.8–98.3), and CP (76.6–100 and 75.7–100) coding regions. The dN/dS values of P1 and CP confirmed that both genes are under strong negative (purifying) selection pressure. Neutrality tests on Eastern Asian isolates suggested that the ancestors of current LYSV isolates evolved with garlic while they were in Asia before spreading to other world regions through garlic propagative materials. Genetic differentiation and gene flow analysis showed extremely frequent gene flow from S group to L and N groups, and these phylogroups differentiated from each other over time. Host differences, inconsistent serological test results, substantial nt and aa variation, and phylogenetic and diversity analyses in this study supported previous reports that LYSV can be separated into three major evolutionary lineages: S, L, and N types.

Important information on population structures, new host adaptations, and evolutionary mechanisms of RNA viruses can be understood through molecular evolution studies [9, Edited by Seung-Kook Choi.
Adyatma Irawan Santosa and Filiz Randa-Zelyüt have contributed equally to this work and share first authorship. 1 3 10]. In Potyvirus, the coat protein (CP) gene is commonly examined in phylogenetic and population studies since it is the most conserved region in the genome [11,12]. In addition to the CP gene, the P1 gene which has the most variable nt sequences among Potyvirus genes is known to contribute key data for evolutionary analysis [13,14].
The phylogeny of LYSV is divided into the following six groups according to the diversity of P1 gene: N, U, Sp, L, O, and S [15]. However, a recent report simplified this phylogeny into two major types: S and N [16]. The L group is restricted to isolates from naturally infected leeks only; however, this group is combined with the N group (isolates from garlic) into the N type due to the lack of available data on L isolate sequence [16]. Unfortunately, these phylogroups based on the P1 gene are not widely applied because molecular characterization works on LYSV mostly target the CP gene. Allium viruses could provide excellent data on the evolution of viruses spreading mainly through propagative materials; however, only a few evolutionary studies on these species have been conducted [17]. A small study involving 12 fully sequenced isolates found that selection pressures are higher on P1, CP, and P3 than on other coding regions of the LYSV genome [8]; nevertheless, this report still could not provide a complete picture of the virus evolution. In the present research, the nt sequences of P1 and CP genes of leek and garlic isolates were collected from several provinces of Turkey and other isolates from diverse regions of the world and different Allium spp. already registered in NCBI GenBank were analyzed to obtain insights into the global genomic diversity, evolution, and population structure of LYSV. The possibilities to separate the L group into a distinct type and to use the CP gene as the base for future LYSV phylogrouping were also investigated.

RT-PCR and sequencing
Molecular examination against LYSV using RT-PCR was conducted on 10 leek and 3 garlic samples collected from Aydın, İzmir, Osmaniye, Bursa, Kastamonu, Çanakkale, and Düzce provinces of Turkey in 2007-2008 to increase the number of available nt sequences of L isolates for population analysis. The samples were positive for LYSV as tested by DAS-ELISA with standard manufacturer protocols (Bioreba, Switzerland) shortly after collection. They were then kept in desiccated leaf tissue over silica gel at 4 °C for long-term storage.
Total RNA was isolated from the samples using the CTAB method [18]. Complementary DNAs (cDNAs) were synthesized from the obtained total RNAs with RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, USA). F-5ʹATC TCA ACA CAA CTT ATG CAA3ʹ plus R-5ʹGCA CAC AAC GAC CGT CCA ATCT3ʹ [8], and F-5ʹTCA CTG CAT ATG CGC ACC AT3ʹ plus R-5ʹGCA CCA TAC AGT GAA TTG AG3ʹ [19] primer pairs were applied in the 1000 bp of partial P1 and 1020 bp of partial NIb + complete CP genes amplification, respectively, using 2X Emerald Master Mix (Takara, Japan) and MJ Mini Thermal Cycler (Biorad, USA).
Successfully amplified RT-PCR products were submitted to a commercial biotech company (BM Labosis, Ankara, Turkey) for bidirectional sequencing by the Sanger method. The obtained raw sequence data were assembled in CLC Main Workbench V.20 (https:// digit alins ights. qiagen. com/) and then deposited into NCBI GenBank.

LYSV isolate selection and recombination analysis of CP gene
The nt sequence of P1 gene is 1290 nt long (encoding a protein of 430 aa) and covers positions 83-1372 nt, and that of CP gene is 867 nt long (encoding a protein of 289 aa) and covers positions 8879-9745 in the LYSV genome (RefSeq: LYSV-MG [KP258216]). However, large variations exists in nt and aa lengths due to indel mutations in some isolates.
Fourteen and thirty-five isolates with complete and partial nt sequences of P1 (for a total of forty-nine isolates), respectively, and eighty-one isolates with complete nt sequences of CP were obtained from NCBI GenBank. The sequences were aligned to novel Turkish isolates length and trimmed accordingly using ClustalW with default parameters in MEGA X [20] to create P1 and CP alignments. Recombination signals on each P1 or CP sequence alignment were scanned by Recombination Detection Program (RDP v.4.56). A credible recombination event should be supported by at least five of RDP, GENECONV, BootScan, MaxChi, ChiMaera, SiScan, and 3Seq algorithms with Bonferroni-corrected P value of < 0.05 [21].

Analysis of phylogenetic and identity percentage of P1 and CP genes
Phylogenetic trees based on each P1 and CP nt sequence were constructed in MEGA X using the maximum likelihood statistical method. The best nt substitution model for both alignments was determined using the lowest Bayesian Information Criterion scores to be Tamura-Nei [22] and Gamma distributed with Invariant sites (TN93 + G + I). The statistical significance of isolate clusters was tested with 1000 bootstrap replicates. The furcation of major lineages was visualized using the neighbor-net method implemented in Splitstree4 freeware (version 4.17.1, built 28 Jun 2021) [23]. Percentages of nt and aa identity among phylogroups in P1, CP, and full ORF were estimated using Sequence Demarcation Tool v1.2 [24].

Amino acid variation in P1 and CP
Fourteen and thirty-nine isolates with complete P1 and CP sequences, respectively, were aligned with eleven novel Turkish isolates. Differences in the aa sequence of P1 and CP were examined using BioEdit v.7.2.5 [25].

Mean evolutionary distance and diversity
The mean evolutionary distance of the P1 and CP genes of LYSV within and between the phylogroups was calculated using the Tamura-Nei parameter [22] in MEGA X. Estimates of standard error (SE) were obtained by a bootstrap procedure of 1000 replicates.

Analysis of genetic diversity and polymorphism, neutral selection, and gene flow and genetic differentiation among populations
Genetic diversity and polymorphism were analyzed using DnaSP v.6.12.03 [26]. A gene is under positive (diversifying), neutral, or negative (purifying) selection when the dN/dS (ω ratio) is > 1, = 1, or < 1, respectively [26]. Three "neutrality tests" (Fu and Li's F*, Fu and Li's D*, and Tajima's D) were carried out in DnaSP v.6.12.03 (default window length of 100 sites and step size of 25 sites; without out-group) [26]. Independent K S *, K ST *, Z*, Snn, and F ST test values [27,28], which determine genetic differentiation in each P1 or CP population, were calculated by DnaSP v.6.12.03. K ST * is near zero when genetic differentiation does not exist (null hypothesis) [29]. A small Z* indicates minimal genetic differentiation among populations [27]. Snn describes the range from exact same population (value of 0.5) (null hypothesis) to distinctly differentiated population (value of 1) [28]. The null hypothesis in K S *, K ST *, Z*, and Snn is rejected by a significant P value [27][28][29]. F ST ranges from the exact same population (value of 0) to fully distinct populations (value of 1) [27,29]. F ST (fixation index) > 0.33 usually indicates large gene flow and genetic differentiation between tested populations [30].

Recombination analysis of the nt sequences of CP gene
Only the results of the recombination analysis of CP gene sequences were reported because the analysis of the highly variable P1 gene sequences did not generate reliable results. Recombinant signals in the CP gene of four isolates were detected by at least five statistical methods and supported by Bonferroni-corrected P value of < 0.05. Recombinant events in isolate AG1 (JX429967) occurred in two different regions. Among novel Turkish isolates, LYSV-18 (OL944630) was determined to be recombinant (Table 1). These isolates with multiple parents were omitted from population analysis to produce accurate results.

Analysis of phylogenetic and identity percentage of P1 and CP genes
The constructed phylogenetic trees suggested that evolutionary processes occurred simultaneously and in the same direction throughout the LYSV genome because the P1 and CP trees consistently clustered all the isolates into three major lineages (S, L, and N). In addition, each isolate that provided the nt sequences of both genes was clustered in the identical group in both trees. In P1, the S-type lineage was further divided into two subtypes (named SP1a and SP1b) based on nt changes and deletions in the compared sequences. In CP, the S-type was separated into 10 subtypes (named SCPa-SCPj) (Fig. 1). In line with the results of recombinant analysis, LYSV-18 was positioned in S lineage in P1 and in L lineage in CP trees. Congruence between the two genes was further confirmed by SplitsTree network analysis, which trifurcated P1 and CP split trees into S, L, and N-types (Fig. 2).
Comparative analysis revealed a large variation in nt and aa sequence identities of P1, not only among S, L, and N but also within the S-type. For CP, the nt and aa identity percentages among S-type isolates were lower than those among N-type isolates. S-type isolates also had higher nt and aa differences than N-type isolates. Among the 92 complete CP sequences, the lowest nt identity percentage found among S-type and N-type isolates (76.6%) was close to the limit for Potyvirus species demarcation (76%), and the lowest aa identity percentage (75.7%) was well below this limit (80%) ( Table 2). Further comparison on 16 complete ORF sequences determined that the lowest nt identity percentages for S-type versus N-type and L-type versus N-type isolates were also lower than the limits for Potyvirus species demarcation ( Table 2).

Amino acid variation in P1 and CP
The main aa changes in P1 among three LYSV types occurred throughout the coding sequence. SP1a isolates experienced large nt deletion at positions 320-523 (Ref-Seq: KP258216), resulting in the deletion of 68 aa residues (Supplementary material 2- Fig. 1). The nt and aa sequences of SP1b, SP1c, L, and S isolates in this position remained intact, but SP1b and SP1c had higher nt and aa identities with SP1a than L and S. In addition, L and N isolates experienced nt deletion at positions 128-142, resulting in the deletion of five aa residues. Meanwhile, the sequences of SP1a, SP1b, and SP1c in this region were complete. Therefore, SP1b and SP1c isolates could be an intermediate form in the evolution of the P1 gene of three LYSV types.
The aa changes in CP among three LYSV types mostly occurred on the 5ʹ end half of the coding sequence, and the remaining half toward the 3ʹ end was highly conserved. SCPa, SCPe, and SCPh isolates experienced deletion at positions 9005-9007 nt (RefSeq: KP258216), resulting in single aa residue deletion at position 43 of the CP sequence. Deletion was also observed at positions 8987-8995 of the nt sequences of SCPj isolates, resulting in three aa residue deletions at positions 37-39 (Supplementary material 2- Fig. 2). DAG motif, which is important in Potyvirus transmission by aphids, was conserved in the CP of all tested isolates except for no. AJ409307, in which DAG was changed to DSG (Supplementary material 2- Fig. 2).

Genetic diversity and evolutionary selection pressure on the P1 and CP genes of LYSV
In general, π values were lower among the phylogroups and continental populations in the CP gene region compared with those in the P1 gene region. In P1, the highest π among the phylogroups and continent populations was 0.2883 (S-type) and 0.3614 (Africa), respectively, and the lowest was 0.0905 (N type) and 0.2684 (America). H d values, which provide important information for predicting evolutionary processes,  (Table 3). These results indicated high genetic diversities among global LYSV populations, with Asia as the diversity center.
In the analysis of CP gene, π ranged from 0.0446 (N type) to 0.1706 (East Asia). The highest π for phylogroup and continental populations was 0.1671 (S-type) and 0.1706 (East Asia), respectively. Although the number of isolates in the N group was quite large, the S, η, k, and π values were extremely low, suggesting scarce variation among isolates. For H d , high values varying between 0.0998 and 1.000 were obtained for all phylogroups and continental populations ( Table 3).

3
The demographic parameter π value for ORF was 0.1833 for S-type and 0.0699 for N type. Furthermore, the dN/dS ratios between phylogenetic groups and continental populations ranged from 0.3019 to 0.4358 for the P1 gene and from 0.0884 to 0.1768 for the CP gene. In addition, the ω of ORF was 0.1237 (Table 3). All dN/dS results showed that the CP, P1, and ORF regions were under highly negative selection pressure.

Gene flow and genetic differentiation among populations
Analysis of the gene flow and genetic differentiation of LYSV populations revealed that all phylogroups (S, L, and N types) in P1, CP, and ORF comparisons had K S * and K ST * values that were statistically significant and greater than 0. The K S * and K ST * results indicated statistically significant genetic differences between lineages at P1, CP, and ORF  (Table 5). These findings confirmed the existence of phylogenetic lineages and that S, L, and N types differ from each other and have unique genetic characteristics. Among the LYSV lineage populations based on P1, CP, and ORF, significant Z* values were calculated between all main lineages (S, L, and N types). Furthermore, the Z* value of the S-type for both genes was statistically significant and higher than that of other lineages. F ST values between main lineages for all gene regions were greater than 0.33 (Table 5). These results indicated that the virus may be transmitted and spread by plant materials between geographic regions.

Discussion
Although early molecular studies on LYSV revealed that leek isolates form a distinct L group (from leek) [15,31], the genomic data of leek isolates surprisingly remained scarce because most of the newly sequenced isolates were from garlic, which belongs to either the S or N group [1,2,4,5,32]. Therefore, this population study involving recent additions of complete CP gene sequences of L isolates from onion [33] and leek [7] and P1 and CP gene sequences of L isolates could help elucidate LYSV evolution. The new Turkish isolates shared high nt and aa identities and were dispersed in the same phylogroups of previously characterized Turkish isolates in the P1 and CP trees [7,33] (Fig. 1). LYSV was identified for the first time in Aydın, İzmir, Osmaniye, Bursa, Kastamonu, and Düzce provinces of Turkey. The previous P1 gene-based phylogenetic tree divides isolates into two major lineages (S and N types). The S-type is further split into S and O groups, and the N type into N, U, Sp and L groups [15]. However, this phylogrouping is not used widely because P1 is rarely targeted in the molecular detection of LYSV. Furthermore, phylogenetic analysis based on P1 is generally not reliable because the high variations in the gene could potentially generate new phylogroups every time new isolates are characterized. Given that the P1-and CP-based phylogenetic trees generated in the current study clustered isolates in the same manner, a new phylogrouping was proposed: isolates are simply clustered into S, L, and N types, and future phylogrouping of LYSV isolates should be based on CP, which is more conserved and frequently sequenced in LYSV characterization than P1.
Serological methods have low sensitivity for LYSV detection have been reported. Immunocapture-RT-PCR and standard RT-PCR are 10 2 -10 4 times more sensitive than DAS-ELISA in LYSV detection [34]. One study described that DAS-ELISA results were erratic and unreliable for the detection of isolates no. GQ475411-13 (N type) but consistent and reliable for the detection of onion yellow dwarf virus (OYDV, genus Potyvirus) and garlic common latent virus (GarCLV, genus Carlavirus) [1]. DAS-ELISA was also found to be less sensitive than RT-PCR in the detection of isolates no. MN070130-31 and MN864794-95 (L-type) [33]. These phenomena could be caused by the antibodies used in DAS-ELISA that are produced from the CP of isolates of one of the types and failed to cross-react with  (Table 2). Therefore, specific antibodies for the detection of each LYSV type must be developed. Among all the isolates, the mean evolutionary distance of P1 was significantly higher than that of CP. This finding indicated that the P1 gene region plays active roles in the formation of new types/variants of LYSV through host selection and environmental adaptation. The evolutionary processes of the P1 gene are highly affected by recombination and gene duplication and help potyviruses adapt to different host species [13]. In addition, the evolutionary distance for both gene regions was particularly high within the S-type lineage. The distance of S-type to L and N-types was also high, suggesting the larger genetic variation of S-type than that of L and N-types. The high evolutionary distance of S-type to L and N-types may be related to host species selection instead of geographic adaptation. Current attempts to identify resistant Allium spp. varieties only involved field observations [35,36]. The findings of the present study showed that leek is highly resistant to S and N types and garlic is highly resistant to L type. This observation should be considered in future breeding programs using advanced DNA marker techniques.
The obtained ω values were all < 1, showing that the P1 and CP genes were both under negative (purifying) selection pressure, with the CP gene being under stronger pressure compared with the P1 gene. The evolutionary rates of P1 and CP genes were estimated to be 0.40 and 0.16, respectively, similar to those obtained in [8] at 0.49 and 0.19, respectively. Evolutionary analysis on soybean mosaic virus, another Potyvirus, also revealed that its CP gene (ω = 0.02) was more constrained for diversification than its P1 gene (ω = 0.2) [14].
In this study, neutrality tests performed on both genes assigned a positive number to Eastern Asian populations (China, Japan, South Korea, and Taiwan) and a negative number to other Asian, Oceanian, European, and African   Table 4). Analysis of genetic differentiation and gene flow between East Asia versus other regions consistently produced F ST values < 0.33 (Table 5). These results suggested that the ancestors of current LYSV isolates may have evolved with garlic while they were endemic in East Asia and then spread to different regions of the world through garlic propagation materials. This finding was in line with the conclusion in [15] and the assumption that the dissemination of garlic started from its genetic center in Central Asia and then spread to two major destinations: China/ India and Europe/Russia [37]. In particular, S-type might represent the strained isolates showing strict host selection; so far, onion is not a natural host for these viruses because they were mostly detected in garlic and sometimes in leek (no. DQ925452-53) (Supplementary material 1- Table 1). The extensive adaptation of S-type isolates to garlic as a natural host was also confirmed in the phylogenetic analysis of the current study. The large F ST values between main nodes (S, L, and N) strongly supported the existence of three LYSV types, the high genetic diversity among them, and the high reliability rates of dendrograms created for both genes. These values also indicated a frequent gene flow from S-type to L and N-types, and these phylogroups differentiate from each other over time. Therefore, neutrality analysis may also corroborate the conclusion in [15] stating that the sequences missing in the P1 of S-type are present in the common ancestor of S and N types.

Conclusion
Host differences, inconsistent serological test results, substantial nt and aa variation, and phylogenetic and diversity analyses supported earlier studies on the separation of LYSV into three distinct evolutionary lineages: S, L, and N types [15,31]. Recognition of these three types and other new variants that might emerge among LYSV isolates could lead to the development of accurate and specific diagnostic tools and the identification of resistant genes, which in turn will help epidemiologically control the virus.
Author contributions All authors contributed to the study conception and design. P1 and CP gene sequence alignments and phylogenetic and evolutionary distance analyses were performed by AIS. Genetic diversity and evolutionary selection pressure, neutral selection, and genetic differentiation analyses were performed by FR-Z. DAS-ELISA, RT-PCR, and nucleotide sequencing were performed by AK. Sample collection and maintenance were performed by SK. Phylogenetic analysis was supervised by SH. Funding acquisition and experiments were supervised by FE. The first draft of the manuscript was written by AIS and FR-Z. All authors commented on previous versions of the manuscript and have read and approved the final manuscript.
Funding This research was partially funded by Scientific Research Projects Coordinator of Ankara University with Project No. 18H0447001, titled "Detection of Viral Infection Sources on Onion Production Areas". Adyatma Irawan Santosa was supported by the grant 3282/ UN1/PN/PN/PT.01.03/2022 from the Faculty of Agriculture, Universitas Gadjah Mada. Authors provided personal funding to cover some expenses.

Data availability
The data supporting the findings of this study are included in this published article (and/or) its supplementary materials. The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.