Dynamics of Genetic Diversity Among Indian Sugarcane Bacilliform Virus Species and Implications of Associated Recombination Events in the Virus

Sugarcane bacilliform virus (SCBV) is a plant pararetrovirus that causes leaf fleck disease in sugarcane across the globe. Because it occurs throughout the sugarcane growing areas and germplasm in India, we assessed the genetic divergence among 104 of its isolates from various germplasm clones and Saccharum hybrid varieties. On the basis of phylogenetic analysis and the Sequence Demarcation Tool, five novel subgroups, viz., SCBV-U, SCBV-V, SCBV-W, SCBV-X and SCBV-Y, were proposed. Interestingly, the SCBV-W isolate CBJ 46 showed the highest variation in its nucleotide and protein sequences. Compared with the existing genomic database of SCBV, SCBV isolates from India exhibited greater diversity than those from other regions. The detection of recombinants in SCBV-U, SCBV-W and SCBV-X, together with eight other recombinants identified in this study, indicates heterogeneity and genetic exchange within SCBV species over time that might have led to the evolution of new variants. Neutrality tests indicated an excess of low-frequency polymorphisms, and a selection pressure (dN/dS) of < 1 pointed to purifying selection in the SCBV population. The genetic variation within SCBV species documented here will aid in devising robust diagnostic tools for detecting the virus in quarantine laboratories and will improve the understanding of evolutionary changes in SCBV species.


Introduction
Sugarcane is an important sugar and bioenergy crop grown under tropical and subtropical conditions. It meets nearly 80% of the world's sugar requirement (http://www.fao.org/faostat). Sugarcane bacilliform virus (SCBV; genus Badnavirus, family Caulimoviridae), a plant pararetrovirus that causes leaf fleck in sugarcane, has been reported from more than 20 sugarcane cultivating countries (Karuppaiah et al. 2013). It is considered an economically important pathogen, and it limits the exchange of sugarcane germplasm worldwide. It was first reported from Cuba in the commercial cv B34104 in 1985 (Rodriguez-Lema et al. 1985). It has since been reported predominantly in numerous germplasm clones and cultivars from India, Australia, Brazil, China, Morocco, Mauritius, Guadeloupe, the USA and other countries (Ahmad et al. 2019; Lockhart and Autrey 2000; Muller et al. 2011; Viswanathan et al. 1996; Wu et al. 2016). It is spread through infected planting material and, in a semi-persistent manner, by the sugarcane pink mealybug (Saccharicoccus sacchari) and the grey mealybug (Dysmicoccus boninsis) (Lockhart and Autrey 1991). Sugarcane is the principal host, but the virus also infects other grasses such as Sorghum halepense, Brachiaria spp., Panicum maximum, Pennisetum spp. and Rottboellia exaltata (Borah et al. 2013; Lockhart and Autrey 2000; Lockhart et al. 1996; Viswanathan et al. 1996). Whereas most badnaviruses have a limited host range, SCBV is exceptional in its ability to infect plants of two different families, viz., Poaceae and Musaceae (da Silva et al. 2015). It is reported widely from germplasm …
The symptomatology of SCBV infection varies among germplasm and hybrid varieties (Viswanathan and Premachandran 1998). Rao et al. (2014) reported infection intensities ranging from mild to severe chlorosis in 28 different hybrids, along with yellowish flecking and interveinal chlorotic streaks. Later, in a comprehensive study, Viswanathan et al. (2019) showed that the symptoms were typical in S. officinarum and less common in S. barberi, S. sinense, S. robustum and interspecific hybrids. In Saccharum spp. clones, common symptoms comprise varying degrees of chlorotic stripes, stunted growth, severe chlorotic mottling and pronounced flecks. The symptoms became prominent from the tillering phase of the crop, and their severity increased as the crop aged. In hybrid clones, however, the symptoms appeared as intense flecks to mild mottle at the distal portion of the leaf lamina and gradually progressed towards the proximal region. The chlorotic flecks turned yellow, leading to reddening of the entire leaf lamina (Viswanathan et al. 2019).
SCBV has a non-covalently closed, circular double-stranded DNA genome of 7.3-7.8 kilobase pairs (kbp) and replicates via a virus-encoded reverse transcriptase (RT) (Bouhida et al. 1993; Geering and Hull 2012). It has the typical badnavirus genome organization, comprising three open reading frames (ORFs). ORFs 1 and 2 encode small proteins of unknown function, while ORF3 encodes a polyprotein that includes the movement protein, coat protein, aspartic protease, reverse transcriptase (RT) and ribonuclease H (RNase H) (Karuppaiah et al. 2013; Muller et al. 2011; Sun et al. 2016). Although banana streak virus (BSV-OL) has been shown to be integrated into the Musa genome (Geering et al. 2001), no integration of SCBV into the sugarcane genome has been reported (Geijskes et al. 2004).
The RT/RNase H-coding region is a common taxonomic marker for species demarcation within the genus Badnavirus. The demarcation criterion proposed by the International Committee on Taxonomy of Viruses (ICTV) is > 20% nucleotide divergence in the RT/RNase H region (Geering and Hull 2012). Currently, ICTV recognizes four SCBV species under the genus Badnavirus: sugarcane bacilliform IM virus (SCBIMV), sugarcane bacilliform MO virus (SCBMOV), sugarcane bacilliform Guadeloupe A virus and sugarcane bacilliform Guadeloupe D virus (Geering and Hull 2012; Muller et al. 2011). At present, 20 genotypes of SCBV (SCBV-A to SCBV-T) have been reported worldwide. These include seven genotypes, viz., SCBV-E, H, I, J, K, L and M (Ahmad et al. 2019; Wu et al. 2016); five from France, SCBV-A, B, C, D and G (Muller et al. 2011); and five from Brazilian germplasm, SCBV-A, C, F, M and H (da Silva et al. 2015).
In recent years, extensive sequence information has been generated from various sugarcane growing regions across the globe. Our earlier studies revealed widespread occurrence of the virus and its genomic diversity in India (Karuppaiah et al. 2013; Viswanathan et al. 2019). Hence, further studies were undertaken to address the genetic variability among SCBV isolates. This study of 104 new SCBV isolates from germplasm and hybrid varieties reveals, for the first time, extensive genetic variation in the SCBV population in India and the involvement of recombination events in this genomic variation.

Leaf Samples
Detailed surveys were conducted during 2019-2020 to collect sugarcane leaf samples suspected of leaf fleck from various parts of India and from the Saccharum germplasm. Samples from 125 germplasm clones and 233 cultivated varieties were collected from Tamil Nadu, Kerala, Karnataka, Maharashtra and the Andaman and Nicobar Islands (…) and were stored at −80 °C until DNA extraction. The leaf samples were ground in liquid nitrogen, and DNA was isolated using CTAB buffer (Karuppaiah et al. 2013). DNA quality was assessed quantitatively with a NanoDrop spectrophotometer (Thermo Scientific, USA) and qualitatively by 0.8% agarose gel electrophoresis.

Primer Design and PCR Amplification
Whole-genome sequences available to date were retrieved from NCBI and aligned using BioEdit 7.0.5.3. The RT/RNase H region was selected for PCR amplification based on the ICTV criteria for species demarcation. Because established primers failed to detect SCBV in various sugarcane varieties, a new set of degenerate primers targeting the conserved motifs of the ORF3 polyprotein RT/RNase H region was designed. The primers SCBV 794 FP (5′-GCRCCWGCAGTVTTYCARAGGAAGATG-3′) and SCBV 794 RP (5′-CCAYCTGATCTCHGAAGGYTTRTG-3′) specifically amplified a ~794 bp fragment. About 300 ng of total genomic DNA was subjected to PCR using Taq polymerase (TaKaRa, Japan) under the following optimized conditions: initial denaturation at 94 °C for 5 min; 35 cycles of 94 °C for 45 s, 59 °C for 45 s and 72 °C for 45 s; and a final extension at 72 °C for 10 min. Amplified DNA was resolved on 1.5% agarose gels pre-stained with ethidium bromide and visualized under a gel documentation system (GBox, Syngene) to verify amplicon size. Each PCR-positive sample derived from a single sugarcane variety was considered one isolate, hereafter referred to as an SCBV isolate.
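The degenerate positions in the two primers use IUPAC ambiguity codes (R = A/G, W = A/T, V = A/C/G, Y = C/T, H = A/C/T). As a quick check of how many distinct oligonucleotides each primer mix contains, the Python sketch below expands these codes. It assumes the primer sequences are read 5′ to 3′ with the spaces in the quoted sequences treated as layout artifacts; the helper names are hypothetical and not part of any published pipeline.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences from the text, spaces removed (assumed 5'->3')
SCBV_794_FP = "GCRCCWGCAGTVTTYCARAGGAAGATG"
SCBV_794_RP = "CCAYCTGATCTCHGAAGGYTTRTG"

for name, seq in [("FP", SCBV_794_FP), ("RP", SCBV_794_RP)]:
    print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under these assumptions, SCBV 794 FP (27 nt) encodes 48 sequence variants and SCBV 794 RP (24 nt) encodes 24.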

Sanger Di-deoxy Sequencing and in Silico Sequence Analysis
The amplicons were purified using the GenElute Gel Extraction kit (Sigma, USA), eluted in 15 µl of nuclease-free water, and cloned before Sanger dideoxy sequencing (Eurofins Scientific, Bangalore). Three positive samples per isolate were cloned and sequenced to guard against in vitro PCR errors. Both forward and reverse strands were sequenced, and the resulting reads were assembled into a contiguous sequence using the CAP contig assembly program of BioEdit 7.0.5.3. The resulting contigs were used to query the National Center for Biotechnology Information (NCBI) database (www.ncbi.nlm.nih.gov) with BLASTn, and the BLAST analysis matched them to viral sequences.
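The rationale for sequencing three clones per isolate can be illustrated with a column-wise majority-rule consensus: a base introduced by a PCR or cloning error in one clone is outvoted by the other two. The Python sketch below is a hypothetical illustration that assumes the reads are already aligned to equal length (reverse-strand reads being reverse-complemented first); it is not the procedure implemented in BioEdit's CAP assembler.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to orient a reverse-strand read like the forward read."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "-": "-"}
    return "".join(complement[b] for b in reversed(seq.upper()))


def majority_consensus(reads: list[str]) -> str:
    """Column-wise majority-rule consensus of equal-length, pre-aligned reads.

    A base is called only if more than half of the reads agree; otherwise 'N'.
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count > len(reads) / 2 else "N")
    return "".join(consensus)


# Hypothetical clone reads of one isolate; the second has a single-base error
clones = ["ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]
print(majority_consensus(clones))  # ACGTTGCA
```

A strict-majority rule was chosen so that a column where all three clones disagree is reported as ambiguous ('N') rather than resolved arbitrarily.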

Phylogenetic Profile Analysis
Contig sequences were used to infer evolutionary relationships among SCBV species. Phylogenetic analysis was performed on trimmed 603 bp nucleotide sequences of the RT/RNase H-coding region. The analysis included the 104 sequences from the present study, together with 23 SCBV genome sequences and one rice tungro bacilliform virus sequence (RTBV; accession no. NC_001914.1, as outgroup) retrieved from the NCBI database. Twenty of the NCBI sequences represented the 20 previously established subgroups (SCBV-A to SCBV-T). All sequences were trimmed to the corresponding 603 nt and aligned using ClustalW, and a phylogenetic tree was constructed by the maximum likelihood method with the Tamura-Nei model (Tamura and Nei 1993) in MEGA X (Kumar et al. 2018). Branch support for the consensus tree was assessed from 1000 bootstrap replicates.
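The Tamura-Nei (TN93) model used for tree building distinguishes purine transitions (A↔G), pyrimidine transitions (C↔T) and transversions, and allows unequal base frequencies. For a single pair of aligned sequences, the corresponding evolutionary distance has a closed form; the Python sketch below computes it, assuming gap-free comparison sites and base frequencies pooled from the two sequences. It only illustrates the substitution model; the maximum likelihood tree search in MEGA X is a different and far more involved procedure.

```python
import math


def tn93_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned DNA sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    Base frequencies are pooled from both sequences; all four bases must occur.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    counts = {base: 0 for base in "ACGT"}
    p1 = p2 = q = 0  # A<->G transitions, C<->T transitions, transversions
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
        if a == b:
            continue
        if {a, b} == {"A", "G"}:
            p1 += 1
        elif {a, b} == {"C", "T"}:
            p2 += 1
        else:
            q += 1
    pA, pC, pG, pT = (counts[b] / (2 * n) for b in "ACGT")
    pR, pY = pA + pG, pC + pT
    P1, P2, Q = p1 / n, p2 / n, q / n

    # Three log terms: purine transitions, pyrimidine transitions, transversions
    t1 = -(2 * pA * pG / pR) * math.log(1 - pR * P1 / (2 * pA * pG) - Q / (2 * pR))
    t2 = -(2 * pC * pT / pY) * math.log(1 - pY * P2 / (2 * pC * pT) - Q / (2 * pY))
    t3 = -2 * (pR * pY - pA * pG * pY / pR - pC * pT * pR / pY) * math.log(
        1 - Q / (2 * pR * pY))
    return t1 + t2 + t3


s1 = "ACGTACGTACGTACGTACGT"  # hypothetical sequences
s2 = "GCGTACGTACGTACGTACGT"  # one A->G transition relative to s1
print(tn93_distance(s1, s2))
```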

Pairwise Sequence Alignment and Identity Calculation
Pairwise-identity-based classification has recently proven highly useful for viruses, particularly those with small genomes. Among such approaches, the Sequence Demarcation Tool, SDT 1.2 (available from http://web.cbio.uct.ac.za/SDT), was adopted for pairwise sequence alignment and identity calculation. Identity scores were calculated from ClustalW alignments as 1 − (M/N), where M is the number of mismatching nucleotides and N is the total number of alignment positions at which neither sequence has a gap (Muhire et al. 2014). The 20 previously established genotypes/species of SCBV were included in the identity calculation together with the five possible genotypes from the current study identified through phylogenetic analysis.
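The SDT identity score 1 − (M/N) can be reproduced for any pair of already aligned sequences. The Python function below is a minimal sketch assuming '-' is the gap character; the function name is hypothetical, and SDT itself also performs the pairwise alignment, which is not reimplemented here.

```python
def sdt_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Pairwise identity as 1 - M/N over an existing pairwise alignment.

    N counts alignment positions where neither sequence has a gap;
    M counts the mismatches among those positions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
                if a != gap and b != gap]
    n = len(compared)
    if n == 0:
        raise ValueError("no gap-free positions to compare")
    m = sum(1 for a, b in compared if a != b)
    return 1 - m / n


print(sdt_identity("AC-GT", "ACTGA"))  # 0.75
```

In this example the gapped third column is skipped, and one mismatch among the four compared positions gives 1 − 1/4 = 0.75.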

Outline of Recombination Events in SCBV
Because variation was found even within the conserved RT/RNase H region of SCBV, recombination events among SCBV isolates were investigated with the Recombination Detection Program v4.39 (RDP4) (Martin et al. 2015), using the RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq methods. A data file of 124 SCBV sequences was prepared in NEXUS format, and potential recombination events were detected using these methods. Recombination breakpoint hot spots were identified using the permutation-based test on the breakpoint distribution plot. Where necessary, potential recombinants were checked manually using the phylogenetic and recombination-signal analysis features of RDP4.
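Of the methods listed, MaxChi is the simplest to sketch: for a pair of sequences it scans candidate breakpoints along the polymorphic sites of the alignment and looks for the split at which the proportions of matching and mismatching sites differ most between the two sides, scored by a 2 × 2 chi-square statistic. The Python sketch below is a simplified, hypothetical illustration (whole-alignment scan, single breakpoint, no sliding window, no multiple-testing correction); RDP4's implementation differs in these respects.

```python
def informative_sites(alignment: list[str]) -> list[int]:
    """Indices of alignment columns polymorphic across all sequences (gaps ignored)."""
    return [i for i, column in enumerate(zip(*alignment))
            if len({b for b in column if b != "-"}) > 1]


def max_chi_breakpoint(alignment: list[str], i: int, j: int) -> tuple[int, float]:
    """Single-breakpoint maximum chi-square scan for sequences i and j.

    Each polymorphic site is coded as a match or mismatch between the two
    sequences; every split of these sites gives a 2x2 table (left/right x
    match/mismatch). Returns the alignment index of the first site to the
    right of the best split and its chi-square value ((-1, 0.0) if none).
    """
    sites = informative_sites(alignment)
    match = [alignment[i][s] == alignment[j][s] for s in sites]
    n = len(match)
    best_pos, best_chi = -1, 0.0
    for k in range(1, n):
        left_m = sum(match[:k])
        left_x = k - left_m
        right_m = sum(match[k:])
        right_x = (n - k) - right_m
        denom = k * (n - k) * (left_m + right_m) * (left_x + right_x)
        if denom == 0:
            continue  # all matches or all mismatches: no information
        chi = n * (left_m * right_x - left_x * right_m) ** 2 / denom
        if chi > best_chi:
            best_pos, best_chi = sites[k], chi
    return best_pos, best_chi


# Hypothetical toy alignment: sequences 0 and 1 agree on the left half only
toy = [
    "AAAAAAAAAACCCCCCCCCC",
    "AAAAAAAAAAGGGGGGGGGG",
    "TTTTTTTTTTGGGGGGGGGG",
]
print(max_chi_breakpoint(toy, 0, 1))  # (10, 20.0)
```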

Assessment of Population Genetic Parameters
Population genetic analysis reveals the genetic variation within and among SCBV populations for a particular gene and the evolutionary factors that explain this variation, including recombination, mutation rate, the pattern of selection and the stochastic noise caused by random genetic drift. The neutrality of the SCBV population was assessed using DNA Sequence Polymorphism (DnaSP) 5.10.01 (Librado and Rozas 2009). The 104 isolates from this study and the 20 previously established genotypes were used for these analyses. Two neutrality tests, Tajima's D (Tajima 1989) and Fu and Li's D (Fu and Li 1993), were conducted to test the neutral mutation hypothesis, using the average number of nucleotide differences and the number of segregating sites. Selection pressure on the coding region was assessed by Single-Likelihood Ancestor Counting (SLAC) analysis on the Datamonkey server (http://www.datamonkey.org/) under the HKY85 nucleotide substitution model. The ratio of the non-synonymous substitution rate, dN (amino acid-altering substitutions), to the synonymous substitution rate, dS (substitutions that do not alter the amino acid), ω = dN/dS, is widely used as an indicator of selection pressure (Pond and Frost 2005).
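Tajima's D compares two estimates of the population mutation parameter: the average number of pairwise differences (π, also written k) and Watterson's estimate S/a₁ derived from the number of segregating sites S. A negative D means π is smaller than expected from S, i.e. an excess of rare variants. The Python sketch below follows Tajima's (1989) formulas and assumes gap-free aligned sequences, at least four of them; it is an illustration, not DnaSP's implementation.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's (1989) D for equal-length, gap-free aligned DNA sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # Segregating sites: columns where not all sequences share the same base
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return 0.0
    # Average number of pairwise nucleotide differences (pi)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Normalizing constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical data: every variant is a singleton, so D comes out negative
print(tajimas_d(["AAAA", "CAAA", "ACAA", "AACA", "AAAC"]))
```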

PCR Amplification of RT/RNase H Fragments and Sequence Analysis
Of the 358 sugarcane samples, 57% (133/233) of Saccharum hybrid varieties and 48% (60/125) of germplasm clones tested positive with the SCBV794 primer pair, which gave an amplicon of 794 bp (Fig. 1). The primers detected the virus in a broad spectrum of infected sugarcane, mainly S. officinarum, S. spontaneum, S. barberi, S. sinense, S. robustum, Erianthus arundinaceus, E. bengalense, Sclerostachya spp., Narenga spp. and various Saccharum hybrid varieties. Contigs derived from partial sequencing of the clones showed 80-99% similarity with existing SCBV sequences in the NCBI nucleotide database. All contigs from the present study were submitted to NCBI GenBank (Supplementary Table 1).
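The detection rates above follow directly from the counts. The short Python snippet below recomputes them, along with the overall rate across all 358 samples, which is not stated in the text but is implied by the same counts.

```python
# Detection counts as reported in the text: (positive, tested)
counts = {
    "Saccharum hybrid varieties": (133, 233),
    "germplasm clones": (60, 125),
}


def rate(positive: int, tested: int) -> float:
    """Detection rate as a percentage."""
    return 100 * positive / tested


for group, (pos, total) in counts.items():
    print(f"{group}: {pos}/{total} = {rate(pos, total):.1f}%")

pos_all = sum(p for p, _ in counts.values())
tested_all = sum(t for _, t in counts.values())
print(f"overall: {pos_all}/{tested_all} = {rate(pos_all, tested_all):.1f}%")
```

This gives 57.1% for hybrid varieties, 48.0% for germplasm clones and 53.9% (193/358) overall.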

Phylogenetic Profile Analysis
Evolutionary analysis using the maximum likelihood method segregated the sequences into three major monophyletic groups, with most isolates clustering in the third group. In addition to the 20 subgroups already reported worldwide (SCBV-A to SCBV-T) (Ahmad et al. 2019), five new subgroups, namely SCBV-U, V, W, X and Y, were assigned based on phylogenetic grouping (Fig. 2A). Fifty-nine isolates from the study grouped into a separate cluster forming the new subgroup SCBV-U (Fig. 2B). Another 17 isolates formed a branch of the SCBV-L subgroup (Fig. 2C). The isolate CBJ 46, showing < 88% similarity to the neighbouring subgroup N (SCBV-FJZZ3) from China, formed a distinct branch (SCBV-W) outside the third monophyletic group. Unlike SCBV-N, CBJ 46 showed no close similarity to any existing whole-genome sequence. The isolates from germplasm, viz. S. officinarum clones (Bangadya, Saipan G, Baragua), S. spontaneum 81-095 and the interspecific hybrids ISH 1 and Cym 08-666, formed a separate cluster, the novel subgroup SCBV-V. Notably, SCBV isolate MW548486 from S. barberi shared 87.57% identity with SCBV-CHN2 (KM214358.1) but grouped far from it, forming the novel subclade SCBV-X. Likewise, SCBV isolate MW584708 (CB 2001-13) from the Saccharum hybrid Co 2001-13 shared 86.5% nucleotide similarity with SCBV-BRU yet formed a phylogenetically distinct clade, the new subgroup SCBV-Y. In contrast, an isolate from the same sugarcane hybrid collected at Coimbatore (MW548472) fell into subgroup SCBV-U. Five subgroups already established in other sugarcane growing countries, SCBV-G (Guadeloupe …) … 84-432 and S. officinarum Khajuria, S. officinarum isolates MW548484 and MW548485 formed a cluster with the SCBV-H genotype in group 1. S. officinarum isolate MW645069 grouped with genotype E, sharing 92.7% similarity with SCBV Iscam reported earlier from India (Karuppaiah et al. 2013). SCBV-U, a novel subgroup from the present study, can be considered the most frequently occurring SCBV variant in India, especially among isolates from Saccharum hybrids and interspecific hybrids.

Fig. 2 … dots indicate the proposed genotypes. B Details of the 59 samples that clustered together to form the SCBV-U genotype. C Details of the 17 isolates from the study that clustered together with SCBV AP Co 693077 of the SCBV-L genotype

Pairwise Sequence Alignment and Identity Calculation
Pairwise sequence identity was calculated among the 25 SCBV subgroups (Fig. 3). All five possible new subgroups deduced from the phylogenetic analysis (SCBV-U to -Y) showed substantial nucleotide differences in the pairwise sequence alignment, so they can be added as a continuation of the previously reported subgroups. Based on the identities in Fig. 3, which include the four ICTV-recognized species and all other species and variants reported in this study and worldwide, seven different SCBV species can be delineated.
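One way to turn a pairwise identity matrix into species-level groups is single-linkage clustering at the ICTV threshold: two genotypes fall in the same group when their RT/RNase H identity is at least 80% (divergence of at most 20%). The Python sketch below assumes this linkage rule, which the text does not specify, and uses a union-find structure; the genotype names and identities in the example are hypothetical, not values from Fig. 3.

```python
def group_species(names: list[str], identity: dict[frozenset, float],
                  threshold: float = 0.80) -> list[list[str]]:
    """Single-linkage grouping: two genotypes join the same group when their
    pairwise identity is >= threshold (i.e. divergence <= 1 - threshold)."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Walk to the root of x's group, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, ident in identity.items():
        a, b = tuple(pair)
        if ident >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical identities between three genotypes
ids = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "C"}): 0.70,
    frozenset({"B", "C"}): 0.75,
}
print(group_species(["A", "B", "C"], ids))  # [['A', 'B'], ['C']]
```

Single linkage is transitive: if A-B and B-C each pass the threshold, all three are grouped even when A-C does not, which is one reason a real demarcation should also be checked against the phylogeny.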

Distribution of Variation Across the Amino Acids in RT/RNase H Region
In the present study, deduced amino acid sequences (152 residues) from the RT/RNase H region of the ORF3 polyprotein of the 25 genotypes were aligned (Fig. 4). The residues at positions 1-9, "EEEHAEHL" in SCBV-E to -Y, were replaced by "VQQHKEHLK" in genotypes SCBV-A to -D. Single amino acid changes were found throughout the selected region of the SCBV genotypes. Variants reported from India, China and Australia showed similar protein sequences, whereas greater dissimilarity was found with the Guadeloupe genotypes. The 10th position in the alignment showed the greatest variability, with K, I, V, A, E, T and N. The similarity of amino acids within the SCBV-E to -Y genotypes suggests that these isolates originated from a common ancestor, which might have undergone continued exchange of gene fragments.

Fig. 4 Amino acid sequence alignment showing variation in the RT/RNase H region of SCBV genotypes SCBV-A to -Y, with GenBank accession numbers
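Per-position variability of the kind described for the 10th alignment column can be tabulated by collecting the distinct residues in each column. The Python sketch below is a minimal, hypothetical illustration on a toy alignment; gaps ('-') are ignored and positions are reported 1-based to match the alignment numbering used in the text.

```python
def column_residues(alignment: list[str], gap: str = "-") -> list[set[str]]:
    """Set of distinct residues observed at each column of a protein alignment."""
    return [{aa for aa in column if aa != gap} for column in zip(*alignment)]


def most_variable_column(alignment: list[str]) -> tuple[int, str]:
    """1-based position and residues of the column with the most distinct residues."""
    residues = column_residues(alignment)
    best = max(range(len(residues)), key=lambda i: len(residues[i]))
    return best + 1, "".join(sorted(residues[best]))


toy = ["MKV", "MIV", "MAV", "MEV"]  # hypothetical 3-residue alignment
print(most_variable_column(toy))  # (2, 'AEIK')
```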

Outline of Recombination Events in SCBV
The large nucleotide differences generally present in SCBV have given rise to numerous genotypes, which makes recombination and genomic reassortment within the genome possible. Analysis of a dataset of 124 nucleotide sequences with seven methods in RDP4 revealed inter-SCBV recombination. Of the 45 recombination signals detected, 11 events were significant (p ≤ 0.05) (Table 2). The SCBV isolates CBJ 46, ISH 101 and SB Pathri from the current study, belonging to the proposed genotypes SCBV-W, -U and -X, respectively, were identified as recombinants. The SCBV-W variant CBJ 46 was a recombinant with SO BS Aubin (genotype E) as the major parent and SS IND 81-003 (genotype R) as the minor parent. SCBV-FJZZ3 (genotype N) and PR1062 (genotype U) contributed to the evolution of the SB Pathri isolate (genotype X), and SO BS Aubin (genotype E) and SS IND 84-432 (genotype T) contributed to the formation of the ISH101 (genotype U) recombinant. SS 07-1488 (genotype S) and SCBV Guadeloupe A (genotype A) contributed, with a significant p-value of 1.618 × 10⁻³, to the recombinant SCBV Guadeloupe D (genotype D). Interestingly, SCBV FJZZ3 (genotype N) reported from China is a recombinant of SO Penang (genotype U, this study) and SCBV-YN1 (genotype S, China). These recombination events suggest that genetic exchange probably occurred through germplasm materials moved across provinces/countries, leading to new variants.
Furthermore, genomic fragments were derived from distinct parents belonging to different phylogenetic clusters. The breakpoint distribution plot of the 124 isolates (603 nt) identified alignment positions 250-350 as a probable hot spot within the RT/RNase H region (Fig. 5). Characterizing recombination events and gene assortment within this conserved region will offer insight into the evolution of variation within the species. The breakpoint distribution plot also helped locate the frequently varying sites within the RT/RNase H region of the SCBV genome.
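The idea behind a permutation-based hot-spot test is to ask whether the densest cluster of observed breakpoints is denser than expected if the same number of breakpoints were scattered at random along the alignment. The Python sketch below implements that idea under simplifying assumptions (uniform random placement, a fixed 100-site window and hypothetical breakpoint positions); RDP4's own breakpoint-distribution test differs in detail, so this is only an illustration.

```python
import bisect
import random


def max_window_count(positions: list[int], window: int) -> int:
    """Largest number of breakpoints falling inside any window of `window` sites."""
    pos = sorted(positions)
    best = 0
    for i, p in enumerate(pos):
        j = bisect.bisect_left(pos, p + window)  # first breakpoint outside [p, p + window)
        best = max(best, j - i)
    return best


def hotspot_p_value(positions: list[int], length: int, window: int = 100,
                    permutations: int = 1000, seed: int = 1) -> float:
    """Permutation p-value: how often do uniformly scattered breakpoints form
    a cluster at least as dense as the observed one?"""
    rng = random.Random(seed)
    observed = max_window_count(positions, window)
    hits = 0
    for _ in range(permutations):
        simulated = [rng.randrange(length) for _ in positions]
        if max_window_count(simulated, window) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical breakpoints clustered within alignment positions 250-350
bps = [250, 260, 270, 280, 290, 300, 310, 320, 330, 340]
print(hotspot_p_value(bps, length=603))
```

Counting windows that start at an observed breakpoint is sufficient, because any densest window can be shifted right until its left edge meets a breakpoint without losing any.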

Assessment of Population Genetic Parameters
Neutrality tests on the 124 SCBV sequences (total number of mutations, 450; segregating sites, 260) gave D values of −1.142 for Tajima's D test and −0.966 for Fu and Li's test (Table 3). The average number of nucleotide differences between SCBV sequences (K) was 54.769, and the nucleotide diversity (Pi) was 0.143. Negative neutrality test values signify an excess of low-frequency polymorphisms, indicating a selective sweep or population expansion after a recent bottleneck. Selection constraints on the coding region were estimated from the dN/dS ratio at each site. A mean value of < 1 indicated negative selection (Fig. 6). Of the 201 sites considered, SLAC found evidence of pervasive negative (purifying) selection at 156 sites at a p-value threshold of 0.1.

Table 2 Recombination events detected using RDP4. Major and minor parents are the sequences that contributed to the formation of the recombinant. R, G, B, M, C, S and 3s are acronyms for the methods RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq used to detect intra-SCBV recombination, with a default p-value of 0.05
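SLAC counts synonymous and non-synonymous changes along a phylogeny with reconstructed ancestral sequences. A simpler, related idea is the Nei-Gojobori counting method for a single pair of coding sequences, sketched below in Python under stated simplifications (standard genetic code, codons differing at more than one position skipped, no multiple-hit correction). It illustrates how ω = dN/dS < 1 arises when most observed differences are synonymous; it is not the SLAC method, and the example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori synonymous sites: each single-base change that keeps the
    amino acid contributes 1/3 of a site (changes to stops count as non-synonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def dn_ds(seq1: str, seq2: str) -> float:
    """Uncorrected pN/pS between two aligned coding sequences (simplified)."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguities and stop codons
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            continue  # simplification: multi-hit codons would need pathway averaging
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites
    p_s = syn_diffs / syn_sites
    if p_s == 0:
        return float("nan") if p_n == 0 else float("inf")
    return p_n / p_s


# One synonymous (CTG->CTA, Leu) and one non-synonymous (AAA->GAA, Lys->Glu) change
print(dn_ds("CTGAAA", "CTAGAA"))  # 5/13, about 0.385
```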

Discussion
SCBV is one of the viruses most frequently detected in quarantine during germplasm exchange and is therefore considered an economically important pathogen (Ashraf et al. 2020). Surprisingly, no procedure is yet available to eliminate this virus from sugarcane material. Apical meristem culture (AMC), commonly used to eliminate other viruses, is neither appropriate nor efficient here because this virus is expected to infect meristem tissues (Fernandez et al. 2020). Discarding sugarcane infected with this virus would result in the loss of 30-40% of germplasm quarantine materials (Fernandez et al. 2020). When the virus was first reported in India, it was found in a few sugarcane germplasm clones (Viswanathan et al. 1996); subsequently, however, its widespread occurrence was observed in clones of various Saccharum species, allied genera and hybrid clones of Indian and foreign origin (Viswanathan et al. 1999; Viswanathan and Premachandran 1998). Recent studies have established that it infects the crop across the 5.2 M ha sugarcane growing area in India, and the rampant occurrence of the disease has become a cause of concern for sugarcane production (Rao et al. 2014; Viswanathan et al. 2019), in addition to impeding germplasm exchange. In this scenario, there is a need to document the genetic variation in SCBV. Based on complete genome sequences, we earlier documented the prevalence of five SCBV species, indicating enormous virus diversity in the country (Karuppaiah et al. 2013). Since studies on the occurrence of SCBV in India are limited, the current work presents a comprehensive assessment of SCBV from sugarcane germplasm and cultivated varieties.

Fig. 5 … Light and dark grey areas symbolize local 99% and 95% breakpoint clustering, respectively
SCBV isolates from hybrid Co 86032 collected from different regions (Pune in Maharashtra state; Avinashi, Neelambur and Vedapatti in Tamil Nadu state) and generations (tissue culture-derived canes T0, T1 and T3) showed high similarity in nucleotide composition and clustered together in the new subclade SCBV-U. Likewise, SCBV detected in hybrid cv Co 0212 from different fields in Tamil Nadu, viz., SBI-VPT, Indiyampalayam and Avinashi, fell within the same genotype clade, SCBV-U. Similarly, SCBV isolated from Saccharum hybrid cv PI 1110 from three different regions shared the same genotype identity. The occurrence of the same genotype in different fields indicates that the virus might have been transmitted through true seed. Recently, Balan et al. (2022) reported that the virus is transmitted through true seed and that the virus from the parental clone is carried to the progenies and maintained through generations of vegetative propagation. In contrast, the SCBV isolate from cv Co2001-13 from Karnataka (MW5484708) segregated as a distinct clade, SCBV-Y, while its counterpart from Tamil Nadu (MW548472) showed identity to the SCBV-U subgroup. This may be due to infection by a different isolate through a mealybug vector at the new location. Most of the novel subgroups reported in this study came from germplasm collections, indicating a higher level of nucleotide variation in the viral genomes infecting the germplasm. The germplasm clones came from different regions of Southeast Asia, New Guinea, India, China and elsewhere (Viswanathan and Premachandran 1998), and they carry the virus populations of their respective locations. In the case of hybrids, if a virus population is prevalent in the parental clones used for breeding, the same virus population is expected in the new hybrid varieties through maternal transmission of the virus.
Phylogenetic analysis of the obtained sequences revealed five novel subgroups, viz., SCBV-U to -Y. These phylogenetic findings are supported by the pairwise sequence identity analysis, which showed lower identity compared with the already existing subgroups SCBV-A to -T (Ahmad et al. 2019; Karuppaiah et al. 2013; Muller et al. 2011; Wu et al. 2016). SCBV-W exhibited > 22% nucleotide divergence in the RT/RNase H region (the conserved region), which exceeds the taxonomic criterion for a new species (> 20%). Although large nucleotide differences are generally reported from regions other than RT/RNase H, whole-genome analysis of the CBJ 46 isolate is required to confirm this divergence. The SCBV-L subgroup, proposed earlier by Rao et al. (2014) as a new species, was also found in the current study: seventeen isolates segregated with SCBV-L, the second most frequently reported genotype in Indian states. Indian and Chinese isolates shared more similar subgroups with each other than with isolates of Guadeloupean/Australian origin. Apart from the new subgroups SCBV-U to -Y, the 104 isolates from the study segregated into the genotypes SCBV-E, -G, -H, -I, -J, -L, -Q, -R, -S and -T. The presence of fifteen SCBV subgroups within 104 isolates reveals the genetic diversity within SCBV species in India. To date, 19 subgroups have been found in India, excluding SCBV-A, -B, -C, -D, -N, -O and -P. However, further analysis of the SCBV population from unrepresented regions and germplasm clones may shed more light on viral genomic diversity.
Recombination is a pervasive process that accounts for much of the genetic diversity of most viruses. Since pararetroviruses are well known for their diversity, efforts were made to identify recombination events, which are crucial to their variation. Recombination events were detected in SCBV isolates, especially in the proposed new subgroups SCBV-U, -W and -X, suggesting that these subgroups might have arisen through genetic reassortment over time. Whether recombination occurs randomly or non-randomly, selective pressures are likely to have acted against breakpoints at particular positions in the genome (Martin and Rybicki 2002). The negative values obtained in Tajima's D and Fu and Li's neutrality tests indicate low-frequency polymorphism in the SCBV population. In contrast to the present study, Rao et al. (2014) reported a positive D value of 2.68, indicating balancing selection and deep subdivision in the SCBV population. The ratio of non-synonymous (dN) to synonymous (dS) substitutions of < 1 provided evidence of pervasive purifying selection. Purifying selection in the SCBV population prevents changes in amino acid residues, thereby favouring an excess of synonymous over non-synonymous substitutions. Genomes of the same species vary in sequence owing to different evolutionary processes. In summary, the study points to possible reasons behind the genetic diversity found in SCBV genotypes, drawing on recombination analysis, neutrality tests and selection pressure.

Conclusion
Genome sequences are nowadays the principal tool for characterizing viruses. The present study established genomic variation in SCBV isolates infecting many Saccharum spp. clones and hybrid varieties in India and provides detailed information on the variation of SCBV species in the country. Based on phylogenetic evidence and the Sequence Demarcation Tool, five novel SCBV subgroups, SCBV-U, -V, -W, -X and -Y, were proposed in addition to the 20 genotypes already established. Indian SCBV isolates form the most divergent population of SCBV compared with isolates from other regions. Given the number of genotypes found in Indian sugarcane materials, chiefly from germplasm clones, additional variants are likely to be found if more sugarcane clones are screened and analysed. The recombination patterns in the present study indicate probable heterogeneity and exchange of gene fragments between the genomes of parental isolates, leading to the evolution of new SCBV variants. Interestingly, Indian isolates showed considerable similarity to SCBMOV (NC_008017) and SCBIMV (AJ277091), isolates from Morocco and Australia, respectively. Low polymorphism indicates a recent selective sweep or purifying selection acting on a functional part of the genome, together with probable gene assortment/genetic drift that may have shaped the SCBV genome pattern. In conclusion, our results emphasize the genetic variation within SCBV species, which will aid the molecular screening of commercial sugarcane material and germplasm clones in quarantine laboratories. The population genetic parameters and recombination signals provide evolutionary details and can thereby support parallel taxonomic studies. The ever-increasing number of genome sequences deposited in public repositories, and the identification of new genotypes, underline the importance of phylogenetic groups for identifying species/strains in the genus Badnavirus.