Viruses of the family Geminiviridae have circular, single-stranded (ss) DNA genomes encapsidated in twinned, quasi-icosahedral capsids. Based on genome organisation, vector transmission, and host range, the family is classified into nine genera: Begomovirus, Mastrevirus, Curtovirus, Becurtovirus, Turncurtovirus, Eragrovirus, Grablovirus, Capulavirus, and Topocuvirus. Begomoviruses, which are transmitted by whiteflies (Bemisia tabaci), form the largest group of geminiviruses and infect a wide range of dicotyledonous plants, including vegetable crops and weeds. Yellowing, inward curling of the leaves, and stunting are common manifestations of begomovirus infection and result in severe yield losses. According to the number of genome components, begomoviruses are grouped as monopartite or bipartite. Bipartite begomoviruses have two small circular ssDNA molecules of 2.5-2.7 kb each, designated DNA A and DNA B. DNA A encodes all viral functions required for viral DNA replication, gene expression, encapsidation and transmission. DNA B encodes two proteins required for inter- and intracellular movement in host plants. Monopartite begomoviruses contain a single genomic component that is homologous to the DNA A of bipartite begomoviruses.
Weeds can be reservoirs of many unidentified, economically important viruses and can act as a source of new infections, but they are often neglected in virus diversity studies. Because weeds are frequently infected with several viruses at once, they can act as mixing vessels for component exchange and recombination, which may generate new geminivirus diversity.
In August 2016, during a survey of chilli fields in Trichy, Tamil Nadu, India, we noticed Croton bonplandianus (family Euphorbiaceae) plants showing typical begomovirus symptoms such as leaf curling, yellow veins and stunted growth, and samples were collected (Figure 1). Total genomic DNA was isolated using the CTAB method and subjected to PCR amplification with the degenerate primers BVF and BVR (BGVF- 5' GCCCACATYGTCTTYCCNGT 3'; BGVR- 5' GGCTTYCTRTACATRGG 3') to confirm the presence of begomoviruses, and with universal betasatellite primers (Beta F- 5' ACTACGCTACGCAGCAGCC 3'; Beta R- 5' TACCCTCCCAGGGGTACAC 3') to test for the presence of betasatellite DNA. To obtain the full-length DNA A genome, rolling circle amplification (RCA) was carried out with phi29 DNA polymerase (Thermo Fisher Scientific). RCA products were digested with BamHI, EcoRI, HindIII, XbaI, KpnI and SphI to identify unique restriction endonuclease sites. The full-length monomeric DNA A fragment of ~2.7 kb released by KpnI digestion was purified, ligated into KpnI-digested pUC19, and transformed into competent E. coli cells. Recombinant clones were confirmed by colony PCR using universal M13 primers and by restriction digestion with KpnI and BglI. The confirmed recombinant clones were sequenced, and the full-length genome sequence was obtained by primer walking (Agrigenome, Kochi, India).
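The degenerate begomovirus primers quoted above contain IUPAC ambiguity codes (Y, R and N), so each primer is in practice a pool of several distinct oligonucleotides. The following minimal Python sketch shows how that pool can be enumerated. It is illustrative only and was not part of the original workflow; the function names `degeneracy` and `expand` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3').
BGVF = "GCCCACATYGTCTTYCCNGT"
BGVR = "GGCTTYCTRTACATRGG"


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]
```

Under these definitions, `degeneracy(BGVF)` is 16 (two Y positions and one N) and `degeneracy(BGVR)` is 8 (one Y and two R positions), which gives a rough sense of how broadly each primer can anneal across begomovirus sequences.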
The nucleotide sequence obtained from clone WK3 was searched against the NCBI nucleotide database using BLASTn to identify the origin of the sequence. Sequences were assembled with BioEdit, and open reading frames (ORFs) were identified with NCBI ORF finder. BLAST (Basic Local Alignment Search Tool) was used to compare nucleotide and amino acid sequence identities with other begomovirus genome sequences reported in NCBI, and the top BLAST hits were selected. Multiple sequence alignment was performed in Molecular Evolutionary Genetics Analysis (MEGA-X). Phylogenetic analysis was performed in MEGA-X using the maximum-likelihood method with 1000 bootstrap replicates. Pairwise sequence identity comparisons were performed using SDT (Species Demarcation Tool) v1.2.
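To make the pairwise identity comparison concrete, the sketch below computes percent identity between two pre-aligned sequences, counting only columns in which neither sequence has a gap. This gap-excluding definition is an assumption made for illustration and the function `pairwise_identity` is a hypothetical helper; it is not the SDT implementation, and the toy sequences in the usage note are not data from this study.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences, ignoring gapped columns.

    Assumes both sequences come from the same alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For example, `pairwise_identity("ACGT-ACGTA", "ACGTTACGAA")` compares nine ungapped columns, of which eight match, giving about 88.9%.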
BLAST analysis showed that WK3 shares its highest nucleotide sequence identity, 89.62%, with a PaLCuV isolate (MK087120) from Karnataka, India, followed by 89.26% with BYVBhV (FJ589571) from Orissa and 88.02% with croton yellow vein mosaic virus (CroYVMV) (JN831446) (Supplementary Table 1). Pairwise nucleotide sequence identity analysis with SDT showed <95% sequence identity with other begomovirus sequences (Supplementary Figure 1). A phylogenetic tree constructed with GTR+G as the best-fit model placed WK3 within the BYVBhV clade (Figure 2).
Recombination analysis using RDP4 revealed three recombinant regions in the begomovirus isolate WK3, supported by 7 algorithms (Supplementary Table 2). Two regions spanning the ORFs AC3, AC2, AC1, and AC4 (nt coordinates 1195-1791 and 2159-2314) were derived from tomato leaf curl Iran virus (ToLCIRV; AY297924), whereas the third region (nt coordinates 2338-2776), falling in ORFs AC1 and AC4, was derived from BYVBhV (FJ589571) (Supplementary Figure 2). These results suggest that the newly characterized isolate WK3 evolved through recombination of genomic fragments from ToLCIRV and BYVBhV in a CroYVMV background, which possibly occurred in the weed host Croton bonplandianus. The present study therefore supports the hypothesis that weeds are major sites of begomovirus recombination and evolution. Based on these results and on the ICTV begomovirus species demarcation threshold (91%), the identified begomovirus represents a new species, for which the name Croton yellow vein leaf curl virus (Accession no. OM141479) is proposed.
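The species demarcation logic can be sketched as a simple threshold check. The example below uses the three BLAST identities reported above as stand-ins for pairwise identities; the function `classify` is a hypothetical helper, and treating an identity exactly equal to 91% as "same species" is an assumption of this sketch rather than something stated in the text.

```python
# ICTV begomovirus species demarcation threshold cited in the text (percent).
ICTV_SPECIES_THRESHOLD = 91.0

# Nucleotide identities of WK3 to its closest relatives, as reported above.
reported_identities = {
    "PaLCuV (MK087120)": 89.62,
    "BYVBhV (FJ589571)": 89.26,
    "CroYVMV (JN831446)": 88.02,
}


def classify(identities, threshold=ICTV_SPECIES_THRESHOLD):
    """Return (closest relative, its identity, verdict) for a query isolate."""
    best_name, best_value = max(identities.items(), key=lambda item: item[1])
    if best_value >= threshold:  # boundary handling is an assumption
        verdict = "isolate of an existing species"
    else:
        verdict = "candidate new species"
    return best_name, best_value, verdict
```

With the values reported here, `classify(reported_identities)` returns PaLCuV (MK087120) at 89.62% as the closest relative, which is below the 91% threshold and is therefore consistent with the proposal of a new species.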