The family Luteoviridae includes three important plant-pathogenic genera, Enamovirus, Luteovirus and Polerovirus, which are distinguished on the basis of their genome organization and their replication and expression strategies. Members of the family Luteoviridae possess monopartite, positive-sense, single-stranded RNA genomes 5–6 kb in length with five or six major open reading frames (ORFs) [8, 21]. Poleroviruses are economically important because they affect both the quality and the quantity of the economic produce. In general, poleroviruses are transmitted by aphids in a persistent, circulative manner.
In recent years, plant RNA datasets and contigs available in the NCBI Sequence Read Archive (SRA) and Transcriptome Shotgun Assembly (TSA) databases, respectively, have emerged as a major resource for the discovery of putative novel viruses [2, 7, 22, 23]. Viruses identified in this way can be regarded as bona fide viruses in light of the proposal by Simmonds and colleagues to incorporate viruses identified only from metagenomic data into the official taxonomy of the International Committee on Taxonomy of Viruses (ICTV), so that the global virome can be characterized comprehensively. Considering the widespread occurrence of poleroviruses in plants and the public availability of transcriptomes from a large number of plant species, we hypothesized that novel poleroviral sequences might be present in publicly available plant transcriptomes. The present study therefore aimed to mine publicly available plant transcriptomes for novel poleroviral sequences whose discovery would otherwise require an expensive next-generation sequencing (NGS)-based viromic study of each individual plant species.
To identify poleroviral sequences, the reference RNA-dependent RNA polymerase (RdRp) sequence (P1−P2) of potato leafroll virus (PLRV) (NP_056748.3) was used as the query in a tBLASTn search (expect threshold: 0.05; word size: 6; matrix: BLOSUM62) against the TSA database, limited to Viridiplantae (taxid:33090). The resulting hits were shortlisted as putative poleroviral contigs on the basis of an e-value cut-off (1e-50) and query coverage (>50%). Only hits longer than 4 kb were considered, to ensure reliable identification and to avoid possible misidentification based on shorter contigs. Putative poleroviral contigs were further analysed for intact ORFs using NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/). Where coding-complete genomes could not be identified, all available RNA-seq datasets of the corresponding plant species, including those from which the contigs originated, were retrieved, trimmed using Trimmomatic v.0.39, assembled using SPAdes v.3.13.1, and subjected to BLASTn analysis (e-value cut-off: 1e-5) against the recovered putative poleroviral contigs and the PLRV reference genome (RefSeq NC_001747) using NCBI BLAST+ v.2.9.0 to obtain coding-complete genomes. The protein sequences encoded by the recovered genomes were then subjected to molecular weight estimation and to motif and transmembrane helix (TMH) prediction using previously described tools. The −1 ribosomal frameshift sites in the recovered genomes were predicted using the KnotInFrame tool (https://bibiserv.cebitec.uni-bielefeld.de/knotinframe), and ORF3a was predicted by aligning the PLRV RefSeq with the recovered poleroviral genomes using the ClustalW tool in MEGA7. Conserved domains in the protein sequences of the identified viruses were determined after MUSCLE alignment in MEGA7 and visualized in WebLogo 3 (https://weblogo.berkeley.edu/).
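The three filters applied to the tBLASTn hits (e-value, query coverage and contig length) can be expressed as a short script. This is a minimal sketch, not the authors' pipeline: it assumes, hypothetically, that the search results were exported as BLAST+ tabular output with the columns `sseqid evalue qcovs slen`, and the names `Hit`, `passes` and `shortlist` as well as the contig IDs in the example are invented for illustration.

```python
"""Shortlist putative poleroviral TSA contigs from tBLASTn tabular output.

Hypothetical input: BLAST+ run with -outfmt "6 sseqid evalue qcovs slen",
i.e. one tab-separated line per HSP giving subject ID, e-value,
query coverage per subject (%) and subject (contig) length in nt.
"""
from dataclasses import dataclass
from typing import Iterable, Iterator, List

EVALUE_MAX = 1e-50   # e-value cut-off from the text
QCOV_MIN = 50.0      # query coverage must exceed 50%
LENGTH_MIN = 4000    # only contigs longer than 4 kb are kept


@dataclass(frozen=True)
class Hit:
    contig: str
    evalue: float
    qcov: float
    length: int


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tabular BLAST lines, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        sseqid, evalue, qcovs, slen = line.rstrip("\n").split("\t")[:4]
        yield Hit(sseqid, float(evalue), float(qcovs), int(slen))


def passes(hit: Hit) -> bool:
    """Apply the e-value, coverage and length filters to one hit."""
    return hit.evalue <= EVALUE_MAX and hit.qcov > QCOV_MIN and hit.length > LENGTH_MIN


def shortlist(lines: Iterable[str]) -> List[str]:
    """Return unique IDs of contigs with at least one passing hit, in input order."""
    kept: List[str] = []
    for hit in parse_hits(lines):
        if passes(hit) and hit.contig not in kept:
            kept.append(hit.contig)
    return kept


if __name__ == "__main__":
    example = [  # invented contig IDs and values
        "contigA\t0.0\t98\t5650",
        "contigB\t3e-40\t95\t5600",   # fails the e-value cut-off
        "contigC\t1e-120\t45\t5600",  # fails the coverage filter
        "contigD\t1e-120\t80\t1350",  # shorter than 4 kb
    ]
    print(shortlist(example))  # ['contigA']
```

Deduplicating by contig ID matters because BLAST can report several HSPs for the same subject, each repeating the same `qcovs` value.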
A phylogenetic tree was constructed in MEGA7 using the Neighbor-Joining (NJ) method and the Poisson model with 1000 bootstrap replicates, after MUSCLE alignment of the RdRp sequences (P1−P2) of the identified viruses and of retrieved poleroviral and enamoviral sequences. Using the recovered coding-complete genomes as queries in MEGABLAST analysis (expect threshold: 0.05; word size: 28), all available RNA-seq datasets of the corresponding plant species were individually searched for the identified viral sequences. Libraries containing at least 10 viral reads were regarded as virus-positive.
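Two numerical rules in this paragraph, the Poisson-corrected amino-acid distance from which an NJ tree is built and the 10-read threshold for calling a library virus-positive, are sketched below. This is not MEGA7's implementation: the Poisson correction shown is d = −ln(1 − p), where p is the proportion of differing amino acids, gap handling (pairwise deletion of gapped columns) is an assumption, and the `(library_id, read_id)` input format and library IDs are hypothetical.

```python
"""Poisson-corrected amino-acid distances and a read-count rule for positive libraries.

Sketch only; gap handling and the read-hit input format are assumptions.
"""
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

MIN_VIRAL_READS = 10  # libraries with at least 10 viral reads are called positive


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion,
    an assumption; MEGA also offers complete deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def poisson_distance(a: str, b: str) -> float:
    """Poisson correction d = -ln(1 - p); treated as infinite when p = 1."""
    p = p_distance(a, b)
    return math.inf if p >= 1.0 else -math.log1p(-p)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All pairwise Poisson distances, the kind of input an NJ tree is built from."""
    return {(i, j): poisson_distance(seqs[i], seqs[j]) for i in seqs for j in seqs}


def positive_libraries(read_hits: Iterable[Tuple[str, str]]) -> List[str]:
    """read_hits: hypothetical (library_id, read_id) pairs from MEGABLAST matches.

    Duplicate pairs (e.g. one read with several HSPs) are counted once.
    """
    counts = Counter(lib for lib, _ in set(read_hits))
    return sorted(lib for lib, n in counts.items() if n >= MIN_VIRAL_READS)


if __name__ == "__main__":
    # One substitution among five ungapped columns: p = 0.2, d = -ln(0.8)
    print(round(poisson_distance("MKV-LA", "MRVALA"), 4))  # 0.2231
```

Counting unique reads rather than raw BLAST lines keeps a single read with several HSPs from inflating a library's count toward the threshold.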
A total of five putative novel poleroviruses and one putative novel enamovirus, tentatively named Celmisia lyallii enamovirus (ClEV), Foeniculum vulgare polerovirus (FvPV), Kalanchoe marnieriana polerovirus (KmPV), Paspalum notatum polerovirus (PnPV), Piper methysticum polerovirus (PmPV) and Trachyspermum ammi polerovirus (TaPV), were identified in various tissues of Celmisia lyallii, Foeniculum vulgare, Kalanchoe marnieriana, Paspalum notatum, Piper methysticum and Trachyspermum ammi, respectively (Table 1). Interestingly, KmPV and PnPV were also identified in small RNA (sRNA) libraries of the respective plant species (Table 1). Coding-complete genomes (5.56−5.70 kb) of ClEV, KmPV, PnPV and TaPV were obtained directly from the initial tBLASTn analysis, whereas the coding-complete genome of PmPV (5.74 kb) was obtained only after assembly of one of the transcriptome libraries of P. methysticum. However, the coding-complete genome of FvPV could not be obtained even after assembly of the available transcriptome libraries of F. vulgare, although two non-redundant FvPV contigs (1.35 kb and 4.20 kb) were recovered (Table S1).
The genomes of FvPV, KmPV, PnPV, PmPV and TaPV contain seven ORFs, designated ORF0, 1, 2, 3a, 3, 4 and 5, which encode the P0, P1, P1−P2, P3a, P3, P4 and P3−P5 proteins, respectively, whereas the ClEV genome has five ORFs, ORF0, 1, 2, 3 and 5, encoding the P0, P1, P1−P2, P3 and P3−P5 proteins, respectively (Fig. 1). The P0 proteins of the identified viruses ranged from 243 to 299 aa in size, with estimated molecular weights of 28.0−34.1 kDa, and all except that of ClEV contained a luteovirus P0 motif. The P1 proteins of the identified viruses (620−810 aa; 68.4−90.1 kDa) possessed a peptidase S39 motif (Table S1), and the conserved H(X25)D(X70–80)GXSG domain of S39 peptidases was observed in the P1 protein of all the identified viruses (Fig. S1a). ORF1 contains a −1 ribosomal frameshift site at nt position 1998 (ClEV), 1665 (FvPV), 1564 (KmPV), 1586 (PnPV), 1743 (PmPV) and 1495 (TaPV), which allows ORF2 of the identified viruses to be expressed as a P1−P2 fusion protein. The heptanucleotide slippery sequence for the ribosomal frameshift in ClEV is TTTAAAC, identical to that of grapevine enamovirus 1 (GEV-1). By contrast, the slippery sequence of FvPV, KmPV, PnPV, PmPV and TaPV is GGGAAAC, identical to that of cardamom polerovirus. As in other poleroviruses and enamoviruses, a 40-nt knotted structure was predicted immediately downstream of the slippery sequence in the genomes of all the identified viruses. The P1−P2 proteins of the identified viruses (1044−1243 aa; 117.6−139.3 kDa) contained a viral RdRp motif in addition to the peptidase S39 motif (Table S1), and the conserved GXXXTXXXN(X25–40)GDD motif was observed in the P1−P2 fusion proteins of all the identified viruses (Fig. S1b). Three TMHs were predicted in the P1 and P1−P2 proteins of FvPV, PmPV and TaPV, whereas two, four and five TMHs were predicted in those of KmPV, PnPV and ClEV, respectively (Table S1).
Notably, the predicted TMHs are located in the amino-terminal regions, as in other poleroviruses, where they may facilitate the formation of replication complexes. ORF3a, predicted in all the identified viral genomes except that of ClEV, encodes the P3a protein (44−45 aa; 4.9−5.0 kDa) from a non-canonical start codon. As in other poleroviruses, a TMH was predicted in the P3a protein of each identified virus that contains ORF3a. The P3 proteins of the identified viruses (192−211 aa; 20.7−23.7 kDa) contained a luteovirus coat protein domain. The P4 protein (174−196 aa; 19.6−21.9 kDa), containing a putative luteoviral VPg genome-linked protein motif, was encoded by all the identified viruses except ClEV. ORF5 is possibly expressed via translational read-through of the ORF3 stop codon, producing a P3−P5 protein (456−698 aa; 50.5−78.4 kDa) with a PLRV read-through protein motif in the identified viruses (Table S1). The length and molecular weight of the FvPV P3−P5 protein could not be determined because its coding-complete genome was not recovered; however, a smaller contig (1.35 kb) containing a 5′-truncated ORF5 sequence encoding a truncated protein with a PLRV read-through protein motif was recovered (Table S1). The ORF3 stop codon read-through site sequence of all the identified viruses, including ClEV, is AAAUAGGUA, identical to that of other poleroviruses. The C-rich block with the CCNNNN tandem repeat motif commonly found in luteoviruses was identified 9−12 nt downstream of the read-through site in all the identified viruses except FvPV, each containing one CCCCA motif.
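The two sequence signals described above, the slippery heptamer of the −1 frameshift site and the AAAUAGGUA context of the ORF3 read-through stop codon, can be located with simple pattern searches. The sketch below assumes the commonly cited X XXY YYZ heuristic for slippery sites (X any nucleotide, Y = A or U, Z not G), which both GGGAAAC and TTTAAAC satisfy. It does not model the downstream pseudoknot (as KnotInFrame does) and does not check the reading frame, so its hits are only candidates; the function names are hypothetical.

```python
"""Locate candidate -1 frameshift slippery heptamers and ORF3 read-through contexts.

Heuristic sketch: slippery site = X XXY YYZ with X any nucleotide, Y in {A, U},
Z not G. No pseudoknot or reading-frame check is performed.
"""
import re
from typing import List, Tuple

# Lookahead lets overlapping heptamers be reported; \2 and \3 repeat X and Y.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACU]))")
READTHROUGH = re.compile(r"(?=AAAUAGGUA)")  # read-through context reported in the text


def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert T to U so DNA or RNA input works."""
    return seq.upper().replace("T", "U")


def slippery_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, heptamer as RNA) for every candidate slippery site."""
    rna = to_rna(seq)
    return [(m.start() + 1, m.group(1)) for m in SLIPPERY.finditer(rna)]


def readthrough_stops(seq: str) -> List[int]:
    """Return 1-based positions of the UAG stop codon inside each AAAUAGGUA match."""
    rna = to_rna(seq)
    return [m.start() + 4 for m in READTHROUGH.finditer(rna)]


if __name__ == "__main__":
    print(slippery_sites("GCGGGAAACUU"))      # [(3, 'GGGAAAC')]
    print(readthrough_stops("CCAAATAGGTACC"))  # [6]
```

Scanning a whole genome this way typically returns many candidate heptamers; in practice the candidate inside ORF1 that is followed by a predicted knotted structure is the one of interest.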
BLASTp analysis of the encoded proteins revealed sequence similarity of FvPV, KmPV, PnPV, PmPV and TaPV to other poleroviruses, and of ClEV to other enamoviruses (Table S1). In the phylogenetic tree based on P1−P2 sequences, ClEV grouped with citrus vein enation virus (CVEV), PnPV with cowpea polerovirus 2 (CPPV2) and TaPV with carrot red leaf virus (CtRLV). FvPV was placed in a clade sister to TaPV and CtRLV. KmPV was related to the poleroviruses of beet and turnip, whereas PmPV was distantly related to PnPV, CPPV2 and the poleroviruses of maize (Fig. 2).
All the proteins encoded by the identified viruses shared less than 90% sequence identity, at maximum query coverage, with the corresponding proteins of known poleroviruses/enamoviruses, except for P3a of PnPV. Members of the family Luteoviridae can be regarded as a new species if any of their encoded proteins shows more than 10% sequence divergence from the corresponding protein of the existing members; because divergence in a single protein is sufficient, the exception for P3a does not affect the status of PnPV. Thus, on the basis of the sequence-based species demarcation criteria, genome organization, predicted motifs and phylogeny, FvPV, KmPV, PnPV, PmPV and TaPV can be regarded as putative novel members of the genus Polerovirus, and ClEV as a putative novel enamovirus. It is worth mentioning that reads of some of the identified viruses were detected in mRNA/sRNA libraries derived from different tissues and/or cultivars of the respective plant species, suggesting that these viruses may be widespread in nature. The plant species in which the novel poleroviruses were identified include economically important culinary spice crops of the family Apiaceae, F. vulgare and T. ammi [10, 16]; a traditionally important medicinal plant, P. methysticum; an important grass of natural grasslands in the Western Hemisphere, P. notatum; and a wild relative of an important potted plant, K. marnieriana. Furthermore, the phylogenetic analysis grouped the poleroviruses of plants belonging to the Apiaceae in a monophyletic clade, suggesting that they may have originated from a common ancestor. ClEV, on the other hand, forms a distinct subgroup of enamoviruses along with CVEV and pepper enamovirus. Interestingly, a cytorhabdovirus, Trachyspermum ammi virus 1, and a putative tymovirus, Kava virus 1, were identified in the same BioProjects from which TaPV and PmPV, respectively, were discovered in the present study.
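The sequence-based demarcation rule reduces to a simple check on the best amino-acid identity of each encoded protein to its counterparts in known members: a virus qualifies if at least one protein is more than 10% divergent. The sketch below is illustrative, not an ICTV tool; it assumes BLAST-style percent identity (identical columns divided by alignment length, gaps counted as mismatches), and the protein identities in the example are hypothetical, not values from Table S1.

```python
"""Check the Luteoviridae amino-acid divergence criterion for species demarcation.

Sketch only: identities are hypothetical; percent identity is computed BLAST-style
(identical columns / alignment length, gaps counted as mismatches).
"""
from typing import Dict, List

DIVERGENCE_THRESHOLD = 10.0  # >10% divergence in any encoded protein


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * same / len(a)


def divergent_proteins(best_identity: Dict[str, float],
                       threshold: float = DIVERGENCE_THRESHOLD) -> List[str]:
    """Proteins whose best identity to any known member implies >threshold% divergence."""
    return sorted(p for p, ident in best_identity.items() if 100.0 - ident > threshold)


def meets_species_criterion(best_identity: Dict[str, float]) -> bool:
    """One sufficiently divergent protein is enough under this criterion."""
    return bool(divergent_proteins(best_identity))


if __name__ == "__main__":
    # Hypothetical best identities (%) of each protein to its closest known homologue.
    example = {"P0": 45.0, "P3": 85.0, "P3a": 93.0}
    print(divergent_proteins(example))       # ['P0', 'P3']
    print(meets_species_criterion(example))  # True
```

The example mirrors the PnPV situation in form only: one protein above 90% identity does not block the assignment as long as another protein is sufficiently divergent.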
In the present study, five putative novel poleroviruses and a putative novel enamovirus were identified in six plant species by mining public-domain databases. In addition to serving as a valuable resource for the development of detection assays, the recovered genome sequences will help to improve understanding of the evolution and genomic features of these virus groups. Further studies are needed to understand the biological properties, distribution and economic importance of the identified viruses.