Exploration of plant transcriptomes reveals five putative novel poleroviruses and an enamovirus

Transcriptome datasets available in the public domain serve as a valuable resource for the identification and characterization of novel viral genomes. Poleroviruses are economically important plant-infecting RNA viruses belonging to the family Solemoviridae. In the present study, we explored the plant transcriptomes available in the public domain and identified five putative novel poleroviruses, tentatively named Foeniculum vulgare polerovirus (FvPV), Kalanchoe marnieriana polerovirus (KmPV), Paspalum notatum polerovirus (PnPV), Piper methysticum polerovirus (PmPV) and Trachyspermum ammi polerovirus (TaPV), and a novel enamovirus named Celmisia lyallii enamovirus (ClEV), in Foeniculum vulgare, Kalanchoe marnieriana, Paspalum notatum, Piper methysticum, Trachyspermum ammi and Celmisia lyallii, respectively. Coding-complete genomes (5.56–5.74 kb) of ClEV, KmPV, PnPV, PmPV and TaPV were recovered, while only a partial genome of FvPV could be recovered. The genome organization of the identified viruses except ClEV is 5′–ORF0–ORF1–ORF2–ORF3a–ORF3–ORF4–ORF5–3′, while that of ClEV is 5′–ORF0–ORF1–ORF2–ORF3–ORF5–3′. Phylogenetic analysis revealed that poleroviruses of apiaceous plants formed a monophyletic clade within the genus Polerovirus.

The family Solemoviridae includes four genera of plant pathogens, Enamovirus, Polemovirus, Polerovirus and Sobemovirus, which are distinguished based on their genome organization [1]. Members of the family Solemoviridae possess monopartite, positive-sense, single-stranded RNA genomes of 4-6 kb that contain four to ten open reading frames (ORFs) [1]. Poleroviruses are economically important as they affect the quality and quantity of crop produce [2]. In general, poleroviruses are transmitted by aphids in a persistent circulative manner [2].
In recent times, plant RNA datasets and contigs available in the NCBI Sequence Read Archive (SRA) and Transcriptome Shotgun Assembly (TSA) databases, respectively, have emerged as a resource for the discovery of putative novel viruses [3][4][5][6]. Viruses identified in this way can be regarded as bona fide ones based on the report by Simmonds and colleagues [7], who suggested that viruses identified only from metagenomic data be incorporated into the official taxonomy scheme of the International Committee on Taxonomy of Viruses (ICTV) for comprehensive characterization of the global virome. Considering the widespread occurrence of poleroviruses in plants and the availability of transcriptomes of a large number of plant species in the public domain, we hypothesized that novel poleroviral sequences might be present in publicly available plant transcriptomes. Thus, the present study aimed to explore the publicly available plant transcriptomes for the discovery of novel poleroviral sequences, which would otherwise require an expensive next-generation sequencing (NGS)-based viromic study in each individual plant species.
Division of Genetics and Tree Improvement, Institute of Forest Biodiversity (ICFRE), Hyderabad, India

Resulting hits were further shortlisted as putative poleroviral contigs based on the e-value cut-off (1e-50) and query coverage (> 50%). Only hits longer than 4 kb were considered, to ensure reliable identification and to avoid possible misidentification from smaller contigs. Putative poleroviral contigs were further analysed for the presence of intact ORFs using NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/). In cases where coding-complete genomes could not be identified, all the available RNA-seq datasets of the corresponding plant species, including those from which the contigs originated, were retrieved, trimmed using Trimmomatic v.0.39 [8], assembled using SPAdes v.3.13.1 [9] and subjected to BLASTn analysis (e-value cut-off: 1e-5) against the recovered putative poleroviral contigs and the potato leafroll virus (PLRV) reference genome (RefSeq) (NC_001747) using NCBI BLAST+ v.2.9.0 to obtain coding-complete genomes. Further, the protein sequences encoded by the recovered genomes were subjected to molecular weight estimation and to motif and transmembrane helix (TMH) prediction using the tools mentioned in [5]. The −1 ribosomal frameshift sites in the recovered genomes were predicted using the KnotInFrame tool (https://bibiserv.cebitec.uni-bielefeld.de/knotinframe), while ORF3a was predicted by aligning the PLRV RefSeq with the recovered poleroviral genomes using the CLUSTALW tool in MEGA7 [10]. Conserved domains in the protein sequences of the identified viruses were determined after MUSCLE alignment in MEGA7 and visualized in WebLogo 3 (https://weblogo.berkeley.edu/) [11]. To estimate the size distribution of sRNA reads that mapped to the recovered viral genomes, raw reads were first trimmed using the trimming tool (quality score: 0.001) available in CLC workbench v.20.0.4 to obtain 15-30 nt trimmed sRNA reads.
Trimmed reads were then mapped onto the viral genome recovered from the respective plant species using the mapping tool (default settings) in CLC workbench. A phylogenetic tree was constructed using the neighbour-joining (NJ) method and the Poisson model with 1000 bootstrap replicates after MUSCLE alignment of the RdRp sequences (P1-P2) of the identified viruses and of retrieved poleroviral and enamoviral sequences in MEGA7. Using the recovered coding-complete genomes as queries in MEGABLAST analysis (expect threshold: 0.05, word size: 28), all the available RNA-seq datasets of the corresponding plant species were individually searched for the presence of the identified poleroviral sequences. Libraries containing at least 10 viral reads were regarded as virus-positive.
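The contig shortlisting criteria described above (e-value cut-off of 1e-50, query coverage above 50% and contig length above 4 kb) can be expressed as a simple filter. The following is a minimal sketch and not the pipeline used in this study: it assumes that the hits have been exported to a tab-separated table with hypothetical column names (contig_id, evalue, query_coverage, contig_length), and the example rows are invented for illustration only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    contig_id: str
    evalue: float
    query_coverage: float  # percent of the query covered by the hit
    contig_length: int     # contig length in nucleotides


def parse_hits(tsv_text):
    """Parse a tab-separated hit table with a header row (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        Hit(
            contig_id=row["contig_id"],
            evalue=float(row["evalue"]),
            query_coverage=float(row["query_coverage"]),
            contig_length=int(row["contig_length"]),
        )
        for row in reader
    ]


def shortlist(hits, max_evalue=1e-50, min_coverage=50.0, min_length=4000):
    """Keep hits with e-value <= 1e-50, coverage > 50% and length > 4 kb."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.query_coverage > min_coverage
        and h.contig_length > min_length
    ]


# Invented example rows; they do not correspond to any real contig.
EXAMPLE_TSV = (
    "contig_id\tevalue\tquery_coverage\tcontig_length\n"
    "contig_A\t1e-120\t85.0\t5600\n"   # passes all three criteria
    "contig_B\t1e-20\t90.0\t5700\n"    # e-value too high
    "contig_C\t1e-80\t40.0\t5800\n"    # query coverage too low
    "contig_D\t1e-90\t75.0\t1350\n"    # contig too short
)

if __name__ == "__main__":
    for hit in shortlist(parse_hits(EXAMPLE_TSV)):
        print(hit.contig_id)
```

Keeping the three thresholds as keyword arguments makes it straightforward to relax a single criterion, for example the length cut-off, when searching for partial genomes.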
The genomes of FvPV, KmPV, PnPV, PmPV and TaPV (BK059371-BK059376) contain seven ORFs, designated ORF0, 1, 2, 3a, 3, 4 and 5, which code for the P0, P1, P1-P2, P3a, P3, P4 and P3-P5 proteins, respectively, while that of ClEV (BK059370) possesses five ORFs (ORF0, 1, 2, 3 and 5) encoding the P0, P1, P1-P2, P3 and P3-P5 proteins, respectively (Fig. 1). The P0 protein of the identified viruses ranged from 243 to 299 aa in size, with an estimated molecular weight of 28.0-34.1 kDa, and contained a polerovirus P0 motif in all viruses except ClEV. The P1 protein of the identified viruses (620-810 aa), with molecular weights ranging from 68.4 to 90.1 kDa, possessed a peptidase S39 motif (Table S1). The conserved H(X25)D(X70-80)GXSG domain of S39 peptidase [13] was observed in the P1 protein of all the identified viruses (Fig. S4a). The heptanucleotide slippery sequence facilitating ribosomal frameshift in ClEV is TTT AAA C, which is identical to that of grapevine enamovirus 1 (GEV-1) [13]. In contrast, the slippery sequence of FvPV, KmPV, PnPV, PmPV and TaPV is GGG AAA C, which is identical to that of cardamom polerovirus [6]. Similar to other poleroviruses and enamoviruses [4], a knotted structure of 40 nt length was predicted immediately downstream of the slippery sequence in the genome sequences of all the identified viruses. The P1-P2 protein of the identified viruses (1044-1243 aa), with estimated molecular weights of 117.6-139.2 kDa, contained a viral RdRp motif in addition to the peptidase S39 motif (Table S1). The conserved GXXXTXXXN(X25-40)GDD motif [2] was observed in the P1-P2 fusion protein of all the identified viruses (Fig. S4b). Three TMHs were predicted in the P1 and P1-P2 proteins of FvPV, PmPV and TaPV, while two, four and five TMHs were predicted in the P1 and P1-P2 proteins of KmPV, PnPV and ClEV, respectively (Table S1). It is noteworthy that the predicted TMHs are located in the amino-terminal regions, as in other poleroviruses, possibly facilitating the formation of replication complexes [2].
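The two conserved motifs described above can be written as regular expressions, with X read as any residue and the subscripts as spacer lengths. The following is a minimal sketch and not the procedure used in this study, which relied on MUSCLE alignment and WebLogo; the function name find_motif and the synthetic sequences are hypothetical and serve only to illustrate the spacing constraints.

```python
import re

# X is read as "any amino acid"; spacer lengths follow the motif notation.
S39_PEPTIDASE = re.compile(r"H[A-Z]{25}D[A-Z]{70,80}G[A-Z]SG")  # H(X25)D(X70-80)GXSG
RDRP_GDD = re.compile(r"G[A-Z]{3}T[A-Z]{3}N[A-Z]{25,40}GDD")    # GXXXTXXXN(X25-40)GDD


def find_motif(protein, pattern):
    """Return the 0-based (start, end) span of the first match, or None."""
    match = pattern.search(protein.upper())
    return match.span() if match else None


# Synthetic, biologically meaningless sequences built only to satisfy the spacing.
synthetic_p1 = "MMMMM" + "H" + "A" * 25 + "D" + "A" * 75 + "GASG" + "AAAAA"
synthetic_rdrp = "GAAATAAAN" + "A" * 30 + "GDD"

if __name__ == "__main__":
    print(find_motif(synthetic_p1, S39_PEPTIDASE))
    print(find_motif(synthetic_rdrp, RDRP_GDD))
```

A pattern search of this kind only confirms that the residues and spacing are present; it does not replace the alignment-based inspection, which also shows how well the flanking positions are conserved.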
ORF3a, predicted in all the identified viral genomes except ClEV, encodes protein P3a (44-45 aa) of 4.9-5.0 kDa from a non-canonical start codon. Similar to other poleroviruses [2], a TMH was predicted in P3a of the identified viruses containing ORF3a. The P3 protein (192-211 aa) of the identified viruses, with molecular weights of 20.7-23.7 kDa, contained a luteovirus coat protein domain. Protein P4 (174-196 aa) of 19.6-21.9 kDa, containing a putative movement protein motif, was encoded by all the identified viruses except ClEV. ORF5 is possibly expressed via translational read-through of the ORF3 stop codon to produce the P3-P5 protein (456-698 aa) of 50.5-78.4 kDa with a PLRV read-through protein motif in the identified viruses (Table S1). The length and molecular weight of the P3-P5 protein could not be determined for FvPV as its coding-complete genome was not recovered. However, a smaller contig (1.35 kb) containing 5′-truncated ORF5 sequences encoding a truncated protein with the PLRV read-through protein motif was recovered (Table S1). The sequence at the ORF3 stop codon read-through site of all the identified viruses, including the putative enamovirus ClEV, is AAA UAG GUA, which is identical to that of other poleroviruses [14]. A C-rich block with the CCNNNN tandem repeat motif, commonly found in luteoviruses and poleroviruses [15,16], was found 9-12 nt downstream of the read-through site in all the identified viruses except FvPV, with one CCCCA motif in each. BLASTp analysis of the encoded proteins revealed sequence similarities of FvPV, KmPV, PnPV, PmPV and TaPV to other poleroviruses and of ClEV to other enamoviruses (Table S1). The phylogenetic tree constructed based on the P1-P2 sequences grouped ClEV with citrus vein enation virus (CVEV), PnPV with cowpea polerovirus 2 (CPPV2) and TaPV with carrot red leaf virus (CtRLV). FvPV was placed in a sister clade to TaPV and CtRLV. KmPV was related to the poleroviruses of beet and turnip, while PmPV was distantly related to PnPV, CPPV2 and the poleroviruses of maize (Fig. 2).
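The read-through context AAA UAG GUA and the downstream CCNNNN tandem repeats can be located with a simple sequence scan. The following is a minimal sketch under stated assumptions and not the method used in this study: the function names are hypothetical, the offset of 9-12 nt is assumed here to be counted from the end of the UAG stop codon, and the test genome is a synthetic sequence invented for illustration.

```python
import re

READTHROUGH_CONTEXT = "AAAUAGGUA"        # AAA UAG GUA, stop codon in the middle
CC_UNIT = re.compile(r"CC[ACGU]{4}")     # one CCNNNN repeat unit


def to_rna(seq):
    """Upper-case the sequence and convert DNA letters to RNA."""
    return seq.upper().replace("T", "U")


def stop_codon_positions(genome):
    """0-based start positions of UAG within each AAA UAG GUA context."""
    rna = to_rna(genome)
    # A lookahead is used so that overlapping contexts would also be reported.
    return [m.start() + 3 for m in re.finditer(f"(?={READTHROUGH_CONTEXT})", rna)]


def count_tandem_units(rna, start):
    """Count consecutive CCNNNN units beginning exactly at position start."""
    count, pos = 0, start
    while CC_UNIT.match(rna, pos):
        count += 1
        pos += 6
    return count


def c_rich_block(genome, stop_pos, offsets=range(9, 13)):
    """Return (offset, repeat count) of the longest CCNNNN run that starts
    9-12 nt after the end of the stop codon (assumed offset convention)."""
    rna = to_rna(genome)
    anchor = stop_pos + 3  # first nucleotide after UAG
    best = (None, 0)
    for offset in offsets:
        units = count_tandem_units(rna, anchor + offset)
        if units > best[1]:
            best = (offset, units)
    return best


# Synthetic sequence invented for illustration; not from any recovered genome.
synthetic = "ACGACG" + "AAATAGGTA" + "AGAGAGAG" + "CCAAGA" + "CCGAGT" + "CCAGAA" + "GGGG"

if __name__ == "__main__":
    for stop in stop_codon_positions(synthetic):
        print(stop, c_rich_block(synthetic, stop))
```

If the offset is instead counted from the end of the nine-nucleotide context, only the anchor position in c_rich_block needs to change.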
All the proteins encoded by the identified viruses shared less than 90% sequence identity at maximum query coverage with any of the respective protein sequences of known poleroviruses/enamoviruses, except for P3a of PnPV. A member of the genera Enamovirus and Polerovirus can be regarded as a new species if any of its encoded proteins displays more than 10% sequence divergence from the corresponding protein sequences of the existing members [1]. Thus, based on the sequence-based species demarcation criteria, genome organization, predicted motifs and phylogeny, FvPV, KmPV, PnPV, PmPV and TaPV can be regarded as putative new members of the genus Polerovirus, while ClEV can be regarded as a putative novel enamovirus. It is worth mentioning that the reads of a few of the identified viruses were discovered in mRNA/sRNA libraries derived from different tissues and/or cultivars of the respective plant species, implying that these viruses are possibly widespread in nature. The plant species in which the novel poleroviruses were identified include economically important culinary spice crops of the family Apiaceae (F. vulgare and T. ammi) [17,18], a traditionally important medicinal plant (P. methysticum) [19], an important grass in natural grasslands of the Western Hemisphere (P. notatum) [20] and a wild relative of an important potted plant (K. marnieriana) [21]. Interestingly, two partial RdRp sequences available in GenBank (LT595018 and LT595019, from Greece), designated as fennel motley virus, shared 88.5%-89.1% nucleotide sequence similarity with FvPV, suggesting that these sequences possibly represent FvPV. Further, phylogenetic analysis grouped the poleroviruses of plants belonging to Apiaceae in a monophyletic clade, suggesting their possible origin from a common ancestor. On the other hand, ClEV forms a distinct subgroup of enamoviruses along with CVEV and pepper enamovirus.
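The sequence-based demarcation rule stated above (a new species if any encoded protein shows more than 10% divergence from the corresponding protein of existing members) reduces to a simple check on the best identity found for each protein. The following is a minimal sketch with hypothetical function names; the identity values are invented for illustration and are not the values reported in Table S1.

```python
def divergent_proteins(best_identity_pct, threshold=10.0):
    """Names of proteins whose divergence (100 - best identity) exceeds the threshold."""
    return sorted(
        name for name, identity in best_identity_pct.items()
        if 100.0 - identity > threshold
    )


def meets_species_criterion(best_identity_pct, threshold=10.0):
    """True if at least one protein is more than `threshold` percent divergent."""
    return bool(divergent_proteins(best_identity_pct, threshold))


# Invented best-hit identities (percent) for a hypothetical virus.
example_identities = {"P0": 45.2, "P1": 52.0, "P3": 78.4, "P3a": 95.5}

if __name__ == "__main__":
    print(divergent_proteins(example_identities))
    print(meets_species_criterion(example_identities))
```

Because the rule requires only one divergent protein, a single conserved protein (P3a in the invented example) does not prevent a virus from meeting the criterion.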
Interestingly, a cytorhabdovirus, Trachyspermum ammi virus 1 [3], and a putative tymovirus, Kava virus 1 [22], were identified in the same BioProjects from which TaPV and PmPV, respectively, were discovered in the current study.
In the present study, five putative novel poleroviruses and a novel enamovirus were identified in six plant species from public domain databases. Besides serving as a valuable resource for the development of detection assays, the recovered genome sequences of the identified viruses will help in better understanding the evolution and genomic features of the identified virus groups. Further studies are needed to understand the biological properties, distribution and economic importance of the identified viruses.

Fig. 2 Phylogenetic relationship of identified viruses to other poleroviruses and enamoviruses. The phylogenetic tree was constructed using the neighbour-joining (NJ) method and the Poisson model with 1000 bootstrap replicates. Only bootstrap values of more than 50% are indicated. Viruses identified in the present study are shown in bold. Poinsettia latent virus was used as the outgroup for phylogenetic tree construction.