Analysis of public domain plant transcriptomes expands the phylogenetic diversity of the family Secoviridae

Secoviruses are mono- or bipartite, icosahedral, plant-infecting RNA viruses that cause economically important diseases in plants. In the present study, nine secoviruses, tentatively named Ananas comosus secovirus (AcSV), Artocarpus altilis secovirus (AaSV), Boehmeria nivea secovirus (BnSV), Gynostemma pentaphyllum secovirus (GpSV), Orobanche cernua secovirus (OcSV), Paris polyphylla secovirus 1 (PpSV1), Paris polyphylla secovirus 2 (PpSV2), Rhododendron delavayi secovirus (RdSV), and Yucca gloriosa secovirus (YgSV), were identified by probing publicly available transcriptomes of eight plant species. Coding-complete genomes or genome segments, each encoding a polyprotein, were recovered for all the identified viruses. Two of the nine viruses, AcSV and GpSV, were also detected in a few of the small RNA libraries of their respective plant species. Putative cleavage sites were predicted in the polyproteins encoded by AcSV, GpSV, PpSV2 and YgSV genome segments. Phylogenetic and sequence identity analyses revealed that AcSV putatively belongs to the genus Sadwavirus (subgenus Cholivirus), GpSV and YgSV to the genus Fabavirus, PpSV1 to the genus Nepovirus, and RdSV to the genus Waikavirus, while AaSV, BnSV, and PpSV2 may represent a distinct group of viruses within the family Secoviridae, as they could not be conclusively assigned to a single genus.

Secoviridae is a family of non-enveloped icosahedral viruses that cause economically important diseases in plants [1]. The positive-sense single-stranded RNA genomes of secovirids are either monopartite or bipartite and are typically 9-13.7 kb in size (combined size of both RNAs for bipartite viruses). In most cases, the genome or genome segment contains a single large open reading frame (ORF) encoding a polyprotein that is cleaved into individual proteins by virus-encoded 3C-like proteinases. In general, secoviruses are transmitted by insects or nematodes, while a few members are also transmitted by seeds [1]. Currently, the family Secoviridae includes nine genera (Comovirus, Fabavirus, Nepovirus, Cheravirus, Sadwavirus, Torradovirus, Sequivirus, Waikavirus, and Stralarivirus) [1,2], with the genus Sadwavirus comprising three subgenera (Satsumavirus, Stramovirus, and Cholivirus) [3].
Owing to the steady increase in transcriptome sequencing projects, a substantial amount of data is being generated and deposited in the public domain Sequence Read Archive (SRA) and Transcriptome Shotgun Assembly (TSA) databases of the National Center for Biotechnology Information (NCBI). Besides plant sequences, plant transcriptome data can also contain viral sequences if the plants were infected with viruses at the time of sample collection. Thus, these databases serve as valuable resources for the comprehensive discovery of known and novel viral sequences from a wide range of hosts, which would otherwise require costlier viromic studies [4,5]. Viruses discovered from metagenomic datasets can be considered bona fide and included in the International Committee on Taxonomy of Viruses (ICTV) taxonomy [6]. Public domain plant transcriptomes may contain secoviral sequences whose discovery can broaden the known phylogenetic diversity of the family Secoviridae. Thus, in the present study, eight putative novel viruses were identified by mining public domain plant transcriptomes for secoviral sequences.
Edited by Seung-Kook Choi.
To identify secoviral RNA1 sequences in plant transcriptomes (Viridiplantae, taxid: 33090), tBLASTn analysis (word size: 6; e-value: 0.05; matrix: BLOSUM62) was performed against the TSA database (accessed in March 2022) using the RNA1-encoded polyprotein sequences of cowpea mosaic virus (NP_613283.1) and rice tungro spherical virus (NP_042507.1) as queries. Resulting contigs longer than 5 kb were retained and manually examined. If more than one redundant contig was obtained from the same sample, the longest intact contig was considered the putative RNA1 segment of the identified virus. Genome segments of the same virus identified from different samples or studies were deemed different isolates of that virus. To obtain the second genomic segment of each putative novel virus, the RNA2-encoded polyprotein sequence of the respective closely related virus was used as the query in tBLASTn searches, and the longest intact non-redundant contig was regarded as the putative RNA2 segment. Where the obtained viral contigs were partial or truncated, coding-complete genomes or genome segments were obtained by rnaviral-SPAdes (v 3.15.4) assembly [7] of the datasets from which those contigs were derived, after trimming with Trimmomatic (v 0.36) [8]. The assembled contigs were then subjected to BLASTn (v 2.10.1) [9] analysis against the recovered partial or truncated viral contigs. The bioinformatic analyses were carried out on the Galaxy Australia server (https://usegalaxy.org.au/) [10]. ORFs in the putative genome segments of the identified viruses were predicted using NCBI ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/). Molecular mass (MM) prediction and motif searches were performed as described in [11]. To determine the maximum sequence identity, at maximum query coverage, between the proteins encoded by the recovered viral genome segments and existing viral sequences, BLASTp analysis was performed against the NCBI non-redundant (nr) database.
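The contig selection rule described above (keep contigs longer than 5 kb, then take the longest one per sample) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(sample_id, contig_id, length)` record layout, the function name and the example values are hypothetical, and the manual examination and "intact contig" judgement described in the text are not modelled.

```python
MIN_CONTIG_LENGTH = 5000  # "more than 5 kb", as stated in the text


def pick_longest_contig_per_sample(hits, min_length=MIN_CONTIG_LENGTH):
    """Return {sample_id: (contig_id, length)} for the longest qualifying contig.

    `hits` is an iterable of (sample_id, contig_id, length) tuples. This
    layout is hypothetical and is not a BLAST output format.
    """
    best = {}
    for sample_id, contig_id, length in hits:
        if length <= min_length:
            continue  # discard contigs that are not longer than the cutoff
        current = best.get(sample_id)
        if current is None or length > current[1]:
            best[sample_id] = (contig_id, length)
    return best


# Illustrative values only; these are not data from the study.
example_hits = [
    ("sampleA", "contig_1", 5800),
    ("sampleA", "contig_2", 5200),
    ("sampleA", "contig_3", 4100),
    ("sampleB", "contig_9", 3000),
]
selected = pick_longest_contig_per_sample(example_hits)
```

In this toy input, only `contig_1` of `sampleA` is retained; `sampleB` yields no candidate because its single hit is shorter than the cutoff.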
Putative cleavage sites in the polyproteins encoded by the identified viral genome segments were predicted by multiple sequence alignment with the polyprotein sequences of related viruses. For phylogenetic analysis, the conserved protease-polymerase (Pro-Pol) amino acid (aa) sequences of secoviruses retrieved from the NCBI Virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/), together with the corresponding sequences of the identified viruses, were aligned with MUSCLE, and a phylogenetic tree was constructed using the neighbor-joining (NJ) method with the Poisson model and 1000 bootstrap replicates. Conserved protein domains in the identified viral sequences were visualized using WebLogo (v 3.7) (https://weblogo.berkeley.edu/) [12]. To identify virus-positive small RNA (sRNA) libraries, BLASTn analysis (word size: 28; e-value: 0.05) of the recovered viral genome sequences was carried out against the available sRNA libraries of the respective plant species, and libraries containing at least 10 viral reads were considered virus-positive.
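For readers unfamiliar with the Poisson model used for the NJ tree, the pairwise distance it feeds into tree building is commonly computed as d = -ln(1 - p), where p is the proportion of differing amino acid sites between two aligned sequences. The sketch below illustrates that calculation only. The function names are hypothetical, the pairwise skipping of gap columns is an assumption (the text does not state how gaps were treated), and the example sequences are made up.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)


# Made-up aligned fragments: 7 comparable sites, 2 of which differ.
example_distance = poisson_distance("MKT-LLGA", "MKSALLGV")
```

The correction matters more as sequences diverge: for small p the distance is close to p itself, while for large p it grows faster, reflecting multiple substitutions at the same site.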
The single large genomic segment of RdSV (12.4 kb) encodes a 422 kDa polyprotein with Waikavirus CP1, HEL, tungro spherical virus-type peptidase, and RdRp domains (Fig. 1, Table S1). BLAST analysis showed that the RdSV polyprotein shares a maximum amino acid sequence identity of 34.71% (89% query coverage) with Poaceae Liege virus 1 (Table S1). Phylogenetic analysis placed RdSV in a sub-clade distinct from other waikaviruses (Fig. 2). RNA1 of AaSV and BnSV was 5.8 kb and 5.9 kb long, respectively, while RNA2 of AaSV and BnSV was 4.9 kb and 3.2 kb long, respectively. Genome segments of two PpSV2 isolates (designated fruit and seed) were recovered in the present study. RNA1 and RNA2 of PpSV2 isolate fruit were 5.6 kb and 3.2 kb long, respectively, while those of PpSV2 isolate seed were 5.7 kb and 3.3 kb long, respectively. The RNA1-encoded polyprotein 1 of AaSV (MM 204 kDa) contains HEL and RdRp motifs, while those of BnSV and the PpSV2 isolates (MM 213 kDa and 204 kDa, respectively) contain HEL, picornain 3C and RdRp motifs. The RNA2-encoded polyprotein 2 of AaSV (MM 168 kDa) contains a large CP domain, while those of BnSV and the PpSV2 isolates (MM 107 kDa and 110 kDa, respectively) contain large and small CP domains (Fig. 1, Table S1). Cleavage sites were predicted in polyprotein 1 of PpSV2 isolate fruit (Fig. 1). BLAST analysis revealed that AaSV-encoded polyproteins shared 27.48% to 32.57% amino acid sequence identity (at 62% to 97% query coverage) with the corresponding sequences of Capsicum annuum fabavirus and a comoviral sequence; BnSV-encoded proteins shared 23.98% to 32.94% identity (at 96% to 99% query coverage) with bean rugose mosaic virus and BBWV2; and PpSV2-encoded proteins of both isolates shared 26.57% to 40.69% identity (at 95% to 99% query coverage) with PrVF and a comoviral sequence (Table S1).
Phylogenetic analysis grouped the PpSV2 isolates with BnSV, and these viruses fell in a clade distinct from the comoviruses and fabaviruses, while AaSV formed a distinct clade among the members of the family Secoviridae (Fig. 2).
The conserved GxxGxGKS motif found in plant picorna-like virus NTP-binding proteins and three conserved domains found in picorna-like virus RdRps [14] were identified (Fig. S1). Plants produce small RNA (sRNA) from viral replicative forms or intermediates in response to viral infection as part of their antiviral defense strategy [15]. Interestingly, AcSV and GpSV reads were identified in a few of the sRNA libraries of the respective plant species in which the viruses were originally identified (Table S2). In addition, BLASTp analysis showed that the PpSV1 RNA1-encoded polyprotein sequence shared 95.80% amino acid sequence identity (87% query coverage) with a partial secoviral sequence available in GenBank under accession number MZ679335.1, suggesting that this sequence belongs to PpSV1. Further, family-level assignment of the identified viral sequences was confirmed through the GRAViTy pipeline (http://gravity.cvr.gla.ac.uk) [16].
Based on the species demarcation criteria for the family Secoviridae (< 80% and < 75% amino acid sequence identities for the conserved Pro-Pol and CP, respectively) [1], genome organization, predicted motifs, sequence identities and phylogeny, GpSV and YgSV can be regarded as novel members of the genus Fabavirus, AcSV as a novel sadwavirus (subgenus Cholivirus), RdSV as a novel waikavirus, PpSV1 as a novel nepovirus, and AaSV, BnSV and PpSV2 as novel members of the family Secoviridae. Notably, two divergent secoviruses, PpSV1 and PpSV2, were identified in the same host, P. polyphylla var. yunnanensis, in which a stralarivirus, cnidium vein yellowing virus 2, was also identified (data not shown). In addition, YgSV was identified in the same sample in which a novel potexvirus, Yucca gloriosa virus 1, was identified [17]. Similarly, RdSV was identified in the sample in which a novel betanucleorhabdovirus (Rhododendron delavayi virus 1) and a novel betaflexivirus (Rhododendron delavayi virus 2) were identified earlier [4,17].
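The two sequence identity thresholds quoted above can be expressed as a simple check. The sketch below is illustrative only: the function name and the input values are hypothetical, and reporting the two thresholds separately rather than combining them into a single verdict is an assumption, since species assignment in the text also weighs genome organization, predicted motifs and phylogeny.

```python
PRO_POL_IDENTITY_THRESHOLD = 80.0  # percent aa identity, from the criteria cited in the text
CP_IDENTITY_THRESHOLD = 75.0       # percent aa identity, from the criteria cited in the text


def demarcation_report(pro_pol_identity, cp_identity):
    """Report which sequence identity criteria a candidate virus satisfies.

    Both inputs are percent amino acid identities to the closest known virus.
    The result keeps the two criteria separate because they can disagree.
    """
    return {
        "pro_pol_below_threshold": pro_pol_identity < PRO_POL_IDENTITY_THRESHOLD,
        "cp_below_threshold": cp_identity < CP_IDENTITY_THRESHOLD,
    }


# Hypothetical identities, not values from the study.
clear_case = demarcation_report(pro_pol_identity=62.0, cp_identity=48.5)
mixed_case = demarcation_report(pro_pol_identity=89.0, cp_identity=70.0)
```

A candidate such as `mixed_case`, which satisfies the CP criterion but not the Pro-Pol criterion, is the kind of result that cannot be settled by sequence identity alone.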
Although OcSV and TgCV shared 89.2% amino acid sequence identity in the conserved Pro-Pol sequence, the two viruses were sufficiently divergent (> 25% amino acid sequence divergence) in the CP polyprotein sequence. Moreover, TgCV was identified in Trillium govanianum [5], while OcSV was identified in O. cernua var. cumana in the present study. Thus, further studies on the biological properties of both viruses are needed to establish OcSV as a novel member of the genus Cheravirus. The present report will serve as a prelude to research aimed at exploring the biological properties of the identified viruses.