Virus diseases often cause serious yield losses in wheat fields in China and in many other countries [6]. According to the virus taxonomy released by the ICTV in 2019 (http://ictv.global/report), the order Tymovirales comprises five families: Alphaflexiviridae, Betaflexiviridae, Gammaflexiviridae, Deltaflexiviridae, and Tymoviridae. Betaflexiviridae is the largest family, with two subfamilies (Quinvirinae and Trivirinae) that together contain 13 genera and 108 species. The two subfamilies are distinguished by the type of viral movement protein: viruses in the subfamily Quinvirinae encode triple gene block (TGB) movement proteins, whereas viruses in the subfamily Trivirinae encode 30K-like movement proteins [1]. The genera Foveavirus, Carlavirus, and Robigovirus belong to the subfamily Quinvirinae, while the genera Capillovirus, Chordovirus, Citrivirus, Divavirus, Prunevirus, Ravavirus, Tepovirus, Trichovirus, Vitivirus, and Wamavirus are classified in the subfamily Trivirinae [5]. Viruses in the family Betaflexiviridae have a monopartite, positive-sense, single-stranded RNA genome of 6.5-9.0 kilobases (kb) with a 3' terminal polyadenylated tail. The genomes of viruses in some genera are capped at the 5' end. Virions of viruses in the family Betaflexiviridae are non-enveloped, flexuous filaments that are 600-1000 nm or more in length and 12-13 nm in diameter [1].
Recently, RNA-seq technology has been widely used to identify viruses in field-collected plant samples and has led to the identification of many novel plant-infecting viruses [10, 11]. In 2017, we sampled wheat plants showing leaf yellow mosaic and stunting from wheat fields in the Kaifeng region, Henan province, China (Fig. 1 a, b). Transmission electron microscopy of these samples revealed two different types of flexuous filamentous virus-like particles, measuring 200-500 × 13 nm (Fig. 1 c, d) and 1000-1300 × 13 nm (Fig. 1 e, f), respectively.
To characterize these two virus-like particles, total RNA was extracted from the sampled leaf tissues using the EASYspin Plus Complex Plant RNA Kit (Aidlab Biotech, Beijing, China) and subjected to RNA-seq. After removal of ribosomal RNAs, the RNA library was constructed using the TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA) and sequenced on the Illumina HiSeq X-ten platform (Biomarker Technologies, Beijing, China). Adapters were trimmed from the paired-end raw reads, and low-quality reads were filtered out using the CLC Genomics Workbench 9.5 software (QIAGEN, Germantown, DEU). The clean reads were first mapped to the wheat genome to identify and remove host sequences, and the remaining reads were assembled de novo into larger contigs using the Trinity program [2, 4, 8]. Sequences representing viral RNAs were reconstructed by manual alignment of closely related contigs. The assembled contigs were then used in BLASTn and BLASTx searches of the NCBI databases to identify sequences sharing high nucleotide (nt) sequence similarities and amino acid (aa) sequence identities. The 5' and 3' terminal sequences of the assembled contigs were determined by 5' and 3' RACE PCR according to the manufacturer's instructions (Roche Diagnostic, Milan, Italy). The amplified products were inserted individually into the pLB cloning vector (Tiangen, Beijing, China), transformed into Escherichia coli DH5 cells, and sequenced. The resulting nucleotide sequences and the predicted protein sequences were analyzed using the SnapGene software V4.1.9 (GSL Biotech, San Diego, CA, USA). Multiple alignments of nucleotide and amino acid sequences were performed using the MEGA X software with default settings, as described [3].
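The sequence identities reported in this study were produced by the programs cited above. Purely as an illustration of what a pairwise percent identity means, the following minimal Python sketch computes identity from two already-aligned sequences. It assumes that columns containing a gap in either sequence are excluded from the denominator; other tools may define the denominator differently. The function name `percent_identity` and the toy peptide strings are hypothetical and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption (not from the paper): columns with a gap in either
    sequence are skipped, so identity = matches / gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 4 identical residues in 5 gap-free columns.
toy_identity = percent_identity("MKT-LV", "MRTALV")
print(f"{toy_identity:.1f}%")
```

Because the choice of denominator changes the value, identities computed in this way are only comparable with each other, not necessarily with values reported by BLAST or MEGA.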
The BLASTn search identified a contig of 7621 nt that covered 99.8% (7621 nt out of 7635 nt) of wheat yellow mosaic virus (WYMV) genomic RNA1 and shared 99% nt sequence similarity with it (FJ361766.1). The BLASTn search also identified a contig of 3634 nt that covered 99.6% (3634 nt out of 3650 nt) of WYMV genomic RNA2 and shared 99% nt sequence similarity with it (FJ361769.1). This result supported the observation made by transmission electron microscopy (Fig. 1 c, d) and indicated that WYMV was a causal virus of the diseased wheat plants. In addition, another contig of 8403 nt was found to share the highest amino acid sequence identity (43.78%) with the replication-associated protein of peach chlorotic mottle virus (PCMV). This novel contig indicated the presence of an unknown virus in the infected wheat plants, possibly a member of the family Betaflexiviridae. To confirm this finding, we amplified the full-length sequence of the novel virus by RT-PCR followed by 5' and 3' RACE. Sequencing showed that the genomic RNA of the novel virus contained 8410 nt, excluding its poly(A) tail (GenBank accession: MW771279). ORFs in the viral genomic RNA were then predicted using the ORF Finder software (www.ncbi.nlm.nih.gov/orffinder), and conserved protein domains were predicted using the CDD software [7]. The results showed that this novel virus had six ORFs (Fig. 2 a), a 59 nt 5' untranslated region (UTR), and an 11 nt 3' UTR. ORF1 started at the AUG start codon at nt position 60 and ended at the UAA stop codon at nt position 6272. ORF1 was predicted to encode a putative RNA-dependent RNA polymerase (RdRp) of 2070 aa residues with an estimated molecular mass (Mr) of 233.8 kDa.
This putative RdRp contained four conserved domains: a methyltransferase domain (aa positions 44-357, pfam01660), a peptidase domain (aa positions 1080-1167, cl05111), a helicase domain (aa positions 1254-1506, cl26263), and an RNA-dependent RNA polymerase domain (aa positions 1678-1976, cl03049). Sequence alignment showed that the RdRp of this novel virus shared the highest aa sequence identity (60.9%) with the RdRp of banana virus X (BVX) (Table 1), followed by the RdRp of PCMV (37.0%). However, the novel virus ORF1 is 6213 nt long, whereas the ORF1s of BVX and PCMV are 906 nt and 6090 nt long, respectively (Fig. 2). Further analyses of ORF2, ORF3, and ORF4, which encode the triple gene block proteins (TGB1, TGB2, and TGB3), showed that they were similar to ORF2-4 of other viruses in the subfamily Quinvirinae [1]. ORF2 of the novel virus was predicted to start at the AUG codon at nt position 6286 and end at the UAG stop codon at nt position 6957 (Fig. 2). It was predicted to encode a 24.5 kDa polypeptide containing a conserved helicase domain (aa positions 24-218, pfam01443), and this polypeptide shared the highest aa sequence identity (42.5%) with the TGB1 of PCMV (Fig. 2, Table 1). ORF3 was predicted to start at nt position 6947 and end at nt position 7309. It was predicted to encode a 13.4 kDa protein containing a conserved viral movement protein domain (aa positions 5-104, cl03157), and this protein shared the highest aa sequence identity (43.2%) with the TGB2 of elderberry carlavirus A (EBCVA) (Fig. 2, Table 1). ORF4 was predicted to start at nt position 7218 and end at nt position 7475. It was predicted to encode a 9.2 kDa protein that shared the highest aa sequence identity (33.8%) with the TGB3 of butterbur mosaic virus (BMV) (Table 1). ORF5 was predicted to start at nt position 7487 and end at nt position 8278 (Fig. 2). It was predicted to encode a 28.5 kDa protein with a conserved flexi CP domain (aa positions 83-220, cl02836), and this protein shared the highest aa sequence identity (33.5%) with the CP of PCMV (Fig. 2, Table 1). ORF6 partially overlapped ORF5 and was predicted to start at nt position 7941 and end at nt position 8399. It was predicted to encode a 17.4 kDa protein of unknown function that had no significant sequence homology with other known proteins. Alignment of the genomic sequences of this novel virus and nine representative members of the subfamily Quinvirinae showed that the genome of this novel virus shared the highest nt sequence similarity (45.6%) with the genome of PCMV, followed by that of BVX (44.9%) (Table 1).
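The ORF2 to ORF6 coordinates given above can be cross-checked with simple arithmetic. The following Python sketch assumes that each reported end position is the last nucleotide of the stop codon, so an ORF spanning `start` to `end` is `end - start + 1` nt long and encodes one residue fewer than its number of codons. The class and function names are illustrative only, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int  # 1-based position of the first nt of the start codon
    end: int    # 1-based position of the last nt of the stop codon (assumed)

    @property
    def length_nt(self) -> int:
        return self.end - self.start + 1

    @property
    def length_aa(self) -> int:
        # The stop codon is not translated, hence the "- 1".
        if self.length_nt % 3 != 0:
            raise ValueError(f"{self.name}: length is not a multiple of 3")
        return self.length_nt // 3 - 1


def overlap_nt(a: Orf, b: Orf) -> int:
    """Number of genome positions shared by two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Coordinates as reported in the text for ORF2 to ORF6.
ORFS = [
    Orf("ORF2", 6286, 6957),
    Orf("ORF3", 6947, 7309),
    Orf("ORF4", 7218, 7475),
    Orf("ORF5", 7487, 8278),
    Orf("ORF6", 7941, 8399),
]

for orf in ORFS:
    print(f"{orf.name}: {orf.length_nt} nt, {orf.length_aa} aa")

# Overlaps between neighbouring ORFs.
for left, right in zip(ORFS, ORFS[1:]):
    print(f"{left.name}/{right.name} overlap: {overlap_nt(left, right)} nt")
```

Under these assumptions every listed ORF has a length divisible by three, and the overlap of ORF5 and ORF6 is consistent with the statement that ORF6 partially overlaps ORF5.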
To investigate the infectivity of this novel virus, leaves of plants showing typical yellow mosaic and stunting were ground in 0.1 M phosphate buffer, and the extract was mechanically inoculated onto several assay plants, including wheat, Nicotiana benthamiana, N. occidentalis, N. tabacum cv. Samsun, Chenopodium quinoa, C. amaranticolor, and Gomphrena globosa, grown in a greenhouse at 25°C. All inoculated plants remained symptomless during the study, and the novel virus was not detected in them by RT-PCR at 30 days post inoculation.
Phylogenetic analyses using the whole genome sequences of the novel virus and representative viruses from the five families in the order Tymovirales indicated that the novel virus was most closely related to members of the family Betaflexiviridae (Fig. 3 a). Phylogenetic trees constructed using the whole genome sequences and the RdRp or CP sequences of the novel virus and representative viruses in the family Betaflexiviridae showed that this novel virus should be placed in the subfamily Quinvirinae but is distinct from the members of the existing genera (Fig. 3 b, c, d). Alignment analyses showed that the genome and protein sequence similarities between the novel virus and the other members of the subfamily Quinvirinae ranged from 41.8% to 45.6% and from 14.7% to 60.9%, respectively (Table 1). For RdRp and CP, the aa sequence identities between the novel virus and the other members ranged from 22.9% to 37%. Although the BVX RdRp shared 60.9% aa sequence identity with the novel virus RdRp, the BVX RdRp is markedly shorter (only 301 aa, compared with 2070 aa for the novel virus), and the novel virus CP shared only 33.5% aa sequence identity with the BVX CP (Table 1). These genome, RdRp, and CP sequence differences exceed not only the cut-off values for species discrimination (i.e., identities below 72% for nt and 80% for aa sequences) but also the cut-off values for genus discrimination (i.e., identities below 45% for nt and aa sequences) [1]. Consequently, we consider the novel virus to be a new species in the subfamily Quinvirinae. Because of the differences in genome size, genome structure, and phylogenetic relationship with other members of the family Betaflexiviridae, we were unable to place this novel virus in any existing genus in the family.
Therefore, we tentatively named this novel virus wheat yellow stunt-associated betaflexivirus (WYSaBV) and propose to place it in a new genus in the family Betaflexiviridae.
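The demarcation reasoning above can be sketched as a simple threshold check. The Python example below uses the cut-off values quoted in the text (72% nt and 80% aa for species; 45% nt and aa for genus). How the criteria are combined across genes is not specified here, so the sketch assumes a single identity value is tested with a strict "less than" comparison; the function name `demarcation_level` and the labels it returns are hypothetical.

```python
# Cut-off values quoted in the text (percent identity).
SPECIES_CUTOFF = {"nt": 72.0, "aa": 80.0}
GENUS_CUTOFF = {"nt": 45.0, "aa": 45.0}


def demarcation_level(identity_pct: float, seq_type: str) -> str:
    """Classify one pairwise identity value against the quoted cut-offs.

    Assumptions (not from the paper): a single RdRp or CP identity is
    evaluated on its own, and "below the cut-off" means strictly less than.
    """
    if seq_type not in SPECIES_CUTOFF:
        raise ValueError("seq_type must be 'nt' or 'aa'")
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be between 0 and 100")

    if identity_pct < GENUS_CUTOFF[seq_type]:
        return "distinct genus"
    if identity_pct < SPECIES_CUTOFF[seq_type]:
        return "distinct species"
    return "same species"


# aa identities reported in the text for the novel virus versus PCMV.
print("RdRp vs PCMV:", demarcation_level(37.0, "aa"))
print("CP vs PCMV:", demarcation_level(33.5, "aa"))
```

A check of this kind covers only the sequence-identity criterion; genome organization and phylogenetic placement, which the text also relies on, are outside its scope.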
Several questions remain to be investigated, including: (i) Where did WYSaBV originate? (ii) What damage can a single infection with WYSaBV cause in wheat plants? (iii) How is WYSaBV transmitted in the field? (iv) Given that WYSaBV and WYMV co-infect wheat and that WYMV is transmitted by the soil-borne fungus-like organism Polymyxa graminis [9], can P. graminis also transmit WYSaBV in the field? Nonetheless, the identification of WYSaBV in wheat plants in this study indicates that caution is needed when developing effective management strategies for virus disease control in wheat fields in China.