Metatranscriptomics and small RNA analysis revealed that viral covert coinfection resulted in disease symptoms reminiscent of rice sterility

Viral pathogens are a major threat to stable crop production. The discovery of viral diseases traditionally concerns apparent infection that shows obvious symptoms in crop plants. However, little is known about the covert infection of crop plants by viruses. In this study, we used deep metatranscriptomic sequencing and small RNA analysis to identify covert infection of rice plants by viruses. Our results showed that introgression of the dominant brown planthopper (BPH) resistance gene Bph3 into the high-yielding but BPH-susceptible indica variety Ms55 via a backcross strategy significantly enhanced resistance to BPH. However, Bph3 -carrying backcross lines infested by BPH exhibited panicle enclosure and failed to produce seed at the mature stage, which are typical characteristics of sterile rice plants. Using a metatranscriptomic analysis, we identified six RNA viruses in backcross line Rby1 and eleven RNA viruses in backcross line Rby2, including eight novel viruses that fell within existing families and orders. Furthermore, our small RNA analysis revealed the biogenesis of viral small interfering RNAs that represented active virus infection in rice plants. We identified viral in sterile plants by deep metatranscriptomic sequencing and small RNA analysis. Our results suggested that covert coinfection of rice plants by RNA viruses resulted in disease symptoms reminiscent of rice sterility. To develop rice varieties resistant to BPH, it is necessary to genes resistant to not only BPH but also viral infection. Our that healthy rice while were the any of or We tried to determine rice sterility. we not any rice indicating that viral could not be identified traditional approaches.


Multiple RNA viruses were identified in sterile rice plants by deep metatranscriptomic sequencing
Rice plants of lines Rby1 or Rby2 not infested by BPH were as healthy as those of Ms55 or TZ21 ( Fig.   1a-b). However, those of lines Rby1 and Rby2 became sterile when they were infested by BPH, and these sterile rice plants did not exhibit any disease symptoms indicative of fungal or bacterial infections or typical characteristics of viral infection, such as pronounced stunting and dark green leaves. The only difference between the healthy and sterile rice plants was BPH infestation; it is known that BPH can transmit viruses as vectors to rice plants, but we did not isolate any viruses from the sterile rice plants. We  reads, respectively. We de novo assembled these reads into 328,146 contigs for Rby1-21 and 586,111 contigs for Rby2-45 (Table 1).
Next, we identified virus-like sequences by BLASTing against reference sequences from viral genomes available in GenBank. We discovered eight novel viruses in both rice lines (Fig. 2), including three negative-sense RNA viruses and five positive-sense RNA viruses (Table S1). Of the eight RNA viruses, rice mononega-like virus (RMV) and rice peribunya-like virus (RPeV) were present in both Rby1-21 and Rby2-45 (Table S1). Rice picorna-like virus 2 (RPiV2) and rice noda-like virus (RNV) were present only in Rby1-21 (Table 1 and Table S1). Rice phasma-like virus (RPhV), rice tombus-like virus 2 (RTV2), rice tombus-like virus 3 (RTV3) and Rice picorna-like virus 3 (RPiV3) were present only in Rby2-45 (Table 1 and Table S1). Thus, we identified thirteen RNA viruses from Rby1-21 and Rby2-45 in total, including five known RNA viruses and eight novel RNA viruses (Table 1). We detected six RNA viruses in Rby1-21 and eleven RNA viruses in Rby2-45 (Table 1). Of these viruses, four RNA viruses were present in   both Rby1-21 and Rby2-45, two RNA viruses were present in only Rby1-21, and seven RNA viruses were present in only Rby2-45 (Fig. S2). However, we did not detect any viruses in Rby1-N65 from line Rby1 or in Rby2-N32 from line Rby2, which were not infested by BPH.
We identified three novel negative-sense RNA viruses (Table S1). Of these viruses, RPeV is closely related to Penicillium roseopurpureum negative ssRNA virus 1, RPhV is related to Anopheles triannulatus orthophasmavirus, and RMV is similar to Tacheng tick virus 5 (Table S1). In the RdRp phylogeny, RPeV was clustered within the family Peribunyaviridae, and RPhV belonged to the family Phasmaviridae, both of the order Bunyavirales (Fig. 3). RMV clustered within a currently unclassified family of the order Mononegavirales ( Fig. 3 and Fig.   S3).
We also discovered five undescribed positive-sense RNA viruses in this study (Table S1).
Although phylogenetic analysis showed that these sequences are viral in origin, they may represent endogenous viral elements (EVEs) integrated into the rice plant genome [11] or may be derived from surface contamination rather than active infection. To exclude the possibility that these contigs represent EVEs segregating in rice plants, we mapped the raw reads from the Rby1-21 and Rby2-45 genomes to our set of candidate viruses revealed by BLAST and confirmed that no genome mapped at a rate high enough to be consistent with a genome copy of any virus in a particular individual. To map the resulting small RNA reads to the putative viruses described above, we first removed the small RNA reads that are specific to rice plants and then aligned the reads to the viral genomes described in Table 1. The results showed that differentially abundant small RNAs were mapped to putative viruses (Table S2) These small RNAs occur mainly in the sense and antisense orientations (Fig. 4), indicating that the vsiRNAs were produced from double-stranded RNA replicative intermediates and that Dicer-like enzymes cleaved double-stranded RNAs into vsiRNA. The cleaved double-stranded vsiRNAs are bound and sorted by AGOs depending on the 5'-terminal base [31]. To assess the base preference at the 5' terminus in the rice samples, we calculated the base percentage for the vsiRNAs. As shown for the three viruses (RTV1, RRSV and RPeV) in Fig. 4, base G was the least favored at the 5'-terminal nucleotide for all viruses, and the viruses showed a distinct preference for the remaining three bases at the 5' terminus. Base U was enriched in 21-nt vsiRNAs for RPeV and RRSV, while base C was enriched in those for RTV1. These results were consistent with those of a previous study [32]. Twentyone-nucleotide vsiRNAs occurred in regions of identified genomic RNAs (Fig. 5), and those derived from antisense strands showed a high proportion for each virus (Table S3). Collectively, our data demonstrated that RTV1 and RRSV in Rby1-21 as well as RPeV, RTV1 and RRSV in Rby2-45 were active infections.

Discussion
Viruses are a major threat to global rice production. In this report, we described a diverse set of new viruses in rice lines Rby1 and Rby2 through deep transcriptomic sequencing. We identified and assembled thirteen complete or nearly complete genomic sequences, eight of which were previously undescribed, including three negative-sense RNA viruses and five positive-sense RNA viruses. We also identified five known viruses containing two positive-strand viruses, RTV1 and RPiV1, as well as three double-stranded viruses, RRSV, NLRV and RToV [1,33,34]. Notably, RPiV1 and RToV were first discovered in the white-backed planthopper Sogatella furcifera [34,35]. In this report, both viruses were identified in Rby2-45, indicating that they have more hosts and that BPH might harbor RPiV1 and RToV and transmit them to Rby2-45. Metatranscriptomic approaches have been successfully implemented to discover highly diverse and novel viruses from mosquito, bat, and Drosophila samples [20, 22,36,37]. To the best of our knowledge, this is the first comprehensive highthroughput survey of viral sequences associated with rice plants.
Breeding for resistant rice cultivars is one of the most important approaches to control insect pests [38][39][40]. In this report, we used Ms55, an indica variety susceptible to BPH, to backcross with TZ21 harboring the resistance gene Bph3 to produce a BC 1 F population. During a 5-year period (2013-2017), the Bph3-carrying rice lines from the BC 1 F population were as healthy as TZ21 in 2013, 2015 and 2016, while the same rice lines were sterile in 2014 and 2017. We carefully sought to explain the rice sterility that had been occurring since 2014. In our study, these sterile rice plants were infested by BPH, an important vector that can harbor and transmit viruses to rice plants, and we speculated that virus infection may cause rice sterility. However, we have not isolated any viruses from these sterile rice lines, indicating that viral pathogen(s) could not be identified by traditional approaches.
We then tried to use deep metatranscriptomic sequencing to search for possible viral pathogen(s), and we in turn identified thirteen RNA viruses. Small RNA analysis revealed that RTV1 and RRSV in Rby1-21 as well as RPeV, RTV1 and RRSV in Rby2-45 were active infections. Although Rby1 and Rby2 are resistant to BPH, they are susceptible to viral infection. Thus, to develop rice varieties resistant to BPH, it is necessary to introgress genes resistant to not only BPH but also viral infection.
To confirm that covert coinfection of rice plants by RNA viruses is associated with sterile symptoms, we showed several lines of evidence: (i) Both rice lines from the BC 1 F population were sterile only when they were infested by BPH, while rice plants not infested by BPH grew normally, as did the healthy TZ21 plants, and It is known that BPH can harbor viruses and transmit them to rice plants; (ii) We identified multiple viruses in each line by deep metatranscriptomic sequencing, and vsiRNA analysis revealed that RTV1 and RRSV in Rby1-21 as well as RTV1, RRSV and RPeV in Rby2-45 were active infections; (iii) We searched for contigs that would match the DNA polymerase of DNA viruses, but we did not find any sequence related to viral DNA polymerases; and (iv) We also reasoned that contigs that lack similarity searches to reference sequences but that display a signature of DCL2, DCL3 or DCL4 processing (high levels of 21-24 nt vsiRNAs) may also be viral in origin [41,42]. Using these small RNA criteria, we did not identify any contig that is potentially viral in origin. Additionally, the agronomic traits of the backcross lines not infested by BPH were similar to those of Ms55 and TZ21, including earing and seed formation, indicating that these rice plants were not infected by pathogens, including bacterial or fungal ones. Healthy and sterile rice plants from lines Rby1 and Rby2 grew under the same greenhouse conditions, suggesting that the possibility that fungal or bacterial infection caused rice sterility was excluded. Thus, rice sterility was derived from coinfections of RTV1 and RRSV in Rby1-21 as well as RTV1, RRSV and RPeV in Rby2-45. Taken together, our results suggest that covert coinfection of rice plants by RNA viruses rather than rice hybridization resulted in rice sterility.
In this report, we did not isolate any virus from infected rice plants despite of extensive efforts. A virus that exhibited high (85%) nucleotide sequence similarity to RTV1, Dianthovirus RNA1-like RNA (DR1L RNA), was found in grassy stunt-diseased rice plants together with rice grassy stunt virus (RGSV) [43]. Although the nucleotide sequence of DR1L RNA was identified from cDNA library of grassy stunt-diseased rice leaf tissues, no virus-like particle was observed on fractions from a sucrose density gradient, or no virus was isolated from infected rice plants [43]. In our study, RTV1 identified from both Rby1 and Rby2 showed similar results. RTV1 and DR1L RNA were detected in samples with viral mixed infections, whereas individual RTV1 or DR1L RNA isolate has not been reported to date.

Sequence read assembly and virus discovery
Sequencing reads were assembled de novo by using Trinity [46]. The assembled contigs were first compared against the database of all reference RNA virus proteins downloaded from GenBank by using BLASTX, with an E value cutoff at 1E-5 to maximize sensitivity while minimizing false-positive results. The resultant contigs were then compared to information in the nonredundant nucleotide (nt) and protein (nr) databases to remove nonviral sequences. We also performed domain-based BLAST to detect highly divergent viruses. The assembled contigs were compared to the information within the

Virus genome annotation
The potential ORFs of the newly identified virus genomes were annotated based on the predicted amino acid sequences and conserved positions in the genome compared to those of the closest related virus genome available in GenBank. Functional domains within each ORF were identified by using BLAST against the CDD, with an expected value threshold of 1E-5.
For viruses with multiple RNA segments, we used various strategies to search for viral genome segments, as described previously [14,22]. Non-RdRp segments were identified by homology to the proteins of related reference viruses. Other potential segments that displayed no homology to sequences in the database were identified by using an in silico approach that utilizes information on RNA quantity, protein structure, and/or conserved genomic termini. To determine which segments belong to the same virus, we checked (i) the sequence depth of the segments, (ii) the presence of conserved regulatory sequences in the noncoding regions located at termini of the viral genomes, and (iii) the phylogenetic positions of related viral proteins.
We used RT-PCR and Sanger sequencing to verify the presence of each putative virus in the samples.
Primers were designed on the basis of the contigs assembled from next-generation sequencing.

Estimation of viral transcript frequency
Viral transcript frequency were estimated according to the methods described previously [14]. To determine the frequency of viral RNAs, we estimated the percentage of reads that mapped to viral RNA within the transcriptome of each sample. To reduce any bias caused by the unequal efficiency of rRNA removal during library preparation, we first removed reads that mapped to rRNA contigs from each library, the remaining reads were then mapped to the entire collection of virus sequences within the library, from which we calculated the overall percentage of viral reads. To confirm that viral RNA comprised a large percentage of transcripts within the sample transcriptome, we re-isolated the total RNA from aliquots of the original homogenates and performed RNA-seq library preparation and sequencing as described above.

Phylogenetic analysis
To determine the phylogenetic relationships of the newly identified RNA viruses, the amino acid sequences of the viral RNA-dependent RNA polymerase (RdRp) proteins identified in this study and those retrieved from GenBank were aligned to infer their evolutionary relationship [14]. The viral

Small RNA analysis
Bioinformatic analysis of the small RNA data was performed using the CLC Genomic Workbench software package (Qiagen). Briefly, small RNA reads were quality checked, and low-quality reads and adapter sequences were first removed from the raw small RNA data set. Trimmed small RNA sequences shorter than 15 nucleotides were discarded. The remaining reads were mapped to the rice genome to remove host-related reads, and the unmapped reads were subsequently mapped to putative viruses with the same stringency settings.

Availability of data and materials
All viral genome sequences generated in this study have been deposited in the GenBank database under accession numbers MT317153-MT317177. The raw sequence data reads are available at the