Overall view of the virome. We performed a large-scale viral metagenomics survey of potential plant leaf-associated viruses in 161 plant species belonging to 38 different orders, 7 classes (Coniferopsida, Cycadopsida, Dicotyledoneae, Filicopsida, Ginkgopsida, Magnoliopsida, and Monocotyledoneae), and 4 phyla (Angiospermae, Gymnospermae, Pteridophyta, and Tracheophyta) growing in a riverside plant ecosystem (Fig. 1a, Supplementary Table 1, and Supplementary Data 1 and 2). Among the 161 species sampled, 89 were wild plants and 72 were cultivated types. For each species, three leaf tissue samples from three different individual plants were collected. After the material was crushed with a mortar and pestle, the supernatants of the 3 leaves from 3 different individual plants of the same species were mixed into one sample pool for viral metagenomics library construction. After enrichment of viral particles by filtration (removing eukaryotic and bacterial cell-sized particles) and DNase and RNase treatment (digesting unprotected nucleic acids), total nucleic acids were extracted and prepared as 161 libraries for Illumina HiSeq 2500 sequencing (Supplementary Table 1). In total, 50,586,188 paired-end reads were generated, binned by barcode, and quality-filtered, leaving high-quality sequence reads that were de novo assembled within each barcode. The resulting contigs and unassembled reads were compared with the viral reference database and the GenBank non-redundant protein database using a BLASTx search with an E value cut-off of <10⁻⁵. Among the 161 libraries, 147 contained sequences showing significant similarity to known viruses, with viral reads constituting 0.04% to 97.93% of the total unique reads; 52 libraries contained >50% viral sequence reads (Fig. 1a, Supplementary Table 1).
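The read-classification step described above (keeping BLASTx hits below the E value cut-off and expressing viral reads as a percentage of total unique reads) can be illustrated with a minimal sketch. It assumes BLAST tabular output in the default 12-column layout, in which the E value is the 11th column; the function names and the toy input are hypothetical and are not taken from the pipeline actually used in this study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text

def significant_hits(blast_tsv, cutoff=E_VALUE_CUTOFF):
    """Yield (query_id, subject_id, evalue) for hits with evalue < cutoff.

    Assumption: default BLAST tabular columns, i.e.
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            yield row[0], row[1], evalue

def viral_read_percentage(viral_reads, total_unique_reads):
    """Percentage of viral reads among the total unique reads of one library."""
    if total_unique_reads <= 0:
        raise ValueError("total_unique_reads must be positive")
    return 100.0 * viral_reads / total_unique_reads

# Toy input with two hits: only the first passes the cut-off.
sample = (
    "read1\tvirusA\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-20\t90.1\n"
    "read2\tvirusB\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.01\t25.0\n"
)
hits = list(significant_hits(io.StringIO(sample)))
print(hits)
print(viral_read_percentage(50, 200))
```

In this sketch a read counts as viral when it has at least one hit below the cut-off; how multiple hits per read were resolved in the study is not stated here, so that choice is left open.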
From these plants, 34 different groups of viruses were detected, including viruses belonging to 26 families, 1 genus (Botybirnavirus) not assigned to a family, and 7 unclassified groups including circular replication-associated protein encoding single-stranded DNA virus (CRESS DNA virus), Parvo-like virus, Hepe-like virus, Noda-like virus, Permutotetra-like virus, Rhabdo-like virus, Sobemo-like virus, unclassified members of Picornavirales order, and unclassified members of the Riboviria domain (Fig. 1b, Supplementary Table 1). Comparison of the percentage of viral reads among the total unique reads and of the number of virus types in each library showed no significant difference between wild and cultivated plant samples (Supplementary Data 3 and 4, Supplementary Table 1), suggesting that in the local plant ecosystem different cultivation modes of plants had no discernible effect on susceptibility to virus infection. From these viral sequences, 251 virus strains yielded complete (n=202) or nearly complete (n=49) genome sequences, including 5 RNA virus strains belonging to segmented viruses (Fig. 1b). A BLASTx search using the nucleotide sequences of the 251 virus strains revealed that 61 of them shared <40% amino acid sequence identity with their best matches in GenBank (Fig. 1c, Supplementary Table 2), suggesting that many of the virus strains discovered here are highly divergent from previously known viruses and could represent new virus families or orders. For phylogenetic analysis of these plant-derived viruses, amino acid sequences of the most conserved regions were used, including RNA-dependent RNA polymerase (RdRp) domains for viruses belonging to Riboviria, replication proteins (Rep) for CRESS DNA viruses, and non-structural protein (NS) for parvoviruses.
Expanding plants as new hosts of some viruses. Since our sample processing involved only the leaves of plants, and the samples were carefully washed three times with double-distilled water (ddH2O) before treatment, we assume that most of the viruses characterized here came from leaf tissues rather than from other organisms on the leaf surface. All of the collected samples were leaves from healthy-appearing plants. We detected members of 34 different groups of viruses and fully characterized genomes from 27 of them, including 12 groups with viruses not previously reported from plants (Fig. 2, see Supplementary Data 5-19 for detailed phylogenies).
The Dicistroviridae family, within the order Picornavirales, is a group of viruses currently composed of 3 genera, whose natural hosts are invertebrates, including aphids, leafhoppers, flies, bees, ants, and silkworms[2]. Here, we assembled 23 genomes from 9 different species of plants, of which 7 virus strains grouped into the three previously classified genera while the other 16 strains clustered into a separate group genetically distant from the three known genera (Fig. 2, Supplementary Table 1, see Supplementary Data 5). The dicistroviruses in the separate group showed the typical genome organization of dicistroviruses, except that 10 of the 14 strains showed no cricket paralysis virus (CRPV) capsid superfamily domain in the capsid protein (Supplementary Data 6). Based on RdRp protein sequences, the 16 strains in the separate cluster shared <50% similarity with their best BLASTp matches in GenBank, all of which were phylogenetically located outside the new clade, suggesting that these strains might belong to a new genus in the family Dicistroviridae. In arthropods, acquisition and transmission of dicistrovirus infection occur predominantly by ingestion and spread from the alimentary canal. In the alimentary canal, the virus generally replicates in epithelial cells of the gut and is subsequently shed into the gut lumen, accumulating in the feces, which are often an important source of infection[2,3]. Based on the transmission pattern of dicistroviruses in arthropods, infection of plants by dicistroviruses may occur when insects shed virus-contaminated feces onto the leaf surface.
In three different species of plants, three divergent iflavirus strains were discovered and their complete genomes generated, all of which clustered within the genus Iflavirus based on phylogenetic analysis (Fig. 2, Supplementary Data 7). The family Iflaviridae belongs to the order Picornavirales, and all of its known members have been isolated from arthropods. Although vertical and sexual transmission have been reported among invertebrates for some iflaviruses, the most common route of infection is ingestion of virus-contaminated food sources[4,5]. Spread of iflaviruses to plants may therefore also occur through contaminated feces of arthropods.
We also identified 12 marnavirus strains from 5 different species of plants; they shared 30%-60% sequence identity based on pairwise comparison of polyprotein sequences and showed the typical genome organization of Marnaviridae (Supplementary Data 8). Based on BLASTx searches, two of the 12 plant marnavirus strains were also related to viruses from non-marine samples. Phylogenetic analysis including reference marnaviruses and the BLASTx-matching viruses from non-marine samples revealed that the 12 plant marnaviruses grouped well into the cluster of the genus Marnavirus within Marnaviridae, indicating that the marnavirus group includes closely related viruses from plants as well as two strains from fish and mollusk, respectively (Fig. 2, Supplementary Data 9). Marnaviridae is a newly defined virus family in the order Picornavirales; its currently characterized representative member is Heterosigma akashiwo RNA virus, isolated from Heterosigma akashiwo algae in ocean water[6]. Closely related viruses have been identified in marine environments [7]. Our data suggest that plants are capable of hosting some members of the family Marnaviridae or their cellular hosts.
In 3 different plant species, we acquired 6 virus strains with complete genomes showing significant sequence similarity to parvovirus-like hybrid virus (PHV) and 2 viruses showing a close relationship to densovirus (Fig. 2, Supplementary Data 10). These plant PHV genomes were linear, 3.6-4.0 kb in length, and contained two major forward-direction ORFs encoding the replication and capsid proteins (Supplementary Data 11), which is characteristic of viruses in the family Parvoviridae. The 6 PHVs detected in plants grouped into two different clusters and shared 50%-67% sequence similarity with other PHVs based on replication protein sequence. PHV is a highly divergent DNA virus that was recently discovered and is phylogenetically located at the interface between the Parvoviridae and Circoviridae[8,9]. Although this virus was first detected in Chinese patients with seronegative (non-A-E) hepatitis and subsequently found in a wide range of clinical samples, with the detected sequences sharing ∼99% nucleotide and amino acid identity with each other[8], it was eventually traced to contaminated silica-binding spin columns used for nucleic acid extraction[9]. Because the silica matrix is generally generated by diatoms (algae), which are microscopic water plants, the detection of PHV in silica-binding spin columns might have been the initial evidence that plants can serve as hosts of PHV. Our data further support the view that plants (or diatoms within them) are capable of hosting PHVs.
Besides the above four groups of viruses with multiple divergent strains found here in plant tissues, another 4 groups of viruses not previously reported in plants, including noda-like virus, Permutotetra-like virus, Yanvirus-like virus, and Chuvirus-like virus, were also detected (Fig. 2, Supplementary Data 12-15). These viral groups were recently reported from invertebrate meta-transcriptomes and from vertebrate and environmental samples[10–12]. Discovering these viruses in plant leaf samples suggests that plants may also be natural hosts for some members of these recently described clades. Bastrovirus, previously detected only in feces of mammals (including humans) and in mosquitoes, shows a distant relationship to astroviruses[13,14]. Here, one species of plant (Solanum melongena) was positive for a virus genome sequence showing 25% RdRp sequence similarity to that of bastrovirus (Fig. 2, Supplementary Data 16). Detecting this divergent bastrovirus-like virus in plants may imply that bastrovirus originates from plants and/or that its diverse members can infect widely different hosts, including vertebrates, invertebrates and plants. Another species of plant was positive for hepe-like virus, which has been reported in mammals, invertebrates, protists, and different environments [12,15–17]. This plant hepe-like virus strain grouped well with other hepe-like viruses from different types of organisms and environmental samples and shared a similar genome organization (Fig. 2, Supplementary Data 17), suggesting that this type of virus may also parasitize plants. Two types of viruses, botybirnavirus and narna-like virus, which were considered to be viruses of fungi[18,19] and, more recently, of Caenorhabditis nematodes[20], were detected in two species of plants, respectively (Fig. 2, Supplementary Data 18 and 19). The botybirnavirus showed high sequence identity (96.4%) to a fungal botybirnavirus based on RdRp protein sequence. The two narna-like virus strains from 2 different species of plants shared 99.9% nucleotide sequence identity, were identical in RdRp protein sequence, and were divergent from previously described narna-like viruses.
Divergent viruses in plants. Among these 12 groups of viruses first reported in plants here, some genomes were so divergent from their closest relatives identifiable by BLASTx that they may ultimately qualify as members of new genera or even new families (Fig. 2). For example, of the 23 dicistrovirus genomes, 7 grouped well into previously defined genera, while the other 16 strains seem to form a separate clade that could be designated a new genus in the Dicistroviridae family. The same conclusion could also apply to some genomes in the groups of noda-like, hepe-like, and bastrovirus-like viruses and in the Marnaviridae family.
Another 23 divergent RNA viral genomes whose closest relatives are in the Picornavirales order were characterized. Phylogenetic analysis based on RdRp sequences of the 6 defined families and the best matches of the 23 strains in GenBank showed that they were grouped into 8 different clusters which were genetically distinct from the defined 6 families in the order Picornavirales (Fig 3, Supplementary Data 20).
Tombusviridae is a large family of plant viruses that is currently composed of more than 76 species divided among 3 subfamilies and 16 genera. Here, we acquired 21 genomes showing sequence similarity to members of the Tombusviridae. Seven genomes were genetically close to defined genera, while the other 14 were highly divergent and seemed to form several distinct genera (Fig. 3, Supplementary Data 21). Four different virus strains belonging to the family Luteoviridae were also detected in plants here, 3 of which clustered closely with different defined genera, while the remaining one formed a single deeply rooted separate branch and may belong to a putative new genus clustering outside the genus Luteovirus (Fig. 3, Supplementary Data 22). Four partitivirus strains were characterized in three different species of plants; all of them were putative new species within three different genera of Partitiviridae (Fig. 3, Supplementary Data 23). Seven virus genomes identified here also showed sequence similarity to sobemo-like viruses, which were recently discovered in arthropods using meta-transcriptomics [15]. Although these plant sobemo-like viruses grouped phylogenetically with invertebrate sobemo-like viruses, they were genetically distinct and shared 30%-62% amino acid sequence similarity with each other (Fig. 3, Supplementary Data 24). Two plant rhabdo-like viruses also showed a close relationship to recently discovered invertebrate-derived rhabdo-like viruses (Fig. 3, Supplementary Data 25). Lastly, one divergent RNA genome showed a distant relationship to three genomes belonging to an unclassified member of the Riboviria domain, all from wastewater or soil samples, consistent with a plant origin (Fig. 3, Supplementary Data 26).
Plant CRESS virus. CRESS DNA virus is the informal name for several groups of single-stranded (ss) DNA viruses that have circular genomes encoding a replication-associated protein and that show high diversity and abundance in various habitats[21,22]. Although there are currently several established CRESS DNA virus families, including Bacillidnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Microviridae, Nanoviridae and Smacoviridae, a large number of novel CRESS DNA viruses have been discovered recently that have not been formally classified and whose hosts are currently unknown [22–24]. Among these well-defined CRESS DNA virus families, Geminiviridae and Nanoviridae are the two plant-infecting members, which also assist the replication and packaging of a satellite virus, Alphasatellitidae, another type of circular ssDNA genome[25]. Here, we acquired 79 circular genomes from plant leaves, among which 7 were genetically close to Geminiviridae, 9 grouped well into the family Genomoviridae, 7 clustered closely with known sequences of Alphasatellitidae, and 15 were new divergent members of the family Microviridae, presumably from bacteria, with the remaining 41 showing significant sequence similarity to unclassified CRESS DNA viruses (Fig. 4).
Among the 7 CRESS DNA viruses belonging to the family Geminiviridae, 2 fell well within the cluster of the genus Begomovirus, being closely related to sweet potato leaf curl virus, a monopartite geminivirus. The other 5 did not group into any known genus in the family Geminiviridae but clustered deeply outside all known geminiviruses, suggesting that these 5 novel geminiviruses might belong to one or more new genera in Geminiviridae (Fig. 4, Supplementary Data 27). Viruses in the family Genomoviridae have frequently been found in association with a variety of samples ranging from fungi to animal sera [26], indicating that genomoviruses are widespread as well as abundant in the environment. Here, 9 complete genomovirus genomes, divergent from previously known members of that family, were characterized in 7 different plant species; they clustered phylogenetically into 5 different groups and included two identical genomes detected in two different plant species (Fig. 4, Supplementary Data 28). Currently, the hosts of the large majority of CRESS DNA viruses remain unknown, except for one that replicates in both fungi[27] and an insect[28]. Detecting genomoviruses in leaf samples from different species of plants may suggest that plants, or an internal plant-dwelling organism, may host some members of the family Genomoviridae.
We also discovered 7 divergent complete circular genomes in a single species of plant, which showed 38%-58% sequence identity to previously known genomes of members of Alphasatellitidae based on the amino acid sequence of the encoded Rep protein. The 7 alphasatellites had genome sizes ranging from 1309 to 1503 nucleotides, were divergent from each other, and grouped into 4 different clusters composed of previously defined alphasatellites based on phylogenetic analysis of their Rep proteins (Fig. 4, Supplementary Data 29). Alphasatellites are circular ssDNA components that are generally associated with Nanoviridae or some members of Geminiviridae. However, we did not detect geminivirus or nanovirus sequences in this species of plant but instead discovered a divergent CRESS DNA virus genome whose Rep protein showed the highest sequence similarity (60.7%) to that of an unclassified CRESS DNA virus, temperate fruit decay-associated virus[29], suggesting that this type of CRESS DNA virus may infect plants and serve as a helper virus for alphasatellites.
Apart from the 3 groups of viruses within classified CRESS DNA virus families, another 41 unclassified CRESS DNA viruses were also discovered in different species of plants. Because these CRESS DNA viruses were so divergent from each other, we analyzed them in 6 different phylogenetic trees (Fig. 4, Supplementary Data 30-35), each of which includes strains identified here, their best matches in GenBank, representative members of known CRESS DNA virus families, and other unclassified CRESS DNA viruses; fewer sequences were used in each sequence alignment so as to include as many conserved amino acid sites as possible in the phylogenetic analysis. Based on Rep protein sequences, these unclassified CRESS DNA viruses shared sequence similarities of 26%-61% with their best matches: 6 of them grouped with CRESS DNA viruses from feces of mammals, 3 with CRESS DNA viruses from invertebrates, 13 with CRESS DNA viruses identified in environmental samples (mainly wastewater), 8 with CRESS DNA viruses from fish species, and one with a plant-associated CRESS DNA virus, while the remaining 10 sequences were too divergent to cluster with any known viruses and were included in CRESSV group 6 (Fig. 4, Supplementary Data 30-35). Considering that most of the CRESS DNA viruses characterized in the present study best matched unclassified CRESS DNA genomes from environmental samples, mammalian feces, and arthropods, it is possible that most of these unclassified CRESS DNA viruses infect plants and were contaminants in feces or the gut content of arthropods.
Fifteen genomes showing sequence similarity to viruses in the family Microviridae were detected in three different species of plants, 12 of which were from a single species of wild plant, Kummerowia striata (Supplementary Data 36). Many studies have demonstrated the ubiquity of Microviridae genomes across habitats (marine, freshwater, wastewater, sediment) and global regions (Antarctic to subtropical), especially those related to the Gokushovirinae lineage [30–33], which infect obligate intracellular parasites, members of the bacterial genera Chlamydia, Bdellovibrio and Spiroplasma [34].
Cross-species infection and co-infection of plant viruses. Other than through seed dispersal, most plants are immobile; hence plant virus transmission is often assisted by other organisms [35,36]. Here, we investigated the virome in plant leaves collected in a single ecosystem, which includes interactions among plants, water, soil, air, insects and a multitude of micro-organisms, providing favorable conditions for cross-species transmission. Using viral metagenomics, we detected viral nucleic acids and determined 251 (nearly) complete viral genomes, allowing us to compare genome sequences from different species of plants and estimate whether cross-species transmission might occur for some viruses. Our results indicated that cross-species transmission might have occurred for 9 groups of viruses. Twenty-four genomes belonging to the family Potyviridae were found in 17 different species of plants, all in the genus Potyvirus (Supplementary Table 1, Supplementary Data 37). Among the 9 groups of potyviruses, 2 groups were composed of 10 and 5 genomes, respectively, sharing 99%-100% RdRp protein sequence identity within each group, suggesting possible cross-species transmission (Supplementary Data 37). We then compared the genome sequences within each of these 2 groups and found that the 10 genomes shared 94.6%-100% and the 5 genomes shared 94.8%-100% sequence identity (including several pairs of identical sequences) (Fig. 5), suggesting that some strains of these potyviruses may be capable of cross-species transmission. Our data also showed that some dicistroviruses might be plant-infecting viruses. Here we acquired 22 complete genomes of dicistrovirus from 10 different species of plants, of which 6 pairs presented possible cross-species transmission in plants, as each pair of genome sequences shared >94.9% identity, including one pair of identical sequences (Fig. 5, Supplementary Data 5).
Putative cross-species transmissions were also observed for unclassified CRESS DNA viruses, including 5 pairs of identical genomes derived from different species of plants (Supplementary Data 38). Two groups of marnaviruses showed >99% genomic sequence identity, and another 5 pairs of viruses from different putative host species, including geminivirus, genomovirus, luteovirus, parvo-like virus, and sobemo-like virus, showed 92.5%-99.8% sequence identity based on complete genome sequences (Supplementary Data 38).
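The pairwise comparisons underlying these putative cross-species transmissions can be sketched as follows. This is an illustration only: it assumes the genomes have already been aligned to equal length, it ignores alignment columns containing a gap in either sequence (one of several possible conventions, not necessarily the one used in this study), and the names `percent_identity`, `candidate_pairs`, and the threshold argument are hypothetical.

```python
from itertools import combinations

def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def candidate_pairs(genomes, threshold):
    """Return (name1, name2, identity) for genomes from different host species
    whose identity is at or above the threshold.

    genomes maps a genome name to a (host_species, aligned_sequence) tuple.
    """
    pairs = []
    for (n1, (h1, s1)), (n2, (h2, s2)) in combinations(sorted(genomes.items()), 2):
        if h1 == h2:
            continue  # same host species: not a cross-species candidate
        identity = percent_identity(s1, s2)
        if identity >= threshold:
            pairs.append((n1, n2, identity))
    return pairs

# Toy data (not real sequences): g1 and g3 come from the same host.
toy_genomes = {
    "g1": ("hostA", "ACGTACGTAC"),
    "g2": ("hostB", "ACGTACGTAT"),
    "g3": ("hostA", "ACGTACGTAC"),
}
print(candidate_pairs(toy_genomes, threshold=90.0))
```

High identity between genomes from different host species only flags a candidate; it does not by itself establish that transmission between those hosts occurred.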
We recorded the precise sampling site for each plant species, which made it possible to measure the geographical distance between the different plant species involved in the putative cross-species transmission of a given virus and to infer whether the geographical distance between host plants has an effect on cross-species transmission. Our data indicated that cross-species transmission of potyviruses might be associated with geographical distance, as the genetically very close genomes were mainly from the same sampling site (Fig. 5). The same phenomenon was also observed for the marnavirus, unclassified CRESS DNA virus, luteovirus, and parvo-like virus genomes. For example, all 5 marnavirus genomes involved in putative cross-species transmission were from a single sampling site, and 9 of the 11 CRESS DNA genomes were from the same sampling location (Fig. 5, Supplementary Data 38). However, the remaining groups of viruses with properties of cross-species transmission (closely related genomes from different plants), including dicistrovirus, geminivirus, genomovirus, and sobemo-like virus, seem to have no relationship to the geographical distance between their hosts’ locations (Supplementary Data 38). The different effects of geographical distance on cross-species transmission may reflect the different transmission potential of these viruses. For example, the finding that geographical distance had no effect on the cross-species transmission of dicistroviruses suggests that the spread of this virus might be assisted by arthropods. Our data also indicated that most of the putative viral cross-species infections in this ecosystem occurred across different levels of plant classification. For instance, the 10 closely related potyvirus genomes were characterized from plants belonging to 7 different orders within 2 different classes (Fig. 5), suggesting a wide host range.
Co-infection of hosts by two or more plant viruses is common in both agricultural crops[37,38] and natural plant communities[39]. In the present study, apart from cross-species infection, co-infection by plant viruses was also commonly observed: 73 out of 161 (45.3%) libraries contained >3 different virus types (or families) (Supplementary Table 1), suggesting that viral co-infection existed in nearly half of the plants in this ecosystem, as each library consisted of samples from three different individual plants. Considering that the same virus family or type in a single library may contain different virus strains or types, the rate of co-infection is likely to be higher than 45.3%. Among the 251 genomes we acquired from these plants, some were from the same libraries, which allowed us to investigate the co-infection of certain viruses in specific species of plants. As shown in Fig. 6, PCR screening for different virus genomes in 7 different species of plants revealed that most (20/21) of the individual plants contained >2 different types of virus, and one plant species, Forsythia suspensa, even carried 16 viruses belonging to 12 different families. The wide presence of apparent viral co-infections in these plants in a single ecosystem may lead to interactions between viruses that could influence disease development in individual plants.
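The library-level co-infection proportion reported above can be reproduced with a small calculation. The sketch below assumes a simple mapping from library name to the list of virus types detected in it; the function name, the toy library names, and the toy contents are hypothetical, and only the final line uses figures taken from the text (73 of 161 libraries).

```python
def coinfection_summary(virus_types_by_library, min_types=3):
    """Return the libraries with more than min_types distinct virus types
    and their percentage among all libraries."""
    if not virus_types_by_library:
        raise ValueError("at least one library is required")
    flagged = sorted(
        lib for lib, types in virus_types_by_library.items()
        if len(set(types)) > min_types  # count each virus type once per library
    )
    percentage = 100.0 * len(flagged) / len(virus_types_by_library)
    return flagged, percentage

# Toy example with four hypothetical libraries.
toy_libraries = {
    "lib01": ["Potyviridae", "Dicistroviridae", "Tombusviridae", "Partitiviridae"],
    "lib02": ["Potyviridae"],
    "lib03": [],
    "lib04": ["Geminiviridae", "Genomoviridae", "Microviridae", "Marnaviridae", "Iflaviridae"],
}
flagged, percentage = coinfection_summary(toy_libraries)
print(flagged, percentage)

# Proportion stated in the text: 73 of 161 libraries.
print(round(100.0 * 73 / 161, 1))
```

Because each library pools three individual plants, this proportion describes libraries rather than individual plants, which is why the PCR screening of individual plants is reported separately.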
Other plant viruses. Apart from the viruses mentioned above, many types of typical plant viruses belonging to the Bromoviridae, Closteroviridae, Comoviridae, and Botourmiaviridae families and the Tymovirales order were also detected in several species of plants. These plant viruses were genetically close to previously described viruses (Supplementary Data 39-43), indicating that typical plant virus infections were readily detected in this plant ecosystem.