Virome of riverside phytocommunity ecosystem of an ancient canal

The virus community in plants in a local plant ecosystem has remained largely unknown. In this study, we investigated the virus community in wild and cultivated plants in the Zhenjiang ancient canal ecosystem. Using a viral metagenomic approach, we characterized the viral community in leaf tissues of 161 plant species belonging to 38 different orders in a local riverside plant ecosystem. We discovered 251 different plant-associated virus genomes, comprising 88 DNA and 163 RNA viruses belonging to 27 different virus families, orders, or unclassified virus groups. The identified viruses include some that are sufficiently divergent to comprise new genera, families, or even orders. Our data indicated that some groups of viruses known to infect non-plant organisms have switched hosts to infect plants. Cross-species infection and co-infection of viruses were common in this plant ecosystem. These data present a view of the viral community in a local plant ecosystem that is more diverse than that depicted in the current classification of plant viruses, and they provide a solid foundation for studies of virus ecology and evolution in plants.

Background
Much effort has been devoted to studying viruses associated with economically important or symptomatic plants, which comprise only a minute fraction of all plant species, leaving a large gap in our overall understanding of viral diversity, evolution, and ecology in uncultivated plants [1]. It is therefore necessary to study viruses in wild plants, whether symptomatic or asymptomatic, to gain a more objective view of virus populations in plants; such work will undoubtedly reveal novel or even so-called unclassified viruses and provide more information on viral evolution and diversity. High-throughput DNA sequencing coupled with viral metagenomics approaches also makes it possible to identify highly divergent viral genomes in wild plants.
Comparing Table 1). In total, 50,586,188 paired-end reads were generated, binned by barcode, and quality-filtered; the remaining high-quality reads were de novo assembled within each barcode. The resulting sequence contigs and unassembled reads were compared with the viral reference database and the GenBank non-redundant protein database using a BLASTx search with an E value cut-off of <10−5.
The Dicistroviridae family, within the order Picornavirales, is a group of viruses currently composed of 3 genera, whose natural hosts are invertebrates, including aphids, leafhoppers, flies, bees, ants, and silkworms [2]. Here, we assembled 23 genomes from 9 different species of plants; 7 virus strains grouped into the three previously classified genera, while the other 16 strains clustered into a separate group genetically distant from the three known genera (Fig. 2,
In three different species of plants, three divergent iflavirus strains were discovered and their complete genomes generated, all of which clustered within the genus Iflavirus based on phylogenetic analysis (Fig. 2, Supplementary Data 7).
The family Iflaviridae also belongs to the order Picornavirales, and its members have likewise all been isolated from arthropods. Although vertical and sexual transmission has been reported among invertebrates for some iflaviruses, the most common route of infection is ingestion of virus-contaminated food sources [4,5]. Spread of iflaviruses in plants may therefore also occur through contaminated feces of arthropods.
We also identified 12 marnavirus strains from 5 different species of plants that shared 30%-60%
Marnaviridae is a newly defined virus family in the order Picornavirales; its currently characterized representative member is Heterosigma akashiwo RNA virus, isolated from the alga Heterosigma akashiwo in ocean water [6]. Closely related viruses have been identified in marine environments [7]. Our data suggest that plants are capable of hosting some members of the family Marnaviridae, or the cellular hosts of these viruses.
to other PHVs based on replication protein sequence. PHV is a type of highly divergent DNA virus that was recently discovered and is phylogenetically located at the interface between the Parvoviridae and Circoviridae [8,9]. Although this virus was first detected in Chinese patients with seronegative (non-A-E) hepatitis and subsequently discovered in a wide range of clinical samples, sharing ∼99% nucleotide and amino acid identity with each other [8], it was eventually traced to contaminated silica-binding spin columns used for nucleic acid extraction [9]. Because the silica matrix is generally derived from diatoms (algae), which are microscopic aquatic plants, the detection of PHV in silica-binding spin columns may have been the first evidence that plants can serve as hosts of PHV. Our data further support the view that plants (or diatoms within them) are capable of hosting PHVs.
Besides the above four groups of viruses with multiple divergent strains found here in plant tissues, another 4 groups of viruses not previously reported in plants, including noda-like virus, Permutotetra-like virus, Yanvirus-like virus, and Chuvirus-like virus, were also detected (Fig. 2, Supplementary . These viral groups were recently reported from invertebrate meta-transcriptomes and from vertebrate and environmental samples [10][11][12]. Discovering these viruses in plant leaf samples suggests that plants may also be natural hosts for some members of these recently described clades. Bastrovirus, previously detected only in feces of mammals (including humans) and in mosquitoes, shows a distant relationship to astroviruses [13,14]. Here, a species of plant (Solanum
Two types of viruses, botybirnavirus and narna-like virus, which were considered to be viruses of fungi [18,19] and more recently Caenorhabditis nematodes [20], were detected in two species of plants, respectively (
Tombusviridae is a large family of plant viruses that is currently composed of more than 76 species divided among 3 subfamilies and 16 genera. Here, we acquired 21 genomes showing sequence similarity to members of the Tombusviridae. Seven genomes were genetically close to defined genera, while the other 14 were highly divergent and appeared to form several distinct genera (
also showed sequence similarity to sobemo-like viruses, which were recently discovered in arthropods using meta-transcriptomics [15]. Although these plant sobemo-like viruses grouped phylogenetically with invertebrate sobemo-like viruses, they were genetically distinct, sharing 30%-62% amino acid sequence similarity with each other (

Plant CRESS virus.
CRESS DNA virus is the informal name of several groups of single-stranded (ss) DNA viruses that have circular genomes encoding a replication-associated protein and that show high diversity and abundance in various habitats [21,22]. Although there are currently several established CRESS DNA virus families, including Bacillidnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Microviridae, Nanoviridae and Smacoviridae, a large number of novel CRESS DNA viruses have been discovered recently and have not been formally classified, and their hosts are currently unknown [22][23][24]. Among the well-defined CRESS DNA virus families, Geminiviridae and Nanoviridae are the two plant-infecting members; they also support the replication and packaging of satellites in the family Alphasatellitidae, another type of circular ssDNA genome [25]. Here, from plant leaves we acquired 79 circular genomes, of which 7 were genetically close to Geminiviridae, 9 grouped well within the family Genomoviridae, 7 clustered closely with known sequences of Alphasatellitidae, and 15 were new divergent members of the family Microviridae, presumably from bacteria; the remaining 41 showed significant sequence similarity to unclassified CRESS DNA viruses (Fig. 4).
Among the 7 CRESS DNA viruses belonging to the family Geminiviridae, 2 fell well within the cluster of the genus Begomovirus and were closely related to sweet potato leaf curl virus, a monopartite geminivirus. The other 5 did not group into any known genus in the family Geminiviridae but clustered deeply outside all known geminiviruses, suggesting that these 5 novel geminiviruses might belong to one or more new genera in Geminiviridae (Fig. 4, Supplementary Data 27).
Viruses in the family Genomoviridae have frequently been found in association with a variety of samples ranging from fungi to animal sera [26], indicating that genomoviruses are widespread and abundant in the environment. Here, 9 complete genomovirus genomes, divergent from previously known members of that family, were characterized in 7 different plant species; they clustered phylogenetically into 5 different groups and included two identical genomes detected in two different plant species (Fig. 4, Supplementary Data 28). Currently, the hosts of the large majority of CRESS DNA viruses remain unknown, except for one that replicates in both fungi [27] and an insect [28]. Detecting genomoviruses in leaf samples from different species of plants suggests that plants, or an organism dwelling within plants, may host some members of the family Genomoviridae.
We also discovered 7 divergent complete circular genomes in a single species of plant, which showed sequence identities of 38%-58% to previously known genomes of members of Alphasatellitidae based Data 29). Alphasatellites are circular ssDNA components that are generally associated with Nanoviridae or some members of Geminiviridae. However, we did not detect geminivirus or nanovirus sequences in this species of plant; instead, we discovered a divergent CRESS DNA virus genome whose Rep protein showed the highest sequence similarity (60.7%) to an unclassified CRESS DNA virus, temperate fruit decay-associated virus [29], suggesting that this type of CRESS DNA virus may infect plants and serve as a helper virus for alphasatellites.
[30][31][32][33], which infect obligate intracellular parasites, members of the bacterial genera Chlamydia, Bdellovibrio and Spiroplasma [34].

Cross-species infection and co-infection of plant viruses.
Other than through seed dispersal, most plants are immobile; hence plant virus transmission is often assisted by other organisms [35,36]. Here, we investigated the virome in plant leaves collected in a single ecosystem, which includes interactions among plants, water, soil, air, insects, and a multitude of micro-organisms, providing favorable conditions for cross-species transmission. Using viral metagenomics, we detected
We marked the exact sampling site of each plant species, which makes it possible to measure the geographical distance between the different plant species involved in the cross-species transmission of a given virus and thus to infer whether the distance between host plants affects cross-species transmission. Our data indicated that cross-species transmission of potyviruses might be associated with geographical distance, as genetically very close genomes were mainly from the same sampling site (Fig. 5). The same phenomenon was also observed for the marnavirus,
[40], closteroviruses causing grapevine leafroll disease [41], luteoviruses such as barley yellow dwarf virus [42] and sobemoviruses such as rice yellow mottle virus [43]. Relatives of all these pathogenic viruses were detected in this study in apparently healthy plants from diverse families or orders. The relatively unbiased sequencing of viral genomes within entire environments, as performed here, is changing the perspective of viruses from agents of disease to common components of ecosystems, as the plant tissue samples studied were all from apparently healthy plants.
The data in the present study also revealed several viruses in leaf tissues, such as dicistrovirus, iflavirus, marnavirus, noda-like, and parvo-like viruses, that have not previously been reported in plants. Among these, dicistrovirus, iflavirus, and noda-like virus are generally hosted by arthropods [44,45]. Detection of these genomes in plants indicates that insects may have vectored them between plants. The closest non-plant-infecting relatives of some genomes from plants reported here tended to infect arthropods or fungi. Current plant-infecting viruses may therefore have evolved from viruses that once infected non-plant organisms (or vice versa). Further, the hypothesis that many plant and vertebrate viruses may have originated from arthropod viruses is also plausible, as some viruses that infect arthropods can also infect plants. For example, flock house virus (in the family Nodaviridae) infects arthropods but can also systemically infect plants when it is complemented with the movement proteins of either tobacco mosaic virus or red clover necrotic mosaic virus (both of which are plant viruses) [46].
The cross-species transmission of viruses from one host species to another is responsible for the majority of emerging infections in both animal and plant populations [47][48][49]. Decades of inventorying, tracking, and analyzing plant viruses have shown that the emergence of new diseases is driven by adaptive viral evolution in response to novel ecological conditions [50,51], including the introduction of viruses and vectors to new areas, the intensification of agriculture and urbanization, and ecological changes in response to changing climatic conditions. Our data showed that a number

Sample collection and preparation
The goal of this study was to investigate the virome of plant species in an ancient canal ecosystem in different individual plants belonging to the same species were each collected into disposable materials. Before this step, distilled water (ddH2O) was used to clean dust and other non-plant organisms from the leaf surface. Before viral metagenomic analysis, about 0.1 g of leaf tissue from each plant was ground using steel balls, re-suspended in 1 mL of phosphate-buffered saline (PBS), and vigorously vortexed for 5 min. The ground samples were then frozen and thawed three times on dry ice. The supernatants were collected after centrifugation (10 min, 15,000×g) and stored at -80℃ until use. Host species were initially identified using the app "PictureThis", an online plant encyclopedia and plant identifier, and further confirmed by experienced field biologists.

Viral metagenomic analysis
About 300 μL of supernatant from each of three individual plant samples of the same species was combined into one sample pool, filtered through a 0.45-μm filter, and centrifuged at 120,000g for 20 minutes at 4℃ to remove eukaryotic and bacterial cell-sized particles. Un-encapsidated nucleic acids were then digested with DNase and RNase at 37 °C for 60 min [52][53][54][55]. Total nucleic acids were extracted as a mixed RNA/DNA solution using the QIAamp Viral RNA Mini Kit (Qiagen) according to the manufacturer's protocol. A total of 161 libraries were constructed using the Nextera XT DNA Sample Preparation Kit (Illumina). For bioinformatics analysis, paired-end reads of 250 bp generated by MiSeq were debarcoded using vendor software from Illumina. An in-house analysis pipeline running on a 32-node Linux cluster was used to process the data. Reads were considered duplicates if bases 5 to 55 were identical, and only one random copy of each set of duplicates was kept. Clonal reads were removed, and low-quality tails were trimmed using a Phred quality score of ten as the threshold. The number of unique reads in each library is shown in Table 1. Adaptors were trimmed using the default parameters of VecScreen, which is NCBI BLASTn with specialized parameters designed for adapter removal. The cleaned reads were de novo assembled within each barcode using the ENSEMBLE assembler [56].
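The duplicate-removal and tail-trimming rules described above can be illustrated with a minimal Python sketch. This is not the in-house pipeline, whose code is not given here. The function names, the `(name, sequence, qualities)` read representation, and the choice to trim only from the 3' end are assumptions made for illustration.

```python
import random


def deduplicate(reads, start=5, end=55, seed=0):
    """Keep one random read from each group of duplicates.

    Two reads count as duplicates when bases `start`..`end`
    (1-based, inclusive) are identical, following the rule in the text.
    `reads` is a list of (name, sequence, qualities) tuples, which is a
    hypothetical in-memory representation, not the pipeline's format.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    groups = {}
    for read in reads:
        key = read[1][start - 1:end]  # convert to a 0-based slice
        groups.setdefault(key, []).append(read)
    return [rng.choice(group) for group in groups.values()]


def trim_tail(sequence, qualities, threshold=10):
    """Trim the 3' tail while the last base has Phred quality below threshold.

    Trimming only from the 3' end is an assumption; the text says only that
    low-quality tails were trimmed at a Phred score of ten.
    """
    keep = len(sequence)
    while keep > 0 and qualities[keep - 1] < threshold:
        keep -= 1
    return sequence[:keep], qualities[:keep]
```

Under these assumptions, two reads that differ only within their first four bases fall into the same duplicate group, and a read with low-quality bases only in its interior is returned untrimmed.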
Contigs and singlet reads were then matched against a customized viral proteome database using BLASTx with an E value cutoff of <10−5. The virus BLASTx database was compiled from the NCBI virus reference proteome (ftp://ftp.ncbi.nih.gov/refseq/release/viral/), to which were added viral protein sequences from the NCBI nr fasta file (based on annotation taxonomy in Virus Kingdom). Candidate viral hits were then compared to an in-house non-virus non-redundant (NVNR) protein database to remove false-positive viral hits; the NVNR database was compiled from non-viral protein sequences extracted from the NCBI nr fasta file (based on annotation taxonomy excluding Virus Kingdom). Contigs without significant BLASTx similarity to the viral proteome database were searched against viral protein families in the vFam [57] database using HMMER3 to detect remote viral protein similarities [58][59][60]. A web-based graphical user interface was developed to present users with the virus hits, along with taxonomy information and processing meta-information. The genome coverage of the target viruses was analyzed using Geneious v11.1.2 [61].
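The two-stage filtering described above (a viral BLASTx search at E < 10−5, followed by removal of hits better explained by non-viral proteins) might be sketched as follows. The sketch assumes BLAST+ tabular output (`-outfmt 6`, with the E value in the eleventh column), which the text does not specify. The rule used to call a false positive, a non-viral hit with a lower E value than the viral hit, is also an assumption, and the function names are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} with the lowest-E-value hit per query.

    Assumes the 12-column BLAST+ tabular layout (-outfmt 6):
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    Hits with E value >= cutoff are ignored.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def remove_false_positives(viral_hits, nonviral_hits):
    """Drop queries whose non-viral hit has a lower E value than the viral hit.

    This comparison rule is an assumption made for illustration; the text
    states only that candidates were compared to the NVNR database.
    """
    return {
        query: hit
        for query, hit in viral_hits.items()
        if query not in nonviral_hits or hit[1] <= nonviral_hits[query][1]
    }
```

In this sketch a contig survives only if it has a viral hit below the cutoff and no stronger match in the non-viral database. Contigs that fail the first stage would be the ones passed on to the HMMER3 search against vFam.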

Confirmation and extension of virus genomes
Viral contigs that might derive from the same genome but lacked assembled overlaps were merged using Geneious v11.1.2, and primers bridging the contigs were then designed [61]. Gaps were filled by (RT-)PCR and Sanger sequencing. To confirm the assembly of a full genome, reads were de novo assembled back to the full-length genome using the low sensitivity/fastest parameter in Geneious v11.1.2. For genomes with novel structures, we verified the complete or near-complete viral genome by designing overlapping primers based on the assembled sequences. For viruses reported here in plants for the first time, we used PCR and Sanger sequencing to verify the accuracy of the assembled sequences.

Confirmation of viral co-infection
Seven libraries (pt065, pt067, pt110, pt111, pt112, pt119 and pt151), each containing far more than three different virus strains, showed evident co-infection in individual plants. To determine which viral strains were present in each of the three individual plants pooled in a library, PCR and Sanger sequencing were performed using specific primers designed from the conserved domain sequences of these viruses.

Phylogenetic analysis of viruses
After analyzing the protein sequences obtained in this study, we divided them into three categories: RNA viruses, parvovirus-like viruses, and CRESS DNA viruses. To infer phylogenetic relationships, protein sequences of reference strains belonging to each of these three categories were downloaded from the NCBI GenBank database. For RNA viruses, the phylogenetic tree was constructed from the RNA-dependent RNA polymerase (RdRp); for parvovirus-like viruses, from the nonstructural protein (NS); and for CRESS DNA viruses, from the replication-associated protein (Rep), except for Microviridae, for which the major capsid protein was used. The related protein sequences were first aligned using the alignment program implemented in CLC Genomics Workbench 10.0, and the alignment was further optimized using MUSCLE in MEGA v7.0 [62] and MAFFT v7.3.1 employing the E-INS-i algorithm [63].
Sites containing more than 50% gaps were temporarily removed from alignments. Bayesian inference trees were then constructed using MrBayes v3.2 [64]. The Markov chain was run for a maximum of 1 20 million generations, sampled every 50 generations, and the first 25% of Markov chain Monte Carlo (MCMC) samples were discarded as burn-in. The approximate family or genus of each virus obtained in this study was determined from the above tree; more detailed trees were then constructed, using the same method, for each virus family relatively closely related to the viruses discovered here. Maximum likelihood trees were also constructed to confirm all the Bayesian inference trees using MEGA v7.0 [62] or PhyML v3.0 [65].
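The gap-filtering step applied before tree construction can be illustrated with a short Python sketch. It assumes the alignment is held as a mapping from sequence name to an aligned string of equal length, with "-" as the gap character. The function name and data layout are hypothetical and do not reflect the software actually used.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-"):
    """Remove alignment columns in which more than max_gap_fraction are gaps.

    `alignment` maps sequence name -> aligned sequence (all equal length).
    A column with exactly 50% gaps is kept, because the text removes only
    sites containing more than 50% gaps.
    """
    names = list(alignment)
    sequences = [alignment[name] for name in names]
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    kept_columns = [
        i
        for i in range(length)
        if sum(seq[i] in gap_chars for seq in sequences) / len(sequences)
        <= max_gap_fraction
    ]
    return {
        name: "".join(seq[i] for i in kept_columns)
        for name, seq in zip(names, sequences)
    }
```

For a four-sequence alignment, a column with two gaps is retained and a column with three gaps is dropped. The filtered alignment would then be passed to the tree-building programs.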

Virus genome annotation
Putative viral open reading frames (ORFs) were predicted by Geneious v11.
