Complete genome sequence of a novel fusarivirus from the phytopathogenic fungus Corynespora cassiicola

Corynespora cassiicola is an important phytopathogenic fungus that severely impairs crop production. Here, we report the molecular characterization of a novel positive-sense single-stranded RNA (+ssRNA) mycovirus, Corynespora cassiicola fusarivirus 1 (CcFV1), isolated from C. cassiicola strain 20200826-3-1. Excluding the poly(A) tail, the CcFV1 genome is 6491 nt long and contains three putative open reading frames (ORFs). The largest, ORF1, encodes a 1524-aa polypeptide containing conserved RNA-dependent RNA polymerase (RdRp) and helicase (Hel) domains. BLASTp analysis showed that the ORF1-encoded polypeptide is most similar to that of Setosphaeria turcica fusarivirus 1 (StFV1; 50.45% identity, E-value 0.0). ORF2 encodes a polypeptide with a conserved chromosome segregation ATPase (Smc) domain, and the smaller ORF3 encodes a polypeptide of unknown function. Phylogenetic analysis based on the ORF1-encoded polypeptide showed that CcFV1 is related to members of the newly proposed family "Fusariviridae". We therefore suggest that CcFV1 may be a novel member of the family "Fusariviridae" and is the first fusarivirus discovered in C. cassiicola.

Corynespora cassiicola is a phytopathogenic fungus responsible for a devastating leaf disease [12] that has severely impaired the production of important crops. It has been found on plant leaves, stems, and roots, in nematode cysts, and on human skin [13][14][15][16][17][18]. C. cassiicola has a wide host range and has been reported to grow on at least 530 plant species of 380 genera [19].
Fungal viruses that infect C. cassiicola have rarely been reported; to date, only one unassigned dsRNA mycovirus has been described from this fungus [20]. In this study, we describe a novel fusarivirus isolated from C. cassiicola. This is the first report of the complete genome sequence of a fusarivirus infecting C. cassiicola, and the virus has been tentatively named "Corynespora cassiicola fusarivirus 1" (CcFV1).

Provenance of the virus material
The seven isolates of C. cassiicola used in this study were preserved at the Institute of Plant Protection, Henan Academy of Agricultural Sciences, China (Supplementary Table S1). For total RNA extraction, C. cassiicola strains were grown on potato dextrose agar (PDA) plates at 28°C in the dark for 8 days. Total RNA was extracted from 0.2 g of mycelium from each isolate using an RNAiso Kit (Takara, Dalian, China) according to the manufacturer's instructions. The RNA concentration of each preparation was adjusted to 200 ng/µl, and 15-µl aliquots from all seven isolates were pooled. The pooled sample was sent to Shanghai Bohao Biotechnology Corporation for high-throughput sequencing. Ribosomal RNA was depleted from the total RNA using a Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA, USA), and a paired-end (PE150) sequencing library was prepared and sequenced on an Illumina HiSeq 2500 platform. Clean reads for data analysis were obtained by removing low-quality raw reads using default parameters. The clean reads were assembled de novo using the scaffolding contig algorithm in CLC Genomics Workbench (CLC bio v6.0.4, Aarhus, Denmark) to obtain primary unigenes, which were then further assembled with the CAP3 EST assembly program [21] to obtain the final unigene sequences. The final unigenes were searched against the NCBI GenBank database using BLASTx to identify homologous viral sequences. In this way, we identified contig2308, which is 6422 nt long and was assembled from 9415 reads, as the genome sequence of a novel virus.
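As a rough indication of how well contig2308 is supported, its mean sequencing depth can be estimated as total read bases divided by contig length (the Lander-Waterman mean coverage). The sketch below assumes that each of the 9415 reads contributes a full 150 nt (PE150); quality trimming would lower the real figure, and uneven depth along the contig is not captured by a single mean.

```python
def mean_coverage(n_reads: int, read_len: int, contig_len: int) -> float:
    """Mean depth C = N * L / G (Lander-Waterman): read bases per contig base."""
    return n_reads * read_len / contig_len


# n_reads and contig_len come from the text; read_len=150 is an assumption
# (untrimmed PE150 reads).
depth = mean_coverage(n_reads=9415, read_len=150, contig_len=6422)
print(f"approximate mean depth: {depth:.0f}x")  # approximate mean depth: 220x
```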
cDNA was synthesized from the RNA of each of the seven C. cassiicola isolates using a PrimeScript™ II 1st Strand cDNA Synthesis Kit (TaKaRa, Dalian, China) according to the manufacturer's instructions. To test for the presence of the newly discovered virus in each isolate, reverse transcription PCR (RT-PCR) was performed using a virus-specific primer pair designed based on the contig2308 sequence. A contig2308-specific amplicon was detected in C. cassiicola strain 20200826-3-1, which had been isolated from a diseased sesame stem collected in Zhoukou, Henan Province, China.
The nucleotide sequence of contig2308 was verified by sequencing fragments amplified from cDNA of strain 20200826-3-1 using 16 pairs of specific primers (Supplementary Table S2). To determine the 5'- and 3'-terminal sequences of the CcFV1 genome, rapid amplification of cDNA ends (RACE) was performed with a SMARTer RACE 5'/3' Kit (TaKaRa, China) using the specific primers R_GSP_Contig2308 and F_GSP_Contig2308, respectively, following the user manual provided with the kit. All PCR products were purified using a FastPure Gel DNA Extraction Kit (Vazyme, China), cloned into the pMD™18-T vector, and transformed into chemically competent E. coli JM109 cells (TSINGKE, China) for propagation. At least three recombinant clones were selected and sequenced by Sangon Biotech to verify the accuracy of the CcFV1 nucleotide sequence.
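Checking a sequence against several independent clones amounts to a per-position vote. The sketch below illustrates that idea, assuming the clone reads have already been aligned to equal length; the function name, the `min_agree` threshold, and the toy reads are hypothetical and are not part of the protocol described above.

```python
from collections import Counter


def majority_consensus(reads: list[str], min_agree: int = 2) -> str:
    """Column-wise majority vote over pre-aligned, equal-length clone reads.

    Positions where fewer than `min_agree` reads share the most common base
    are reported as 'N' (ambiguous).
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and aligned to the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count >= min_agree else "N")
    return "".join(consensus)


# Hypothetical reads from three clones; the third carries a single mismatch.
clones = ["ATGGCTAAC", "ATGGCTAAC", "ATGACTAAC"]
print(majority_consensus(clones))  # ATGGCTAAC
```

With three clones and `min_agree=2`, a single discordant clone is outvoted, while a position where all three disagree is flagged rather than guessed.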
The ORF Finder and CD-Search tools at NCBI (https://www.ncbi.nlm.nih.gov/) were used to identify ORFs and conserved domains, respectively. Sequences obtained from the clones were assembled using DNAMAN software (Lynnon Biosoft, Canada). Multiple alignments of amino acid sequences were performed using Clustal X [22] and DNAMAN. Phylogenetic trees were constructed by the maximum-likelihood (ML) method using the Jones-Taylor-Thornton (JTT) model with 1000 bootstrap replicates in MEGA 7.0 [23]. GC content was determined using the NovoPro online tool (https://www.novopro.cn/tools/gc-content.html). The protein molecular mass (Mr) and isoelectric point (pI) were calculated using Expasy [24]. A schematic diagram of the genome organization was drawn using Illustrator for Biological Sequences [25].
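For readers who want to reproduce a GC-content calculation or a basic ORF scan offline, the sketch below is a simplified stand-in for the web tools named above, not a reimplementation of them. It assumes the standard genetic code, ATG-only start codons, and scanning of the three forward frames only (reasonable for a +ssRNA genome); NCBI ORF Finder offers further options, such as alternative start codons. The toy sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def gc_content(seq: str) -> float:
    """Percentage of G + C in a nucleotide sequence (RNA U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def find_orfs(seq: str, min_aa: int = 50) -> list[tuple[int, int, int]]:
    """Scan the three forward frames for ATG...stop ORFs.

    Returns 1-based (start, end, aa_length) tuples; `end` includes the stop
    codon and `aa_length` excludes it. Only the first ATG before each stop is
    used, so nested in-frame ORFs are not reported separately.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if aa_len >= min_aa:
                    orfs.append((start + 1, i + 3, aa_len))
                start = None
    return sorted(orfs)


toy = "CCAUGAAAUUUUAAGG"  # hypothetical toy sequence, not the CcFV1 genome
print(round(gc_content(toy), 2))  # 31.25
print(find_orfs(toy, min_aa=1))   # [(3, 14, 3)]
```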

Sequence properties
The full-length genome sequence of CcFV1 (GenBank accession number OL456888) is 6491 nt long, excluding the poly(A) tail, and has a GC content of 43.54%. The positive-sense RNA genome contains two main ORFs (ORF1 and ORF2) and a smaller ORF3, flanked by a 5'-UTR of 18 nt and a 3'-UTR of 47 nt (Fig. 1A). A BLASTx search against the NCBI non-redundant protein sequence database showed that the three viral sequences most similar to CcFV1 were the RNA-dependent RNA polymerases (RdRps) of Setosphaeria turcica fusarivirus 1 (StFV1; 50.10% identity, E-value 0.0), Erysiphe necator-associated fusarivirus 2 (EnFV2; 52.23% identity, E-value 0.0), and Erysiphe necator-associated fusarivirus 1 (EnFV1; 50.10% identity, E-value 0.0), all of which belong to the newly proposed family "Fusariviridae". ORF1 (nt 19-4593) encodes a polyprotein of 1524 amino acids (aa) with an approximate Mr of 172.59 kDa and a pI of 8.98. A BLASTp search against the NCBI non-redundant protein sequence database showed that this protein is 50.45%, 51.26%, and 49.55% identical to the RdRps of StFV1, EnFV2, and EnFV1, respectively, and that the 10 best-matching viral sequences all belong to fusariviruses (Table 1).
Using the NCBI CD-Search program, two conserved domains were identified in the ORF1-encoded polyprotein: an RdRp domain (RdRp_1, pfam00680, E-value 7.80e-09, aa positions 479-741) and a helicase domain (Helicase_C, pfam00271, E-value 2.77e-09, aa positions 1229-1341), both of which are also found in other fusariviruses. The aa sequence similarity in the RdRp and Hel regions indicates that CcFV1 is a fusarivirus. Multiple alignment of the RdRp domains of CcFV1 and selected fusariviruses revealed eight conserved motifs that are characteristic of members of the proposed family "Fusariviridae" (Supplementary Fig. S1).
ORF2 (nt 4645-6144) encodes a 499-aa protein of 56.68 kDa with a pI of 9.06. In a BLASTp search, this protein showed the highest sequence similarity to a protein of EnFV2 (35.42% identity, E-value 2.00e-87; Table 1). A conserved domain spanning aa positions 89 to 246 was identified in this protein; it corresponds to a chromosome segregation ATPase (Smc, COG1196, E-value 1.50e-07) and is also predicted in other fusariviruses (Fig. 1A).
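CD-Search reports domain boundaries as residue positions within the encoded protein. Converting them to genome coordinates, given the 1-based start nucleotide of each ORF, places the domains on the genome map. The sketch below applies this arithmetic to the positions reported above, assuming that the ORF start coordinates given in the text mark the first nucleotide of the start codon.

```python
def aa_span_to_nt(orf_start_nt: int, aa_start: int, aa_end: int) -> tuple[int, int]:
    """Map 1-based residue positions within an ORF to 1-based genome coordinates.

    Residue k occupies nucleotides orf_start + 3(k-1) through orf_start + 3k - 1.
    """
    nt_start = orf_start_nt + 3 * (aa_start - 1)
    nt_end = orf_start_nt + 3 * aa_end - 1
    return nt_start, nt_end


# ORF start coordinates and domain positions as reported in the text.
domains = {
    "RdRp_1 (ORF1)": (19, 479, 741),
    "Helicase_C (ORF1)": (19, 1229, 1341),
    "Smc (ORF2)": (4645, 89, 246),
}
for name, (orf_start, aa_start, aa_end) in domains.items():
    print(name, aa_span_to_nt(orf_start, aa_start, aa_end))
# RdRp_1 (ORF1) (1453, 2241)
# Helicase_C (ORF1) (3703, 4041)
# Smc (ORF2) (4909, 5382)
```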
ORF3 (nt 6145-6444) is predicted to encode a hypothetical polypeptide of 11.52 kDa with a pI of 9.23. Homology searches detected no aa sequence similarity to any known protein or conserved domain. Unlike those of StFV1, EnFV2, and EnFV1, ORF2 and ORF3 of CcFV1 are directly adjacent and do not overlap (Fig. 1A).
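The ORF coordinates reported in this section can be cross-checked with simple arithmetic: an ORF spanning nucleotides s to e (inclusive, stop codon included) encodes (e - s + 1)/3 - 1 residues, the UTR lengths follow from the first start and last end positions, and the spacing between consecutive ORFs shows whether they overlap. The sketch assumes that each reported end coordinate includes the stop codon, which is consistent with the 1524-aa and 499-aa lengths given for ORF1 and ORF2.

```python
GENOME_LEN = 6491  # nt, excluding the poly(A) tail
# 1-based inclusive coordinates from the text; each end is assumed to include the stop codon.
ORFS = {"ORF1": (19, 4593), "ORF2": (4645, 6144), "ORF3": (6145, 6444)}


def aa_length(start: int, end: int) -> int:
    """Number of encoded residues, excluding the stop codon."""
    n_nt = end - start + 1
    if n_nt % 3:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return n_nt // 3 - 1


def intergenic_gap(prev_end: int, next_start: int) -> int:
    """Nucleotides between two ORFs; a negative value means they overlap."""
    return next_start - prev_end - 1


five_utr = ORFS["ORF1"][0] - 1
three_utr = GENOME_LEN - ORFS["ORF3"][1]
lengths = {name: aa_length(*span) for name, span in ORFS.items()}
gap_12 = intergenic_gap(ORFS["ORF1"][1], ORFS["ORF2"][0])
gap_23 = intergenic_gap(ORFS["ORF2"][1], ORFS["ORF3"][0])
print(five_utr, three_utr, lengths, gap_12, gap_23)
# 18 47 {'ORF1': 1524, 'ORF2': 499, 'ORF3': 99} 51 0
```

A gap of 0 between ORF2 and ORF3 corresponds to the directly adjacent, non-overlapping arrangement described above.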
To examine the relationships between CcFV1 and other mycoviruses, we performed a phylogenetic analysis based on an alignment of ORF1-encoded polyproteins containing the RdRp and Hel domains (Fig. 1B), with sequences of selected members of the family Hypoviridae included as outgroups. In the resulting tree, CcFV1 clustered with previously reported fusariviruses.
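The bootstrap values on an ML tree are obtained by resampling alignment columns with replacement, inferring a tree from each replicate, and counting how often each clade recurs. The sketch below illustrates only the resampling and counting steps; tree inference under the JTT model is assumed to be done by a program such as MEGA, and the toy alignment and clade sets are hypothetical.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap replicate: resample columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)


# Hypothetical toy alignment; rows correspond to taxa A, B and C.
aln = ["MKVLA", "MRVLA", "MKILG"]
rng = random.Random(1)
replicates = [bootstrap_alignment(aln, rng) for _ in range(1000)]

# Inferring a tree from each replicate is not shown; these clade sets are
# made up purely to illustrate the counting step.
toy_clades = [{frozenset({"A", "B"})}, {frozenset({"A", "B"})}, {frozenset({"A", "C"})}]
print(clade_support(toy_clades, frozenset({"A", "B"})))
```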
Species demarcation criteria for fusariviruses have not yet been established [9], so whether CcFV1 should be considered a member of a novel species remains uncertain. However, the RdRp-containing ORF1 product of CcFV1 is only 50.45% identical to that of its closest match (StFV1), and CcFV1 is the first fusarivirus isolated from C. cassiicola. It is therefore likely that CcFV1 will be considered a member of a novel species in the proposed family "Fusariviridae".
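Percent identity values such as the 50.45% quoted here are, in BLAST output, the number of identical aligned residues divided by the alignment length, with gap columns counted in that length. A minimal sketch of the calculation on an already computed pairwise alignment follows. The toy fragments are hypothetical, and because the value depends on the alignment produced, this will not reproduce BLAST figures when given unaligned sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues / alignment length x 100; gap columns count toward length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned protein fragments ("-" marks a gap).
print(round(percent_identity("MKV-LA", "MRVQLA"), 2))  # 66.67
```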