Molecular characterization of a novel victorivirus infecting Corynespora cassiicola

A novel victorivirus was detected in Corynespora cassiicola strain 20180909-03 and was named "Corynespora cassiicola victorivirus 1" (CcVV1). The complete genome of this virus is 5140 bp in length, has a GC content of 57%, and contains two large open reading frames (ORFs) that overlap at the tetranucleotide AUGA. The ORFs were predicted to encode a coat protein (CP) and an RNA-dependent RNA polymerase (RdRp), respectively, both of which are conserved in dsRNA fungal viruses of the family Totiviridae. Comparison and phylogenetic analysis of the deduced amino acid sequences of the RdRp and CP showed that CcVV1 is a new member of the genus Victorivirus. This is the first report of a genomic sequence of a victorivirus infecting Corynespora cassiicola.

The complete genome of CcVV1 (GenBank accession number OK317696) is 5140 bp long and has a GC content of 57% (Fig. 1A). Sequence analysis showed that CcVV1 has two large open reading frames (ORFs). ORF1 (nt 303-2570) encodes the CP, and ORF2 (nt 2567-5050) encodes the RdRp. The 5' terminus of the genome also contains two small ORFs (nt 95-184 and nt 197-301), but these have no similarity to other sequences in the NCBI database. The start codon of ORF2 overlaps with the stop codon of ORF1 at the tetranucleotide AUGA (nt 2567-2570). An H-type pseudoknot structure was found upstream of the AUGA motif; this structure is believed to be involved in the translation of the downstream ORF2 (Fig. 1B). The untranslated regions (UTRs) at the 5' and 3' ends are 94 and 90 bp long, respectively, and are predicted to have stable secondary structures (Supplementary Fig. S2).
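The coordinates above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the function names are illustrative, and the only inputs are the genome length and ORF coordinates reported in the text (1-based, inclusive).

```python
# Coordinates as reported in the text (1-based, inclusive).
GENOME_LENGTH = 5140
ORF1 = (303, 2570)   # coat protein (CP)
ORF2 = (2567, 5050)  # RNA-dependent RNA polymerase (RdRp)


def orf_length(start, end):
    """Length of an ORF in nucleotides, including the stop codon."""
    return end - start + 1


def codon_count(start, end):
    """Number of codons (including the stop codon); fails if not a multiple of 3."""
    length = orf_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"ORF {start}-{end} is not a whole number of codons")
    return length // 3


def reading_frame(start):
    """Reading frame (0, 1, or 2) relative to nucleotide 1 of the genome."""
    return (start - 1) % 3


def overlap(a, b):
    """Overlapping interval of two ORFs, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


if __name__ == "__main__":
    for name, orf in (("ORF1", ORF1), ("ORF2", ORF2)):
        residues = codon_count(*orf) - 1  # subtract the stop codon
        print(name, orf_length(*orf), "nt,", residues, "aa, frame", reading_frame(orf[0]))
    print("overlap:", overlap(ORF1, ORF2))
    print("3' UTR length:", GENOME_LENGTH - ORF2[1])
```

Under these coordinates, ORF1 spans 2268 nt (755 codons plus a stop codon, matching the 755-residue CP), the two ORFs share the 4-nt interval 2567-2570, and ORF2 lies in a different reading frame from ORF1, as expected for an AUGA stop/restart overlap.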
The CP encoded by ORF1 is 755 amino acids long, with a calculated mass of 79.36 kDa and a predicted isoelectric point of 5.88. A BLASTp search showed that the CcVV1 CP has the highest sequence similarity to the putative CP of Beauveria bassiana victorivirus NZL/1980 (BbVV_NZL/1980; YP_009032632.1, 63.89% identity, 99% coverage, E-value 0). A global pairwise alignment showed that the CPs of CcVV1 and BbVV_NZL/1980 are 64.1% identical. We also found an Ala/Gly/Pro-rich region in the C-terminal sequence of the CcVV1 CP (Supplementary Fig. S3A). To the best of our knowledge, this is the first report of a victorivirus infecting C. cassiicola.

Source of virus material
C. cassiicola strain 20180909-03 was isolated from a sesame leaf spot sample in Henan province, China, in 2018. dsRNA was extracted from 0.2 g of fungal mycelia using CF-11 cellulose column chromatography [20]. The dsRNA sample was analyzed by 1.2% (w/v) agarose gel electrophoresis. After treatment of the crude dsRNA extract with DNase I and S1 nuclease (TaKaRa Dalian, China), a single band of approximately 5 kb was observed (Supplementary Fig. S1).
To further analyze this dsRNA virus, total RNA was extracted from C. cassiicola using RNAiso Plus (TaKaRa Dalian, China) and subjected to high-throughput sequencing on an Illumina HiSeq 2500 platform at Shanghai Bohao Biotechnology. Contigs that closely matched virus sequences in the NCBI database in a BLASTx search (https://www.ncbi.nlm.nih.gov/) were identified as potential viral sequences. One of these contigs, contig1246, shared sequence similarity with members of the genus Victorivirus.
The complete sequence of contig1246 was confirmed by sequencing 11 virus-specific RT-PCR products generated using primers designed based on the contig1246 data (Supplementary Table S1). To complete the 5'- and 3'-terminal genomic sequences, rapid amplification of cDNA ends was performed using a SMARTer RACE 5'/3' Kit (TaKaRa, China). At least three PCR products were sent to Sangon Biotech for Sanger sequencing to verify the sequences.
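The contig screening step can be illustrated with a short filter over BLAST tabular output. This is a hedged sketch only: the text does not state which E-value cutoff or output format was used, so the 1e-5 threshold, the standard 12-column tabular layout (`-outfmt 6`), the function name `best_hits`, and all rows in `EXAMPLE` are assumptions made up for illustration, not data from this study.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular format (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return the highest-bitscore hit per query among hits passing the E-value cutoff.

    The cutoff is an assumed default, not a value taken from the paper.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical rows for illustration only; none of these values come from the study.
EXAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ("contig_a", "viral_protein_1", "45.0", "300", "165", "0", "1", "900", "1", "300", "1e-30", "120"),
        ("contig_a", "viral_protein_2", "60.0", "700", "280", "0", "1", "2100", "1", "700", "0.0", "850"),
        ("contig_b", "unrelated_protein", "30.0", "50", "35", "0", "1", "150", "1", "50", "2.0", "25"),
    ]
)

if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE).items():
        print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

Keeping only the best-scoring hit per contig is one simple design choice; a real screen might instead retain all hits above the cutoff and inspect their taxonomy.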
Putative ORFs in the CcVV1 genome were identified using ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder). The Conserved Domain Database was used to search for protein domains [21]. EMBOSS Needle was used to perform global pairwise alignments [22]. Multiple alignments of the protein sequences of different viruses were performed using DNAMAN software (version 9.0) and the Clustal W program in MEGA (version 7.0). Phylogenetic trees were constructed in MEGA 7.0 by the maximum-likelihood (ML) method with the Jones-Taylor-Thornton (JTT) model and 1000 bootstrap replicates [23]. The GC content was determined using OligoCalc [24]. ProtParam from Expasy was used to calculate the protein molecular weight and isoelectric point [25,26]. Mfold was used to find potential secondary structures in the terminal sequences of CcVV1 [27]. A genome diagram was made using Illustrator for Biological Sequences [28]. An H-type RNA pseudoknot was predicted using DotKnot [29].
A phylogenetic tree based on the CP sequences of CcVV1 and other dsRNA viruses of the family Totiviridae was constructed by the ML method (Supplementary Fig. S4). The CP of CcVV1 clustered with those of members of the genus Victorivirus. Similarly, phylogenetic analysis of the viral RdRp showed a close relationship of CcVV1 to members of the genus Victorivirus (Fig. 1C).
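To make the notion of percent identity from a global pairwise alignment concrete, the following is a minimal Needleman-Wunsch sketch. It is not EMBOSS Needle: it uses simple match/mismatch scores and a linear gap penalty (assumed values) rather than a substitution matrix with affine gaps, so its numbers will differ from those reported here. Identity is computed as matching columns divided by alignment length, and the toy sequences are illustrative only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scoring values are assumptions for illustration. Returns the two
    gapped strings of one optimal alignment.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aln_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


if __name__ == "__main__":
    # Toy sequences, not real viral proteins.
    aln_a, aln_b = global_align("ACGT", "AGT")
    print(aln_a)
    print(aln_b)
    print(percent_identity(aln_a, aln_b))
```

Because identity depends on the alignment length, a global alignment and a local BLASTp hit over 99% of the query can give slightly different values for the same pair of sequences.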
According to the ICTV species demarcation criteria for the genus Victorivirus [30], which specify that the amino acid sequence identity in pairwise comparisons of either the CP or the RdRp gene product is no more than 60% and that the virus was isolated from a different filamentous fungus, CcVV1 should be considered a member of a new species of the genus Victorivirus. To the best of our knowledge, this is the first report of a victorivirus infecting the fungus C. cassiicola.