Two new umbravirus-like associated RNAs (ulaRNAs) discovered in maize and johnsongrass from Ecuador

Two new umbravirus-like associated RNAs (ulaRNAs) were found in maize and johnsongrass samples from Ecuador. The complete sequences consist of 3,053 and 3,025 nucleotides, respectively, and each contains four open reading frames (ORFs). Their genome sequences were 58% identical to each other and 28 to 60% identical to the most closely related viruses. Phylogenetic analysis using full genome sequences and amino acid sequences of the RNA-dependent RNA polymerase (RdRp) placed both sequences in a clade sharing the most recent common ancestor with ulaRNAs from sugarcane and maize, suggesting that they belong to a monophyletic grass-infecting lineage. Their terminal regions exhibit features common to umbraviruses and ulaRNAs.

Here, we report the characteristics and complete genome sequences of two new class-2 ulaRNAs found in maize (Zea mays) and johnsongrass (Sorghum halepense). In August of 2021, leaf tissue samples showing mild-to-moderate mosaic were collected in Santa Ana, a representative maize production area in Manabí province of Ecuador (GPS coordinates: -1.123533, -80.414250). Samples were collected from two commercial cultivars, a yellow type 'Trueno' and a white type 'INIAP-543', and from johnsongrass, which was the most prevalent grass weed in the area at the time of sampling.
A virus discovery analysis was conducted by HTS on three total-RNA pools. Pooled samples (pool 1, yellow maize; pool 2, white maize; pool 3, johnsongrass) were a composite of 10 (pool 1) or six (pools 2 and 3) individual total-RNA preparations, mixed in equal amounts totaling 4 µg per sample. After pooling, aliquots of each RNA sample were stored individually at -80 °C for later analysis. Total RNA was extracted from ~100 mg of fresh leaf tissue using a PureLink™ RNA Mini Kit (Life Technologies). The three pooled samples were subjected to DNase treatment, depleted of the host ribosomal RNA fraction using an Illumina Ribo-Zero Plant Kit, and subjected to library preparation using an Illumina Nextera XT DNA Library Prep Kit. The libraries were sequenced as paired-end reads (2 × 150 bp) on an Illumina NextSeq2000 instrument at the Leibniz Institute DSMZ. A total of 38.2, 64.8, and 42.1 million raw reads were obtained from RNA pools 1, 2, and 3, respectively.
Raw reads were analyzed in Geneious Prime v. 2022.0.1 (Biomatters) using a bioinformatics pipeline developed in house to subtract host sequences and to assemble contigs, which were screened by BLASTn and BLASTp against a virus reference database for virus discovery, reconstruction of virus genome sequences, and taxonomic assignment.
Bioinformatics analysis revealed the presence of several virus contigs in each sample, most of which corresponded to previously reported viruses belonging to different genera (Online resource 1). However, two contigs of 2,908 and 2,746 nt in length, obtained from pools 1 and 3, respectively, were distantly related to known ulaRNAs (NCBI BLAST analysis date: November 3, 2021). The closest hits included EMaV (accession no. The 2,908-nt-long contig (pool 1) was assembled from a total of 2,040 reads, with an average sequencing depth of 106x, whereas the 2,746-nt-long contig (pool 3) was constructed from 972 reads, with an average sequencing depth of 54x (Fig. 1A). Pairwise alignments between the two contigs showed 58% identity at the nucleotide level and 60.5% identity when the deduced RdRp aa sequences were compared, indicating that the sequences represented two distinct ulaRNAs. Reverse transcription (RT)-PCR was used to confirm the presence of each ulaRNA in the original RNA preparations. Primers were designed using the consensus sequence of each assembly from the region with the highest coverage (Fig. 1A). Amplicons of the expected size were detected in one RNA sample from each group (Online resource 2). The 5ʹ and 3ʹ ends of each contig were verified by rapid amplification of cDNA ends (RACE), using a 5ʹ/3ʹ RACE Kit, 2nd Generation (Roche, Germany) and specific primers designed based on the terminal genomic regions.
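As a rough consistency check on the depths reported above, mean depth can be approximated as the number of mapped bases divided by contig length. The sketch below uses the read counts and contig lengths from the text and assumes that every read contributes its full nominal length of 150 nt (no trimming, no partial mapping). That is an assumption for illustration, not the procedure used in Geneious Prime, and the function name `mean_depth` is hypothetical.

```python
def mean_depth(n_reads: int, read_len: int, contig_len: int) -> float:
    """Approximate mean depth as total mapped bases divided by contig length."""
    return n_reads * read_len / contig_len


# (read count, contig length in nt) as reported in the text.
contigs = {
    "pool 1 (maize)": (2040, 2908),
    "pool 3 (johnsongrass)": (972, 2746),
}

READ_LEN = 150  # nominal read length; assumes untrimmed, fully mapped reads

for name, (reads, length) in contigs.items():
    print(f"{name}: ~{mean_depth(reads, READ_LEN, length):.0f}x")
```

Under these assumptions the estimate is about 105x for the pool 1 contig and about 53x for the pool 3 contig, close to the reported 106x and 54x; small differences are expected because actual mapped read lengths vary.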
The complete genomic sequence of the ulaRNA assembled from the yellow maize sample consists of 3,053 nt (GenBank accession no. OM937759), whereas the one from johnsongrass consists of 3,025 nt (accession no. OM937760). For consistency in ulaRNA naming, we will refer to the new ulaRNA from maize as maize umbra-like virus (MULV) and the one from johnsongrass as johnsongrass umbra-like virus (JgULV).
The genomes of both viruses contain four ORFs organized in a similar manner, with minor variations in each ORF (Fig. 1A). ORF1 encodes a protein of 195 aa (22 kDa) for which no function was predicted. ORF2 is located after a stretch of 50 (MULV) or 170 (JgULV) nt downstream from ORF1. However, both contain the same heptameric ribosomal frameshift (FS) sequence (GGG UUU U), which is conserved in other class 2 ulaRNAs and is similar to that of umbraviruses (consensus: GGA UUU U) (Fig. 1C). In addition, both MULV and JgULV can form structures similar to those of CYVaV in this region, including a hairpin that has the capacity for a tombusvirid-wide long-distance RNA:RNA interaction with a sequence near the 3ʹ terminus (Fig. 1D). This strongly suggests that translation of ORF2 occurs via a -1 ribosomal FS. Interestingly, MULV and the previously identified EMaV have unique ORF1 termination codons (UAG) two codons upstream of the position of the termination codon found in all other class 2 ulaRNAs (UGA), including JgULV. Frameshifting would result in a fused protein of 717 aa (82.5 kDa) and 674 aa (76.5 kDa) for MULV and JgULV, respectively. The non-overlapping region of the fusion protein contains conserved viral RdRp domains (pfam clan number: CL0027).
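To illustrate how a -1 ribosomal frameshift at a heptameric slippery sequence yields a fusion protein, the following minimal sketch translates a short RNA in the zero frame up to the end of the heptamer and then continues one nucleotide back. The toy sequence is hypothetical and is not taken from MULV or JgULV. The sketch also assumes the canonical register X XXY YYZ, in which the last six nucleotides of the heptamer form two zero-frame codons; the register of the heptamer relative to ORF1 in the new ulaRNAs is not stated here.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def translate(rna: str, start: int) -> str:
    """Translate from `start` until the first stop codon or the end of the RNA."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def frameshift_fusion(rna: str, orf_start: int, slip: str = "GGGUUUU") -> str:
    """Return the product of a -1 frameshift at the first `slip` site.

    Assumes the register X XXY YYZ, i.e. the heptamer ends on a zero-frame
    codon boundary, and that no stop codon precedes the site.
    """
    site = rna.find(slip, orf_start)
    if site < 0:
        raise ValueError("slippery heptamer not found")
    end = site + len(slip)
    if (end - orf_start) % 3:
        raise ValueError("heptamer is not in the assumed X XXY YYZ register")
    upstream = translate(rna[:end], orf_start)  # zero frame, through YYZ
    downstream = translate(rna, end - 1)        # slip back 1 nt, re-read Z
    return upstream + downstream


# Hypothetical toy RNA: AUG GCG GGU UUU UAA AG GAU UGA
toy = "AUGGCGGGUUUUUAAAGGAUUGA"
print(translate(toy, 0))          # zero-frame product (ORF1 analogue)
print(frameshift_fusion(toy, 0))  # fusion product after the -1 shift
```

In the toy example the zero-frame product terminates at the UAA immediately after the heptamer, whereas the shifted ribosome reads through it in the -1 frame, which mirrors how the RdRp fusion would be produced from ORF1 and ORF2.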
Unlike class 2 dicot-infecting ulaRNAs, which have only a single ORF that partially overlaps with the end of the RdRp ORF (absent in CYVaV because of two deletions), MULV and JgULV have two additional putative ORFs (ORFs 3 and 4) arranged in an out-of-frame overlapping configuration similar to those of umbraviruses but without the intervening intergenic region (Fig. 1A). The hypothetical protein encoded by ORF3 consists of 178 aa (20.4 kDa) and 200 aa (22.6 kDa) in MULV and JgULV, respectively, sharing 25% aa sequence identity. BLAST alignments did not reveal any homologues to this protein. The hypothetical product of ORF4 is a protein of 212 aa (23.6 kDa) and 207 aa (23 kDa), for MULV and JgULV, respectively, sharing 48% aa sequence identity, and 44-48% identity with the single ORF orthologs of 21-22 kDa from FULV, SULV, OULV, and EMaV. The recently reported wheat umbra-like virus (WULV), a new ulaRNA of 3.5 kb [18], has one ORF overlapping at the end of ORF2 and is suggested to have an additional ORF starting 48-nt apart from the termination codon of the previous ORF. However, this second ORF is in frame, with no intervening termination codons, and thus, its identity as a separate ORF requires further examination. Interestingly, SULV also contains a fourth ORF that partially overlaps with the class 2 orthologue, similar to MULV and JgULV.
[Figure caption fragment] The sequence that participates in long-distance interaction (LDI) with the 3ʹ end is shown in gray, and the interacting sequence is also found at the base of the hairpin (also in gray) and likely pairs with the terminal loop in an alternative structure (AES, manuscript in preparation).
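The out-of-frame overlapping arrangement described for ORFs 3 and 4 can be illustrated with a simple forward-strand ORF scan. This is a minimal sketch on a hypothetical toy RNA, not the annotation procedure used for MULV or JgULV; the function names and the `min_codons` threshold are illustrative assumptions.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 2):
    """Return sorted (start, end, frame) tuples for AUG-to-stop ORFs.

    Only the forward strand is scanned; `end` is exclusive and includes
    the stop codon. `min_codons` counts sense codons, including AUG.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


def overlapping_out_of_frame(orfs):
    """Pairs of ORFs that share nucleotides but are read in different frames."""
    return [
        (a, b)
        for k, a in enumerate(orfs)
        for b in orfs[k + 1:]
        if b[0] < a[1] and a[2] != b[2]
    ]


# Hypothetical toy RNA with one ORF in frame 0 and an overlapping ORF in frame 1.
toy = "AUGGAUGCCAAAUAAGUGA"
orfs = find_orfs(toy)
print(orfs)
print(overlapping_out_of_frame(orfs))
```

Because the list is sorted by start position, two ORFs overlap whenever the later one begins before the earlier one ends; requiring different frames excludes nested in-frame ORFs.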
Maximum-likelihood phylogenetic trees, constructed in MEGA X [19] using the complete nucleotide (Online resource 3) or amino acid sequences (Fig. 1B) of the RdRp, showed that MULV and JgULV form a clade with the class 2 ulaRNAs SULV and EMaV, suggesting a grass-infecting common ancestor for this lineage. A sister clade was formed by CYVaV, OULV, and FULV, within which CYVaV and FULV exhibit a closer relationship (Fig. 1B). Although demarcation criteria have not yet been established for ulaRNAs, the nucleotide and amino acid sequence identity values obtained when comparing MULV, JgULV, and their closest relatives strongly suggest that there are two distinct class 2 ulaRNA lineages.
The 5ʹ UTR in JgULV is 9 nt in length, including a canonical "carmovirus consensus sequence" (CCS; G2-3(A/U)4-9), found at the 5ʹ ends of all carmoviruses and nearly all ulaRNAs and umbraviruses. MULV has an extended 5ʹ UTR of 29 nt, which is unique among class 2 ulaRNAs, with the exception of FULV-1, which was reported to have a highly unusual 5ʹ UTR that requires additional verification [8]. As with all class 2 ulaRNAs (with the exception of FULV-1), the 5ʹ region of both new ulaRNAs contains two short terminal hairpins and an extended downstream third structure, consistent with the secondary structure determined for CYVaV using a combination of selective 2ʹ-hydroxyl acylation analyzed by primer extension (SHAPE) structure probing, computational predictions, and phylogenetic analysis [20] (Fig. 1C).
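The CCS motif, two or three G residues followed by four to nine A or U residues at the 5ʹ terminus, maps directly onto a regular expression. The sketch below checks whether an RNA begins with such a motif; the example sequences are hypothetical and are not the actual 5ʹ termini of MULV or JgULV, and the function name `has_ccs` is an illustrative choice.

```python
import re

# CCS at the 5' terminus: G(2-3) followed by (A/U)(4-9).
CCS = re.compile(r"^G{2,3}[AU]{4,9}")


def has_ccs(rna: str) -> bool:
    """Return True if the RNA starts with a carmovirus consensus sequence."""
    return CCS.match(rna.upper().replace("T", "U")) is not None


# Hypothetical 5' termini used only to exercise the pattern.
examples = ["GGGUAAAUAUGGCC", "GGAAAUCC", "GGGAACGU", "ACGGAUUUU"]
for seq in examples:
    print(seq, has_ccs(seq))
```

The pattern is anchored at the first nucleotide, so a CCS-like stretch further downstream (as in the last example) is not counted; DNA-style input with T is converted to U before matching.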
MULV and JgULV have 306- and 302-nt 3ʹ UTRs, respectively, similar to other class 2 ulaRNAs. The 3ʹ regions of CYVaV and other members of the family Tombusviridae have been studied extensively, and different stem-loop structures have been shown to play key roles in replication and translation. Virtually all members of the family Tombusviridae have two 3ʹ-terminal hairpins (designated as H5 and Pr for carmoviruses and umbraviruses) that are connected by a four-nucleotide pseudoknot that includes the 3ʹ-terminal residues (Fig. 2) [20-22]. Many umbraviruses and carmoviruses contain two hairpins just upstream of H5 (designated as H4a and H4b), which, along with H5 and two pseudoknots, form a TSS-type 3ʹ cap-independent translation enhancer (CITE) [10, 21, 23]. Most class 2 ulaRNAs, including MULV and JgULV, contain similarly placed hairpins but lack the capacity to form pseudoknots. In CYVaV, the 3ʹ CITE was identified as a novel I-shaped structure (ISS)-like structure (ISSLS), with several critical stretches of perfectly conserved class 2 residues (Fig. 2, green with orange circles) that are also conserved in MULV and JgULV. Several regions of additional conservation among MULV, JgULV, and EMaV were also evident, especially in a lower supporting stem. Our findings demonstrate the diversity in genomic sequence, size, and organization of ulaRNAs and anticipate the existence of new classes of these RNA entities.
Lastly, an important biological feature of "true" umbraviruses is their association with a capsid-assistor virus, typically a polerovirus, for genome encapsidation and plant-to-plant transmission by vectors [5]. Poleroviruses have been found incidentally (i.e., no formal experiments have been conducted to demonstrate their capsid-lender nature) for SULV, OULV, CYVaV, and StVA [6, 9-11]. For the papaya-infecting ulaRNAs, an unusual dsRNA totivirus-like virus has been shown to be the capsid assistor of PMeV-2 [15] (Quito-Avila, unpublished). In this study, we found the polerovirus maize yellow dwarf virus (MYDV) in samples from the three RNA pools. However, MYDV was not detected in the two samples in which MULV and JgULV were found. A possible explanation could be that the respective host cannot be systemically infected by the helper virus, while class 2 ulaRNAs are capable of independent systemic movement, which likely involves the use of host movement proteins (Liu et al., manuscript submitted). Further studies are needed to determine the natural transmission of MULV and JgULV and their potential involvement in disease.
It should be noted that, at the time this manuscript was being prepared, a nucleotide sequence recorded as Teosinte-associated umbra-like virus (TULV) (accession no. OK018180) from Mexico became available in the NCBI GenBank database. The TULV sequence shares 99% nt sequence identity with MULV but is missing 5ʹ terminal residues and has additional sequence beyond the 3ʹ end sequence conserved with all other class 2 ulaRNAs. We propose that TULV represents a Mexican isolate of MULV. No formal publications about the discovery of TULV or its molecular characterization were available at the time of submission.
[Fig. 2 caption] Structures at the 3ʹ end of CYVaV were determined by RNA structure probing. Names of hairpins are as found in Liu et al. [12, 20]. Pseudoknot 1 (Ψ1)-connecting residues are shown in blue. The sequence that participates in the long-distance interaction (LDI) with the recoding site is shaded gray. The 3ʹ cap-independent translation enhancer (CITE) and the I-shaped structure (ISS)-like structure (ISSLS) are labeled. Sequences in the 3ʹ CITE that are shared by all class 2 ulaRNAs are shaded green and circled. Other residues that are conserved between these ulaRNA 3ʹ CITEs are shaded green.