A novel picorna-like virus identified in the cotton boll weevil Anthonomus grandis (Coleoptera: Curculionidae)

The cotton boll weevil (CBW; Anthonomus grandis; Coleoptera: Curculionidae) is considered the major insect pest of cotton, causing considerable losses in yield and fiber quality. An increase in the boll weevil population due to increasingly inefficient chemical control measures is of great concern among cotton producers. The absence of conventional or transgenic cultivars with minimal resistance to CBW has stimulated the search for new molecular and biological tools for efficient control of this insect pest. In this study, we used a metagenomic approach based on RNA deep sequencing to investigate the presence of viruses and coding viral RNA in apparently healthy native adult CBW insects collected from cotton crops in Mato Grosso state, Brazil. Using an Illumina HiSeq 2000 paired-end platform, 138,798 virus-related reads were obtained, and a consensus sequence of a putative new virus, 10,632 nucleotides in length, was assembled. The sequences of the 5' and 3' untranslated regions (UTRs) were determined by rapid amplification of cDNA ends (RACE), followed by Nanopore sequencing. The complete genome sequence included a 5'-UTR (1,158 nucleotides), a 3'-UTR (561 nucleotides), and a single ORF of 8,913 nucleotides encoding a large polyprotein. Sequence analysis of the putative polyprotein showed several regions with high sequence similarity to structural and non-structural proteins of viruses of the family Iflaviridae. Pairwise alignments of polyprotein amino acid sequences showed the highest sequence identity (32.13%) to a partial polyprotein sequence of a putative iflavirus (QKN89051.1) found in samples from wild zoo birds in China. Phylogenetic analysis based on full polyprotein sequences of different iflaviruses indicated that this new picorna-like virus is most closely related to iflaviruses found in lepidopteran insects, and it was therefore tentatively named "Anthonomus grandis iflavirus 1" (AgIV-1).
This is, to our knowledge, the first complete viral genome sequence found in CBW, and it could provide a basis for further studies about the infectivity and transmission of this virus and its possible association with symptoms or acute disease. AgIV-1 could potentially be used to develop biological or molecular tools, such as a viral vector to carry interfering RNA molecules for CBW control.


Introduction
Cotton (Gossypium hirsutum) is one of the major socioeconomically important crops cultivated worldwide, serving as the main source of fiber for the textile industry [1-3]. Cotton crops are constantly challenged by several insect pest species [4]. The cotton boll weevil (CBW), Anthonomus grandis (Coleoptera: Curculionidae), is considered the major insect pest in South and North America, and cotton crops exhibit the highest incidence of infestation during the transition period from flowering to fructification [5,6]. CBW adults feed on and lay eggs within the cotton reproductive structures, often causing flower bud abortion [7,8]. Since CBWs are endophytic, their larvae can cause damage to flower buds when they are not aborted, impacting fiber quality [6,9]. Their high reproductive capacity, plasticity, and genetic variability and their occurrence in crop residues or stumps have helped to increase the incidence, density, and geographic distribution of CBWs worldwide [6,10-12]. There is currently no conventional or transgenic cotton cultivar with satisfactory resistance to CBW available to cotton producers. Consequently, insecticides are applied many times each year for its management [13]. Unfortunately, the frequent occurrence of CBW populations with reduced susceptibility to insecticides and failure of chemical control in cotton crops has already been reported in Brazil [14,15]. Meanwhile, the identification of new viruses that infect CBW may provide information leading to the development of molecular or biological tools for their effective control.

Handling Editor: Simona Abbà.

Accession number: The partial genome sequence was deposited in the GenBank database under accession number OK413669.
Here, we used an RNA sequencing approach involving next-generation sequencing (NGS) to investigate the presence of viruses and coding viral RNA in apparently healthy native adult CBW insects collected in September 2020 in a cotton field situated in Serra da Petrovina (16°47'53''S and 54°07'53''W), Pedra Preta city, Mato Grosso state, Brazil. Pooled CBW insects (200 mg) were macerated in SM buffer (100 mM NaCl, 8 mM MgSO4, and 50 mM Tris-Cl, pH 7.5), using a mortar and pestle. The homogenate was filtered once through cheesecloth and centrifuged three times at 4,000 × g for 10 min to clarify the supernatant, which was then used for RNA extraction using a QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. rRNA was removed from the total RNA sample using a Ribo-Zero rRNA Removal Kit (Illumina, San Diego, CA, USA), and a cDNA library was constructed using a TruSeq RNA Library Preparation Kit (Illumina, San Diego, CA, USA). The cDNA sample was sequenced at Macrogen (Gangnam-gu, Seoul, Republic of Korea) using an Illumina HiSeq 2000 paired-end platform. The raw reads were quality trimmed and assembled de novo using MEGAHIT software [16]. The resulting contigs were compared with an in-house viral RefSeq database using BLASTx, and those closely related to viral sequences were retained. To extend the assembled sequences as far as possible, the trimmed reads were mapped back to the respective viral genomes using Geneious 11.1.5 software [17], which was also used for genome annotation. Open reading frames (ORFs) were confirmed using a BLASTx search against the NCBI non-redundant protein database (08/2021).
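The contig-screening step described above can be illustrated with a minimal sketch. It assumes the BLASTx results were exported in the standard 12-column tabular format (`-outfmt 6`) and keeps, for each contig, the highest-scoring hit that passes an E-value cutoff. The function name `best_viral_hits`, the cutoff of 1e-5, and the example rows are hypothetical; the text does not state which thresholds or output format were actually used.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_viral_hits(blast6_text, max_evalue=1e-5):
    """Return {contig_id: best_hit_row} for contigs with a hit at or below max_evalue.

    max_evalue is a hypothetical cutoff chosen for illustration only.
    """
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast6_text), fieldnames=BLAST6_COLUMNS, delimiter="\t"
    )
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # hit too weak to count as virus-related
        current = best.get(row["qseqid"])
        # Keep the hit with the highest bit score for each contig.
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up example rows (not data from the study).
EXAMPLE = "\n".join([
    "contig_1\tviral_protein_A\t32.1\t2900\t1800\t40\t1200\t9900\t10\t2950\t1e-120\t450.0",
    "contig_1\tviral_protein_B\t28.0\t500\t340\t10\t1300\t2800\t5\t510\t1e-20\t95.0",
    "contig_2\tviral_protein_C\t25.0\t60\t45\t0\t1\t180\t1\t60\t0.5\t30.0",
])

hits = best_viral_hits(EXAMPLE)
for contig_id, row in sorted(hits.items()):
    print(contig_id, row["sseqid"], row["evalue"])
```

In this toy input, only `contig_1` passes the cutoff; `contig_2` is dropped because its single hit has an E-value of 0.5.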
The NGS resulted in 56,210,608 total reads, 138,798 of which were considered virus-related sequences. De novo assembly of these viral reads generated a consensus sequence that was 10,632 nucleotides in length (GenBank accession number OK413669, Supplementary Material S1). The genome coverage was 1,440×. A single ORF of 8,913 nucleotides encoding a large polyprotein was predicted, in addition to a 5'-UTR (1,158 nucleotides) and a 3'-UTR (561 nucleotides). The 5' and 3' ends of the viral genome were confirmed by rapid amplification of cDNA ends (RACE) using the 5' and 3' RACE System for Rapid Amplification of cDNA Ends, version 2.0 (Thermo Fisher Scientific) according to the manufacturer's protocol (data not shown). The amplified 5' and 3' products were sequenced on a MinION platform with a Rapid RBK110.96 kit, following the manufacturer's instructions (Oxford Nanopore Technologies), and the sequences were analyzed using Geneious 11.1.5 software. Functional regions of the polyprotein encoding structural and non-structural proteins flanked by putative proteolytic cleavage sites were identified by sequence alignment (Fig. 1). From these data, the genome organization of the novel virus clearly resembled that of other members of the family Iflaviridae [18,19].
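As a quick consistency check on the figures reported above, the following sketch verifies that the 5'-UTR, ORF, and 3'-UTR lengths add up to the assembled genome length, derives the ORF coordinates, and computes the share of reads classified as virus-related. Only the numeric constants come from the text; the function names are illustrative.

```python
# Figures reported in the text.
GENOME_LENGTH = 10_632
UTR5_LENGTH = 1_158
ORF_LENGTH = 8_913
UTR3_LENGTH = 561
TOTAL_READS = 56_210_608
VIRAL_READS = 138_798


def check_layout(utr5, orf, utr3, genome):
    """True if the three segments tile the genome and the ORF is a whole number of codons."""
    return utr5 + orf + utr3 == genome and orf % 3 == 0


def orf_coordinates(utr5, orf):
    """1-based, inclusive start and end positions of the ORF on the genome."""
    start = utr5 + 1
    return start, start + orf - 1


layout_ok = check_layout(UTR5_LENGTH, ORF_LENGTH, UTR3_LENGTH, GENOME_LENGTH)
orf_start, orf_end = orf_coordinates(UTR5_LENGTH, ORF_LENGTH)
codons = ORF_LENGTH // 3
viral_fraction = VIRAL_READS / TOTAL_READS

print("layout consistent:", layout_ok)
print("ORF spans:", orf_start, "-", orf_end)
print("codons in ORF:", codons)
print(f"virus-related reads: {100 * viral_fraction:.3f}% of total")
```

The segments are consistent with a 10,632-nucleotide genome: the ORF occupies positions 1,159 to 10,071 and comprises 2,971 codons, which would correspond to a polyprotein of 2,970 amino acids if the reported ORF length includes the stop codon (an assumption, since the text does not say). Roughly 0.25% of all reads were virus-related.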
An amino acid sequence alignment made using the MAFFT method [20] showed that this putative polyprotein was 32.13% identical to the corresponding sequence of a putative iflavirus (QKN89051.1) found in samples from wild zoo birds in China. The International Committee on Taxonomy of Viruses (ICTV) [19] has established two demarcation criteria for creating new species in the genus Iflavirus: (1) natural host range and (2) amino acid sequence identity in the capsid protein under 90% [21,22]. We therefore conclude that this new picorna-like virus should be recognized as a member of a new species in the genus Iflavirus. We have tentatively named this putative new virus "Anthonomus grandis iflavirus 1" (AgIV-1).
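To make the identity comparison concrete, here is a minimal sketch of how percent identity can be computed from two aligned sequences and compared with the 90% threshold. It assumes identity is counted over columns where neither sequence has a gap; alignment tools differ in their choice of denominator, and the text does not specify which convention was used. The toy sequences and function names are hypothetical. Note also that the ICTV criterion refers to the capsid protein, whereas the 32.13% figure reported here is for the polyprotein, so it is used below for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(identity_percent, threshold=90.0):
    """True if the identity falls under the demarcation threshold cited in the text."""
    return identity_percent < threshold


# Toy aligned fragments (made up, not taken from any real polyprotein).
toy_a = "MKV-LSTAGE"
toy_b = "MRVQLS-AGD"

print(percent_identity(toy_a, toy_b))   # 6 matches over 8 ungapped columns
print(below_species_threshold(32.13))   # polyprotein identity reported in the text
```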
Phylogenetic analysis was performed based on whole genome sequences of the putative novel virus and those of other iflaviruses (Supplementary Table S1). Sequence alignments were made by the MAFFT method [20] using the whole polyprotein coding regions of related viruses (Fig. 2). A maximum-likelihood tree was constructed by the FastTree method implemented in Geneious 11.1.5 software [17], and branch support was estimated using a Shimodaira-Hasegawa-like test. According to Silva et al. [23], iflaviruses do not form separate clades according to the insects they infect, suggesting that they did not follow the same evolutionary path as their insect hosts at the order level [23,24]. The results of this study corroborate this observation, and AgIV-1 branches at the base of a clade containing three iflaviruses found in lepidopteran hosts (Fig. 2).
The identification of new viruses infecting cotton boll weevil increases our knowledge of their diversity and evolution and provides new information that could lead to the development of biotechnological tools for its control. Few iflaviruses have been associated with symptoms or acute disease [25,26], with deformed wing virus of honey bees being an exception [27]. Future studies with this new picorna-like virus will be focused on its prevalence, infectivity, and possible use as a viral vector to carry interfering RNA for biological control strategies.

Fig. 2 Phylogenetic analysis of the new picorna-like virus, tentatively named "Anthonomus grandis iflavirus 1" (AgIV-1), identified in cotton boll weevil (Anthonomus grandis) and other viruses belonging to the families Iflaviridae and Dicistroviridae. The midpoint-rooted maximum-likelihood phylogenetic tree was constructed based on whole polyprotein sequences, using the FastTree method implemented in Geneious 11.1.5 software. The branch support was estimated using a Shimodaira-Hasegawa-like test.