New Volvovirus Isolate From Domesticated House Cricket, Acheta domesticus (Orthoptera: Gryllidae)

Background: DNA viruses show broad diversity in their genetic material, yet most viral DNA studies have focused on viruses with recognized pathogenic characteristics. Herein, we employed a hybrid approach to systematically identify viral DNA in the Acheta domesticus genome and curated a primer library to reconfirm infection by Acheta domesticus volvovirus (AdVVV) in A. domesticus samples obtained from a breeding facility in Thailand. Methods and Results: The AdVVV nucleotide sequence was anchored in, and examined from, the genome sequence of A. domesticus. Subsequently, we sequenced overlapping amplified DNA fragments to assemble the whole genome of the AdVVV isolate. The genome sequence begins with the putative nonanucleotide origin of replication (1-TAGTATTAC) and has four open reading frames. The circular nature of AdVVV was confirmed by a typical stem-loop formed by complementary nucleotide sequences at the beginning and end of the genome. The newly discovered volvovirus isolate from Thailand is highly homologous (97.34-98.77%) to previously identified volvovirus sequences and has an identical gene organization. This correlation is particularly surprising considering that the identified volvovirus has mutated considerably compared with previously discovered volvovirus isolates.


Introduction
Insect-based products have received much attention over the last two decades due to their high nutritional value [1]. Among the many edible insects, Acheta domesticus (house cricket) is gradually becoming popular in Thailand due to its nutty taste and crunchy texture. Numerous breeding facilities rear insects because of their high economic value. These facilities usually rear insects in enclosed vessels, and such captive breeding environments are very susceptible to epizootic diseases [2].
A. domesticus is a commercially important species of field cricket and has dominated the insect breeding industry in Thailand for over two decades [3]. Infectious viruses often cause fatal disease, leading to mortality and reduced vitality in the host population. A common problem in commercial insect breeding is mortality or reduced yield. Cricket rearing requires a healthy environment that protects the insects from environmental and natural hazards. Preventing insect pathogens from entering the breeding environment is important to ensure the health of the reared population [4]. Insects are susceptible to several kinds of microorganisms, including viruses, that can cause mortality in the host population; for example, A. domesticus densovirus (AdDNV) decimated house cricket populations in North America in 2009 [5]. Owing to the heavy losses in A. domesticus production caused by AdDNV in the USA and in Japan after 2009, the Jamaican cricket (G. assimilis) was introduced in breeding facilities as an alternative cricket for sustainable production. Besides AdDNV, new viruses that infect various cricket species have been discovered. In the report describing single-stranded circular DNA volvoviruses, researchers hypothesized that these ssDNA viruses infect several species of crickets. One rapidly evolving ssDNA virus is Acheta domesticus volvovirus (AdVVV). Volvovirus isolates were found in samples of dead house crickets from Japan and Jamaican crickets from the USA [6][7][8]. These rapidly evolving AdVVV isolates are not related to cycloviruses, circoviruses, geminiviruses or nanoviruses [9] and could belong to a new family or genus.
Screening for pathogens such as viruses in breeding centres can be part of preventive care before they contaminate populations and sub-populations. However, screening a large number of individuals requires an optimized and reproducible protocol for specific virus extraction and detection.
Detecting a pathogen without prior information about symptoms can be laborious and can involve multidisciplinary procedures. The known infectious viruses of A. domesticus are A. domesticus densovirus (AdDNV), A. domesticus mini ambidensovirus (AdMADV), and A. domesticus volvovirus (AdVVV) [10,11]. These viruses have DNA as their genetic material, so their DNA is expected to be found in an infected host. Recent advances in metagenomics and bioinformatics have reshaped approaches to the discovery and examination of novel viruses. However, it remains difficult to classify rapidly evolving viruses with a broad host range. Previously, the volvovirus isolates AdVVV-Japan and AdVVV-Ga were sequenced in 2013 [7], and the authors suggested that these two isolates are closely related to the previously discovered single-stranded DNA virus isolate AdVVV-IAF [6]. A recent review of circular DNA viruses also mentioned the novel AdVVV and confirmed that ssDNA viruses are widespread and infect several terrestrial insect and invertebrate species [12].
Four volvovirus genomes have been reported to date, including CrACV-1 [13]. These four genomes are nearly identical (>99%), suggesting a single viral family. Past studies on volvoviruses only proposed the genome organization of this new family or genus of rapidly evolving ssDNA viruses.
However, the mechanism of evolution of this virus remains unexplained. For AdVVV, a better understanding of this mechanism is needed for prevention in case a pathogenic strain causes mortality in breeding facilities. Therefore, it is essential to sequence and compare the viral DNA.
Next-generation sequencing technologies such as HiSeq X Ten can sequence short DNA reads from the whole genome. These short reads are assembled into longer DNA sequences, which can be a good source of foreign DNA sequences if the host is infected. A number of DNA viruses have been identified from host genomes [14]. In our previous study, we sequenced the whole genome of A. domesticus to develop microsatellite markers [15]. In the present study, we used the same genome database to detect whether DNA from any known virus is incorporated in the genome. Preliminary in silico analysis provided evidence that A. domesticus was infected with AdVVV. We extended the study to develop a PCR-based method to detect AdVVV as secondary confirmation. We developed a primer library to amplify several DNA fragments. Amplifying multiple DNA fragments is important because AdVVV is mutating, and mutations might alter primer binding sites and lead to unsuccessful amplification. The primer pairs were used to generate PCR products to confirm viral infection in A. domesticus samples collected from breeding facilities. The same primer pairs were used to produce DNA fragments with overlapping sequences for whole viral genome assembly. The strategy used in the present research might serve as a new mode of pathogen identification and viral genome sequencing. We sequenced and annotated the genome of the newly discovered AdVVV isolate from Thailand and developed a primer library. Additionally, this investigation includes a phylogenetic analysis with previously discovered AdVVV isolates to facilitate better classification of the volvovirus found in domesticated A. domesticus from a breeding facility. These findings might not be entirely representative of the evolutionary mechanism, but they provide a backbone for AdVVV identification and for phylogenetic analysis based on different segments of DNA.

Viral DNA sequence detection
The A. domesticus genome assembly (GenBank accession number: JAAVVF000000000) was used in the present study for viral DNA sequence detection. Further details of the methodology used for DNA extraction and sequencing can be found in our previous research [15]. A local database of the genome assembly was made using the NCBI BLAST+ makeblastdb tool [16]. Three viral genomes, A. domesticus densovirus (AdDNV), A. domesticus mini ambidensovirus (AdMADV), and A. domesticus volvovirus (AdVVV-Japan), were downloaded from the NCBI nucleotide database under accession numbers NC_004290.1, NC_022564.1 and NC_021074.1, respectively. A single local database of the three viral genomes was made using the BLAST+ makeblastdb tool [16]. The A. domesticus nucleotide database and the viral genomes were compared by sequence similarity search using NCBI BLAST+ blastn [16]. Only one sequence showed high similarity, and it matched isolate AdVVV-Japan. The GenBank accession number of this anchored sequence is JAAVVF010229171.1. As secondary confirmation that the nucleotide sequence extracted from the whole genome of A. domesticus is similar to the viral genome sequence, the BLASTn tool (nucleotide database) and the BLASTX tool (protein database) from the National Center for Biotechnology Information (NCBI) [17] were used to search for similar sequences.
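The local similarity search described above can be sketched as follows. This is a minimal illustration rather than the exact commands used in the study: the file names, the database name, the choice of the viral genomes as database and the genome contigs as query, and the identity and length thresholds are all assumptions. The command lists are only built, not executed, and the parser expects BLAST+ tabular output (-outfmt 6) with its default twelve columns.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast_commands(virus_fasta, genome_fasta, out_tsv, db_name="viral_db"):
    """Build (but do not run) makeblastdb and blastn argument lists.

    Assumption: the viral genomes form the database and the host genome
    contigs are the query. All file names are hypothetical.
    """
    make_db = ["makeblastdb", "-in", virus_fasta, "-dbtype", "nucl",
               "-out", db_name]
    search = ["blastn", "-query", genome_fasta, "-db", db_name,
              "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def parse_hits(tabular_text, min_identity=80.0, min_length=200):
    """Keep hits above illustrative identity and alignment-length cut-offs."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        identity = float(fields["pident"])
        length = int(fields["length"])
        if identity >= min_identity and length >= min_length:
            hits.append((fields["qseqid"], fields["sseqid"], identity, length))
    return hits
```

The two argument lists could be passed to `subprocess.run` on a machine where BLAST+ is installed; the contigs returned by `parse_hits` are candidates for the kind of anchored sequence reported here.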
Originally, the whole genome of A. domesticus was sequenced from samples collected from a farm located at Nakhon Ratchasima, Thailand [15]. To check for viral infection in the breeding facility, more A. domesticus samples were collected from the same farm and their DNA was extracted for testing with PCR primers. A modified conventional DNA extraction method was used to extract DNA from twelve A. domesticus individuals. PCR primers were developed using the viral query sequence (GenBank accession: KC794540.1) and the anchored sequence from the whole genome data (GenBank accession: JAAVVF010229171.1). The PCR primers were designed using Primer3 (a web-based primer design tool) [18] with default parameters; however, primers were manually selected to produce overlapping DNA fragments. The primer pairs used to check infection and to amplify overlapping DNA fragments are given in Table 1.

Viral DNA identification
DNA samples of twelve A. domesticus individuals randomly selected from the farm located at Nakhon Ratchasima, Thailand, were inspected for viral DNA using primer pair VV1. The amplified DNA was verified using gel electrophoresis (1.5% agarose gel).
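The manual selection of primer pairs that yield overlapping fragments can be checked computationally. The sketch below is only an illustration of that idea: the fragment names and coordinates are hypothetical (they are not the VV primer positions from Table 1), coordinates are 1-based and inclusive, and the function assumes that no amplicon lies entirely inside another.

```python
def tiling_overlaps(amplicons, genome_length):
    """Overlap (bp) between each amplicon and the next one around a circle.

    amplicons: dict mapping a name to (start, end), 1-based inclusive.
    An end smaller than the start means the amplicon spans the origin.
    Assumes no amplicon is nested inside another. A value <= 0 is a gap.
    """
    ordered = sorted(amplicons.items(), key=lambda item: item[1][0])
    overlaps = {}
    for i, (name, (start, end)) in enumerate(ordered):
        next_name, (next_start, _) = ordered[(i + 1) % len(ordered)]
        length = (end - start) % genome_length + 1
        # Distance from this start to the next start, going around the circle.
        step = (next_start - start) % genome_length or genome_length
        overlaps[(name, next_name)] = length - step
    return overlaps


# Hypothetical fragments tiling a 2,516 bp circular genome.
fragments = {
    "frag_A": (1, 900),
    "frag_B": (800, 1700),
    "frag_C": (1600, 2400),
    "frag_D": (2300, 100),  # spans the origin
}
for pair, overlap in tiling_overlaps(fragments, 2516).items():
    print(pair, overlap)
```

If every reported overlap is positive, the fragments close the circle and can in principle be assembled into a complete genome.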

Viral genome sequencing and assembly
To assemble the viral genome and check its circular folding, a DNA sample from a single cricket was used to amplify overlapping DNA fragments with all primer pairs. The PCR products with overlapping regions were sequenced by 1st BASE DNA Sequencing Division, Malaysia. The sequenced DNA fragments were analysed and assembled into a whole circular genome using CAP3 (de novo assembly) [19]. The draft assembly was compared with the reference sequence to validate the organization of the newly assembled viral genome.
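The principle behind assembling overlapping fragments can be illustrated with a deliberately simplified sketch. This is not CAP3 and does not reproduce its algorithm: it joins two sequences only when the end of one matches the start of the other exactly, whereas a real assembler tolerates sequencing errors and uses base quality. The function name, the toy sequences, and the minimum overlap are assumptions for illustration.

```python
def merge_by_overlap(left, right, min_overlap=20):
    """Join two sequences on their longest exact suffix/prefix overlap.

    Returns the merged sequence, or None when the overlap is shorter
    than min_overlap. Exact matching only; no error tolerance.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Toy example with a short minimum overlap.
print(merge_by_overlap("ACGTACGTTAGC", "TTAGCGGA", min_overlap=4))
```

Applied in order to a series of fragments whose ends overlap, repeated merging yields one contig; for a circular genome the last fragment should in turn overlap the first.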

Phylogenetic analysis
Sequence analysis was followed by phylogenetic analysis of the complete genome sequence together with additional genome sequences from GenBank. Multiple sequence alignment was performed using the MUSCLE algorithm. Evolutionary analyses were conducted in MEGA X [20].
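As a rough illustration of what a distance-based comparison of aligned genomes measures, the sketch below computes the uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The distance model actually used in MEGA X is not stated here, so the choice of p-distance, the skipping of gap columns, and the toy sequences are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_identity(seq_a, seq_b):
    """Percentage identity over the compared (gap-free) columns."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))
```

A matrix of such pairwise distances between isolates is the usual input to distance-based tree building.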

In silico sequence analysis of viral DNA
The previously published genome assembly of A. domesticus (GenBank accession: JAAVVF000000000) was used to detect sequences resembling any of the three viral genome sequences targeted in the present study. The A. domesticus genome dataset has contig lengths from 500 bp to 50,990 bp, with an average length of 1,309 bp and no ambiguous nucleotides. This sequence quality was considered suitable for viral sequence detection. Therefore, the downloaded viral genomes, A. domesticus densovirus (AdDNV), A. domesticus mini ambidensovirus (AdMADV), and A. domesticus volvovirus (AdVVV-Japan), were used for similarity search against the A. domesticus genome. The local nucleotide BLAST result revealed that the A. domesticus whole genome sequence contains a viral DNA sequence that is highly similar to A. domesticus volvovirus (AdVVV-Japan) (GenBank accession: NC_021074.1).
The anchored sequence from the local nucleotide BLAST (GenBank accession: JAAVVF010229171.1) showed multiple hits with the AdVVV-Japan genome sequence. AdVVV-Japan is a single-stranded DNA virus with a genome length of 2517 bp [7]. Significant sequence similarity within a single contig is evidence of AdVVV DNA in the A. domesticus genome used in the present study. The matched sequence is 2587 bp long, and its nucleotide sequence is very close to that of AdVVV-Japan (2517 bp). The AdVVV-Japan genome sequence and the anchored sequence produced three significant alignments with high identity and few gaps (Table 2).
Subsequently, the anchored sequence was extracted from the database for further examination on the NCBI server, using BLAST for nucleotide and protein sequence similarity searches. In the nucleotide BLAST, the most similar sequences were viral nucleotide sequences from A. domesticus volvovirus isolates, with percentage identity between 82.21 and 82.45% (Table 3). The protein BLAST revealed that the query sequence (GenBank accession: JAAVVF010229171.1) has multiple hits with A. domesticus volvovirus protein sequences. When the protein BLAST was restricted to the protein sequences assigned to AdVVV-Japan (NC_021074.1), the query covered all four hypothetical proteins (GenBank accession: YP_007878130.1, YP_007878129.1, YP_007878131.1, YP_007878128.1).
The alignment with the hypothetical protein YP_007878128.1 is divided into two segments, located at the start and at the end of the query. This indicates the circularity of the anchored nucleotide fragment.
The identified nucleotide sequence makes evident the presence of AdVVV in the house cricket sample used for whole genome sequencing. The nucleotide and protein BLAST results were strongly concordant and confirmed that viral DNA was pooled into the whole genome sequence of A. domesticus. Because the A. domesticus DNA sample used for whole genome sequencing was obtained from an insect breeding facility, AdVVV infection in the breeding population is likely.
Therefore, further analysis was necessary to substantiate the presence of AdVVV in the breeding population of A. domesticus at the Nakhon Ratchasima farm.

In vitro sequence analysis of viral DNA
Initially, only one primer pair was used to check for viral DNA by PCR in total DNA extracted from twelve house cricket samples. Primer pair VV1 successfully amplified DNA fragments of the expected size, which demonstrated AdVVV infection in the breeding facility (Fig. 1). Results reported by Pham et al. (2013a) suggest that AdVVV is a rapidly evolving single-stranded DNA virus with different strains. In the present study, we also encountered mismatches in the sequence alignments (see Table 2). Moreover, we found gaps in the alignments that could be due to sequencing error.
Hence, we developed primer pairs targeting the viral DNA to generate overlapping fragments that could be sequenced and assembled into the full genome of the AdVVV isolate from Thailand. The assembled genome is deposited in the NCBI genome database under GenBank accession MW288623. The Thai isolate, which we named AdVVV-Thailand, contains 2,516 nucleotides. The newly assembled genome has the same architecture as other AdVVV isolates (Fig. 2a). The genome contains four open reading frames (ORFs) of potential coding sequences (CDSs), previously assigned to four hypothetical proteins. Compared with the reference genome sequence of AdVVV (accession number NC_021074, identical to KC794540), the four protein-coding genes have the same sizes as the CDSs in the newly assembled AdVVV isolate from Thailand and code for peptides of the same lengths (Table 4). The hairpin (stem-loop) forming sequence contains thirty-two nucleotides (positions 2504-2516 at the end of the genome and positions 1-19 at the beginning). We also designed a primer pair (VV8, see Table 1) to obtain a single PCR product containing the nucleotide fragment responsible for hairpin formation, with the putative nonanucleotide origin of replication (starting at nt 1, 5'-TAGTATTAC-3') (Fig. 2b). This evolutionarily conserved sequence is found in other viral genomes as the recognition site for the replication protein [21]. Primer pair VV8 successfully mapped the origin of replication in a PCR product of 315 bp, in which the origin of replication and the end of the viral genome sequence are adjacent to each other.
The newly assembled genome sequence was used for phylogenetic analysis with other AdVVV isolates. The phylogram shows a scale based on genetic change among AdVVV isolates (Fig. 3). Three isolates clustered together: AdVVV-IAF, Cricket associated circular virus 1, and AdVVV-GA. AdVVV-Japan and AdVVV-Thailand are genetically closer to each other than to the cluster of the other three isolates. However, the AdVVV-Thailand isolate is distant from all four other isolates, as implied by the higher observed sequence change in its genome.
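Because the genome is circular, the putative nonanucleotide origin of replication can straddle the arbitrary ends of a linear assembly. The sketch below shows one way to locate the motif across that junction and rotate the sequence so that it starts at position 1 of the motif, matching the numbering used for the genome above. The function name and the toy sequence are hypothetical, and only the given strand is searched.

```python
NONANUCLEOTIDE = "TAGTATTAC"  # putative origin of replication motif


def rotate_to_origin(genome, motif=NONANUCLEOTIDE):
    """Rotate a circular genome so that it begins with the motif.

    The sequence is extended by len(motif) - 1 bases from its start so
    that a motif spanning the end/start junction is also found.
    Raises ValueError when the motif is absent from the given strand.
    """
    genome = genome.upper()
    extended = genome + genome[: len(motif) - 1]
    index = extended.find(motif)
    if index == -1:
        raise ValueError("motif not found on this strand")
    return genome[index:] + genome[:index]


# Toy circular sequence in which the motif wraps around the ends.
print(rotate_to_origin("TTACGGCCAATAGTA"))
```

After rotation, the bases at the end of the sequence and those at its beginning are the two arms that would be inspected for the complementary stem of the hairpin.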
Further analysis was conducted using the four gene sequences of AdVVV. Each gene sequence from the five AdVVV isolates was aligned to distinguish them on the basis of variable sites, singleton sites, and parsimony-informative sites (Table 5). Most of the genetic variation is of the singleton type. The test revealed no significant differences in the ORF4, ORF2, and ORF3 sequences, in contrast to the ORF1 sequences; 85.33% of the variable sites were confined to ORF1. As ORF1 was found to be the most variable gene, the phylogram was reconstructed from ORF1 to verify genetic distance (Fig. 4). The phylogram generated from ORF1 gene sequences is nearly identical to the phylogram generated from whole genome sequences.
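The three site categories used in this comparison can be made concrete with a short sketch. It follows the common definitions: a variable site has at least two different nucleotides; it is parsimony-informative when at least two of those nucleotides each occur in two or more sequences, and a singleton otherwise. Skipping columns that contain gaps or ambiguous bases is an assumption of this sketch, and the toy alignment is hypothetical rather than taken from Table 5.

```python
from collections import Counter


def classify_sites(aligned_sequences):
    """Count variable, singleton and parsimony-informative columns."""
    counts = {"variable": 0, "singleton": 0, "parsimony_informative": 0}
    for column in zip(*aligned_sequences):
        bases = [base.upper() for base in column]
        if any(base not in "ACGT" for base in bases):
            continue  # assumption: ignore gapped or ambiguous columns
        frequency = Counter(bases)
        if len(frequency) < 2:
            continue  # conserved column
        counts["variable"] += 1
        repeated = sum(1 for n in frequency.values() if n >= 2)
        if repeated >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment of five sequences (hypothetical data).
toy_alignment = ["AAGTC", "AAGTC", "ATGAC", "ATCAC", "ATGA-"]
print(classify_sites(toy_alignment))
```

Running the same count separately on each ORF alignment gives the per-gene share of variable sites, which is the kind of figure reported for ORF1.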

Discussion
This study appears to be the first to find an AdVVV isolate in a Thai A. domesticus sample. We used the whole genome sequence of A. domesticus to find viral DNA [15]. In previous research on RNA viruses in humans and invertebrates, researchers used a bioinformatic approach to detect viral transcripts in transcriptomes [22,23]. In the present research, we used the genome sequence and reference viral DNA sequences to detect viral DNA. This in silico approach made virus detection easier and less laborious. However, assembling the whole genome of the AdVVV isolate was still needed: the anchored sequence confirmed the presence of viral DNA in the genome but was inadequate for assembling the whole viral genome. Moreover, because the A. domesticus sample used for genome sequencing was collected from a breeding facility, we systematically collected further samples from the same facility to verify the infection. Our result provides the first proof of infection in a Thai breeding facility. Additionally, the PCR primers used in the present study will facilitate volvovirus detection in cricket species, as AdVVV has the potential to infect several cricket species [7]. Researchers have found AdVVV isolates in dead cricket samples [6], but the pathogenicity of AdVVV has not been evaluated to date owing to limited knowledge of this evolving virus. AdVVVs are single-stranded DNA viruses and could be transmitted vertically to the predators of their hosts. Recently, researchers found a single-stranded DNA virus in a spider species and suggested vertical transmission of viral particles from prey to predators [24]. Such vertical transmission raises questions of food security, as crickets are commonly used as food and feed. Besides vertical transmission to other species, the virus can be transmitted within a species through sexual behaviour [25] or cannibalism [26].
Furthermore, recent research reported that breeding facilities in Thailand exchange cricket eggs to maintain genetic variation; in such cases, virus spread can become pervasive in inbreeding populations [15].
The genome size of AdVVV-Thailand is 2,516 bp. Previously found volvovirus isolates have two genome sizes: 2,516 bp (AdVVV-GA, Cricket associated circular virus 1) and 2,517 bp (AdVVV-IAF, AdVVV-Japan). We used the whole genome sequences of all five isolates in phylogenetic analysis to trace their evolutionary history. The result showed that Cricket associated circular virus 1, which is also a volvovirus isolate, is positioned between AdVVV-IAF and AdVVV-GA. The three genomes that clustered together were isolated from crickets sampled in America. The Japanese AdVVV isolate is separated from the clustered isolates and is related to the Thai AdVVV isolate found in the present study. Consequently, the phylogenetic evaluation is consistent with the geographical locations where the crickets used for AdVVV genome sequencing were collected: the clustered isolates are from America, and the Thai isolate neighbours the Japanese isolate. After variation analysis of the four CDSs from the five genomes, ORF1 was found to be the most variable CDS; it was previously designated as a hypothetical protein (GenBank: AGJ03170.1)/putative capsid protein (GenBank accession: AXL65914.1). Our study suggests that the rest of the AdVVV genome (ORF2, 3 and 4) is considerably conserved compared with the diversity observed in ORF1. The architecture of the ssDNA virus group is well conserved, but genomes may have high genetic variation [27]. Moreover, ssDNA virus genome replication is well conserved compared with capsid protein sequence [28]. ORF2, in turn, was found to be well conserved compared with the other ORFs of the five AdVVV isolates. BLASTx was used to examine the conserved nature of ORF2 relative to other viruses. According to the BLASTx result (limited to viruses), ORF2 aligned with sequences designated as replication-associated protein or putative Rep protein of ssDNA viruses. Sequence coverage ranged from 88 to 98% and percentage identity from 28.36 to 42.91% (AdVVV alignments were excluded).
The majority of ssDNA viruses are recognised on the basis of the conserved segment that codes for the Rep protein; structural proteins are not optimal for this purpose because of their high diversity [27]. However, highly conserved DNA segments are not an ideal choice for evolutionary analysis within a family or genus, where conserved regions are identical but structural proteins are variable. For the AdVVV isolates, we used ORF1 to establish phylogenetic relationships among the five isolates, including the AdVVV isolate newly discovered in the present study.

Conclusion
Our study suggests that ORF2 would be the optimal genome segment for recognising AdVVV isolates, whereas ORF1 would be better for phylogenetic analysis among AdVVV isolates, as the phylograms produced from the whole genome sequence and from ORF1 are nearly identical. The phylogram revealed that three isolates clustered in one clade while the others branched separately. Accordingly, we propose that volvoviruses could constitute a family of rapidly evolving ssDNA viruses, a reasoning also supported by the variation analysis of AdVVV isolates. Owing to the limited number of studies on AdVVV, researchers were previously unable to determine whether these viruses should be separated into a family or a genus [6]; our study attempts to resolve this ambiguity. We made intensive use of genome and ORF segments for classification, as was previously done to classify the family Genomoviridae into nine genera [29]. Thus, it is important to classify these cricket viruses, both for viral taxonomy and for a better understanding of the AdVVV evolutionary mechanism in case a virulent strain that could cause mortality in cricket breeding facilities is present or evolves in the environment.

Author contributions
All authors contributed to the study conception and design. Material preparation, insect sample collection and analysis were performed by YMG and SH. The first draft of the manuscript was written by YMG, and corrections were made and approved by SH. All authors read and approved the final manuscript.

Note*: KC794540.1 is the accession for AdVVV-Japan; the reference sequence accession for AdVVV-Japan is NC_021074.1.