De novo genome assembly
Flow cytometry revealed that the genome size of A. viridicyanea ranged from 836.3 ± 13.8 Mb in females to 795.7 ± 8.3 Mb in males. We generated 187.3× coverage of Illumina short reads and 72.7× coverage of PacBio long reads from 157 female adults and assembled them into a draft reference genome of 864.8 Mb, with contig and scaffold N50s of 92.8 kb and 557.2 kb, respectively. The GC content was 31.67%. The A. viridicyanea genome was larger than 85% of the currently published beetle genomes (Table S1). The draft assembly comprised 17,580 contigs that were assembled into 4,490 scaffolds, the longest of which was 5.6 Mb. Using the reference set of 1,658 insect BUSCOs, the genome contains 98.6% complete (single-copy plus multi-copy) orthologs; using the reference set of 2,442 Endopterygota BUSCOs, it contains 95.8% complete orthologs (Table 1). Together, these results indicate that the A. viridicyanea genome assembly is robust. The estimated heterozygosity of the Illumina reads ranged from about 0.70% to 0.96%, depending on k-mer size (k = 17, 19, 21, 23, 25 and 27).
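For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it. The function name and the toy lengths are illustrative assumptions; they are not taken from the A. viridicyanea assembly or from the assembly software used here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not from the real assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

Applied separately to the contig lengths and to the scaffold lengths of an assembly, the same calculation yields the contig N50 and the scaffold N50.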
We also used PacBio RNA-seq data to evaluate the genome assembly. Of the 13,550 polished reads, 60.46% could be successfully mapped to the genome. Transcripts that were unmapped, or that mapped with coverage or identity below the minimum threshold, were partitioned into 1,177 gene families based on k-mer similarity. Of these 1,177 gene families, 121 could be mapped to the assembled genome, and hits to sequences from other species were found for 13 gene families. Blastn revealed that 1,032 of the remaining gene families were similar to sequences from plants, which may represent DNA contamination by plant material during the DNA isolation step.
Genome annotation
Prior to gene prediction using the assembled sequences, repeat sequences were identified in the genome of A. viridicyanea. The repetitive sequence content was about 62.91% of the assembly, which was similar to that of the cowpea weevil Callosobruchus maculatus (64%), lower than that of the ladybird Propylea japonica (71.33%), and much higher than that of other beetle species (Table S1). Most of the repetitive sequences were transposable elements. According to a uniform classification system for eukaryotic transposable elements [37], retrotransposons (Class I) accounted for 41.27% whereas DNA transposons (Class II) accounted for 26.24% of the genome (Table 2).
To check whether the PacBio reads could span most of the repeats (here, transposons), we aligned the PacBio reads to the assembled genome. Considering primary alignments only, 89.01% (5,792,616/6,507,752) of the reads mapped successfully to the genome. We found that 99.33% (1,913,673/1,926,492) of the annotated transposons were fully covered by at least one read. Of the regions fully spanned by PacBio reads, the longest was 29,177 bp. A comparison of the length distributions showed that repeats fully covered by reads were significantly shorter than those not fully covered. We therefore suggest that longer reads could help to resolve the remaining regions.
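The notion of a repeat being "fully covered by at least one read" can be illustrated with a short sketch. The interval convention (0-based, half-open, same scaffold) and the function names are assumptions made for illustration and do not describe the pipeline actually used; the two percentages are recomputed from the counts given in the text.

```python
def is_fully_spanned(repeat, alignments):
    """True if at least one alignment covers the whole repeat interval.

    Both the repeat and the alignments are (start, end) tuples on the
    same scaffold, 0-based and half-open (an assumed convention).
    """
    r_start, r_end = repeat
    return any(a_start <= r_start and a_end >= r_end
               for a_start, a_end in alignments)


def percent(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical toy intervals: the second alignment spans the repeat.
spanned = is_fully_spanned((100, 200), [(50, 150), (90, 210)])
# Here each alignment covers only part of the repeat.
not_spanned = is_fully_spanned((100, 200), [(50, 150), (120, 260)])

# Counts as reported in the text.
mapped_pct = percent(5_792_616, 6_507_752)    # primary alignments mapped
spanned_pct = percent(1_913_673, 1_926_492)   # transposons fully covered
print(spanned, not_spanned, mapped_pct, spanned_pct)
```

Note that two overlapping reads that jointly cover a repeat do not count under this definition; a single read must contain the whole interval.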
The integration of de novo, RNA-seq-based and homology-based gene prediction methods identified 17,730 protein-coding genes in A. viridicyanea (Table 3, Fig. S1), slightly fewer than the average for beetle species with available genomes (~18,600 genes, Table S1). In total, 16,625 genes were assigned putative functions, accounting for approximately 93.77% of the predicted genes (Table S2), and 750 putative pseudogenes were identified (Table S3). We identified 2,462 non-coding RNA models, including 45 miRNAs, 1,093 rRNAs and 1,324 tRNAs, corresponding to 32, 4 and 24 gene families, respectively.
Phylogenetic analysis
We estimated the phylogenetic relationships of A. viridicyanea and nine additional representative beetle species (Anoplophora glabripennis, Aethina tumida, Agrilus planipennis, Dendroctonus ponderosae, Diabrotica virgifera virgifera, Leptinotarsa decemlineata, Nicrophorus vespilloides, Onthophagus taurus and Tribolium castaneum). In total, 14,854 orthologs in A. viridicyanea clustered with those of the other nine species. We identified 1,321 A. viridicyanea-specific genes, corresponding to 470 gene families; with the exception of Diabrotica virgifera virgifera, this number was much greater than in the other representative beetle species included in this analysis (Table S4). The phylogenetic relationships, inferred from 1,751 conserved single-copy orthologs, were consistent with results based on large datasets [38-40]. For example, A. viridicyanea, Diabrotica virgifera virgifera and Leptinotarsa decemlineata, all of which are chrysomelids, formed a clade, and these species clustered with Anoplophora glabripennis, a member of the superfamily Chrysomeloidea (Fig. 2). The estimated divergence time between A. viridicyanea and Diabrotica virgifera virgifera was about 74.7 million years ago. From this analysis, we also identified 155 gene families that expanded and 27 gene families that contracted along the A. viridicyanea lineage (Fig. 2). Some of these gene families were related to chemosensory and detoxification functions.
Chemosensory gene families
In many herbivorous insects, feeding, mating and oviposition behaviors are mediated by chemical cues [41]. The chemosensory system may also play important roles in speciation of some insects [42-44]. This is likely the case in A. viridicyanea as previous work has shown that this highly specialized beetle primarily uses chemical cues to achieve sexual isolation from its sibling species [8]. Furthermore, these contact chemicals also act as a mating signal to discriminate intraspecific variation in sexual maturity [9]. In addition, chemical cues are modified by and likely involved in host plant choice [8, 10]. Consequently, we investigated A. viridicyanea gene families known to be involved in chemosensory signaling in insects.
There are at least five gene families involved in the detection of chemicals, including three receptor families, odorant receptors (ORs), gustatory receptors (GRs) and ionotropic receptors (IRs), and two protein binding families, odorant binding proteins (OBPs) and chemosensory proteins (CSPs). These receptor families are usually expressed in insect olfactory sensory neurons and are involved in the detection of a suite of chemicals. For instance, volatile chemicals are detected by ORs [45-47], contact chemicals or carbon dioxide are detected by GRs [48], and nitrogen-containing compounds, acids, and aromatics are identified by IRs [49]. In contrast, the binding protein gene families are highly abundant in the sensillar lymph of insects and usually function as carriers of hydrophobic scent molecules to the receptors [50, 51].
In the genome of A. viridicyanea, we identified 195 putative chemosensory genes and two pseudogenes. Perhaps not surprisingly, the gene repertoire of the monophagous A. viridicyanea was considerably reduced as compared to that of host generalist species such as T. castaneum (630 genes plus 103 pseudogenes) and A. glabripennis (451 genes plus 65 pseudogenes). Upholding this pattern, A. viridicyanea has fewer chemosensory genes than the oligophagous species such as Dendroctonus ponderosae (240 genes plus 10 pseudogenes) and L. decemlineata (>300 genes) that specialize on a single family of host plants (Table S5). Yet there is a single outlier to this trend; Agrilus planipennis (132 genes and two pseudogenes), a species that is intermediate in host range, has fewer chemosensory genes than A. viridicyanea. These findings are generally consistent with the hypothesis that chemosensory gene content and host specificity should correlate in phytophagous beetles [52], although there is clearly an exception to this rule.
Insect ORs are proteins with seven transmembrane domains that are involved in the detection of volatile chemicals [46, 53, 54]. The number of ORs in beetle species varies widely from 30 to hundreds of ORs [55]. When we examined the A. viridicyanea genome for the presence of ORs, we found a diversity of gene families. There were 64 ORs and one pseudogene (PseudoGene48) that were classified into eight subfamilies: Group 1, 2A, 2B, 3, 4, 5A, 5B and 7 (Fig. 3; Table S5). Following the new OR classification scheme [55], we also identified one highly conserved olfactory co-receptor, Orco, that has been found in other beetle species. Interestingly, we also found a large expansion in A. viridicyanea in Group 4 that contained 17 ORs. By comparison, no more than four Group 4 OR genes have been previously identified in any other surveyed beetle species [52, 55].
In addition to ORs, we also compared GRs across beetle taxa. Most GRs are expressed in gustatory receptor neurons in taste organs and are involved in contact chemoreception and detection of CO2 [48]. We annotated 38 GRs in A. viridicyanea, including three conserved candidate CO2 receptors and seven candidate sugar receptors; the remainder were candidate bitter taste receptors. Simple orthology of GRs is generally rare in beetles [52], and, not surprisingly, no single-copy orthologs were found among the species that we compared. The phylogenetic analysis showed that 2-7 GRs from each of the seven species grouped within the clade of conserved sugar receptors. Additionally, two or three genes from five of the seven species formed a clade of CO2 receptors (Fig. S2).
The number of GRs varied from 10 to 245 in the ten surveyed beetles (Table S5). Compared with A. viridicyanea, as many as 147 GRs have been identified in the oligophagous chrysomelid Leptinotarsa decemlineata, whereas fewer than 20 GRs were annotated in four other chrysomelids (Colaphellus bowringi, Ophraella communa, Pyrrhalta aenescens and Pyrrhalta maculicollis). The extremely low numbers of GRs in the latter four species likely result from differences in data collection: only transcriptomic data were available for those species, and that approach generally does not recover the full complement of chemosensory genes. For example, a study of the longhorn beetle Anoplophora glabripennis found 11 GRs using transcriptomic data, whereas genomic data revealed 234 GRs [36, 56].
The next chemosensory receptor group that we examined was the IRs, a conserved family that evolved from a family of synaptic ligand-gated ion channels, ionotropic glutamate receptors (iGluRs) [49, 57, 58]. In insects, the IRs include two groups: the conserved “antennal IRs” that have an olfactory function, and the species-specific “divergent IRs” which are candidate gustatory receptors [59]. Our genome annotations revealed 33 ionotropic receptors (IRs). Only the members of the conserved antennal IR21a group were identified in all seven of the beetle species that we surveyed, whereas the clades IR8a, IR25a and IR76b were formed by single-copy orthologs from five species, excluding P. aenescens and P. maculicollis (transcriptomic data are available for both of these species) (Fig. 4). Furthermore, IRs from all seven species fell within the well-supported non-single-copy IR75 clade (Fig. 4). Remarkably, for A. viridicyanea, 18 of 33 IRs were clustered into a “Galerucinae+Alticinae” specific lineage that consisted of genes from O. communa, P. aenescens, P. maculicollis and A. viridicyanea (Fig. 4). These four species belong to the sister groups, Galerucinae and Alticinae, two closely related subfamilies of Chrysomelidae [60].
Finally, we examined the binding protein gene families. OBPs and CSPs are generally regarded as carriers of pheromones and odorants in insect chemoreception, and a multitude of additional functions have also been suggested, such as carrying semiochemicals and visual pigments, promoting development and regeneration, and digesting insoluble nutrients [61]. OBPs are small, soluble proteins with six conserved cysteines [50]. Although the detailed mechanisms remain unclear [62], it is believed that OBPs deliver hydrophobic molecules to the receptors [50]. In A. viridicyanea, we annotated 48 putative OBP genes and one pseudogene (PseudoGene855). Among these, 34 genes belonged to the Minus-C OBPs. We found four clades of classic OBPs that included single-copy orthologs from each of the seven species in the analysis, namely Classic I, IV, VIII and IX. In clades VII and X, two copies from Dendroctonus ponderosae clustered with single-copy orthologs from the other six species. The Classic II, III, V and VI clades were formed by single-copy orthologs from 5-6 species (Fig. S3). Plus-C OBPs were not found in A. viridicyanea, and are also absent in the Pyrrhalta species that belong to the “Galerucinae+Alticinae” taxonomic group.
CSPs are characterized by the presence of four cysteines that form two disulfide bridges [63]. We annotated 12 CSP genes in A. viridicyanea. The phylogenetic analysis revealed that only one clade (clade 1) was formed by single-copy orthologs from the eight species surveyed. Clades 2-7 were formed by single-copy orthologs from 5-7 beetle species. In these lineages, the absence of CSPs was more common in species with only transcriptomic data (e.g., P. aenescens, P. maculicollis and O. communa), whereas A. viridicyanea also lacked an ortholog in clade 5 (Fig. S4).
As noted above for GRs, transcriptomes often fail to capture the full set of chemosensory genes because of low expression, spatiotemporal variation in expression, or shallow sequencing depth. For instance, 106 chemosensory genes were detected in Anoplophora glabripennis using transcriptomic sequencing [56], whereas more than 500 chemosensory genes (65 pseudogenes included) were annotated from its genome [52].
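As a simple consistency check, the per-family counts reported in this section can be tallied against the stated totals of 195 putative chemosensory genes and two pseudogenes. The sketch below uses only numbers given in the text; the variable names are illustrative, and it assumes that Orco is not counted separately from the 64 ORs.

```python
# Per-family gene counts for A. viridicyanea as reported in the text.
# Assumption: Orco is not added on top of the 64 ORs.
chemosensory_genes = {
    "OR": 64,
    "GR": 38,
    "IR": 33,
    "OBP": 48,
    "CSP": 12,
}

# Pseudogenes named in the text (PseudoGene48 and PseudoGene855).
chemosensory_pseudogenes = {"OR": 1, "OBP": 1}

total_genes = sum(chemosensory_genes.values())
total_pseudogenes = sum(chemosensory_pseudogenes.values())

print(f"genes: {total_genes}, pseudogenes: {total_pseudogenes}")
```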
Detoxification supergene families
Novel plant secondary compounds often present a challenge for herbivorous insects, and physiological adaptation to novel plant secondary metabolites is a key problem. The detoxification and metabolism of most xenobiotics occur via a common set of detoxification-related enzymes, all of which belong to multigene families [64]. The cytochrome P450s (P450s), carboxyl/cholinesterases (CCEs), and glutathione S-transferases (GSTs) are widely regarded as the major insect gene/enzyme families involved in xenobiotic detoxification [65-67]. In addition, the UDP-glycosyltransferases (UGTs) and ATP-binding cassette transporters (ABCs) can also play a role in detoxification [68-70]. This diversity of detoxification enzymes is critical for many herbivorous insects [18, 71], as their diets often contain a suite of plant chemicals that can be toxic, reduce palatability, or slow development.
The host plant of A. viridicyanea is Geranium nepalense, which has a number of chemical defenses such as tannins, flavonoids and organic acids [72]. As a strict specialist, A. viridicyanea therefore likely has adaptations that allow it to detoxify these chemicals. Indeed, we annotated 225 detoxification enzymes spanning all three major families (101 P450s, 97 CCEs and 27 GSTs). Expansion and contraction of these gene families are considered important in adaptive phenotypic diversification [73]. Furthermore, meta-analyses have established that the sizes of the P450, CCE and GST gene families are correlated with insect diet breadth [66, 67, 74]. In contrast with these studies, we found that although A. viridicyanea has fewer detoxifying genes than the generalist T. castaneum (275 genes) [26, 75], it has more than the closely related oligophagous Leptinotarsa decemlineata (197 genes) [21, 67].
Insect cytochrome P450 proteins are important in both xenobiotic detoxification and synthesis and degradation of endogenous molecules such as ecdysteroids and juvenile hormone [76-78]. In insects, the cytochrome P450 family is divided into four major clades: the mitochondrial P450 clade, the CYP2 clade, the CYP3 clade, and the CYP4 clade [79]. We found 101 P450s in Altica viridicyanea spanning all four clades: five in the mitochondrial clade, seven in the CYP2 clade, 53 in CYP3 clade, and 36 in CYP4 clade (Fig. 5, Table S6). We found that a majority of these genes belonged to the CYP6 and CYP9 subfamilies of the CYP3 clade and the CYP4 subfamily of the CYP4 clade (Table S7). These P450 subfamilies are known to be involved in detoxification of plant allelochemicals as well as resistance to pesticides [21, 80-82].
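The counts reported for the three major detoxification families and for the four P450 clades can be cross-checked with a short tally. This sketch uses only numbers stated in the text; the dictionary names are illustrative.

```python
# P450 genes per clade in A. viridicyanea, as reported in the text.
p450_clades = {"mitochondrial": 5, "CYP2": 7, "CYP3": 53, "CYP4": 36}

# Genes per major detoxification family, as reported in the text.
detox_families = {"P450": 101, "CCE": 97, "GST": 27}

p450_total = sum(p450_clades.values())
detox_total = sum(detox_families.values())

# Share of P450s in the two largest clades (CYP3 and CYP4).
cyp3_cyp4_share = (p450_clades["CYP3"] + p450_clades["CYP4"]) / p450_total

print(p450_total, detox_total, round(100 * cyp3_cyp4_share, 1))
```

The clade counts sum to the 101 P450s, and the three families sum to the 225 detoxification enzymes reported above; the CYP3 and CYP4 clades together account for 89 of the 101 P450 genes.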
In addition to the cytochrome P450s, the A. viridicyanea genome also contained 97 genes encoding putative CCEs (Fig. S5), which is slightly fewer than that of L. decemlineata (102), but more than that of the other eight beetle species that were included in the analysis (ranged from 44 to 82) [67]. The dietary/detoxification group included two clades: coleopteran xenobiotic metabolizing CCE (clade A) and ɑ-esterase type CCEs (clade B) [83]. In A. viridicyanea, there is a noteworthy expansion (62 genes) in clade A, whereas we did not identify any genes from Clade D (integument esterase), F (juvenile hormone esterase), or I (unknown function) (Fig. S5; Table S8).
Another group of detoxification enzymes that we examined is the GSTs. GSTs are involved in many cellular physiological activities, such as detoxification of endogenous and xenobiotic compounds, intracellular transport, biosynthesis of hormones and protection against oxidative stress [75, 84]. Insect GSTs are divided into two major groups, the cytosolic and the microsomal GST genes. The cytosolic group is further divided into six classes: Delta, Epsilon, Sigma, Omega, Theta, and Zeta [85]. The Delta and Epsilon classes are thought to be insect-specific [75, 84, 86], and members of the Epsilon class are commonly involved in detoxification of xenobiotics [87]. We detected a total of 27 GST genes in A. viridicyanea (Fig. S6; Table S9). Both the total number of GSTs and the number of detoxification-related Epsilon-class genes in A. viridicyanea were lower than those of most beetles [67] (Table S9).
UDP-glycosyltransferases (UGTs) catalyze the conjugation of a diverse range of small lipophilic compounds with sugars to produce glycosides, and play an important role in the detoxification of xenobiotics and in the regulation of endobiotics in insects [68]. Between 17 (Oryctes borbonicus) and 65 (Anoplophora glabripennis) UGTs were identified in the nine beetle species surveyed [3, 67]. The largest repertoire of UGTs reported in beetles to date is that of the polyphagous longhorn beetle Anoplophora glabripennis, with 65 putative UGT genes and 7 pseudogenes [36]. The expansion of UGTs in A. glabripennis is thought to be related to its ability to feed on a broad range of host plants [36]. In line with this, we annotated only 32 UGTs in the genome of the monophagous A. viridicyanea. A number of UGT50 genes were identified in this species; UGT50 has been suggested to be the most conserved UGT family in insects [68]. We also observed a remarkable expansion in the UGT324 family (Fig. S7, Table S10).
Most ABC proteins engage in active transport of molecules across cell membranes. The ABC transporters are well-known components of various detoxification mechanisms across all phyla [88, 89]. In the present study, we identified 69 putative ABCs in A. viridicyanea, belonging to eight subfamilies (A to H). This number is similar to that of two specialist chrysomelids, Chrysomela populi and Diabrotica virgifera virgifera (65 in each, based on transcriptomic data) (Table S11). The gene numbers of the conserved subfamilies D, E and F were consistent with those of the other beetles analyzed (Table S11); however, the number of genes in subfamilies B and C in A. viridicyanea (46) is the highest among the five species compared (Table S11; Fig. 6). These subfamilies are known to be involved in detoxification processes [64, 90].
Plant cell wall-degrading enzymes
Early views of insect digestion postulated that insects lack the endogenous enzymes required for plant cell wall (PCW) digestion, and that PCW digestion by insects depended on exogenous enzymes from symbiotic microorganisms [91]. Recent studies, however, have revealed that endogenous PCW degrading enzymes are present in many insects and are important in the digestion of cellulose, hemicelluloses, and pectin in PCW [40, 92]. In fact, these enzymes are likely a key innovation in the adaptive radiation of herbivorous beetles. Some insect PCW-degrading enzymes are also involved in immune-defense responses and detoxification [40, 92].
Beetle-encoded plant cell wall-degrading enzymes comprise carbohydrate esterases (CE), polysaccharide lyases (PL) and, predominantly, glycoside hydrolases (GH) [40]. In A. viridicyanea, we identified 65 putative glycoside hydrolases, including 35 GH1 genes, 10 GH45 genes, two GH48 genes and 18 GH28 genes (Table S12). GH1 genes have an ancient origin in animals and are ubiquitous in beetles. The species of Phytophaga (i.e., Chrysomeloidea and Curculionoidea) examined thus far have more GH1 genes than A. viridicyanea; for example, 228 were found in Diabrotica undecimpunctata, 135 in Mastostethus salvini, and 136 in Rhynchitomacerinus kuscheli [21, 36, 40]. Another ancient and ubiquitous gene family, GH9, has been lost independently at least a dozen times in beetles [40]. GH9 was not detected in A. viridicyanea, nor in four of seven other chrysomelid species (Callosobruchus maculatus, Donacia marginata, Diabrotica undecimpunctata and Leptinotarsa decemlineata) [21, 40].
The other plant cell wall-degrading gene families (CE8, PL4, GH32, GH5, GH10, GH43, GH44, GH45, GH48 and GH28) are thought to have been acquired from bacteria and fungi via horizontal gene transfer, and are found mainly in Buprestoidea and Phytophaga, with scattered genes in a few other taxa [40]. In A. viridicyanea, in addition to the ubiquitous GH1 genes, three families of PCW-degrading enzymes were identified: the cellulose-degrading GH45 and GH48, and the pectin-degrading GH28. The observed gene numbers are similar to those of closely related species in the Chrysomelinae (Oreina cacaliae and Leptinotarsa decemlineata) and Galerucinae (Diabrotica undecimpunctata) [21, 40].