De novo genome assembly
Flow cytometry revealed that the genome size of A. viridicyanea ranged from 836.3 ± 13.8 Mb in females to 795.7 ± 8.3 Mb in males. We generated 187.3× coverage of Illumina short reads and 72.7× coverage of PacBio long reads from 157 female adults and assembled them into a draft reference genome of 864.8 Mb with contig and scaffold N50s of 92.8 kb and 557.2 kb, respectively (Table 1). The GC content was 31.67%. The A. viridicyanea genome is larger than 85% of the currently published beetle genomes (Table S1). The draft assembly comprises 17,580 contigs that were assembled into 4,490 scaffolds, the longest of which is 5.6 Mb. Against the reference set of 1,658 insect BUSCOs, the genome contains 98.6% complete orthologs (single-copy plus duplicated); against the reference set of 2,442 Endopterygota BUSCOs, it contains 95.8% complete orthologs (Table 2). Together, these results indicate that the A. viridicyanea genome assembly is of high quality.
Table 1
Summary statistics of Altica viridicyanea draft genome assembly.
Statistics | Value |
Illumina (genome coverage) | 187.3× |
PacBio (genome coverage) | 72.7× |
Assembly size (Mbp) | 864.8 |
Number of contigs | 17,580 |
Longest contig (Kbp) | 1,351 |
Contig N50 (Kbp) | 92.8 |
Number of Scaffolds | 4,490 |
Longest scaffold (Kbp) | 5,675 |
Scaffold N50 (Kbp) | 557.3 |
Gap | 2.36% |
GC content | 31.67% |
Number of protein-coding genes | 17,730 |
Number of non-coding RNAs | 2,462 |
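To make the contiguity statistics in Table 1 concrete, the following minimal Python sketch shows how an N50 value can be computed from a list of sequence lengths. The function name `n50` and the toy lengths are hypothetical and purely illustrative; they are not taken from the assembly pipeline or the data of this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy contig lengths in kb (illustrative values, not data from this assembly).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
print(f"N50 of toy contigs: {toy_n50} kb")
```

Applied to the contig or scaffold lengths of an assembly, the same function would yield the values reported as contig N50 and scaffold N50, respectively.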
Table 2
BUSCO results showing completeness of the Altica viridicyanea genome assembly and annotation.
| Insect | | Endopterygota | |
| Counts | Percentage | Counts | Percentage |
Complete BUSCOs | 1,635 | 98.6% | 2,340 | 95.8% |
Complete and single-copy BUSCOs | 1,540 | 92.9% | 2,211 | 90.5% |
Complete and duplicated BUSCOs | 95 | 5.7% | 129 | 5.3% |
Fragmented BUSCOs | 7 | 0.4% | 52 | 2.1% |
Missing BUSCOs | 16 | 1.0% | 50 | 2.1% |
Total BUSCO groups searched | 1,658 | | 2,442 | |
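The completeness percentages in Table 2 follow directly from the category counts. The short Python sketch below recomputes them; the dictionary layout and the helper `busco_summary` are hypothetical names introduced for illustration, and only the counts are taken from Table 2.

```python
# Counts copied from Table 2 for each BUSCO reference set.
busco_counts = {
    "insect": {"single": 1540, "duplicated": 95, "fragmented": 7, "missing": 16},
    "Endopterygota": {"single": 2211, "duplicated": 129, "fragmented": 52, "missing": 50},
}


def busco_summary(counts):
    """Derive the total, the number of complete BUSCOs and rounded percentages."""
    total = sum(counts.values())
    complete = counts["single"] + counts["duplicated"]

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "total": total,
        "complete": complete,
        "complete_pct": pct(complete),
        "single_pct": pct(counts["single"]),
        "duplicated_pct": pct(counts["duplicated"]),
        "fragmented_pct": pct(counts["fragmented"]),
        "missing_pct": pct(counts["missing"]),
    }


summaries = {name: busco_summary(c) for name, c in busco_counts.items()}
for name, s in summaries.items():
    print(f"{name}: {s['complete']}/{s['total']} complete ({s['complete_pct']}%)")
```

Directly rounding 50/2,442 gives 2.0% missing for the Endopterygota set, whereas Table 2 lists 2.1%; the difference may stem from how the reporting tool rounds its percentages, which the text does not specify.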
Genome annotation
Prior to gene prediction using the assembled sequences, repeat sequences were identified in the genome of A. viridicyanea. The repetitive sequence content was about 62.91% of the assembly, which was similar to that of the cowpea weevil Callosobruchus maculatus (64%), lower than that of the ladybird Propylea japonica (71.33%), and much higher than that of other beetle species (Table S1). Most of the repetitive sequences were transposable elements. According to a uniform classification system for eukaryotic transposable elements [37], retrotransposons (Class I) accounted for 41.27% whereas DNA transposons (Class II) accounted for 26.24% of the genome (Table 3).
Table 3
Composition of repetitive sequences in the Altica viridicyanea genome assembly.
Repeat type | | Number of elements | Length (bp) | Rate (%) |
Retrotransposons (transposable element class I) | | 1,033,903 | 356,902,348 | 41.27 |
| DIRS | 12,458 | 7,468,113 | 0.86 |
| LINE | 317,300 | 97,559,643 | 11.28 |
| LTR/uncertain | 45,604 | 28,406,497 | 3.28 |
| LTR/Copia | 15,362 | 7,752,670 | 0.90 |
| LTR/Gypsy | 143,535 | 83,332,210 | 9.64 |
| LTR or DIRS | 91 | 14,310 | 0.00 |
| PLE or LARD | 485,362 | 155,872,484 | 18.02 |
| SINE | 78 | 80,868 | 0.01 |
| TRIM | 13,553 | 6,401,760 | 0.74 |
| Unknown | 560 | 58,144 | 0.01 |
DNA transposons (transposable element class II) | | 803,681 | 226,943,053 | 26.24 |
| TIR | 550,895 | 150,574,911 | 17.41 |
| MITE | 10 | 635 | 0.00 |
| Crypton | 145,294 | 43,476,772 | 5.03 |
| Helitron | 29,008 | 8,403,268 | 0.97 |
| Maverick | 51,701 | 31,213,827 | 3.61 |
| Unknown | 26,773 | 5,423,361 | 0.63 |
SSR | | 583 | 165,863 | 0.02 |
Unknown | | 77,813 | 23,953,352 | 2.77 |
Total | | 1,848,679 | 544,002,544 | 62.91 |
Notes: DIRS, Dictyostelium intermediate repeat sequence; LINE, long interspersed nuclear element; LTR, long terminal repeat; PLE, Penelope-like elements; SINE, short interspersed nuclear element; LARD, large retrotransposon derivative elements; TIR, terminal inverted repeat.
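The "Rate (%)" column of Table 3 can be reproduced approximately by dividing each length by the assembly size. The Python sketch below does this for the top-level categories. It assumes the rounded assembly size of 864.8 Mb from Table 1 as the denominator (the exact base count is not given in the text), and the names `ASSEMBLY_SIZE_BP` and `percent_of_assembly` are hypothetical.

```python
# Assumption: rounded assembly size from Table 1; the exact size in bp is not reported.
ASSEMBLY_SIZE_BP = 864.8e6

# Lengths (bp) of the top-level repeat categories, copied from Table 3.
repeat_lengths_bp = {
    "Retrotransposons (Class I)": 356_902_348,
    "DNA transposons (Class II)": 226_943_053,
    "SSR": 165_863,
    "Unknown": 23_953_352,
    "Total": 544_002_544,
}


def percent_of_assembly(length_bp, assembly_bp=ASSEMBLY_SIZE_BP):
    """Express a repeat length as a percentage of the assembly size."""
    return 100 * length_bp / assembly_bp


rates = {name: percent_of_assembly(bp) for name, bp in repeat_lengths_bp.items()}

# Sum of the four categories, for comparison with the reported total length.
category_sum_bp = sum(bp for name, bp in repeat_lengths_bp.items() if name != "Total")

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
print(f"Sum of categories: {category_sum_bp:,} bp")
```

The category lengths sum to more than the reported total, which would be consistent with overlapping annotations being counted only once in the total, although the text does not state this explicitly.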
The integration of de novo, RNA-seq-based and homology-based gene prediction methods identified 17,730 protein-coding genes in A. viridicyanea (Table 4, Fig. S1), slightly fewer than the average for beetle species with available genomes (~18,600 genes, Table S1). In total, 16,625 genes were assigned putative functions, accounting for approximately 93.77% of the predicted genes (Table S2), and 750 putative pseudogenes were identified (Table S3). We also identified 2,462 non-coding RNA models, including 45 miRNAs, 1,093 rRNAs, and 1,324 tRNAs, corresponding to 32, four and 24 gene families, respectively.
Table 4
Statistics of gene prediction of Altica viridicyanea.
Method | Software | Gene number |
ab initio | Genscan v1.1.0 | 15,170 |
| Augustus v2.4 | 31,813 |
| GlimmerHMM v3.0.4 | 58,872 |
| GeneID v1.4 | 14,970 |
| SNAP v2006-07-28 | 72,679 |
homology-based | GeMoMa v1.3.1 (Drosophila melanogaster) | 9,310 |
| GeMoMa v1.3.1 (Tribolium castaneum) | 14,234 |
| GeMoMa v1.3.1 (Dendroctonus ponderosae) | 13,066 |
| GeMoMa v1.3.1 (Anoplophora glabripennis) | 18,274 |
transcriptome-based | TransDecoder v2.0 | 73,432 |
| GeneMarkS-T v5.1 | 29,645 |
| PASA v2.0.2 | 23,200 |
Integration | EVidenceModeler v1.1.1 | 17,730 |
Phylogenetic analysis
The phylogenetic analysis identified 15,240 orthologs in A. viridicyanea that clustered with the other eight beetle species. We identified 1,609 A. viridicyanea-specific genes, corresponding to 529 gene families, a number far greater than in the other species included in this analysis (Table S4). The phylogenetic relationships were consistent with the results inferred from large datasets [38–40]. For example, A. viridicyanea and Leptinotarsa decemlineata, both chrysomelids, were sister taxa, and these species clustered with Anoplophora glabripennis, a member of the superfamily Chrysomeloidea (Fig. 2). The estimated divergence time between A. viridicyanea and L. decemlineata was about 98.2 million years ago. From this analysis, we also identified 165 gene families that expanded and 47 gene families that contracted along the A. viridicyanea lineage (Fig. 2). Some of these gene families were related to chemosensory and detoxification functions.
Chemosensory gene families
In many herbivorous insects, feeding, mating and oviposition behaviors are mediated by chemical cues [41]. The chemosensory system may also play important roles in speciation of some insects [42–44]. This is likely the case in A. viridicyanea as previous work has shown that this monophagous beetle primarily uses chemical cues to achieve sexual isolation from its sibling species [8]. Furthermore, these contact chemicals also act as a mating signal to discriminate intraspecific variation in sexual maturity [9]. In addition, chemical cues are modified by and likely involved in host plant choice [8, 10]. Consequently, we investigated A. viridicyanea gene families known to be involved in chemosensory signaling in insects.
At least five gene families are involved in the detection of chemicals: three receptor families, namely odorant receptors (ORs), gustatory receptors (GRs) and ionotropic receptors (IRs), and two binding-protein families, namely odorant binding proteins (OBPs) and chemosensory proteins (CSPs). The receptor families are usually expressed in insect olfactory sensory neurons and are involved in the detection of a suite of chemicals. For instance, volatile chemicals are detected by ORs [45–47], contact chemicals or carbon dioxide are detected by GRs [48], and nitrogen-containing compounds, acids, and aromatics are identified by IRs [49]. In contrast, the binding-protein gene families are highly abundant in the sensillar lymph of insects and usually function as carriers of hydrophobic scent molecules to the receptors [50, 51].
In the genome of A. viridicyanea, we identified 195 putative chemosensory genes and two pseudogenes. Perhaps not surprisingly, the gene repertoire of the monophagous A. viridicyanea was considerably smaller than that of polyphagous species such as T. castaneum (630 genes plus 103 pseudogenes) and A. glabripennis (451 genes plus 65 pseudogenes). The gene number is greater than that of the oligophagous Agrilus planipennis (132 genes and two pseudogenes), but smaller than those of other oligophagous species such as Dendroctonus ponderosae (240 genes plus 10 pseudogenes) and L. decemlineata (> 300 genes) (Table S5). These findings generally support the hypothesis that chemosensory gene content and host specificity should correlate in phytophagous beetles [52].
Insect ORs are proteins with seven transmembrane domains that are involved in the detection of volatile chemicals [46, 53, 54]. The number of ORs in beetle species varies widely from 30 to hundreds of ORs [55]. When we examined the A. viridicyanea genome for the presence of ORs, we found a diversity of gene families. There were 64 ORs and one pseudogene (PseudoGene48) that were classified into eight subfamilies: Group 1, 2A, 2B, 3, 4, 5A, 5B and 7 (Fig. 3; Table S5). Following the new OR classification scheme [55], we also identified one highly conserved olfactory co-receptor, Orco, that has been found in other beetle species. Interestingly, we also found a large expansion in A. viridicyanea in Group 4 that contained 17 ORs. By comparison, no more than four Group 4 OR genes have been previously identified in any other surveyed beetle species [52, 55].
In addition to ORs, we also compared GRs across beetle taxa. Most GRs are expressed in gustatory receptor neurons in taste organs and are involved in contact chemoreception and detection of CO2 [48]. We annotated 38 GRs in A. viridicyanea, including three conserved candidate CO2 receptors and seven candidate sugar receptors, with the remainder being candidate bitter taste receptors. Simple orthology of GRs is generally rare in beetles [52], and not surprisingly, no single-copy orthologs were revealed in the species that we compared. The phylogenetic analysis showed that 2–7 GRs from each of the seven species grouped within the clade of conserved sugar receptors. Additionally, two or three genes from five of the seven species formed a clade of CO2 receptors (Fig. S2).
The number of GRs varied from 10 to 245 in the ten surveyed beetles (Table S5). Whereas as many as 147 GRs have been identified in the oligophagous chrysomelid Leptinotarsa decemlineata, fewer than 20 GRs were annotated in four other chrysomelids (Colaphellus bowringi, Ophraella communa, Pyrrhalta aenescens and Pyrrhalta maculicollis). The extremely low numbers of GRs in the latter four species likely reflect differences in data collection: only transcriptomic data were available for those species, and that approach generally does not recover the full complement of chemosensory genes. For example, a study in the longhorn beetle Anoplophora glabripennis found 11 GRs using transcriptomic data, whereas genomic data revealed 234 GRs [36, 56].
The next chemosensory receptor group that we examined was the IRs, a conserved family that evolved from a family of synaptic ligand-gated ion channels, the ionotropic glutamate receptors (iGluRs) [49, 57, 58]. In insects, the IRs include two groups: the conserved “antennal IRs”, which have an olfactory function, and the species-specific “divergent IRs”, which are candidate gustatory receptors [59]. Our genome annotation revealed 33 IRs. Only the members of the conserved antennal IR21a group were identified in all seven of the beetle species that we surveyed, whereas the clades IR8a, IR25a and IR76b were formed by single-copy orthologs from five species, excluding P. aenescens and P. maculicollis (for which only transcriptomic data are available) (Fig. 4). Furthermore, IRs from all seven species fell within the well-supported non-single-copy IR75 clade (Fig. 4). Remarkably, for A. viridicyanea, 18 of 33 IRs were clustered into a “Galerucinae + Alticinae” specific lineage that consisted of genes from O. communa, P. aenescens, P. maculicollis and A. viridicyanea (Fig. 4). These four species belong to the sister groups Galerucinae and Alticinae, two closely related subfamilies of Chrysomelidae [60].
Finally, we examined the binding-protein gene families. OBPs and CSPs are generally regarded as carriers of pheromones and odorants in insect chemoreception, and a multitude of additional functions have also been suggested, such as carrying semiochemicals and visual pigments, promoting development and regeneration, and digesting insoluble nutrients [61]. OBPs are small, soluble proteins with six conserved cysteines [50]. Although the detailed mechanisms remain unclear [62], it is believed that OBPs deliver hydrophobic molecules to the receptors [50]. In A. viridicyanea, we annotated 48 putative OBP genes and one pseudogene (PseudoGene855). Among these, 34 genes belonged to the Minus-C OBPs. We found four clades of classic OBPs (Classic I, IV, VIII and IX) that include single-copy orthologs from each of the seven species in the analysis. In clades VII and X, two copies from Dendroctonus ponderosae were clustered with single-copy orthologs from the other six species. The clades of Classic II, III, V and VI were formed by single-copy orthologs from 5–6 species (Fig. S3). Plus-C OBPs were not found in A. viridicyanea, and are also absent in the Pyrrhalta species that belong to the “Galerucinae + Alticinae” taxonomic group.
CSPs are characterized by the presence of four cysteines that form two disulfide bridges [63]. We annotated 12 CSP genes in A. viridicyanea. The phylogenetic analysis revealed that only one clade (clade 1) was formed by single-copy orthologs from the eight species surveyed. Clades 2–7 were formed by single-copy orthologs from 5–7 beetle species. In these lineages, the absence of CSPs from transcriptomic sources (e.g., P. aenescens, P. maculicollis and O. communa) was more common, and A. viridicyanea also lacked a member of clade 5 (Fig. S4).
Similar to previous work on GRs, transcriptomes often fail to describe the full set of chemosensory genes due to low expression, spatiotemporal variation in expression, or shallow sequencing depth. For instance, 106 chemosensory genes were detected in Anoplophora glabripennis using transcriptomic sequencing [56] whereas more than 500 chemosensory genes (65 pseudogenes included) were annotated from its genome [52].
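The family counts reported in this section can be tallied to check them against the overall repertoire size and to place A. viridicyanea among the species for which exact totals are quoted in the text. The Python sketch below is illustrative only: the variable names are hypothetical, the counts are those stated above, and L. decemlineata is omitted because only a lower bound (> 300 genes) is given.

```python
# Chemosensory genes annotated in A. viridicyanea, per family (from the text).
chemosensory_genes = {"OR": 64, "GR": 38, "IR": 33, "OBP": 48, "CSP": 12}
# Pseudogenes named in the text: PseudoGene48 (OR) and PseudoGene855 (OBP).
chemosensory_pseudogenes = {"OR": 1, "OBP": 1}

total_genes = sum(chemosensory_genes.values())
total_pseudogenes = sum(chemosensory_pseudogenes.values())

# Repertoire sizes (genes, excluding pseudogenes) quoted for comparison species.
repertoires = {
    "Tribolium castaneum": 630,
    "Anoplophora glabripennis": 451,
    "Dendroctonus ponderosae": 240,
    "Agrilus planipennis": 132,
    "Altica viridicyanea": total_genes,
}

# Rank species from the smallest to the largest repertoire.
ranked = sorted(repertoires.items(), key=lambda item: item[1])
for species, n_genes in ranked:
    print(f"{species}: {n_genes}")
```

The per-family counts sum to the 195 genes and two pseudogenes reported for the species, and the ranking places A. viridicyanea between Agrilus planipennis and Dendroctonus ponderosae.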
Detoxification supergene families
Novel plant secondary compounds often present a challenge for herbivorous insects, and physiological adaptation to novel plant secondary metabolites is a key problem. The detoxification and metabolism of most xenobiotics occurs via a common set of detoxification-related enzymes, all of which belong to multigene families [64]. The cytochrome P450s (P450s), carboxyl/cholinesterases (CCEs), and glutathione S-transferases (GSTs) are widely regarded as the major insect gene/enzyme families involved in xenobiotic detoxification [65–67]. In addition, the UDP-glycosyltransferases (UGTs) and ATP binding cassette transporters (ABCs) can also play a role in detoxification [68–70]. This diversity of detoxification enzymes is critical for many herbivorous insects [18, 71] as their diets often contain a suite of plant chemicals that can be toxic, reduce palatability, or slow development.
The host plant of A. viridicyanea is Geranium nepalense, which has a number of chemical defenses such as tannins, flavonoids and organic acids [72]. As a strict specialist, A. viridicyanea likely has adaptations that allow it to detoxify these chemicals. Indeed, we annotated 225 detoxification enzymes spanning all three families (101 P450s, 97 CCEs and 27 GSTs). Expansion and contraction of these gene families are considered important in adaptive phenotypic diversification [73]. Furthermore, meta-analyses have established that the sizes of the P450, CCE and GST gene families are correlated with insect diet breadth [66, 67, 74]. Consistent with these ideas, in A. viridicyanea, the number of detoxifying genes is larger than that of the closely related oligophagous Leptinotarsa decemlineata (197 genes) [21, 67], but smaller than that of the seed-feeding T. castaneum (275 genes) [26, 75].
Insect cytochrome P450 proteins are important in both xenobiotic detoxification and synthesis and degradation of endogenous molecules such as ecdysteroids and juvenile hormone [76–78]. In insects, the cytochrome P450 family is divided into four major clades: the mitochondrial P450 clade, the CYP2 clade, the CYP3 clade, and the CYP4 clade [79]. We found 101 P450s in Altica viridicyanea spanning all four clades: five in the mitochondrial clade, seven in the CYP2 clade, 53 in the CYP3 clade, and 36 in the CYP4 clade (Fig. 5, Table S6). We found that a majority of these genes belonged to the CYP6 and CYP9 subfamilies of the CYP3 clade and the CYP4 subfamily of the CYP4 clade (Table S7). These P450 subfamilies are known to be involved in detoxification of plant allelochemicals as well as resistance to pesticides [21, 80–82].
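As a simple consistency check on the P450 counts, the clade numbers can be summed and expressed as shares of the family, and the three major detoxification families can be added up. The Python sketch below uses only counts stated in the text; the variable names are hypothetical and introduced for illustration.

```python
# P450 genes per clade in A. viridicyanea (from the text).
p450_clades = {"mitochondrial": 5, "CYP2": 7, "CYP3": 53, "CYP4": 36}

# Gene counts of the three major detoxification families (from the text).
detox_families = {"P450": 101, "CCE": 97, "GST": 27}

p450_total = sum(p450_clades.values())
detox_total = sum(detox_families.values())

# Share of each clade within the P450 family, in percent.
clade_share = {
    clade: round(100 * n / p450_total, 1) for clade, n in p450_clades.items()
}

print(f"P450 total: {p450_total}; detoxification total: {detox_total}")
for clade, share in clade_share.items():
    print(f"{clade}: {share}%")
```

Under these counts, the CYP3 and CYP4 clades together account for 89 of the 101 P450 genes.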
In addition to the cytochrome P450s, the A. viridicyanea genome also contained 97 genes encoding putative CCEs (Fig. S5), slightly fewer than in L. decemlineata (102) but more than in the other eight beetle species included in the analysis (ranging from 44 to 82) [67]. The dietary/detoxification group included two clades: coleopteran xenobiotic metabolizing CCEs (clade A) and α-esterase type CCEs (clade B) [83]. In A. viridicyanea, there is a noteworthy expansion (62 genes) in clade A, whereas we did not identify any genes from clade D (integument esterase), F (juvenile hormone esterase), or I (unknown function) (Fig. S5; Table S8).
Another group of detoxification enzymes that we examined was the GSTs. GSTs are involved in many cellular physiological activities, such as detoxification of endogenous and xenobiotic compounds, intracellular transport, biosynthesis of hormones and protection against oxidative stress [75, 84]. Insect GSTs are divided into two major groups, the cytosolic and the microsomal GST genes. The cytosolic group is further divided into six classes: Delta, Epsilon, Sigma, Omega, Theta, and Zeta [85]. The Delta and Epsilon classes are thought to be insect-specific [75, 84, 86], and members of the Epsilon class are commonly involved in detoxification of xenobiotics [87]. We detected a total of 27 GST genes in A. viridicyanea (Fig. S6; Table S9). Both the total number of GSTs and the number of detoxification-related Epsilon genes in A. viridicyanea were lower than those of most beetles [67] (Table S9).
UDP-glycosyltransferases (UGTs) catalyze the conjugation of a range of diverse small lipophilic compounds with sugars to produce glycosides, playing an important role in the detoxification of xenobiotics and in the regulation of endobiotics in insects [68]. From 17 (Oryctes borbonicus) to 65 (Anoplophora glabripennis) UGTs were identified in the nine beetle species surveyed [3, 67]. Currently, the largest repertoire of UGTs in beetles is that of the polyphagous longhorn beetle Anoplophora glabripennis, with 65 putative UGT genes and 7 pseudogenes [36]. The expansion of UGTs in A. glabripennis is thought to be related to its ability to feed on a broad range of host plants [36]. In line with this, we annotated 32 UGTs in the A. viridicyanea genome. Several members of UGT50, which has been suggested to be the most conserved UGT in insects [68], were identified in this species, and we also observed a remarkable expansion in the UGT324 family (Fig. S7, Table S10).
Most ABC proteins engage in active transport of molecules across cell membranes. The ABC transporters are well-known components of various detoxification mechanisms across all phyla [88, 89]. In the present study, we identified 69 putative ABCs in A. viridicyanea, belonging to eight subfamilies (A to H). This number is similar to that of two specialist chrysomelids, Chrysomela populi and Diabrotica virgifera virgifera (65 in each, based on transcriptomic data) (Table S11). The gene numbers of the conserved subfamilies D, E and F were consistent with those of the other beetles analyzed (Table S11); however, the number of genes in subfamilies B and C in A. viridicyanea (46) is the highest among the five species compared (Table S11; Fig. 6). These subfamilies are known to be involved in detoxification processes [64, 90].
Plant cell wall-degrading enzymes
Early views of insect digestion postulated that insects lack the endogenous enzymes required for plant cell wall (PCW) digestion, and that PCW digestion by insects depended on exogenous enzymes from symbiotic microorganisms [91]. Recent studies, however, have revealed that endogenous PCW degrading enzymes are present in many insects and are important in the digestion of cellulose, hemicelluloses, and pectin in PCW [40, 92]. In fact, these enzymes are likely a key innovation in the adaptive radiation of herbivorous beetles. Some insect PCW-degrading enzymes are also involved in immune-defense responses and detoxification [40, 92].
Beetle-encoded plant cell wall-degrading enzymes include carbohydrate esterases (CE), polysaccharide lyases (PL), and, predominantly, glycoside hydrolases (GH) [40]. In A. viridicyanea, we identified 65 putative glycoside hydrolases, including 35 GH1 genes, 10 GH45 genes, two GH48 genes and 18 GH28 genes (Table S12). GH1 genes have an ancient origin in animals and are ubiquitous in beetles. The species of Phytophaga (i.e., Chrysomeloidea and Curculionoidea) examined thus far have a greater number of GH1 genes than A. viridicyanea; for example, 228 were found in Diabrotica undecimpunctata, 135 in Mastostethus salvini, and 136 in Rhynchitomacerinus kuscheli [21, 36, 40]. For another ancient and ubiquitous gene family, GH9, there have been at least a dozen independent losses in beetles [40]. GH9 was not detected in A. viridicyanea, nor in four of seven other chrysomelid species (Callosobruchus maculatus, Donacia marginata, Diabrotica undecimpunctata and Leptinotarsa decemlineata) [21, 40].
The other plant cell wall-degrading gene families (CE8, PL4, GH32, GH5, GH10, GH43, GH44, GH45, GH48 and GH28) are thought to have been acquired from bacteria and fungi via horizontal gene transfer, and are mainly found in Buprestoidea and Phytophaga, with scattered genes in a few other taxa [40]. In A. viridicyanea, in addition to the ubiquitous GH1 genes, three families of PCW-degrading enzymes were identified: the cellulose-degrading GH45 and GH48, and the pectin-degrading GH28. The observed gene numbers are similar to those of closely related species in the Chrysomelinae (Oreina cacaliae and Leptinotarsa decemlineata) and Galerucinae (Diabrotica undecimpunctata) [21, 40].