Improved sets of chemosensory genes in D. ponderosae and A. planipennis
A previous study of the antennal transcriptome of D. ponderosae (“Dpon”) identified a total of 111 chemosensory genes, including 49 ORs, 2 GRs, 15 IRs, 3 SNMPs, 31 OBPs, and 11 CSPs [66]. Our present study of the genome yielded a total of 254 chemosensory genes (Additional file 1), of which 153 gene models were not identified in the previous transcriptome. New genes were identified in all gene families, with the largest increase observed for the GRs, followed by the ORs and IRs (details in sections below). Several of the original partial transcript sequences were extended (often to full length), and errors in several original transcript models caused by previously unnoticed frameshifts or introns were corrected (Additional file 2). Ten of the original gene models were discarded: one OR gene (previous DponOr45) was the result of a transcript chimera; one IR gene (previous DponIr56e.1) showed no homology to insect IRs; two IR gene fragments (previous DponIr21a.2 and 56e.2) were dropped because they were found to belong to the same genes as two other previously reported partial IR genes; and one IR gene (previous DponIr93a.2), four OBP genes (previous DponObp17, 20, 24, 32), and one CSP gene (previous DponCsp5) were assembly isoforms or alleles of other genes (DponIr93a and DponCsp3). Only one gene model (DponOr21) that was complete in the transcriptome study was incomplete in the genome assemblies; hence, the original model was retained.
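The totals for D. ponderosae can be cross-checked against the per-family counts given in the family-specific sections. The short sketch below is only a bookkeeping aid and rests on two assumptions that are ours, not stated in the text: that the genome total counts the 60 GR transcripts rather than the 49 GR genes, and that the number of novel models equals the genome total minus the previous models that were not discarded. The variable names are illustrative.

```python
# Per-family counts for D. ponderosae as reported in this text.
# Assumption: GRs are counted as 60 transcripts, not 49 genes.
genome_counts = {"OR": 86, "GR": 60, "IR": 57, "SNMP": 4, "OBP": 36, "CSP": 11}
transcriptome_counts = {"OR": 49, "GR": 2, "IR": 15, "SNMP": 3, "OBP": 31, "CSP": 11}
discarded_models = 10  # original transcriptome models dropped in this study

genome_total = sum(genome_counts.values())
transcriptome_total = sum(transcriptome_counts.values())

# Assumption: every retained transcriptome model maps to one genome model.
retained_models = transcriptome_total - discarded_models
novel_models = genome_total - retained_models

print(genome_total, transcriptome_total, novel_models)
```

Under these assumptions the sums reproduce the reported figures of 254, 111, and 153.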
In A. planipennis (“Apla”), 24 chemosensory genes (2 ORs, 2 GRs, 6 IRs, 1 SNMP, 9 OBPs, and 4 CSPs) were previously identified from an antennal transcriptome [67]. Here, we annotated a total of 137 chemosensory genes from its genome, of which 118 annotations are novel compared to previous transcriptome work. Several of the original models were revised (Additional file 2) or discarded: two OBP genes (previous AplaObp4 and AplaObp8) were discarded because they were assembly isoforms; two IR genes were identified as separate fragments of AplaIr25a (isotig01857-ApIR and G3QO8C008JMTAX_ ApIr); and both existing short AplaGR gene models were found to lack homology to insect GRs. Several of the original gene models were renamed, especially in D. ponderosae (most ORs, some IRs, and two SNMPs; see also [53]) to follow established nomenclature for genomic annotations of these gene families (see Methods section for details and Additional file 2 for correspondence with original names).
Odorant receptors
A total of 86 OR genes, including Orco and 7 putative pseudogenes, were annotated in the genome of D. ponderosae, of which 63 OR genes were completed to full length. Except for DponOR53INT (195 amino acids), all putatively functional DponORs are above 350 amino acids in length, with the majority of the partial models missing only a short N-terminal exon (named A1) that could not be confidently identified due to the absence of transcriptomic support. As has been observed with the OR genes in other insect genomes, a large proportion of the genes in both species occurs in tandem arrays on scaffolds (Additional file 1). Although alternative splicing is uncommon in ORs, two of the DponOR genes were each considered to encode two alternative splice variants (named DponOr2a/b and DponOr36a/b), with mutually exclusive N-terminal A1-A2 exons assembled consecutively with seemingly shared C-terminal B-E exons. The A. planipennis genome contains 47 OR genes, including Orco and one pseudogene. In this species, 31 of the ORs were completed to full-length models, and the partial OR genes encode protein sequences of between 174 and 393 amino acids. The number of introns in full-length OR genes varies between four and seven in both species, whereas Orco is interrupted by ten introns (Additional file 1; see also [53]). Both D. ponderosae and A. planipennis have fewer putatively functional OR genes and pseudogenes than the other species, which are considered polyphagous (Table 1).
The DponORs and AplaORs were recently included in phylogenetic analyses that covered ORs from ten coleopteran genomes and allowed for classification and revision of nine higher-order monophyletic OR subfamilies (designated as Groups 1, 2A, 2B, 3, 4, 5A, 5B, 6, and 7) across the Coleoptera [53]. Several of these groups had also been recognized in earlier studies [54,66]. In this study, the phylogenetic analysis was restricted to two additional species (T. castaneum “Tcas” and A. glabripennis “Agla”), which allows us to present and discuss the results for these particular species in more detail. Our phylogeny (Figure 1) shows that the distribution of ORs among the nine major coleopteran OR subfamilies is species-dependent. The majority of DponORs belong to Group 7, followed by Groups 5A, 1, 2A, and 2B. In contrast, most AplaORs are found within Group 2B, followed by Groups 3, 6, 5B, and 2A. Furthermore, D. ponderosae appears to have lost ORs in Groups 3, 4, 5B, and 6, whereas A. planipennis lacks ORs in Groups 1, 4, 5A, and 7. These OR distribution patterns are also distinct from those in T. castaneum and A. glabripennis, which in turn differ from each other. In D. ponderosae, the largest species-specific radiation contained 30 ORs (DponOR27-55 in Group 7, including the putatively alternatively spliced DponOR36a/b), whereas the largest expansion in A. planipennis contained 18 ORs (AplaOR16-33 in Group 2B). Well-supported orthologous relationships were found only for AglaOR55/DponOR57-59, AglaOR38/DponOR10-11, and TcasOR73FIX/DponOR56.
Gustatory receptors
In D. ponderosae, we annotated 60 GR transcripts (including 57 full-length models and one pseudogene) that are encoded by 49 genes, of which seven were considered to exhibit alternative splicing, each producing either two or four splice variants (Additional file 1). Most splice variants are encoded by genes with two exons; they share the C-terminal exon but have a unique N-terminal exon. One of the alternatively spliced genes, DponGr38a-d, has three exons, of which the N-terminal exon is unique and the two C-terminal exons are shared. In A. planipennis, 30 GR genes were revealed (22 full-length models), with no evidence of alternative splicing. The putative receptors for carbon dioxide and sugars contain several introns, whereas the majority of the remaining putative bitter-taste GR genes contain only one or two introns in both species. However, several of these GRs, especially in D. ponderosae, contained one to four additional introns (Additional file 1). As with ORs, D. ponderosae and A. planipennis have fewer putatively functional GR genes and pseudogenes than the other species, which are considered polyphagous (Table 1).
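The gene and transcript counts for the DponGRs constrain how the seven alternatively spliced genes divide between two and four variants. The sketch below enumerates the possibilities, assuming (our assumption, not stated in the text) that each of the remaining 42 genes contributes exactly one transcript. Variable names are illustrative.

```python
# Counts reported for D. ponderosae GRs in this text.
genes_total = 49
spliced_genes = 7
transcripts_total = 60

# Assumption: every gene without alternative splicing yields one transcript.
unspliced_transcripts = genes_total - spliced_genes

# Try every split of the seven spliced genes into two-variant
# and four-variant genes, and keep the splits that give 60 transcripts.
solutions = [
    (two_variant, spliced_genes - two_variant)
    for two_variant in range(spliced_genes + 1)
    if unspliced_transcripts
    + 2 * two_variant
    + 4 * (spliced_genes - two_variant)
    == transcripts_total
]

print(solutions)  # list of (two-variant genes, four-variant genes)
```

Under this assumption the only consistent split is five genes with two variants and two genes with four variants.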
The DponGRs and AplaGRs were phylogenetically analyzed together with the GRs from A. glabripennis and T. castaneum, showing that the three conserved GRs for carbon dioxide (GR1-3) are present in D. ponderosae, whereas no evidence of GR1 was recovered from the genome assembly of A. planipennis, nor from the available raw sequence reads (accessions: SRR1174015–SRR1174018; Figure 2). In addition, both species have six GRs (GR4-9) that grouped within the clade of conserved sugar receptors. Whereas DponGR4, DponGR6, and DponGR9 appear orthologous to AglaGR4, AglaGR8, and AglaGR6, respectively, no simple orthologous relationships are evident for the other sugar receptors in D. ponderosae or for any of these GRs in A. planipennis. These two species also have one GR each (GR10) that was placed within the clade of conserved fructose receptors, which is dominated by a lineage expansion in T. castaneum. Most of the remaining GRs (putative bitter-taste GRs) of all four species in the analysis grouped in small to large species-specific expansions, which in many cases comprise large suites of alternatively spliced proteins, especially from T. castaneum and A. glabripennis. Among the putative bitter-taste GRs, only a single clade was represented by one orthologue from each of the four species. This clade was highly supported (Shimodaira-Hasegawa [SH] support value 1.0) and named the “GR215 clade” based on the GR representative from T. castaneum (Figure 2). The GR215 clade is part of a larger and well-supported subfamily that includes one additional DponGR (DponGR46) and large expansions of alternatively spliced proteins from T. castaneum and A. glabripennis. Finally, it is noteworthy that one of the largest well-supported GR lineages (indicated by the long black arc in Figure 2), comprising almost half of the GRs in our analysis, was devoid of GR representatives from A. planipennis.
Ionotropic receptors
In D. ponderosae, we identified a total of 57 IR genes (51 full-length models), including two pseudogenes. The number of IRs in A. planipennis was 31 (22 full-length models), including one pseudogene. Members of the conserved antennal IR lineages IR8a, IR21a, IR25a, IR40a, IR41a, IR68a, IR76b, and IR93a were identified in both species (Figure 3). Two paralogues of IR41a were annotated in D. ponderosae, and two paralogues of IR76b in A. planipennis. Furthermore, D. ponderosae has 11 IRs that fell within the IR75 clade, which to date is the largest number reported from a beetle genome. In contrast, A. planipennis has only four members in this clade. Both species also had members of the IR100a clade, with three receptors found in D. ponderosae and one in A. planipennis. Each beetle species (including L. decemlineata “Ldec”) also has one IR that grouped with IR60a from D. melanogaster (“Dmel”) with high support (0.93), suggesting that this IR is conserved in beetles. Hence, the IRs from this group identified in the present study were named DponIR60a and AplaIR60a, whereas the orthologues from T. castaneum, L. decemlineata, and A. glabripennis retained their original names (TcasIR108, LdecIR106, and AglaIR150). The remaining divergent IRs from D. ponderosae and A. planipennis generally grouped in species-specific lineage expansions of various sizes, with only a few IRs being individually placed. Whereas the antennal IR genes are known to contain several, often very large, introns, the number of introns in the divergent IRs was low (range: 0-3; Additional file 1). Again, we observed fewer putatively functional IR genes and pseudogenes in the stenophagous species (Table 1).
Sensory neuron membrane proteins
We annotated four SNMP genes (all full-length models; Table 1) in each of D. ponderosae, A. planipennis, and A. glabripennis. Both D. ponderosae and A. glabripennis have two members each of SNMP1 and SNMP2, whereas A. planipennis has only one member in each of these broadly conserved classes. The two remaining AplaSNMP genes encode proteins related to TcasSNMP3 and were thus named AplaSnmp3a and 3b. The beetle SNMP3 clade was positioned sister to the SNMP1/SNMP2 subfamilies (Figure 4).
Odorant binding proteins
Our genome annotations revealed 36 OBPs in D. ponderosae and 12 OBPs in A. planipennis (all full-length models), which is fewer than in the other species, which are considered polyphagous (Table 1). Two of the DponOBP genes (DponObp37 and DponObp38) were exclusive to the male assembly, suggesting that they are located on the neoY chromosome. OBPs are classified into different groups based on the number of conserved cysteine (C) residues and their phylogenetic relationships [37,68]. The “classic” OBPs share a characteristic pattern of six C residues. Members of the Minus-C class have lost two of these cysteines (generally C2 and C5), whereas the Plus-C OBPs typically have 12 conserved cysteines and a characteristic proline. Finally, one subfamily of the classic OBPs is further classified as “antennal binding protein II” (ABPII), members of which are generally upregulated in the antennae [57]. Inspection of the patterns of C residues, together with our phylogenetic analysis, showed that the genomes of D. ponderosae and A. planipennis contain one Plus-C member each, similar to other beetles (Figure 5; Additional file 1). Furthermore, 15 DponOBPs and four AplaOBPs showed the four-cysteine (4C) pattern that is characteristic of the Minus-C group. However, two of these proteins (DponOBP13 and DponOBP22) did not group within the Minus-C clade in our phylogeny, but were placed with the classic OBPs with intermediate support (SH = 0.88). Seven DponOBPs and three AplaOBPs were placed within the ABPII clade (Figure 5). In contrast to the polyphagous A. glabripennis and T. castaneum, no major species-specific lineage expansions were observed in the two stenophagous wood-borers apart from a small expansion of five Minus-C DponOBPs (DponOBP3, 9, 11, 28, 38).
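The cysteine-based part of this classification can be illustrated with a deliberately crude sketch that counts cysteines in a protein sequence and maps the count to a candidate class. This is not the procedure used for the annotations: the thresholds simply mirror the counts quoted above (six, four, and 12), the function names and toy sequences are hypothetical, and a real assignment depends on the positions of the conserved residues and on phylogenetic placement, as the cases of DponOBP13 and DponOBP22 show.

```python
def count_cysteines(protein: str) -> int:
    """Return the number of cysteine (C) residues in a protein sequence."""
    return protein.upper().count("C")


def classify_obp(protein: str) -> str:
    """Assign a candidate OBP class from the raw cysteine count only.

    Assumption: the total count stands in for the conserved-cysteine
    pattern, which ignores residue positions and phylogeny.
    """
    n_cys = count_cysteines(protein)
    if n_cys >= 12:
        return "Plus-C candidate"
    if n_cys == 6:
        return "classic candidate"
    if n_cys == 4:
        return "Minus-C candidate"
    return "unclassified"


# Toy sequences (hypothetical, not real OBPs).
toy_sequences = {
    "toy_classic": "MKCAALCGGCKKCLLCEECA",
    "toy_minus_c": "MKCAALCGGKKKCLLAEECA",
}
for name, seq in toy_sequences.items():
    print(name, count_cysteines(seq), classify_obp(seq))
```

A count-based label of this kind is best treated as a first filter to be confirmed against an alignment and a phylogeny.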
Two of the DponOBP genes (DponObp6 and DponObp7) showed evidence of alternative splicing, supported by transcriptomic data [66,69]. In both cases, the alternative splicing involves two mutually exclusive N-terminal exons encoding the short signal peptide, which appear to be alternatively combined with a shared C-terminal exon (DponObp7; Minus-C group) or six shared exons (DponObp6; ABPII group) (Additional file 1). Finally, we identified an unusually large OBP in D. ponderosae (DponObp4), encoding a protein of 500 amino acids. Apart from a short N-terminal exon housing the signal peptide, this gene contains four similarly sized exons, each presenting the conserved Minus-C motif. This extraordinary Minus-C “tetramer” model was supported by previous transcriptomic data from this species [66,69] and from two other curculionids: the Yunnan pine shoot beetle (Tomicus yunnanensis; accession: GFJU01117056.1) and the red palm weevil (Rhynchophorus ferrugineus; accession: GDKA01001723.1), retrieved from the Transcriptome Shotgun Assembly (TSA) collection at NCBI. The first three Minus-C exons of DponObp4 are separated by short introns of approximately 60 bp, whereas the final exon is separated from them by a 1.26 kb intron. To investigate how such a large OBP may have originated, we individually aligned the four Minus-C exons of DponOBP4 together with a subset of Minus-C OBPs, i.e., those encompassed under the most recent node shared by the individual DponOBP4 exons. The resulting phylogeny grouped DponOBP4 exons 2 and 3 together with moderate support, while exon 4 was positioned in a sister clade. Exon 5 was widely separated from the other exons but without support (Additional file 3). Inconsistent with this phylogeny, however, exons 2 and 4 shared the highest amino acid identity (41.6%), and a relatively high identity (30.7%) was shared between exons 3 and 5, suggesting that this protein may have originated from a dimer that underwent a duplication of its two major exons.
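Pairwise amino acid identities such as those quoted for the DponOBP4 exons are typically computed from aligned sequences. The sketch below shows one common convention, in which identity is the percentage of matching residues among alignment columns where neither sequence has a gap. The text does not state which convention was used, so the choice of denominator is an assumption, and the function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are excluded
    from both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical): three matches over four ungapped columns.
print(percent_identity("AC-DE", "ACQDF"))
```

Other conventions, for example dividing by the full alignment length or by the length of the shorter sequence, give different values for the same alignment, so the definition should be stated alongside any reported identity.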
Chemosensory proteins
The total numbers of CSP genes were 11 (all full-length models) in D. ponderosae, 14 (13 full-length models) in A. planipennis, and 17 (16 full-length models) in A. glabripennis (Table 1). The majority of the beetle CSP genes are characterized by the presence of a single central intron in splice phase 1; however, a few of the DponCSPs have one additional intron (phase 0) close to the N-terminus, with the first exon coding for only the first two amino acids of the protein. Most of the CSP genes within each species were assembled on the same genomic scaffold (Additional file 1). The phylogenetic analysis revealed the presence of several highly supported clades with one or two CSPs from each of the four species, suggesting the existence of several simple orthologous relationships in this gene family (Figure 6). This includes a conserved clade of four CSPs (DponCSP12, AplaCSP8, AglaCSP3, and TcasCSP7E) with greatly elongated C-termini and proteins ranging from 251 to 307 amino acids. Species-specific radiations of CSP lineages were rare, but a few smaller ones (comprising 3-6 CSPs) were evident in T. castaneum and A. planipennis. CSPs from the latter species were missing from a few well-supported clades that contained members from two or three of the other species.