Identification and preliminary characterization of chemosensory-related proteins in the gall fly, Procecidochares utilis by transcriptomic analysis

Background: Chemoreception is critical for insect behaviors such as foraging, host searching and oviposition, and finding mating partners. The process of chemoreception is mediated by a series of proteins, including odorant-binding proteins (OBPs), gustatory receptors (GRs), odorant receptors (ORs), ionotropic receptors (IRs), chemosensory proteins (CSPs) and sensory neuron membrane proteins (SNMPs). The tephritid stem gall ﬂy, Procecidochares utilis Stone, is a type of egg parasitic insect, which is an effective biological control reagent for the invasive weed Ageratina adenophora in many countries. However, the study of molecular components related to the olfactory system of P. utilis has not been investigated. Here, we report the developmental transcriptome (egg, first-third instar larvae, pupae, fame and male adult) of P. utilis using next-generation sequencing technology and identified the major chemosensory related genes. Results: In this study, a total of 133 chemosensory genes in P. utilis transcriptomes were identified by bioinformatics analysis, including 40 OBPs, 29 GRs, 24 ORs, 28 IRs, 6 CSPs, and 6 SNMPs in P. utilis . The sequences of these candidate chemosensory genes were confirmed by BLAST, and phylogenetic analysis was performed. The expression profiles of all candidate genes at different developmental stages were analyzed by differentially expressed genes (DEGs) analysis and then the expression profiles of the OBPs in the seven developmental stages were confirmed by real-time quantitative RT-PCR (qPCR). The results showed that the expression of candidate OBP genes across different developmental stages was consistent with the differentially expressed genes (DEGs) analysis using the fragments per kilobase per million fragments (FPKM) value. Conclusions: A large number of chemosensory genes were identified. This study will provide a significant contribution to the molecular mechanism of chemoreception and help advance the use of P. utilis as biological control agents.

identified the expression patterns of the chemosensory via fragments per kilobase per million reads (FPKM) and used quantitative real-time RT-PCR (qPCR) to confirm the expression patterns of OBP genes in each developmental stage. The identification of putative chemosensory-related genes will provide a basis for further study the molecular mechanism of olfactory recognition and also helps advance the use of P. utilis as biological control agents.

Functional annotation of the unigenes in P.utilis
Gene ontology (GO) analysis was used to categorize annotated genes into 57 functional groups in three main categories: 'biological process' (19,488), 'cellular component' (10,936), and 'molecular function' (7,259). Within the biological process class, the subcategories 'metabolic process', 'cellular process' and 'single-organism process' contained the majority of the unigenes. In 'cellular component' class, the subcategories 'cell' and 'cell part' contained the largest numbers of unigenes. 'Catalytic activity' and 'binding' were the most numerous subcategories in the "molecular function" category (Additional file 2: Figure S1B). For KOG functional classification, approximately 12,187 unigenes were annotated and divided into 25 molecular families (Additional file 2: Figure S1C). Among them, the largest category was the 'general function prediction only', followed by 'signal transduction mechanisms' and 'posttranslational modification, protein turnover, chaperones'. 'Nuclear structure' and 'cell motility' were the smallest groups with only 45 and 30 unigenes, respectively (Additional file 2: Figure S1C). KEGG analysis was used to classified annotated genes into different KEGG Pathway functional categories (Additional file 2: Figure S1D). The most representative pathways included 'global and overview maps', 'translation' and 'transport and catabolism' (Additional file 2: Figure   S1D).

Identification of candidate gustatory receptors
A total of 29 candidate GR transcripts were identified from P. utilis transcriptome. Ten GR sequences contained a full-length ORF with 4-8 transmembrane domains (TMDs) (Additional file 3: Table S2). The remaining sequences were incomplete, and seven of them possessed a deduced protein longer than 100 amino acids with 1-2 transmembrane domains (Additional file 3: Table S2). A total of 29 candidate GR transcripts were identified from the P. utilis transcriptome. Ten GR sequences contained a full-length ORF with 4-8 transmembrane domains (TMDs) (Additional file 5: Table S6). The remaining sequences were incomplete, and seven of them possessed a deduced protein longer than 100 amino acids with 1-2 transmembrane domains (Additional file 3: Table S2). The Maximum Likelihood phylogenetic tree was constructed to classify the functions of GRs in P. utilis transcriptome using the GR genes identified in other dipteran species (Fig. 2). Nine (Unigene0037548, Unigene0037352, Unigene0001281, Unigene0037351, Unigene0037353, Unigene0037354, Unigene0026343, Unigene0040419 and Unigene0033944) GRs of P. utilis were clustered with the members of the candidate sugar detection GR subfamily including DmelGR64a-f, DmelGR5a and DmelGR61a [32,64].

Identification of Candidate odorant receptors
A total of 24 candidate GRs were identified from P. utilis transcriptomes. The sequence identities of these candidate GRs with other Dipteran insects ranged from 27.19% to 96.61% in the NCBI database (Additional file 3: Table S2). Notably, one of the ORs identified in the P. utilis transcriptomes shared the highest identity (96.61%), similar to a co-receptor in R. zephyria (XP_017470946.1). Among these OR unigenes, six encode full-length proteins with 374 to 472 amino acid residues with 5-7 TMDs. All of the remaining unigenes contained 1-4 TMDs, except that two unigenes had no domains (Additional file 3: Table S2).
A maximum likelihood tree was subsequently constructed using our identified putative OR proteins and the sequences from four other Dipteran species; D. melanogaster, B. minax, C. stygia and B. dorsalis (Fig. 3). The phylogenetic tree showed that the highly conserved P. utilis co-receptor (Orco), were also placed on a single branch. The remaining unigenes were spread across various branches, where they generally formed small subgroups together with at least one Dipteran orthologous OR in the phylogenetic tree (Fig. 3).

Identification of candidate ionotropic receptors
A total of 28 candidate IRs were identified from P. utilis transcriptomes by bioinformatics analysis according to the comparison with known insect IRs. The sequences identities of these candidate IRs with other Diptera insects ranged from 53.41 to 96.6% in the NCBI database (Additional file 3: Table   S2). Among these IRs, eight encode full-length proteins with 406 to 778 amino acid residues, the remaining unigenes were incomplete, because of the lack of complete 5′ or 3′ terminus. Of these IRs, the transmembrane domains ranged from 0 to 5 (Additional file 3: Table S2). The maximum likelihood tree was subsequently conducted by IQTREE (version 1.6.10) with best-fitting substitution model. To guarantee the reliability of the phylogenetic tree, all of the IRs in our transcriptomes were aligned with IRs from other Dipterans; D. melanogaster, B. minax, C. stygia and C. capitata (Fig. 4). In the phylogenetic analyses, all P. utilis candidate IRs were clustered with other known Dipterans IRs into separate sub-clades and most of them were clustered with presumed "antennal" orthologues ( Fig. 4).
For example, four (nigene0031176, Unigene0044165, Unigene0031177, Unigene0031178) candidate IRs were located in the clade of the DmelIR76b group, three (Unigene0033408, Unigene0033409, Unigene0033410) candidate IRs were located in the clade of the DmelIR64a group. Unigene0022450 and Unigene0022451 were placed on DmelIR8a group; Unigene0049020 and Unigene0027207 were placed on DmelIR25a group. Unigene0045883 and Unigene0003817 were located in the clade of the DmelIR21a and DmelIR75a group, respectively. In addition, three unigenes were located in the clade of the non-NMDA iGluRs. The remaining unigenes were spread across various branches with Dipteran orthologous IRs.

Identification of candidate Chemosensory proteins
In all, six putative unigenes encoding CSPs were identified in P. utilis (Additional file 3: Table S2). All of them contain four highly conserved cysteine residues and fit the motif "C1-X6-8-C2-X16-21-C3-X2-C4", which are characteristic of typical insect CSP (Additional file 5: Figure S3). Of these CSPs, four unigenes encode full-length proteins with predicted signal peptide sequences (Additional file 3: Table   S2). dorsalis and M. domestica. In addition, Unigene0012976 and Unigene0044055 were placed on a single branch, and Unigene0027955 is also classified in a single branch (Fig. 5).

Identification of candidate sensory neuron membrane proteins
Six unigenes encoding SNMPs were identified in P. utilis transcriptome by bioinformatics analysis and all of them were more conserved across other SNMPs variants, with 80.64 to 91.52% amino acid identity. Among them, two unigenes encode full-length proteins with 2 transmembrane domains (TMDs) (Additional file 3: Table S2). A phylogenetic tree was constructed using the sequences from five Dipteran species (Fig. 6). The result demonstrated that Unigene0035241, Unigene0035242 and Unigene0007016 were similar to BminSNMP1b, BminSNMP1a, and BdorSNMP1-1, the remaining 3 unigenes (Unigene0035275, Unigene0012260 and Unigene0035274) were clustered with other insect SNMP2 orthologous (Fig. 6).

Differentially expressed gene (DEG) analysis
Gene expression levels of all candidate chemosensory genes in eggs, 1st-3rd instar larvae, pupae, male and female adults were estimated based on their FRKM values. The overall expression levels of putative GRs, ORs, IRs, SNMPs and CSPs were relatively low compared with that of OBPs (Additional file 6: Table S3). Of the 40 OBPs identified, 17 were mainly expressed in male adults; five were more highly expressed in female adults; 12 were more highly expressed in pupae and three were highly expressed in 3rd instar larvae. One OBP (Unigene0051738) was mainly expressed in 2nd instar larvae compared to eggs, 1st and 3rd instar larvae, pupae, male and female adults. The remaining two OBPs, the former one was more highly expressed in 1st instar larvae, the latter one was more highly expressed in egg (Fig. 7a). The RPKM values for putative P. utilis GRs were relatively low and ranged from 0 to 4.49. Five, five, five, four, four, three and two GRs were relatively high expressed in female adults, 3rd instar larvae, eggs, 1st -2nd instar larvae, male adults and pupae, respectively (Fig. 7b).
Most candidate ORs were highly expressed in male adults. Five candidate ORs were relatively high expressed in pupae, three ORs were relatively high expressed in 2nd instar larvae, two ORs were relatively high expressed in eggs and one IRs was relatively high expressed in female adults. The remaining candidate ORs exhibited diverse expression patterns (Fig. 7c). For IRs, most candidate IRs were mainly expressed in male adult and three were highly expressed in pupae, whereas three candidate IRs had the highest expression level in eggs. The remaining candidate IRs exhibited diverse expression profiles (Fig. 7d). P. utilis CSPs were mainly expressed in pupae, only one (Unigene0012976) candidate CSP was highly expressed in 2nd instar larvae and Unigene0017876 was highly expressed in female and male adults (Fig. 7e). For SNMPs, three were highly expressed in pupae and the remaining candidate IRs showed diverse expression patterns (Fig. 7f)

Specific expression patterns of candidate OBP genes in different developmental stages
To confirm the expression levels of candidate OBP genes, fluorescence quantitative real-time PCR was performed using the seven different developmental stages including eggs, 1st-3rd instar larvae, pupae, male and female adults. Transcript levels of 35 were successfully detected in the seven developmental stages. Ten OBPs were expressed at significantly higher levels in both male and female adults, and the differences between males and females were not significant (Fig. 8). In addition, six OBPs also were more highly expressed in both male and female adults compared to other stages, but were expressed at extremely high significant levels in male adults, as were  Table S3).

Discussion
The stem gall fly, P. utilis is an oligophagous insect, has obvious selectivity for host plant and strong adaptability to the toxic weed A. adenophora. Unlike other insects, the female adult of P. utilis oviposits only on or near the apical bud of A. adenophora and its larvae only fed on A. adenophora [52,54]. In the majority of insect species, chemical cues drive several aspects of their behavior, such as host plant locations, mate finding, and oviposition site selection. Chemosensory proteins were involved in this process [23]. The differences in oviposition and feeding behaviors of the gall fly may be due to the differences in olfactory sensation specificity. Prior to this study, most research on P.
utilis was focused on biological and ecological parameters. To better understand P. utilis chemosensory proteins, we first identified candidate chemosensory genes in transcriptomes of P.
utilis eggs, 1st-3rd instar larvae, pupae, male and female adults.  [18]. The differences in the number of OBPs may be due to the evolution of different insects and adaptation to a variety of environments [23]. The phylogenetic analysis showed that Unigene0044361 clustered together with DmelOBPs-lush, which was first found to bind with D. melanogaster and other insect pheromones, such as cis-vaccenyl acetate (cVA), short-chain alcohols and phthalates [8,69] [71]. In Chilo suppressalis, six OBPs (CsupPBP4 and CsupOBP1, CsupOBP3, CsupOBP8, CsupOBP11, and CsupOBP24) were specifically expressed in the adult stage [72]. In Chrysomya megacephala, two OBP genes were more highly expressed in adults than in larvae [73].
These studies suggested that these OBP genes may be involved in the search for mating partners and oviposition sites [72,74]. In addition, in our study, there were significant differences in expression levels between male and female adults; six OBPs (Unigene0000012, Unigene0004435, Unigene0004872, Unigene0012354, Unigene0020678, Unigene0053021) were significantly expressed in male adults, suggesting that these OBPs may be related to the perception of sex pheromones, like in moths, Galeruca daurica and Bradysia odoriphaga [59,4]. In contrast, four OBPs (Unigene0022728, Unigene0031193, Unigene0037789 and Unigene0052365) were significantly expressed in female adults, and similar results were found in G. daurica and B. dorsalis, indicating that these OBPs play important roles in oviposition site selection and mating behavior [4,59,74]. There were five (Unigene0001375, Unigene0008856, Unigene0017986 and Unigene0034863) special OBPs that were highly expressed in pupae, which were also consistent with the results obtained in other species. For example, OBP44a, AsteObp1, GdauOBP15, 17, and 25 were found to be abundantly expressed in the pupal stage of B. dorsalis, Anopheles stephensi and G. daurica, respectively [4,59,74]. These results suggest that these OBPs may be related to the beginning of the development of chemosensory tissue during pupation [74,75]. Furthermore, one OBP (Unigene0051738) in both 1st and 2nd instar larvae was significantly expressed compared to other developmental stages., and similar results were found in D. melanogaster (Obp56a) and Sclerodermus sp. (OBP3), which suggests that these OBPs may have played a basic and conserved role in feeding or identifying general volatiles [71,76]. Interestingly, one OBP (Unigene0007733) was abundantly expressed exclusively in eggs, which agrees with another study of G. daurica that reported that GdauOBP28 levels peaked in eggs, which suggests its putative roles in egg development [59]. One (Unigene0046437) in both pupae and male adults, and one (Unigene0007971) in eggs, male and female adults were significantly expressed compared to other developmental stages. Research has shown that four OBPs (OBP19c, OBP44a, OBP99a, and OBP99d) from B. dorsalis were expressed from the prepupa to the adult stage, suggesting that they may be related to the distinction of mating partners and egg-laying substrates [74,76]. In general, our results show that the expression profile of OBPs are extremely complex at different developmental stages, which may be due to their different roles in P. utilis behaviors.
We identified 29 P. utilis GRs, more than those found in the antennae of B. minax, C. oryzae, B.
melanogaster, and M. domestica [18]. In the phylogenetic analysis, eight P. utilis GRs (Unigene0037548, Unigene0037352, Unigene0001281, Unigene0037351, Unigene0037353, Unigene0037354, Unigene0026343 and Unigene0033944) were clustered with the D. melanogaster sugar receptors DmelGR64a-f, DmelGR5a and DmelGR61a, suggesting that they likely play roles in tasting sugar [32,64]. Two P. utilis GRs formed a clade with CO 2 receptors, suggesting that they may be involved in CO 2 perception [37]. In addition, some GRs from P. utilis were clustered with fructosedetecting Grs and bitters-detecting GRs from Drosophila, indicating that they may perform similar functions [34]. Some studies have showed that the GRs from Helicoverpa armigera may be related to the host plant defense substances and the environments [13,77]. Similar functional analysis of the candidate P. utilis GRs will be required to confirm their physiological function.
We identified 24 P. utilis ORs, less than those found in the antennae of B. minax, B. dorsalis, E.
balteatus, E. corolla and D. melanogaster [7,16,17,23]. Compared to the antennae, fewer ORs were obtained in P. utilis, and the FPKM value of these genes in eggs, 1st-3rd instar larvae, pupae, male and female adults were relatively low. This may indicate that different samples including antennae will be required to obtain more functional ORs from P. utilis [17]. Our phylogenetic analysis showed that more members of the P. utilis ORs were part of the OR7a superfamily. Previous studies have showed that OR7a is involved in species aggregation and oviposition site-selection in Drosophila [78].
In addition, Unigene0051341 and Unigene0021745 were clustered with BdorOR13a and BdorOR82a, respectively. Furthermore, one OR (Unigene0052656) was clustered with the Orco, which is necessary for membrane targeting of canonical ORs [26]. Recent studies have found that BdorOR13a coexpressed with BdorOrco responded strongly to 1-octen-3-ol, while BdorOR82 co-expressed with BdorOrco responded markedly to geranyl acetate, suggesting that these ORs were related to the perception of plant volatiles that impact on the host-finding behavior of B. dorsalis [16].
Besides GRs and ORs, a third class of chemosensory receptors in insects is the ionotropic receptor (IRs), which play a vital role in the synaptic ligand gated ion channels participating in chemosensation and have multiple functions, such as sensation of tastes, odours and temperature [17,26]. The numbers of IRs identified in P. utilis were similar to that of B. minax, E. balteatus and B. dorsalis [7,23,16] but fewer than those reported in D. melanogaster, C. capitata and M. domestica [18]. It may indicate that the number of IRs varies from species to species and is related to diet or natural habitats [23]. Our phylogenetic analysis showed that most of P. utilis IRs were clustered with "antennal" orthologues from other Dipterans. According to the function of IRs gene in Drosophila, antenna IRs play key roles in odour and thermosensation [43]. For example, IR76b has been implicated in the sensation of polyamine, low-salt and amino acids [32,43]. IR8a acts as a coreceptor,along with IR64a that responds to acidic molecules [45]. IR8a and IR75a are also tuned to acidic molecules [79]. IR25a and IR21a are necessary to induce responses to cool conditions [80]. IR76b, together with IR25a and IR8a, is considered to have co-receptor function [43].The IRs orthologs in P. utilis might perform similar functions.
In present study, we identified six SNMPs which are similari to homologues from B. minax, D.
melanogaster and B. dorsalis [68]. Our phylogenetic analysis showed that Unigene0007016, Unigene0035241 and Unigene0035242 were clustered with DmelSNMP1, BminSNMP1a and 1b, suggesting that these proteins may be associated with pheromone reception [7]. We also identified six P. utilis CSP Unigenes. In the phylogenetic tree analysis, three P. utilis CSPs along with CSPs from B. dorsalis and B. minax were grouped in the same clade, suggesting that some CSPs may have similar functions between three species. Further investigations are needed to reveal the functions of P. utilis CSPs.

Conclusions
A total of 133 chemosensory genes were identified from the P. utilis transcriptomes of different developmental stages. As the first step towards understanding gene functions, chemosensory genes were classified according to their conservation, transmembrane domain prediction and phylogenetic analysis. Then, the expression levels of these genes at different development stages were observed in the transcriptomic data and verified OBP gene transcription patterns. This is the first study to obtain and identify candidate genes associated with chemoreception in P. utilis. Our results not only enrich the gene inventory of P. utilis and provide a foundation for further functional studies of the chemosensory system of P. utilis at the molecular level, but also offer creative approaches for the better application of P. utilis as a biological control agent.

Host plants and insects.
The host plants, A. adenophora were cultivated in the greenhouse of Yunnan Agricultural University, which were used as the host to rear the laboratory colony of P. utilis as described by Gao