CYP, CCE and GST repertoires were compared across four heteropterans (N. viridula, H. halys, R. prolixus and C. lectularius) and one Auchenorryncha (N. lugens). The latter was included in the analysis as a hemipteran non-heteroptera outgroup. Considering the sum of members of all the gene superfamilies under analysis, 151 detox genes in N. viridula, 116 in C. lectularius, 194 in R. prolixus, 238 in H. halys and 96 in N. lugens were identified (Table 1). CYP superfamily was the largest in terms of number of genes in all the species, followed by CCE, being GST the least represented (Table 1). This distribution is not observed in other insect species; Drosophila melanogaster, for example, possesses more GST than CCE genes . When considering each superfamily separately, the H. halys CYPs (134), CCEs (74) and GSTs (30) were the most abundant. By comparison, the N. lugens repertoire was reduced in every family (48 in CYPs, 36 in CCEs and 12 in GSTs) (Table 1). The CCE and CYP superfamilies of R. prolixus have 119 and 61 members, respectively, followed by N. viridula with 82 and 64, respectively (Table 1). The GST superfamily resulted slightly larger (15) in N. viridula than in C. lectularius and R. prolixus, counting 14 members in each species (Table 1).
When the sum of CYPs, CCEs and GSTs was compared to the complete gene sets in the genomes, significant expansions in C. lectularius, H. halys and R. prolixus were detected compared to N. lugens (Supplementary information 1). Within Heteroptera, both H. halys and R. prolixus presented significant expansions with respect to C. lectularius (Supplementary information 1). This was supported by different numbers of duplications and losses (Figure 1), with H. halys and R. prolixus being the species with a higher number of duplication events detected in the gene families studied.
CYPs are involved in the degradation of xenobiotics, from both the diet and the environment, being an important factor in the response of insects to chemical stress . They also participate in key metabolic processes, such as the degradation of pheromones or the biosynthesis of signaling molecules. CYP genes are divided into 4 large clans (Mitochondrial, CYP2, CYP3 and CYP4 clans), which in turn are subdivided into families and subfamilies .
CYP complement in the N. lugens genome was significantly smaller when compared with the CYP complement of other heteropteran genomes (Supplementary information 1). Also, the CYP superfamily was significantly expanded in both R. prolixus and H. halys when compared to C. lectularius (Supplementary information 1). Phylogenetic analysis allowed the classification of CYP superfamily members into clans and families (Figure 2), which presented different evolutionary dynamics (Table 2).
The number of mitochondrial clan genes was similar among species: 8 genes in N. lugens, R. prolixus and H. Halys, 7 in C. lectularius, and 6 in N. viridula (Table 1). These numbers reflect stability in the evolution of the clan, with few events of gene duplications and losses, although the number of losses was slightly higher (Table 2). Many genes from this clan have essential roles in the life cycle of insects, such as the Halloween genes that are involved in the synthesis of ecdysteroids . Halloween genes belong to families 315 (or sad), 302 (or disembodied; dib), and 314 or (shadow; shd). All of them were represented with 1 gene for each family in the hemipteran databases, with the exception of 2 CYP302 genes in H. halys (Figure 2A). The family CYP394 from the mitochondrial clan had 1 representative in both R. prolixus and C. lectularius, whereas CYP3221 had one representative in each one of the pentatomids. N. lugens had no representatives of these families, but it had 1 gene classified as CYP419. Interestingly, CYP404 was represented by one gene in N. lugens, C. lectularius and R. prolixus, but was absent in the pentatomids. The largest mitochondrial CYP family in the databases analyzed here was CYP301, with 3 genes detected in R. prolixus, H. halys and N. lugens, and 2 in C. lectularius and N. viridula (Figure 2A). Coincident to previous observations for triatomines, none of the sequences identified in the hemipterans analyzed herein were clustered with the CYP12 family, related to insecticide resistance [10, 15].
Like the mitochondrial clan, CYP2 contains several Halloween genes (CYP306 and CYP307 families; named phantom and spook respectively) (Figure 2B). Eight CYP2 genes were detected in N. lugens, 7 in R. prolixus, 6 in C. lectularius and H. halys, and 4 in N. viridula (Table1, Figure 2B). The evolutionary dynamic of this clan was similar to that of the Mitochondrial clan: few events, with gains slightly more prevalent than losses (Table 2). A single member of the families CYP305 and CYP307 were detected in each species database, with the exception of 2 CYP307 genes in N. lugens. For CYP15, one member was detected for each database. Two genes were classified in the CYP306 family in R. prolixus whereas one member was found in each of the other species under analysis. CYP304, previously reported in D. melanogaster and other insects, was represented by 1 gene in N. lugens, but not in the other species analyzed, suggesting a loss of this family in Heteroptera. Finally, only one sequence belonging to each of the CYP18 and CYP303 families were detected in the 4 genomes analyzed, but not in the N. viridula transcriptome (Figure 2B). In a previous work, CYP303 and CYP18 were not detected in the transcriptomes of Triatoma dimidiata and Triatoma infestans , which could suggest that these families are low or conditionally expressed in heteropterans, hindering their identification even in complete transcriptomic databases.
CYP3 and CYP4 clans are always bigger than the Mitochondrial and CYP2 clans in the insect genomes . These clans have proliferated as a result of gene duplication events, which allowed their diversification and neo-functionalization . Similar results were observed in the hemipteran species analyzed herein. The CYP3 clan has a very dynamic evolution and a significant expansion trend with 2.5 folds more gene duplications than losses (Table 2). Within the CYP3 clan, there are multiple xenobiotic metabolizing families involved in phytochemical detoxification and insecticide resistance, mainly represented by CYP6 and CYP9 families .
The number of sequences encoding CYP3 enzymes was 55 in R. prolixus, 31 in C. lectularius, 78 in H. halys and 41 in N. viridula (Table 1; Figure 2C); in N. lugens the CYP3 complement was significantly reduced (8 genes) with respect to their counterpart in heteropterans (Table 1; Figure 2C). Accordingly, many duplications were observed in all phylogenetic branches leading to Heteroptera species. In particular, R. prolixus had 30 duplications compared to only 11 in C. lectularius, which explains most of the difference between these two hematophagous species. A large number of duplications was observed in the ancestor of N. viridula and H. halys, and again in the lineage of H. halys (Figure 1, Table 2). The discrepancy between the two pentatomids could be related to differences in the broad of their diet, although the much lower number of genes and duplications, and more gene losses detected in N. viridula could be due to the low expression of some of these genes, impairing their detection in the assembled transcriptome. Genomic information of this species will be important to refine present results.
CYP6 is the most abundant CYP3 family in the herbivorous species analyzed (35.9% of the CYP3 are CYP6 in H. halys, 39% in N. viridula and 62.5% in N. lugens), and the second larger in the hematophagous (22.6% sequences in C. lectularius and 14.5% in R. prolixus). Within this family, subfamily LV seems to be exclusive for pentatomids (18 in H. halys and 11 in N. viridula). The second larger family in pentatomids, and the more abundant in both R. prolixus and C. lectularius, is CYP395 (23.1 % of the sequences in H. halys, 12.2% in N. viridula, 25.8% in C. lectularius and 25.4% in R. prolixus). However, none of the N. lugens CYP sequences was classified within this family. Subfamilies CYP395-S and CYP395-R are only represented in pentatomids, whereas subfamilies CYP395-C to CYP395-F contain only R. prolixus sequences. Families CYP3225 to CYP3231 are exclusive for the pentatomids. CYP3225 family is larger in H. halys (11 sequences) when compared to N. viridula (3 sequences). On the contrary, subfamily CYP3226 contains 2 genes from H. halys and 5 from N. viridula. CYP3227 contains 6 H. halys and 4 N. viridula sequences, whereas CYP3228 to CYP3231 are represented by one gene in both H. halys and N. viridula.
Families CYP3084 to CYP3089 and CYP3091 are absent in pentatomids. The latter is represented by 6 sequences in R. prolixus and 1 in C. lectularius and N. lugens. Families exclusive for R. prolixus are CYP3084, CYP3085, CYP3086, CYP3088 and CYP3096. Family CYP3090, which was previously described only in triatomines , seems to be ubiquitous in Heteroptera with 1 sequence in H. halys, N. viridula, R. prolixus; and C. lectularius. Finally, CYP3092 family, described for triatomines for the first time in previous works [4, 16], was also detected in pentatomids (4 sequences in R. prolixus, 7 in H. halys, 4 in N. viridula), even though subfamily CYP3092A is exclusive for R. prolixus, whereas subfamilies CYP3092D and CYP3092E were only detected in the pentatomids.
We observed that CYP9 family (belonging to CYP3 Clan) was absent in all the hemipteran databases analyzed here. In a previous work, we reported the absence of CYP9 family in triatomine species . Furthermore, a recent work analyzing the genome of the predaceous hemipteran Orius laevigatus (Anthocoridae) also reported the lack of CYP9 family . Globally, results point to the lack of CYP9 class as a common observation in hemipteran genomes. Given that CYP9 is a large family in insect genomes belonging to different Orders, and that it was associated with insecticide resistance and xenobiotic detoxification , the lack of this family in Hemiptera could be a relevant finding for evolutive and applied entomology. On the contrary, CYP395 family is one of the largest groups in heteropteran databases. Given the absence of the CYP9 family, CYP395 could have a role in detoxification in heteropterans.
CYP4 is the second most numerous CYP clan in the genomes of insects from different orders . Many genes belonging to this clan have been involved in detoxification . The analysis revealed 49, 42, 31, 12 and 24 CYP4 genes, in R. prolixus, H. halys, N. viridula, C. lectularius and N. lugens, respectively. CYP4 subfamily was significantly larger respect to the total CYP complement in N. lugens compared to H. halys and C. lectularius (Supplementary information 1). This was due to the small number of CYP3 genes detected in the planthopper. A significant reduction was also detected for CYP4 in C. lectularius compared to both N. viridula and R. prolixus (Table 1, Figure 2D, Supplementary information 1). This result reflects an expansion of the clan in R. prolixus due to 37 gene duplication events (Table 2). Our methodology cannot rule out that the expansion occurred in the ancestor of the hematophagous species with a posterior massive gene loss in C. lectularius. However, previous data comparing CYP4 complement in different triatomine species, including R. prolixus, supports the first hypothesis . A large number of duplications (23) was also detected in the ancestor of the pentatomids (Table 2). Again, the losses detected in N. viridula should be confirmed with genomic information.
CYP4 genes were classified in CYP4, CYP3222, CYP3223, CYP3224 and CYP3093 families, being 3222 to 3224 exclusive for pentatomids (Figure 2D). Besides, CYP3093 family is exclusive to R. prolixus. CYP3093 is expanded in R. prolixus (71.4% of the CYP4 clan), due to a gene bloom (Figure 2D). It was proposed that expansions of CYP genes occur in response to environmental stimuli, leading to the potential to develop insecticide resistance . Functional information on CYP40393 does not exist for R. prolixus to date, with the exception of bioinformatic molecular docking models. For several CYP43093 members, these models proposed a favorable interaction between the pyrethroid deltamethrin and the active site, suggesting a possible role in insecticide metabolism . In T. infestans CYP40393 are highly expressed in tegument, which is the first barrier to toxics; members of this family are overexpressed in resistant T. infestants populations . Altogether, the evidence allows us to hypothesize that CYP43093 bloom confers to R. prolixus the potential to acquire resistance to chemical insecticides. The bloom of CYP3093 observed in R. prolixus is not shared by other triatomine species , nor by other heteropterans (present results), suggesting that it may be a recent event in the evolution of this species.
A hundred percent of the CYP4 sequences of C. lectularius (12) belong to the CYP4 family, which is also represented in R. prolixus (14 genes, 28.5% of the clan), H. halys (30 genes, 71.4%), N. viridula (20 genes, 64.5%) and N. lugens (23 genes, 95.8%). It has been suggested a role of this family in insecticide tolerance and resistance in triatomine species, given their higher expression in T. infestans resistant to pyrethroids [4, 20]. Moreover, RNAi mediated gene-silencing of CYP4 family members led to an increased susceptibility to deltamethrin in both T. infestans  and R. prolixus .
Database searches and manual gene curation (Supplementary information 4) revealed 36 genes encoding CCEs in N. lugens, 74 in H. halys, 46 in C. lectularius and 54 in N. viridula, whereas R. prolixus genome encodes 61 genes (Table 1). It is a dynamic detoxification gene family with 150 duplications and 54 gene losses detected (Table 2, Figure 1). These numbers suggest an expansion in the superfamily compared with D. melanogaster (Supplementary information 1), which was more pronounced in some branches of the phylogeny. Interestingly, very large expansions occurred in the ancestors of pentatomids (51 duplications) and individually in R. prolixus (41) and in C. lectularius (25). Overall, this dynamic was mostly branch specific, suggesting that it could be the result of adaptation to different environmental niches. The number of gene losses is certainly overestimated due to lack of genomic data in N. viridula.
The CCE superfamily includes proteins with different functions which were classified into three functional classes [23, 24]: neuro/developmental (ND) class which, with the exception of acetylcholinesterases (Ach), lacks the catalytic activity; dietary/detoxification (DD) class, and hormone and pheromone processing (HPP) esterases. In a previous work, we showed that triatomine species lack DD class, and that most of the CCEs in these species were classified as HPP . More recently, Bailey et al  also reported the lack of DD in the predaceous hemipteran O. laevigatus.
We have classified CCE sequences of N. lugens, N. viridula and C. lectularius based on their phylogenetic relationships to CCEs of H. halys and R. prolixus previously reported [4, 13] and the analysis of characteristic conserved residues in their sequences (Supplementary information 2; Figure 3). Interestingly, DD class was only represented in N. lugens (4 genes). The lack of DD class CCEs in blood feeding and predaceous Hemiptera has been proposed as a consequence of their diet; given that these species are not exposed to plant secondary metabolites they would not need this class of CCEs [4, 17]. However, our results revealed that DD CCEs are also absent in pentatomids, suggesting its lose during the evolution of Heteroptera.
The sequences considered as HPP were defined here taking into account phylogenetic relationships and the presence of β-esterases in the groups. In agreement to previous reports on Heteroptera [4, 17] we observed an important diversification in the HPP. Forty HPP class CCEs were detected in R. prolixus, 30 in C. lectularius, 56 in H. halys, 46 in N. viridula and 16 in N. lugens. These numbers are considerably bigger than the reported for species of other Orders. The HPP clades that contain heteropteran sequences harbor large expansions, especially in pentatomids. Three CCE sequences of H. halys, 2 of N. viridula, 2 of N. lugens and 21 of C. lectularius could not be classified as β-esterases by sequence identity analysis, but they are part of the hormone and pheromone processing class according to the phylogeny (Figure 3).
The ND class encodes neuroligin, gliotactin, glutactin, and neurotactin proteins, which are not catalytic. It also encodes Ach, which is involved in neurotransmission, and is the target site of organophosphate and carbamate insecticides . ND class presented smaller species-specific expansions in the species analyzed as compared to the HPP class. In the ND class, the gene complement was significantly expanded in R. prolixus in respect to N. viridula and H. halys (Supplementary information 1). D. melanogaster and other dipteran genomes possess 1 Ach encoding-gene. A single gene encoding Ach was also found in N. lugens genome, while R. prolixus, H. halys, and N. viridula have 2 each. Remarkably, the C. lectularius genome encodes 5 Ach paralogue genes (Figure 3). All the species analyzed possess 1 gene classified as “putative neuroreceptor” (Gliotactin or clade K) within the neurodevelopmental class. N. lugens, H. halys, R. prolixus and C. lectularius have 1 gene in the neurotactin group and 1 in the “uncharacterized” or I group, whereas N. viridula transcriptome do not have any representatives in these groups. H. halys, C. lectularius and N. viridula have 1 gene in the glutactin group while R. prolixus has 2. The most numerous neurodevelopmental class is the one formed by neuroligins: 2 in N. viridula, 5 in C. lectularius, 7 in H. halys, 10 in N. lugens and a significant expansion with 13 representatives in R. prolixus (Supplementary information 1).
Altogether, our results and previous reports suggest a particular configuration of CCEs complement in Heteroptera, with a lack of DD and expansions in HPP class. In the absence of DD class, catalytic β-esterases could have a role in detoxification in Heteroptera. In this sense, the expansions observed in the HPP class may functionally counteract the absence of DD in the ability of these species to cope with toxic xenobiotics.
GST enzymes play a fundamental role in the detoxification of endogenous and xenobiotic compounds, but they also participate in hormone biosynthesis, intracellular transport and protection against oxidative stress . These enzymes can metabolize compounds by reductive dehydrochlorination or by conjugation reactions with reduced glutathione, generating soluble metabolites that are easier to eliminate. GSTs are classified in microsomal, mitochondrial and cytosolic, depending on their location in the cell; mitochondrial GSTs have not been found in insects. While 7 types of cytosolic GSTs have been recognized in mammals, insects have 4 of these families known as Omega, Sigma, Theta and Zeta. Delta and Epsilon classes, associated with insecticide resistance in Diptera, were only reported in insects to date .
Fourteen GSTs were found in both R. prolixus and C. lectularius genomes, 15 in N. viridula (15) and 30 in H. halys (Table 1). Twelve GST encoding genes were identified in the N. lugens genome, in agreement with a previous report . Although this family was less dynamic in terms of gene birth and death as compared to the other detox families, it also presented an expansion trend (Figure 1). The largest number of duplications were observed in the ancestor of the pentatomids and in H. halys, suggesting an adaptive role of this family in phytophagous heteropterans. The absence of genomic data for N. viridula may be hiding duplications in this species and leading to overestimation of the number of gene losses. The expansion of the GST family observed in the H. halys genome is significant when compared with both N. lugens and R. prolixus (Supplementary information 1).
Microsomal GSTs have not been reported to play a role in the detoxification of xenobiotics; even though they differ structurally from cytosolic GSTs, they catalyze similar reactions . We found 3 sequences in H. halys and C. lectularius, 2 in N. lugens, while R. prolixus and N. viridula databases encode for only 1 microsomal GSTs.
The hemipteran species analyzed herein present a low number of Delta (2 in H. halys, N. lugens and C. lectularius; 1 in R. prolixus, 0 in N. viridula) and Epsilon GSTs (2 in N. lugens and 0 in the heteropterans; Table 1) (Figure 4, Supplementary Figure 2). With the only exception of N. lugens and Trialeurodes vaporariorum , the absence of Epsilon GSTs is a common finding in hemipteran genomes. Conversely, other GST groups are larger in these species compared to other Orders . Particularly, Sigma class was the largest group, especially in pentatomids (Table 1). Fifty percent of the total GSTs of R. prolixus (7 out of 14), 60% in N. viridula (9 out of 15), 42.8% in C. lectularius (6 out of 14), 63.3% in H. halys (19 out of 30), and 25% in N. lugens (3 out of 12) belong to Sigma class (Table 1). These results suggest that Sigma class GSTs could be relevant for detoxification in Hemiptera, compensating the absence or reduction in other GST groups.
The Omega class has a cysteine residue in their active site, which allows the catalysis of thiol transferase and reduction reactions that are not catalyzed by the other GST classes . Among the hemipterans studied herein, H. halys possess 3 genes in this class, N. viridula has 2, whereas N. lugens, R. prolixus and C. lectularius have 1 representative each (Figure 4, Table 1).
The Theta class has been proposed to be a contributor to the detoxification of xenobiotics, due to the protective action against oxidants and dehalogenating activity . R. prolixus has 3 genes in this class, whereas C. lectularius, H. halys and N. viridula have 2 and the N. lugens genome encodes 1. Finally, 1 gene belonging to the Zeta class was detected in each of the hemipteran databases analyzed here.
Genomic clusters and evolution of detoxifying gene superfamilies
Paralogue genes that are originated by relatively recent duplications are usually organized in clusters in the genomes. Furthermore, duplicated genes that are in adjacent genomic regions could be regulated in a coordinated way. For this reason, it is possible that evolutive forces may be acting to maintain genes in a cluster organization even a long time after the duplication events that generated the cluster [28, 29].
A positive linear correlation was observed between the total number of detoxifying genes (CYP, CCE and GST families) and the detoxifying genes organized in genomic clusters (R=0.96) (Figure 5A). This tendency is maintained when each superfamily is considered separately (R=0.99 for GSTs; R=0.99 for CYPs; R=0.81 for CCEs). The detoxifying genes forming clusters represented 25% in N. lugens, 55.2 % in C. lectularius, 62.4% in R. prolixus and in 57.14% H. halys (Figure 5B). These results are in accordance with the large estimated number of gene duplications (Figure 1).
Considering the total of CYP members identified in the four species under analysis with available genomic sequence 58.1% form clusters; these percentages are 55.7 for CCEs and 26.8 for GSTs (Figure 5C). All the families within the CYP superfamily showed a tendency to form clusters, with the exception of the mitochondrial clan. CYP2 contains 48% of the genes in clusters, CYP3 67% and CYP4 66% (Figure 5D). The large percentage of genes in genomic clusters is in accordance with the large number of species specific, probably recent, duplications observed in the CYP3 and CYP4 gene families.
Most CYP3 and CYP4 genes were clustered in the heteropterans (Figure 6A); whereas most of the genes belonging to CYP2 were clustered in the hematophagous species analyzed. HPP class of CCE has 68% of genes located in clusters. This tendency is strong in Heteroptera, being very evident in C. lectularius (95% of clustered genes).
In the case of the GSTs, the highest proportion of clustered genes (close to 50%) is given by the Sigma and Omega classes (Figures 5D and 6C).
Our phylogenetic analysis of gene gains and losses revealed some interesting patterns. The large number of both gene duplications and losses suggests that the detox gene families have a very dynamic evolution, with high turnover rates resulting in many species and lineage-specific genes. This evolutionary pattern resembles that of chemosensory gene families, which similarly to the detox gene families, have a large adaptive potential . Interestingly, it has been recently suggested that some chemosensory genes are involved in detoxification in insects, probably by sequestering toxic molecules .
Among the detoxification superfamilies, the CYP superfamily was the most dynamic, with a total of 299 events, the majority of which were duplications, indicating that this family is expanding. The CYP3 and CYP4 clans are the most dynamic within the superfamily (Figure 1, Table 2). The second more dynamic superfamily was CCEs, with 204 events. The GST superfamily had less events in comparison, which is in accordance with having less genes as compared to the other superfamilies. Overall, all families had more duplications than losses, which suggests a general expansion trend.
Comparisons across species showed an expansion of detox genes in the lineage leading to the phytophagous heteropterans (Figure 1). This is especially true for superfamilies CCE and CYP. The GST superfamily had fewer events overall, but appears to be more dynamic in the phytophagous species as compared to the hematophagous species. Besides, a large expansion was observed in CYP in H. halys, which also had many duplications in the GST family. These results are in accordance with phytophagy being a newly acquired diet in the group from the predator common ancestor of all the heteropterans . A phytophagous diet represents an important adaptive challenge, as a result of the arms race between plants trying to avoid herbivory and insects trying to overcome the toxicity of chemical compounds produced by plants. The broad complement of detoxificative enzymes resulting from both ancestral and species-specific duplication in H. halys are certainly instrumental for its feeding on different plant species, facilitating its success as an invasive species. N. viridula had a very different pattern, most likely as a consequence of having only transcriptomic data available for this analysis. Due to the incompleteness of transcriptomes, gene numbers and duplications are probably underestimated, while gene losses are likely overestimated in this species. However, H. halys is a polyphagous pentatomid, when compared to N. viridula, suggesting that the observation of a broader complement of detoxificant enzymes could be not a consequence of the lack of genomic information in the latter, but a reflect of different feeding habits between the species.
In contrast with the phytophagous species, significant expansions were not observed in the ancestors of the hematophagous, which in turn had many species-specific expansions. The GST superfamily, however, seems to be shrinking among the blood-feeding heteropterans. Hematophagy evolved independently in C. lectularius and R. prolixus as their most recent common ancestor was likely a predator. Hence, it makes sense that these two species had independent evolution of their detox gene complements in response to the diet change, with many species-specific duplications and losses. R. prolixus, however, had many more duplications as compared to C. lectularius, which accounts for most of the large difference in the total number of detox genes between these two species. In C. lectularius, besides the GSTs, the CYPs also seem to be in a reduction trend.