Protein-coding genes experience a myriad of selection pressures. Deleterious mutations are expected to be purged by purifying selection, whereas mutations that enhance fitness can become fixed and be traced through evolutionary signatures of positive selection (Sémon and Wolfe, 2007; Conant and Wolfe, 2008; Studer and Robinson-Rechavi, 2009). Characterizing the chorionic gene defective chorion-1 (dec-1) in Anastrepha and comparing it with other Tephritidae and Diptera allowed us to investigate which forces have shaped its evolution at different evolutionary scales and to identify conserved gene regions, as well as regions possibly subjected to positive selection.
Structural analysis of the dec-1 gene revealed six exons and three isoforms in Anastrepha fraterculus and A. obliqua, a pattern similar to that described for Drosophilidae such as D. melanogaster, D. yakuba, and D. virilis. Transcriptomic and sequencing data from laboratory samples of A. fraterculus and A. obliqua provided a more detailed characterization of dec-1, improving our understanding of gene boundaries and expressed isoforms in this genus. Previous studies have shown that dec-1 is structurally conserved, and transgenic studies have indicated that it is functionally interchangeable across distantly related Drosophilidae species with regard to fertility and wild-type eggshell morphology (Badciong et al., 2001). Here, we show that despite amino acid sequence divergence, especially between more distantly related groups such as Drosophilidae and Tephritidae, there is evidence that protein function has been preserved throughout the evolution of Cyclorrhapha (Badciong et al., 2001). This is supported by evidence of purifying selection acting along the entire gene at different evolutionary levels. Despite this overall selective constraint, which is particularly evident in portions of the gene encoding certain proproteins, occasional episodes of positive selection point to dynamic evolutionary processes, shedding new light on the adaptive mechanisms of dec-1.
The overall pattern detected by global tests of selection on dec-1 across different evolutionary hierarchies indicates that purifying selection seems to have been the main force acting on its five protein-coding exons throughout most of its history, although there is evidence of positive selection in some portions of the gene and at different points in time. This was shown by the PAML analyses as well as by the global BUSTED test, which indicate that synonymous substitution rates are, overall, higher than nonsynonymous rates (dN/dS < 1) and that most sites are evolving under purifying selection, although some sites appear to be evolving neutrally. Considering the functional role of dec-1, its isoforms, and their protein derivatives in oogenesis (Noguerón et al., 2000), particularly in the formation of the chorion and vitelline membrane (Bauer and Waring, 1987), a prevalence of purifying selection across its length and history is to be expected. This becomes even more apparent when we consider that mutations in exon 3, which may affect all three isoforms, can lead to sterility or egg anomalies (Waring et al., 1990; Kim et al., 2002; Spangenberg and Waring, 2007), underscoring the biological significance of the products of this gene. Nonetheless, the MEME results, as well as the comparison between models M7 and M8 in PAML, suggest that some sites are evolving under positive selection.
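The M7 versus M8 comparison is a likelihood ratio test between nested site models. As a minimal sketch (not PAML's own code), assuming the two maximized log-likelihoods have already been read from the codeml output, the statistic 2ΔlnL can be compared with a chi-square distribution with 2 degrees of freedom, because M8 adds two parameters to M7 (the proportion of sites in the extra class and that class's ω). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_m7_vs_m8(lnl_m7: float, lnl_m8: float) -> tuple[float, float]:
    """Likelihood ratio test of M7 (beta) against M8 (beta & omega).

    Returns the statistic 2 * (lnL_M8 - lnL_M7) and its p-value under a
    chi-square distribution with 2 degrees of freedom.
    """
    # M7 is nested in M8, so the statistic should be >= 0; small negative
    # values caused by optimization noise are clipped to 0.
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    # For 2 df the chi-square survival function has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_m7_vs_m8(lnl_m7=-10250.4, lnl_m8=-10243.1)
print(f"2dlnL = {stat:.2f}, p = {p:.2e}")
```

A significant result only indicates that a class of sites with ω > 1 improves the fit; identifying which codons belong to that class is the job of the BEB step.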
An investigation of variation in evolutionary rates across the gene using the Bayes Empirical Bayes (BEB) approach in PAML (Yang, 1997), along with various tests conducted in HyPhy (Kosakovsky Pond et al., 2019), revealed that even though most codons are evolving under purifying selection or neutrally, some codons have experienced positive selection or episodic diversifying selection. A better understanding of the potential impact of changes across dec-1 should consider the different proproteins and derivatives produced by the three isoforms, which are normally found at different developmental times and may have different functions. For instance, fc106 is much more abundant than fc125 and fc177 and is found in oogenesis stages 9 and 10, whereas fc177 expression increases in stages 11 and 12 (Hawley and Waring, 1988; Mauzy-Melitz and Waring, 2003). The fc106 product is cleaved into the s25 and s80 proproteins during late stage 10, and s80 is later cleaved into s60 and s20. Whereas s25, s60, and s85 are deposited in the egg layers, s20 does not seem to be a structural component of the eggshell, although it is detected in vesicles within the oocyte (Spangenberg and Waring, 2007). This may be one reason why s20 appears to be more conserved than the other dec-1 proproteins.
BEB analyses (Fig. 3) suggest that, among more distantly related taxa, evidence of positive selection is restricted to the beginning of the s25 proprotein, corresponding to exon 2, and to the middle portion of exon 3, which putatively encodes the s60 proprotein but is also part of the s95 proprotein. In the more closely related “Anastrepha dataset”, significant positively selected sites were detected only in the putative s20 proprotein. FUBAR and MEME produced similar results, generally pointing to the same sites under selection. TreeSAAP revealed a similar pattern, identifying the same codons, 523 and 656, in the “Anastrepha dataset” and codon 1314 in the “Cyclorrhapha dataset”, among other codons showing evidence of positive selection.
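Because these conclusions rest on agreement among BEB, FUBAR, and MEME, one simple way to tabulate that agreement is to flag codons per method and keep those supported by at least two methods. The sketch below assumes the per-site statistics have already been exported from PAML and HyPhy into Python dictionaries; the cutoffs (BEB posterior ≥ 0.95, FUBAR posterior ≥ 0.9, MEME p ≤ 0.1) are commonly used values but are assumptions here, and the codon numbers and values in the usage lines are hypothetical.

```python
# Assumed cutoffs; adjust to the settings actually used in a given analysis.
BEB_MIN_POSTERIOR = 0.95    # PAML BEB: posterior probability that omega > 1
FUBAR_MIN_POSTERIOR = 0.90  # FUBAR: posterior probability that dN > dS
MEME_MAX_P = 0.10           # MEME: likelihood ratio test p-value


def significant_sites(beb, fubar, meme):
    """Return the set of significant codons for each method.

    Each argument maps a codon position (int) to that method's statistic.
    """
    return {
        "BEB": {c for c, pp in beb.items() if pp >= BEB_MIN_POSTERIOR},
        "FUBAR": {c for c, pp in fubar.items() if pp >= FUBAR_MIN_POSTERIOR},
        "MEME": {c for c, pv in meme.items() if pv <= MEME_MAX_P},
    }


def supported_by(sites_by_method, min_methods=2):
    """Sorted list of codons flagged by at least `min_methods` methods."""
    counts = {}
    for sites in sites_by_method.values():
        for codon in sites:
            counts[codon] = counts.get(codon, 0) + 1
    return sorted(c for c, n in counts.items() if n >= min_methods)


# Hypothetical per-site values, for illustration only.
sites = significant_sites(
    beb={101: 0.97, 202: 0.96, 303: 0.55},
    fubar={101: 0.93, 202: 0.88, 303: 0.91},
    meme={101: 0.04, 202: 0.08, 303: 0.30},
)
print(supported_by(sites))
```

Requiring agreement between methods trades sensitivity for robustness, since BEB and FUBAR target pervasive selection whereas MEME also detects episodic selection at a site.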
Physicochemical changes in codons 523 and 656 could affect all three dec-1 isoforms, since they all share sequences from exon 3, but each derivative proprotein would be affected differently, because only the s20 derivative (produced from FC106), s85, and the initial region of the s95 derivative (FC125) retain these domains. In contrast, codon 1314 (V) in the “Cyclorrhapha dataset”, located at the end of exon 5 (equivalent to exon 4 in D. melanogaster), affects the C-terminal region of DEC-1, which is part of the s80 derivative and of the s60 and s85 proproteins. Changes in the s20 derivative could disrupt the organization of early protein interactions in the egg assembly process (Badciong et al., 2001) and induce female sterility, similar to alterations in the s95 derivative (Mauzy-Melitz and Waring, 2003). Furthermore, changes in the function of the s85 derivative may destabilize the vitelline membrane (Waring et al., 1990) and the tripartite endochorion by preventing the proper aggregation of chorion proteins into the pillars, floor, and reticulated roof (Mauzy-Melitz and Waring, 2003). Overall, the identified changes occur in regions where they might compromise egg formation and viability (Hawley and Waring, 1988).
Different rates of synonymous and nonsynonymous change potentially indicate regions subject to different selection pressures, but whether these changes affect fitness remains unknown. Another way to look at these changes is to evaluate whether inferred amino acid substitutions produce radical physicochemical changes, which are more likely to affect fitness and are therefore more likely to be selected against (Smith, 2003), particularly in large populations (Weber and Whelan, 2019). In general, regions under purifying selection are less likely to accumulate radical amino acid changes, since such changes are more likely to alter protein structure and, potentially, function, whereas positive selection might favor radical changes. Therefore, in conserved regions, radical changes are expected to occur mostly on terminal branches of the tree, indicating that they are evolutionarily recent, whereas conservative changes should be as likely to occur on internal branches as on terminal ones (Pupko et al., 2003).
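This expectation can be made concrete by classifying each inferred substitution as radical or conservative and tallying where it falls on the tree. The sketch below assumes a simple charge-based classification (K, R, H positive; D, E negative; all others neutral), which is only one of several schemes in use (polarity- or volume-based ones are alternatives), and assumes substitutions have already been mapped to terminal or internal branches, for example by ancestral sequence reconstruction. The input format, function names, and example substitutions are hypothetical.

```python
from collections import Counter

# Assumed charge classes; other physicochemical schemes would give other results.
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def charge_class(aa: str) -> str:
    """Charge class of a one-letter amino acid code."""
    aa = aa.upper()
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    return "neutral"


def is_radical(ancestral: str, derived: str) -> bool:
    """A substitution counts as radical here if it changes the charge class."""
    return charge_class(ancestral) != charge_class(derived)


def tally(substitutions):
    """Count radical and conservative changes on terminal vs internal branches.

    `substitutions` is an iterable of (is_terminal, ancestral_aa, derived_aa).
    """
    counts = Counter()
    for is_terminal, anc, der in substitutions:
        branch = "terminal" if is_terminal else "internal"
        kind = "radical" if is_radical(anc, der) else "conservative"
        counts[(branch, kind)] += 1
    return counts


def terminal_fraction(counts, kind):
    """Fraction of changes of the given kind that fall on terminal branches."""
    terminal = counts[("terminal", kind)]
    total = terminal + counts[("internal", kind)]
    return terminal / total if total else float("nan")


# Hypothetical substitutions, for illustration only.
subs = [(True, "K", "E"), (True, "D", "R"), (False, "L", "I"),
        (True, "A", "S"), (False, "V", "A"), (False, "K", "Q")]
counts = tally(subs)
print(terminal_fraction(counts, "radical"), terminal_fraction(counts, "conservative"))
```

Under the reasoning above, a conserved region would show a higher terminal fraction for radical than for conservative changes; a formal comparison would still need a statistical test on the counts.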
Overall, the dec-1 results indicate that radical changes are more likely to occur on terminal branches than on internal branches, and this finding is consistent across the whole gene. The one exception we observed involves the region encoding the s20 proprotein: several amino acid changes on internal branches are segregating in the “Anastrepha dataset” (but not in the region encoding s60), whereas this pattern is not observed when more distantly related taxa are compared. To explain these differences, we should consider the impact of selection on the underlying genetic variation. Positive selection may sweep away other variation in the population, but the speed at which this happens depends on whether the sweep is hard or soft, and the two are not always easy to distinguish (Schneider et al., 2021). In contrast, purifying or background selection does not remove neutral variation, or even deleterious mutations, as quickly as positive selection does, and its effects show greater variance (Cvijović et al., 2018). We would therefore expect regions subject to positive selection to lack segregating radical changes among closely related specimens but to show higher rates of evolution among more distantly related ones. In regions under purifying selection, however, segregating variation would still be visible among closely related species, but it would not translate into higher rates of evolution among more distantly related taxa because it would not be favored by selection.
Another important aspect to consider when using rates of synonymous and nonsynonymous change to investigate selection in recently diverged lineages is that dN/dS is a measure of selective pressure that contrasts rates of fixed substitutions between two homologous sequences (Kimura, 1977). As such, it was designed for analyzing divergent lineages; when applied to closely related sequences, it can be affected by the stochasticity of the mutation/selection process. Comparisons between more closely related taxa, generally expressed as Ka/Ks, may also be influenced by factors not directly associated with selection intensity, such as population expansion or contraction, and by other evolutionary forces that may even affect neighboring regions (Kryazhimskiy and Plotkin, 2008). Therefore, it is important to contrast patterns of evolution using evolutionary models that can distinguish evidence of selection from its relaxation. In this framework, the RELAX analysis indicated intensified selection in clade 2 of the “Anastrepha dataset” (K = 2.03), whereas no clade in the “Cyclorrhapha dataset” showed evidence of intensified or relaxed selection. Intensified selection can push ω values away from neutrality (ω = 1), whereas relaxed selection allows neutral or slightly deleterious variants to accumulate, promoting genetic diversity within populations. Such relaxation can in turn influence differentiation by providing raw material for adaptation to new environments and conditions (Templeton, 2008).
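In RELAX, the selection intensity parameter K acts as an exponent on the ω values of the reference branches (ω_test = ω_reference^K), which is why K > 1 indicates intensification and K < 1 indicates relaxation. The sketch below illustrates only this transformation, using the K = 2.03 reported for clade 2; it does not reproduce the model fitting performed in HyPhy, and the reference ω classes are hypothetical.

```python
import math


def relax_transform(reference_omegas, k):
    """Test-branch omegas obtained by raising each reference omega to the power k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [w ** k for w in reference_omegas]


def distance_from_neutrality(omega):
    """|ln(omega)|: zero at omega = 1 and larger the farther omega is from 1."""
    return abs(math.log(omega))


reference = [0.1, 0.6, 1.0, 3.0]  # hypothetical reference omega classes
test = relax_transform(reference, k=2.03)

for w_ref, w_test in zip(reference, test):
    print(f"omega_R = {w_ref:.2f} -> omega_T = {w_test:.3f}; "
          f"distance {distance_from_neutrality(w_ref):.2f} -> "
          f"{distance_from_neutrality(w_test):.2f}")
```

Because ln(ω^K) = K·ln(ω), every ω class moves K times farther from neutrality on the log scale: purifying classes become more constrained, diversifying classes more extreme, and ω = 1 is unchanged.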
Global analyses of selection detected potential sites experiencing positive and/or episodic diversifying selection, but because ω averaged over all sites and lineages rarely exceeds 1, tests that target specific lineages within a phylogenetic framework may be more effective at detecting positive selection (Álvarez-Carretero et al., 2023). Branch-site tests in PAML identified specific branches with evidence of positive selection, such as the lineage separating A. obliqua and A. fraterculus in the “Anastrepha dataset” and all three tested branches in the “Cyclorrhapha dataset”. In all cases, a few lineages show evidence of episodic positive selection against a background of purifying selection.
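The branch-site test compares a null model in which ω on the foreground branch is fixed at 1 with an alternative in which it is estimated (ω ≥ 1). As a sketch, assuming both log-likelihoods are taken from the codeml output, the p-value can be computed either from a chi-square distribution with 1 degree of freedom (the more conservative choice) or from a 50:50 mixture of a point mass at 0 and that chi-square distribution. The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_df1_sf(x: float) -> float:
    """P(X > x) for X ~ chi-square(1), using P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> dict:
    """LRT for a foreground branch: null fixes omega = 1, alternative estimates it."""
    # Nested models: clip small negative values from optimization noise to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_chi2 = chi2_df1_sf(stat)
    # Under the 50:50 mixture of a point mass at 0 and chi-square(1),
    # the tail probability for stat > 0 is half that of chi-square(1).
    p_mixture = 0.5 * p_chi2 if stat > 0 else 1.0
    return {"stat": stat, "p_chi2_df1": p_chi2, "p_mixture": p_mixture}


# Hypothetical log-likelihoods, for illustration only.
print(branch_site_lrt(lnl_null=-8421.7, lnl_alt=-8418.9))
```

When several foreground branches are tested on the same data, as with the three branches in the “Cyclorrhapha dataset”, the resulting p-values are usually adjusted for multiple testing before interpretation.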
The pattern of purifying selection punctuated by a few events of positive or diversifying selection identified in dec-1 is also observed in genes directly or indirectly related to reproduction and sex determination, such as transformer (tra) and fruitless (fru) in Drosophila (McAllister and McVean, 2000; Kulathinal et al., 2003; Parker et al., 2014) and even in Anastrepha (Sobrinho and de Brito, 2010; Sobrinho and de Brito, 2012). More importantly, this trend has also been observed for several choriogenic genes investigated in Anastrepha, such as Cp15, Cp16, Cp19, and Cp38, whereas vitelline membrane genes in general appear to have evolved only under purifying selection, with the possible exception of a recently duplicated gene (Vm26Ab and Vm26Aa’) in the genus Anastrepha (Gonçalves et al., 2013).
A similar pattern, with vitelline membrane genes under purifying selection and chorionic genes under positive selection, has also been described for Drosophila. It has been attributed to the fact that the vitelline membrane lies internal to the chorion and is therefore more subject to purifying selection, whereas chorionic proteins may play important roles in desiccation resistance and oxygen transport that could be subject to positive selection depending on environmental cues (Jagadeeshan and Singh, 2007). Notably, portions of dec-1 associated with s25, which is expressed earlier in egg development, seem to be more conserved than portions related to the s60 proprotein, which is produced later and seems to be deposited in outer chorionic layers. However, most of the positively selected sites detected on the branches that define Tephritidae and the genus Anastrepha are associated with the s25 proprotein, which might be relevant given the great variation in chorion morphology across Anastrepha species (Selivon and Perondini, 1998; Selivon et al., 2004; Figueiredo et al., 2013).
Despite the general pattern of purifying selection, the evidence of positive selection on dec-1 in Tephritidae, particularly among Anastrepha species, is relevant because inner chorion patterns are far more diverse among Anastrepha species than in other Tephritidae (Murillo and Jirón, 1994; Selivon and Perondini, 1998; Dutra et al., 2011; Figueiredo et al., 2017). Although rapidly evolving genes have generally been associated with sexually selected traits or sexual conflict across different taxa (Swanson and Vacquier, 2002; Clark et al., 2006; Panhuis and Swanson, 2006; Turner and Hoekstra, 2006), adaptive responses to environmental cues can also play an important role (Jagadeeshan and Singh, 2007).
It is important to emphasize that the phylogenetic inferences derived from dec-1 are not meant to elucidate relationships among the taxa studied here, which are generally well established, at least at higher levels. Rather, they provide a framework for investigating molecular evolutionary patterns. Nonetheless, the relationships recovered among the more divergent taxa generally agree with the literature, including a closer relationship between Anastrepha and Rhagoletis, although our tree placed Ceratitis at the base of Tephritidae rather than in a clade with Bactrocera and Zeugodacus (Virgilio et al., 2015), which is the more widely accepted relationship (Mengual et al., 2017).
The overall phylogenetic pattern, with short internal branches even between more distant groups, provides a crucial backdrop for understanding the broader evolutionary context of dec-1 and its potential functional implications, especially for closely related lineages such as the Anastrepha specimens analyzed here. Our results indicate that the A. fraterculus samples we investigated appear to fall into three distinct lineages. A. fraterculus has been suggested to be a species complex with several cryptic species, at least three of which occur in Brazil and can be identified with morphometric methods (Hernández-Ortiz et al., 2012; Hernández-Ortiz et al., 2015). Although we did not investigate morphotypes in this study, which would have allowed us to compare our samples with the taxa described elsewhere (Hernández-Ortiz et al., 2015; Prezotto et al., 2019), the topology we found is remarkably similar to that obtained with transcriptomic and genomic data (Congrains et al., 2021; Congrains et al., 2023), which identified three distinct branches for A. fraterculus while supporting a single A. obliqua branch. These A. fraterculus lineages may be subject to distinct selective pressures (Smith-Caldas et al., 2001; Manni et al., 2015) despite their recent divergence. This phylogenetic context sets the stage for understanding the evolutionary dynamics of dec-1 within specific Anastrepha lineages, although evidence of intraspecific gene flow among these lineages (Díaz et al., 2018; Congrains et al., 2023) may limit our ability to investigate them adequately.
Shifting focus to broader implications, the search for mutations fixed by positive selection can help identify novel functions and differentiate species (Sémon and Wolfe, 2007; Conant and Wolfe, 2008). For a structural gene involved in egg and chorion development, such changes and any adaptive advantages they confer may not be readily discernible, especially given the complex interactions among genotype, phenotype, and environment. Therefore, selection analyses that reveal how reproductive genes are shaped by selective forces make it possible to identify this genotypic, and potentially phenotypic, variation in the genome. Furthermore, these genes can serve as markers for species differentiation amid phylogenetic uncertainty. Remarkably, dec-1 provides adequate phylogenetic signal at different evolutionary levels across Tephritidae, and it also seems to distinguish among lineages of A. fraterculus, for which only a handful of informative markers exist despite the plethora of genes investigated (Congrains et al., 2021; Congrains et al., 2023). Understanding its structure, isoforms, and evolutionary patterns across broader sampling also sets the stage for establishing dec-1 as a viable marker for studying variation across Anastrepha. It also provides insight into reproductive biology that could foster the development of genetically modified females carrying transgenes, similar to what has been done in Drosophila, which could be important for population management, particularly of pest insects in the family Tephritidae.