The circadian system generates and maintains endogenous rhythms synchronised with the daily fluctuation of light and is observed in a wide range of life forms [2, 3]. We identified and characterized the molecular evolution of genes from the core circadian system in western Iberian chubs, freshwater fish species of genus Squalius that inhabit different river basins under two climatic types (Atlantic and Mediterranean), with differences in light and temperature.
Identification of paralogs and orthologs of all circadian gene families in Squalius
We were able to identify sixteen genes (Table S1) belonging to the four known core-circadian gene families (CRY, PER, CLOCK and BMAL), which are orthologous to genes described in D. rerio and other fish species [6–10].
Phylogenetic relationships within each gene family revealed that the history of paralogous genes is conserved in Squalius species, which supports the correct identification of circadian orthologs and paralogs [6]. Moreover, in agreement with results from Danio rerio and other fish species [21, 22, 40, 41] (summarised in Table S1), we detected possible functional diversification of the circadian genes that may have arisen to optimize important biological processes synchronised with the circadian oscillation. We found that in Squalius the set of predicted protein-protein interactions of CRY, PER and BMAL (Table S3) involves circadian-related proteins, but also proteins with other biological functions, namely BHLHe41 (basic helix-loop-helix family, member e41) and HSF2 (heat-shock factor 2), involved in temperature response [23, 42–44], and NFIL3-5 (nuclear factor, interleukin 3-regulated, member 5), a protein involved in immune response [45].
Evidence of positive selection is related to predicted protein function
We found evidence of positive selection mostly on cry (cry1aa, cry2 and cry3) and per (per1a, per1b, per2 and per3) genes (Tables 1 and S4-S6). These genes encode the negative elements of the circadian system, which act as repressors of transcription [46] (Fig. S1). They are sensitive to environmental stimuli (e.g. light or temperature) and refine the regulation of the circadian system (e.g. [13, 22, 23]). Moreover, within these genes we found 44 potential adaptive changes, many of which are located within functional domains of the protein, namely in CRY3, PER1A, PER1B, PER2 and PER3 (Fig. 2, Fig. S4, Table S5). In CRY3, many changes were located inside the DNA Photolyase domain, one of the light-sensitive domains of cryptochromes. In PER proteins, some adaptive substitutions were located inside PAS domains (Fig. 2, Fig. S4), which serve for dimerization with CRY proteins [38, 39]. Several changes in PER2 and PER3 are non-conservative and alter the charge of the amino acid, which can in turn alter the strength of protein-protein interactions. These changes can also affect protein structure, as certain amino acids have a propensity for specific structural arrangements [47]. Although there are more cry and per genes (10) in the negative loop than clock and bmal genes (6) in the positive loop, the proportion of sites under positive selection in the negative loop genes is larger than expected if sites were randomly distributed (Table S11). A possible explanation is that genes of the positive loop are under stronger selective constraints due to purifying selection. However, this is not fully supported by our results, which indicate more sites under negative selection in cry and per genes (Table S9). Another possibility is that, because the negative loop contains more paralogous genes, these could have evolved to respond to different environmental stimuli, which is supported by our results on predicted protein interactions (Table S3).
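The comparison against a random distribution of sites can be illustrated with a one-sided exact binomial test. This is a minimal sketch only, and it is not the analysis behind Table S11. The counts (44 changes in cry and per genes out of 55 changes in all core circadian genes) are taken from the text as illustrative inputs. The null proportion 10/16 assumes that all genes contribute equally many codons, which is a simplifying assumption; a length-weighted proportion would be more appropriate. The function and variable names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative inputs (assumptions, see the paragraph above).
n_selected_total = 55      # putatively adaptive changes in all core circadian genes
n_selected_negative = 44   # of which in cry and per genes (negative loop)
p_null = 10 / 16           # crude null: share of negative-loop genes among 16 genes

# One-sided p-value: probability of seeing at least this many negative-loop
# sites if sites were distributed at random under the assumed null proportion.
p_value = binomial_upper_tail(n_selected_negative, n_selected_total, p_null)
print(f"observed share: {n_selected_negative / n_selected_total:.2f}, "
      f"null share: {p_null:.3f}, one-sided p = {p_value:.4g}")
```

In practice the null proportion should be replaced by the fraction of analysed codons that belong to negative-loop genes, so that longer genes are not mistaken for genes enriched in selected sites.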
The fact that we find positive selection in the negative elements is in line with previous studies showing that cry and per genes are important to integrate stimuli other than light in fish (e.g. mutations in PER2 protein were important for the adaptation of blind cavefish to its environment [48]). Taken together, these results suggest that circadian system genes were involved in Iberian Squalius species adaptation, which occurred mostly by changes in the negative elements of the circadian system.
More genes under positive selection in populations under Mediterranean climate
Based on dN/dS, we found evidence of positive selection in all species using different tests (Table 1, Tables S4-S6). The site-level tests indicate that signatures of positive selection are present mostly in southern populations (S. torgalensis, S. aradensis and S. pyrenaicus from Almargem; Fig. 2 and Table S5), which are under the influence of the Mediterranean climate type. These results indicate that circadian genes may be involved in the adaptation of these species to their specific environments. The circadian system is indirectly related to the regulation of many physiological and metabolic processes that affect the response of organisms to environmental stimuli, and hence these signatures of positive selection could reflect adaptation to several environmental factors. Besides light stimuli, temperature is important for the proper maintenance of the circadian system in fish [21–23], and temperature is known to impose strong selective pressures on ectothermic species [49, 50]. Thus, present-day and past differences in temperature between the river drainage systems inhabited by Squalius species can explain our results.
There are several lines of evidence supporting the hypothesis that temperature is a key selective pressure. First, populations with more genes and sites with signatures of positive selection inhabit a region influenced by a Mediterranean climate with higher water temperatures (Fig. 2, Table S5). Second, evidence from protein analysis shows differences in predicted protein thermostability between species in different climatic types (Table S10). A higher protein thermostability can be achieved by (i) increasing the aliphatic index (AI) [34, 35], (ii) increasing the strength of ionic interactions [51], or a combination of both mechanisms. We found that, of the 55 putatively adaptive changes in the core circadian genes, 13 (in CRY3, PER1A, PER1B, PER2, PER3, BMAL2 and CLOCKB) have a potential impact on the protein AI, 15 (in CRY1AA, CRY3, PER1A, PER2, PER3, BMAL2, CLOCKA and CLOCKB) on the isoelectric point (pI), and 8 on both AI and pI (in CRY3, PER1A, PER2, BMAL2, CLOCK2 and CLOCKA), and can therefore have a direct effect on protein thermostability (Table S5). For these proteins, the analysis of predicted AI and pI showed differences between the Squalius species inhabiting the Atlantic and Mediterranean climate types (Table S10), suggesting that differences in protein thermostability can result from adaptation to different water temperatures. Moreover, approximately one third of the sites inferred to be under negative selection were on codons encoding aliphatic amino acids (Table S9), which are associated with protein thermostability.
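To make the link between a single substitution and the aliphatic index concrete, the following minimal sketch computes AI with the standard definition of Ikai (1980), AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percentage of each residue. It is an illustration under stated assumptions, not the pipeline used for Tables S5 and S10: the function names are hypothetical and the peptide is an invented toy sequence, not a Squalius protein.

```python
from collections import Counter

# Ikai (1980) coefficients: relative side-chain volume of valine (a) and of
# leucine/isoleucine (b) compared with alanine.
A_VAL = 2.9
B_ILE_LEU = 3.9


def aliphatic_index(seq: str) -> float:
    """Aliphatic index of a protein sequence given in one-letter code."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_pct(residue: str) -> float:
        # Mole percentage of one residue type in the sequence.
        return 100.0 * counts[residue] / len(seq)

    return (mole_pct("A")
            + A_VAL * mole_pct("V")
            + B_ILE_LEU * (mole_pct("I") + mole_pct("L")))


def ai_change(seq: str, position: int, new_residue: str) -> float:
    """Change in AI caused by one substitution at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    mutated = seq[:position - 1] + new_residue + seq[position:]
    return aliphatic_index(mutated) - aliphatic_index(seq)


# Toy example (hypothetical peptide): replacing a serine with a leucine
# raises AI, replacing it with a tyrosine leaves AI unchanged.
toy = "MSKAVGSELT"
print(ai_change(toy, 2, "L"), ai_change(toy, 2, "Y"))
```

Because AI depends only on the mole percentages of Ala, Val, Ile and Leu, a substitution changes AI only when it adds or removes one of these four residues, and the effect of a single change shrinks as protein length increases.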
Last, signatures of positive selection were found mostly on cry and per genes, circadian genes that have been shown to regulate temperature integration within the circadian system in fish (Table S1) [21, 22, 24, 25]. For instance, we found signatures of positive selection and/or differences in adaptive substitution rate in lineages from different environments (Table 1) in cry1ba, cry2, per1b and per3, four genes known to integrate temperature within the circadian system in D. rerio (Table S1) (see also [21]). In the PER3 protein, for example, we detected positive selection at site K429Y in the PAS domain, with an amino acid change in S. carolitertii and S. pyrenaicus from Tagus predicted to affect the isoelectric point (Fig. 2, Table S5), reflected in differences in pI between Squalius from northern and southern river basins (Table S10). Another clear example is per1b: S. torgalensis has the PER1B with the highest AI and pI (Table S10), and we found signatures of positive selection in S. torgalensis at a site associated with amino acid changes that increase AI (Fig. 2, Table S5), suggesting that selection led to increased thermostability. This protein has been shown to be important in D. rerio for the integration of temperature and light cues within the circadian system [21]. We also found signatures of positive selection in cry1aa and per1a, both of which change their expression in S. torgalensis and S. carolitertii when exposed to increased water temperature under controlled laboratory conditions [24, 25]. In PER1A, we found a mutation under positive selection in S. carolitertii in an important region of the protein (PAS domain, Fig. 2, Fig. S4, Table S5) that leads to an increase in the aliphatic index, and therefore to a predicted increase in protein thermostability. This is compatible with Jesus et al. 2017 [25], who observed a downregulation of the expression of this gene at higher temperatures in S. carolitertii.
We speculate that such an increase in protein thermostability could have been selected to allow the protein to function properly at higher temperatures at low expression levels, compensating for the costs of over-expression of other proteins.
Evidence of adaptive convergence
In four genes (cry1ba, per1a, per3 and bmal2) the best topology for the inferred gene tree is congruent with the species tree, with two main clusters (Fig. 1, Table 1, [28, 29]). This could indicate that these genes evolved neutrally during speciation. However, we found signatures of positive selection in three of them (per1a, per3 and bmal2) and differences in dN/dS ratio between the two clades in all four genes (Fig. 3a-b, Table S7), which might indicate that the evolution of these genes has been at least partially driven by natural selection, rather than exclusively by neutral divergence. In contrast, the phylogenies we inferred using both the nucleotide and protein sequences indicate that for 9 out of 16 genes (cry1aa, cry1bb, cry3, per1b, per2, clockb, clock2 and bmal1a) the best topology clusters together, with high support, S. aradensis and S. pyrenaicus from Almargem (Fig. S2 and S3). According to the species tree, these two species belong to two highly divergent lineages (Fig. 1) [28], and hence such a high proportion of genes with this clustering is unlikely to be due to neutral incomplete lineage sorting. Instead, this suggests a scenario of evolutionary convergence of populations inhabiting similar environments. In fact, Almargem and Arade are two basins in the south of Portugal influenced by the Mediterranean climate, facing very similar environmental conditions (e.g., average water temperature and photoperiod). This pattern of sequence convergence in these genes and proteins may thus be a consequence of convergent adaptation.
The convergence in these genes and proteins matches, at least partially, the criteria for detecting evolutionary convergence proposed by Dávalos et al. (2012), namely: (1) evidence from sequences of functional parts of genes; (2) a clear link between gene function and ecological conditions; and (3) evidence that selection is acting on target genes at different rates from other lineages. As we only obtained data from cDNA, we expect all the sequences to be from exons and therefore to constitute functional parts of genes, satisfying criterion 1. As previously mentioned, most genes presenting this signal of convergence belong to the CRY and PER families, and both cry and per genes have a demonstrated importance in the response to environmental temperature [21], satisfying criterion 2. Based on dN/dS tests, criterion 3 is satisfied for cry1bb, cry3, per2 and clock2, since we found increased dN/dS in the lineages with evidence of convergence (Fig. 3c, Table 1, Table S8). Furthermore, for per2 and clockb we found sites under positive selection with amino acid changes shared by S. aradensis and S. pyrenaicus from Almargem (Fig. 3c, Table S8). For cry1aa, per1b and clockb, protein analysis supports a strong similarity between the physicochemical parameters estimated for S. aradensis and S. pyrenaicus from Almargem, pointing to functional convergence at the protein level (Table S10), even though for those genes signatures of positive selection were found mostly in S. torgalensis. For bmal1a we did not find signatures of positive selection or increased dN/dS in convergent lineages, but the functional characterization of the BMAL1A protein revealed similar predicted physicochemical patterns in S. aradensis and S. pyrenaicus from Almargem, pointing to functional convergence.
Scenarios of convergence have been described for other species at the morphological level [53–55] and at the molecular level [56–61]. Moreover, light was shown to be an important determinant in two studies that detected molecular convergence in fish: (1) the evolution of albinism linked to the Oca2 gene in two independent populations of the cavefish Astyanax mexicanus [56]; and (2) the functional evolution of Rhodopsin proteins in several fish species [62]. Here, we detected adaptive convergent evolution in freshwater fish in genes and proteins related to the integration of visual and thermal stimuli within the circadian system, which belong to gene families with duplications. This raises the possibility that gene duplications are important in convergent evolution, which could be tested in future studies. Sequences from genomic DNA could be particularly informative and increase the power to investigate patterns of adaptive convergence, for instance by comparing patterns of genetic divergence at exons and introns.