OR gene numbers of marine mammals are significantly lower than those of terrestrial mammals
We identified a total of 12,711 intact OR genes in the whole-genome data of 22 eutherian mammals (11 marine and 11 terrestrial) and the outgroup opossum, based on protein sequence similarity and homologous relationships. Detailed results are given in Fig. 1A and Additional file 2: Table S1. The OR gene numbers in marine mammals were significantly lower than those in closely related terrestrial mammals (Fig. 1B, Mann-Whitney U test, p value = 2.84×10⁻⁶), which is consistent with previous reports. The number of OR genes in cetacean species (14 to 61) was far lower than that in terrestrial mammals (484 to 1,680). Pinnipedia and Sirenia retained relatively more OR genes, with 217 to 369 in Pinnipedia and 438 in manatee, although these numbers were still significantly lower than those of terrestrial mammals.
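The reported p value can be reproduced under one plausible reading of the test. The ranges quoted above imply that every marine count lies below every terrestrial count, so U = 0. Assuming an exact two-sided Mann-Whitney U test on 11 versus 11 species with no ties (an assumption; the test variant is not stated here), the p value is 2 divided by the number of ways to choose 11 ranks out of 22. The function name below is hypothetical.

```python
from math import comb


def exact_p_complete_separation(n_marine: int, n_terrestrial: int) -> float:
    """Two-sided exact Mann-Whitney p value for U = 0 (no overlap, no ties).

    Hypothetical helper: assumes an exact test, which the text does not state.
    """
    # Under the null hypothesis, every assignment of ranks to the marine group
    # is equally likely, so each fully separated arrangement has probability
    # 1 / C(n_marine + n_terrestrial, n_marine). The two-sided test doubles it.
    return 2 / comb(n_marine + n_terrestrial, n_marine)


p_value = exact_p_complete_separation(11, 11)
print(f"{p_value:.3g}")  # 2 / 705432
```

Under these assumptions the result is about 2.84×10⁻⁶, which matches the value reported for Fig. 1B.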
The number of OR genes varied greatly among species (Fig. 1A). Among the 23 species analyzed, the African elephant had the largest numbers of intact OR genes (1,841) and pseudogenes (2,462), each more than twice the corresponding number in its close relatives. This result is broadly consistent with previous studies [14]. The proportion of OR pseudogenes also varied greatly among species (Fig. 1A, Additional file 2: Table S1): killer whales had the highest proportion (75%) and Yangtze River dolphins the lowest (15%). As shown in Fig. 1C, no significant correlation was observed between the proportion of OR pseudogenes in a genome and its number of intact OR genes (Pearson correlation coefficient r = 0.039; p value = 0.86). Therefore, the proportion of OR pseudogenes cannot be used to predict the number of intact OR genes in a given genome. In contrast, the absolute number of OR pseudogenes was positively correlated with the number of intact genes (Fig. 1D, r = 0.874; p value = 4.99×10⁻⁸).
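To see why the two correlations lead to such different conclusions, the coefficients can be converted to t statistics. This is a sketch that assumes n = 23 paired observations (one per species) and the standard t transformation of Pearson's r. Neither the sample size used for the test nor the function name comes from the text.

```python
from math import sqrt


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


N_SPECIES = 23  # assumption: one data point per analyzed species

reported_r = {
    "pseudogene proportion vs intact genes (Fig. 1C)": 0.039,
    "pseudogene count vs intact genes (Fig. 1D)": 0.874,
}
for label, r in reported_r.items():
    print(f"{label}: t = {pearson_t(r, N_SPECIES):.2f}")
```

With 21 degrees of freedom, a t statistic near 0.18 is far from significance, whereas one above 8 is highly significant.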
The OGG numbers of marine mammals are significantly lower than those of terrestrial mammals
In this study, we obtained a total of 1,111 OGGs, of which 281 contained only one OR sequence; subsequent analyses therefore used only the 830 OGGs containing at least two sequences. We also assigned all truncated genes and pseudogenes to these 830 OGGs based on their similarity to intact OR gene sequences (see Materials and methods for details). By the definition of orthologs, all genes in an OGG are derived from the most recent common ancestor (MRCA). We therefore infer that the MRCA of the studied marine mammals and their closely related terrestrial mammals had approximately 830 intact OR genes. These genes subsequently varied in number among species because of gene gains and losses.
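The filtering step described above can be sketched as follows. The OGG names, gene identifiers, and the function name are hypothetical toy values used only to show the rule (keep OGGs with at least two sequences). The last lines check the counts reported in the text.

```python
def split_oggs(
    oggs: dict[str, list[str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Separate OGGs with two or more sequences from single-sequence OGGs."""
    kept = {name: members for name, members in oggs.items() if len(members) >= 2}
    singletons = {name: members for name, members in oggs.items() if len(members) == 1}
    return kept, singletons


# Hypothetical toy input, not data from the study.
toy_oggs = {
    "OGG_a": ["species1_OR1", "species2_OR4"],
    "OGG_b": ["species3_OR9"],
    "OGG_c": ["species1_OR2", "species2_OR5", "species3_OR7"],
}
kept, singletons = split_oggs(toy_oggs)
print(sorted(kept), sorted(singletons))

# Consistency check on the reported counts: 1,111 OGGs minus 281 singletons.
TOTAL_OGGS, SINGLETON_OGGS = 1111, 281
print(TOTAL_OGGS - SINGLETON_OGGS)
```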
As shown in Fig. 2A and 2B, most OGGs contained a small number of OR genes and pseudogenes. Among the 830 OGGs, the mean and median numbers of intact genes per OGG were 15.0 and 11, respectively, and the mean and median numbers of pseudogenes were 14.9 and 7, respectively. We also calculated the average sequence similarity among genes in the same OGG, and the majority showed 80% to 90% similarity (Fig. 2C). Similarity was relatively low in large OGGs and relatively high in small OGGs. Across all OGGs, the number of intact OR genes was positively correlated with the number of pseudogenes (Fig. 2D, r = 0.886, p value < 2.2×10⁻¹⁶); that is, OGGs with more intact genes also had more pseudogenes. Although the OR gene sequences within an OGG were relatively conserved, the OR gene number was highly variable among species. To investigate these differences, we compared the standard deviation of OR gene numbers among species with the total number of OR genes in each OGG and found that they were significantly positively correlated (Fig. 2E, p value < 2.2×10⁻¹⁶). In other words, smaller OGGs showed smaller differences among species, which suggests that large OGGs tend to be subject to an extreme form of birth-and-death evolution [20, 23]. This pattern is common in gene family evolution and is mainly caused by tandem gene duplication [24].
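The comparison between OGG size and among-species variation can be illustrated with a minimal sketch. The per-species counts below are hypothetical toy values, and the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import pstdev


def ogg_size_and_spread(counts_by_species: dict[str, int]) -> tuple[int, float]:
    """Return the total gene count of an OGG and its spread across species."""
    counts = list(counts_by_species.values())
    return sum(counts), pstdev(counts)


# Hypothetical toy counts of intact OR genes per species, not study data.
toy_counts = {
    "small_OGG": {"sp1": 1, "sp2": 1, "sp3": 2, "sp4": 0},
    "large_OGG": {"sp1": 20, "sp2": 2, "sp3": 43, "sp4": 0},
}
for name, counts in toy_counts.items():
    total, spread = ogg_size_and_spread(counts)
    print(f"{name}: total = {total}, SD = {spread:.2f}")
```

In this toy case the larger OGG also has the larger standard deviation, which is the direction of the relationship reported for Fig. 2E.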
Simionato et al. [25] reported that gene family size does not generally reflect the evolutionary diversity of a gene family, as in the tyrosine kinase and basic helix-loop-helix families [25-27]. We therefore examined whether the difference in OR gene numbers between marine and terrestrial mammals resulted from gene-specific duplications or from a larger number of gene gains and losses. As shown in Fig. 2F, we compared the numbers of OGGs among the 23 species. The number of OGGs also varied greatly among species, ranging from 13 to 541, and differed significantly between marine and terrestrial mammals (Mann-Whitney U test, p value = 5.53×10⁻⁵). We then selected the 20 largest OGGs and found large numbers of species-specific duplications in them. For instance, OGG2-2, OGG2-5 and OGG2-10 each had more than 30 members in elephant, as did OGG2-17 in cape golden mole.
OR genes experienced gains and losses under weaker evolutionary constraints
Some OGGs are very large, indicating that some ancestral OR genes underwent large numbers of duplications in certain mammals (Fig. 3A, B). OGG2-1 contained the largest number of intact OR genes (128), particularly in opossum, cape golden mole and elephant (>20 intact OR genes), and OGG2-2 was the second largest OGG, with 119 intact OR genes, of which elephant had the most (43). The phylogenetic analysis indicated that these large OGGs originated from a large number of independent gene gains and losses among different species (Fig. 3C, D). Comparing the distribution of marine and terrestrial mammals across OGGs, we found that ancestral OR genes were lost in different marine lineages. For example, in OGG2-1, cetaceans lost two of their four ancestral genes, and only one of the remaining two ancestral genes was retained by different cetacean species. One gene was also lost in the ancestor of Pinnipedia (Fig. 3C). For OGG2-2 in the cetacean lineage, only the minke whale retained an intact OR gene, while all genes were lost in the other species; moreover, two of these OR genes were lost in the ancestor of Pinnipedia (Fig. 3D). Additionally, OGG2-5 contained the largest number of pseudogenes (171).
Taking the phylogenetic relationships among species into account, we calculated the species-specific gain and loss rates for each OGG in each species; these rates represent the extent of branch-specific gene gains or losses in the 23 mammals. The results indicate that gene gains were frequent in elephant and opossum, whereas genes were often lost in marine lineages, especially in cetaceans [13].
Then, we used the maximum likelihood method in PAML 4.9 to estimate the ratio of nonsynonymous to synonymous substitution rates (ω) for each OGG. This value reflects the strength of purifying selection. The ω values of Class I genes were significantly smaller than those of Class II genes (Fig. 4A, p value < 6.3×10⁻¹²), indicating that Class II genes are more dynamic than Class I genes during evolution. As shown in Fig. 4B, the estimated ω values did not differ significantly between OGGs containing marine mammal genes and OGGs without them. The estimated ω values of OGGs containing genes from the three marine lineages were also indistinguishable from those of the other OGGs or of all OGGs (Additional file 1: Figure S1). This finding may be due to the small number of marine OR genes, whose signal was easily overwhelmed by the background branching noise of the OGGs. The estimated ω value of each OGG was positively correlated with its number of intact OR genes (r = 0.129, p value = 1.24×10⁻³) (Fig. 4C), with its number of gene gains (r = 0.221, p value = 2.51×10⁻⁸) (Fig. 4D), and with its number of gene losses (r = 0.253, p value = 1.56×10⁻¹⁰) (Fig. 4E). These analyses suggest that OGGs that have undergone more gene gains or losses are often under weaker evolutionary constraints.
OR genes in marine mammals are not evolutionarily conserved
Among the 830 OGGs, we did not find any OGG containing OR genes from all 23 mammals, indicating that the OR genes have diversified between marine and terrestrial mammals, which may be related to differences in their environments. We also failed to find any OGG containing genes from all species of the three marine lineages. We found two OGGs, OGG1-22 and OGG1-23, that were present as a single copy in each species that retained them, although each had been lost in one or more species. As shown in Fig. 5A, OGG1-22 was lost in the Weddell seal and pig but was present as a single copy in the other species, and no pseudogenes were found in this OGG. However, the phylogeny of this OGG did not have a topology similar to that of the species tree, indicating that this gene was not strongly conserved during evolution. As shown in Fig. 5B, OGG1-23 lacked minke whale and sperm whale genes and was present as a single copy in the other species, with two opossum pseudogenes. The phylogenetic analysis revealed that the members of this OGG had a topology similar to that of the species tree (Fig. 5C), indicating that the genes in this OGG were truly orthologous among species and that no gene gain or loss events occurred during evolution. In the other OGGs, different degrees of gene gains and losses occurred. No orthologous OR gene, including those in the above two OGGs, was found in all marine mammals, indicating that OR genes degraded or were lost in different ways in different lineages.
Marine mammals show a lower rate of gene gains but a higher rate of gene losses than terrestrial mammals
We estimated the gain and loss rates of the 830 OGGs on each branch during the evolution of marine mammals and their closely related terrestrial mammals. Consistent with previous studies [14, 16], large numbers of gains and losses occurred on different branches (Fig. 6). Thus, two species with similar numbers of OGGs or genes may still have very different OR repertoires. For example, bowhead whale and minke whale each had approximately 50 OGGs, but only 19 OGGs were shared (<40%). By comparison, cape golden mole and cape elephant shrew shared 327 OGGs, which accounted for approximately 76% of all OGGs. Additionally, each of the 23 mammals lost hundreds of the intact OR genes present in the MRCAs, and the number of genes lost in the three marine lineages was greater than that lost in the terrestrial lineages (Fig. 6, Mann-Whitney U test, p value = 5.56×10⁻⁵). The cetacean MRCA had 99 intact OR genes, approximately 88% of which were lost; the Carnivora MRCA had 397, approximately 52% of which were lost; and the Pinnipedia MRCA had 282, approximately 66% of which were lost.
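The percentages above translate into approximate gene counts as sketched below. Because the reported percentages are rounded, the derived counts are only approximate. The function name is hypothetical, and the denominator of 50 for the whale comparison is an assumption based on the "approximately 50 OGGs" stated in the text.

```python
def approx_lost_and_retained(mrca_genes: int, fraction_lost: float) -> tuple[int, int]:
    """Approximate numbers of lost and retained genes from a rounded fraction."""
    lost = round(mrca_genes * fraction_lost)
    return lost, mrca_genes - lost


# Values as reported in the text: (intact OR genes in MRCA, fraction lost).
reported_mrca = {
    "Cetacea": (99, 0.88),
    "Carnivora": (397, 0.52),
    "Pinnipedia": (282, 0.66),
}
for clade, (n_genes, fraction) in reported_mrca.items():
    lost, retained = approx_lost_and_retained(n_genes, fraction)
    print(f"{clade}: about {lost} lost, about {retained} retained")

# Shared OGGs between bowhead whale and minke whale, assuming about 50 each.
shared_fraction = 19 / 50
print(f"shared fraction = {shared_fraction:.0%}")
```

On these figures the cetacean MRCA repertoire shrinks to roughly a dozen genes, and 19 shared OGGs out of about 50 gives 38%, in line with the "<40%" stated above.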
We also estimated the gain (β) and loss (δ) rates of the OR genes in each species. β and δ were defined as the numbers of gene gains and losses, respectively, per million years (MY), and both were assumed to be constant along each branch. We calculated β and δ for each species using the method of Niimura et al. [14] (Fig. 7). β was largest in African elephants, which is consistent with the results of Niimura et al. [14], whereas almost no gene gains occurred on the cetacean branches (β between 0 and 0.0002). The β values of marine mammals were significantly lower than those of terrestrial mammals (Mann-Whitney U test, p value = 8.86×10⁻⁵), and the δ values of marine mammals were significantly higher than those of terrestrial mammals (Mann-Whitney U test, p value = 5.46×10⁻⁵). Across marine mammals and their related species, the average β and δ were 0.0016 and 0.0088 per gene per MY, respectively (Fig. 7). The former matches the average rate of gene family size change (turnover, including gain and loss) previously reported for mammalian genes, 0.0016 per gene per MY [28], whereas the latter is much larger. Comparing the two groups, the mean β and δ in terrestrial mammals were 0.0032 and 0.0030 per gene per MY, respectively (Fig. 7), both higher than 0.0016, whereas the mean β and δ in marine mammals were 0.0004 and 0.0169 per gene per MY, respectively: the gain rate was considerably lower than the average, and the loss rate was approximately 10 times higher.
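The comparison with the reference turnover rate can be made explicit with simple ratios. This sketch uses only the mean rates reported above and assumes that all of them share the same unit (per gene per million years). The function and variable names are hypothetical.

```python
REFERENCE_TURNOVER = 0.0016  # reported mammalian average, per gene per MY [28]

# Mean gain (beta) and loss (delta) rates as reported in the text.
mean_rates = {
    "all lineages": {"gain": 0.0016, "loss": 0.0088},
    "terrestrial": {"gain": 0.0032, "loss": 0.0030},
    "marine": {"gain": 0.0004, "loss": 0.0169},
}


def fold_vs_reference(rate: float, reference: float = REFERENCE_TURNOVER) -> float:
    """Express a rate as a multiple of the reference turnover rate."""
    return rate / reference


for group, rates in mean_rates.items():
    folds = {kind: round(fold_vs_reference(value), 2) for kind, value in rates.items()}
    print(group, folds)
```

The marine loss rate comes out at roughly 10.6 times the reference, in line with the "approximately 10 times" stated above, while the marine gain rate is one quarter of it.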