VNO Sequencing, Assembly & V1R Recovery
Table 1. VNO transcriptome assembly statistics, V1R transcript recovery, and genome annotations. The mouse reference genome is shown for comparison (M. m. domesticus, top). Recovery estimates combining short and long-read datasets for M. spretus are indicated in bold.
a: GRCm38.p6; b: GenBank accession #QGOO00000000; c: SPRET_EIJ_v1.1; d: CAROLI_EIJ_v1.1; e: PAHARI_EIJ_v1.1; f: ORF, open reading frame
We characterize V1R repertoires for five Mus species of varying evolutionary distance from the house mouse (1.5–7 mya, Figure 1) by sequencing their VNO transcriptomes using short-read platforms. The final transcriptome assemblies for each species are of good quality (Table 1). We detect approximately twice as many V1Rs as are currently annotated in the genomes of M. spretus, M. caroli, and M. pahari and provide the first M. macedonicus V1R dataset (Table 1). The number of V1Rs identified in M. spicilegus is in good agreement with existing genome annotations (Table 1). For one species (M. spretus), the short-read sequencing was performed at greater depth, and an additional round of long-read sequencing was done. This allows us to examine the effectiveness of short- versus long-read sequencing for assembling large and highly duplicated gene families such as V1Rs. The total number of assembled transcripts is greater for the M. spretus short-read dataset, as expected from greater sequencing depth (Table 1).
On average, 126 V1R transcripts are recovered from each species’ short-read assembly (Table 1). A subset are transcript variants or gene duplicates, with homology to the same gene in the mouse reference genome (GRCm38.p6). Although the majority of V1Rs are single-exon genes, a substantial number contain introns and express transcript variants in the house mouse; we similarly detect transcript variants among the non-commensal species sequenced (Table 1 & Figure 2). For a conservative estimate of V1R genes, only unique transcript annotations are included (Table 1). When putative gene duplicates are added, the number of V1R genes increases markedly (Table 1). Compared to the house mouse, the 5 Mus species sequenced have smaller V1R repertoires, consistent with V1R gene expansion in the house mouse (Table 1). However, the addition of long-read sequencing for M. spretus increases the number of V1R genes detected, resulting in a repertoire size similar to that of the house mouse (Table 1). Therefore, whereas the M. spretus V1R repertoire is likely close to complete, long-read sequencing may detect additional V1Rs in M. spicilegus, M. macedonicus, M. caroli and M. pahari. Importantly, our analysis of V1R evolution in Mus is based on (1) a well-annotated mouse reference genome, (2) a comprehensive M. spretus V1R dataset, and (3) >100 V1Rs for all 6 Mus species. Therefore, small gaps in detection across the entire V1R family should not bias the broad patterns of V1R evolution reported here. Furthermore, discrepancies in species’ repertoire size appear to be largely accounted for by a house mouse-specific V1R gene expansion, discussed in further detail below.
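The difference between the conservative gene count (unique annotations only) and the count that includes putative gene duplicates can be illustrated with a minimal Python sketch. The record layout, the `kind` labels, and the toy transcripts below are hypothetical and serve only to show the counting logic; they are not values from Table 1 and do not describe the annotation pipeline used here.

```python
from collections import namedtuple

# Hypothetical record: one assembled V1R transcript and the reference gene
# it is annotated against. "kind" is an assumed label, one of:
#   "primary", "variant" (transcript variant), "duplicate" (putative gene duplicate)
Transcript = namedtuple("Transcript", ["transcript_id", "reference_gene", "kind"])


def count_v1r_genes(transcripts):
    """Return transcript count, conservative gene count, and duplicate-inclusive count."""
    # Conservative estimate: each reference annotation is counted once,
    # regardless of how many transcripts map to it.
    unique_annotations = {t.reference_gene for t in transcripts}
    # Putative gene duplicates are added on top of the unique annotations;
    # transcript variants never add to the gene count.
    n_duplicates = sum(1 for t in transcripts if t.kind == "duplicate")
    return {
        "transcripts": len(transcripts),
        "conservative": len(unique_annotations),
        "with_duplicates": len(unique_annotations) + n_duplicates,
    }


# Toy input (illustrative only).
toy = [
    Transcript("t1", "geneA", "primary"),
    Transcript("t2", "geneA", "variant"),
    Transcript("t3", "geneA", "duplicate"),
    Transcript("t4", "geneB", "primary"),
    Transcript("t5", "geneC", "primary"),
    Transcript("t6", "geneC", "variant"),
]

counts = count_v1r_genes(toy)
print(counts)
```

In this toy case six transcripts collapse to three unique annotations, and one putative duplicate raises the gene count to four.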
V1R Evolution Across Mus Species
To explore V1R evolution, we characterize which receptors share a common ancestor (i.e. are orthologous) by examining relationships within a V1R gene tree containing all six Mus species (5 sequenced species and the house mouse reference; Additional File 1). A subset of receptors for each non-commensal mouse species do not exhibit a clear orthologous relationship to any V1R annotated in the mouse reference genome and are classified as non-orthologous genes (Figure 2). Similarly, a set of receptors annotated in the mouse reference genome is not detected in any other species (Figure 2).
We classify V1Rs into three broad categories based on their orthologous relationships: (1) V1Rs present only in the mouse reference genome, (2) non-orthologous V1Rs, and (3) V1Rs with orthology across multiple species. V1Rs with orthology across multiple species are further categorized based on the number of species represented in each orthologous receptor group (orthogroup). Orthogroups with 2–3 species are classified as “low orthology,” and orthogroups with 4–6 species as “high orthology” (Figure 2A). The majority of transcripts have some evidence for orthology (88.5%, Figure 2A). Furthermore, most transcripts are highly orthologous (74.8%, Figure 2A), indicating that missing V1Rs are unlikely to bias the broad patterns identified here. Although many receptors are shared across species, approximately 25% of all V1R transcripts, and 59% of all unique V1R annotations, are either low orthology, non-orthologous, or present only in the mouse reference genome (Figure 2A). This indicates that the dramatic VR gene turnover observed among more divergent mammalian species, such as across tetrapods or between rodent species [19, 21, 23], is replicated within the genus Mus, albeit on a more limited scale.
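The classification scheme above can be expressed as a short Python sketch. The function name and the treatment of single-species groups are assumptions made for illustration: a group containing only the house mouse is labeled reference-only, and a group containing only one non-commensal species is labeled non-orthologous. The 2–3 and 4–6 species thresholds are those stated in the text.

```python
HOUSE_MOUSE = "M. m. domesticus"
ALL_SPECIES = {
    HOUSE_MOUSE,
    "M. spicilegus",
    "M. macedonicus",
    "M. spretus",
    "M. caroli",
    "M. pahari",
}


def classify_orthogroup(species_in_group):
    """Assign an orthology category from the species represented in a receptor group."""
    species = set(species_in_group)  # repeated species (duplicates, variants) count once
    if not species or not species <= ALL_SPECIES:
        raise ValueError("group must contain at least one of the six Mus species")
    n = len(species)
    if n == 1:
        # Assumption: single-species groups are either reference-only or non-orthologous.
        return "reference-only" if species == {HOUSE_MOUSE} else "non-orthologous"
    # Thresholds from the text: 2-3 species = low orthology, 4-6 species = high orthology.
    return "low orthology" if n <= 3 else "high orthology"


print(classify_orthogroup(["M. spretus", "M. caroli"]))
print(classify_orthogroup(ALL_SPECIES))
```

Counting species rather than transcripts keeps gene duplicates and transcript variants within one species from inflating the orthology category.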
We next examine the presence of gene duplicates and transcriptional variation across species (Additional File 2). A similar proportion of V1R gene duplicates is identified across all 5 species (10–16%, Figure 2B). The proportion of V1R transcript variants detected is also similar across species, with the clear exception of M. spretus (Figure 2B). As expected, the addition of long-read sequencing data for M. spretus recovers many more transcript variants than the short-read datasets (Figure 2B). Interestingly, the same number of V1R genes expressing distinct coding transcript variants is detected in M. spretus as in the house mouse (43 V1R genes, Additional File 3: Figure S1). However, the identity of V1Rs exhibiting alternative spliceforms, and the clades they belong to, vary between the two species (Additional File 3: Figure S1). In contrast, the proportion of gene duplicates detected is similar between M. spretus and the other species. This indicates that, for gene families such as V1Rs, short-read datasets are sufficient for identifying gene duplicates.
Table 2. V1R gene losses in the house mouse. Species with expression are indicated with different colors: M. spicilegus, M. macedonicus, M. spretus, M. caroli, and M. pahari.
Our characterization of V1R repertoires across Mus species allows for a reliable estimate of V1R gene loss in the house mouse. We detect evidence for 10 such V1R gene losses, distributed across six clades (Table 2 & Figure 3A, indicated in red text). All V1R genes lost in the house mouse are present in at least 3 of the 5 non-commensal Mus species examined, including close relatives (Table 2). Most gene losses have corresponding pseudogenes in the house mouse reference genome (Table 2). Gene losses appear to be relatively uncommon compared to the abundant gene gains, at least within the house mouse lineage.
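One way to screen a presence/absence table for candidate house mouse gene losses is sketched below in Python. The text reports presence in at least 3 of the 5 non-commensal species as an observed property of the lost genes; using it as a filter threshold (`min_species`) is an assumption of this sketch, and the orthogroup names and presence sets are hypothetical placeholders, not entries from Table 2.

```python
HOUSE_MOUSE = "M. m. domesticus"
NON_COMMENSAL = frozenset(
    {"M. spicilegus", "M. macedonicus", "M. spretus", "M. caroli", "M. pahari"}
)


def candidate_house_mouse_losses(presence, min_species=3):
    """Return orthogroups lacking a functional house mouse gene but detected
    in at least `min_species` non-commensal species.

    presence: dict mapping orthogroup name -> iterable of species in which
    a functional receptor is annotated or expressed.
    """
    candidates = []
    for group, species in presence.items():
        species = set(species)
        if HOUSE_MOUSE in species:
            continue  # still present in the house mouse, so not a loss
        if len(species & NON_COMMENSAL) >= min_species:
            candidates.append(group)
    return sorted(candidates)


# Hypothetical presence table (illustrative only).
toy_presence = {
    "og_1": {"M. spicilegus", "M. spretus", "M. caroli"},
    "og_2": {HOUSE_MOUSE, "M. spretus", "M. caroli", "M. pahari"},
    "og_3": {"M. pahari"},
    "og_4": NON_COMMENSAL,
}

losses = candidate_house_mouse_losses(toy_presence)
print(losses)
```

A screen of this kind only nominates candidates; checking for a corresponding pseudogene in the reference genome, as reported in Table 2, is a separate step.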
Novel V1R Clade: Clade “N”
In addition to the house mouse gene losses observed in clades E, C, H, I and G, we identify a novel V1R clade (Table 2, Figure 3A). This novel clade “N” has been lost in the house mouse and consists of two receptor orthogroups. Both clade N receptors (Vmn1r248 and Vmn1r249) are expressed in at least 3 non-commensal Mus species (Additional File 3: Figure S2) and have corresponding pseudogenes in both the house mouse (M. m. domesticus) and the rat (Rattus norvegicus).
Variable Patterns of Evolution Among V1R Clades
Patterns of V1R gene orthology and duplication vary across clades. Four of the 11 V1R clades are highly orthologous (E, F, J/K and L: >75% of receptors are high-orthology), with clade G trailing just behind with a few more non-orthologous receptors (Figure 3A, B). All 5 of these clades have 5 or fewer gene duplicates detected; however, the proportion of duplicates by clade size is variable (Figure 3C, D). Clades E, F and G have very low proportions of gene duplicates, while clade J/K has among the highest (Figure 3C).
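The distinction between the number of duplicates in a clade and the proportion of duplicates by clade size can be made concrete with a small Python sketch. The clade labels "X" and "Y" and all counts are hypothetical and chosen only to show that the same raw count yields different proportions; they are not values from Figure 3.

```python
def duplicate_proportions(clade_sizes, duplicate_counts):
    """Return duplicates / clade size for each clade with a non-zero size."""
    return {
        clade: duplicate_counts.get(clade, 0) / size
        for clade, size in clade_sizes.items()
        if size > 0
    }


# Hypothetical clades: both have 4 detected duplicates but differ in size.
toy_sizes = {"X": 40, "Y": 8}
toy_duplicates = {"X": 4, "Y": 4}

proportions = duplicate_proportions(toy_sizes, toy_duplicates)
for clade, p in sorted(proportions.items()):
    print(f"clade {clade}: {p:.0%} duplicates")
```

Here a small clade with few duplicates in absolute terms still has a high proportion of duplicates, which is the pattern described for clade J/K relative to clades E, F and G.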
[Insert Figure 3 here.]
Clades C, D and H have abundant low-orthology and non-orthologous receptors (Figure 3B), indicating greater evolutionary lability. While most orthologous relationships are straightforward, some orthogroups contain multiple house mouse receptors, and are annotated with combination-IDs (e.g. Vmn1r25/30). These receptor groups are the result of one or more duplication events within the Mus lineage, and are unequally distributed across clades, with 76% located in clades C and D (Figure 3A). In addition, all reference-only V1Rs are located in these same two clades (Figure 3B). Not surprisingly, clades C, D and H have the highest number of detected gene duplicates (19 or more) and have similarly high proportions of duplicates by clade size (Figure 3C, D). Thus, all three clades have evidence for substantial gene expansions, particularly clade D within the house mouse lineage.
[Insert Figure 4 here.]
We examine V1R clade sizes across all 6 species. With the striking exception of clade D, the house mouse clade sizes are very similar to those of the 5 other species (Figure 4). This general pattern provides further evidence that receptor recovery is high and species’ repertoires are near complete. Interestingly, the M. spretus repertoire is largest for several clades (A/B, C, E, H and I; Figure 4), indicative of M. spretus-specific gene expansions.
The size ranges of two clades (A/B and D) are skewed by the house mouse and M. spretus datasets. Both species have much larger clade D repertoires than the other 4 species, exposing this clade as a potential hotspot for recent gene duplications (Figure 4). While the current data suggest these expansions are unique to M. spretus and M. domesticus, additional long-read sequencing might reveal comparable patterns in the other species. In contrast, there are several clades which exhibit low variation in repertoire size across all species’ datasets (E, F, G, J/K, L, N). Furthermore, clades C and H display variation in repertoire size across all 6 species, providing evidence for species-specific V1R gains and losses in multiple Mus lineages (Figure 4).
Guided by the evolutionary patterns observed across clades, we identify and categorize receptors as interesting candidates for further functional work based on striking patterns of conservation or divergence (Additional File 3: Table S1). We hope this list will help guide future efforts to deorphanize V1Rs.
Fast-Evolving Clades & Lineage-Specific Expansions
Clade H appears to be a mouse-specific V1R expansion, as it is absent in the rat genome. The clade is characterized by patterns of low orthology, abundant gene duplicates, and variable repertoire size across species (Figures 3 & 4). A sub-region of clade H containing Vmn1r217, 219 and 220 receptors exemplifies the pattern of low orthology, while the receptor ortholog group Vmn1r206/209 is representative of the abundant gene duplicates (Figure 5A). A striking exception to the patterns of dynamism observed in clade H is the extremely conserved receptor group Vmn1r197 (Figure 5A). The general pattern of low orthology and rapid species-specific gene gains and losses suggests that clade H receptors may play an important role in species-specific chemosensation.
Clade C is the largest V1R clade in all non-commensal species examined in this study, exhibiting variable repertoire sizes suggestive of lineage-specific evolution (Figure 4). This inference is supported by the large numbers of combination-ID ortholog groups, gene duplicates, non-orthologous receptors, and house mouse-specific gene gains (Figure 3). The phylogenetic structure of clade C comprises three sub-clades. Interestingly, the non-orthologous receptors are largely clustered in one sub-clade, suggesting different rates of receptor evolution exist within clade C (Figure 5B). Two clade C receptors, Vmn1r9 and Vmn1r10, have been implicated in pup odor detection in house mice. However, these receptors also respond to female odors, and thus may detect chemosensory components of the nest environment. These two receptors are part of a single receptor orthogroup (Vmn1r9/10) that is both orthologous and highly duplicated (Figure 5B). The sister group Vmn1r7/8 exhibits a similar pattern of high orthology and abundant duplication (Figure 5B). Given the potential role of Vmn1r9/10 receptors in pup odor detection, and the lineage-specific evolutionary patterns observed in Vmn1r7/8 and Vmn1r9/10, these receptor groups are interesting candidates for future functional tests of their role in conspecific chemosignaling.
[Insert Figure 5 here.]
Clade D exhibits a large skew in repertoire size within the house mouse (Figure 4) and has the most dramatic patterns of non-orthology of all V1R clades (Figure 3B). Nearly all reference-only V1Rs (50/53: 94%) are located in clade D, providing further support for a large recent gene expansion in the house mouse (Figure 3B). These receptors are similar in sequence and cluster together on chromosome 7, consistent with recent tandem gene duplication. While we do not find evidence for a comparably large expansion in the non-commensal species, we recover approximately twice as many clade D receptors in M. spretus relative to the other four species (Figure 4). It is possible that similar expansions exist in the other species that are not detected here. Additionally, clade D has a high proportion of non-orthologous receptors and gene duplicates. Given the evolutionarily labile nature of clade D, a few rare conserved receptors stand out: V1rd19, Vmn1r179 and Vmn1r172/173/174 (Figure 5C). While no functional data currently exist for these receptors, their distinct evolutionary pattern within clade D suggests they are under purifying selection and are worthy of further investigation.
Conserved Clades & Female-Specific Odor Detection: Clades E & F
A subset of V1R clades are highly conserved, and thus good targets for uncovering receptors with conserved olfactory functions. Clades E and F are characterized by high orthology (Figure 3B), long internal branch lengths and short terminal branch lengths, suggestive of old gene duplications maintained within the Mus lineage (Additional File 3: Figure S3). In contrast, very few recent gene duplications are detected (Figure 3C, D). A subset of 5 clade E receptors are important for the detection of female-specific urine odors in house mice. Two clade E sub-regions containing all 5 receptor groups are shown in Figure 5D, E; those with the strongest support for female odor detection are highlighted in red (Vmn1r69 and Vmn1r185) [35]. Vmn1r68 and Vmn1r69 are sister to each other in the gene tree and are highly orthologous; however, Vmn1r69 has no orthologs detected among the more basal species (M. caroli and M. pahari; Figure 5D). It is plausible that Vmn1r69 is the result of a gene duplication event preceding the divergence of the four more derived species (Figure 5D), providing enhanced specificity or sensitivity toward female-specific urine odors. The second clade E sub-region contains the receptors Vmn1r184, Vmn1r185, and Vmn1r71. Vmn1r184 and Vmn1r185 are sister receptor groups, in which Vmn1r185 is highly orthologous and Vmn1r184 appears to be the result of a recent duplication event (Figure 5E). Interestingly, Vmn1r184 is detected in only the house mouse and M. spicilegus (Figure 5E). Furthermore, M. spicilegus has evidence for a species-specific Vmn1r184 duplicate, and lacks Vmn1r185 expression (Figure 5E). The distinct expression pattern of Vmn1r184 in M. spicilegus is noteworthy given this species’ unique social structure, which includes cooperative behaviors and social monogamy.
In comparison, Vmn1r71 is highly orthologous (Figure 5E), but displays remarkable transcriptional variability, most of which is located in either the C-terminal or N-terminal region of the protein (Additional File 3: Figure S3). Broadly, clades E and F display clade-wide patterns of conservation, with interesting lineage-specific receptor evolution within clade E surrounding sex-specific chemosignaling.
Clade J/K Evolution & Detecting Estrus Cues
Clade J/K is the most orthologous clade and also has one of the highest proportions of gene duplicates (Figure 3). Thus this clade encompasses a unique mixture of conservation and divergence, in which there is very little gene loss but gene gains are abundant (Figures 3 & 5F). Clade J/K is also the only clade for which a large number of the receptors have known ligands. In the house mouse, two of the four J/K receptors (Vmn1r85 and Vmn1r89) have been shown to detect estrus cues (i.e. sulfated estrogens) in female urine [35, 43]. The Vmn1r89 receptor group has evidence for short and long transcript types across Mus species (Additional File 3: Figure S4). Many species have only one form detected; however, the house mouse and M. spretus express both forms as transcript variants of the same gene, while M. pahari appears to have distinct genes generating these two forms (Additional File 3: Figure S4). The widespread detection of both transcript types suggests they may facilitate the detection of distinct ligand (i.e. sulfated estrogen) features. This is particularly compelling given that in the house mouse, Vmn1r89-expressing VSNs detect multiple sulfated estrogen molecules and are more broadly tuned than Vmn1r85-expressing VSNs. In comparison, the Vmn1r85 receptor group is highly conserved among the 3 Mus species most closely related to the house mouse (Figure 5F), with the majority of substitutions concentrated in M. caroli and M. pahari (Additional File 3: Figure S5). For both Vmn1r85 and Vmn1r89, the highest proportion of amino acid site changes detected across species occurs in extracellular regions (Additional File 3: Figure S6). The trend towards a higher rate of extracellular substitutions is consistent with a prior analysis of molecular evolution in 22 V1Rs, which demonstrated that most sites with evidence for positive selection are located in extracellular motifs.