Analysis of mys-1 elements via intra-ORF1 and intra-ORF2 PCR amplification
To ascertain the evolution of the understudied mys elements in cricetid rodents, we utilized the P. leucopus mys-1 LTR-retrotransposon as the focal point as it represents the only full-length mys element with nearly intact open reading frames. Intra-ORF1 PCR products using primer sets corresponding to nucleotides of P. leucopus mys-1 620-1100 (data not shown) and 671-1100 were consistent as amplicons among Peromyscus species (Fig. 2a) were of the expected size, albeit more intense in members of the P. maniculatus/P. leucopus clade, which includes P. gossypinus. This was confirmed upon substituting a higher quality P. gossypinus DNA sample for PCR (Fig. 2b). Additionally, amplicons were only faintly observed, but limited to, members of the Neotominae subfamily of cricetid rodents (Fig. 2b). Smaller sized amplicons were observed among Peromyscus, although considering that only full-length fragments were obtained via shotgun cloning, these size variants are potentially artifact bands, conformational variants, or primer-amplicon heteroduplexes. Additionally, we cloned and sequenced intra-ORF1 PCR products from Peromyscus and non-Peromyscus species (data not shown) and determined highly conserved sequences for primers to further enhance the verification for the presence of mys using intra-ORF1 PCR. Using the additional primers, an amplicon was generated for N. albigula (Fig. 2c) which is consistent with the restricted presence of mys-1 in Neotominae rodents. Upon generating an intra-ORF (1F-239R) PCR DNA library for N. albigula we obtained sequences, that when queried to the NCBI collection of databases using BLASTn, most closely matched P. leucopus mys-1 with 98% coverage (C) and 82% identity (I) consistent with the presence of mys-1 among all analyzed rodents classified in the Neotominae subfamily including both the Reithrodonotomyini and Neotomini tribes.
ORF2 intra-mys PCR among species of Peromyscus (Fig. 2d) generated intense bands of expected size for species of the P. leucopus/P. maniculatus clade, along with a main band of higher molecular weight in the sister species of P. truei and P. difficilis, and a main amplicon of higher molecular weight in P. eremicus and P. californicus of the subgenus Haplomylomys. This initial pattern is consistent concerted evolution of mys elements.
Identification of a novel mys-related (mysRS) family of retrotransposons.
Upon sequencing intra-ORF1 PCR-generated clones in Peromyscus maniculatus and Neotomodon alstoni (GenBank accession numbers ON778008-ON778016), we found all the P. maniculatus clones and two of four N. alstoni clones aligned well to mys-1, whereas the other two N. alstoni clones appeared similar but distinct from mys-1 (Fig. 3), seemingly representing a novel discovery of a family of mys-related sequences (mrs), that we refer to as mysRS. elements. The origin and evolutionary history of the mysRS element in contrast to mys-1 was therefore further investigated. Utilizing a primer set in which sequences were distinct from mys-1 but highly conserved between the two N. alstoni clones, PCR products of the expected size were observed among representatives of the Reithrodontomyini tribe (Fig. 4), seemingly absent from N. albigula. This suggests mysRS may be of more recent origin than mys. It was observed through a non-quantitative analysis, that the relative intensity of the PCR products was more equivalent among Reithrodontomyini in contrast to mys-1 suggesting less variation in mysRS sequences among the genera.
Individual intra-mysRS clones from N. alstoni, P. leucopus, R. fulvescens, and O. leucogaster (GenBank accession numbers ON677500-ON766521) were sequenced and a molecular phylogeny was generated (Fig. 5), with inclusion of the corresponding region of the P. leucopus mys-1 element (accession no. X02855.1). From these data, it is highly conceivable that mysRS and mys represent distinct cricetid retrotransposon families. Some clustering is observed among N. alstoni intra-mysRS clones, which might have been the result of PCR bias due to using N. alstoni-based primers. Alternatively, there may be some level of concerted evolution, with amplified products representing both younger and older elements.
Estimation of copy numbers of mys and mysRS families in cricetid rodents
Estimated copy numbers were determined for mys and mysRS families of retrotransposons using quantitative dot blots, which were performed to provide a general picture of the level of genomic amplification of these elements to possibly assess activity over time. In turn, this would provide further information regarding the evolutionary time of origins of the elements, as mys was previously determined to date back to a shared ancestor with the Sigmodontinae rodent subfamily (Wichman et al. 1985). Probes used were gel-isolated inserts from clones that were reamplified to avoid cross hybridization of standards that would result from plasmid sequences. ORF1 analysis (Fig. 6a) using a mys probe from P. gossypinus, as well as ORF2 analysis (Fig. 6b) using a mys probe from P. polionotus appeared to show over 20,000 copies of mys-1 among Peromyscus species, with either fewer, or more divergent, elements in non-Peromyscus genera of the Neotominae subfamily. The restriction of mys-1 to members of the Neotominae subfamily of Cricetidae is consistent with the PCR results (Fig. 2). Additionally, there was no apparent cross-hybridization to mysRS elements further depicting distinct LTR-retroelement families. A control dot blot was constructed (Fig. S1) using guinea pig L1 (ORF2 region) to further verify that roughly similar amounts of DNA were loaded on the filter, as the guinea pig is most distantly related to all the other analyzed rodents (Fig. 1a). As anticipated, the guinea pig presented the most intense band based on concerted evolution of L1 sequences. Some variability was observed regarding signal intensities with the L1 probe albeit generally in a similar range among analyzed rodents including among Peromyscus, and therefore the results should be considered rough estimates. The non-Peromyscus Neotominae rodents displayed dots of intensities corresponding to the lower range found among the Peromyscus species, and there was clear hybridization of the L1 to S. hispidus, along with similarity of signal for O. palustris to those of the non-Peromyscus Neotominae. Along with the signal observed for the gerbil in the control blot, this additionally supports the lack of hybridization of the mys and mysRS probes to non-Neotominae rodents. The cross hybridization of guinea pig L1 ORF2 to high amounts of the mys ORF2 standards demonstrated that there is some level of similarity of the L1 ORF2 to mys ORF2, which could result in an overestimation of the number of mys elements in our study.
The mysRS quantitative dot blot (Fig. 6c) displays hybridization to N. albigula, which in contrast to the intra-mysRS PCR results, indicates mysRS is not limited to the Reithrodontomyini tribe but as with mys is found in all analyzed Neotominae genera. It appears that mysRS copy numbers are roughly 5,000-10,000 with less of a difference to the non-Peromyscus Neotominae than observed for mys.
DNA sequence analyses to identify additional full-length mys elements and further assess its evolutionary history
The discovery of mys-1 also included the previous finding of ten additional elements characterized as mys, based on restriction maps of clones isolated via screening a P. leucopus genomic library with a mys-1 probe (Wichman et al. 1985). However, beyond low stringency Southern blots indicating that mys-1 subfamilies have been active throughout the evolution of Peromyscus species (Lee et al. 1996), there is a lack of additional support at the DNA sequence level. We therefore queried the full chromosomal assembly of P. maniculatus, a related species (Fig. 1), available in Ensembl, using BLASTn, which generated 3,199 hits at E = 1e-50. Among these, we identified five sequences (Fig. S2) corresponding to nucleotides 20 or 21 up through 2059 of P. leucopus mys-1, with additional similarity further downstream. We analyzed the sequences for LTRs and FDRs, and four were full-length elements, with possibly a fifth, but sequences downstream of the element were unavailable for verification. Of note, each contains a small insertion in relation to the P. leucopus mys-1 within ORF2 explaining the loss of corresponding sequences at roughly nucleotide 2059. A ClustalW alignment similarity matrix showed a range of 76%-86% sequence identity to mys-1, and these sequences were further verified as mys via a screen of Repbase (Jurka, 2000), with the indel being the result of an insertion of a segment related to the ERV2-N1_Mio-I elements. The P. maniculatus sequences displayed 80-90% sequence identity to each other. The Pm mys 23-3312091 element (Fig. 7), labeled based on chromosome number and approximate location, contained the longest ORF of 162 amino acids, and when translated to protein to screen protein databases in NCBI via BLASTp, the only match was to a sequence referred to as ORF1 Pl (accession 1110236A). A screen using BLASTp queried with the ORF2 region yielded a match to ORF2 Pl (1110236B). ORF1 Pl and ORF2 Pl sequences correspond to the P. leucopus mys-1 translated ORFs, and therefore mys proteins do not correspond to that of any known retrotransposon. We did not identify any full-length mys-1 elements in P. eremicus and P. aztecus, but this could be a result of using low coverage genomic libraries. We did, however, identify numerous hits to the LTR sequences, as well as to either ORF1 or ORF2 but not both (see mORF1 and mORF2 sections). We translated and performed a BLASTp screen of a small ORF associated with the reverse complement of two P. maniculatus mys-1 loci which each yielded a very weak hit to a Cricetulus griseus putative pol gene (61% C, 44%I E = 0.017; 68%C 47%I E = 3e-05). Therefore, there is an apparent paucity of full-length mys elements in Peromyscus genomes.
To address the presence of full-length mys-1 in other genera of cricetid rodents and to gain insights into its origin and evolution, we screened the golden hamster and prairie vole assemblies in Ensembl, as well as performed an overall screen of the NCBI database collection with the full-length Pm mys 23-3312091 (Pm mys 23) element. We obtained at E = 1e-50, 3134 hits among the P. maniculatus assembly, albeit just five were greater than 2,000 bp, and most roughly corresponded either to the first half or the second half of the element. At 1e-10, the 5,000 hit maximum was obtained although the matches were similar. At 1e-50, no matches were obtained for the prairie vole as well as for the hamster. At 1e-10, 876 matches were obtained for the hamster, albeit the largest segment was 285 bp, and although the 5,000 hit maximum was obtained for the prairie vole, the maximum length was only 279 bp, with very few close to that length. We therefore narrowed down the screen to ORF1- and ORF2-based sequences of Pm mys 23. For ORF1 at 1e-50 there were 779 hits of full-length or near full-length ORF1 sequence matches in the P. maniculatus screen, with no matches to prairie vole or golden hamster. When reducing the expected value to 1e-10, the number of matches to the deer mouse increased to 2509 with again most being full or near full length ORF1 sequences, whereas at 1e-50 there were no hits to both prairie vole and golden hamster, and at 1e-10 338 partial hits to the prairie vole and 88 partial hits to the golden hamster. Querying the databases with Pm mys 23 ORF2 at 1e-50, 1796 full or near full-length ORF2 hits aligned to the deer mouse with none to hamster or prairie vole, and when reduced to 1e-10, 2539 hits aligned to deer mouse with again no matches to the prairie vole or hamster. For a more broad-based analysis we used Pm mys 23 to screen the NCBI databases using BLASTn. The top hits, besides to the
P. leucopus mys-1 sequence were to Onychomys torridus. There were 4,341 matches to the combined 14 chromosomal assemblies available for O. torridus, with the top matches from each being full or near full length at E=0. These results are consistent with our PCR and dot blot data supporting the limitation of the mys LTR-retroelement to the Neotominae subfamily of cricetid rodents. The best match (referred to as Ot (Pl) mys 11; with 11 indicating the O. torridus chromosome and Pl indicating that the P. leucopus mys-1 was used for screening) displayed 85% similarity aligning to P. leucopus mys-1 for nucleotides 85-2048, though further analysis showed high similarity to mys-1 downstream. Other matches displayed 80% or less sequence identity and therefore not classified until further distinguishing mys and mysRS elements (next section). The Ot (Pl) mys 11 element was added to the alignment of the P. leucopus mys-1 element and the five P. maniculatus full-length or near full-length elements, with LTRs, ORF1, ORF2 and FDRs depicted in Fig S2.
Identifying full-length mysRS elements
To identify full-length mysRS elements, we utilized the intra-ORF1 N. alstoni sequences to screen the available P. maniculatus genetic database in Ensembl and were able to identify and isolate full-length elements based on inclusion of 5’ and 3’ LTRs and FDRs, as represented by Pm 12-54005644 (Fig. 8). As with mys, the P. maniculatus sequences were designated based on chromosome and approximate location. We utilized a P. maniculatus full- length mysRS element (Pm mysRS 3-2991789) to rescreen the Peromyscus database in contrast to the mys screen, there were numerous matches obtained that contained the region encompassing both the ORF1 and ORF2 sequences (approximate nucleotides 612-2259), including the first 38 of 100 hits, suggesting the mysRS family is notably more common than mys. However, at E=1e-50, only six were apparently full length or near full length among the 2775 total hits. Using mysRS full-length, ORF1, and ORF2 sequences we queried the hamster and prairie vole databases and obtained zero hits (1e-50). We also screened the P. eremicus partial genomic library by aligning to a Peromyscus mysRS sequence (Pm 12-54005644) as a reference sequence in MacVector and isolated full-length elements from this species as well. We additionally screened the collection of databases in NCBI to obtain mysRS elements from additional genera and identified full-length mysRS elements in Onychomys torridus. The O. torridus elements isolated using either the P. maniculatus mys or mysRS elements all contained the ERV2-N1_Mio-I insert within ORF2 (highlighted in Fig. 7) and therefore this indel cannot be used as a criterium to distinguish mys and mysRS, although the O. torridus elements isolated by screening with the P. leucopus mys-1 did not contain the insert and may simply represent a separate Onychomys mysRS subfamily.
Distinguishing mys-1 vs mysRS LTR-Retroelements
To establish criteria to distinguish between mys and mysRS elements, we generated ClustalW alignments of full-length or near full-length isolated sequences (Fig. S3), followed by Muscle alignments in Mega11 from which molecular phylogenies using maximum likelihood were generated, that also were broken down into the components of the of the elements that included ORF1, ORF2 and the 5’LTRs (Fig. 9). The molecular phylogeny of full-length (or nearly full-length) elements displayed several notable groupings (Fig. 9a). At the broadest level, the Peromyscus sequences formed a distinct clade from the Onychomys sequences. O. torridus sequences queried with the P.
leucopus mys-1 sequence formed a distinct clade from O. torridus sequences queried with either mys or mysRS sequences from P. maniculatus. At this point, the P. leucopus (Pl)-derive O. torridus (Ot) sequences could represent mys elements or a subfamily of mysRS elements as there is a lack of a conspicuous mechanism for distinguishing these LTR-retroelement families. However, there is a clear demarcation in Peromyscus distinguishing mys and mysRS elements. Additionally, the P. eremicus mysRS sequences form a distinct clade indicating concerted evolution of sequences and in turn, recent activity. Of note, both Pm 6-20830407 and Pm 6-25307443 appear highly related, and their sequences continued to align beyond the 3’ end of the full-length element (data not shown but available in Ensembl).
To further ascertain any clear distinction between mys and mysRS elements, particularly as it relates to O. torridus, sequences were trimmed to align the various components of these LTR-retroelements. ORF1 sequences were aligned via ClustalW and a notable distinction was shared between the Ot (Pl) mys 11 sequence and the Peromyscus mys sequences, outlined in Fig. S4. A molecular phylogeny from an ORF1 Muscle alignment was generated (Fig. 9b) with findings that include a monophyletic group consisting of the Ot (Pl) mys 11 sequence (that we characterized as mys), with the P. maniculatus and P. leucopus mys elements; the single O. torridus mys element formed an outgroup to the Peromyscus sequences; the O. torridus sequences obtained via the P. leucopus mys-1 sequence formed a clade (with one exception); and although with weak bootstrap values, Peromyscus and O. torridus mysRS sequences formed separate clades; and P. eremicus sequences formed a monophyletic group. These findings suggest that ORF1 sequences potentially distinguish mys and mysRS elements, that there is concerted evolution of elements, and, in turn, mysRS elements are more common than mys elements in Neotominae rodents.
The relationships of the ORF2 regions of the full-length elements (highlighted in Fig. S5) were additionally analyzed. A molecular phylogeny of ORF2 sequences (Fig. 9c) most notably places all the Onychomys sequences into a distinct clade from the Peromyscus sequences (100% bootstrap value) consistent with concerted evolution irrespective of the retrotransposon family. Among Peromyscus, the ORF2 sequence alignment yielded a paraphyletic group of P. maniculatus mys, with the P. leucopous mys-1 element forming a clade with mys, albeit with a low bootstrap value. Notably, this is not the result of the ERV2-n1_Mio-I insertion, as both P. maniculatus mys and mysRS elements contain the inserts, but not the P. leucopus mys-1. However, the insertion may be responsible for the distinct O. torridus (Pl) clade as the sequences obtained via the P. leucopus mys-1 query lack the insertion in ORF2 (Fig. S5) and is apparently a feature distinguishing distinct mysRS subfamilies in O. torridus. Two O. torridus (Ot) mysRS elements (Ot mysRS15A and Ot mysRS19) derived via a P. maniculatus mysRS screen formed a clade with the Ot (Pl) subfamily, albeit as outliers, with a 100% bootstrap value. In general, ORF2 may, in a limited and less conspicuous fashion, distinguish mys and mysRS elements, but only in Peromyscus and not Onychomys, and the result of the ERV2-n1_Mio-I indel seemingly contributes to delineating mysRS subfamilies in Onychomys.
The 5’LTRs of the elements were additionally analyzed to establish if these sequences can be recognized as criteria for assessing age and activity of elements, as well as an additional method for distinguishing mys and mysRS elements. We aligned and generated a molecular phylogeny of the individual presumptive mys and mysRS elements to assess their similarities (Fig. 9d) and found the 5’LTRs of Peromyscus mys elements formed a clade with the O. torridus elements obtained via a query of P. leucopus mys-1, which then separated into distinct monophyletic groups. There were no clear patterns for the 5’LTRs of other Peromyscus and Onychomys mysRS elements. With the objective of distinguishing mys and mysRS elements, it appears that the LTR sequence can be a supporting factor but limited to within a genus as the sequence of the presumptive O. torridus mys element (Ot (Pl) mys 11) did not align with Peromyscus mys elements. Therefore, the overall picture of characterizing mys and mysRS elements appears to primarily be associated with ORF1 sequences, with the LTRs being an additional, but not exclusive, distinguishing feature.
Identification and analysis of a novel mys “ORF1-only” or mORF1 LTR-retroelement
With distinction of mys and mysRS elements primarily by ORF1 sequences and to some extent the LTR sequences we performed additional analyses to address the inconsistency between mys copy number estimates and the lack of full-length elements identified in Peromyscus and Onychomys genomic DNA libraries. While screening for mys elements in both the P. eremicus and P. aztecus DNA libraries (accession PRJNA847558) we identified ORF1 sequences not adjacent to ORF2 or “ORF1-only” elements, that commence at the same position. There were no full-length elements identified as the sequences vary at different points downstream of ORF1, or individual sequences were not long enough to make conclusions. In contrast, by querying the assemblies of greater coverage with ORF1 sequences of the P. leucopus mys element, we isolated sequences matching to O. torridus and P. maniculatus, then continued rescreening with more downstream sequences and were able to subsequently identify similar sequences that eventually reached the 3’ LTRs and the flanking direct repeats with full-length elements of approximately 8,000 nucleotides, yet lacking mys- or mysRS-like ORF2 sequences. We are tentatively classifying this element (represented in Fig. 10) as mORF1 (LTR-retroelement containing the mys ORF1 sequence), being nomenclature of LTR retroelements are not well established (Gifford et al. 2018). A ClustalW alignment of the mORF1 LTR-retroelements identified in O. torridus and P. maniculatus (Fig. S6), seemingly demonstrates a high-level of sequence conservation spanning the entire sequence. Based on pairwise analyses, the O. torridus mORF1 elements exhibited between 94.2-99% sequence identity to each other, with Peromyscus mORF1 elements displaying between 69.8-85.3% sequence identity to each other. Additionally, a molecular phylogeny (Fig. 11) supports the elements in O. torridus being more related to each other than to Peromyscus elements, consistent with concerted evolution. The ORF1-region nucleotides, 551-1140 of P. leucopus mys-1, appear to be the only region mys-1 has in common with these elements and therefore we added this stretch of sequences to the mORF1 alignment (Fig. S6). When screening the NCBI genetic database collection queried with the full-length O. torridus mORF1 23 element (the number corresponds to the chromosome harboring the element), 7,486 BLAST hits for the 14 chromosome assemblies were indicated. The displayed top match for each assembly corresponded to a full-length element. Although unclear of the estimated proportion of matches that are full-length 8,000 bp sequences, a BLAST of the ORF1 region (e.g. nucleotides 620-1166 of Ot mORF1 23) yielded 6,288 hits with the graphic summary spanning the entire region. These numbers, in contrast to mys-1, are more consistent with the dot blot copy number estimates for mys-1 ORF1, and likely the result of a greater number of mORF1 elements within Neotominae genomes. Additionally, when translating the sequences in different phases within the ORF1 region, BLASTp screens of protein databases yielded matches strictly to ORF P. leucopus (1110236A), which corresponds to P. leucopus mys-1 ORF1.
We performed ORF analyses of the remaining portion of mORF1 elements, which notably yielded long stretches of amino acids in the reverse strand corresponding to sequences roughly between 1500-6000 nucleotides downstream of the ORF1 region. One example was an ORF of 1022 amino acids located in similar regions among three O. torridus elements which include Ot mORF1 16 (Fig. 12), Ot mORF1 20, and Ot mORF1 20A. The 1022 AA ORF in Ot mORF1 16 corresponds to nucleotides 5668-2600 of the element. The translated amino acid sequences for these three elements displayed greater than 99% sequence identity to each other suggesting these represent “young” elements. A BLASTp screen of the NCBI protein databases yielded a primary hit to a reverse transcriptase domain-containing protein of Pseudomonas bacteria (WP_217682179.1) with up to 83% sequence identity, followed by low similarity matches to HERV-K (up to 48%I). Just downstream and adjacent to this ORF was a second ORF in the reverse complement strand of 452 corresponding to nucleotides 7036-5667 in Ot mORF1 16 (Fig. 12), and up to 534 amino acids in Ot mORF1 20. A BLASTp screen of the NCBI protein databases of this region of mORF1 among the O. torridus and P. maniculatus translated sequences displayed low identities (30-40%) to portions of C. griseus gag-pol-pro related sequences. ORFs were of smaller lengths within the P. maniculatus mORF1 elements. A BLASTp screen of the 167 amino acid sequence associated with an ORF near the 3’ end of OT mORF1 16 (Fig. 12) generated no significant protein matches. Overall, it appears that the region downstream of the mys-1 ORF1 sequence corresponds to endogenous retroviral sequences, most closely related to a Pseudomonas ERV. We additionally screened Repbase using the O. torridus 23 and P. maniculatus 7 mORF1 elements and ascertained that large sections of these elements corresponded to the reverse complement of the ERV2-2_Mio-I element, inclusive of the ORF1 region. Since ERV2-2_Mio-I is an ERV in the prairie vole and there is some protein similarity to C. griseus (hamster) ERVK genes, then it is possible this element may have arisen in an ancestor of cricetid rodents and differentiated over time. Alternatively, a unique integration occurred from a Pseudomonas retrovirus into the genome of a Neotominae ancestor that recombined to incorporate the mys ORF1 sequence. The large reading frames of O. torridus mORF1 elements with high similarity and the finding of several full-length elements, along with decipherable direct repeats (Fig. S6), suggests these elements may still be actively contributing to shaping genomes among genera in the Neotominae subfamily of cricetid rodents.
Identification and analysis of a novel mys “ORF2-only” retroelement or mORF2
In addition to identifying elements lacking ORF2 sequences, our mys-1 screening led to the discovery of full-length elements containing ORF2 but lacking ORF1 sequences in P. maniculatus, P. eremicus and P. aztecus, supported by displaying 5’ and 3’ LTRs and FDRs, and are roughly 1860 bp in length (Fig. 13). Translation of the ORF2 region used to query NCBI protein databases via BLASTp yielded hits to ORF2 Pl, which corresponds to the mys-1 ORF2. We refer to these elements as. mORF2, based on the rationale for mORF1. A molecular phylogeny of identified Peromyscus mORF2 sequences (Fig. 14) displayed concerted evolution and therefore suggestive of recent activity. We found the 5’ and 3’ LTRs, along with the sequences downstream of the 3’ end of ORF2, to display greater
similarity to mys elements than to mysRS elements. A ClustalW alignment incorporating additional presumptive Peromyscus mORF2 sequences is represented in Fig. S7. When screening the P. maniculatus library in Ensembl with Pm 6-7752989 as the query sequence, at 1e-10, 32 of the top 100 hits showed nearly complete coverage with 88% sequence identity or higher. In contrast, an Ensembl BLASTn query to voles and hamster displayed matches of only 137 nucleotides or fewer, primarily in the ORF2 region, with the vole displaying 280 hits at 1e-10 with none identified via screening the golden hamster. The NCBI database collection was also screened with the Pa 11410 mORF2 element (as it demonstrated the longest ORF) and numerous significant hits to O. torridus were obtained, but notably aligning at nucleotide position 440, roughly where ORF2 begins. Therefore, the sequences of O. torridus upstream of this site do not represent a common 5’ LTR to the Peromyscus mORF2 and likely contain ORF1, and therefore mORF2 sequences are plausibly lacking in Onychomys and therefore originated in an ancestor of Peromyscus. Additionally, the presence of this element, large numbers of segments of this element, and similarity to the mysRS ORF2 would potentially explain the copy number results obtained for the mys dot blot (Fig. 2c)
Contrasting mys, mysRS, mORF1, and mORF2
Overall, the detailed search for mys-1 elements led to the discovery of three additional related LTR- retrotransposon families in Peromyscus, two of which harbor only one of the ORFs related to mys-1. We aligned individual elements from each of the four LTR-retroelement families and generated a molecular phylogeny (Fig. 15) to ascertain their evolutionary relationships. Depicted from the molecular phylogeny is (1) Peromyscus mys elements form a distinct clade; (2) Peromyscus mysRS elements form a clade with Peromyscus mys forming a paraphyletic group; (3) Peromyscus mysRS forms a distinct clade from O. torridus mysRS (88% bootstrap value); (4) the O. torridus mysRS elements appear to form two distinct subfamilies based on whether the P. leucopus mys-1 versus either P. maniculatus mys-1 or mysRS elements were used as query sequences; and (5) the mORF1 family is the most distantly related, which is consistent with containing a large amount of sequence not found in the other retroelement families. We have classified Ot (Pl) 11 as a mys element, although the full-length elements phylogenetically fall within the O. torridus mysRS element clade that were obtained via a P. leucopus mys-1 screen. Additionally, each retrotransposon family displays concerted evolution consistent with recent activity. To further address this point, we screened orthologous loci for the presence or absence of some of the individual elements.
Screening orthologous loci for recent retrotransposon activity
To assess relatively recent activity of the various LTR-retroelement families, we analyzed orthologous loci in Peromyscus for several individual elements (Table 1). In some cases, library screens queried with sequences flanking individual elements did not yield matches to the corresponding locus from the available assembly. We analyzed two P. eremicus mysRS element-containing orthologous loci in the related P. californicus; both species in the subgenus Haplomylomys (Fig. 1), as well as in the more distantly related P. maniculatus. These individual elements were limited to P. eremicus (Table 1). This provides evidence of mysRS integrations that are limited to P. eremicus. We also analyzed five P. maniculatus mysRS elements of which three were present and two were absent in the closely related P. leucopus species. We were able to identify the orthologous locus in P. polionotus, the sister species of P. maniculatus, for one of these two absence loci in P. leucopus and found the presence of the element, indicating integration post-dating the divergence of the P. leucopus/P. maniculatus clades.
To assess recent mys integrations we analyzed four orthologous loci of the P. maniculatus mys elements in other Peromyscus species (Table 1). We found two to be present in P. leucopus and one absent, indicating activity of the latter occurred after the divergence leading up to these species. The fourth mys ortholog could not be identified in the P. leucopus assembly, although we determined presence in P. polionotus, the sister species of P. maniculatus.
To assess recent mORF2 LTR-retroelement activity we analyzed four individual mORF2-containing loci from P. maniculatus and P. eremicus. We found one P. maniculatus mORF2 element present in the P. leucopus ortholog,
Table 1. Analysis for the presence or absence of retroelements within orthologous Peromyscus loci.
Element
|
Source
|
Library
|
Present or Absent
|
1) mysRS
|
|
|
|
Pe mysRS 56082383
|
P. eremicus
|
P. maniculatus
|
absent
|
Pe mysRS 56082383
|
P. eremicus
|
P. californicus
|
absent
|
Pe mysRS 4961104
|
P. eremicus
|
P. maniculatus
|
absent
|
Pe mysRS 4961104
|
P. eremicus
|
P. californicus
|
absent
|
Pm mysRS 12-54005644
|
P. maniculatus
|
P. leucopus
|
absent
|
Pm mysRS 12-54005644
|
P. maniculatus
|
P. polionotus
|
present
|
Pm mysRS 11-9715318
|
P. maniculatus
|
P. leucopus
|
present
|
Pm mysRS 3-2991789
|
P. maniculatus
|
P. leucopus
|
present
|
Pm mysRS 10-660859
|
P. maniculatus
|
P. californicus
|
absent
|
Pm mysRS 10-660859
|
P. maniculatus
|
P. leucopus
|
absent
|
Pm mysRS 20-36151548
|
P. maniculatus
|
P. leucopus
|
present
|
2) mys
|
|
|
|
Pm mys 23-33312091
|
P. maniculatus
|
P. leucopus
|
absent
|
Pm mys 1-70574111
|
P. maniculatus
|
P. leucopus
|
present
|
Pm mys 1-99318323
|
P. maniculatus
|
P. leucopus
|
present
|
Pm mys 5-92303600
|
P. maniculatus
|
P. polionotus*
|
present
|
3) mORF2
|
|
|
|
Pm 21-28563327
|
P. maniculatus
|
P. polionotus*
|
present
|
Pm 18-786447
|
P. maniculatus
|
P. leucopus
|
present
|
Pe 32375871
|
P. eremicus
|
P. maniculatus
|
absent
|
Pe 32375871
|
P. eremicus
|
P. californicus
|
absent
|
Pe 58262094
|
P. eremicus
|
P. maniculatus
|
absent
|
mORF1
|
|
|
|
Pm 7-10377608
|
P. maniculatus
|
P. leucopus
|
present
|
Pm 6-20227267
|
P. maniculatus
|
P. leucopus
|
present**
|
Pm 6-20227267
|
P. maniculatus
|
P. californicus
|
absent
|
*Sequence from P. leucopus assembly not attainable. **Contains mORF1 plus additional insert.
and the other in P. polionotus (the flanking sequences in P. leucopus could not be identified). For the two P. eremicus mORF2 loci we analyzed, we found both to be absent in orthologous P. maniculatus loci, but also absent in the orthologous site in the more closely related P. californicus. Therefore, there has been activity of mysRS and mORF2 elements specific to the P. eremicus lineage.
We additionally assessed orthologous mORF1-containing loci, albeit more difficult to analyze based on the large size. Using flanking sequences of the near full-length Pm 7-10377608 element as well as the full-length Pm 6-20227267 element-containing loci, we identified these mORF1 elements within the orthologous loci of P. leucopus loci. However, of interest was the finding that the flanking sequences for Pm 6-20227267 were separated by 12,015 nucleotides in P. leucopus, the variation resulting from an insertion of about 3840 bp. We queried the NCBI genetic database with the 3840 bp insert sequence with the top match displaying 95% coverage and 97.25% identity to a P. maniculatus solute carrier family 2 member 12 (Slc2a12) transcript variant (accession no. XM_042262210.1). Further matches of the insertion corresponded to the same gene in additional cricetid rodents followed by lesser matches to other rodent families. No matches beyond a few nucleotides were identified via a Repbase search. Additionally, the Pm 6-20227267 element was lacking in P. californicus orthologous locus supporting mORF1 activity in the genus Peromyscus.