Analysis of Genomic Variations between P. maniculatus and P. polionotus
The ability to generate fertile and viable hybrid offspring from P. maniculatus and P. polionotus provides an opportunity to identify the genetic loci associated with phenotypic traits that differ between the two species. Capitalizing on this rich genetic resource requires identifying the allelic differences between the two species, which we sought to do using publicly available genomic resources. The first assembly of the P. maniculatus genome was used as the reference genome, and the P. polionotus sequencing reads deposited in the SRA database were mapped against it. For polymorphism determination we required 10 independent P. polionotus sequencing reads to be mapped to a specific P. maniculatus genomic location. With these criteria we determined that 81.9% of the P. maniculatus genome was covered by 10 P. polionotus sequencing reads, resulting in 38,166,334 polymorphisms between P. maniculatus and P. polionotus. Among these variations are 34,084,607 single nucleotide polymorphisms (SNPs) and 4,081,727 insertions or deletions (INDELs) (Fig. 1). Lowering the read mapping requirement to five sequencing reads increased the number of identified polymorphisms to 40,815,360. However, we chose to keep the more stringent criteria for subsequent analysis, recognizing that we are likely underestimating the number of polymorphisms between the species. The more stringent set of 38,166,334 polymorphisms corresponds to a rate of one variant every 68 base pairs. Approximately 33% of these variants occur in introns and 18% occur in intergenic regions. Less than 1% occur in exons.
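As a quick consistency check on the counts above, the following minimal sketch confirms that the SNP and INDEL counts sum to the reported total. It also shows the span implied by one variant per 68 bp. The variable names are ours, and the implied span is a back-of-envelope figure, not a value reported in the analysis.

```python
# Counts reported in the text.
snps = 34_084_607
indels = 4_081_727
total_variants = 38_166_334   # 10-read threshold
relaxed_total = 40_815_360    # five-read threshold

# SNPs and INDELs should account for every polymorphism.
assert snps + indels == total_variants

# Span implied by the reported rate of one variant per 68 bp (illustrative only).
bp_per_variant = 68
implied_span_bp = total_variants * bp_per_variant

# Additional polymorphisms gained by relaxing the read threshold.
gained = relaxed_total - total_variants

print(f"SNP fraction of all variants: {snps / total_variants:.1%}")
print(f"Implied span: {implied_span_bp / 1e9:.2f} Gb")
print(f"Variants gained at five reads: {gained:,}")
```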
We characterized the SNPs that occur between P. maniculatus and P. polionotus for changes in predicted protein coding sequences and found that there are 11,588 genes that contain a coding sequence SNP. Of these variants, 70.2% are silent, 29.6% are missense, and 0.2% are nonsense variants (Fig. 1). We identified a set of 10,405 genes that contain a nonsynonymous change, which might result in a functional difference in the protein between the two species. Using the annotated Mus genome as a reference, we conducted a gene ontology (GO) term analysis on the list of 10,405 genes with missense or nonsense variants to determine if any biological processes or molecular functions are over- or underrepresented within this list of genes. There are 2,060 biological processes that are overrepresented, and 63 that are underrepresented (Table 1 and Online Resource Tables 1 and 2). The processes with the most statistically significant enrichment are typically broad in GO terminology, including cellular process, cellular metabolic process, and metabolic process. P. maniculatus and P. polionotus are known to have distinct physiological differences, including an almost four-fold increase in blood cholesterol levels in P. polionotus compared to P. maniculatus and a two-fold increase in blood triglyceride levels in P. polionotus over P. maniculatus (19). We searched the list of overrepresented GO terms for terms with a possible relationship to cholesterol and triglycerides and found that the GO terms cholesterol metabolic process, cholesterol homeostasis, cholesterol transport, triglyceride metabolic process, triglyceride biosynthetic process, and triglyceride catabolic process are enriched in this list of genes with potential functional protein changes (p = 3.90 × 10⁻⁶, 3.05 × 10⁻⁵, 3.86 × 10⁻⁴, 9.86 × 10⁻⁶, and 1.81 × 10⁻⁵, respectively) (Online Resource Table 1).
Each of these GO terms contains from 20 to 77 different genes with nonsynonymous changes, suggesting that there is substantial genetic complexity that may underlie the metabolic differences between the two species. In contrast, there are 63 GO terms that are under-represented in the list, including sensory perception of smell, sensory perception of chemical stimulus, sensory perception, G protein-coupled receptor signaling pathway, and nervous system process, suggesting that these biological processes are more conserved between the two species.
Table 1
Top 20 over and underrepresented GO terms in the list of Peromyscus genes containing a nonsynonymous SNP between P. maniculatus and P. polionotus
Overrepresented GO terms | GO term ID | p-value | Number of Mus genes with GO term | Expected number of genes | Number of Peromyscus Genes with non-synonymous SNP |
cellular process | GO:0009987 | 2.03E-298 | 15748 | 6516 | 7781 |
cellular metabolic process | GO:0044237 | 8.06E-257 | 9331 | 3861 | 5119 |
metabolic process | GO:0008152 | 2.96E-243 | 10435 | 4318 | 5557 |
primary metabolic process | GO:0044238 | 5.83E-235 | 9405 | 3891 | 5096 |
organic substance metabolic process | GO:0071704 | 8.98E-225 | 9966 | 4124 | 5310 |
nitrogen compound metabolic process | GO:0006807 | 2.89E-214 | 8892 | 3679 | 4819 |
cellular component organization or biogenesis | GO:0071840 | 5.56E-214 | 6225 | 2576 | 3618 |
cellular component organization | GO:0016043 | 9.53E-201 | 6053 | 2505 | 3505 |
localization | GO:0051179 | 3.18E-193 | 5918 | 2449 | 3423 |
cellular macromolecule metabolic process | GO:0044260 | 2.50E-174 | 7080 | 2929 | 3905 |
macromolecule metabolic process | GO:0043170 | 3.99E-170 | 8461 | 3501 | 4506 |
organonitrogen compound metabolic process | GO:1901564 | 9.62E-137 | 6215 | 2572 | 3402 |
developmental process | GO:0032502 | 4.23E-136 | 6193 | 2562 | 3390 |
anatomical structure development | GO:0048856 | 1.40E-130 | 5810 | 2404 | 3198 |
establishment of localization | GO:0051234 | 1.11E-127 | 4518 | 1869 | 2588 |
biological regulation | GO:0065007 | 1.61E-120 | 12092 | 5003 | 5874 |
multicellular organism development | GO:0007275 | 1.36E-119 | 5304 | 2195 | 2931 |
transport | GO:0006810 | 4.07E-118 | 4378 | 1811 | 2494 |
organelle organization | GO:0006996 | 2.00E-117 | 3549 | 1468 | 2095 |
macromolecule modification | GO:0043412 | 7.62E-114 | 3623 | 1499 | 2121 |
Underrepresented GO terms | GO term ID | p-value | Number of Mus genes with GO term | Expected number of genes | Number of Peromyscus Genes with non-synonymous SNP |
sensory perception of smell | GO:0007608 | 3.37E-238 | 1138 | 471 | 17 |
sensory perception of chemical stimulus | GO:0007606 | 2.58E-228 | 1236 | 511 | 39 |
sensory perception | GO:0007600 | 1.17E-116 | 1784 | 738 | 303 |
G protein-coupled receptor signaling pathway | GO:0007186 | 2.04E-83 | 1921 | 795 | 410 |
nervous system process | GO:0050877 | 8.09E-55 | 2286 | 946 | 606 |
system process | GO:0003008 | 2.91E-25 | 2923 | 1209 | 955 |
phagocytosis, recognition | GO:0006910 | 3.11E-23 | 151 | 62 | 9 |
response to pheromone | GO:0019236 | 1.07E-21 | 105 | 43 | 2 |
complement activation, classical pathway | GO:0006958 | 1.28E-19 | 165 | 68 | 16 |
humoral immune response mediated by circulating immunoglobulin | GO:0002455 | 3.89E-17 | 180 | 74 | 23 |
complement activation | GO:0006956 | 1.02E-15 | 187 | 77 | 27 |
humoral immune response | GO:0006959 | 6.06E-15 | 355 | 147 | 78 |
phagocytosis, engulfment | GO:0006911 | 4.56E-12 | 189 | 78 | 34 |
protein activation cascade | GO:0072376 | 1.11E-11 | 200 | 83 | 38 |
B cell receptor signaling pathway | GO:0050853 | 2.16E-11 | 181 | 75 | 33 |
xenobiotic metabolic process | GO:0006805 | 2.19E-11 | 112 | 46 | 14 |
plasma membrane invagination | GO:0099024 | 6.36E-11 | 198 | 82 | 39 |
membrane invagination | GO:0010324 | 1.60E-10 | 205 | 85 | 42 |
response to leukemia inhibitory factor | GO:1990823 | 1.06E-09 | 311 | 129 | 78 |
cellular response to leukemia inhibitory factor | GO:1990830 | 1.06E-09 | 311 | 129 | 78 |
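The last two columns of Table 1 can be related by a simple observed-to-expected ratio. The sketch below computes that ratio for a few rows copied from the table. The ratio itself is not a column in Table 1, and the function name is ours. The p-values come from the enrichment analysis and are not recomputed here.

```python
# (GO term, expected genes, observed genes) copied from Table 1.
rows = [
    ("cellular process", 6516, 7781),
    ("cellular metabolic process", 3861, 5119),
    ("sensory perception of smell", 471, 17),
    ("response to pheromone", 43, 2),
]

def fold_enrichment(expected: int, observed: int) -> float:
    """Observed / expected gene count; above 1 means overrepresented."""
    return observed / expected

for term, expected, observed in rows:
    ratio = fold_enrichment(expected, observed)
    direction = "over" if ratio > 1 else "under"
    print(f"{term}: {ratio:.2f} ({direction}represented)")
```

A ratio far below 1, as for the olfactory terms, is what the text interprets as stronger conservation between the two species.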
Table 2
Polymorphism characterization in Peromyscus genes associated with autism in humans
Gene | SFARI Score | Upstream | 5' UTR | Silent | Missense | In-frame deletion | Splice region | Intronic | 3' UTR | Downstream |
Arid1b | 1S | 205 | | 30 | 6 | | 4 | 5881 | 13 | |
Adnp | 1S | 75 | | 4 | 3 | | | 175 | 24 | |
Ash1l | 1S | 205 | | 16 | 12 | | 3 | 1089 | 9 | 53 |
Chd8 | 1S | 155 | 2 | 26 | | | 11 | 663 | 11 | 42 |
Dyrk1a | 1S | 84 | | 4 | | | 1 | 888 | 19 | 82 |
Kmt2a | 1S | 100 | 2 | 39 | 15 | 1 | 5 | 896 | 19 | |
Tbr1 | 1 | 89 | | 2 | | | 1 | 83 | 1 | |
The P. maniculatus laboratory stock, BW, is known to have a significant incidence of repetitive, or stereotypic, behavior, including repetitive jumping (13, 20, 21). The P. polionotus laboratory stock, PO, does not display stereotypic behaviors. In humans, repetitive movements are associated with autism and obsessive-compulsive disorder (15). BW animals are also less social than PO animals, another hallmark of autism in humans (13, 22). We examined the list of overrepresented GO terms for processes that may be related to autism-associated behaviors and found that locomotory behavior and social behavior are both enriched GO terms (p = 1.94 × 10⁻⁵ and 6.79 × 10⁻⁴, respectively) (Online Resource Table 1). In addition, we selected a list of candidate genes that all have a high confidence of being associated with autism in humans from the Simons Foundation Autism Research Initiative (SFARI) database (Table 2). We then identified sequence variations that occur between P. maniculatus and P. polionotus in this list of autism candidate genes. Each gene analyzed has multiple sequence variations between the two species that could result in a functional change to the protein, including missense changes, nonsense changes, in-frame deletions, and nucleotide variations in splicing regions. In addition, there are numerous differences in untranslated regions, introns, and upstream and downstream sequences that could result in differences in transcript or protein levels, if they occur in regulatory regions.
We examined potential functional differences in one autism candidate gene, ASH1L, a chromatin modifying protein that is associated with transcriptional activation (23). We aligned select mammalian ASH1L protein sequences to identify highly conserved amino acids and to determine whether Peromyscus amino acid substitutions occur at these conserved residues (Fig. 2). None of the nonsynonymous changes between P. maniculatus and P. polionotus ASH1L fall within the conserved ASH1L protein domains (SET, BROMO, PHD, and BAH); instead, they generally occur in regions with less conservation among mammalian ASH1L proteins. However, at positions 61, 484, 770, 1632, and 2814 there are amino acid substitutions in one Peromyscus species where the amino acid is conserved in the other mammals. The S484A, S770P, and T1632P substitutions are intriguing as they remove potential phosphorylation sites in one of the Peromyscus species. The functional impact of these amino acid substitutions on ASH1L will require further characterization.
We also examined possible transcriptional regulatory changes between P. maniculatus and P. polionotus for ASH1L by generating a VISTA plot to identify conserved non-coding sequences (CNS) in the ASH1L locus (24). A conserved non-coding sequence occurs in intron 3 of ASH1L in 100 vertebrate species (UCSC Genome Browser: Human GRCh38/hg38 chromosome 1: 155,459,751 − 155,478,012) and is also found in BW and PO (Fig. 3) (25). Within this CNS there are three SNPs between P. maniculatus and P. polionotus. Two of the three Peromyscus SNPs are in positions that are not conserved among a group of seven mammalian species. However, one SNP occurs in a region of 16 nucleotides that are completely conserved within the selection of mammalian species (Fig. 3c). We used PROMO to identify potential transcription factor binding sites within this region and found that in six mammalian species, including P. maniculatus, the conserved sequences contain a potential NKX2-1 binding site (26, 27). However, in P. polionotus the SNP removes the NKX2-1 binding site and generates a potential EBF1 binding site.
Restriction Enzyme Recognition Sites in P. maniculatus
A QTL analysis or GWAS using Peromyscus is likely to utilize RADseq. RADseq is a flexible approach to genomic analysis, as the choice of restriction enzyme used to digest the genomic DNA can be varied to customize the number of sequenced sites, known as RAD markers, across the genome (28). The number of RAD markers generated is twice the number of restriction enzyme recognition sites. An enzyme that cuts more frequently will generate more RAD markers and, therefore, provide more allelic information than an enzyme that cuts less frequently. We used the P. maniculatus reference genome to determine the number of cut sites and the average fragment size for the enzymes listed in Table 3. These data provide a range of restriction enzymes with recognition sites from approximately 1000 bp apart (DraI) to approximately 1 million bp apart (AscI), enabling an informed choice for restriction enzyme selection in P. maniculatus RADseq projects. RADseq generates about 400 bp of sequence information flanking a restriction enzyme recognition site. Because a sequence variant between P. maniculatus and P. polionotus occurs approximately every 68 base pairs, it is likely that RADseq analysis on F1 hybrids will generate informative allelic information at most RAD markers.
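The site counting described above can be sketched as a simple in silico digest. This is a minimal illustration and not the pipeline used for the analysis. The function names and the toy sequence are hypothetical, and the average fragment size is approximated as sequence length divided by the number of sites. Only palindromic recognition sites are handled, so searching one strand is sufficient.

```python
import re

# Recognition sequences for the two enzymes at the extremes of the range.
SITES = {"DraI": "TTTAAA", "AscI": "GGCGCGCC"}

def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site (overlaps included)."""
    pattern = re.compile(f"(?={re.escape(site)})")
    return len(pattern.findall(sequence.upper()))

def digest_summary(sequence: str, site: str) -> dict:
    """Summarize a digest: site count, RAD markers, approximate fragment size."""
    n = count_sites(sequence, site)
    return {
        "sites": n,
        "rad_markers": 2 * n,  # two markers flank each recognition site
        "avg_fragment_bp": len(sequence) / n if n else None,
    }

# Toy sequence (hypothetical) with two DraI sites and one AscI site.
toy = "GATTTAAACC" + "GGCGCGCC" + "TTTTAAAGG"
for enzyme, site in SITES.items():
    print(enzyme, digest_summary(toy, site))
```

On a real assembly the same function would be applied per scaffold and the counts summed.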
Table 3
Number of restriction enzyme recognition sites and average fragment size for P. maniculatus and similarity index (SI) for Peromyscus, Mus, and Drosophila
Restriction Enzyme | CpG | Number of Recognition sites in Peromyscus | Average fragment size in Peromyscus | Peromyscus SI | Mus SI | Drosophila SI |
ApaI | Yes | 384,617 | 6,431 | -0.74 | -0.82 | -1.66 |
AscI | Yes | 2,460 | 1,005,504 | -4.03 | -3.92 | -1.50 |
AvrII | No | 482,638 | 5,125 | -0.41 | -0.14 | -2.26 |
BamHI | No | 410,386 | 6,027 | -0.65 | -0.23 | -0.61 |
BspQI | No | 457,484 | 5,406 | 1.51 | 1.59 | 1.22 |
BssHII | Yes | 115,555 | 21,405 | -2.47 | -2.88 | -1.31 |
DraI | No | 2,508,985 | 986 | 1.97 | 1.97 | 2.19 |
EagI | Yes | 47,415 | 52,167 | -3.76 | -3.99 | -1.20 |
EcoRI | No | 701,369 | 3,527 | 0.13 | 0.25 | 0.24 |
FseI | Yes | 13,191 | 187,517 | -1.61 | -1.97 | -1.26 |
HindIII | No | 809,283 | 3,056 | 0.33 | 0.40 | 0.24 |
NaeI | Yes | 105,993 | 23,336 | -2.60 | -2.86 | -0.27 |
NarI | Yes | 89,075 | 27,769 | -2.85 | -3.03 | -0.64 |
NheI | No | 377,464 | 6,553 | -0.77 | -0.86 | -1.05 |
NotI | Yes | 5,842 | 423,406 | -2.78 | -2.66 | -0.51 |
PacI | No | 146,207 | 16,918 | 1.86 | 1.83 | 3.00 |
PmeI | No | 34,572 | 71,547 | -4.22 | -4.20 | -3.62 |
RsrII | Yes | 10,111 | 244,638 | -5.99 | -5.63 | -2.73 |
SacI | No | 608,101 | 4,067 | -0.08 | 0.05 | -0.31 |
SacII | Yes | 47,658 | 51,901 | -3.75 | -3.97 | -1.61 |
SalI | Yes | 33,458 | 73,930 | -4.26 | -4.21 | -0.96 |
SbfI | No | 78,368 | 31,563 | 0.97 | 0.61 | -0.04 |
SgrAI | Yes | 13,574 | 182,226 | -1.56 | -1.52 | 1.48 |
SmaI | Yes | 200,696 | 12,324 | -1.68 | -1.84 | -1.78 |
SpeI | No | 370,097 | 6,683 | -0.80 | -0.84 | -1.19 |
SphI | No | 654,548 | 3,779 | 0.03 | -0.09 | -0.25 |
SspI | No | 1,559,294 | 1,586 | 1.28 | 1.33 | 1.99 |
SwaI | No | 166,068 | 14,894 | 2.05 | 1.99 | 3.25 |
XbaI | No | 748,330 | 3,305 | 0.22 | 0.36 | -0.77 |
XhoI | Yes | 109,564 | 22,576 | -2.55 | -2.31 | -0.73 |
A Similarity Index (SI) indicates whether the observed number of recognition sites for a specific restriction enzyme differs from the expected number (29). SI values near zero indicate that the observed number of recognition sites does not differ greatly from the expected number. Negative values indicate fewer observed recognition sites than expected, while positive values indicate a more frequent occurrence than expected. The frequency of restriction enzyme recognition sites within a genome broadly corresponds with phylogeny (29). Using the reference genomes for Mus and Drosophila, we determined the number of restriction enzyme recognition sites for the list of selected restriction enzymes used in the P. maniculatus analysis and calculated an SI for each enzyme in all three species (Fig. 4). Using a single-factor ANOVA with post hoc Tukey HSD, we then compared the mean SI in each species and found that there is no statistically significant difference in the frequency of restriction enzyme recognition sites (F (2,84) = 3.1, p = 0.09). The selected restriction enzymes can be divided into those that contain a CpG within their recognition site and those that do not. Within our group of restriction enzymes, the presence of a CpG in the recognition site is correlated with a lower SI (p = 2.62 × 10⁻¹⁴). Because CpG dinucleotides appear less frequently in the genome than expected (30), restriction enzymes that contain a CpG in their recognition site are expected to have negative SI values. We compared SI between species based on the presence or absence of a CpG in the restriction enzyme recognition site, using a single-factor ANOVA, and found that SI for non-CpG restriction enzymes was not statistically significantly different between species (F (2, 42) = 3.2, p = 0.87), while it was statistically significantly different for CpG-recognizing restriction enzymes (F (2, 39) = 3.2, p = 1.1 × 10⁻⁴).
Post hoc Tukey HSD pairwise comparisons between the three species for CpG-containing restriction enzymes determined that Peromyscus and Mus each have a mean SI that is statistically significantly different from that of Drosophila (Peromyscus and Drosophila p = 0.0053, Mus and Drosophila p = 0.019) but are not statistically significantly different from each other (Peromyscus and Mus p = 0.90). Because CpG dinucleotides occur in CpG islands, which are typically associated with transcriptional regulation, the difference likely reflects differences in transcriptional regulation between more closely and more distantly related species.
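The grouping by CpG content can be illustrated with the SI values in Table 3. The sketch below only computes group means per species. It does not reproduce the ANOVA or Tukey HSD tests, and the names `TABLE3` and `mean_si` are ours. The CpG flags are taken from Table 3 as given.

```python
from statistics import mean

SPECIES = ("Peromyscus", "Mus", "Drosophila")

# (enzyme, CpG in site, (Peromyscus SI, Mus SI, Drosophila SI)) from Table 3.
TABLE3 = [
    ("ApaI", True, (-0.74, -0.82, -1.66)), ("AscI", True, (-4.03, -3.92, -1.50)),
    ("AvrII", False, (-0.41, -0.14, -2.26)), ("BamHI", False, (-0.65, -0.23, -0.61)),
    ("BspQI", False, (1.51, 1.59, 1.22)), ("BssHII", True, (-2.47, -2.88, -1.31)),
    ("DraI", False, (1.97, 1.97, 2.19)), ("EagI", True, (-3.76, -3.99, -1.20)),
    ("EcoRI", False, (0.13, 0.25, 0.24)), ("FseI", True, (-1.61, -1.97, -1.26)),
    ("HindIII", False, (0.33, 0.40, 0.24)), ("NaeI", True, (-2.60, -2.86, -0.27)),
    ("NarI", True, (-2.85, -3.03, -0.64)), ("NheI", False, (-0.77, -0.86, -1.05)),
    ("NotI", True, (-2.78, -2.66, -0.51)), ("PacI", False, (1.86, 1.83, 3.00)),
    ("PmeI", False, (-4.22, -4.20, -3.62)), ("RsrII", True, (-5.99, -5.63, -2.73)),
    ("SacI", False, (-0.08, 0.05, -0.31)), ("SacII", True, (-3.75, -3.97, -1.61)),
    ("SalI", True, (-4.26, -4.21, -0.96)), ("SbfI", False, (0.97, 0.61, -0.04)),
    ("SgrAI", True, (-1.56, -1.52, 1.48)), ("SmaI", True, (-1.68, -1.84, -1.78)),
    ("SpeI", False, (-0.80, -0.84, -1.19)), ("SphI", False, (0.03, -0.09, -0.25)),
    ("SspI", False, (1.28, 1.33, 1.99)), ("SwaI", False, (2.05, 1.99, 3.25)),
    ("XbaI", False, (0.22, 0.36, -0.77)), ("XhoI", True, (-2.55, -2.31, -0.73)),
]

def mean_si(has_cpg: bool) -> dict:
    """Mean SI per species for enzymes with or without a CpG in their site."""
    rows = [si for _, cpg, si in TABLE3 if cpg == has_cpg]
    return {sp: mean(r[i] for r in rows) for i, sp in enumerate(SPECIES)}

cpg_means = mean_si(True)
non_cpg_means = mean_si(False)
for sp in SPECIES:
    print(f"{sp}: CpG {cpg_means[sp]:.2f}, non-CpG {non_cpg_means[sp]:.2f}")
```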
Linkage Analysis of Dominant spot
Dominant spot is a spontaneous mutation that arose within a wild population of P. maniculatus near Morrison, Illinois (1, 31). The Dominant spot trait (S) has been maintained on the BW laboratory stock of P. maniculatus at the PGSC (Fig. 5a). We sought to perform linkage analysis to identify genetic loci associated with Dominant spot by crossing BW S/+ with the PO laboratory stock of P. polionotus (+/+). F1 hybrids of P. maniculatus and P. polionotus exhibit developmental dysgenesis (1, 32). When female P. maniculatus are crossed with male P. polionotus, the hybrid offspring are smaller than either parent, but are viable and fertile. In contrast, crosses of female P. polionotus with male P. maniculatus produce overgrown fetuses with developmental defects that are not viable. Therefore, male +/+ PO were crossed with S/+ BW to generate F1 hybrids. S/+ offspring, identifiable by the white spot on the forehead (Fig. 5b), were then backcrossed to PO to generate an N2 generation (Fig. 5c).
Disrupted pigmentation patterns in laboratory mice, Mus musculus, are readily identifiable, and characterization of the causative mutations for these spotting defects has identified key members of a neural crest gene regulatory network necessary for normal neural crest development (33). We pursued a candidate gene approach as a first step towards linking Dominant spot with a specific genomic region. Edn3, Ednrb, Kit, Kitl, Mitf, Pax3, Ret, Snai2, and Sox10 are all known to cause spotting phenotypes in M. musculus; therefore, we sought to identify allelic differences in P. maniculatus and P. polionotus for each gene to determine if any of these candidate genes are linked with Dominant spot. For each candidate gene, we identified a sequence variant that removes a restriction enzyme recognition site in one Peromyscus species. We will call these sites restriction fragment length polymorphisms (RFLPs) because of their similarity to the technique used for genomic variation analysis (Table 4). We then designed polymerase chain reaction (PCR) primers flanking the site to generate an RFLP site-specific amplicon. BW and PO genomic DNA, along with genomic DNA from S/+ N2 animals, was PCR amplified and then digested with the appropriate restriction enzyme (Fig. 6). The S mutation arose in P. maniculatus and has been maintained on the BW stock; therefore, the S mutation occurs in a BW allele. If a candidate gene is linked with the Dominant spot trait, then all S/+ N2 animals will have both a BW allele and a PO allele for that candidate gene. If an S/+ N2 animal has only PO alleles for a candidate gene, then that candidate gene is not linked with the Dominant spot trait. Eight of the nine candidate genes are not linked with Dominant spot, as there are multiple S/+ N2 individuals with only the PO allele.
However, Sox10 is linked with Dominant spot; 29 S/+ N2 individuals were genotyped at the Sox10 RFLP site, and all 29 are BW/PO (χ² (1, N = 29) = 28.1, p = 1.2 × 10⁻⁷). By employing the same RFLP analysis, we have identified a 1.7 Mb region between Tex33 and Pdgfb on chromosome 20 that is linked with Dominant spot (data not shown). Among the 53 genes contained in the linkage interval, only Sox10 has a defined role in neural crest development. Therefore, we favor the possibility that the S mutation disrupts Sox10 function.
Table 4
Polymorphisms that generate RFLPs between BW and PO in candidate genes that cause spotting in Mus
Candidate Gene | Contig | PCR Amplicon Location | Polymorphism Location | BW sequence | PO Sequence | Forward Primer | Reverse Primer |
Edn3 | NW_006501107 | 1645861..1646386 | 1646160 | AGGCCT (StuI) | AGTCCT | CTCGAGAACCTTGGGATTCA | AACAGGGTCTCCTGCAGTGT |
Ednrb* | NW_006501134 | 1664283..1664495 | 1664389 | CCGG (MspI) | CCAG | ATGACGCCACCCACTAAGAC | GATGATGCCTAGCACGAACA |
Kit | NW_006501162 | 6362706..6363108 | 6362907..6362919 | CCGTGGTACCTCTGCTCGGGA (KpnI) | CCGT//GGGA | CCCGTCCTAGCTTTGGAAC | AGCATCAGGGCAACCTTAAA |
Kitl | NW_006501158 | 227546..228022 | 227668 | GGATCC (BamHI) | GAATCC | CCCAATTAGCTGCTCTTCAAAC | GGAGCCTTTGTGTCTTATCAGTA |
Mitf | NW_006501059 | 77663262..7663281 | 7663508 | GTATGC | GCATGC (SphI) | GGATGAGACTCAGGGTGAGG | GCTCCATCACTCGGCATTAT |
Pax3 | NW_006501055 | 3517274..3517631 | 3517515 | TTTAAA (DraI) | TTTCAA | CCTTGCCTACTACGCTCTGA | TAATTCTGCATCCTTCCGGC |
Ret | NW_006501668 | 491420..491911 | 491673 | AGGCCT (StuI) | AGACCT | GTTTCACCCTAGGAAGTTGTGG | GCCTCAGAAGCAGCCCTC |
Snai2 | NW_006502260 | 133647..133992 | 133912 | TTTAAA (DraI) | TTTAAG | CCAAAGTTGAAGGCTGTTGC | AGTCCATTGCTTTCACACCT |
Sox10 | NW_006501150 | 898573..898893 | 898801 | CCAC | CCGC (AciI) | GGCAGACTGAGGGAGGTGTA | GGAGATCAGCCACGAGGTAA |
* Polymorphism and primers identified by (55) and verified in our analysis |
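The RFLP logic behind Table 4 can be sketched as follows: an allele is cut if its sequence contains the enzyme's recognition site, and a candidate gene is excluded from linkage if any S/+ N2 animal carries only PO alleles. This is a minimal illustration using a subset of the table. The function names and genotype labels are ours, and only the strand shown in the table is searched.

```python
# Recognition sites for enzymes named in Table 4.
ENZYME_SITES = {
    "StuI": "AGGCCT", "MspI": "CCGG", "BamHI": "GGATCC",
    "SphI": "GCATGC", "DraI": "TTTAAA", "AciI": "CCGC",
}

# (gene, enzyme, BW sequence, PO sequence) copied from Table 4.
RFLPS = [
    ("Edn3", "StuI", "AGGCCT", "AGTCCT"),
    ("Ednrb", "MspI", "CCGG", "CCAG"),
    ("Kitl", "BamHI", "GGATCC", "GAATCC"),
    ("Mitf", "SphI", "GTATGC", "GCATGC"),
    ("Pax3", "DraI", "TTTAAA", "TTTCAA"),
    ("Sox10", "AciI", "CCAC", "CCGC"),
]

def cut_alleles(enzyme, bw_seq, po_seq):
    """Return which parental alleles carry the recognition site."""
    site = ENZYME_SITES[enzyme]
    return [name for name, seq in (("BW", bw_seq), ("PO", po_seq)) if site in seq]

def is_excluded(genotypes):
    """A candidate is excluded if any S/+ N2 animal has only PO alleles."""
    return any(g == "PO/PO" for g in genotypes)

for gene, enzyme, bw, po in RFLPS:
    print(gene, enzyme, "cuts", cut_alleles(enzyme, bw, po))

# Sox10: all 29 genotyped S/+ N2 animals were BW/PO, so it is not excluded.
print(is_excluded(["BW/PO"] * 29))
```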
In generating the N2 generation, we noticed that the spot size was smaller on the F1 and N2 animals compared to the originating BW background. The average spot size for 25 S/+ BW animals is 77.6 ± 36.6 mm². Six S/+ F1 animals have very small spots in comparison, 3.75 ± 1.56 mm², suggesting that PO alleles have a dominant effect on the S/+ spot size phenotype. In the PO N2 generation the spot size for 45 affected animals averaged 15.4 ± 12.9 mm², which is significantly smaller than the spot size of S/+ BW animals (Welch’s t (27) = 8.23, p = 7.14 × 10⁻⁹), suggesting that genetic background has a significant impact on the S/+ phenotype. A histogram of spot size for S/+ on BW and PO N2 illustrates the quantitative nature of the phenotype and the shift in spot size in the PO N2 animals (Fig. 7). We used the backcross data to estimate the number of loci that affect spot size. The six F1 animals with small spots produced 40 offspring, of which 12 (30%) resembled the F1 parent. In this backcross experiment there are only two possible genotypes for each gene. If two unlinked loci determine the spot size phenotype, then there are four possible genotypes and 25% of the offspring are expected to resemble the F1 parent. The observed number of N2 offspring with the F1 phenotype fits a model of two loci interacting with the Dominant spot mutation (χ² (1, N = 40) = 0.25, p = 0.47); that is, the observed count does not differ significantly from the two-locus expectation. Models for one (χ² (1, N = 40) = 6.4, p = 0.011) or three interacting loci (χ² (1, N = 40) = 11.2, p = 0.008) are statistically significantly different from the observed data.
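The spot size comparison can be approximated from the summary statistics alone. The sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom, assuming the ± values are standard deviations (the text does not say whether they are standard deviations or standard errors). The function name is ours, and small differences from the reported statistic are expected because the inputs are rounded.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# S/+ BW (n = 25) versus S/+ PO N2 (n = 45); spot sizes in mm^2.
t_stat, df = welch_t(77.6, 36.6, 25, 15.4, 12.9, 45)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```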
Further analysis indicates that there is a significant loss of affected animals in the N2 backcross. A total of 154 offspring were produced in the N2 generation. Of these animals, only 57 pups had forehead spots, representing the S/+ genotype, while 77 were expected (χ² (1, N = 154) = 10.4, p = 0.0013). Analysis of the PGSC breeding records for BW S/+ × BW +/+ indicates that the S/+ genotype is produced at the expected frequency (310 total offspring, of which 165 have spots; χ² (1, N = 310) = 1.29, p = 0.26). These results suggest that there is a significant loss of the S/+ phenotype in the PO N2 offspring, resulting either from lethality of S/+ in PO N2 animals or from phenotypic rescue in some PO N2 S/+ animals, which then lack a forehead spot. We are currently sequencing the linked region to identify the causative mutation and conducting a QTL analysis to identify loci associated with the spot size phenotype.
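The segregation tests above can be sketched as a chi-square goodness-of-fit test with one degree of freedom, assuming a 1:1 expected ratio of S/+ to +/+ offspring and no continuity correction. The function name is ours. The p-value uses the identity that, for one degree of freedom, the upper tail probability equals erfc(sqrt(χ²/2)).

```python
import math

def chi_square_1df(observed_a, total, expected_fraction=0.5):
    """Two-class chi-square goodness of fit (1 df) with its p-value."""
    expected_a = total * expected_fraction
    expected_b = total - expected_a
    observed_b = total - observed_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail for 1 df
    return chi2, p

# PO N2 backcross: 57 spotted pups out of 154.
n2_chi2, n2_p = chi_square_1df(57, 154)
# BW S/+ x BW +/+ breeding records: 165 spotted out of 310.
bw_chi2, bw_p = chi_square_1df(165, 310)

print(f"N2: chi2 = {n2_chi2:.2f}, p = {n2_p:.4f}")
print(f"BW: chi2 = {bw_chi2:.2f}, p = {bw_p:.2f}")
```

Passing a different `expected_fraction` gives the same test for other Mendelian expectations in a backcross.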