PolG mutation spectra pipeline
Raw Illumina sequencing reads were aligned to the mouse mt genome. After mutations were called, they were separated into germline and somatic mutations (Fig. 1A). Mutations were analyzed using two metrics: mutation count, which evaluates whether a mutation is present at each reference base pair, and mutation frequency, which measures the prevalence of each mutation (Fig. 1B).
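The distinction between the two metrics can be illustrated with a minimal Python sketch. It assumes that per-site variant and total read counts are already available. The function name, the input layout, and the example numbers are hypothetical, and the sketch is not the code used in the pipeline.

```python
def mutation_metrics(site_reads):
    """Compute mutation count and per-site mutation frequency for one sample.

    site_reads maps a reference position to (variant_reads, total_reads).
    This input layout is an assumption made for illustration.
    """
    frequencies = {}
    for position, (variant_reads, total_reads) in site_reads.items():
        if total_reads > 0 and variant_reads > 0:
            # Frequency: fraction of reads at this site carrying the variant.
            frequencies[position] = variant_reads / total_reads
    # Count: number of reference positions where a mutation is present.
    mutation_count = len(frequencies)
    return mutation_count, frequencies


# Hypothetical read counts at three positions, each covered 10,000 times.
example = {100: (500, 10000), 200: (0, 10000), 300: (2500, 10000)}
count, freqs = mutation_metrics(example)
print(count, freqs)
```

In this example, positions 100 and 300 each add one to the mutation count even though their frequencies differ fivefold, which is why the two metrics can diverge. A real variant caller would also apply quality and detection thresholds that are omitted here.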
PolG mtDNA mutations are more common in liver tissue.
PolG mice accumulated many more mtDNA mutations than wild type mice (Fig. 2A, 2B). At an mtDNA coverage of about 10,000, only four point mutations were found in the 3 liver samples from wild type mice (fixed polymorphisms compared to the reference genome were not included), while about 300 mutations were found on average per PolG mouse liver sample. Importantly, the wild type mice used here were never introduced into the PolG line, unlike in previous studies that used homozygous negative PolG mice, sometimes referred to as “wild-type”, as controls (12, 18, 40). When all mutations were summed, mutation counts and frequencies were 106% and 24% higher, respectively, in liver tissue than in brain tissue (Fig. 2A; t(13) = 6.73, p < 0.001; Fig. 2B; t(13) = 6.06, p < 0.001). Therefore, we report only liver mutations in the main text; results from brain samples showed the same trends and are presented in the Supplementary Figures (Supplementary Fig. 1–4, 5, 7, 8) [Additional File 1]. In liver tissue of PolG mice, somatic mutation counts were about 1.5 times higher than germline mutation counts (Fig. 2A; t = 6.212, p < 0.001), but mutation frequencies were about 50% lower for somatic mutations (Fig. 2B; t = -5.217, p < 0.001).
Mutations are abundant throughout the mtDNA except in the D-loop.
MtDNA mutations were detected across the entire mitochondrial genome in PolG mice (Fig. 3A). Mutation frequency and mutation count tended to track each other, but the two measures appeared to diverge in some regions (e.g., ATP6 and CYTB; Fig. 3A).
Mutation counts in the tRNAs (Fig. 3B, 3C; t = -1.855, p = 0.067) and rRNAs (Fig. 3B, 3C; t = -0.460, p = 0.646) did not differ significantly from those in the protein coding regions (CDS), but the D-loop had 79% fewer mutations than the CDS (Fig. 3B, 3C; t = -11.805, p < 0.001). A similar pattern among the mtDNA regions was found for mutation frequency: the D-loop was lowest (Fig. 3D; t = -14.226, p < 0.001), with an 82% lower average mtDNA mutation frequency than the CDS, but unlike mutation count, the tRNA region had a 35% higher mutation frequency than the CDS (Fig. 3D, 3E; t = 4.421, p < 0.001). There was a significant negative interaction between mutation type (germline vs. somatic) and location because germline, but not somatic, tRNA mutations tended to rise to high frequencies (Fig. 3E; t = -2.007, interaction p = 0.048). Additionally, rRNA mutation frequency was 31% lower than that of the CDS region (Fig. 3E; t = -2.679, p = 0.009). Most animals had no detectable D-loop germline mutations (Fig. 3C, 3E). Overall, these results suggest that the D-loop is depleted of mtDNA mutations in the PolG mouse, and that the other regions carry about the same number of mtDNA mutations but differ significantly in mutation frequency.
Missense mtDNA mutations are abundant, but rarely inherited.
For the combined CDS, we evaluated how mutations affected the resulting amino acids. Nonsense mutations were rarest in both count and frequency (Fig. 4A; t = -13.999, p < 0.001; Fig. 4D; t = -14.922, p < 0.001). On average, missense mutations had twice the mutation count of silent mutations (Fig. 4A; t = 2.568, p = 0.013), but silent and missense mutations showed similar mutation frequencies (Fig. 4D; t = -1.427, p = 0.158). For both mutation counts and mutation frequencies, there was a positive interaction between somatic/germline and silent/missense, such that somatic mutations were more likely to be missense than germline mutations (Fig. 4B; t = 2.889, interaction p = 0.005; Fig. 4E; t = 3.536, interaction p < 0.001).
There was no effect of codon position on mutation count (Fig. 4C; t = 1.479, p = 0.180), but there was a significant effect on mutation frequency (Fig. 4F; t = 17.160, p < 0.001), such that mutations in codon position 3 had a 90% higher frequency than those in positions 1 and 2 (p < 0.001 for both). Overall, these results suggest that somatic mutations are more likely to be missense than germline mutations, and that although all three codon positions are equally likely to mutate, mutations in the third codon position of CDS regions rise to higher frequencies.
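As an illustration of how a mutated site can be assigned to a codon position, the following Python sketch derives the position from the coordinates of the enclosing CDS. The function name and the coordinates are hypothetical, positions are assumed to be 1-based and inclusive, and the sketch is not the annotation procedure used in this study.

```python
def codon_position(pos, cds_start, cds_end, strand="+"):
    """Return the codon position (1, 2, or 3) of a genome position in a CDS.

    Coordinates are 1-based and inclusive. For a gene on the minus strand,
    the coding sequence starts at cds_end and runs toward cds_start.
    """
    if not cds_start <= pos <= cds_end:
        raise ValueError("position lies outside the CDS")
    # Offset from the first coding base, read in the 5' to 3' direction.
    offset = pos - cds_start if strand == "+" else cds_end - pos
    return offset % 3 + 1


# Hypothetical CDS spanning positions 100 to 198 (33 codons).
print(codon_position(102, 100, 198))        # third base of the first codon
print(codon_position(198, 100, 198, "-"))   # first coding base on the minus strand
```

The strand argument matters because a gene encoded on the opposite strand is read in the reverse direction, so its first codon starts at the highest coordinate.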
C to T (G to A) transition mutations dominate the PolG mutation spectra, contributing to an increase in hydrophobic amino acids.
There was a significant effect of base pair substitution type for both mutation count (Fig. 5A; F = 152, p < 0.001) and frequency (Fig. 5B; F = 162, p < 0.001). C to T (G to A) base pair transitions were the most abundant type of single base pair point mutation, showing a 3 times higher mutation count and a 2 times higher mutation frequency than T to C (A to G) mutations, the second most frequent base pair change (Fig. 5A, 5B; p < 0.001 for all pairwise comparisons with C to T (G to A)). All other types of point mutations were also detected, although C to G (G to C) and T to G (A to C) mutations were exceedingly rare in our data (only 8 and 14 total mutations detected across all liver samples, respectively) and were not considered in analyses of amino acid changes.
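The pairing of complementary changes, such as C to T with G to A, can be made explicit with a short Python sketch that collapses the twelve possible substitutions into six strand-independent classes. The function names and the example calls are hypothetical, and this is an illustration of the grouping rather than the code used for the analysis.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution into its strand-independent class.

    Changes from a purine reference base (A or G) are reported as the
    complementary change from a pyrimidine, so G>A becomes C>T.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(calls):
    """Tally (ref, alt) point mutations by substitution class."""
    return Counter(substitution_class(ref, alt) for ref, alt in calls)


# Hypothetical point mutation calls.
calls = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "C"), ("G", "C")]
spectrum = mutation_spectrum(calls)
print(spectrum)
```

With this convention, the classes correspond to the labels used in the text: C>T for C to T (G to A), T>C for T to C (A to G), C>G for C to G (G to C), and T>G for T to G (A to C).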
Considering only missense mutations, those involving a change between hydrophilic and hydrophobic amino acids were the most common when examining amino acid properties (Supplementary Fig. S6, S7) [Additional File 1]. In mixed linear models for mutation count and frequency that included only hydrophobic and hydrophilic changes in C to T mutations, there was no significant effect of the initial state of the amino acid (Fig. 5C; t = -2.018, p = 0.0523; Fig. 5D; t = -0.242, p = 0.810) (i.e., hydrophilic and hydrophobic reference amino acids were equally likely to mutate), but for both mutation count and frequency, the mutated amino acid was more likely to be hydrophobic (Fig. 5C; t = 7.023, p < 0.001; Fig. 5D; t = 6.145, p < 0.001). Taken together, PolG mutations are primarily C to T (G to A) transitions, which tend to increase the hydrophobicity of protein products.
Indels are less common than point mutations and tend to be small deletions in PolG mice.
Indels were less abundant than point mutations; they were still distributed throughout the mtDNA, but were less abundant in the D-loop (Fig. 6A). Wild type mice also had a very low indel count and frequency compared to the PolG mice, as an average of 4 indels were called per wild type mouse compared to about 70 per PolG mouse (Fig. 6B, 6C). Unlike point mutations, indels showed no significant difference between germline and somatic mutation count (Fig. 6B; t = 1.149, p = 0.261). Similar to point mutations, somatic indels were found at approximately half the frequency of germline indels (Fig. 6C; t = -3.951, p < 0.001). PolG indels are primarily small frameshift deletions, with 850 of the almost 1000 indels being deletions (Fig. 6D). At random, we would expect close to 33% of CDS indels to have a length that is a multiple of 3 and therefore not cause a frameshift, yet only 7% of the CDS indels were not frameshift mutations. Overall, indels are more prevalent in PolG mice than in wild type mice, but PolG mice carry fewer indels than point mutations, and non-frameshift indels are underrepresented.
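The frameshift criterion can be sketched in Python as follows. The sketch assumes that indels are given as reference and alternate allele strings in the style of a VCF record. The function names and the example alleles are hypothetical, and this is not the code used in the study.

```python
def indel_kind(ref_allele, alt_allele):
    """Label an indel as a deletion or an insertion from its allele lengths."""
    if len(alt_allele) == len(ref_allele):
        raise ValueError("alleles have equal length, so this is not an indel")
    return "deletion" if len(alt_allele) < len(ref_allele) else "insertion"


def is_frameshift(ref_allele, alt_allele):
    """Return True when the length change is not a multiple of 3."""
    length_change = len(alt_allele) - len(ref_allele)
    if length_change == 0:
        raise ValueError("alleles have equal length, so this is not an indel")
    return length_change % 3 != 0


def in_frame_fraction(indels):
    """Fraction of (ref, alt) indels that preserve the reading frame."""
    in_frame = sum(not is_frameshift(ref, alt) for ref, alt in indels)
    return in_frame / len(indels)


# Hypothetical CDS indels with length changes of -1, -3, +2, and +3 bases.
indels = [("AT", "A"), ("ACGT", "A"), ("A", "AGG"), ("A", "ACCC")]
print([indel_kind(ref, alt) for ref, alt in indels])
print(in_frame_fraction(indels))
```

If indel lengths were evenly distributed across the three possible remainders after division by 3, the in-frame fraction would be close to one third, which is the random expectation referred to above.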