Somatic mutations in single human cardiomyocytes demonstrate accelerated age-related DNA damage and cell fusion


 The accumulation of somatic DNA mutations is a hallmark of aging in many dividing cells and contributes to carcinogenesis. Here we survey the landscape of somatic single-nucleotide variants (sSNVs) in heart muscle cells (cardiomyocytes) which normally do not proliferate but often become polyploid with age. Using single-cell whole-genome sequencing we analyzed sSNVs from 48 single cardiomyocytes from 10 healthy individuals (ages 0.4 - 82 yrs.). Cardiomyocyte sSNVs increased strikingly with age, at rates faster than reported in many dividing cells, or in non-dividing neurons. Analysis of nucleotide substitution patterns revealed age-related “clock-like” mutational signatures resembling those previously described. However, cardiomyocytes showed distinct mutational signatures that are rare or absent in other cells, implicating failed nucleotide excision repair of oxidative damage and defective mismatch repair (MMR) during aging. A lineage tree of cardiomyocytes, constructed using clonal sSNVs, revealed that some tetraploid (10%) and most cardiomyocytes with higher ploidy (>60%) derive from distinct clonal origins, implicating cell fusion as a mechanism contributing to many polyploid cardiomyocytes. Since age-accumulated sSNVs create dozens of damaging exonic mutations, cell fusion to form multiploid cardiomyocytes may represent an evolutionary mechanism of cellular genetic compensation that minimizes complete knockout of essential genes during aging. The rates and patterns of accumulation of cardiac mutations provide a paradigm to understand the influence of genomic aging on age-related heart disease.


Introductory Paragraph:
The accumulation of somatic DNA mutations is a hallmark of aging in many dividing cells and contributes to carcinogenesis 1-7 , but non-cycling cells have been thought to be protected from mutation. Here we survey the landscape of somatic single-nucleotide variants (sSNVs) in heart muscle cells (cardiomyocytes) which normally do not proliferate [8][9][10][11] but often become polyploid with age [12][13][14] . Using single-cell whole-genome sequencing 1,15,16 we analyzed sSNVs from 48 single cardiomyocytes from 10 healthy individuals (ages 0.4 -82 yrs.). Cardiomyocyte sSNVs increased strikingly with age, at rates faster than reported in many dividing cells, or in nondividing neurons 1 5,17,18 . Analysis of nucleotide substitution patterns revealed age-related "clock-like" mutational signatures resembling those previously described 15 19,20 . However, cardiomyocytes showed distinct mutational signatures that are rare or absent in other cells, implicating failed nucleotide excision repair of oxidative damage and defective mismatch repair (MMR) during aging. A lineage tree of cardiomyocytes, constructed using clonal sSNVs, revealed that some tetraploid (10%) and most cardiomyocytes with higher ploidy (>60%) derive from distinct clonal origins, implicating cell fusion as a mechanism contributing to many polyploid cardiomyocytes. Since age-accumulated sSNVs create dozens of damaging exonic mutations, cell fusion to form multiploid cardiomyocytes may represent an evolutionary mechanism of cellular genetic compensation that minimizes complete knockout of essential genes during aging. The rates and patterns of accumulation of cardiac mutations provide a paradigm to understand the influence of genomic aging on age-related heart disease. of this strategy is quite challenging in humans. This prompted us to evaluate the mechanism of polyploidization and explore the landscape of sSNVs in aging human heart cardiomyocytes.

Cardiomyocyte polyploidization starts early in life and gradually increases with age
To understand the temporal dynamics of polyploidization in human cardiomyocytes, we systematically evaluated the ploidy of cardiomyocyte nuclei from more than 50 human heart tissue samples from individuals ranging from 44 days to 81 years of age. Previous studies have shown that human cardiomyocyte nuclei are mostly diploid at birth and start to become polyploid mainly in the second decade of life 8 . We purified cardiac nuclei from fresh-frozen, unfixed, human myocardium by density sedimentation 44 . Diploid cardiomyocyte nuclei were identified with cell-specific markers, cardiac troponin T. Polyploid cardiomyocyte nuclei were identified with the nuclear stain, 4′, 6-diamidino-2-phenylindole (DAPI) as well as cell-specific markers ( Figure 1A, upper). Our systemic analysis revealed that ~5% of cardiomyocyte nuclei were tetraploid in the 44 day-old heart sample. By 20 years of age, ~35% of cardiac nuclei are tetraploid, and ~5% of the cardiomyocytes were hexaploid or even higher in ploidy (hexaploid+).
The proportion of tetraploid and hexaploid+ cardiomyocytes significantly increased with age (p = 1.5×10 -15 and 3.4×10 -3 , respectively, linear regression, Figure 1B). Our data indicate that polyploidization does not start late in life, but polyploidization in cardiomyocytes is evident in neonatal heart and increases with age.

Cellular fusion is a significant driver of cardiomyocyte polyploidization
Numerous non-inherited somatic mutations record the unique history of each somatic cell originating from a zygote. The clonal structure of these somatic mutations enables us to reconstruct the developmental lineage of somatic cells 1 . In order to study the mechanism of polyploidization in human cardiomyocyte, we isolated a total of 482 single cardiomyocyte nuclei (182 diploid, 173 tetraploid, and 127 hexaploid+) from the postmortem left ventricle heart tissue of a 17-year-old donor using fluorescence-activated nuclear sorting (FANS). We then amplified the DNA from each nucleus by multiple displacement amplification (MDA) 16 Figure 1F, G) contained sSNVs from multiple clades. We observed three-or four-claded cells only in the hexaploid+ cardiomyocyte population ( Figure 1F, G). Since multiclade cells can only be explained by cell fusion rather than endoreplication, our results suggest that at least ~10% of tetraploid cardiomyocytes are generated from the fusion mechanism, and this proportion increases up to ~63% in cardiomyocytes with higher ploidy, after adjusting for the sensitivity of detecting sSNVs from different clades (see Methods, Supplementary Table S2).
We note that even this estimate likely under-states the frequency of cell fusion, due to its limitation in detecting fusing cells that originate from relatively closely related clones.

Nonclonal somatic mutations increase with age in cardiomyocytes
We evaluated the genome-burden of sSNVs using single-cell WGS of 40 tetraploid and 8 diploid single cardiomyocyte nuclei from postmortem hearts of young (<4 yrs.), middle-age (30-63 yrs.), and aged individuals (>75 yrs.) ( Figure 1A, bottom). Cardiomyocyte nuclei DNA was amplified using MDA 16 followed by quality control steps (see Methods) and high-coverage (>50X) WGS 15 for selected, well-amplified cells (Supplementary Table S3). We identified single-cell sSNVs and estimated their genome-wide burden for each cardiomyocyte (Supplementary Table S4) using a modified version of the LiRA 46 algorithm that takes cellspecific dropout rates and sequencing depth distributions into consideration to account for the tetraploidy effect on detection sensitivity and power calculation (see Methods). sSNVs originating from likely MDA amplification artifacts 47 , which show a distinctive nucleotide substitution pattern, were also excluded.
Tetraploid cardiomyocytes from hearts of aged individuals showed an average of ~11,500 sSNVs per tetraploid genome, compared to ~2,600 sSNVs per tetraploid genome from young individuals, with an increasing rate of ~118 ± 22 [s.e.] sSNVs per year (p = 2.0×10 -4 , linear regression; Figure 2A). The age-dependent increase of sSNVs remained significant even after adjusting for the unevenness of MDA amplification (Supplementary Figure S2). The SNVs were distributed broadly across the genome ( Figure 2B). Note that at each age group there are significant intra-individual and inter-individual variations, particularly in the aged group, where a few outlier nuclei showed very high numbers (>15,000 per cell) of sSNVs. We further confirmed the age-dependent trend in diploid cardiomyocytes, in which the aged heart showed an average of ~3,900 sSNVs per diploid cardiomyocyte genome compared to ~600 sSNVs in the young heart (p = 0.013, two-tailed T-test; Figure 2C). Our analysis revealed increasing somatic mutational burden in human heart muscle cells with age, regardless of cardiomyocyte nuclear ploidy.
Further we compared the sSNV accumulation rate in cardiomyocytes with that in neurons 1 , another non-dividing human cell type, and found that aged cardiomyocytes accumulated sSNVs much faster than neurons (~118 ± 22 [s.e.] sSNVs per year in tetraploid cardiomyocyte vs ~23 ± 7 [s.e.] sSNVs per year in neurons), after correcting for the unevenness of whole-genome amplification (p < 6.2×10 -10 for tetraploid cardiomyocytes vs neurons and p < 2.9×10 -3 for diploid cardiomyocytes vs neurons, linear regression; Figure 2D). Tetraploid cardiomyocytes showed a ~2.5-fold higher rate of sSNV increase per-diploid genome than neurons, after accounting for the doubled genomic size in tetraploid cardiomyocytes as compared with neurons. Our results suggest that cardiomyocytes and neurons have differing rates of mutational accumulation during aging.

Signature analysis identifies distinct mutational processes in aging cardiomyocytes
Patterns and types of DNA substitution mutations can be deconvoluted into "signatures" that provide important biological information about their potential causes 48 . Our mutational analysis from 40 tetraploid cardiac nuclei detected 8,197 base substitutions (Supplementary   Table S5). First, we compared the base substitution pattern of age-accumulated sSNVs by subtracting the sSNV profiles of young cardiomyocytes from aged cardiomyocytes, and observed that C>T and T>C mutations accumulated predominantly during the aging process ( Figure 3A).
To decipher mutational processes in heart aging, we performed non-negative matrix factorization 19 using the sSNV profiles of cardiomyocytes as described previously 1, 15 . Cardiac sSNVs were best fit into four distinct molecular signatures referred to as Signature A, B, C, and D in this study (Supplementary Figure S3). Signature B consisted mainly of C>T mutations, depleted at CpG dinucleotides, and matched closely with mutational signatures that were previously ascribed to artifacts created by MDA amplification 47 , and so this signature was removed and not considered further (Supplementary Figure S4). The remaining signatures ( Figure 3B) were similar to known signatures in cancer, neurons, and normal replicating cells suggesting that they may reflect important biological sources of mutations and distinct from any known artifact signatures.
Signature A mainly comprised C>T and T>C mutations and its contribution in cardiomyocytes increased with age. Signature A resembles age-related, "clock-like" signatures previously observed in many cancers 19,49 , normal cycling cells 20 , and single neurons 15 , proposed to reflect faulty repair of deamination of methylated cytosines to thymine that frequently occurs in the CpG context 50 , and aging-associated ( Figure 3A). Our data show a similar clock-like signature in post mitotic cardiomyocytes that is independent of cell division. Hierarchical clustering of Signature A showed that Signature A closely resembled COSMIC signature SBS5, known as an aging signature.
Signature C was distinct from Signature A due to the prominent enrichment of C>A mutations ( Figure 3B). C>A mutations often reflect increased 8-oxoguanine levels, created by oxidative DNA damage 51,52 , one of the most common threats to genome stability 53 . The heart is one of the most highly metabolic organs, with large oxidative demand resulting in reactive oxygen species (ROS) production 54 . During myocardial remodeling (aging or myocardial dysfunction), ROS production might be deregulated, producing 8-oxoguanine (8hydroxyguanine, 8-oxo-Gua, or OH8Gua), one of the most common DNA lesions 55 , and causing mismatched pairing with adenine resulting in G to T and C to A substitutions in the genome.
Oxygen radicals also react with 5-methylcytosine to result in C>T transitions. Signature C mutations could arise due to the defective nucleotide excision repair of oxidative lesions.
A third signature, Signature D was also enriched for C>T and T>C mutations but distinct from Signature A in its tri-nucleotide context ( Figure 3B Comparative signature analysis between cardiomyocytes and neurons-another nondividing cell-shows shared and distinct mutational signatures ( Figure 3D, E). sSNVs from cardiomyocytes and neurons share an indistinguishable clock-like aging Signature A with a similar rate of age-related increase in the distinct cell types, and also a similar Signature C enriched in C>A substitutions, that also modestly increases with age (Supplementary Figure S7).
On the other hand, the dramatic increase in the contribution of Signature D with age was observed only in cardiomyocytes, not in neurons ( Figure 3D). sSNVs that accumulated in aging cardiomyocytes showed broader substitution types with enrichment in untranscribed strands, such as T>C and T>G mutations than those in neurons, whereas age-accumulated sSNVs in neurons showed the enrichment of C>T and T>C mutations in transcribed strands ( Figure 3E). This putative MMR signature, which is relatively specific to aging cardiomyocytes, represents a distinct mutational process in the heart.

Potential source and mechanisms of mutation formation in the aging heart
To understand how mutations can be formed and accumulate during aging in the absence of cell cycling, it is important to recognize that both MMR and NER involve steps of exonuclease removal of a segment of DNA, followed by replication of the remaining strand to reconstitute double-stranded DNA. Hence, even nondividing cells are still undergoing DNA replication of small segments of the genome, so that errors of replication, or two nearby damaged based on opposite strands 61,62 , could result in fixed, double stranded mutations. Our analysis of mutational signatures ( Figure 3B, C) suggested a model where oxidative stress leads to an increased burden of damaged bases, which might overwhelm the MMR machinery of aged cardiomyocytes. Using GTEx expression data 63 , we observed a significant decrease in gene expression for the core components of the MMR complex (MLH1, MLH3, MSH2, MSH3, MSH6, PMS1 and PMS2) with aging in heart samples (p = 0.01, linear regression; Figure 5A), and overall lower expression levels of these genes in heart versus brain samples (p = 2.5×10 -8 , twotailed paired Wilcoxon test; Figure 4A).
We directly assessed potential oxidative damage in the heart using an ELISA (see Methods). We found that the level of 8-hydroxy-2'-deoxyguanosine in aged hearts was more than twice as high as in younger hearts (p = 0.008, two-tailed Wilcoxon test; Figure 4B). These data suggest that aging results in increased generation, decreased repair, or both, of oxidative DNA lesions. It is known that failure of DNA MMR is associated with a strikingly elevated rate of base substitution mutagenesis, as a consequence, tumors with MMR deficiency are amongst those with the highest number of somatic mutations 64 . The MSH2-MSH6 complex is primarily responsible for repairing single base-base mismatches. Our findings of increased sSNV count in aged heart suggest that MMR is not efficient at correcting mismatched nucleotides in aged cardiomyocytes, contributing to the large increase in sSNVs. Taken together, these findings indicate that increased oxidative stress leads to elevated 8-hydroxy-2-deoxyguanosine, which may overwhelm DNA repair systems resulting in increased DNA mutational burden.

Functional impact of sSNVs in the aging process
To further investigate whether the occurrence of sSNVs is associated with defective gene transcription and function, we stratified and compared cardiac and neuronal sSNVs by using the expression profiles of corresponding tissues in GTEx 63 . Our results suggest that Signature A sSNVs are statistically enriched in highly expressed genes at a similar level in both aged cardiomyocytes and aged neurons (p < 0.01, linear regression; Figure 4C, upper), suggesting Signature A as a common transcription-associated signature 5,49 . In contrast, Signatures C and D showed higher contributions in aged cardiomyocytes than in aged neurons, without a strong association with gene transcription level (p < 0.1, linear regression; Figure 4C, middle and lower), indicating that Signatures C and D might result from mechanisms different from Signature A and specific to cardiomyocytes. The age-dependent accumulation of sSNVs in genes highly expressed in cardiomyocytes suggests that the very genes essential for the function of cardiomyocytes are those most likely to be damaged during one's lifetime. Gene Ontology (GO) analysis found that genes involved in mismatch repair, mitochondria organization, and PI3Kinase pathways showed sSNV enrichment (FDR-adjusted p < 0.05, permutation test; Figure   4D). We also identified 25 nonsynonymous and 1 stop-gain sSNVs in aged cardiomyocytes (Supplementary Table S6).
Although many heterozygous mutations would likely compromise cardiomyocyte function, it is expected that deleterious gene "knockout" (KO) mutations in genes would be especially damaging if all the alleles are affected, and that there may be a threshold for the accumulation of such KOs above which cardiomyocyte function would deteriorate. Whereas either endoreplication or fusion of genetically unrelated cells may potentially represent adaptive mechanisms to guard against these KO mutations, we compared these two by estimating the accumulation of gene KOs in cardiomyocytes using a probability model where at least two coincident deleterious sSNV events in a diploid cell, or four deleterious sSNVs within one gene in tetraploid cells, cause loss of function. Tetraploid cardiomyocytes generated through endoreplication had an average probability of 1% of getting one or more genes completely knocked out by age 60, with this probability increasing to 6% by age 80, implying that a substantial fraction of cardiomyocytes would carry damaging mutations ( Figure 4E). In contrast, cardiomyocytes generated by fusion showed a significantly lower probability of gene KO (p = 0.002, two-tailed paired Wilcoxon test; Figure 4E), with less than 0.5% of cells with KO genes at age of 80 ( Figure 4E). These data strongly support that fusion is more effective mechanism to avert the loss of gene function caused by age-related mutations ( Figure 4F). We note also that our WGS-based method does not identify the source of the fusing cells, thus we cannot distinguish fusion of cells from distantly or closely-related cardiomyocytes.
Our results showed that each human cardiomyocyte has a profoundly distinctive genome, harboring as many as 4000 to 30000 somatic SNVs. These estimates are likely to improve in accuracy with better amplification methods as well as a larger sample size. We also note that our samples are biased towards cells that amplified well and equally, i.e., cells more likely to have intact genomes and so may underestimate mutation rates. Cardiomyocyte SNVs display signatures of mutagenic processes, including increased oxidative stress, defective mismatch repair and a preponderance of C>T deamination especially at the CpG context ( Figure 4F). Our analysis of human cardiomyocytes lays a foundation for better understanding the genomic landscape and mechanisms driving mutation accumulation in aging cardiomyocytes that may help develop new treatments to reduce age-related cardiomyocyte dysfunction.

Acknowledgment:
We thank R. Sean Hill, Jennifer N. Partlow for assistance, Boston Children's Hospital Flow

Competing interest declaration:
The authors declare that they have no competing interests.

Materials and Methods:
Human tissues DNA sample preparation and isolation of cardiomyocyte nuclei: Briefly Amplicons were generated using a high-fidelity polymerase and then were purified using a magnetic bead capture kit (Ampure; Agencourt) and quantified using a fluorometric kit (QuantIT PicoGreen; Invitrogen). The purified amplicons were then pooled in equimolar concentrations and were sequenced on Illumina HiSeq X10 with an average depth of >7000X (Extended Data Fig. 1).

Read alignment and post-processing:
Reads generated from single-cell WGS and targeted sequencing were aligned against the and then post-processed with local realignment around indels and base quality score recalibration using Genome Analysis Toolkit (GATK) (ver 3.5) 5 . For single-cell targeted sequencing data: 1) read pairs with overlapping tails were merged into single consensus reads using USEARCH (ver 11.0.667) 6 , 2) error-prone reads with gap alignment or >4 mismatches were removed, and 3) the 5bp regions at both ends of the consensus reads were masked due to its vulnerability to systematic amplification and sequencing errors.

Somatic SNV genotyping from single-cell targeted sequencing:
The single-cell genotype likelihood was calculated for each of the 253 sSNV sites using

Lineage reconstruction using clonal somatic SNVs:
To reconstruct the lineage tree of 482 cardiomyocytes subjected to single-cell targeted sequencing, we calculated the pairwise genetic similarity (S) as the cosine similarity of binarized sSNV profiles between two cardiomyocytes, where called sSNVs were considered as "1" and absent sSNVs were considered as "0". Unsupervised clustering was then performed using the UPGMA (unweighted pair group method with arithmetic mean) method with the distance defined as 1-S. The clustered cardiomyocytes were grouped into 9 distinct nested clades (clades A-I in Figure 1C), where each clade was defined by one or more clade-specific sSNVs (i.e., sSNVs present in only one clade but absent in other clades). For 93.2% (340/365) of the claded cardiomyocytes, they only harbored sSNVs from a single clade (Supplementary Table 1). In comparison, 25 cardiomyocytes (1 diploid, 8 tetraploid, and 16 hexaploid+) contained sSNVs belonging to more than one clades, suggesting their multi-clade origins that could be explained by fusion events between cells from different clades.

Sensitivity correction for ploidy proportion:
In our sSNV-based lineage reconstruction, the detection sensitivity of cardiomyocytes derived from potential fusion events was affected by 1) whether the clade-informative sSNVs were successfully MDA-amplified and sequenced in targeted sequencing; 2) whether the origin cells were from two or more genetically distinct clades or the same clade because the latter case was not distinguishable from cardiomyocytes without fusion.
To address this, we developed a mathematical model to estimate the detection sensitivity in profiled diploid, tetraploid, and hexaploid+ cardiomyocyte populations, separately. For simplicity, we only considered the fusion events involving two cells because they  Table 2).

Measuring the evenness of genome amplification in single cells:
We measured the evenness of genome amplification in single-cells using two metrics: median absolute pairwise difference (MAPD) and coefficient of variation (CoV). MAPD is the median value of absolute differences between the copy number ratios of neighboring bins with variable length where bins were divided to have the same number of uniquely mapping reads. CoV is the measure of variance of bin-wise copy number ratios, calculated by taking the ratio of their standard deviation to the mean. Both higher MAPD and CoV scores represent greater unevenness of single cell genome amplification.
Binning, GC normalization, segmentation, and copy estimation were performed following the previous single-cell copy number analysis protocol 9 , to obtain the copy number ratio per bin and calculate MAPD and CoV scores.

Estimation of amplification dropout rates:
Germline heterozygous SNVs were identified from bulk WGS data using GATK 5 , and only those reported by the 1000 Genomes Project 10 were considered subsequently as high-confidence variants. For each single-cell, a germline heterozygous SNV was considered as "locus-dropout" if the total coverage in single-cell WGS is less than 5X and considered as "allele-dropout" if the number of reads supporting either a reference or a mutant allele is less than two, and the genome-wide locus-and allele-dropout rates were then calculated as the proportions of dropout sites among all germline heterozygous SNVs.

Somatic SNV calling from single-cell WGS data:
We performed phasing-based linked read analysis using the LiRA method (ver 2018Feb) 11 to identify sSNVs in single-cells using ~30X WGS data of non-heart tissue from the same individual as bulk germline controls. Initial somatic and germline SNVs were called using GATK's HaplotypeCaller and germline SNVs were further phased by Shapeit 2 (ver 904) 12 . LiRA distinguishes true somatic mutations from base-calling or amplification errors by leveraging the linkage information between the somatic candidate and adjacent phased germline mutations. Genome-wide sSNV burden was further calculated by correcting for the detection power of phasable regions estimated from germline SNVs. We considered sSNVs in autosomes only to avoid potential detection bias in sex chromosomes between different genders.

Genome-wide sSNV burden correction for tetraploid cells:
Each tetraploid cardiomyocyte contains two sets of diploid genomes (i.e., four haplotypes). Theoretically, somatic mutations present in one out of the four haplotypes can be called by LiRA only when the reads violating complete linkage were lost due to allelic or locus dropout (Extended Data Fig. 8). In comparison, germline mutations can be called in the same way as in the diploid cells. Therefore, the LiRA-estimated power from germline SNVs should be corrected by cell-specific allelic or locus dropout rates before applying to the count of identified sSNVs in tetraploid cells.
By definition, the dropout status of a site in a tetraploid cell (S4n) was determined by the status of its two diploid origins as below, Cell-specific allelic and locus dropout rates were estimated for each tetraploid cell, and then the dropout rate for its two diploid origins was calculated from the above matrix under the assumption of equal rates between the two diploid origins. The count and burden of LiRA-called sSNVs were further adjusted by the locus and allelic dropout rate of the corresponding diploid origins and the sensitivity loss due to the decrease of perhaplotype sequencing depths from diploid to tetraploid cells.

Mutational signature analysis:
Mutational signatures were de novo decomposed by the non-negative matrix factorization (NMF)-based mutational signature framework 13 using Mutational Patterns (ver 1.8.0) 14 , using the 96 trinucleotide contexts of sSNVs detected from tetraploid cardiomyocytes in this study as well as non-disease neurons that were previously studied 15 . We estimated signature stability and reconstruction error and found that four signatures best fit the observed sSNV profiles (Extended Data Fig. 3). We then compared our de novo signatures (Signatures N1, N2, N3, and N4) with previously reported signatures in neurons (Signatures A, B, and C) 15 and signatures potentially resulted from single-cell artifacts (SBS scE and scF) 16 . As shown in Extended Data Fig. 4, Signature N1, N2, and N3 resemble Signature B (as well as SBS scF), A, and C, respectively, whereas Signature N4 did not show high similarity to any of these signatures, suggesting a potential cardiacspecific signature (renamed as Signature D).
Considering the evidence that SBS scF (highly similar to Signature N1/B) represents potential single-cell artifacts 16 , we removed the contributions of Signature N1/B in the estimation of genome-wide sSNV burden per cell. Specifically, we decomposed the sSNV profile of each single cell using Signatures N2/A, N3/C, N4/D as well as SBS scE and scF (Signature N1/B was not included because it was nearly identical to SBS scF) using MutationalPatterns 14 . Then we calculated the number of sSNVs derived from SBS scF by multiplying the LiRA-estimated sSNV burden and the cell-specific weight of SBS scF and then subtracted the sSNV count from SBS scF in subsequent burden analyses.

Mutation spectrum and strand bias analysis:
The LiRA-identified sSNVs were grouped into three categories according to the age of their cell donor: young (<4 yrs.), middle-age (30-63 yrs.), and aged (>75 yrs.), and then the mutation spectrum and strand bias were calculated for each age category. The transcriptional strands of genic sSNVs were assigned based on the UCSC TxDb annotations by MutationalPatterns 14 , where mutated bases ("C" or "T") on the same strand as the gene direction were categorized as "untranscribed" and on the opposite strand as "transcribed". To characterize sSNV accumulation during aging, we further estimated the mutation spectrum and strand bias for the net increase of sSNVs between young and aged categories. Specifically, we first measured the absolute sSNV count for each mutation type by multiplying its proportion and the average genome-wide sSNV burden for each age category, and then subtracted the sSNV count for each mutation type between young and aged categories. Statistical significance of strand bias was determined by the Poisson test.

Gene expression analysis:
The expression matrix for left ventricle of heart and frontal cortex (BA9) of brain was downloaded from GTEx 17 . The per-gene expression value was normalized for each individual after controlling for age and gender using DESeq2 (ver 1.24.0) 18 . To study the age-dependent changes in MMR activity in the heart and brain, we extracted the expression levels of MLH1, MLH3, MSH2, MSH3, MSH6, PMS1 and PMS2, which encode the core components of the MMR complex. Individuals with both heart and brain expression profiles were binned according to their ages. Individuals with age <40 years (n=6) were excluded due to the small sample size. For the remaining 178 individuals, we calculated the average expression levels of the seven MMR genes in heart and brain samples, separately, and tested their association with age using the linear regression model.
To investigate the relationship between somatic mutation and gene expression, we assigned genes into four quartiles based on their average expression values in heart or brain samples across all GTEx individuals. Cardiac and neuronal sSNV densities were calculated for each quartile of genes, after normalizing by gene length and per-cell sSNV detection power. The standard deviation of sSNV density was estimated using a permutation test, in which the quartile classification of genes was randomly shuffled and the permuted sSNV densities were calculated for 1000 rounds. We further performed a NMF-based mutational signature decomposition for sSNVs located in each quartile of genes, to estimate the relative contributions of Signature A, Signature C, Signature D, SBS scE, and SBS scF for each quartile. The sSNV density for each signature was calculated by multiplying the overall sSNV density by the signature contribution. We also performed the above analysis by using the expression profiles from aged individuals (>75 yrs.) only and observed robust results.

Functional enrichment analysis:
Gene Ontology (GO) functional enrichment analysis was performed using GOseq (ver 1.34.1) 19 . We assigned a binary value "0" or "1" to each RefSeq gene according to whether any sSNV was present in the gene in any single cell and built the sSNV-gene table for cardiac and neuronal sSNVs separately. GOseq uses the Wallenius approximation method to test the enrichment of sSNVs for each GO term, after applying a probability weighting function to control for potential bias from gene length. Genes without any GO annotation were ignored when calculating the total gene count. GO terms with less than 5 member genes with sSNVs were excluded to avoid ascertainment bias.
GO terms with more than 1000 member genes were also exclueded.
To identify GO terms that were specifically enriched in cardiomyocytes but not in neurons, we performed a permutation test among all GO terms with p < 0.01 for either cardiac or neuronal sSNVs. For each permutated GO term, we compared the observed rank difference in GOseq's p between cardiac and neuronal sSNVs against the expected null distribution, which was estimated by 1000 rounds of random shuffling of the sSNVgene tables. The FDR method was applied for correcting multiple hypothesis testing.

Measurement of oxidative stress:
The level of 8-hydroxy-2′deoxyguanosine and 8-hydroxyguanosine was measured in 250 ng total nucleic acids extracted from left ventricular cardiomyocytes using a competitive enzyme-linked immunosorbent assay kit (#589320, Cayman Chemical) according to the manufacturer's instructions. The samples were incubated for 1 hour with a monoclonal antibody against 8-hydroxy-2′deoxyguanosine in a micro titer plate pre-coated with 8hydroxy-2′deoxyguanosine and 8-hydroxyguanosine. The final color was developed by the addition of 3,3,5,5-tetramethylbenzidine, and absorbance was measured at 450 nm.
The samples were diluted 1:50 with EIA buffer before assaying.
Accumulation of exonic, deleterious "gene knockout" (KO) mutations might be detrimental for proper cell function. These mutations can be "biallelic" in the case of diploid cells, or "quadallelic" in the case of tetraploid cells. The number of sSNVs identified in this study were used to estimate the accumulation of gene KOs in single cells, using an extension of the method described in Lodato et al 20 .
To account for genes that are highly dosage sensitive and thus can be haploinsufficient, we included a factor to capture the probability of a mutation landing on an allele of a gene with a high pLI score.
The pLI metric measures the probability of loss-of-function intolerance 21 , and genes with pLI > 0.90 are considered highly dosage sensitive. These high pLI score genes comprise 17% of all genes. Consequently, the calculation was computed as follows: = # × total # deleterious variants total # variants × where p is the ploidy factor which captures the probability of obtaining a mutation on the remaining alleles of the gene (e.g., p=0.5 for diploid genomes, and p=0.125 for tetraploid genomes), and n is the expected number of deleterious mutations (i.e. frameshift, splice altering, stop gain). The average was taken across all cells per individual and 95% confidence interval on those point estimates were calculated. Regressions were performed using a log-normal model to capture the non-linear trend of the probability of obtaining cells with KO genes with age. All calculations were performed using custom R scripts.

Data availability:
All the single-cell WGS data for cardiomyocytes will be deposited in the NCBI SRA.

Code availability:
Code used in this study is available upon reasonable request to the corresponding authors. In diploid cells, LiRA identified the complete linkage between each sSNV candidate and its adjacent germline heterozygous mutation, which distinguish true sSNVs from technical artifacts. In tetraploid cardiomyocytes, sSNVs present in one out of the four haplotypes were able to be called by LiRA when the reads violating complete linkage were lost due to allelic or locus dropout.  Figure 1D-F but not listed here. Table S2. Ploidy proportion with sensitivity correction. The proportion of single-clade and multiclade cardiomyocytes before and after sensitivity correction. Odds ratio and P-value (one-tailed proportion test) were calculated for tetraploid and hexaploid+ cells when comparing against diploid cells as negative control.