Experimental design and sequencing data qualities.
To systematically investigate the genetic mutations that accumulate in pigs that are genome-edited for xenotransplantation, we used the Wuzhishan GTKO minipig as a representative model; these model pigs are PERV-C free to prevent PERV transmission to the human recipient9,10. To create Wuzhishan GTKO minipigs for xenotransplantation, we designed a sgRNA to target exon 3 of the GGTA1 gene (Fig. 1a), which is essential for coding galactosyl-alpha 1,3-galactose34. We electroporated CRISPR/Cas9 reagents with the designed sgRNA from Streptococcus pyogenes into fibroblasts isolated from the ear tissue of Wuzhishan minipigs. Next, positive genetic modification of GGTA1 and off-target effects were evaluated by using polymerase chain reaction (PCR) and Sanger sequencing (Supplementary Table 1). Then, we used validated clones with high knockout efficiency (nearly 100%, Supplementary Table 2) to successfully generate GTKO pigs by SCNT and mated GTKO pigs to produce F1 offspring (Fig. 1b, d). GTKO pigs were born without incident and were healthy. Furthermore, we compared the health of GTKO pigs with that of wild-type pigs based on body weight (Supplementary Fig. 1a), blood index (Supplementary Table 3), and litter performance (Supplementary Fig. 1b) at approximately six months of age, and no difference (P > 0.05) was shown between GTKO pigs and wild-type pigs.
To investigate the genetic mutations that accumulated in GTKO pigs compared with original wild-type pigs, we studied five GTKO pigs (named 657 (♀), 659 (♀), 666 (♂), 669 (♂), 681 (♀)) with two different types of genetic modifications of GGTA1 (-2/+1, -1/+1) and three F1 offspring (named 2216 (♂), 2217 (♂), 2218 (♀), sire 669, dam 657) (Fig. 1b, Supplementary Table 4). These GTKO pigs were born without incident and observably healthy. We validated the genotypes of the cell lines and GTKO pigs using Sanger sequencing at the DNA level and fluorescence-activated cell sorting (FACS) at the protein level (Supplementary Figs. 2, 3). We performed whole genome sequencing (WGS) of the ear samples of GTKO pigs and the original wild-type pigs (named 153 (♂), 214 (♀)) with a mean depth of 110× across the entire sus scrofa11.1 reference genome (Supplementary Table 5) on the DNBSEQ-T7 platform, which showed comparable levels of sequencing quality, uniformity of coverage, percentage GC coverage, and variant accuracy with Illumina sequencing platforms35. To study how the genetic mutations in GTKO pigs arose, we also performed WGS of the wild-type cell lines (named WT-153-C, WT-214-C) and the GTKO cell lines (named KO-153-C(-2/+1), KO-214-C(-2/+1), KO-214-C(-1/+1)) with an average depth of 110× across the entire sus scrofa11.1 reference genome (Supplementary Table 5) on the DNBSEQ-T7 platform. The WGS data exhibited a high read quality with Q30 > 90%, duplicate percentage < 6%, and properly paired reads > 98% (Supplementary Table 5). A dendrogram of identity by state (IBS) distance verified the parentage of the fifteen sequenced genomes among the DNA sequences involved in this study (Fig. 1c). Genome sequences from the same derived pig were closely related and clustered together. The identity by descent (IBD) values also confirmed the genetic relationship between the F1 offspring (2216, 2217, 2218) and their parents (669, 657) (Supplementary Fig. 4).
GTKO pigs contain thousands of de novo genetic mutations.
We identified de novo genetic mutations that showed the gain of a new allele in each GTKO pig relative to their corresponding original wild-type pig genome, and the alleles were inheritable between generations. To characterize the genetic mutations that accumulated in F0 generation GTKO pigs, we employed the corresponding wild-type pigs as references. Combining three mutation callers (Mutect2 of GATK36, Strelka237 and VarScan238) and a serial reasonable filtering procedure (“Methods” section), approximately 1205 de novo mutations (DNM), including SNVs and Indels per F0 generation GTKO pig genome, were identified (Table 1, Supplementary Data 1), which were all fixed in heterozygous conditions. Mutations fixed in all F0 generation GTKO pigs were located randomly in the genome. The percentages of SNV types were distributed randomly in each F0 GTKO pig (Supplementary Fig. 5). GTKO pigs produced by the same route shared a comparable portion of mutations. For example, pig 666 (814 de novo genetic mutations) and pig 669 (1070 de novo genetic mutations) shared 474 de novo mutations, 58.2% and 44.3%, respectively. Pig 657 (1498 de novo genetic mutations) and pig 659 (1490 de novo genetic mutations) shared 1472 short mutations, 98.3% and 98.8%, respectively. The majority of total de novo genetic mutations (5983 of 6022) were annotated to be unlikely to change protein by SnpEff39 software combined with the corresponding Ensembl gene annotation file. Only approximately eight mutations per F0 generation GTKO pig genome were predicted to potentially alter protein function. Of all twenty-five mutated genes that may alter protein function, two mutated genes (EGFR and THBS1) were associated with bladder cancer based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis40 (Table 1). Four genes (DCC, EGFR, LRIG3 and RUNX1) that mutated in GTKO pig 681 were previously found to be mutated in some human cancers41 (Table 1). For example, the DCC gene is a well-characterized human colorectal cancer suppressor gene found mutated in GTKO pig 681, and RUNX1 mutations are closely related to tumorigenesis in human leukemia42 and breast cancer43. However, we did not find any abnormal cancer-associated phenotype in GTKO pig 681. In addition to cancer-related genes, GTKO pigs (657 and 659) contained mutations in a gene (RALGAPA1) with known roles in neurodevelopmental disorders with hypotania44.
Since plasmid integration was reported in germline genome-edited cattle45, we included the sgRNA-Cas9 plasmid backbone sequence in our comparison of the WGS reads to the sus scrofa11.1 reference genome and found an unexpected integration of the gRNA-Cas9 plasmid into the genome (chr1:212536798) of GTKO pig 681 (Supplementary Fig. 6). We further confirmed the insertion by agarose gel analysis, indicating the integration of the origin (KO-214-C(-1/+1)) (Supplementary Fig. 7). Although no abnormal phenotype was observed in GTKO pig 681 after birth, implying that gRNA-Cas9 plasmid integration seemingly did not affect its health, we still believe that the unexpected integration of the gRNA-Cas9 plasmid occurred accidently and should be avoided carefully. We also evaluated whether there were structural variants (SVs) (≥ 50 bp) based on the combination of two SV calling methods (Delly46 and Manta47) and a serial reasonable filter (refer to “Methods” section); the results showed four high-confidence heterozygous tandem duplications observed identically in GTKO pig 666 and GTKO pig 669 (Supplementary Table 6), which were predicted to cause structural changes in the corresponding genes (MAN1C1, HMG20A, ENSSSCG00000042071 (a lncRNA gene), TUBD1 and VMP1) by SnpEff software. GTKO pig 666 and GTKO pig 669 behaved normally, with no phenotypes corresponding to the affected genes, which may be due to the existence of another normal allele. As MSI (microsatellite instability) is an important indicator of genomic instability48, we calculated the MSIsensor’s score49, which corresponds to the percentage of mutated microsatellite loci to detect MSI status and compare with wild-type pigs. The results showed that the genomes of GTKO pigs were all in MSS (microsatellite stability; MSIsensor’s score < 3.5) (Supplementary Table 7), indicating that the genomes of knockout animals were stable. At the same time, we performed read depth analysis across chromosomes and did not find any chromosomal aberration in the genomes of GTKO pigs (Supplementary Fig. 8).
To investigate de novo mutations in the offspring produced by F0 GTKO pigs, we studied a family (sire 669, dam 657, offspring 2216,2217 and 2218). The siblings produced as the F1 generation are the first reported to be examined by WGS to our knowledge. We adopted the method described in Vivek et al. 26 to evaluate the statistical power of detecting DNMs. Given a mean sequencing depth of 110× and a minimum de novo allele frequency of 10% (see “Methods” section), the power to detect one DNM occurring in the single-cell or the two-cell stage of zygote was at least 99.98%. We performed genome-wide variant calling using four different tools, including GATK36, Platypus50, Freebayes51, and Samtools52, with serial filtering and then identified DNMs by TrioDeNovo53 software referring to the previous DNM calling pipeline25. The combination of multiple different calling tools is necessary to identify high-confidence variants because the performance of different tools varies when conducting genome-wide variant calling. The overlapping variants from the four different calling tools were identified as high-confidence variants (Supplementary Figs. 9, 10). Notably, the four different tools exhibited consistency (> 68%) for variant calling, which was similar to the study of Cas9-edited rhesus monkeys25. Finally, 1 high-confidence de novo indel and 17, 19 and 14 high-confidence de novo SNVs were identified in the siblings (2216, 2217 and 2218), respectively (Table 2). Only 1 mutation in LOC100523897 in the F1 generation GTKO pig 2217 genome was predicted to alter protein function (Table 2, Supplementary Table 8). All DNMs in the F1 generation GTKO pigs were located randomly in the reference genome and did not overlap with each offspring (Supplementary Table 8). Taken together, these results show that the number and genome-wide distributions of DNMs among the three siblings were comparable. These DNMs in the offspring can be explained by the known spontaneous mutation rate of wild-type pigs (3.6 × 10− 9 per site per generation, 95% confidence interval (CI), CI = 2.8 × 10− 9- 4.4 × 10− 9) with 7–11 expected DNMs per generation54. We also evaluated whether there were de novo structural variants (SVs) (≥ 50 bp). Based on two structural variant calling methods (Delly46 and Manta47) and multiple filters (refer to “Methods” section), we did not find any high-confidence de novo SVs in the offspring (Supplementary Table 10). We did not find any chromosomal aberration in F1 pigs by read depth analysis (Supplementary Fig. 8). These results reflected the fact that the genome of GTKO pigs is stably inherited during intergenerational transfer.
To study the influence of these thousands of genetic de novo mutations in GTKO pigs on molecular function, especially gene expression, we performed RNA sequencing (RNA-seq) of blood tissues of wild-type pigs and GTKO pigs. Three F0 GTKO pigs (669, 657, 659) and three F1 offspring (2216, 2217, 2218) were included. Unfortunately, the corresponding wild-type pigs (pig 153 and pig 214) died and could not be sampled for RNA sequencing. Therefore, 4 wild-type pigs (named 872, 876, 937 and 950) were selected; the grandfather of these pigs was wild-type pig 153. All pigs were fed in the same environment. In total, we obtained an average of 60 million raw reads for each sample. The average percentage of Q30 RNA pair reads was higher than 90% (Supplementary Table 10). First, we found that the GGTA1 gene maintained normal expression levels in GTKO pigs compared with wild-type pigs, although GGTA1 transcripts had a missense mutation. We also detected the expression of PERV subtypes, including PERV-A, PERV-B and PERV-C, in the blood samples of these pigs. We found different levels of expression of PERV-A and PERV-B subtypes in the blood samples, but not PERV-C (Supplementary Fig. 11), confirming that these pigs were PERV-C free; this mitigates a special risk induced by the transmission of PERVs via xenotransplantation55. Then, we found that the relative expression levels of most genes were very close, and the average Pearson correlation of expression levels among the samples was > 0.998 (Supplementary Fig. 12, Supplementary Fig. 13a). This means that GTKO pigs, including F1 offspring, had the same gene expression pattern and maintained normal physiological function when compared with wild-type pigs blood tissues. With a log2-fold-change > 1 and adjusted p value < 0.01, we observed 12 upregulated genes and 7 downregulated genes in F0 GTKO pigs compared with wild-type pigs (Supplementary Fig. 13b) and 152 upregulated genes and 63 downregulated genes in F1 GTKO pigs compared with wild-type pigs (Supplementary Fig. 13c). Gene Ontology (GO) term enrichment analysis and KEGG pathway enrichment analyses showed that significantly differentially expressed genes (DEGs) of F0 GTKO pigs were not enriched in specific pathways. The DEGs of F1 GTKO pigs were enriched in pathways such as metabolic pathways, which may exert a tiny influence on pig health and xenotransplantation (Supplementary Fig. 13d). Combining transcriptomic data with genomic data, we found that mutated genes in GTKO pigs did not overlap with these DEGs, and the distance between the locations of de novo genetic mutations and the locations of the DEGs was large, indicating that the accumulated genetic mutations have no association with differential gene expression. Taken together, these results show that accumulated mutations may have a limited influence on the healthy phenotypes of GTKO pigs.
De novo genetic mutations in GTKO pigs arise through multiple mechanisms.
GTKO pigs were actually developed from a single cell. Mutations may first exist in fibroblast cell lines or GTKO cell lines at low frequency. We analyzed the genome sequences of corresponding fibroblast cell lines and GTKO cell lines to investigate how these thousands of de novo genetic mutations accumulated during the GTKO pig generation process (Fig. 1b). Given the mean WGS depth (110×) (Supplementary Table 5), de novo mutations at a frequency > 1% preexisted in fibroblast cell lines or GTKO cell lines. Of all approximately 1205 observed de novo genetic mutations fixed in GTKO pig genomes on average, we found approximately 67 mutations that existed and were fixed in fibroblast cell lines and 748 mutations that existed and were fixed in GTKO cell lines on average (Table 3). Surprisingly, on average, 358 de novo genetic mutations per GTKO pig genome could not be observed either in corresponding fibroblast cell lines or in corresponding GTKO cell lines. Considering the reports about genomic instability during reprogramming by nuclear transfer32,56,57, we believed that these unassigned de novo genetic mutations occurred during or shortly after nuclear/transfer and then fixed in the GTKO pigs. In addition, of four heterozygous tandem duplications in GTKO pig 666 and GTKO pig 669, three tandem duplications existed first in GTKO cell lines, and one tandem duplication occurred during or shortly after nuclear/transfer (Supplementary Table 6). The genomes of the cell lines were all in the state of MSS (Supplementary Table 7), and no chromosomal level changes were observed in fibroblast cell lines or GTKO cell lines (Supplementary Fig. 8).
Mutations that first existed in the different fibroblast cell lines and were finally fixed in the GTKO pig genome were not shared between different original cell lines. For example, mutations preexisting first in the WT-153-C fibroblast cell line and in the WT-214-C fibroblast cell line were not shared. However, mutations existing first in the same fibroblast cell line and finally fixed in the GTKO pig genome were shared. For example, 60 de novo mutations that existed first in the WT-153-C fibroblast cell line and were fixed in GTKO pig 666 and GTKO pig 669 were exactly the same. Mutations that existed first in fibroblast cell lines all occurred at a low frequency (Fig. 2a), indicating that these mutations were normal somatic mutations that occurred during in vitro fibroblast cell culture.
Mutations that existed in the same GTKO cell line and were fixed in the GTKO pig genome were largely shared. For example, mutations that existed in the KO-153-C(-2/+1) cell line and were fixed in GTKO pig 666 and GTKO pig 669 were 81.3% and 71.5% shared. The number of mutations that existed first in GTKO cell lines was significantly greater than the number of mutations that existed first in fibroblast cell lines (normal-culture mutational load) (P < 0.05 by Mann‒Whitney test). A proportion of mutations the existed in GTKO cell lines were at low frequency (Fig. 2b), indicating that these mutations were introduced in the later stages of cell line culturing. Other mutations in GTKO cell lines existed at a nearly 50% frequency, indicating that these mutations should be present in nearly all cells and were introduced in progenitor cells. Therefore, some mutations were not be due to in vitro GTKO cell culture; this result raises the possibility that a significant number of mutations occur during or shortly after genome editing and then become fixed during colony picking and expansion.
To investigate whether de novo mutations that existed in the GTKO cell lines were off-target mutations, we measured the targeting specificity of CRISPR/Cas9 technology. We first validated on-target mutations of GGTA1 in GTKO pigs (657, 659, 666, 669, 681) and found that they were all successfully modified by the designed CRISPR/Cas9 system. We also found that these on-target mutations indeed existed in corresponding GTKO cell lines (KO-153-C(-2/+1), KO-214-C(-2/+1), KO-214-C(-1/+1)), indicating that on-target mutations were passed to GTKO pigs produced by SCNT (Supplementary Fig. 14a). The inheritance of on-target mutations from parental GTKO pigs (sire 669, dam 657) to the offspring (2216, 2217, 2218) was also in line with the Mendelian inheritance law (Supplementary Fig. 14a), indicating that on-target mutations were also stably passed among generations. At the RNA level, we found consistent evidence (Supplementary Fig. 14b). In addition, the observed on-target variant allele fractions at the cell level and at the individual GTKO pig level were all nearly 0.5, indicating that the on-target mutations were not mosaic. Any other genome sequence change, including large structural variants at the intended on-target sites, was not observed in all genomes. To check whether mutations that existed in GTKO cell lines were due to CRISPR/Cas9, we performed off-target analysis. Cleavage by the widely studied Streptococcus pyogenes Cas9 ortholog (SpCas9) is mediated by a 20-nt segment of a guide RNA (gRNA) complementary to a target DNA sequence that precedes a protospacer adjacent motif (PAM), where Cas9 is recruited58. With the employment of Cas-OFFinder59 software allowing up to 3 mismatches with 1 bp of inserted or deleted sequence and up to 4 mismatches with no inserted or deleted sequence between the on-target and off-target site for the GGTA1 sgRNA, we predicted 11634 potential off-target locations across the reference genome, which were randomly distributed across the genome (Supplementary Data 2, Supplementary Fig. 15). However, we did not find any mutation (SNVs, Indels and SVs) that accumulated in GTKO cell lines or GTKO pigs in any predicted off-target site (Table 4). Hence, no off-target effects were detected for the designed sgRNA for GTKO pigs at the genome-wide level. That is, mutations preexisting first in the same GTKO cell line were not caused by the side effects of gene editing.
We extended this analysis by asking whether the CRISPR/Cas9 plasmid was mutagenic and responsible for the hundreds of de novo mutations that existed in GTKO cell lines. We further designed experiments to explore the effect factors, including the dose and presence of Cas9 and sgRNA, on genome stability. Referring to the production of the GTKO pigs, we established the treated cell lines “empty plasmid” (153-control, 214-control), “Cas9 only plasmid” (153-Cas9, 214-Cas9), “Cas9 only plasmid (high dose)” (214-Cas9High), “Cas9 only plasmid (low dose)” (214-Cas9Low), and “CRISPR/Cas9 plasmid knockout unsuccessfully” (153-(sgRNACas9)KO-, 214-(sgRNACas9)KO-) and performed WGS with a mean depth of 110× on the DNBSEQ-T7 platform (Fig. 3, Supplementary Table 5). Groups of “CRISPR/Cas9 plasmid knockout successfully” (KO-153-C(-2/+1), KO-214-C(-2/+1), KO-214-C(-1/+1)) were also included as positive controls in this experimental design. The WGS data also exhibited a high read quality with Q30 > 90%, duplicate percentage < 6%, and properly paired reads > 98% (Supplementary Table 5). We first confirmed that the GGTA1 gene was not edited by the Cas9 protein in all control groups, especially the “CRISPR/Cas9 plasmid knockout unsuccessfully” groups. We identified sites that showed the gain of a new allele in each treated cell line relative to their corresponding matched progenitor fibroblast genome. Finally, we identified hundreds of high-confidence de novo mutations in the “empty plasmid” group, in the “Cas9 only plasmid” group, in the “Cas9 only plasmid (high dose)” group, in the “Cas9 only plasmid (low dose)” group and in the “CRISPR/Cas9 plasmid knockout unsuccessfully” group (Supplementary Data 3, Table 5), which were all randomly distributed in different chromosomes. Further analysis showed that there was no overlap of these mutations in all groups. Only several mutations per treated cell line genome were predicted to alter protein function (Table 5). No high-confidence SV or chromosomal aberration was observed in any control group (Supplementary Table 6). The de novo mutation number, distribution and predicted effects of mutations in all groups were comparable, indicating that the mutations were not associated with different components of the CRISPR/Cas9 plasmid.
A large proportion of mutations were heterozygous at an approximately 50% frequency, indicating that these mutations should be present in nearly all cells in the treated cell lines and were introduced in the progenitor cell (Fig. 4a). Some de novo mutations preexisted first at low frequency (< 0.2), indicating that these mutations were introduced in the later stages of treated cell line development (Fig. 4a). These results also raise the possibility that some mutations occurred during or shortly after genome editing and then became fixed during colony picking and expansion; meanwhile, others occurred during subsequent culturing in all treated cell lines.
To infer the cellular mutational pattern processes of somatic alterations in the genome, we performed mutation signature analysis of de novo mutations that occurred in all types of treated cell lines. Mutation signature analysis revealed the existence of an SBS18-like mutational pattern and an SBS40-like mutational pattern (Fig. 4b). The SBS18-like signature is similar to the SBS18 mutational signature reported in the Catalog of Somatic Mutations in Cancer (COSMIC)41 database, and their cosine similarity is 0.88. The SBS40-like mutational pattern is similar to the SBS40 mutational signature reported in the COSMIC database, and their cosine similarity is 0.82. The number of mutations attributed to SBS40 is correlated with patient age for some types of human cancer, which may be related to spontaneous somatic mutations. The proposed etiology of the SBS18 signature in COSMIC is damage by reactive oxygen species41. Previous studies have reported that the electrotransfection of cells can lead to cell cycle arrest and apoptosis and may cause damage to genomic DNA in cells, presumably due to reactive oxygen species (ROS) generated during pulse application60; the majority of mutations induced by ROS appear to involve modification of guanine, causing C > A/G > T transversions61. Considering our result that the most common SNV type was also C > A/G > T in all groups (Supplementary Fig. 5) and the attribution of the SBS18 signature, we inferred that the mutations could be induced by the physical stimulation of plasmid introduction rather than by the side effects of gene editing. Furthermore, we believe that optimizing the delivery method of the Cas9 protein could help prevent the occurrence of hundreds of mutations in the production of GTKO pigs for xenotransplantation.
We also checked whether these mutated genes could provide a common functional advantage through GO and KEGG pathway enrichment analyses. We found that some mutated genes were enriched in synapse-related GO terms and pathways such as “GO:0050803 ~ regulation of synapse structure or activity” (Fig. 4c). Combined with the reports that ROS accumulation can be harmful to synaptic plasticity62, we hypothesize that the electrotransfection of cells should be responsible for the occurrence of those mutations in all treated cell lines.