Identification of imprinted genes by high-throughput sequencing
Reciprocal crosses of Yorkshire and Min pigs were constructed to identify imprinted genes related to skeletal muscle development (Fig.1a). DNA-seq of the F0 parents and RNA-seq of the F1 progeny were performed to genotype SNVs and detect allelic bias of transcripts. DNA-seq yielded 275,027,760–339,794,920 clean reads and RNA-seq yielded 174,931,928–208,251,498 clean reads. The Q30 was about 90–94% for both DNA-seq and RNA-seq. The GC content was 40–44% for DNA-seq and 51–53% for RNA-seq (Additional file 1: Table S1).
DNA-seq and RNA-seq clean reads were aligned to the Sscrofa11.1 genome, and the aligned reads were used to identify SNVs with SAMtools. SNVs that were heterozygous in the F1 piglets but homozygous in the F0 parents were phased to determine allele-specific expression (Fig.1b). Phased SNVs that exhibited a complete paternal or maternal expression bias in a given transcript of an F1 individual were considered informative for further analysis. If all of the SNVs located in a transcript fragment came from the same parental allele, the gene containing this fragment was selected. The genes shared by all four piglets in each cross were then retained. As shown in Fig.2a, a total of 6,038 genes with paternal SNVs and 3,944 genes with maternal SNVs were obtained in Y × M, and 4,665 genes with paternal SNVs and 6,730 genes with maternal SNVs were obtained in M × Y.
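The phasing logic described above can be illustrated with a minimal sketch. The function names, the tuple representation of genotypes, and the rule that "complete bias" means zero reads from the other allele are assumptions made for illustration; they are not taken from the pipeline used in this study.

```python
def phase_snv(sire_gt, dam_gt, f1_gt):
    """Map each allele to its parent of origin, or return None if uninformative.

    Genotypes are tuples of two alleles, e.g. ("A", "A"). An SNV is treated as
    informative only when both parents are homozygous for different alleles
    and the F1 individual carries one allele from each parent.
    """
    sire_alleles, dam_alleles = set(sire_gt), set(dam_gt)
    if len(sire_alleles) != 1 or len(dam_alleles) != 1:
        return None  # at least one parent is heterozygous
    if sire_alleles == dam_alleles:
        return None  # parents share the allele, so origin cannot be resolved
    if set(f1_gt) != sire_alleles | dam_alleles:
        return None  # F1 genotype is inconsistent with the parents
    (paternal,) = sire_alleles
    (maternal,) = dam_alleles
    return {paternal: "paternal", maternal: "maternal"}


def expressed_origin(phase, rna_counts):
    """Classify RNA-seq allele counts at one phased SNV.

    Assumption: any read from an allele counts as expression of that allele.
    Returns "paternal", "maternal", "biallelic", or None when there are no reads.
    """
    origins = {phase[a] for a, n in rna_counts.items() if a in phase and n > 0}
    if not origins:
        return None
    if len(origins) == 1:
        return origins.pop()
    return "biallelic"


def gene_origin(snv_calls):
    """Return the shared origin if every SNV in a transcript agrees, else None."""
    calls = set(snv_calls)
    if len(calls) == 1 and calls <= {"paternal", "maternal"}:
        return calls.pop()
    return None
```

For example, a sire genotyped as A/A and a dam as G/G give an F1 A/G site where reads carrying only A indicate paternal expression at that SNV.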
All of the SNVs in an imprinted gene should originate from a single parent. Therefore, genes that possessed both paternal and maternal SNVs in the previous step were removed. Only SNVs located in exons were kept; those located on the X/Y chromosomes or in untranslated regions (UTRs), introns, or exon–intron splice sites were ignored. A total of 3,137 genes with paternal SNVs and 1,043 genes with maternal SNVs were obtained in Y × M, while 1,427 genes with paternal SNVs and 3,492 genes with maternal SNVs were obtained in M × Y (Fig.2b).
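The two filters in this step amount to a set difference and an annotation filter. The sketch below assumes hypothetical record fields (`chrom`, `region`) and gene identifiers; it only illustrates the logic and does not reproduce the annotation tooling used here.

```python
def single_origin_genes(paternal_genes, maternal_genes):
    """Drop genes that carry both paternal and maternal SNVs."""
    ambiguous = paternal_genes & maternal_genes
    return paternal_genes - ambiguous, maternal_genes - ambiguous


def keep_exonic_autosomal(snvs):
    """Keep SNV records annotated as exonic and not on the sex chromosomes.

    Each record is a dict with hypothetical keys "chrom" and "region".
    """
    return [
        s for s in snvs
        if s["region"] == "exon" and s["chrom"] not in {"X", "Y"}
    ]


# Illustrative input only; gene names and positions are made up.
paternal, maternal = single_origin_genes({"g1", "g2", "g3"}, {"g3", "g4"})
kept = keep_exonic_autosomal([
    {"gene": "g1", "chrom": "1", "region": "exon"},
    {"gene": "g1", "chrom": "1", "region": "UTR"},
    {"gene": "g2", "chrom": "X", "region": "exon"},
])
```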
Finally, the intersection of genes with paternal SNVs (or maternal SNVs) between Y × M and M × Y was retained to eliminate monoallelic expression caused by breed type. After this filtering step, we obtained 211 paternally imprinted genes and 417 maternally imprinted genes (Fig.2c and Additional file 2: Table S2).
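The reasoning behind the reciprocal-cross intersection is that a gene expressed only from one breed's allele appears as paternal in one cross and maternal in the other, so it cannot survive an intersection taken within the same parental origin. A minimal sketch with made-up gene names:

```python
def parent_of_origin_candidates(cross_ym, cross_my):
    """Intersect gene sets of the same parental origin across reciprocal crosses.

    Each argument is a dict with keys "paternal" and "maternal" mapping to sets
    of gene identifiers (a hypothetical layout chosen for this example).
    """
    return {
        origin: cross_ym[origin] & cross_my[origin]
        for origin in ("paternal", "maternal")
    }


# geneB follows the breed rather than the parent: paternal in one cross,
# maternal in the other, so it is dropped from both final lists.
ym = {"paternal": {"geneA", "geneB"}, "maternal": {"geneC"}}
my = {"paternal": {"geneA"}, "maternal": {"geneB", "geneC"}}
candidates = parent_of_origin_candidates(ym, my)
```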
Regulation and characteristics of imprinted genes
Mutation of the DNMT gene family causes genome-wide deficiencies in DNA methylation and alterations in imprinted gene expression, suggesting that DNA methylation plays an essential role in the regulation of imprinted gene expression [18]. We analyzed the CpG islands in the promoter regions of the candidate imprinted genes to predict their imprinting control regions (ICRs). A total of 162 imprinted genes (50 paternally and 112 maternally imprinted genes) possessed CpG islands in their promoters (Fig.3 and Additional file 3: Table S3).
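The criteria used to call CpG islands are not stated in this section. As an illustration only, the sketch below applies the classical Gardiner-Garden and Frommer thresholds (window of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) to a promoter sequence; the thresholds and function names are assumptions and may differ from the tool used here.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n  # expected CpG count given the base composition
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def has_cpg_island(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return True if any window of the sequence meets both thresholds."""
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            return True
    return False
```

A sequence shorter than the window is reported as having no island, which is a simplification of this sketch rather than a biological claim.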
Two or more imprinted genes are often found in clusters on a chromosome over a region spanning nearly 1 Mb or more [19]. Some of the candidate imprinted genes in this study were located in clusters (Fig.4). A total of 78 clusters, comprising 216 imprinted genes, were identified in this study. However, none of the imprinted genes matched known clusters (Additional file 4: Table S4).
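The exact rule used to define a cluster is not given in this section. One simple way to group genes, shown here as an assumption-laden sketch, is to sort genes by position and chain neighbours on the same chromosome whose start coordinates lie within 1 Mb of each other; the gene names and coordinates are hypothetical.

```python
from itertools import groupby


def find_clusters(genes, max_gap=1_000_000):
    """Group genes into clusters of two or more neighbouring genes.

    `genes` is a list of (name, chromosome, start) tuples. Consecutive genes on
    the same chromosome are chained while their start positions differ by at
    most `max_gap` bases.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _, group in groupby(ordered, key=lambda g: g[1]):
        current = []
        for gene in group:
            if current and gene[2] - current[-1][2] > max_gap:
                if len(current) >= 2:
                    clusters.append([g[0] for g in current])
                current = []
            current.append(gene)
        if len(current) >= 2:
            clusters.append([g[0] for g in current])
    return clusters


example = [
    ("g1", "1", 100_000),
    ("g2", "1", 600_000),
    ("g3", "1", 5_000_000),
    ("g4", "2", 100),
    ("g5", "2", 900_000),
]
clusters = find_clusters(example)
```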
ZFP57, which is recruited by the methylated TGCCGC motif, is a regulator of genomic imprinting, suggesting that the motif plays an important role in the regulation of imprinted gene expression by DNA methylation [20]. Imprint-related motifs were predicted within the upstream sequences of the candidate imprinted genes in this study. A total of 38 imprinted-gene-specific motifs were obtained (Fig.5).
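The motif prediction in this study was presumably performed with a dedicated motif-discovery tool, which is not reproduced here. As a simpler illustration of what locating a motif in an upstream sequence involves, the sketch below scans both strands for the known ZFP57 hexamer; the function names and the example sequence are hypothetical.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="TGCCGC"):
    """Return sorted (position, strand) hits of the motif on both strands.

    Positions are 0-based offsets on the forward strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


hits = find_motif("AATGCCGCTTGCGGCATT")
```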
Functional analysis of imprinted genes for skeletal muscle development
To further understand the function of the imprinted genes, a GO enrichment analysis of the paternally and maternally imprinted genes was performed (Fig.6). The significantly enriched biological processes were homophilic cell adhesion via plasma membrane adhesion molecules, ion transport, and regulation of postsynaptic cytosolic calcium ion concentration for paternally imprinted genes, and immune response in mucosal-associated lymphoid tissue, cell–cell recognition, and DNA methylation or demethylation for maternally imprinted genes (Additional file 5: Table S5 and Additional file 6: Table S6). However, only one maternally imprinted gene, EPHB1, was involved in negative regulation of skeletal muscle cell proliferation, a biological process related to skeletal muscle development. Myoblasts undergo proliferation, differentiation, and fusion to form multinucleated myofibers. Therefore, the imprinted genes involved in the biological processes of cell proliferation, differentiation, and fusion were selected to illustrate their potential regulatory role in skeletal muscle development. As shown in Fig.7, 41 maternally imprinted genes and 25 paternally imprinted genes participated in these three biological processes, some of which (e.g., E2F1, FBXO40, and WNT5A) are well known to play a role in skeletal muscle development.
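The statistical test behind the GO enrichment is not specified in this section. A common choice is a one-sided hypergeometric test, sketched below under that assumption; the counts in the example are invented and do not correspond to any result reported here.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background, K: background genes annotated to the GO term,
    n: genes in the test list, k: test-list genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 5 of 50 listed genes hit a term that covers
# 100 of 20,000 background genes.
p = hypergeom_pvalue(5, 50, 100, 20_000)
```

In practice the p-values for all tested terms would also need a multiple-testing correction, which this sketch omits.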