Copy Number Variations Contribute to Intramuscular Fat Content Differences By Affecting The Expression of PELP1 Alternative Splices in Pig

Background Intramuscular fat (IMF) is a key meat quality trait. Research on the genetic mechanisms of IMF decomposition is valuable for both pork quality improvement and the treatment of obesity and type 2 diabetes. Copy number variations (CNVs) are a type of variant that may inuence meat quality. In this study, a total of 1185 CNV regions (CNVRs) including 393 duplicated CNVRs, 432 deleted CNVRs, and 361 CNVRs with both duplicated and deleted status were identied in a pig F2 resource population using next-generation sequencing data. A genome-wide association study (GWAS) was then performed between CNVs and IMF, and a total of 19 CNVRs were found to be signicantly associated with IMF. QTL colocation analysis indicated that 3 of the 19 CNVRs overlapped with known QTLs. RNA-seq and qPCR validation results indicated that CNV150, which is located on the 3’UTR end of the proline, glutamate, and leucine rich protein 1 (PELP1) gene, may affect the expression of PELP1 alternative splices. Sequence alignment and Alphafold2 structure prediction results indicated that the two alternative splices of PELP1 have a 23 AA sequence variation and a helix-fold structure variation. This region is located in the region of interaction between PELP1 and other proteins which have been reported to be signicantly associated with fat deposition or insulin resistance. We infer that the CNVR may inuence IMF content by regulating the alternative splicing of the PELP1 gene and ultimately affecting the structure of the PELP1 protein.


Background
Intramuscular fat (IMF) refers to the total amount of fat located between muscle bers and as lipid droplets in the muscle cells. In humans, extra IMF deposition has been reported to be associated with type 2 diabetes and insulin resistance [1].
In animals, IMF content directly in uences the avor and juiciness of meat and indirectly in uences the tenderness and color of meat [2]. Because pork IMF contains many long-chain polyunsaturated fatty acids, it can also directly affect human health [3].
IMF content is a quantitative trait with low to moderate heritability (0.2-0.4) [4,5] and is in uenced by multiple genes or quantitative trait loci (QTLs). As stated in the PigQTLdb (http://www.animalgenome. org/cgi-bin/QTLdb/SS/index) a total of 842 IMF-related QTLs have been reported prior to 23 August 2021 [6]. Although thousands of single-nucleotide polymorphisms (SNPs) have been reported to be associated with IMF in genome-wide association studies (GWAS), most of them only explain a small part of the total genetic variance [4,7,8].
Copy number variations (CNVs) are the mutation type with the widest coverage of the genome and are one of the variation types most likely to explain "missing inheritance" beyond the SNP effect. In pigs, there have been several studies revealing CNVs or CNV regions (CNVRs) associated with economic traits. In the research of Rubin et al., CNVs of KIT were reported to be associated with coat color [9]. Fowler et al. (2013) found a CNV region related to backfat thickness on chromosome 7 using a pig 60K Beadchip [10]. In the research of Revilla 2020), CNVs were reported to be associated with fatty acid composition and growth traits, ear size, number of piglets born alive, and prevalence of porcine endogenous retroviruses [11][12][13][14]. In our previous research, CNVRs have reported associated with IMF content [15]. The previous research used 60K Beadchip data with only 48 CNVs detected.
In this study, we used genome-wide CNV markers identi ed and genotyped via next-generation sequencing to perform genome-wide association analysis on IMF. Moreover, function analyses of these CNVs were also performed to analyze the potential effects and mechanisms of the CNVs on IMF. We aimed to identify the candidate genes and causal mutation sites of IMF and provide a research basis for pig meat quality breeding.

Results
Phenotypic distribution of pig IMF content For the 592 F2-generation individuals of the Large white pigs × Min pig resource population, the average slaughter weight (240±7d) was 109.13kg and the average carcass weight (left side) was 40.38kg. The average IMF content of these pigs was 2.87%, the maximum was 12.70%, the minimum was 0.73%, the standard deviation was 1.85%, and the coe cient of variation was 0.65.

Porcine genome copy number variation segmentation
After calling and genotyping, all the CNVRs were screened with the silhouette_score >0.6 in this study. A total of 1185 CNV regions (CNVRs) were identi ed, including 393 duplicated CNVRs, 432 deleted CNVRs, and 361 CNVRs with both duplicated and deleted status ( Figure 1). The length of CNVRs ranged from 2000 bp to 4,196,500 bp with an average length of 30,818 bp. The numbers of copies was relatively evenly distributed on autosomes and basically corresponded to the lengths of the chromosomes.

CNV association analysis
All 592 pigs were used in a genome-wide association study (GWAS) between CNVs and IMF content. The GWAS results are shown in Table 1, and the Manhattan plot of GWAS analysis is shown in Figure 2. Only 1 CNVR signi cantly associated with IMF could be identi ed using the general linear model (GLM) method, and 19 CNVRs signi cantly associated with IMF were identi ed using the xed and random model circulating probability uni cation (FarmCPU) method. The signi cant CNVR detected by the overlapped two methods was located on chromosome 12 :52194501-52220000. The other CNVRs were located on chromosome 1 (2 CNVRs  Quality assessment of CNVs using Quantitative real-time PCR (qPCR) In this study, DNA qPCR validation was performed to con rm the existence of the identi ed CNVRs. As shown in Figure   3A-3D, all of the selected four CNVRs were validated by DNA qPCR .

Annotation and QTLs co-location of CNVRs
In order to study the function of the CNVRs, we rst annotated the CNV coverage area. As shown in Table 1, there were nine known genes (including four olfactory receptor genes) overlapped with six CNVRs, and four novel genes overlapped with four CNVRs. Ten of the 19 CNVRs were located in the intergenic region, without covering any genes. Considering the CNVRs and 842 known IMF-associated QTLs together, 3 of the 19 signi cant CNVs could be overlapped with at least one of the known QTLs for IMF (Supplementary Table S1).
Signi cantly related CNV internal RNA-seq Six individuals with different copy numbers were selected for RNA sequencing to predict the potential effect mechanisms of the CNVRs. The relationships between DNA dosage and RNA expression were then analyzed. The results of the transcript expression levels for the 13 overlapped genes showed that related transcript expression differences could be detected only in the Proline, Glutamate, and Leucine Rich Protein 1 (PELP1) genes. The expression of one of the PELP1 alternative splices (ASs), ENSSSCT00000019597, was signi cantly different between different individual CNVRs ( Table  2). Functional analysis of CNV150 As our research population was constructed using Large white pigs and Min pig F0 individuals, which have different IMF contents (the average IMF content of the Large white is less than 2%, and the average IMF content of Min pigs is more than 4%), we also detected the copy numbers of F0 individuals to con rm the relationship between IMF and CNVR150.
From Figure 4, we can see that in Large white pigs, the copy number of CNV150 was multiple, and in Min pigs, the copy number of CNV150 was around 1.
Beyond the RNA-seq data, we also detected the expression of the two PELP1 alternative splices (ENSSSCT00000075280 and ENSSSCT00000019507) using the qPCR method. As shown in Figure 3E and Figure 3F, the expression of ENSSSCT00000019507 was signi cantly different between normal-copy-number individuals and duplicated-copynumber individuals. ENSSSCT00000075280 and ENSSSCT00000019507 were coding modulators of the nongenomic activity of the estrogen receptor proteins A0A5G2R420 and F1RFT3.
In order to mine the mechanism of CNV150 in depth, we analyzed the interaction network of PELP1. From the proteinprotein interaction (PPI) networks ( Figure 5), a total of 23 proteins were experimentally determined to interact with PELP1 (Supplementary Table S2). Among the 23 proteins, the androgen receptor (AR), estrogen receptor (ESR1), glucocorticoid receptor (GR, NR3C1), nuclear receptor subfamily 4 group A member 1(NR4A1), retinoblastoma-associated protein (RB1), 60S ribosomal protein L11 (RPL11), proto-oncogene tyrosine-protein kinase Src (SRC), and signal transducer and activator of transcription 3 (STAT3) have been reported to be related to fat deposition, metabolism, or insulin resistance.
In order to study whether the protein structure variation affected the interaction between PELP1 and its interactive proteins, we aligned the AA of A0A5G2R420 and F1RFT3 and found there was a 23 AA (83-105) difference between these two proteins ( Figure 6A). Alpha fold 2 was used to predict the potential structure of these two sequences. In F1RFT3, a helix (about 30 AA) is unfolded as compared to A0A5G2R420 ( Figure 6B and Figure 6C). This helix is located between two LXXLL motifs of PELP1.

Discussion
The Min is a native Chinese pig breed with an average IMF of > 4.0%, and the Large white is a commercial pig breed with an average IMF of <=2.00%. The Large white × Min F2 separated population is an ideal population in which to investigate the candidate genes or QTLs for IMF. In this study, we rst used NGS data of the F2 resource population for CNV calling and genotyping, and then performed CNV-based GWAS for candidate CNV identi cation. CNV calling using NGS data mainly uses four approaches: paired-end mapping, split-reads, sequence assembly, and read depth (RD) [16]. In this study, we chose to use CNVcaller software [17], which uses the RD method. CNVcaller mitigates the in uence of high proportions of gaps and misassembled duplications in the nonhuman reference genome assembly for CNV calling and genotyping and was suitable for our population. In our research, a total of 1185 CNVRs were identi ed, and this number was smaller than some other pig CNVRs detection research. In the GWAS analysis, a total of 19 genomic signi cant CNVRs were identi ed as being related to IMF. Among the known genes in which signi cant CNVRs were located, PELP1 is an ESR coregulator protein [20] and has been reported to be associated with sperm morphology abnormalities in pigs [21]. In the human and great ape PELP1 gene, duplicated CNV also exists [22,23]. Homozygous spermatogenesis associated 22 (SPATA22) is a sex-related gene associated with infertility and related traits [24]. SH3 domain-binding protein 4 (SH3BP4) is a negative regulator of amino acid-Rag GTPase-mTORC1 signaling and is related to diabetic retinopathy [25,26]. FCH And Mu Domain Containing Endocytic Adaptor 2 (FCHO2) protein can participate in the early step of clathrin-mediated endocytosis and has lipid-binding activity [27]. Other known genes, namely LOC100524322, LOC100524156, and R-SSC-381753, are olfactory receptors (ORs), and ORs have been reported to be related to IMF or insulin resistance in previous research [28 -31].
In order to study the function of the CNVRs located in the intergenic region or in novel genes, we performed QTL colocation analysis, and ultimately found that 3 of the 19 CNVRs were located in the regions of reported IMF-associated QTLs. We infer that these QTLs may affect IMF through structure variation.
As CNVRs usually work through regulation effects or dose effects, we analyzed the RNA expression pro les of some individuals with different CNVR dosages. Interestingly, we found that one of the PELP1 ASs, named ENSSSCT00000019597, was signi cantly differently expressed in CNV150-variant individuals. We then validated the differential expression using qPCR and the results were positive. Hence, we inferred that this CNV150 may affect PELP1 alternative splicing.
In order to con rm the function of CNV150, we analyzed the read depth of CNV 150 in F0-generation individuals and found that the copy number of CNV150 was normal in Min pigs and duplicate in Large white pigs. As shown in Table 1, this CNVR has a negative effect for IMF, and Min pigs and Large white pigs are high-and low-IMF pigs, respectively.
These results were consistent. We then further studied PELP1 and its ASs.
First, we studied whether PELP1 directly or indirectly affects IMF. In the PPI networks, about half of the proteins had been reported as related to IMF or insulin resistance. Among these genes or proteins, AR and ESR1 can regulate leptin transcript accumulation and protein secretion in adipocytes [32]. NR3C1 transcription factor has been identi ed as a potential regulator co-localizing within QTLs for fatness and growth traits [33]. NR4A1 can affect insulin resistance and downregulated intramuscular lipid content [34]. RB1 has a direct interaction relationship related to adipogenesis growth [35]. RPL11 has been revealed to play a role in fat storage [36]. SRC and STAT3 can respond to adipogenesis through the TNF-α pathway [37]. We inferred that PELP1 may in uence IMF by interacting with other proteins.
In previous research, the interacting regions of PELP1 and ESR, AR, GR, RB, and STAT3 were amino acids 1-400, or LXXLL motifs, amino acids 1-600, or amino acids 1-330 [38]. Hence, we then studied whether the ASs affected the 3D structure of the PELP1 protein and affected the interaction between PELP1 and its interactive proteins. The structures of the two proteins coded by PELP 1 ASs were predicted using Alphafold2. Alphafold2 has been used to predict the structures of many di cult protein targets at or near experimental resolution [22]. Our results may have high reliability.
The results indicated that, in the variation location of amino acids 83-105, a helix was unfolded in F1RFT3 (coded by ENSSSCT00000019597). In A0A5G2R420 (coded by ENSSSCT00000075280), the structure located in the variation region had very high con dence (predicted LDDT (pLDDT)>90). Moreover, this helix was between two LXXLL motifs (located on 69-74 AA and 111-116 AA). We inferred that the structure changes potentially caused by CNV150 may affect the interaction of PELP1 and its interactive proteins. However, the function and molecular regulatory mechanisms between CNV150 and IMF content require further experimental research, such as gene knockdown/editing, coimmunoprecipitation, and so on.

Conclusions
In this study, a CNV-based GWAS was performed between CNVs and IMF, and a total of 19 CNVRs were found to be signi cantly associated with IMF. Some CNVRs, such as the three CNVRs overlapped with known QTLs, may be useful candidate markers for IMF selection. CNV150, which was located on the 3'UTR of PELP1, may in uence IMF content by regulating the alternative splicing of the PELP1 gene and the structure of PELP1 protein. These ndings suggest a novel mechanistic approach for meat quality improvement in animals and potential treatment of insulin resistance in human beings.

Animals and phenotype determination
The pigs used in the experiment were all from the Large white pig × Min F2 resource population, raised in the Changping pig farm of the Institute of Animal Science, Chinese Academy of Agricultural Sciences. All 592 F2 individuals were raised to market age (240±7 days) and slaughtered for commercial purposes. Longissimus dorsi muscles (LDM) were collect for IMF content measurement and DNA sequencing. IMF content was measured using an ether extraction method (Soxtec Avanti 2055 Fat Extraction System; Foss Tecator, Hillerød, Denmark).
Tissue DNA extraction and sequencing DNA of LDM tissue was extracted using the phenol imitation method. The NANODROP 1000 was used to detect the concentration and quality of DNA (Thermo Scienti c, USA). An Illumina Hi-seq2500 was used for paired-end sequencing, and the sequencing depth was 5-7×.
CNV detection, genotyping, and genomic association analysis CNVcaller software was used to detect and genotype all individual CNVs with a sliding window of 500 bp [17]. In the step of CNV determination, we used a strict standard of silhouette_score >0.6. When genotyping, each CNV region was divided into no more than three genotypes, and each sample was labeled 1, 2, and 3 to represent the genotype cluster to which the individual belonged.
CNV-based GWAS was performed using the GLM, mixed linear model (MLM), and FarmCPU model approaches with rMVP software [39]. In the GLM method, we used the xed effects of sex and batches. In the method of MLM, we used PCA as a xed effect, and in the FarmCPU method, we also used PCA as a xed effect. The selected threshold for genomic level signi cance was -log 10 (0.05/N), where N was the total number of CNVRs after quality control.

RNA data acquisition
Six RNA-seq datasets, as in our previous research, were used for RNA expression analysis [40]. These data have been submitted to the Genome Sequence Archive with the accession number CRA001645.
Validation of the CNVs and PELP1 RNA sequencing results using qPCR DNA of ve individuals in each CNVR genotype group were used for qPCR ampli cation for CNVR validation. The primers were designed using the Primer 6 software. The glucagon gene (GCG) was used as the single-copy control. RNA extract from ve individuals' LDM in each CNVR genotype group was used for expression validation. The glyceraldehyde-3phosphate dehydrogenase (GAPDH) gene was used as the housekeeping gene. Copy number and relative expression were calculated using the 2 −ΔΔCT method [41], where ΔCT is the differential value of target region cycle threshold (CT) and the control region CT. Moreover, 2 −ΔΔCT stood for the comparison of the ΔCT value of samples with CNV to those without CNV. The PCR cycle was as follows: 3 min at 50℃, 10 min at 95℃, 40 cycles of 15 sec at 95℃, and 1 min at 60℃. The primer sequences are shown in Supplementary Table S3. One-way ANOVA was used to determine the statistical differences between any two groups, followed by Tukey's test for multiple comparisons. P < 0.05 was considered to indicate a signi cant difference.

Protein alignment and structure prediction
Consensus sequences and alignment of A0A5G2R420 and F1RFT3 were built using ClustalX in MEGA X [44]; the structures of these two sequences were predicted using alphafold2 [22] and visualized using iCn3D [45].

List Of Abbreviations
ΔΔCt: delta delta cycle threshold Availability of data and materials Some of the sequencing data used in the current study have been submitted to the Genome Sequence Archive, with the accession number CRA001645. The other data presented in this study are available from the corresponding author upon reasonable request.

Competing interests
The authors declare that they have no competing interests.  The distribution of pig CNVRs. The outer circle presents the lengths of the chromosomes; the inner circle presents the CNVR distributions. The length of the histogram bars indicates the ratio of different types of CNVRs to the total number of CNVRs.