A multi-tissue and -breed catalogue of chromatin conformations and their implications in gene regulation in pigs

doi:10.21203/rs.3.rs-4239308/v1

Research Article

This work is licensed under a CC BY 4.0 License

Background

Topologically associating domains (TADs) are functional units that organize chromosomes into 3D structures of interacting chromatin, and play a crucial role in regulating gene expression by constraining enhancer-promoter contacts. Evidence suggests that deletion of TAD boundaries can lead to aberrant expression of neighboring genes. In our study, we analyzed high-throughput chromatin conformation capture (Hi-C) datasets from publicly available sources, integrating 71 datasets across five tissues in six pig breeds.

Results

Our comprehensive analysis revealed 65,843 TADs in pigs, and we found that TAD boundaries are enriched for expression Quantitative Trait Loci (eQTL), splicing Quantitative Trait Loci (sQTL), Loss-of-Function variants (LoFs), and other regulatory variants. Genes within conserved TADs are associated with fundamental biological functions, while those in dynamic TADs may have tissue-specific roles. Specifically, we observed differential expression of the NCOA2 gene within dynamic TADs. This gene is highly expressed in adipose tissue, where it plays a crucial role in regulating lipid metabolism and maintaining energy homeostasis. Additionally, differential expression of the BMPER gene within dynamic TADs is associated with its role in modulating the activities of bone morphogenetic proteins (BMPs)—critical growth factors involved in bone and cartilage development.

Conclusion

Our investigations have shed light on the pivotal roles of TADs in governing gene expression and even influencing traits. Our study has unveiled a holistic interplay between chromatin interactions and gene regulation across various tissues and pig breeds. Furthermore, we anticipate that incorporating markers, such as structural variants (SVs), and phenotypes will enhance our understanding of their intricate interactions.

Keywords: pig, TADs, gene expression, biological functions

In eukaryotic cell nuclei, genomic DNA is organized into three-dimensional (3D) structures at various scales. In 2009, the development of high-throughput chromatin conformation capture (Hi-C) technology enabled the genome-wide identification of interacting fragments by combining second-generation sequencing with chromosome conformation capture (3C) and molecular labeling techniques[1]. Hi-C has provided insights into chromatin interactions, revealing spatial hierarchical structures such as chromosome territories, compartments, and topologically associating domains (TADs)[2], as well as long-range interactions that play crucial roles in transcriptional regulation[3]. TADs are discrete units of folded chromatin ranging in size from 200 kb to 1 Mb and serve as independent regulatory domains characterized by extensive self-interactions[4]. Studies in model organisms such as Drosophila have demonstrated a strong correlation between TADs and functional epigenetic domains defined by chromatin marks[4]. The structural features and mechanisms underlying TAD formation and the regulation of local gene expression are being elucidated[2, 4–9]. For example, CCCTC-binding factor (CTCF) and cohesin proteins, which are abundant at TAD boundaries, have been identified as essential for TAD localization and structural stability[10]. TAD boundaries are more evolutionarily conserved than the rest of the TADs and are enriched with CTCF and housekeeping genes, imposing genetic constraints on TADs[11, 12]. Moreover, TAD boundaries are enriched in various epigenetic markers and genomic features, such as histone modifications, DNA methylation sites, transcription start sites (TSSs), and transfer RNA (tRNA), which are closely associated with epigenetic regulation of transcriptional activity[8]. Disrupting the boundaries thus disrupts the interactions between protein-coding genes and their enhancers, leading to a decrease in gene expression levels[13]. These studies have also highlighted the crucial roles of TAD boundaries in the regulation of gene expression in humans and animals.

The domestic pig (Sus scrofa) is a major source of meat worldwide and is widely used as an animal model for human diseases and xenotransplantation[13]. Previous projects such as ENCODE, Roadmap Epigenomics, and Functional Annotation of Animal Genomes (FAANG) have identified cis-regulatory elements involved in gene expression regulation in humans, cattle, and pigs[14–19]. Studies have revealed that changes in 3D chromatin structure during growth and development, as well as long-range interactions within specific breeds or tissues, play regulatory roles in gene expression[20]. A comparison of Bama pigs and wild boar showed that chromatin architecture contributes to the regulatory mechanisms underlying their phenotypic differences[21]. Furthermore, chromatin architecture has contributed to the analysis of pig phenotypes[9, 22]. Such research provides valuable comparative epigenetic data, such as Hi-C, relevant to using pigs as models in human biomedical research[23]. However, a comprehensive characterization of chromatin architecture associated with gene regulation, and even traits, in pigs is lacking.

Therefore, we compiled a comprehensive dataset comprising 71 publicly available Hi-C datasets from 5 tissues in 6 pig breeds (Table 1). Leveraging this dataset, we constructed a mini-atlas consisting of 65,843 TADs to provide a detailed map of TADs in pigs. Our study has several objectives. Firstly, we examine the interplay between TADs and functional regulatory elements derived from the Pig Genotype-Tissue Expression (PigGTEx) project and FAANG, including expression quantitative trait loci (eQTL), splicing quantitative trait loci (sQTL), loss-of-function variants (LoFs), and other regulatory elements. Secondly, we conduct inter-tissue and inter-breed comparisons of TADs. Furthermore, to deepen our understanding of the regulatory impact of TADs, we identify candidate genes that are likely influenced by changes in TADs and are associated with tissue- or breed-specific functions. These findings contribute to unraveling the complex regulatory landscape of the pig genome and offer potential avenues for future research in pig genetics.

Table 1

The numbers of breeds and tissues for 71 public Hi-C samples
breed	number	adipocytes	liver	ear	muscle	embryo
Large White	51	2	8	2	9	30
Bamaxiang	13	6	7
Luchuan	3			3
Rongchang	2			2
Meishan	1				1
Tibetan	1			1
Total	71	8	15	8	10	30

Data collection

The high-throughput sequencing datasets contained 5 tissues and 6 breeds and were all collected from public datasets (Table S1). The public datasets were retrieved from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), and the corresponding SRA accession numbers can be found in Table S1. The genome annotation data, eQTL, sQTL, CTCF signals, top 1% predicted effect score of 13 tissues, and gene expression matrices were obtained from PigGTEx and FAANG[9, 22]. The chromatin states of the pig genome can be accessed through the UCSC Genome Browser (http://genome.ucsc.edu/s/zhypan/susScr11_15_state_14_tissues_new)[9]. Additionally, the high-throughput sequencing data generated for four tissues in this study are available in the Gene Expression Omnibus (GEO) under the accession number GSE158430.

Hi-C data processing and TAD calling

The Hi-C reads were processed using the Juicer software (version 1.5)[24]. Briefly, the high-quality Hi-C reads were mapped to the pig reference genome (Sscrofa11.1, access date: 2021-08-28) using BWA (v0.7.15)[25, 26] with default parameters. Unaligned read pairs and PCR duplicates were filtered out using Juicer, and alignments with low quality (MAPQ < 30) were also discarded.

Subsequently, intra-chromosomal contact matrices at 50 kb resolution were generated independently for each sample using valid read pairs. These matrices were then normalized using the Knight-Ruiz (KR) matrix-balancing algorithm[27]. Additionally, inter-chromosomal contact matrices at 1 Mb resolution were generated using the KR algorithm and normalized using log-counts per million (CPM), which represents the average abundance across all libraries.

Criteria for filtering samples

To ensure the quality and consistency of the Hi-C datasets used in this study, we implemented several filtering steps. Firstly, we excluded Hi-C samples with sequencing depths below the average sequencing depth of 48.3×, as higher sequencing depth is known to yield more reliable results for high-resolution Hi-C maps and 3D chromatin structure prediction. The interquartile range (IQR) method was then employed to define the lower and upper limits for outliers: datasets with sequencing depths outside the range of 30 (lower limit) to 208.8 (upper limit), that is, below the first quartile (Q1) − 1.5 × IQR or above the third quartile (Q3) + 1.5 × IQR, were excluded (Fig. S2).
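The IQR rule described above can be illustrated with a minimal sketch. The sample names and depths below are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which convention was used.

```python
import statistics

def iqr_bounds(depths, k=1.5):
    """Return (lower, upper) outlier limits: Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile convention is an assumption; other conventions shift the limits slightly.
    q1, _median, q3 = statistics.quantiles(depths, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_by_depth(samples, k=1.5):
    """Keep (name, depth) pairs whose depth lies within the IQR-based limits."""
    lower, upper = iqr_bounds([depth for _name, depth in samples], k)
    return [(name, depth) for name, depth in samples if lower <= depth <= upper]

# Hypothetical sequencing depths, for illustration only.
samples = [("s1", 50.0), ("s2", 60.0), ("s3", 70.0),
           ("s4", 80.0), ("s5", 90.0), ("s6", 400.0)]
kept = filter_by_depth(samples)
print([name for name, _depth in kept])
```

With these illustrative values the limits are 25 and 125, so only the 400× sample is flagged as an outlier.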

To maintain consistency in comparisons, we removed Hi-C datasets whose read length (100 bp) differed from that of the majority of the datasets (150 bp). Trim Galore (v0.6.7)[28] was used to trim bases with a Phred quality score below 20. Sequences shorter than 15 nucleotides were discarded, and the first 3 nucleotides at the 5' end of Read 2 were trimmed (--q 20 --paired --max_n 15 --clip_R2 3).

Furthermore, we calculated the percentage of Hi-C contact reads out of all reads. After applying the aforementioned filtering steps, the remaining Hi-C datasets exhibited a balanced number and coverage of TADs.

Down-sampling analysis

To ensure equal representation and comparability within each comparison group, down-sampling was performed on each sample. Initially, all valid read pairs of each sample were used for down-sampling, and we then utilized HiC-Pro (v2.11.4)[29] to convert the matrix files back to the Juicer format for subsequent analysis. Down-sampling was conducted on the Hi-C interaction matrix following the method described by Carty et al.[30].

In summary, the counts of each element in the matrix, representing pairs of genomic loci, were converted into a list of paired-end reads, where the size of the list matched the counts. Through a random subsampling procedure without replacement, reads were selected from this list and reassigned to create a new downsized dataset. This down-sampling process ensured that each sample within the comparison group had an equal number of valid read pairs, facilitating fair and unbiased comparisons.
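The subsampling step can be sketched as follows. This is an illustrative reimplementation of the idea, not the code used in the study; the sparse dictionary representation and the function name `downsample_contacts` are assumptions made for the example.

```python
import random
from collections import Counter

def downsample_contacts(contacts, target, seed=0):
    """Down-sample a sparse contact matrix to `target` read pairs.

    contacts: dict mapping (bin_i, bin_j) -> integer read-pair count.
    Returns a new dict whose counts sum to `target`.
    """
    total = sum(contacts.values())
    if target > total:
        raise ValueError("target exceeds the number of available read pairs")
    # Expand each matrix element into one list entry per read pair.
    reads = [pair for pair, count in contacts.items() for _ in range(count)]
    rng = random.Random(seed)
    chosen = rng.sample(reads, target)  # random sampling without replacement
    # Re-assemble the selected read pairs into a down-sized matrix.
    return dict(Counter(chosen))

# Hypothetical 2-bin matrix with 10 read pairs, reduced to 6.
contacts = {(0, 0): 5, (0, 1): 3, (1, 1): 2}
small = downsample_contacts(contacts, target=6)
print(sum(small.values()))
```

Expanding every read pair into a list is memory-hungry for real Hi-C libraries, so a production implementation would more likely draw the new counts directly from a multivariate hypergeometric distribution.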

Functional enrichment analysis of genes

To conduct gene function enrichment analysis, we utilized the biomaRt (v2.52.0)[31] R package to convert swine Ensembl gene IDs to Entrez IDs. The resulting Entrez IDs were then passed to clusterProfiler (v4.2.2)[32] to retrieve GO[33] and KEGG[34] enrichment annotations.

To gain deeper insights into the functional roles of genes within the conserved TADs, we conducted GO and KEGG enrichment analyses. These analyses aimed to identify significantly enriched GO terms and KEGG pathways, respectively, providing information about the functional characteristics of genes within the identified TADs.

Identification of conserved and differential TADs

Our study focused on six comparison groups: muscle and adipose tissue, adipose and ear tissue, liver and adipose tissue, Large White and Rongchang pigs, muscle and ear tissue, and Bamaxiang and Large White pigs (Table 4). Differential interaction regions for each group were detected using CHESS (v0.3.7)[35]. Differential TADs were defined as TADs that overlapped with the regions of differential interactions.

Table 3

Definitions and abbreviations of 15 chromatin states
Chromatin states	Abbr.	Group
Strongly active promoters/transcripts	TssA	promoter
Flanking active TSS without ATAC	TssAHet	promoter
Transcribed at gene	TxFlnk	TSS-proximal
Weak transcribed at gene	TxFlnkWk	TSS-proximal
Transcribed region without ATAC	TxFlnkHet	TSS-proximal
Strong active enhancer	EnhA	enhancers
Medium enhancer with ATAC	EnhAMe	enhancers
Weak active enhancer	EnhAWk	enhancers
Active enhancer no ATAC (hetero)	EnhAHet	enhancers
Poised enhancer	EnhPois	enhancers
ATAC island	ATAC_Is	NA
Bivalent/poised TSS	TssBiv	promoter
Repressed polycomb	Repr	repressed
Weak repressed polycomb	ReprWk	repressed
Quiescent	Qui	quiescent

Table 4

Details of the six comparison groups
Comparison	Group 1	Group 2
1	Large White muscle	Large White adipose
2	Large White adipose	Large White ear
3	Bamaxiang liver	Bamaxiang adipose
4	Large White muscle	Large White ear
5	Large White ear	Rongchang ear
6	Bamaxiang liver	Large White liver

When analyzing differential TADs, we took into account that the TAD regions identified by the Arrowhead algorithm[36] contained overlapping regions between TADs. To accurately calculate the frequency distribution of TADs throughout the genome, we assigned a frequency of 2 to the overlapping regions, while assigning a frequency of 1 to the non-overlapping regions. This approach ensured an accurate representation of TAD occurrences in the genome. TADs with a frequency of 24 (equal to the number of samples) were defined as conserved TADs, while all other TADs within each comparison group, excluding the differential TADs, were categorized as other TADs. We then calculated the proportions of both the differential TADs and conserved TADs between the groups.
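The frequency bookkeeping can be sketched at bin level. The first function follows the description above (overlapping TADs count twice). The second function encodes an assumption of ours about how the "frequency equal to the number of samples" criterion is applied, namely that a bin is conserved when it is covered by a TAD in every sample. Coordinates and the 50 kb bin size are illustrative.

```python
def tad_frequency(tads, n_bins, bin_size=50_000):
    """Per-bin TAD count for one sample: 1 for a single TAD, 2 where two TADs overlap."""
    freq = [0] * n_bins
    for start, end in tads:  # half-open intervals [start, end) in bp
        first = start // bin_size
        last = min(n_bins, -(-end // bin_size))  # ceiling division
        for b in range(first, last):
            freq[b] += 1
    return freq

def conserved_bins(per_sample_tads, n_bins, bin_size=50_000):
    """Bins covered by at least one TAD in every sample (assumed criterion)."""
    n_samples = len(per_sample_tads)
    support = [0] * n_bins
    for tads in per_sample_tads:
        for b, f in enumerate(tad_frequency(tads, n_bins, bin_size)):
            if f > 0:
                support[b] += 1
    return [b for b, s in enumerate(support) if s == n_samples]

# Two hypothetical samples on a 5-bin chromosome.
sample_a = [(0, 100_000), (50_000, 200_000)]  # the two TADs overlap in bin 1
sample_b = [(0, 150_000)]
print(tad_frequency(sample_a, n_bins=5))
print(conserved_bins([sample_a, sample_b], n_bins=5))
```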

Enrichment analysis of TADs and functional annotation data

To evaluate the enrichment fold and odds ratios of functional regulatory elements within TADs and their flanking regions, we conducted a rigorous analysis. Initially, we identified the number of variants located within TAD regions across different pig tissues and breeds using bedtools (v2.30.0)[37]. Subsequently, we employed Fisher's exact test to evaluate the enrichment probability of various functional elements, including eQTL, sQTL, 15 chromatin states, top 1% regulatory variants predicted by a deep learning method in 13 tissues, and LoFs within TAD regions. Here, we will provide a detailed description of the enrichment analysis for eQTLs as an example.

To calculate the enrichment, we implemented the following procedure:

We divided each TAD into 60 equally sized bins. Additionally, we extended each TAD upstream and downstream by the length of the TAD itself to define the TAD flanking region. We further divided the TAD flanking region into 120 equally sized bins. Consequently, both the TAD and its flanking region were uniformly segmented into a total of 180 bins.
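A minimal sketch of this segmentation is given below. It assumes the 120 flanking bins are split evenly, 60 upstream and 60 downstream, and it does not clip bins at chromosome ends; the function names are illustrative.

```python
def _split(start, end, n, label):
    """Split [start, end) into n equal-width bins tagged with a region label."""
    width = (end - start) / n
    return [(start + i * width, start + (i + 1) * width, label) for i in range(n)]

def tad_bins(start, end, n_body=60, n_flank_each=60):
    """Return 180 bins: upstream flank, TAD body, downstream flank.

    Each flank has the same length as the TAD itself, so with the default
    settings every bin is 1/60 of the TAD length.
    """
    length = end - start
    return (_split(start - length, start, n_flank_each, "upstream")
            + _split(start, end, n_body, "body")
            + _split(end, end + length, n_flank_each, "downstream"))

# Hypothetical 600 kb TAD: every bin is 10 kb wide.
bins = tad_bins(1_000_000, 1_600_000)
print(len(bins), bins[0], bins[60], bins[-1])
```

Because bin width scales with TAD length, profiles from TADs of different sizes can be stacked position by position.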

Next, we computed the enrichment as the ratio of the number of eVariants (variants significantly associated with the expression of at least one gene, or with at least one alternative splicing event, in the e/sQTL study) within a specific bin to the number of SNPs within that bin, divided by the ratio of the number of all eVariants in the genome to the number of all SNPs in the genome. That is: enrichment = (number of eVariants in bin / number of SNPs in bin) / (number of all eVariants in the genome / number of all SNPs in the genome).
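The enrichment fold translates directly into code. The sketch below also shows an odds ratio from a 2×2 table; the table layout (eVariant or not, inside or outside the bin) is our assumption about how the counts would be arranged for Fisher's exact test, and all counts are made up for illustration.

```python
import math

def enrichment_fold(evar_bin, snps_bin, evar_genome, snps_genome):
    """(eVariants in bin / SNPs in bin) / (eVariants in genome / SNPs in genome)."""
    if snps_bin == 0 or evar_genome == 0 or snps_genome == 0:
        return math.nan  # undefined when a denominator is zero
    return (evar_bin / snps_bin) / (evar_genome / snps_genome)

def odds_ratio(evar_bin, snps_bin, evar_genome, snps_genome):
    """Odds ratio of the assumed 2x2 table (rows: in/out of bin; columns: eVariant/other SNP)."""
    a = evar_bin                       # eVariants in the bin
    b = snps_bin - evar_bin            # other SNPs in the bin
    c = evar_genome - evar_bin         # eVariants outside the bin
    d = (snps_genome - snps_bin) - c   # other SNPs outside the bin
    if b == 0 or c == 0:
        return math.inf
    return (a * d) / (b * c)

# Hypothetical counts: 25 of 100 SNPs in the bin are eVariants,
# against 1,000 of 8,000 SNPs genome-wide.
fold = enrichment_fold(25, 100, 1_000, 8_000)
ratio = odds_ratio(25, 100, 1_000, 8_000)
print(fold, ratio)
```

A fold above 1 means the bin carries proportionally more eVariants than the genome as a whole.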

We employed a significance threshold of FDR < 0.01 to identify statistically significant enrichment between eQTL and TADs. This allowed us to determine the presence of meaningful relationships at a high confidence level.

By applying these robust methods, we were able to investigate the enrichment between eQTL and TADs at a fine-grained level within the genome. Similar procedures were followed for assessing the enrichment of other functional regulatory elements. This comprehensive analysis enabled us to uncover the intricate relationships and potential regulatory mechanisms between TADs and these elements.

Identification of differentially expressed mRNAs

In this study, RNA-seq gene expression matrices were obtained from PigGTEx[22]. To identify differentially expressed genes (DEGs) in each comparison group, we utilized the edgeR package (v3.36.0)[38], a widely used statistical method based on a negative binomial distribution model. For each comparison group, we applied the filtering criteria FDR ≤ 0.01 and |log₂FC| ≥ 1 to identify differentially expressed genes, where FC represents the fold change.
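Applying the two thresholds to a results table can be sketched as below. The rows are hypothetical and only borrow the `logFC` and `FDR` column names that edgeR reports; the differential testing itself is not reproduced here.

```python
def select_degs(rows, fdr_max=0.01, lfc_min=1.0):
    """Return gene names passing FDR <= fdr_max and |log2FC| >= lfc_min."""
    return [row["gene"] for row in rows
            if row["FDR"] <= fdr_max and abs(row["logFC"]) >= lfc_min]

# Hypothetical results table, for illustration only.
results = [
    {"gene": "geneA", "logFC": 2.3, "FDR": 0.0004},   # passes both filters
    {"gene": "geneB", "logFC": 0.6, "FDR": 0.0001},   # effect too small
    {"gene": "geneC", "logFC": -1.8, "FDR": 0.009},   # passes (down-regulated)
    {"gene": "geneD", "logFC": 3.1, "FDR": 0.2},      # not significant
]
print(select_degs(results))
```

Using the absolute value of log₂FC keeps both up- and down-regulated genes.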

A mini-atlas TAD map of 24 Hi-C samples

We gathered a dataset comprising 71 Hi-C samples (Table 1), which yielded approximately 40 billion Hi-C sequence reads (Table S1). After alignment, we filtered out unmapped reads, self-loops, and non-valid data connections, resulting in an average mapping rate of 98.13% across all samples (see Methods, Fig. S1 and Table S2).

To mitigate potential biases arising from variations in sequencing depth across different projects, we conducted thorough data screening and filtering. Out of the initial 71 datasets, we retained 24 for further analysis (see Methods, Table S3). All retained samples met the criterion of having effective interacting reads comprising more than 50% of the total reads[29]. The sequencing depths of the samples ranged from 48.3× to 204.01× (Table S3, Fig. 1a and Table 1), and sequencing depth was strongly correlated with Hi-C contact count (P < 0.05, R² = 0.80) (Fig. 1a, Table 2). For each sample, we assessed the number of sequenced reads (90 M to 2000 M), contact counts (41 M to 1183 M), trimmed reads (0.1 M to 45 M), and TAD coverage (Fig. 1b-c, Table S2-3). Using contact maps at 50 kb resolution, we observed that TADs spanned approximately 65–78% of the whole genome. The number of TADs detected per sample ranged from 878 to 1077, with a median size of 1.7 Mb (Fig. 1c, Table 2).

Table 2

The mean values of Hi-C and TAD data grouped by breed and tissue
		Number	Hi-C contacts	Depth¹	TAD number	Coverage²	Average length (bp)
Breed	Bamaxiang	12	524069552	96.04	981	0.73	1772357
	LargeWhite	9	411919633	68.33	1012	0.70	1644939
	Rongchang	2	404897552	80.87	1028	0.73	1738474
	MeiShan	1	462142804	105.15	1017	0.68	1638540
Tissue	adipose	8	359570239	61.05	1026	0.71	1641087
	muscle	5	470123849	84.57	985	0.65	1744264
	liver	6	706592286	128.19	929	0.74	1879201
	ear	4	398717640	51.73	1047	0.71	1661586
	embryo	1	329877088	53.68	1065	0.71	1471360

Given the well-established importance of CTCF and cohesin in TAD formation and chromatin looping[39–41], we investigated the CTCF signals in muscle tissue obtained from FAANG[9]. Specifically, we examined the enrichment of CTCF signals within the identified TADs in adipose, liver, and muscle tissues separately. Our analysis revealed significant enrichment of CTCF peaks at the boundaries of TADs in each tissue (Fig. 1d). This consistent CTCF enrichment across all three tissues highlights the high conservation of TADs. Furthermore, we assessed the consistency of TAD numbers, average TAD length, and the number of Hi-C contacts across different tissues and breeds. All three measures showed similar values, indicating a comparable TAD landscape and an unbiased basis for TAD comparisons among tissues and breeds (Fig. 1e-g). These findings support the reliability of our TAD analyses and enable robust comparisons of TADs across diverse biological contexts.

Impact of TADs on regulation of transcriptional activity

To comprehensively investigate the impact of TADs on the regulation of transcriptional activity, we examined the enrichment within TADs of functional regulatory elements from the PigGTEx project. These elements included 15 chromatin states, eQTL, sQTL, LoFs, and the SNPs with the top 1% of effect scores in 13 tissues, as predicted by a deep learning method from ATAC-seq data (Fig. 2a).

Initially, we analyzed a LoF dataset consisting of 27,148 mutations, including splice acceptor, splice donor, start lost, stop gain, and stop loss variants. We assessed the enrichment of LoFs within the TAD body and flanking regions. Interestingly, we found that LoFs were more frequently observed at the TAD boundaries, while the TAD body showed lower enrichment (Fig. 2b). Furthermore, we examined the SNPs with the highest predicted effect scores (top 1%) in 13 tissues and observed distribution and enrichment patterns similar to those of LoFs (Fig. 2e), suggesting that TAD boundaries have a greater impact on gene regulation.

We further examined the enrichment and frequency of different chromatin states within TADs and their flanking regions (2× TAD length). The 15 chromatin states examined in this study were defined by the integration of five epigenetic marks across 14 different tissues (Table 3). These states primarily include promoters (TssA, TssAHet, and TssBiv, covering 1.16% of the entire genome), TSS-proximal transcribed regions (TxFlnk, TxFlnkWk, and TxFlnkHet, covering 0.92% of the genome), enhancers (EnhA, EnhAMe, EnhAWk, EnhAHet, and EnhPois, covering 6.5% of the genome), repressed regions (Repr and ReprWk, covering 13.25% of the genome), and quiescent regions (Qui, covering 73.39% of the genome) (Table 3). The enrichment of enhancers, promoters, and TSS-proximal regions at TAD boundaries (the 50 kb upstream and downstream of each TAD) was significantly higher than within the TAD body (P = 0.0172, 0.00012, and 0.0005, respectively), whereas the enrichment fold of repressed regions was low (P = 0.0006) in both the TAD body and boundary (Fig. 2c). Quiescent regions showed higher enrichment within the TAD body than at the TAD boundary, although the difference was not significant (P = 0.55) (Fig. 2c). This observation may be attributed to the promoting role of CTCF in TAD formation, leading to a reduced proportion of quiescent regions near TAD boundaries. Furthermore, we assessed the enrichment within TADs of eQTL and sQTL from 34 tissues, which are known to influence gene expression and splicing. Interestingly, we observed the highest frequency of eQTL and sQTL at the TAD boundary, and the enrichment fold of eQTL and sQTL at the TAD boundary was significantly higher (P < 0.05) than in the TAD body (Fig. 2d). These findings highlight the critical role of TAD boundaries in the pig genome, as indicated by the enrichment of functional regulatory elements.

Overall, our results demonstrate the importance of TAD boundaries in shaping the chromatin landscape and regulating gene expression and splicing in the pig genome.

Identification of conserved and differential TADs across breeds and tissues

To gain a comprehensive understanding of TAD conservation and variability in the pig genome, we divided the 24 samples into six comparison groups based on tissue and breed. The details of each group are shown in Table 4. Using graphical representations of the chromosomes, we depicted the differential TADs for each group and the TAD regions conserved across all groups, providing an overview of our analysis results (Fig. 3a). Our findings revealed that the majority of TADs were conserved (Fig. 3b, Table S4). Surprisingly, we observed greater differences between adipose and liver tissue in Bamaxiang pigs than in the other comparison groups (Fig. 3b, Table S5), contrary to our initial expectations. We attributed this variation to differences in sampling and sequencing depth among the collected samples, which may introduce bias. To address the issue of varying sequencing depths, we performed down-sampling to ensure comparability among the samples and reanalyzed the differential and conserved TADs. As depicted in Fig. 3c, the differences observed between breeds were smaller than those observed between tissues.

Furthermore, we investigated the functions of genes within the differential and conserved TADs. We quantified the number of genes within the differential TADs for each comparison group and within the conserved TADs present across all groups (Fig. 3c, Table S6). We then performed GO[33] and KEGG[34] enrichment analyses on the 5,916 genes within the conserved TADs (Fig. 3c). These genes were enriched in pathways crucial for maintaining fundamental organismal functions (Fig. 3d-e, Table S7-8). However, no significant enrichment in GO terms or KEGG pathways was found for the genes within the differential TADs. This analysis confirmed the reliability of the TAD structures and provided evidence supporting the essential role of conserved TADs in maintaining genome stability and organismal viability.

Identification of differentially expressed genes in differential TADs

To gain further insights into the regulatory mechanisms underlying gene expression in pigs resulting from alterations in 3D chromatin structure across different breeds and tissues, we focused on identifying differentially expressed genes within TADs that exhibited tissue-specific differences[42]. In the differential TADs of Bamaxiang between liver and adipose tissue, we discovered a total of 10 differentially expressed genes (FDR < 0.01 and |log₂FC| ≥ 1) (Fig. 4a, Table S9), including nuclear receptor coactivator 2 (NCOA2), a member of the p160 co-activator family. Our analysis also revealed that NCOA2 was located within a differential TAD in adipose and muscle tissues of Large White pigs, and RNA-seq data analysis confirmed its specific expression in adipose tissue (Fig. 4b).

Furthermore, we identified differentially expressed genes within the differential TADs between muscle and adipose tissues in Large White pigs; 10 differentially expressed genes ranked by log₂FC are presented in Fig. 4c (Table S10). We then focused on the differential TAD where BMPER is located (Fig. 4d) and observed a more compact folding of the genome at the TAD boundary where BMPER resides, along with a tissue-specific TSS that led to upregulated gene expression. By examining TADs across tissues, we identified differentially expressed genes within these tissue-specific differential TADs, providing further support for the pivotal role of such TADs in regulating tissue-specific gene expression and related functions.

The comprehensive analysis presented in this study delves deep into the characterization of TADs in the pig genome, shedding light on their crucial roles in regulating gene expression across different tissues and breeds. The findings herein provide valuable insights into the structural and functional aspects of TADs in the context of the pig genome, offering a foundation for future studies on the genetic regulation of complex traits in pigs.

Our investigation has highlighted the preservation of TADs through the heightened presence of CTCF and the differential analysis of TADs. Kentepozidou et al. (2020) identified evolving clusters of CTCF binding sites as a characteristic of TAD boundary architecture[43]. CTCF binding sites exhibit enrichment at TAD boundaries, and this conservation across diverse cell types suggests a pivotal role for CTCF in both the establishment and maintenance of these boundaries[44]. The asymmetry of CTCF binding sites is noteworthy, where the convergent orientation of a CTCF site pair contributes to the formation of chromatin loops in vivo[44]. CTCF, highly concentrated at TAD boundaries, forms physical loops with intervening DNA, establishing an insulated environment crucial for the proper expression of lineage-specifying genes[45]. Additionally, CTCF binding actively contributes to the configuration of a higher-order genome structure by delineating the boundaries of extensive TADs[43]. Disruptions in TAD boundaries have been observed to significantly impact gene expression, underscoring the necessity of exploring genome-level alterations in TADs and TAD boundaries to comprehend the intricate interplay between 3D genome structure and phenotype[13]. Notably, our findings indicate an enrichment of functional DNA variants at TAD boundaries, including eVariants in e/sQTLs, LoFs, and the top 1% high-impact SNPs. We posit that this enrichment may be attributed to the ability of functional DNA variants to modulate chromatin, such as enhancer and promoter regions, as well as repressed segments, all of which exhibit heightened presence at TAD boundaries. The enrichment of the top 1% high-impact SNPs, predicted to induce substantial changes in open chromatin states, provides support for our hypothesis. Nanni et al. (2020) similarly noted the enrichment of epigenetic marks associated with gene expression at TAD boundaries[44]. Additionally, Lazar et al. (2018) found significant enrichment of genetic and epigenetic signatures at TAD boundaries, including higher CpG density compared to the rest of the genome, increased presence of CTCF binding, H3K4me3, and the existence of SINE elements[46]. These collective insights highlight the fundamental importance of TAD boundaries in orchestrating gene expression dynamics within the genome.

The identification of conserved and differential TADs across breeds and tissues provides a unique perspective on genome organization in pigs. The majority of TADs were found to be conserved, with variations primarily attributed to tissue differences rather than breed. This result emphasizes the need to account for tissue-specific effects in future studies. The functional enrichment analysis of genes within conserved TADs highlights their role in fundamental organismal functions, consistent with previous studies[42], reinforcing the significance of TAD conservation in maintaining genome stability and viability. Finally, the identification of differentially expressed genes within tissue or breed-specific differential TADs provides a link between 3D chromatin structure and gene expression regulation. Among the notable findings, we identified NCOA2, a member of the steroid receptor coactivator family, as a differential gene residing within the TAD unique to Bamaxiang adipose when compared Bamaxiang liver. Previous studies have shown that NCOA2 is highly expressed in adipose tissue and contains tissue-specific CpG islands with chromatin-accessible enhancers[47, 48]. Functionally, as a coactivator of PPARγ, NCOA2 plays a pivotal role in regulating lipid metabolism and energy homeostasis[49]. Furthermore, it emerges as a key player in modulating intramuscular fatty acid composition in pigs[50, 51]. In a different context, we observed BMPER as a differential gene situated within TAD unique to muscle when compared to adipose in Large White pigs. BMPER, or BMP endothelial cell precursor-derived regulator, is known for its role in dampening the activities of bone morphogenetic proteins (BMPs), growth factors pivotal in the development of bone and cartilage[52]. Intriguingly, variants of this gene in cattle have been associated with larger body sizes and extended rump lengths[53]. 
Moreover, BMPER has been demonstrated to enhance intramuscular fat content in pigs[54], with quantitative analysis unveiling a positive correlation between BMPER expression and intramuscular fat levels[55]. BMPER also serves as a marker for adipose progenitors and adipocytes, exerting a positive influence on adipogenesis[56]. These findings collectively illuminate the intricate interplay of genes within TADs across various tissues, offering valuable insights into their roles in shaping physiological traits of agricultural significance. The discovery of tissue-specific differential genes underscores the importance of TADs in shaping tissue-specific gene expression patterns.
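The logic of separating conserved from tissue-specific TADs and then listing the genes inside the tissue-specific ones can be sketched as follows. This is an illustrative simplification under stated assumptions, not the comparison procedure used in the study: the 40 kb boundary tolerance, the function names, the placeholder gene names, and all coordinates are hypothetical.

```python
def classify_tads(tads_a, tads_b, tolerance=40_000):
    """Split the TADs of sample A into conserved and A-specific lists.

    A TAD (chrom, start, end) from A is called conserved if sample B has a
    TAD on the same chromosome whose start and end both lie within
    `tolerance` bp of A's start and end (an assumed matching rule).
    """
    conserved, specific = [], []
    for chrom, start, end in tads_a:
        match = any(
            c == chrom
            and abs(s - start) <= tolerance
            and abs(e - end) <= tolerance
            for c, s, e in tads_b
        )
        (conserved if match else specific).append((chrom, start, end))
    return conserved, specific


def genes_in_tads(genes, tads):
    """Return sorted names of genes fully contained in any of the TADs.

    genes : iterable of (name, chrom, start, end)
    tads  : iterable of (chrom, start, end)
    """
    return sorted(
        {
            name
            for name, chrom, g_start, g_end in genes
            for c, s, e in tads
            if c == chrom and s <= g_start and g_end <= e
        }
    )


# Toy example with made-up coordinates and placeholder gene names.
adipose_tads = [("chr1", 0, 1_000_000), ("chr1", 1_000_000, 2_000_000)]
liver_tads = [("chr1", 20_000, 1_010_000), ("chr1", 1_010_000, 1_500_000)]
genes = [
    ("geneA", "chr1", 1_200_000, 1_250_000),
    ("geneB", "chr1", 100_000, 150_000),
]

conserved, adipose_specific = classify_tads(adipose_tads, liver_tads)
print(genes_in_tads(genes, adipose_specific))
```

Genes returned by the second function would then be candidates for a differential expression test between the two tissues. The quadratic scan over TAD pairs is acceptable for a sketch; a genome-wide analysis would typically use an interval tree or a tool such as BEDTools.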

However, we acknowledge some limitations in our study. Firstly, the exploration of the 3D genome in pigs is still in its early stages, and the availability of tissues and breeds for analysis is limited, let alone single-cell 3D genomic data. Secondly, the lack of phenotype data (such as GWAS data) hinders our ability to directly investigate the impact of TAD alterations on phenotypic traits. Lastly, the absence of genomic structural variation data prevents us from employing deep learning methods to explore genomic variations underlying TAD changes. These limitations should be considered when interpreting our results and highlight areas for future research and data integration to further elucidate the functional implications of TAD dynamics in pigs.

In conclusion, this study provides a comprehensive exploration of TADs in the pig genome, offering insights into their conservation, impact on gene regulation, and relevance to tissue-specific differences. The findings not only advance our understanding of genome organization in pigs but also lay the foundation for future investigations into the genetic basis of complex traits and diseases in pigs.

Conclusions

In summary, our comprehensive analysis of TADs in the pig genome has provided valuable insights into their role in shaping the chromatin landscape and regulating gene expression and splicing. We found that TAD boundaries are enriched with functional regulatory elements, such as CTCF signals, LoFs, top-effect SNPs, chromatin states, eQTL, and sQTL, underscoring their critical importance in gene regulation. Additionally, our study revealed both conserved and differential TADs across various pig breeds and tissues, with a focus on tissue-specific gene expression changes within these TADs. These findings highlight the dynamic nature of TADs in pigs and their significance in governing the transcriptional activity and functional diversity of the genome. Overall, our research contributes to a deeper understanding of 3D chromatin organization and its impact on gene regulation in the pig genome, with implications for both basic biology and livestock breeding.

Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and material

The data used in our study were sourced from public datasets and are listed in Table S1. We did not generate any new data. The source code for this paper is available at https://github.com/yinhongwei4079/Hi-C-analysis-codes.

Competing interests

The authors have declared they have no competing interests.

Funding

This work was funded by the National Key Research and Developmental Program of China (2021YFF1000600), the National Natural Science Foundation of China (31972539, 31501933), the Science and Technology Program of Shenzhen (CJGJZD20210408092402006), and the National Key R&D Program of China (2021YFD301201).

Author Contributions Statement

The study was conceived and directed by FLZ and BLJ. ZQY wrote the manuscript. YHW and ZQY analyzed and interpreted the data. YL and YWY provided advice on the analysis. YGQ, YHW, FLZ and BLJ revised the paper. All authors read and approved the final manuscript and consent to its publication.

Acknowledgements

We are grateful to all members of the Key Laboratory of Livestock and Poultry Multi-omics of MARA, PigGTEx and FAANG for providing data and advice on statistical analysis.

References

Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO et al: Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 2009, 326(5950):289-293.
Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, Piolot T, van Berkum NL, Meisig J, Sedat J et al: Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 2012, 485(7398):381-385.
Bolzer A, Kreth G, Solovei I, Koehler D, Saracoglu K, Fauth C, Müller S, Eils R, Cremer C, Speicher MR et al: Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS biology 2005, 3(5):e157.
Ulianov SV, Khrameeva EE, Gavrilov AA, Flyamer IM, Kos P, Mikhaleva EA, Penin AA, Logacheva MD, Imakaev MV, Chertovich A et al: Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Research 2016, 26(1):70-84.
Hou C, Li L, Qin ZS, Corces VG: Gene Density, Transcription and Insulators Contribute to the Partition of the Drosophila Genome into Physical Domains. Molecular cell 2012, 48(3):471-484.
Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G: Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 2012, 148(3):458-472.
Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D, Kayserili H, Opitz JM, Laxova R et al: Disruptions of Topological Chromatin Domains Cause Pathogenic Rewiring of Gene-Enhancer Interactions. Cell 2015, 161(5):1012-1025.
Rowley MJ, Nichols MH, Lyu X, Ando-Kuri M, Rivera ISM, Hermetz K, Wang P, Ruan Y, Corces VG: Evolutionarily Conserved Principles Predict 3D Chromatin Organization. Molecular Cell 2017, 67(5):837-852.e837.
Pan Z, Yao Y, Yin H, Cai Z, Wang Y, Bai L, Kern C, Halstead M, Chanthavixay G, Trakooljul N et al: Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat Commun 2021, 12(1):5848.
Zuin J, Dixon JR, van der Reijden MIJA, Ye Z, Kolovos P, Brouwer RWW, van de Corput MPC, van de Werken HJG, Knoch TA, van IJcken WFJ et al: Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proceedings of the National Academy of Sciences 2014, 111(3):996-1001.
Spitz F, Furlong EEM: Transcription factors: from enhancer binding to developmental control. Nature Reviews Genetics 2012, 13(9):613-626.
McArthur E, Capra JA: Topologically associating domain boundaries that are stable across diverse cell types are evolutionarily constrained and enriched for heritability. The American Journal of Human Genetics 2021, 108(2):269-283.
Akdemir KC, Le VT, Chandran S, Li Y, Verhaak RG, Beroukhim R, Campbell PJ, Chin L, Dixon JR, Futreal PA et al: Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nature Genetics 2020, 52(3):294-305.
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, Casas E, Cheng HH, Clarke L, Couldrey C et al: Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biology 2015, 16(1):57.
Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ et al: Integrative analysis of 111 reference human epigenomes. Nature 2015, 518(7539):317-330.
Burns EN, Bordbari MH, Mienaltowski MJ, Affolter VK, Barro MV, Gianino F, Gianino G, Giulotto E, Kalbfleisch TS, Katzman SA et al: Generation of an equine biobank to be used for Functional Annotation of Animal Genomes Project. Anim Genet 2018, 49(6):564-570.
Fang L, Liu S, Liu M, Kang X, Lin S, Li B, Connor EE, Baldwin RL, Tenesa A, Ma L et al: Functional annotation of the cattle genome through systematic discovery and characterization of chromatin states and butyrate-induced variations. BMC Biology 2019, 17(1):68.
Kingsley NB, Kern C, Creppe C, Hales EN, Zhou H, Kalbfleisch TS, MacLeod JN, Petersen JL, Finno CJ, Bellone RR: Functionally Annotating Regulatory Elements in the Equine Genome Using Histone Mark ChIP-Seq. Genes 2019, 11(1):3.
Halstead MM, Kern C, Saelao P, Wang Y, Chanthavixay G, Medrano JF, Van Eenennaam AL, Korf I, Tuggle CK, Ernst CW et al: A comparative analysis of chromatin accessibility in cattle, pig, and mouse tissues. BMC Genomics 2020, 21(1):698.
Chepelev I, Wei G, Wangsa D, Tang Q, Zhao K: Characterization of genome-wide enhancer-promoter interactions reveals co-expression of interacting genes and modes of higher order chromatin organization. Cell Research 2012, 22(3):490-503.
Zhang J, Liu P, He M, Wang Y, Kui H, Jin L, Li D, Li M: Reorganization of 3D genome architecture across wild boar and Bama pig adipose tissues. Journal of Animal Science and Biotechnology 2022, 13(1):32.
Teng J, Gao Y, Yin H, Bai Z, Liu S, Zeng H, Bai L, Cai Z, Zhao B, Li X et al: A compendium of genetic regulatory effects across pig tissues. Nature Genetics 2024.
Zhao Y, Hou Y, Xu Y, Luan Y, Zhou H, Qi X, Hu M, Wang D, Wang Z, Fu Y et al: A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome. Nature Communications 2021, 12(1):2217.
Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL: Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 2016, 3(1):99-101.
Peters D, Qiu K, Liang P: Faster Short DNA Sequence Alignment with Parallel BWA. AIP Conference Proceedings 2011, 1368(1):131-134.
Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, Chow W, Eory L, Finlayson HA, Flicek P et al: An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience 2020, 9(6).
Nagano T, Lubling Y, Várnai C, Dudley C, Leung W, Baran Y, Mendelson Cohen N, Wingett S, Fraser P, Tanay A: Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 2017, 547(7661):61-67.
Krueger F, James F, Ewels P, Afyounian E, Weinstein M, Schuster-Boeckler B, Hulselmans G, sclamons: FelixKrueger/TrimGalore: v0.6.10 - add default decompression path. Zenodo; 2023.
Servant N, Varoquaux N, Lajoie BR, Viara E, Chen C-J, Vert J-P, Heard E, Dekker J, Barillot E: HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology 2015, 16(1):259.
Carty M, Zamparo L, Sahin M, González A, Pelossof R, Elemento O, Leslie CS: An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data. Nature Communications 2017, 8(1):15454.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 2005, 21(16):3439-3440.
Yu G, Wang L-G, Han Y, He Q-Y: clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS: A Journal of Integrative Biology 2012, 16(5):284-287.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 2000, 25(1):25-29.
Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 2000, 28(1):27-30.
Galan S, Machnik N, Kruse K, Díaz N, Marti-Renom MA, Vaquerizas JM: CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nature Genetics 2020, 52(11):1247-1255.
Dixon JR, Gorkin DU, Ren B: Chromatin Domains: the Unit of Chromosome Organization. Molecular cell 2016, 62(5):668-680.
Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26(6):841-842.
Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26(1):139-140.
Li J, Xiang Y, Zhang L, Qi X, Zheng Z, Zhou P, Tang Z, Jin Y, Zhao Q, Fu Y et al: Enhancer-promoter interaction maps provide insights into skeletal muscle-related traits in pig genome. BMC Biol 2022, 20(1):136.
Nakai K, Vandenbon A: Chapter 2 - Higher-order chromatin structure and gene regulation. In: Epigenetics in Organ Specific Disorders. Edited by Boosani CS, Goswami R, vol. 34: Academic Press; 2023: 11-32.
Porter RS, Iwase S: Modulation of chromatin architecture influences the neuronal nucleus through activity-regulated gene expression. Biochemical Society Transactions 2023, 51(2):703-713.
Zhao R, Talenti A, Fang L, Liu S, Liu G, Chue Hong NP, Tenesa A, Hassan M, Prendergast JGD: The conservation of human functional variants and their effects across livestock species. Communications Biology 2022, 5(1):1-13.
Kentepozidou E, Aitken SJ, Feig C, Stefflova K, Ibarra-Soria X, Odom DT, Roller M, Flicek P: Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains. Genome Biology 2020, 21(1):5.
Nanni L, Ceri S, Logie C: Spatial patterns of CTCF sites define the anatomy of TADs and their boundaries. Genome Biology 2020, 21(1):197.
Islam Z, Saravanan B, Walavalkar K, Thakur J, Farooq U, Singh AK, Sabarinathan R, Pandit A, Henikoff S, Notani D: Active enhancers strengthen insulation by RNA-mediated CTCF binding at TAD boundaries. bioRxiv; 2021.
Lazar NH, Nevonen KA, O'Connell B, McCann C, O'Neill RJ, Green RE, Meyer TJ, Okhovat M, Carbone L: Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome research 2018, 28(7):983-997.
Camargo GMFd, Costa RB, Albuquerque LGd, Regitano LCA, Baldi F, Tonhati H: Polymorphisms in TOX and NCOA2 genes and their associations with reproductive traits in cattle. Reproduction, Fertility and Development 2015, 27(3):523-528.
WenXing S, ShuHua G, HaiYin H, WeiWei C, ShiGang Y, Jie C: Influence of silencing NCOA2 gene by siRNA on differentiation of porcine intramuscular preadipocytes. Journal of Nanjing Agricultural University 2016, 39(4):619-623.
Yamamuro T, Nakamura S, Yanagawa K, Tokumura A, Kawabata T, Fukuhara A, Teranishi H, Hamasaki M, Shimomura I, Yoshimori T: Loss of RUBCN/rubicon in adipocytes mediates the upregulation of autophagy to promote the fasting response. Autophagy 2022, 18(11):2686-2696.
Ramayo-Caldas Y, Ballester M, Fortes MR, Esteve-Codina A, Castelló A, Noguera JL, Fernández AI, Pérez-Enciso M, Reverter A, Folch JM: From SNP co-association to RNA co-expression: novel insights into gene networks for intramuscular fatty acid composition in porcine. BMC Genomics 2014, 15:232.
Valdés-Hernández J, Ramayo-Caldas Y, Passols M, Sebastià C, Criado-Mesas L, Crespo-Piazuelo D, Esteve-Codina A, Castelló A, Sánchez A, Folch JM: Global analysis of the association between pig muscle fatty acid composition and gene expression using RNA-Seq. Sci Rep 2023, 13(1):535.
Duan X, An B, Du L, Chang T, Liang M, Yang B-G, Xu L, Zhang L, Li J, E G et al: Genome-Wide Association Analysis of Growth Curve Parameters in Chinese Simmental Beef Cattle. Animals 2021, 11(1):192.
Zhao C, Gui L, Li Y, Plath M, Zan L: Associations between allelic polymorphism of the BMP Binding Endothelial Regulator and phenotypic variation of cattle. Molecular and Cellular Probes 2015, 29(6):358-364.
Banigan EJ, Tang W, van den Berg AA, Stocsits RR, Wutz G, Brandão HB, Busslinger GA, Peters J-M, Mirny LA: Transcription shapes 3D chromatin organization by interacting with loop extrusion. Proceedings of the National Academy of Sciences of the United States of America 2023, 120(11):e2210480120.
Liu Z, Sun W, Zhao Y, Xu C, Fu Y, Li Y, Chen J: The effect of variants in the promoter of BMPER on the intramuscular fat deposition in longissimus dorsi muscle of pigs. Gene 2014, 542(2):168-172.
Garritson JD, Zhang J, Achenbach A, Ferhat M, Eich E, Stubben CJ, Martinez PL, Ibele AR, Hilgendorf KI, Boudina S: BMPER is a marker of adipose progenitors and adipocytes and a positive modulator of adipogenesis. Communications Biology 2023, 6(1):638.

Supplementary Figures S1 and S2 are not available with this version

No competing interests reported.

SupplementfileTableS111.xlsx

Reviewers agreed at journal
08 May, 2024
Reviewers invited by journal
15 Apr, 2024
Editor invited by journal
15 Apr, 2024
Submission checks completed at journal
12 Apr, 2024
Editor assigned by journal
12 Apr, 2024
First submitted to journal
08 Apr, 2024
