Global DNA methylation and gene expression in the breast, lungs, and biceps brachii muscle at different ages
We generated the methylomes and transcriptomes of lung, breast, and biceps brachii muscle tissues at four different month ages (MA), representing four life stages with 3 replicates (MA= 6, 30, 54, and 90 months; young, per-mature, mature and post-mature), from a total of 12 female Riwoqe yaks. Among these yaks, only the individuals at 90 months were in lactation period with ~3.16 kg/day milk yield. After performing sequence quality control and filtering, we obtained a single-base resolution methylome covering 85.6% (27,471,373/32,092,725) of CpG sites across the genome with an average depth of 22.5×. We first calculated pairwise Pearson’s correlations of CpG sites with at least 10× coverage depth across all samples, and the samples were well clustered by tissue (Figure 1a). CpG methylation levels of biological replicates highly correlated with each other (median Pearson’s r = 0.74), and the correlation of CpG methylation level between ages (median Pearson’s r = 0.72) was relatively weaker and showed the lowest coefficients among tissues (median Pearson’s r = 0.66) (Figure 1c). The transcriptome sequencing data of all samples were aligned to our newly assembled yak genome reference (unpublished), and subsequently obtained the transcripts. In total, we obtained a total of 2,0504 transcripts, which were annotated to the Gene Ontology (GO) [6], InterPro [7], Kyoto Encyclopedia of Genes and Genomes (KEGG) [8], Swiss-Prot [9], and TrEMBL [10] databases (Table S1). We also calculated pairwise Pearson’s correlations of all transcripts and obtained similar results as with DNA methylation (Figure 1b). Biological replicates showed the highest correlation coefficients, while different tissues showed the lowest correlation coefficients (Figure 1d).
Differential DNA methylation among the age groups was involved in protein processing in endoplasmic reticulumWe looked for differentially methylated regions (DMRs) across age groups within the breast, lungs, and gluteal muscle (Table S2-4). Within the lung and gluteal-muscle tissues, age groups did not differ in age-related DMRs (A-DMRs), but post-mature breast tissue had considerably more differential methylation regions than the three younger age groups (Figure 2a). We investigated the correlations between DMRs and their associated differentially expressed genes, and the ratios of negatively to positively correlated gene pairs were 1.02 for promoters with DMRs and 0.95 for genebody with DMRs. This might be explained that not every methylation was to be correlated with the expression of its associated gene due to the complexity in gene regulation [11].
At ~90 months, ~120 days after the third birth, yaks were in the lactation period (milk yield, ~3.16kg/day)(Li & Jiang, 2019), so it is possible that the observed methylation may at least partially control yak lactation. As methylation on promoter decrease the gene expression [11], we then selected 375 hypomethylated promoter (highly expressed) genes along with 207 hypermethylated promoter (lowly expressed) genes from post-mature yak breast tissue. The hypomethylated (highly expressed) genes were enriched in only “protein processing in endoplasmic reticulum (ER)” (nine genes, 2.964-fold enrichment, p = 0.0049). Specifically, the genes were involved in vesicle trafficking (SEC23B), oligosaccharide linking (MOGS, RPN2), folding and assembly (HSPA5), transportation (LMAN2, SEL1L), and ubiquitination and degradation (UBE2J1, UBE2J2, DERL2) [12, 13]. These genes were significantly uprelated at 90 months in breast tissues, but not in lung and biceps brachii muscle tissues (Figure 2e). Thus, methylation might be involved in regulation of milk production by influencing protein processing in the endoplasmic reticulum during the vigorous period of lactation.
We examined A-DMRs that overlapped across age groups. Young and post-mature tissues rarely shared A-DMRs when comparing the lung and muscle tissues (young, muscle: 586 A-DMRs, breast: 2,249, lung: 496; post-mature, muscle: 470, breast: 12,050, lung: 772) (Figure 2b, c). Pre-mature and mature stages also rarely shared A-DMRs across muscle and lung tissues (Figure 2d), suggesting that methylation patterns were already established at the young stage and that no extensively divergent epigenetic difference occurred through the different ages under natural high-altitude conditions.
Consensus network analysis for tissues and age groups
We first performed a multi-way ANOVA test for each gene among all samples (n=36), to test the null hypothesis that the gene expression level did not differ among age groups and tissues. At the threshold for significance (p<0.05), 417 age-related and 8,560 tissue-related genes were selected for further weighted gene correlation network analysis (WGCNA), which takes advantage of the correlations among genes and groups genes into modules using network topology [14]. Subsequently, we conducted WGCNA for tissue- and age-related gene expression separately to identify a “consensus network”–a common pattern of genes that are correlated in all conditions. We performed consensus network, module statistic, and eigengene network analyses to identify modules, assess relationships between modules and traits, and study the relationships between co-expression modules [15]. The consensus networks identified for tissues and age groups had clearly delineated modules (Figure 3a, 3b), and the modules identified were significantly correlated with tissues and age groups (Figure 3c, 3d).
Age network analysis indicates that ZGPAT might regulate milk production
Within the age-related gene network, the largest module (“turquoise”, n=356) had opposite directions of correlation for breast tissue (r=-0.57, p=3e-04) and age (r= 0.37, p=0.03) and showed a positive correlation with biceps brachii muscle (Table S5). The breast tissue had a stronger signal than age and possibly overwhelmed the signal from age. Genes in this module were enriched in the GO categories of “protein polyubiquitination,” “RNA polymerase II core promoter proximal region sequence-specific DNA binding,” “ATP binding,” “transcription, DNA-templated,” and “negative regulation of transcription from RNA polymerase II promoter” (p-values of 0.000344, 0.000822, 0.000991, 0.00139, and 0.00244, respectively). The “blue” (n=48) and “grey” (n=13) modules showed only a negative correlation with age (r= -0.58, p= 2e-04; r= -0.76, p= 6e-08, respectively) and exhibited no enrichment of GO categories for genes. After applying the threshold of the absolute value of gene significance for age (|GS| >0.5) and module membership measures (|MM| >0.6) in each module, we defined 20 and 7 “hub” genes in the “turquoise” and “blue” modules (Table 1). The gene expression of the “hub” genes was well clustered by modules, which was consistent with the opposite directions of correlation with age (“turquoise” r=0.37, “blue” r=-0.58). The upregulated expression level of the “hubs” in breast tissue at 90 months of age indicated that the “turquoise” module had a stronger correlation with breast tissue than with age (Figure 4a).
Table 1. List “hub” genes in the consensus network for age
Yak ID
|
Gene symbol
|
Module color
|
Gene significance
|
p-value
|
Module membership
|
p-value
|
BmuPB009868
|
AP3D1
|
turquoise
|
0.513965075
|
0.001344116
|
0.868945999
|
6.37E-12
|
BmuPB004083
|
TMEM30A
|
turquoise
|
0.505722627
|
0.001652657
|
0.848054843
|
6.65E-11
|
BmuPB000726
|
DDRGK1
|
turquoise
|
0.53596122
|
0.000754414
|
0.842070133
|
1.22E-10
|
BmuPB019011
|
OTUD3
|
turquoise
|
0.543916749
|
0.000606215
|
0.828680521
|
4.37E-10
|
BmuPB000091
|
CNPPD1
|
turquoise
|
0.51195228
|
0.001414359
|
0.814537409
|
1.50E-09
|
BmuPB006815
|
TOR1AIP1
|
turquoise
|
0.544705745
|
0.000593032
|
0.809789221
|
2.21E-09
|
BmuPB007137
|
MGAT4A
|
turquoise
|
0.545969144
|
0.000572452
|
0.798170686
|
5.51E-09
|
BmuPB007385
|
HECA
|
turquoise
|
0.603273223
|
9.84E-05
|
0.770629907
|
3.85E-08
|
BmuPB019443
|
SYAP1
|
turquoise
|
0.500377628
|
0.001884511
|
0.733120646
|
3.68E-07
|
BmuPB009825
|
PHAX
|
turquoise
|
0.561938984
|
0.00036184
|
0.70497442
|
1.59E-06
|
BmuPB008032
|
UBE2G2
|
turquoise
|
0.52385538
|
0.001041698
|
0.70208134
|
1.83E-06
|
BmuPB012517
|
SYF2
|
turquoise
|
0.568328068
|
0.000299194
|
0.699367268
|
2.08E-06
|
BmuPB001610
|
FURIN
|
turquoise
|
0.567906296
|
0.000303008
|
0.697046314
|
2.32E-06
|
BmuPB000679
|
ZGPAT
|
turquoise
|
0.504624296
|
0.001698141
|
0.683958391
|
4.25E-06
|
BmuPB007049
|
SKP2
|
turquoise
|
-0.527608908
|
0.000943732
|
-0.669790419
|
7.91E-06
|
BmuPB000224
|
C2orf6
|
turquoise
|
0.531191165
|
0.000857925
|
0.652871802
|
1.59E-05
|
BmuPB007420
|
TAB2
|
turquoise
|
0.505905132
|
0.001645204
|
0.652844304
|
1.59E-05
|
BmuPB018569
|
ORAOV1
|
turquoise
|
0.51035386
|
0.001472421
|
0.637043276
|
2.95E-05
|
BmuPB010550
|
CCPG1
|
turquoise
|
0.644779605
|
2.19E-05
|
0.612874899
|
7.08E-05
|
BmuPB012521
|
TMEM57
|
turquoise
|
0.57317319
|
0.000258349
|
0.60967979
|
7.91E-05
|
BmuPB013324
|
MCM3
|
blue
|
-0.583585649
|
0.000186982
|
0.841505398
|
1.29E-10
|
BmuPB010064
|
SPC24
|
blue
|
-0.509962181
|
0.001486965
|
0.744497748
|
1.93E-07
|
BmuPB003102
|
C17orf49
|
blue
|
-0.526803776
|
0.000964031
|
0.741805787
|
2.26E-07
|
BmuPB015902
|
SERPINH1
|
blue
|
-0.607769763
|
8.44E-05
|
0.721017606
|
7.04E-07
|
BmuPB016582
|
SRPX2
|
blue
|
-0.51838468
|
0.001200558
|
0.698151166
|
2.20E-06
|
BmuPB012996
|
UCK2
|
blue
|
-0.588274454
|
0.000161067
|
0.677420455
|
5.68E-06
|
BmuPB015372
|
UMPS
|
blue
|
-0.503577413
|
0.001742514
|
0.648111658
|
1.92E-05
|
We then used the AnimalTFDB 3.0 database [16] to examine transcription factors in these 27 “hubs” and found that ZGPAT encodes a transcription regulator protein and was significantly upregulated in breast tissue at 90 months of age (Figure 4a). previous study reported that this protein specifically binds the 5'-GGAG[GA]A[GA]A-3' consensus sequence and represses transcription by recruiting the chromatin multi-complex NuRD to target promoters [17]. High expression of ZGPAT in breast tissue at 90 months of age was observed, and it potentially regulated the transcription of 280 genes (weight >0.15) in the network from the “turquoise” module. To identify the most important cellular activities controlled by this TF regulatory network, we analyzed over-represented GO biological process and molecular function terms and KEGG pathways. These potential target genes were enriched in the GO categories of “protein binding,” “ATP binding,” and “zinc ion binding,” among others, and the KEGG categories of “aminoacyl-tRNA biosynthesis,” “autophagy animal,” and “protein processing in endoplasmic reticulum” (Figure 4b). These enriched GO terms and KEGG pathways are likely to play roles in regulating protein synthesis, processing, and secretion in breast tissue. For example, 6 of 7 genes from “protein processing in endoplasmic reticulum” were also upregulated at 90 months of age in breast tissue (Figure 4c) and involved in multiple processes in the endoplasmic reticulum, including vesicle trafficking (SEC24C), folding and assembly (SELENOS), transportation (BCAP31), and ubiquitination and degradation (BAG1, UBE2G2, and MARCH6) [12, 13]. Only DNAJC10 was downregulated at 90 months of age in breast tissue, and this gene encodes an endoplasmic reticulum co-chaperone that is part of the endoplasmic reticulum-associated degradation complex involved in recognizing and degrading misfolded proteins [13].
Tissue network analysis indicates that high expression of HIF1A regulates genes involved in energy metabolism in the lung
Within the tissue-related gene module network, four modules showed a positive correlation and two modules showed a negative correlation with the lung, and all significant module-trait relationships were negative in the muscle but positive in the breast (Figure 3d). Moreover, 99.54% of the total 8,560 tissue-related genes were related to the top 4 modules (“turquoise,” n=3833, “blue,” n=2795, “brown,” n=1052, “yellow,” n=339) (Figure 5a, Table S6), and these modules were also highly correlated with other modules; for example, “brown,” “yellow,” and “black” showed a high eigengene adjacency with each other (Figure 5b).
We applied the more stringent threshold of the absolute value of gene significance for the age and module membership measures in the top four modules to identify “hub” genes in the “turquoise,” “blue,” “brown,” and “yellow” modules. With the threshold values of |GS|>0.7 and |MM| >0.8, 34 “hub” genes were identified in the “turquoise” module. Then, 24 “hubs” were filtered from the gene significance of module-lung relationships, and 10 “hubs” were filtered from the gene significance of module-breast relationships; these were further divided into 3 clusters by hierarchical clustering, and they showed high expression levels in the breast (cluster 1), lung (cluster 2), and biceps brachii muscle (cluster 3) tissues, respectively, with distinct clustering patterns by tissue (Figure 5c).
Table 2. List “hub” genes in the consensus network for tissue
Yak ID
|
Gene symbol
|
Module color
|
Tissue
|
Gene significance
|
p-value
|
Module membership
|
p-value
|
BmuPB014336
|
EEF1G
|
turquoise
|
lung
|
-0.73907039
|
2.64E-07
|
0.826748126
|
5.20E-10
|
BmuPB017352
|
PMS1
|
turquoise
|
lung
|
-0.71599797
|
9.13E-07
|
0.850552958
|
5.12E-11
|
BmuPB000878
|
MTPAP
|
turquoise
|
lung
|
-0.715583465
|
9.33E-07
|
0.912877613
|
8.73E-15
|
BmuPB018762
|
CXHXorf58
|
turquoise
|
lung
|
-0.709433358
|
1.27E-06
|
0.82262518
|
7.50E-10
|
BmuPB014450
|
RWDD4
|
turquoise
|
lung
|
-0.708526806
|
1.33E-06
|
0.923713618
|
9.94E-16
|
BmuPB011005
|
HUS1
|
turquoise
|
lung
|
-0.707137304
|
1.43E-06
|
0.875052199
|
2.97E-12
|
BmuPB005540
|
MTUS1
|
turquoise
|
lung
|
-0.704482832
|
1.62E-06
|
0.871626441
|
4.58E-12
|
BmuPB011453
|
EBF3
|
turquoise
|
lung
|
-0.703373458
|
1.71E-06
|
0.870717052
|
5.13E-12
|
BmuPB010608
|
LAMB3
|
turquoise
|
lung
|
0.70783756
|
1.38E-06
|
-0.85067459
|
5.05E-11
|
BmuPB004299
|
C3H3orf58
|
turquoise
|
lung
|
0.708872549
|
1.31E-06
|
-0.88522675
|
7.61E-13
|
BmuPB020871
|
G6PD
|
turquoise
|
lung
|
0.708930411
|
1.30E-06
|
-0.90727231
|
2.41E-14
|
BmuPB007438
|
ZCCHC6
|
turquoise
|
lung
|
0.709740967
|
1.25E-06
|
-0.84738754
|
7.13E-11
|
BmuPB007592
|
CTDSPL
|
turquoise
|
lung
|
0.711634321
|
1.14E-06
|
-0.8813581
|
1.30E-12
|
BmuPB004894
|
HIF1A
|
turquoise
|
lung
|
0.713237417
|
1.05E-06
|
-0.87302984
|
3.84E-12
|
BmuPB020508
|
FAM122B
|
turquoise
|
lung
|
0.720139618
|
7.37E-07
|
-0.82428698
|
6.48E-10
|
BmuPB018102
|
CCDC82
|
turquoise
|
lung
|
0.720592854
|
7.20E-07
|
-0.90665968
|
2.68E-14
|
BmuPB014539
|
CNTRL
|
turquoise
|
lung
|
0.722130497
|
6.64E-07
|
-0.87149974
|
4.65E-12
|
BmuPB003882
|
VAV3
|
turquoise
|
lung
|
0.725601285
|
5.53E-07
|
-0.89496378
|
1.82E-13
|
BmuPB020153
|
EPS8L1
|
turquoise
|
lung
|
0.727495147
|
4.99E-07
|
-0.82286844
|
7.34E-10
|
BmuPB010308
|
PGM2
|
turquoise
|
lung
|
0.746811608
|
1.69E-07
|
-0.83795918
|
1.83E-10
|
BmuPB004008
|
GDAP2
|
turquoise
|
lung
|
0.754806874
|
1.05E-07
|
-0.88661303
|
6.26E-13
|
BmuPB012568
|
WASF2
|
turquoise
|
lung
|
0.757428944
|
8.92E-08
|
-0.83874427
|
1.69E-10
|
BmuPB011966
|
ACTG1
|
turquoise
|
lung
|
0.772114072
|
3.49E-08
|
-0.84373776
|
1.03E-10
|
BmuPB014173
|
STAT6
|
turquoise
|
lung
|
0.80896084
|
2.37E-09
|
-0.84981355
|
5.53E-11
|
BmuPB015106
|
MAN2A1
|
turquoise
|
breast
|
0.705124693
|
1.57E-06
|
-0.84782348
|
6.81E-11
|
BmuPB008614
|
VPS26A
|
turquoise
|
breast
|
0.731553581
|
4.01E-07
|
-0.88669366
|
6.18E-13
|
BmuPB018496
|
FAM92A
|
turquoise
|
breast
|
0.718223061
|
8.14E-07
|
-0.85598454
|
2.85E-11
|
BmuPB005078
|
RASA1
|
turquoise
|
breast
|
0.705902582
|
1.51E-06
|
-0.85869193
|
2.11E-11
|
BmuPB011476
|
FAM53B
|
turquoise
|
breast
|
-0.728187435
|
4.81E-07
|
0.848680696
|
6.23E-11
|
BmuPB008348
|
EPDR1
|
turquoise
|
breast
|
-0.710771064
|
1.19E-06
|
0.832481139
|
3.08E-10
|
BmuPB003453
|
TROVE2
|
turquoise
|
breast
|
0.72654448
|
5.26E-07
|
-0.80669368
|
2.84E-09
|
BmuPB021200
|
VWA7
|
turquoise
|
breast
|
-0.722814498
|
6.41E-07
|
0.814033145
|
1.56E-09
|
BmuPB010528
|
RNF111
|
turquoise
|
breast
|
0.737234077
|
2.92E-07
|
-0.80146106
|
4.28E-09
|
BmuPB015811
|
FRMD3
|
turquoise
|
breast
|
-0.739223421
|
2.61E-07
|
0.809147869
|
2.33E-09
|
According to the AmalTFDB 3.0 database [16], EBF3, HIF1A, and STAT6 were annotated as transcription factors. EBF3 encodes a member of the early B-cell factor (EBF) family of DNA binding transcription factors. EBF proteins are involved in B-cell differentiation, bone development, and neurogenesis and may also function as tumor suppressors [18]. STAT6 is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators [19]. HIF1A, hypoxia-inducible factor-1, functions as a master regulator of the cellular and systemic homeostatic response to hypoxia by activating the transcription of many genes, including those involved in energy metabolism, angiogenesis, and apoptosis and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia [20]. HIF1A was found to be upregulated in the lung and to potentially regulate the transcription of 2008 genes (weight >0.15), which were enriched in multiple GO biological process and molecular function categories and KEGG pathways. Notably, most of the enriched GO terms and KEGG pathways were related to energy metabolism, such as “mitochondrial respiratory chain complex I assembly,” “NADH dehydrogenase (ubiquinone) activity,” “ATP binding,” “mitochondrial translation,” “tricarboxylic acid cycle,” and “GTP binding” in GO terms and “thermogenesis,” “carbon metabolism,” and “citrate cycle TCA cycle” in KEGG pathways (Figure 5d). Mitochondria function as the primary energy producers of the cell and serve as the center of biosynthesis, the oxidative stress response, and cellular signaling, placing them at the hub of a variety of immune pathways [21]. NADH dehydrogenase is a core subunit of the mitochondrial membrane respiratory chain and believed to belong to the minimal assembly required for catalysis [22]. Protein which binds ATP or GTP, carries three phosphate groups esterified to a sugar moiety and represents energy and phosphate sources for the cell [23, 24]. The tricarboxylic acid cycle, a series of metabolic reactions in aerobic cellular respiration, occurs in the mitochondria of animals and plants, and during this cycle, acetyl-CoA, formed from pyruvate produced during glycolysis, is completely oxidized to CO2 via the interconversion of various carboxylic acids. It results in the reduction of NAD and FAD to NADH and FADH2, whose reducing power is then used indirectly in the synthesis of ATP by oxidative phosphorylation [25]. The “thermogenesis” pathway is essential for warm blooded animals, ensuring normal cellular and physiological functioning under conditions of environmental challenge [26].