Global DNA methylation and gene expression in the breast, lungs, and biceps brachii muscle at different ages
We generated the methylomes and transcriptomes of lung, breast, and biceps brachii muscle tissues from 12 female Riwoqe yaks at four different stages (n=3/stage) of development: 6 months old (MO) (young), 30 MO (pre-mature), 54 MO (mature) and 90 MO (post mature). Among these, only the post-mature yaks (90 months) were lactation, with ~3.16 kg/day milk yield. After performing sequence quality control and filtering, we obtained a single-base resolution methylome covering 85.6% (27,471,373/32,092,725) of CpG sites across the genome with an average depth of 22.5×. We first calculated pairwise Pearson's correlations of CpG sites with at least 10× coverage depth across all samples, which were well clustered by tissue types (Figure 1a). The correlation of CpG methylation levels for biological replicates was strong (median Pearson's r = 0.74), the correlation of CpG methylation levels between ages (median Pearson's r = 0.72) was relatively weaker and the correlation between tissues was weakest (median Pearson's r = 0.66) (Figure 1c). We aligned the transcriptome sequencing data for all samples to our newly assembled yak genome reference, and we subsequently obtained the transcripts. In total, we obtained 2,0504 transcripts, that were then annotated to the Gene Ontology (GO) [6], InterPro [7], Kyoto Encyclopedia of Genes and Genomes (KEGG) [8], Swiss-Prot [9], and TrEMBL [10] databases (Table S1). We also calculated pairwise Pearson's correlations of all transcripts and obtained similar results to those of DNA methylation (Figure 1b). Biological replicates showed the highest correlation coefficients, while different tissues showed the lowest correlation coefficients (Figure 1d).
Differentially methylated regions among the age groups
We determined differentially methylated regions (DMRs) across age groups within the breast, lung, and biceps brachii muscle tissues (Table S2-4). Within the lung and biceps brachii muscle tissues, age groups did not differ in age-related DMRs (A-DMRs), but post-mature breast tissue had considerably more DMRs (155,957) than the three younger age groups (Figure 2a). We then investigated the correlations between DMRs and their corresponding differentially expressed genes. The ratio of negatively to positively correlated gene pairs was 1.02 for promoters with DMRs and 0.95 for gene bodies with DMRs. Not every methylation was correlated with the expression of its associated gene, due to the gene regulation complexity [11].
At ~90 months, ~120 days after giving birth for the third time, yaks are in the lactation period (milk yield of ~3.16kg/day)(Li & Jiang, 2019), so it is possible that the observed methylation partially controls yak lactation. Since promoter methylation decreases gene expression [11], we selected 375 hypomethylated promoter genes (highly expressed) along with 207 hypermethylated promoter genes (lowly expressed) from post-mature yak breast tissues. The hypomethylated (highly expressed) genes were only enriched in“protein processing in endoplasmic reticulum (ER)” (9 genes, 2.964-fold enrichment, p = 0.0049). Specifically, the genes are involved in vesicle trafficking (SEC23B), oligosaccharide linking (MOGS, RPN2), folding and assembly (HSPA5), transportation (LMAN2, SEL1L), and ubiquitination and degradation (UBE2J1, UBE2J2, DERL2) [12, 13]. These genes were significantly upregulated at 90 months in breast tissues, but not in lung and biceps brachii muscle tissues (Figure 2e). Based on this data, methylation might help regulate milk production by influencing protein processing in the endoplasmic reticulum during the lactation period.
We also examined A-DMRs that overlapped across age groups. Young and post-mature tissues rarely shared A-DMRs when comparing the lung and muscle tissues (for young tissues, muscle: 586 A-DMRs, breast: 2,249 A-DMRs, lung: 496 A-DMRs; for post-mature tissues, muscle: 470 A-DMRs, breast: 12,050 A-DMRs, lung: 772 A-DMRs) (Figure 2b, c). Pre-mature and mature stages also rarely shared A-DMRs across muscle and lung tissues (Figure 2d), suggesting that methylation patterns were already established at the young stage and that no extensively divergent epigenetic difference occurred across different age groups under natural high-altitude conditions.
Consensus network analysis for tissues and age groups
We first performed a multi-way ANOVA test for each gene across all samples (n=36) to test the null hypothesis that the gene expression level did not differ among age groups and tissues. At the threshold for significance (p<0.05), 417 age-related and 8,560 tissue-related genes were selected for further weighted gene correlation network analysis (WGCNA), which uses network topography to group genes into modules based on correlations [14]. Next, we conducted WGCNA for tissue- and age-related gene expression respectively to identify a “consensus network”–a common pattern of genes that are correlated in all conditions. We performed a consensus network, module statistic, and eigengene network analyses to identify modules, to assess relationships between modules and traits, and to study the relationships between co-expression modules [15]. The consensus networks identified for tissues and age groups had clearly delineated modules (Figure 3a, 3b), and the modules identified were significantly correlated with tissues and age groups (Figure 3c, 3d).
Age network analysis indicates that ZGPAT might regulate milk production
Within the age-related gene network, the largest module (“turquoise”, n=356) had negative correlation for breast tissue (r=-0.57, p=3e-04) and age (r= 0.37, p=0.03) and a positive correlation for biceps brachii muscle tissues (Table S5). The breast tissue had a stronger signal than that of age and may have overwhelmed the signal from age. Genes in this module were enriched in the GO categories of “protein polyubiquitination,” “RNA polymerase II core promoter proximal region sequence-specific DNA binding,” “ATP binding,” “transcription, DNA-templated,” and “negative regulation of transcription from RNA polymerase II promoter” (p-values of 0.000344, 0.000822, 0.000991, 0.00139, and 0.00244, respectively). The “blue” (n=48) and “grey” (n=13) modules showed only a negative correlation with age (r= -0.58, p= 2e-04; r= -0.76, p= 6e-08, respectively) and exhibited no enrichment of GO categories for genes. After applying the threshold of the absolute value of gene significance for age (|GS| >0.5) and module membership measures (|MM| >0.6) in each module, we defined 20 and 7 “hub” genes in the “turquoise” and “blue” modules (Table 1). The gene expression of the “hub” genes was well clustered by modules, which was consistent with the negative correlation with age (“turquoise” r=0.37, “blue” r=-0.58). The upregulated expression level of the “hubs” in breast tissue at 90 months old indicated that the “turquoise” module had a stronger correlation with breast tissue than with age (Figure 4a).
Table 1. List of “hub” genes in the consensus network for age
Yak ID
|
Gene symbol
|
Module color
|
Gene significance
|
p-value
|
Module membership
|
p-value
|
BmuPB009868
|
AP3D1
|
turquoise
|
0.513965075
|
0.001344116
|
0.868945999
|
6.37E-12
|
BmuPB004083
|
TMEM30A
|
turquoise
|
0.505722627
|
0.001652657
|
0.848054843
|
6.65E-11
|
BmuPB000726
|
DDRGK1
|
turquoise
|
0.53596122
|
0.000754414
|
0.842070133
|
1.22E-10
|
BmuPB019011
|
OTUD3
|
turquoise
|
0.543916749
|
0.000606215
|
0.828680521
|
4.37E-10
|
BmuPB000091
|
CNPPD1
|
turquoise
|
0.51195228
|
0.001414359
|
0.814537409
|
1.50E-09
|
BmuPB006815
|
TOR1AIP1
|
turquoise
|
0.544705745
|
0.000593032
|
0.809789221
|
2.21E-09
|
BmuPB007137
|
MGAT4A
|
turquoise
|
0.545969144
|
0.000572452
|
0.798170686
|
5.51E-09
|
BmuPB007385
|
HECA
|
turquoise
|
0.603273223
|
9.84E-05
|
0.770629907
|
3.85E-08
|
BmuPB019443
|
SYAP1
|
turquoise
|
0.500377628
|
0.001884511
|
0.733120646
|
3.68E-07
|
BmuPB009825
|
PHAX
|
turquoise
|
0.561938984
|
0.00036184
|
0.70497442
|
1.59E-06
|
BmuPB008032
|
UBE2G2
|
turquoise
|
0.52385538
|
0.001041698
|
0.70208134
|
1.83E-06
|
BmuPB012517
|
SYF2
|
turquoise
|
0.568328068
|
0.000299194
|
0.699367268
|
2.08E-06
|
BmuPB001610
|
FURIN
|
turquoise
|
0.567906296
|
0.000303008
|
0.697046314
|
2.32E-06
|
BmuPB000679
|
ZGPAT
|
turquoise
|
0.504624296
|
0.001698141
|
0.683958391
|
4.25E-06
|
BmuPB007049
|
SKP2
|
turquoise
|
-0.527608908
|
0.000943732
|
-0.669790419
|
7.91E-06
|
BmuPB000224
|
C2orf6
|
turquoise
|
0.531191165
|
0.000857925
|
0.652871802
|
1.59E-05
|
BmuPB007420
|
TAB2
|
turquoise
|
0.505905132
|
0.001645204
|
0.652844304
|
1.59E-05
|
BmuPB018569
|
ORAOV1
|
turquoise
|
0.51035386
|
0.001472421
|
0.637043276
|
2.95E-05
|
BmuPB010550
|
CCPG1
|
turquoise
|
0.644779605
|
2.19E-05
|
0.612874899
|
7.08E-05
|
BmuPB012521
|
TMEM57
|
turquoise
|
0.57317319
|
0.000258349
|
0.60967979
|
7.91E-05
|
BmuPB013324
|
MCM3
|
blue
|
-0.583585649
|
0.000186982
|
0.841505398
|
1.29E-10
|
BmuPB010064
|
SPC24
|
blue
|
-0.509962181
|
0.001486965
|
0.744497748
|
1.93E-07
|
BmuPB003102
|
C17orf49
|
blue
|
-0.526803776
|
0.000964031
|
0.741805787
|
2.26E-07
|
BmuPB015902
|
SERPINH1
|
blue
|
-0.607769763
|
8.44E-05
|
0.721017606
|
7.04E-07
|
BmuPB016582
|
SRPX2
|
blue
|
-0.51838468
|
0.001200558
|
0.698151166
|
2.20E-06
|
BmuPB012996
|
UCK2
|
blue
|
-0.588274454
|
0.000161067
|
0.677420455
|
5.68E-06
|
BmuPB015372
|
UMPS
|
blue
|
-0.503577413
|
0.001742514
|
0.648111658
|
1.92E-05
|
We used the AnimalTFDB 3.0 database [16] to examine transcription factors in these 27 “hubs” and found that ZGPAT encodes a transcription regulator protein and was significantly upregulated in breast tissue at 90 months of age (Figure 4a). Previous study reported that this protein specifically binds the 5'-GGAG[GA]A[GA]A-3' consensus sequence and represses transcription by recruiting the chromatin multi-complex NuRD to target promoters [17]. ZGPAT was highly expressed in breast tissue at 90 months, and it potentially regulated the transcription of 280 genes (weight >0.15) in the network from the “turquoise” module. In order to identify the most important cellular activities controlled by this TF regulatory network, we analyzed over-represented GO biological process and molecular function terms, as well as KEGG pathways. These potential target genes were enriched in the GO categories of “protein binding,” “ATP binding,” and “zinc ion binding,” among others, and the KEGG categories of “aminoacyl-tRNA biosynthesis,” “autophagy animal,” and “protein processing in endoplasmic reticulum” (Figure 4b). These enriched GO terms and KEGG pathways likely help regulate protein synthesis, processing, and secretion in breast tissue. For example, 6 of 7 genes from the “protein processing in endoplasmic reticulum” category were also upregulated at 90 months of age in breast tissue (Figure 4c) and involved in multiple processes in the endoplasmic reticulum, including vesicle trafficking (SEC24C), folding and assembly (SELENOS), transportation (BCAP31), and ubiquitination and degradation (BAG1, UBE2G2, and MARCH6) [12, 13]. Only DNAJC10 was downregulated at 90 months of age in breast tissue. This gene encodes an endoplasmic reticulum co-chaperone that is part of the endoplasmic reticulum-associated degradation complex involved in recognizing and degrading misfolded proteins [13].
Tissue network analysis indicates regulative role of HIF1A in lung
Within the tissue-related gene module network, four modules showed a positive correlation and two showed a negative correlation with the lung; all significant module-trait relationships were negative in the muscle but positive in the breast (Figure 3d). Moreover, 99.54% of the total 8,560 tissue-related genes were related to the top 4 modules (“turquoise,” n=3833; “blue,” n=2795; “brown,” n=1052; “yellow,” n=339) (Figure 5a, Table S6), and these modules were also highly correlated with other modules; for example, “brown,” “yellow,” and “black” showed a high eigengene adjacency with each other (Figure 5b).
We applied the more stringent threshold absolute value of gene significance for the age and module membership measures in the top four modules to identify “hub” genes in the “turquoise,” “blue,” “brown,” and “yellow” modules. With the threshold values of |GS|>0.7 and |MM| >0.8, 34 “hub” genes were identified in the “turquoise” module. Twenty-four hubs were then filtered from the gene significance of module-lung relationships, and 10 were filtered from the gene significance of module-breast relationships. These were further divided into 3 clusters by hierarchical clustering, which showed high expression levels in the breast (cluster 1), lung (cluster 2), and biceps brachii muscle (cluster 3) tissues, with distinct clustering patterns by tissue (Figure 5c).
Table 2. List of “hub” genes in the consensus network for tissue
Yak ID
|
Gene symbol
|
Module color
|
Tissue
|
Gene significance
|
p-value
|
Module membership
|
p-value
|
BmuPB014336
|
EEF1G
|
turquoise
|
lung
|
-0.73907039
|
2.64E-07
|
0.826748126
|
5.20E-10
|
BmuPB017352
|
PMS1
|
turquoise
|
lung
|
-0.71599797
|
9.13E-07
|
0.850552958
|
5.12E-11
|
BmuPB000878
|
MTPAP
|
turquoise
|
lung
|
-0.715583465
|
9.33E-07
|
0.912877613
|
8.73E-15
|
BmuPB018762
|
CXHXorf58
|
turquoise
|
lung
|
-0.709433358
|
1.27E-06
|
0.82262518
|
7.50E-10
|
BmuPB014450
|
RWDD4
|
turquoise
|
lung
|
-0.708526806
|
1.33E-06
|
0.923713618
|
9.94E-16
|
BmuPB011005
|
HUS1
|
turquoise
|
lung
|
-0.707137304
|
1.43E-06
|
0.875052199
|
2.97E-12
|
BmuPB005540
|
MTUS1
|
turquoise
|
lung
|
-0.704482832
|
1.62E-06
|
0.871626441
|
4.58E-12
|
BmuPB011453
|
EBF3
|
turquoise
|
lung
|
-0.703373458
|
1.71E-06
|
0.870717052
|
5.13E-12
|
BmuPB010608
|
LAMB3
|
turquoise
|
lung
|
0.70783756
|
1.38E-06
|
-0.85067459
|
5.05E-11
|
BmuPB004299
|
C3H3orf58
|
turquoise
|
lung
|
0.708872549
|
1.31E-06
|
-0.88522675
|
7.61E-13
|
BmuPB020871
|
G6PD
|
turquoise
|
lung
|
0.708930411
|
1.30E-06
|
-0.90727231
|
2.41E-14
|
BmuPB007438
|
ZCCHC6
|
turquoise
|
lung
|
0.709740967
|
1.25E-06
|
-0.84738754
|
7.13E-11
|
BmuPB007592
|
CTDSPL
|
turquoise
|
lung
|
0.711634321
|
1.14E-06
|
-0.8813581
|
1.30E-12
|
BmuPB004894
|
HIF1A
|
turquoise
|
lung
|
0.713237417
|
1.05E-06
|
-0.87302984
|
3.84E-12
|
BmuPB020508
|
FAM122B
|
turquoise
|
lung
|
0.720139618
|
7.37E-07
|
-0.82428698
|
6.48E-10
|
BmuPB018102
|
CCDC82
|
turquoise
|
lung
|
0.720592854
|
7.20E-07
|
-0.90665968
|
2.68E-14
|
BmuPB014539
|
CNTRL
|
turquoise
|
lung
|
0.722130497
|
6.64E-07
|
-0.87149974
|
4.65E-12
|
BmuPB003882
|
VAV3
|
turquoise
|
lung
|
0.725601285
|
5.53E-07
|
-0.89496378
|
1.82E-13
|
BmuPB020153
|
EPS8L1
|
turquoise
|
lung
|
0.727495147
|
4.99E-07
|
-0.82286844
|
7.34E-10
|
BmuPB010308
|
PGM2
|
turquoise
|
lung
|
0.746811608
|
1.69E-07
|
-0.83795918
|
1.83E-10
|
BmuPB004008
|
GDAP2
|
turquoise
|
lung
|
0.754806874
|
1.05E-07
|
-0.88661303
|
6.26E-13
|
BmuPB012568
|
WASF2
|
turquoise
|
lung
|
0.757428944
|
8.92E-08
|
-0.83874427
|
1.69E-10
|
BmuPB011966
|
ACTG1
|
turquoise
|
lung
|
0.772114072
|
3.49E-08
|
-0.84373776
|
1.03E-10
|
BmuPB014173
|
STAT6
|
turquoise
|
lung
|
0.80896084
|
2.37E-09
|
-0.84981355
|
5.53E-11
|
BmuPB015106
|
MAN2A1
|
turquoise
|
breast
|
0.705124693
|
1.57E-06
|
-0.84782348
|
6.81E-11
|
BmuPB008614
|
VPS26A
|
turquoise
|
breast
|
0.731553581
|
4.01E-07
|
-0.88669366
|
6.18E-13
|
BmuPB018496
|
FAM92A
|
turquoise
|
breast
|
0.718223061
|
8.14E-07
|
-0.85598454
|
2.85E-11
|
BmuPB005078
|
RASA1
|
turquoise
|
breast
|
0.705902582
|
1.51E-06
|
-0.85869193
|
2.11E-11
|
BmuPB011476
|
FAM53B
|
turquoise
|
breast
|
-0.728187435
|
4.81E-07
|
0.848680696
|
6.23E-11
|
BmuPB008348
|
EPDR1
|
turquoise
|
breast
|
-0.710771064
|
1.19E-06
|
0.832481139
|
3.08E-10
|
BmuPB003453
|
TROVE2
|
turquoise
|
breast
|
0.72654448
|
5.26E-07
|
-0.80669368
|
2.84E-09
|
BmuPB021200
|
VWA7
|
turquoise
|
breast
|
-0.722814498
|
6.41E-07
|
0.814033145
|
1.56E-09
|
BmuPB010528
|
RNF111
|
turquoise
|
breast
|
0.737234077
|
2.92E-07
|
-0.80146106
|
4.28E-09
|
BmuPB015811
|
FRMD3
|
turquoise
|
breast
|
-0.739223421
|
2.61E-07
|
0.809147869
|
2.33E-09
|
According to the AmalTFDB 3.0 database [16], EBF3, HIF1A, and STAT6 were annotated as transcription factors. EBF3 encodes a member of the early B-cell factor (EBF) family of DNA binding transcription factors. EBF proteins are involved in B-cell differentiation, bone development, and neurogenesis and they may also function as tumor suppressors [18]. STAT6 is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor-associated kinases. They form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators [19]. HIF1A, hypoxia-inducible factor-1, functions as a master regulator of the cellular and systemic homeostatic response to hypoxia by activating the transcription of genes, involved in energy metabolism, angiogenesis, and apoptosis, as well as other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia [20]. HIF1A was upregulated in the lung to potentially control the transcription of 2008 genes (weight >0.15), which were enriched in multiple GO biological process categories, molecular function categories and KEGG pathways. Most of the enriched GO terms and KEGG pathways were related to energy metabolism. Specifically, enriched GO terms included “mitochondrial respiratory chain complex I assembly,” “NADH dehydrogenase (ubiquinone) activity,” “ATP binding,” “mitochondrial translation,” “tricarboxylic acid cycle,” and “GTP binding” while enriched KEGG pathways included “thermogenesis,” “carbon metabolism,” and “citrate cycle TCA cycle” (Figure 5d). Mitochondria function as the primary energy producers of the cell and serves as the hub for a variety of immune pathways as the center of biosynthesis, oxidative stress response, and cellular signaling [21]. NADH dehydrogenase is a core subunit of the mitochondrial membrane respiratory chain and is believed to contribute to the minimal assembly required for catalysis [22]. The protein which binds ATP or GTP, carries three phosphate groups esterified to a sugar moiety and provides energy and phosphate sources for the cell [23, 24]. The tricarboxylic acid cycle, a series of metabolic reactions in aerobic cellular respiration, occurs in the mitochondria of animals and plants. During this cycle, acetyl-CoA, which is formed from pyruvate produced during glycolysis, is completely oxidized to CO2 via interconversion of various carboxylic acids. This results in the reduction of NAD and FAD to NADH and FADH2, respectively, and indirectly in the synthesis of ATP by oxidative phosphorylation [25]. The “thermogenesis” pathway is essential for warm-blooded animals, because it ensures normal cellular and physiological functions under challenging environmental conditions [26].