Global DNA methylation and gene expression in the breast, lungs, and biceps brachii muscle at different ages
We generated the methylomes and transcriptomes of lung, breast, and biceps brachii muscle tissues at four different month ages (MA), representing four life stages with 3 replicates (MA = 6, 30, 54, and 90 months; childhood, juvenile, youth, and adult), from a total of 12 female Riwoqe yaks. After performing sequence quality control and filtering, we obtained a single-base resolution methylome covering 85.6% (27,471,373/32,092,725) of CpG sites across the genome with an average depth of 22.5×. We first calculated pairwise Pearson’s correlations of CpG sites with at least 10 × coverage depth across all samples, and the samples were well clustered by tissue (Fig. 1a). Biological replicates highly correlated with each other (median Pearson’s r = 0.74), and the correlation between ages (median Pearson’s r = 0.72) was relatively weaker and showed the lowest coefficients among tissues (median Pearson’s r = 0.66) (Fig. 1c). The transcriptome sequencing data of all samples were aligned to our newly assembled yak genome reference (unpublished), and subsequently obtained the transcripts. In total, we obtained a total of 2,0504 transcripts, which were annotated to the Gene Ontology (GO) [6], InterPro [7], Kyoto Encyclopedia of Genes and Genomes (KEGG) [8], Swiss-Prot [9], and TrEMBL [10] databases (Table S1). We also calculated pairwise Pearson’s correlations of all transcripts and obtained similar results as with DNA methylation (Fig. 1b). Biological replicates showed the highest correlation coefficients, while different tissues showed the lowest correlation coefficients (Fig. 1d).
Differential DNA methylation among the age groups was involved in milk production
We looked for differentially methylated regions (DMRs) across age groups within the breast, lungs, and gluteal muscle (Table S2-4). Within the lung and gluteal-muscle tissues, age groups did not differ in age-related DMRs (A-DMRs), but adult breast tissue had considerably more differential methylation than the three younger age groups (Fig. 2a). At ~ 90 months, yaks enter the lactation period, so it is possible that the observed methylation may at least partially control yak lactation. We then selected 375 hypomethylated (highly expressed) genes along with 207 hypermethylated (lowly expressed) genes from adult yak breast tissue. The hypomethylated genes were enriched in only “protein processing in endoplasmic reticulum (ER)” (nine genes, 2.964-fold enrichment, p = 0.0049). Specifically, the genes were involved in vesicle trafficking (SEC23B), oligosaccharide linking (MOGS, RPN2), folding and assembly (HSPA5), transportation (LMAN2, SEL1L), and ubiquitination and degradation (UBE2J1, UBE2J2, DERL2) [11, 12]. Thus, methylation regulates milk production by influencing protein processing in the endoplasmic reticulum during the vigorous period of lactation.
We examined A-DMRs that overlapped across age groups. Childhood and adult tissues rarely shared A-DMRs when comparing the lung and muscle tissues (childhood, muscle: 586 A-DMRs, breast: 2,278, lung: 496; adult, muscle: 471, breast: 12,050, lung: 772) (Fig. 2b, c). Juvenile and youth stages also rarely shared A-DMRs across muscle and lung tissues (Fig. 2d), suggesting that methylation patterns were already established at the childhood stage and that no extensively divergent epigenetic difference occurred through the different ages under natural high-altitude conditions.
Consensus Network Analysis For Tissues And Age Groups
We first performed a multi-way ANOVA test for each gene among all samples (n = 36), to test the null hypothesis that the gene expression level did not differ among age groups and tissues. At the threshold for significance (p < 0.05), 417 age-related and 8,560 tissue-related genes were selected for further weighted gene correlation network analysis (WGCNA), which takes advantage of the correlations among genes and groups genes into modules using network topology [13]. Subsequently, we conducted WGCNA for tissue- and age-related gene expression separately to identify a “consensus network”–a common pattern of genes that are correlated in all conditions. We performed consensus network, module statistic, and eigengene network analyses to identify modules, assess relationships between modules and traits, and study the relationships between co-expression modules [14]. The consensus networks identified for tissues and age groups had clearly delineated modules (Fig. 1a, 1b), and the modules identified were significantly correlated with tissues and age groups (Fig. 1c, 1d).
Age network analysis indicates that ZGPAT might regulate milk production
Within the age-related gene network, the largest module (“turquoise”, n = 356) had opposite directions of correlation for breast tissue (r=-0.57, p = 3e-04) and age (r = 0.37, p = 0.03) and showed a positive correlation with biceps brachii muscle (Table S5). The breast tissue had a stronger signal than age and possibly overwhelmed the signal from age. Genes in this module were enriched in the GO categories of “protein polyubiquitination,” “RNA polymerase II core promoter proximal region sequence-specific DNA binding,” “ATP binding,” “transcription, DNA-templated,” and “negative regulation of transcription from RNA polymerase II promoter” (p-values of 0.000344, 0.000822, 0.000991, 0.00139, and 0.00244, respectively). The “blue” (n = 48) and “grey” (n = 13) modules showed only a negative correlation with age (r= -0.58, p = 2e-04; r= -0.76, p = 6e-08, respectively) and exhibited no enrichment of GO categories for genes. After applying the threshold of the absolute value of gene significance for age (|GS| >0.5) and module membership measures (|MM| >0.6) in each module, we defined 20 and 7 “hub” genes in the “turquoise” and “blue” modules (Table 1). The gene expression of the “hub” genes was well clustered by modules, which was consistent with the opposite directions of correlation with age (“turquoise” r = 0.37, “blue” r=-0.58). The upregulated expression level of the “hubs” in breast tissue at 90 months of age indicated that the “turquoise” module had a stronger correlation with breast tissue than with age (Fig. 4a).
Table 1
List “hub” genes in the consensus network for age
Yak ID | Gene symbol | Module color | Gene significance | p-value | Module membership | p-value |
BmuPB009868 | AP3D1 | turquoise | 0.513965075 | 0.001344116 | 0.868945999 | 6.37E-12 |
BmuPB004083 | TMEM30A | turquoise | 0.505722627 | 0.001652657 | 0.848054843 | 6.65E-11 |
BmuPB000726 | DDRGK1 | turquoise | 0.53596122 | 0.000754414 | 0.842070133 | 1.22E-10 |
BmuPB019011 | OTUD3 | turquoise | 0.543916749 | 0.000606215 | 0.828680521 | 4.37E-10 |
BmuPB000091 | CNPPD1 | turquoise | 0.51195228 | 0.001414359 | 0.814537409 | 1.50E-09 |
BmuPB006815 | TOR1AIP1 | turquoise | 0.544705745 | 0.000593032 | 0.809789221 | 2.21E-09 |
BmuPB007137 | MGAT4A | turquoise | 0.545969144 | 0.000572452 | 0.798170686 | 5.51E-09 |
BmuPB007385 | HECA | turquoise | 0.603273223 | 9.84E-05 | 0.770629907 | 3.85E-08 |
BmuPB019443 | SYAP1 | turquoise | 0.500377628 | 0.001884511 | 0.733120646 | 3.68E-07 |
BmuPB009825 | PHAX | turquoise | 0.561938984 | 0.00036184 | 0.70497442 | 1.59E-06 |
BmuPB008032 | UBE2G2 | turquoise | 0.52385538 | 0.001041698 | 0.70208134 | 1.83E-06 |
BmuPB012517 | SYF2 | turquoise | 0.568328068 | 0.000299194 | 0.699367268 | 2.08E-06 |
BmuPB001610 | FURIN | turquoise | 0.567906296 | 0.000303008 | 0.697046314 | 2.32E-06 |
BmuPB000679 | ZGPAT | turquoise | 0.504624296 | 0.001698141 | 0.683958391 | 4.25E-06 |
BmuPB007049 | SKP2 | turquoise | -0.527608908 | 0.000943732 | -0.669790419 | 7.91E-06 |
BmuPB000224 | C2orf6 | turquoise | 0.531191165 | 0.000857925 | 0.652871802 | 1.59E-05 |
BmuPB007420 | Table 2 | turquoise | 0.505905132 | 0.001645204 | 0.652844304 | 1.59E-05 |
BmuPB018569 | ORAOV1 | turquoise | 0.51035386 | 0.001472421 | 0.637043276 | 2.95E-05 |
BmuPB010550 | CCPG1 | turquoise | 0.644779605 | 2.19E-05 | 0.612874899 | 7.08E-05 |
BmuPB012521 | TMEM57 | turquoise | 0.57317319 | 0.000258349 | 0.60967979 | 7.91E-05 |
BmuPB013324 | MCM3 | blue | -0.583585649 | 0.000186982 | 0.841505398 | 1.29E-10 |
BmuPB010064 | SPC24 | blue | -0.509962181 | 0.001486965 | 0.744497748 | 1.93E-07 |
BmuPB003102 | C17orf49 | blue | -0.526803776 | 0.000964031 | 0.741805787 | 2.26E-07 |
BmuPB015902 | SERPINH1 | blue | -0.607769763 | 8.44E-05 | 0.721017606 | 7.04E-07 |
BmuPB016582 | SRPX2 | blue | -0.51838468 | 0.001200558 | 0.698151166 | 2.20E-06 |
BmuPB012996 | UCK2 | blue | -0.588274454 | 0.000161067 | 0.677420455 | 5.68E-06 |
BmuPB015372 | UMPS | blue | -0.503577413 | 0.001742514 | 0.648111658 | 1.92E-05 |
We then used the AnimalTFDB 3.0 database [15] to examine transcription factors in these 27 “hubs” and found that ZGPAT encodes a transcription regulator protein and was significantly upregulated in breast tissue at 90 months of age (Fig. 4a). previous study reported that this protein specifically binds the 5'-GGAG[GA]A[GA]A-3' consensus sequence and represses transcription by recruiting the chromatin multi-complex NuRD to target promoters [16]. High expression of ZGPAT in breast tissue at 90 months of age was observed, and it potentially regulated the transcription of 280 genes (weight > 0.15) in the network from the “turquoise” module. To identify the most important cellular activities controlled by this TF regulatory network, we analyzed over-represented GO biological process and molecular function terms and KEGG pathways. These potential target genes were enriched in the GO categories of “protein binding,” “ATP binding,” and “zinc ion binding,” among others, and the KEGG categories of “aminoacyl-tRNA biosynthesis,” “autophagy animal,” and “protein processing in endoplasmic reticulum” (Fig. 4b). These enriched GO terms and KEGG pathways are likely to play roles in regulating protein synthesis, processing, and secretion in breast tissue. For example, 6 of 7 genes from “protein processing in endoplasmic reticulum” were also upregulated at 90 months of age in breast tissue (Fig. 4c) and involved in multiple processes in the endoplasmic reticulum, including vesicle trafficking (SEC24C), folding and assembly (SELENOS), transportation (BCAP31), and ubiquitination and degradation (BAG1, UBE2G2, and MARCH6) [11, 12]. Only DNAJC10 was downregulated at 90 months of age in breast tissue, and this gene encodes an endoplasmic reticulum co-chaperone that is part of the endoplasmic reticulum-associated degradation complex involved in recognizing and degrading misfolded proteins [12].
Tissue network analysis indicates that high expression of HIF1A regulates energy metabolism in the lung
Within the tissue-related gene module network, four modules showed a positive correlation and two modules showed a negative correlation with the lung, and all significant module-trait relationships were negative in the muscle but positive in the breast (Fig. 3d). Moreover, 99.54% of the total 8,560 tissue-related genes were related to the top 4 modules (“turquoise,” n = 3833, “blue,” n = 2795, “brown,” n = 1052, “yellow,” n = 339) (Fig. 5a, Table S6), and these modules were also highly correlated with other modules; for example, “brown,” “yellow,” and “black” showed a high eigengene adjacency with each other (Fig. 5b).
We applied the more stringent threshold of the absolute value of gene significance for the age and module membership measures in the top four modules to identify “hub” genes in the “turquoise,” “blue,” “brown,” and “yellow” modules. With the threshold values of |GS|>0.7 and |MM| >0.8, 34 “hub” genes were identified in the “turquoise” module. Then, 24 “hubs” were filtered from the gene significance of module-lung relationships, and 10 “hubs” were filtered from the gene significance of module-breast relationships; these were further divided into 3 clusters by hierarchical clustering, and they showed high expression levels in the breast (cluster 1), lung (cluster 2), and biceps brachii muscle (cluster 3) tissues, respectively, with distinct clustering patterns by tissue (Fig. 5c).
Table 2
List “hub” genes in the consensus network for tissue
Yak ID | Gene symbol | Module color | Tissue | Gene significance | p-value | Module membership | p-value |
BmuPB014336 | EEF1G | turquoise | lung | -0.73907039 | 2.64E-07 | 0.826748126 | 5.20E-10 |
BmuPB017352 | PMS1 | turquoise | lung | -0.71599797 | 9.13E-07 | 0.850552958 | 5.12E-11 |
BmuPB000878 | MTPAP | turquoise | lung | -0.715583465 | 9.33E-07 | 0.912877613 | 8.73E-15 |
BmuPB018762 | CXHXorf58 | turquoise | lung | -0.709433358 | 1.27E-06 | 0.82262518 | 7.50E-10 |
BmuPB014450 | RWDD4 | turquoise | lung | -0.708526806 | 1.33E-06 | 0.923713618 | 9.94E-16 |
BmuPB011005 | HUS1 | turquoise | lung | -0.707137304 | 1.43E-06 | 0.875052199 | 2.97E-12 |
BmuPB005540 | MTUS1 | turquoise | lung | -0.704482832 | 1.62E-06 | 0.871626441 | 4.58E-12 |
BmuPB011453 | EBF3 | turquoise | lung | -0.703373458 | 1.71E-06 | 0.870717052 | 5.13E-12 |
BmuPB010608 | LAMB3 | turquoise | lung | 0.70783756 | 1.38E-06 | -0.85067459 | 5.05E-11 |
BmuPB004299 | C3H3orf58 | turquoise | lung | 0.708872549 | 1.31E-06 | -0.88522675 | 7.61E-13 |
BmuPB020871 | G6PD | turquoise | lung | 0.708930411 | 1.30E-06 | -0.90727231 | 2.41E-14 |
BmuPB007438 | ZCCHC6 | turquoise | lung | 0.709740967 | 1.25E-06 | -0.84738754 | 7.13E-11 |
BmuPB007592 | CTDSPL | turquoise | lung | 0.711634321 | 1.14E-06 | -0.8813581 | 1.30E-12 |
BmuPB004894 | HIF1A | turquoise | lung | 0.713237417 | 1.05E-06 | -0.87302984 | 3.84E-12 |
BmuPB020508 | FAM122B | turquoise | lung | 0.720139618 | 7.37E-07 | -0.82428698 | 6.48E-10 |
BmuPB018102 | CCDC82 | turquoise | lung | 0.720592854 | 7.20E-07 | -0.90665968 | 2.68E-14 |
BmuPB014539 | CNTRL | turquoise | lung | 0.722130497 | 6.64E-07 | -0.87149974 | 4.65E-12 |
BmuPB003882 | VAV3 | turquoise | lung | 0.725601285 | 5.53E-07 | -0.89496378 | 1.82E-13 |
BmuPB020153 | EPS8L1 | turquoise | lung | 0.727495147 | 4.99E-07 | -0.82286844 | 7.34E-10 |
BmuPB010308 | PGM2 | turquoise | lung | 0.746811608 | 1.69E-07 | -0.83795918 | 1.83E-10 |
BmuPB004008 | GDAP2 | turquoise | lung | 0.754806874 | 1.05E-07 | -0.88661303 | 6.26E-13 |
BmuPB012568 | WASF2 | turquoise | lung | 0.757428944 | 8.92E-08 | -0.83874427 | 1.69E-10 |
BmuPB011966 | ACTG1 | turquoise | lung | 0.772114072 | 3.49E-08 | -0.84373776 | 1.03E-10 |
BmuPB014173 | STAT6 | turquoise | lung | 0.80896084 | 2.37E-09 | -0.84981355 | 5.53E-11 |
BmuPB015106 | MAN2A1 | turquoise | breast | 0.705124693 | 1.57E-06 | -0.84782348 | 6.81E-11 |
BmuPB008614 | VPS26A | turquoise | breast | 0.731553581 | 4.01E-07 | -0.88669366 | 6.18E-13 |
BmuPB018496 | FAM92A | turquoise | breast | 0.718223061 | 8.14E-07 | -0.85598454 | 2.85E-11 |
BmuPB005078 | RASA1 | turquoise | breast | 0.705902582 | 1.51E-06 | -0.85869193 | 2.11E-11 |
BmuPB011476 | FAM53B | turquoise | breast | -0.728187435 | 4.81E-07 | 0.848680696 | 6.23E-11 |
BmuPB008348 | EPDR1 | turquoise | breast | -0.710771064 | 1.19E-06 | 0.832481139 | 3.08E-10 |
BmuPB003453 | TROVE2 | turquoise | breast | 0.72654448 | 5.26E-07 | -0.80669368 | 2.84E-09 |
BmuPB021200 | VWA7 | turquoise | breast | -0.722814498 | 6.41E-07 | 0.814033145 | 1.56E-09 |
BmuPB010528 | RNF111 | turquoise | breast | 0.737234077 | 2.92E-07 | -0.80146106 | 4.28E-09 |
BmuPB015811 | FRMD3 | turquoise | breast | -0.739223421 | 2.61E-07 | 0.809147869 | 2.33E-09 |
According to the AmalTFDB 3.0 database [15], EBF3, HIF1A, and STAT6 were annotated as transcription factors. EBF3 encodes a member of the early B-cell factor (EBF) family of DNA binding transcription factors. EBF proteins are involved in B-cell differentiation, bone development, and neurogenesis and may also function as tumor suppressors [17]. STAT6 is a member of the STAT family of transcription factors. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators [18]. HIF1A, hypoxia-inducible factor-1, functions as a master regulator of the cellular and systemic homeostatic response to hypoxia by activating the transcription of many genes, including those involved in energy metabolism, angiogenesis, and apoptosis and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia [19]. HIF1A was found to be upregulated in the lung and to potentially regulate the transcription of 2008 genes (weight > 0.15), which were enriched in multiple GO biological process and molecular function categories and KEGG pathways. Notably, most of the enriched GO terms and KEGG pathways were related to energy metabolism, such as “mitochondrial respiratory chain complex I assembly,” “NADH dehydrogenase (ubiquinone) activity,” “ATP binding,” “mitochondrial translation,” “tricarboxylic acid cycle,” and “GTP binding” in GO terms and “thermogenesis,” “carbon metabolism,” and “citrate cycle TCA cycle” in KEGG pathways (Fig. 5d). Mitochondria function as the primary energy producers of the cell and serve as the center of biosynthesis, the oxidative stress response, and cellular signaling, placing them at the hub of a variety of immune pathways [20]. NADH dehydrogenase is a core subunit of the mitochondrial membrane respiratory chain and believed to belong to the minimal assembly required for catalysis [21]. Protein which binds ATP or GTP, carries three phosphate groups esterified to a sugar moiety and represents energy and phosphate sources for the cell [22, 23]. The tricarboxylic acid cycle, a series of metabolic reactions in aerobic cellular respiration, occurs in the mitochondria of animals and plants, and during this cycle, acetyl-CoA, formed from pyruvate produced during glycolysis, is completely oxidized to CO2 via the interconversion of various carboxylic acids. It results in the reduction of NAD and FAD to NADH and FADH2, whose reducing power is then used indirectly in the synthesis of ATP by oxidative phosphorylation [24]. The “thermogenesis” pathway is essential for warm blooded animals, ensuring normal cellular and physiological functioning under conditions of environmental challenge [25].