CpG dinucleotide content in the rainbow trout genome
The average number of sequence read pairs per sample analyzed in this study was 44,610,663. The average number of cytosines per sample analyzed was 907,126,962; of them, 11.74% exist in the CpGs context, 22.56% in CHG, and 65.69% in CHH (where H corresponds to A, T, or C). The highest percentage of the methylated cytosines (62.18%) were in the CpG context compared to only 1.09% in CHG and 1.78% in CHH contexts. The number of CpGs in the rainbow trout genome is 35,336,288. The CpG dinucleotide frequency in the genome is 1.83%, which is 3.4-fold less than the expected frequencies of the sixteen dinucleotides. Mammalian genomes have ~5-fold fewer CpG dinucleotides than expected [27], with 70–80% of the CpGs being methylated [28]. Spontaneous mutations of methylated C residues to T by deamination explain the CpG under-representation in genomes. On the other hand, unmethylated C mutates to U, which is quickly repaired [29].
Consistent with our data, 69.60% of tilapia CpG cytosines are methylated compared to only 0.57% and 0.47% of the CHH and CHG context, respectively[30]. Similar results were reported in pufferfish and zebrafish, where 65-80%, 0.25-1%, and 0.34-1% of cytosines are methylated in CpG, CHG, and CHH, respectively [30, 31]. In humans, on the other hand, almost all DNA methylation (99.98%) exists in CpG dinucleotides; non-CPG methylation accounts for 25% of cytosines in stem cells, though[8]. Mice also have about 74% methylated CpGs compared to about 0.6% non-methylated CpGs [30]. These data generally indicate conservation of CpG methylation in eukaryotic genomes.
In this study, 3,161,570 cytosines in the CpG context were classified as genic (located within ±10kb of the TSS) and were considered in the subsequent analysis. The rest of the data were intergenic.
Non-island genic CpGs and their associated gene expression
Methyl Mini-seq revealed the existence of 2,916,293 non-island CpGs located within ±10kb of TSS of 42,104 loci/genes. The overall CpGs density per nucleotide within the ±10kb region was 145.8 CpGs/NT (Figure 1). However, the CpG density, in the ±1kb region flanking the TSS, was 295.3 CpGs/NT, indicating enrichment of CpGs around TSS compared to other genic regions.
Consistent with our observation, comparative epigenomic studies revealed conserved high CpG density around TSS. In fish, higher CpG density around TSS has been reported; however, the depletion of CpG number, up- and down-stream of TSS, was less than that in mammals, perhaps due to lower GC content of fish genomes [32].
The average percent of CpGs methylation in ±10kb flanking TSS was 57.7%; nevertheless, a sharp decline in DNA methylation was noticed in the ±2kb region flanking the TSS. The average methylation percent of this region was 31.6% and reached as low as 12% near the TSS (Figure 1). Although CpGs increased near TSS, most of the CpGs were under-methylated. Figure 2 presents the number of CpGs per nucleotide near the TSS at different intervals of average methylation percentages.
Few studies investigated the genome-wide DNA methylation patterns in fish. A comparative epigenomics study including fish revealed that CpG density is strongly associated with the unmethylated state of DNA. Higher CpG density and lower methylation was reported around TSS in mouse and zebrafish livers [33]. Pufferfish promoters showed hypomethylated promoters in genes with intermediate CpG densities[32]. Studies explained that some proteins, such as CXXC-containing proteins and CG-rich binding transcription factors recruited by CpG-rich regions, maintain the unmethylated DNA state [32, 34, 35]. Another comparative study, including plants and animals, reported genome-wide high levels of CpGs methylation (~80%) in zebrafish larvae and mouse embryos [36]. A study on the tilapia genome showed a gradual decrease in CpG methylation level to 25% near the TSS compared to 75% in the gene body[30]. Low methylation levels on both sides of the TSS were reported in pufferfish with higher methylation levels in the gene body and downstream of genes [32].
Integrated analysis of Methyl Mini-seq and RNA-Seq showed a weak/moderate correlation between the average percentage of DNA methylation and gene transcription expression deciles calculated based on the expression ranking measured in RPKM (Reads Per Kilobase Million). However, this correlation was dependent on CpG position relative to TSS (Pvalue<0.0001).
Within ±1kb of the TSS, there was a negative correlation between the average DNA methylation percent and gene transcription expression deciles (Figure 3, correlation multivariate = -0.2046, P<0.0001, nonparametric Spearman's correlation = -0.2098, P<0.0001). The average percentage DNA methylation within ±1kb of the genes’ TSS at the first decile (lowest) of the expression range was 37.2%, compared to 14.6% in the 10th decile (highest) (Figure 4).
Conversely, within +3 to +10kb in the gene body, there was a weak positive correlation between DNA methylation and gene expression (Figure 3, correlation multivariate = 0.1378, P<0.0001, nonparametric Spearman's correlation = 0.2000, P<0.0001).
Regarding the effect of CpG location on TSS and consistent with our data, fish and mammalian studies showed that gene expression is inversely correlated with DNA methylation in the first exon and the promoter [37, 38]. A study on European sea bass showed a negative correlation between gene expression and the first intron, and the study speculated that the relationship might be due to transcription factor-binding motifs enrichment[37]. Conversely, a positive correlation has been demonstrated between gene expression and gene body methylation levels [39]. Moore et al. reported that the positive correlation between gene-body DNA methylation and gene expression is only in the dividing, not nondividing, cells[6]. In birds, however, CpG methylation at TSS and gene bodies of the great tit genome was negatively correlated with gene expression (Spearman's rank correlation, Spearman's rho <−0.23)[40]. The transcription factors' sensitivity to DNA methylation can explain the TSS-flanking region methylation effect on gene expression[41]. However, the mechanism for explaining how gene body DNA methylation influences gene expression is still not clear. Guo et al. suggested a mechanistic link between gene transcription and DNA methylation at gene bodies. As gene transcription proceeds, H3K36me3 is deposited in gene bodies and helps recruit the DNA methyltransferase 3-DNMT3 PWWP domain [42]. Effects of DNA methylation on gene expression and splicing was also suggested as reviewed in [27].
Few studies investigated the relationship of DNA methylation to gene expression at the genome-wide level in fish. In tilapia as a general trend, a moderate negative correlation between CpG methylation in the gene promoter region (1000 bp upstream from TSS) and expression level was observed in male versus female fish (y=-0.28x+43.7)[30]. Other studies investigated DNA methylation relationships with a limited number of gene expressions. Atlantic Salmon challenged with high temperature and hypoxia showed an inconsistent (positive and negative) correlation between CpG methylation levels and transcriptional changes of a few genes [43]. Another study on European sea bass detected global changes in DNA methylation caused by small ocean temperature increases. However, the study did not find a causal relationship between DNA methylation and gene expression[44]. Another study on Sea bass looked at a few sex-relevant genes and reported a relationship between cyp19a1a methylation and its expression, but only under a methylation level of about 80%. However, the relationship was positive in males and negative in females[45]. In tongue sole fish, a correlation in expression of the dmrt1 gene and its DNA methylation during gonadal sex determination was reported after hatching[46]. The study showed that male-specific expression of dmrt1 during development was coordinated with the hypermethylation of the gene promoter in ovaries. Interestingly this hypermethylation was reverted in genotypic females by high-temperature incubation, leading to the development of testes.
These studies suggest that DNA methylation is one of several epigenetic mechanisms regulating gene expression. Other mechanisms such as histone modification or noncoding RNA are also suggested to be necessary. A negative correlation between DNA methylation and histone H3K4me3 was observed across mammalian genomes [42]. Guo et al. showed that DNMT3A activity is induced when the histone H3K4 becomes unmethylated [42]. Additionally, if CpG is located in an enhancer or transcription factor- or repressor- binding site, CpG location may define the relationship between DNA methylation and gene expression, positive or negative[11, 41].
CpGs islands within or near genes and their associated gene expression
Methyl Mini-seq revealed 245,287 CpGs within 6,372 islands (CGI) in 6,537 genes (±10kb of TSS). Only 669 CGI were in the promoters (-2kb of TSS) of 681 genes. The genes with promoter CGI comprise 10.4% of the total number of genes with CGI reported in this study (6,537). This percentage of genes with promotor CGIs is a much smaller fraction than expected since the majority of the mammalian gene promoters, especially human housekeeping promoters, contain CGIs at an approximate frequency of 60%[47]. Also, promoters with CGIs are conserved between humans and mice [48]. To further explore our observation, we looked at the rainbow trout NCBI genome reference annotation and found only 349 genes with promotor CGIs within -1kb of TSS, which is less than 1% of the genes in the genome. Whereas, 721 genes had CGIs in the first 1kb of the gene body (discussed below, figure 8). These data may indicate that rainbow trout gene promotors are generally CGI-poor relative to mammalian genomes. However, CGI density in fish genes, particularly promoters, has not been thoroughly investigated. Variation in the number and density of CGIs reported in 4 fish genomes was much broader than in other vertebrates, including mammals[49]. Computational CGI prediction models, based on mammalian species [50], may not accurately predict CGIs in cold-blooded vertebrates, shifting CGIs away from gene promoters[51]. Further studies are needed to explore CGI density in gene promotors of fish.
The average methylation level of promotor CGIs in this study was 55.73%. CGIs are usually hypomethylated to permit housekeeping and tissue-specific gene expression[47]. CpG-poor regions, on the other hand, are usually hypermethylated and involved in gene silencing [52].
The CpG density within CGIs was 12.26 CpGs/NT, lower than that of the non-island CpGs (145.8 CpGs/NT) (Figures 5 vs. 1). The CpG density -1kb upstream of the TSS was 5.27 CpGs/NT, even lower than other genic regions, contrary to the sharp increase in CpG density +1kb downstream of the TSS; 34.8 CpGs/NT, the highest density in the gene body.
Average percent methylation of CpGs located in CGIs was 63.20% compared to 57.7% of non-island CpGs. However, similar to non-island CpGs, a steep decline in DNA methylation was noticed in the TSS flanking regions,(i.e. -1kb in the gene promoter compared to +3kb in the gene body). Average percent methylation of this region was 25.3% and declined to as low as 15% near the TSS (Figure 5). Beyond the -1kb to +3kb region flanking the TSS, a general trend of increasing DNA methylation levels was observed as the genomic window moved away from the TSS (Figure 5). Like the non-island CpGs, CpGs near TSS were predominantly unmethylated, although they were more abundant in number (data similar in trend to figure 2 and not shown).
A comparative study, including plants and animals, reported high levels of CpG methylation (~80%) genome-wide in zebrafish larvae and mouse embryos; however, CGI methylation was low [36]. The study also reported slightly higher CpG methylation in the gene body and depletion of DNA methylation around TSS overlapping with CGI.
Combined analysis of the Methyl Mini-seq and RNA-Seq showed a trend similar to non-island CpGs in the relationship between average percent DNA methylation and gene transcription and expression deciles. This correlation varied according to CpG position relative to TSS (pvalue<0.0001).
Within ±1kb of the TSS, there was a weak negative correlation between the average percentage of DNA methylation and gene transcription expression deciles (Figure 6, correlation multivariate = -0.1780, P<0.0001, nonparametric Spearman's correlation = -0.1954, P<0.0001). This inverse relationship between gene expression decile and DNA methylation in the ±1kb flanking TSS is further demonstrated in figure 7. The average percent DNA methylation within ±1kb of genes’ TSS at the first decile of the expression range was 38.6%, compared to 23.5% in the 10th decile (Figure 7).
In opposition and within +3 to +10 kb in the gene body, there was positive correlation between DNA methylation and gene expression in (Figure 6, correlation multivariate = 0.1546, P<0.0001, nonparametric Spearman’s correlation = 0.2239, P<0.0001).
Similar to our study, a negative correlation between CpG density and DNA methylation relative to TSS was reported in humans and other mammals implying importance in essential molecular functions such as gene expression. However, precisely how the CGIs regulate gene expression is still unclear. Although TSS exist in about half of the CpG islands, those TSS often lack common promoter sequences[6, 53]. DNA methylation's role in regulating gene expression was suggested through interaction with nucleosome histone modification [6, 54].
Studies concerning the relationship between CGI density, methylation, and gene expression are scarce in fish, especially non-model species. Zebrafish embryos exhibit low DNA methylation in CGIs [36]. Decreased de novo DNA methylation occurring at the CpG island of the no-tail gene of zebrafish embryos is correlated with gene repression[55]. In gilthead sea bream and although at a low level, DNA methylation was correlated with expression of the sirtuin-1 gene, a master regulator of metabolism. However, the involvement of other mechanisms in transcriptional regulation was suggested in the study[56].
In non-fish species, a recent study in chicken analyzed more than 3000 CGIs from 20 tissues and their association with gene expression. Scientists identified only 121 significantly correlated CGI-gene pairs, suggesting that alteration of CGI methylation rarely affects gene expression[57]. In a recent review, Bird, A. [47] claimed that embryonic gene transcriptional activity put methylation-free footprints on CGIs; methylation does not silence actively transcribed genes and only affects genes already silenced by other means. The involvement of DNA methylation in gene expression and its interaction with other epigenomics modifications warrants further investigation in fish.
We noticed that most CGIs in the 0-to+2kb region were hypomethylated (0-44%); however, this was not correlated with gene expression (Figure 8). A total of 1,223 genes containing 1,241 CGI (39,933 CpGs) showed these hypomethylated CGI marks. Interestingly, gene ontology annotation of the genes with these CGI marks revealed high enrichment of molecular functions relevant to DNA-binding and transcription regulation activity (Supplementary files S1&2). Examples of genes with essential functions in this list are 51 genes relevant to homeobox proteins, 24 forkhead box genes, 5 fibroblast growth factors, and 4 CCAAT/enhancer-binding protein genes. Homeobox gene methylation was reported in human lung cancer with a reduction of methylation at the CpG-rich region. This reduction was correlated with active histone H3 lysine-4 methylation chromatin mark in the HOXA region[58]. DNA methylation of some homeobox genes correlated with gene expression associated with muscle degradation (discussed below).
In addition, an increased percentage of hypermethylated CGIs (>97%) was noticed in the gene body (+2 to +10kb) of the highly expressed genes (> 6 deciles). Also, most CGI in -2 to -10 upstream of genes were moderate- to hyper-methylated (>44%) regardless of gene expression (Figure 8).
A comparison between most expressed genes (Decile 10) and silenced genes (Decile 1) revealed silenced genes have more hypermethylated CGI around TSS -570 to 2kb (figure 9).
Differential DNA methylation and gene expression between diploid/gravid and triploid/sterile fish
Differentially methylated (DM) CpGs
We restricted our analyses of DM CpGs within -1 to +10kb flanking the TSS of genes due to this region's potential effect on gene expression. Between the diploid/gravid and triploid/sterile fish, there were 1,206 DM CpGs located in 971 genes (Supplementary Table S3). This number of DM CpGs represents only 0.03% of the total CpGs (3,161,570) identified in this study, indicating no genome-wide effect of ploidy on DNA methylation as hypothesized. Several genes had multiple CpGs, 11 had more than 4 DM CpGs, and 131 genes had 2 DM CpGs or more. Gene ontology (GO) enrichment analysis of the DM CpGs revealed enriched terms involved in regulating gene expression, metal transport, and cell adhesion under molecular functions and biological processes, as seen in table 1.
Interestingly, our previous studies showed associations between cell adhesion genes and fillet softness in rainbow trout [59]. The gravid fish in this study had softer fillets than sterile fish [24]. Therefore, the identified DM CpGs could be used as epigenetic markers associated with muscle atrophy and quality.
Table 1, Gene ontology enrichment analysis of DM CpGs categorized in molecular functions (MF) and biological processes (BP)
Source
|
Term_name
|
Term_id
|
Adj_pvalue
|
GO:MF
|
DNA-binding transcription factor activity, RNA polymerase II-specific
|
GO:0000981
|
1.96E-03
|
GO:MF
|
Metal ion transmembrane transporter activity
|
GO:0046873
|
2.24E-02
|
GO:BP
|
Homophilic cell adhesion via plasma membrane adhesion molecules
|
GO:0007156
|
6.60E-05
|
GO:BP
|
Cell adhesion
|
GO:0007155
|
3.18E-02
|
GO:BP
|
Biological adhesion
|
GO:0022610
|
3.18E-02
|
GO:BP
|
Cation transport
|
GO:0006812
|
1.66E-02
|
GO:BP
|
Metal ion transport
|
GO:0030001
|
9.59E-03
|
GO:BP
|
Cell-cell adhesion
|
GO:0098609
|
2.59E-04
|
GO:BP
|
Cell-cell adhesion via plasma-membrane adhesion molecules
|
GO:0098742
|
1.28E-04
|
Integrated analysis of differential gene expression and DNA methylation
Integrated analysis of differential gene expression and DNA methylation identified 31 DM CpGs associated with 28 differentially expressed genes involved in muscle atrophy, primarily (Table 2). The table shows that most DM CpGs were located in transcription factor binding sites. The list of transcription factors includes Pax 2/5/6 and 9, myogenin, and Hoxa5, and these transcription factors possess relevant muscle functions.
The genes spanning these 31 DM CpGs epigenetic markers were classified according to their functions in the following categories:
Apoptosis and epigenetic regulation genes
Apoptosis is a common mechanism of muscle atrophy associated with muscle denervation, muscular dystrophy, spinal cord injury, limb suspension, and immobilization [59]. This list includes two DM CpGs in the promotor of the programmed cell death protein-5 and one DM CpG in the apoptosis-enhancing nuclease. Also, we previously showed the involvement of apoptosis in rainbow trout muscle degeneration using the same fish model as this study [23].
The epigenetic list includes a hypomethylated CpG in the gene body associated with the downregulated expression of the Histone-lysine N-methyl transferase SMYD1 gene. SMYD1 gene encodes a muscle-specific lysine methyltransferase with an essential role in fast-twitch muscle physiology and myofibril integrity. Mouse myofibers lacking the SMYD1 are susceptible to atrophy[60].
There was also a hypermethylated CpG in the promotor associated with downregulated expression of the Cyclin-dependent kinase 2-associated protein-1 (CDK2AP1) gene. This gene impacts the cell cycle by negatively regulating the CDK2 activity and epigenetic regulation by participating in nucleosome remodeling and histone deacetylation[61].
Another gene involved in epigenetic regulation of muscle growth is ATP-dependent RNA helicase DDX39A which had two hypomethylated CpGs in the gene body associated with reduced gene expression. Homozygous mutants of DDX39A exhibited cardiac and trunk muscle dystrophy in zebrafish due to early terminal differentiation of cardiomyocyte and myoblast; the mechanism of action of DDX39A involves obstructing mRNA splicing of members of the KMT2 gene family [62].
Autophagy, glycolysis, collagen metabolism, cell membrane functions genes
The gene list for this category includes two genes relevant to autophagy named Autophagy-related protein-13 (ATG13), Sestrin-3, and 6-phosphofructo-2-kinase (PFKFB3), which is related to autophagy and glycolysis.
The autophagy gene pathway is vital for muscle energy homeostasis and bulk protein turnover. Dysregulation in the autophagy pathway can cause alterations in muscle growth, atrophy, and disorders, as reviewed in [63]. In vitro studies in rainbow trout showed that amino acid deprivation increases autophagosome formation and expression of autophagy-related genes [64]. Twenty-two genes involved in autophagy-related proteolysis were upregulated in rainbow trout muscle atrophy[26]. ATG13 was upregulated in this study with hypomethylated CpG in the promoter regions. Interestingly, this CpG is located in the binding site of three transcription factors, PAX2, 9a/b.
Sestrins can induce autophagy that can cause cell death or have a cytoprotective effect depending on the metabolic and environmental context of the cell[65]. Sestrin can prevent muscle atrophy due to disuse and aging by coordinating anabolic and catabolic signals[66].
PFKFB3 regulates glycolysis by modulating the level of fructose-2,6-bisphosphate. We previously reported that PFKFB3 is upregulated in rainbow trout muscle atrophy [67]. PFKFB4 was suggested to be a novel autophagy regulator through the suppression of oxidative stress[68]. Autophagy protects cells from severe conditions such as limitations in energy supplies and nutrients, and autophagy was suggested to be regulated by glycolysis in cancer cells [69]. Our previous studies showed reduced expression of genes involved in glucose use at transcription and proteomics levels [23, 25], which is a common symptom of muscle atrophy shared with several mammalian models, including diabetes, fasting, cancer, renal failure, and muscle unload [70, 71]. In this study, crosstalk between autophagy and glycolysis in atrophying muscle cells is logical because severe muscle atrophy occurs in response to the higher energetic demands of fish ovarian development/spawning.
The genes with DM CpG include three genes related to collagen metabolism. The first is collagen alpha-1(VIII) which was downregulated in atrophied muscle. In addition, collagen Prolyl 3-hydroxylase-4 (non-enzymatic) (P3H4) is a part of an enzyme complex that catalyzes the hydroxylation of lysine in collagen alpha chains and thus is required for normal collagen biosynthesis and its fibrils cross-linking[72]. P3H4 was downregulated in the gravid fish. This list also includes Proline dehydrogenase-1 (PRODHB), an inner mitochondrial membrane flavoprotein related to the electron transport system and ATP production derived from the breakdown of extracellular collagen to sustain intracellular ATP under nutrient stress conditions[73].
The gene list includes glycerophosphodiester phosphodiesterase domain-containing protein-5 (GDPD5), which was 20 fold down-regulated in atrophying muscle with a hypomethylated CpG in the gene body. A study in mice showed that the expression of a functionally relevant gene named GDE5 was downregulated in atrophied muscles. The study suggested a muscle atrophy protective role for GDE5 reduction [74].
Three affected genes are involved in cell membrane functions named Aquaporin-7 and transmembrane protein 151B (TMEM151B). Aquaporin-7 is aquaglyceroporin, a component of the muscle cell membrane involved in transporting water molecules and glycerol. Aquaporin-7 mRNA expression is upregulated in the skeletal and cardiac muscles of obese Type II diabetic mice, an insulin resistance model that accelerates muscle protein degradation[75]. We previously observed increased water content in the atrophying muscle cells to fill the mobilized proteins and fat voids, which explain the differential expression and methylation of the Aquaporin gene [24, 76]. The other two genes associated with membrane functions are TMEM151B, with SNPs associated with lean mass in humans [77], and Ankyrin-1, involved in cell motility and proliferation[78].
A significant outcome of this study is that the previously reported, dominant biochemical signs of sexual maturation associated with muscle atrophy, high water content[24] and reduced glycolysis[23], had relevant genes with congruent differential expression and methylation. Another notable result is that genes described in this section, PRODHB, GDPD5, and Sestrin, are all involved in activating cells' survival mode due to the significant loss of muscle mass.
Homeobox proteins
The study showed 3 DM CpGs in 2 homeobox genes that were downregulated in atrophying muscle; HOX-C5A and SIX homeobox 4. These genes are the only genes that showed DE among 51 genes relevant to homeobox proteins exhibiting DM CpGs (discussed above). Downregulation of a homeobox member HOXA10 was reported during muscle regeneration after damage in mice[79], and hypermethylation of HOX genes in myogenic cells was reported to help regulate HOX gene expression[80]. In addition, several HOX genes were hypermethylated in geriatric versus young human muscle; a negative relationship between DNA methylation and gene expression was observed. Physical activity was reported to help reverse these age-related epigenetic changes [81].
Other epigenetic markers
This category includes 11 genes with various or uncharacterized functions. The list includes HSP90ß1A, whose expression was downregulated in starved Atlantic salmon and upregulated during muscle differentiation[82]. The list also includes a molecular chaperone gene family member, T-complex protein-1 subunit epsilon [83]. Other genes with essential functions include cytochrome c1 that encodes a subunit of the cytochrome bc1 complex; CACNA1SB, a subunit of the skeletal muscle calcium channel; and cleavage and polyadenylation specificity factor subunit-1.