DOI: https://doi.org/10.21203/rs.3.rs-2383228/v1
Background
Stunting is a global health problem affecting hundreds of millions of children worldwide and contributing to 45% of deaths in children under the age of five. Current therapeutic interventions have limited efficacy. Understanding the epigenetic changes underlying stunting will elucidate molecular mechanisms and likely lead to new therapies.
Results
We profiled the repressive mark histone H3 lysine 9 trimethylation (H3K9me3) genome-wide in peripheral blood mononuclear cells (PBMCs) from 18-week-old infants (n = 15) and mothers (n = 14) enrolled in the PROVIDE study established in an urban slum in Bangladesh. We associated H3K9me3 levels within individual loci as well as genome-wide with anthropometric measurements and other biomarkers of stunting and performed functional annotation of differentially affected regions. Globally elevated H3K9me3 levels were associated with poor linear growth between birth and one year of age. A large proportion of the affected genes code for proteins targeting viral mRNA and highly significant regions were enriched in transposon elements with potential regulatory roles in immune system activation and cytokine production. Maternal data show a similar trend with child’s anthropometry, however, lack statistical significance to infer an intergenerational relationship.
Conclusions
We speculate that high H3K9me3 levels may result in poor linear growth by repressing genes involved in immune system activation. Importantly, changes to H3K9me3 were detectable before the overt manifestation of stunting and therefore may be valuable as new biomarkers of stunting.
Childhood stunting is a result of chronic undernutrition, and even with the available resources, stunting remains a global health problem affecting over 20% of children under the age of five worldwide [1]. Stunting is defined by a height-for-age z (HAZ) score bellow − 2 and emerges withing the first 1,000 days post conception [1, 2]. Without treatment, stunting leads to a significant risk of mortality in children under the age of five and is associated with severe lifelong heath and socioeconomic problems. Common issues associated with stunting include immune system dysfunction [3, 4], cognitive impairment [5, 6], poverty, and paradoxically excessive weight gain often resulting in obesity and with it diseases such as type 2 diabetes and cardiovascular diseases [7, 8]. While nutrient limitation plays a significant role in the development of stunting, it is not the sole driver of stunting. Many factors contribute to stunting, including socioeconomic factors [9–12], limited access to clean water and poor sanitation, which lead to an increased pathogen exposure and can result in enteric infections with or without diarrhea altering gut health, triggering chronic inflammation, and negatively affecting microbiome maturation [13–17]. Maternal health has been identified as another key factor in the development of stunting [16, 18]. Mothers of stunted children were often stunted themselves and evidence suggests they can pass this condition to neonates before birth [19] through mechanisms that are not well understood but that can impact subsequent generations [20, 21].
One of the global nutrition targets released in 2014 is to achieve a 40% reduction in stunting in children under five by 2025[22]. While there has been a decline in the rate of stunting since then, the current interventions are not effective enough to reach this goal. Multiple studies have conducted metabolomic or metagenomic profiling in association with child growth to identify key metabolites for new nutritional therapies [13, 23–26]. In our approach, we aim to understand the detailed changes to molecular circuitry underlying stunting to gain increased precision in targeting specific pathways as new therapies. We focus on epigenetic profiling, specifically targeting histone modifications. Epigenetic profiling provides a powerful tool for studying stunting, as the epigenome is responsive to and reflects environmental signals [27]. This is because the enzymes that deposit and remove histone marks are dependent on the availability of key metabolites, the levels of which are reflected in epigenetic profiles [28], and furthermore, it has been suggested that the epigenome plays a role in intragenerational and transgenerational inheritance [29, 30]. While multiple studies have been conducted to understand the epigenetic changes associated with malnutrition, most of them have concentrated on DNA methylation [31–34] and histone modifications have been less well explored.
In our previous work, we showed that both H3 trimethylation on lysine 4 (H3K4me3, associated with active promoters [35]) and H3 acetylation on lysine 27 (H3K27ac, associated with active enhancers [36]) undergo large scale changes within the first year of life. We observed increased levels of activation at stress and immune response genes in 18-week-old children through hyperactivated H3K27ac. This was followed by massive changes in one-year-old stunted children in which H3K27ac was globally downregulated and H3K4me3 redistributed from promoters to distal ectopic sites. Both H3K4me3 and H3K27ac changes in one-year-olds point to downregulated immune responses and suggest alterations in one-carbon metabolism. In both cases we observed relatively modest changes in early infancy followed by large-scale changes at one year. H3K4me3 and H3K27ac are associated with active transcription and dynamic changes in response to environmental changes. This prompted us to explore the role of more stable chromatin structures in early infancy relative to the development of stunting. To address this, we performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) in 18-week-old children targeting H3 trimethylation on lysine 9 (H3K9me3). H3K9me3 mark is associated with constitutive heterochromatin and has a role in development by repressing lineage inappropriate genes, as well as a role in repressing endogenous retroviruses (ERVs) and other transposable elements (TEs) [36], whose regulatory roles in gene expression are currently being investigated [37–39]. Furthermore, animal studies suggest that H3K9me3 plays a crucial role in transgenerational inheritance in response to environmental changes, including nutritional changes [40, 41]. For this reason, we also profiled H3K9me3 in mothers to uncover relationships between maternal and child heterochromatin in stunting. The results suggest that globally elevated H3K9me3 in early infancy predisposes children to poor growth, possibly through suppression of TEs involved in development of immune responses to pathogens. The H3K9me3 pattern in early infancy is thus a predictor of the risk of stunting and supports a model in which the immune system of stunted children becomes compromised in early infancy.
To identify genome-wide H3K9me3 changes associated with childhood stunting, we performed ChIP-seq targeting H3K9me3 in peripheral blood mononuclear cells (PBMCs) from 18-week-old infants (n = 15) from Bangladesh who were enrolled in the PROVIDE (performance of rotavirus and oral polio vaccines in developing countries) study (n = 700, ref. [42], Fig. 1a, Additional file 1: Table S1). While the blood samples come from 18-week-old infants, for each child we have anthropometric measurements including height-for-age z (HAZ) and weight-for-age z (WAZ) scores within the first year of the child’s life with additional biomarkers from 18 weeks (Fig. 1b, Additional file 1: Table S2). Each HAZ score provides a snapshot of a child’s current health status, which can change dynamically within the first year of life (Fig. 1c). In our analysis, we used the ΔHAZ score as a measure of growth change between two time points (Fig. 1d). Samples selected for this study reliably represent the PROVIDE spectrum of HAZ and ΔHAZ scores and are balanced in sex representation (Additional file 1: Fig. S1,2).
The average H3K9me3 signal within identified H3K9me3 enriched regions, i.e., peaks, increased in children with poor growth between birth and one year as measured by the ΔHAZ (1 year) score (Fig. 2a). The ΔHAZ (1 year) score showed the strongest correlation with the average H3K9me3 signal and H3K9me3 was negatively correlated with the ΔHAZ (1 year) score (Fig. 2b). Furthermore, the first principal component (PC1) of principal component analysis (PCA) explained almost 80% of the variance in H3K9me3 peak signal (Fig. 2c) and again was strongly correlated with ΔHAZ (1 year) score (Fig. 2d). While there were sex differences in the H3K9me3 profiles, these were associated with the second principal component (PC2) and explain only about 10% variance in the data.
Both the average H3K9me3 profiles and signal within H3K9me3 peaks used for PCA were normalized based on Drosophila spike-in content, which we used in our experimental design to uncover any global H3K9me3 profile changes (Additional file 1: Fig. S3a). While normalizing the H3K9me3 profiles based on read count per million mapped reads still showed a negative correlation between a child’s ΔHAZ (1 year) score and the overall H3K9me3 level, the pattern was much weaker and could have been easily missed without spike-in normalization (Additional file 1: Fig. S3b,c). Furthermore, we show that increased proportional content of Drosophila spike-in in the sequencing files, which suggests a genome-wide loss of H3K9me3 (Additional file 1: Fig. S3d), was significantly associated with ΔHAZ (1 year) score (Fig. 2e, Additional file 1: Fig. S3e,f). Altogether these results suggest that infants undergoing poor growth between birth and one year of age have globally elevated H3K9me3 levels compared to healthier infants.
To gain insight into the roles of H3K9me3 regions in stunting, we performed differential analysis in which we asked how H3K9me3 levels changed at each peak with respect to ΔHAZ (1 year) score while controlling for sex. The vast majority of the H3K9me3 regions were significantly upregulated in children with negative changes in HAZ scores during the first year (Fig. 3a-c, Additional file 2: Table S3), as expected given the significant global upregulation of discussed above. For functional analysis, we focused on the top 10% of the most significantly affected regions. These regions were not highly specific for a particular cell type (Additional file 1: Fig. S4a), and compared to the H3K9me3 regions overall, they are closer to genes (Fig. 3d), including being notably more enriched in intronic regions (Fig. 3e). The enrichment in intronic over intergenic regions is specific for the top 10% of the significant regions and the enrichment declines rapidly with further ranking (Additional file 1: Fig. S4b,c). We next asked if these regions are enriched with co-localizing histone marks and DNA-binding factors (DBFs) (Additional file 1: Fig. S4d). As expected, we found the highest enrichment was with the known sets of H3K9me3 regions, but also we found significant enrichment in enhancer regions marked by H3K27ac or H3K4me1 (histone H3 monomethylation on lysine 4) [43], and in H3K36me3 (histone H3 trimethylation on lysine 36). Colocalization of H3K9me3 with H3K36me3 has been shown to be crucial for repressing enhancer activity [44]. Interestingly, we further found significant enrichment in binding sites for the pioneer transcription factor PU.1 (SPI1), a factor crucial for hematopoietic cell fate control [45, 46], suggesting that in stunting, increased H3K9me3 levels at these sites prevent proper immune cell development through restricting SPI1/PU.1 binding.
The assignment of these top H3K9me3 regions to their gene targets revealed genes that are involved in immune system processes, tissue development, nitrogen and phosphate metabolism and stress responses (Additional file 1: Fig. S5a). Importantly, mouse studies identified many of these gene targets to be involved in processes tightly associated with stunting (Additional file 1: Fig. S5b), such as altered interferon gamma (IFN-γ) levels [47], abnormal enterocyte proliferation associated with environmental enteric dysfunction (EED) [4, 17], or neurological pathways, such as altered post-tetanic potentiation, which potentially impacts neurocognitive development [5, 6].
The proximity of these H3K9me3 regions to genes prompted us to explore the H3K9me3 patterns within genes. Two distinct clusters of genes were identified (Additional file 1: Fig. S6a). Cluster 1 contains a smaller subset of genes (n = 1,258) whose coverage changed dynamically with ΔHAZ (1 year) score (Fig. 3f), while the genes in cluster 2 had generally very low H3K9me3 coverage. The cluster 1 genes were enriched in transcription regulation, as well as nucleotide biosynthesis and metabolism, protein deubiquitination, and sensory development (Fig. 3g, Additional file 1: Fig. S6b). The most significantly enriched pathway, however, was Herpes simplex virus 1 infection. We explored the distribution of the genes of interest within this pathway and surprisingly found that 243 out of the 244 genes of interest are genes coding for zinc-finger antiviral proteins (ZAPs) that bind GC-dinucleotide enriched viral mRNA sequences and target them for degradation [47]. This suggests that one of the mechanisms through which H3K9me3 predisposes children to stunting is through paralyzing immune responses to viral invasion.
Since H3K9me3 is well known to localize to and suppress repeated sequences and various classes of transposable elements [48], we next explored the endogenous retroviruses (ERVs) and other classes of transposable elements (TEs) that overlapped with all significantly affected H3K9me3 regions (Additional file 1: Fig. S7a), as well as the top 10% of significant regions (Fig. 4a). In both sets, we identified similar overrepresented classes of TEs, with higher overrepresentation in the top 10% significant regions. We therefore focused on delineating the possible functional relevance of the highly overrepresented TE classes within the top 10% significant regions, specifically the ERVs, long interspersed nuclear elements-L1 (LINE L1), SINE-R-VNTR-Alu (SVA) retrotransposons, and telomeric satellite DNA. Overall, we observed highly significant overlap of ERVs (Fig. 4b) and LINE L1 (Additional file 1: Fig. S7b) with regulatory regions for genes implicated in immune response pathways, including cytokine production and immune cell activation, as well as in DNA damage response pathways, catabolic pathways, and cation homeostasis pathways. Interestingly, we also noted an enrichment of H3K9me3 pathways overlapping SVA retrotransposons implicated in nutrient sensing pathways (Additional file 1: Fig. S7c), however, with much lower significance compared to the enrichment of pathways relevant to ERVs and LINE L1. Telomeric satellite DNA associated with H3K9me3 was also modestly associated with genes implicated in adhesion (Additional file 1: Fig. S7d).
The results from these analyses suggest that high H3K9me3 levels at these gene regulatory sites suppressed immune responses. We previously reported increased H3K27ac targeting genes associated with viral infection [36]. We therefore tested whether ERVs within the top significant H3K9me3 regions share common gene targets with significantly upregulated H3K27ac regions (Fig. 4c). The majority of gene targets were specific to ERVs or H3K27ac, except for 6 genes that, however, did not show any significant pathway enrichment (Fig. 4d). Expanding the analysis to ERV regions overlapping all significant H3K9me3 regions increased the number or overlapping genes to 61, but even these were not associated with any specific pathways.
To determine the relationship between maternal and child’s H3K9me3 profiles, we performed H3K9me3 ChIP-seq on PBMC samples from mothers (n = 14, Fig. 5a) enrolled in the same PROVIDE study. Through differential analysis, we first explored which H3K9me3 regions were misregulated in relationship to the child’s ΔHAZ (1 year) score while controlling for the child’s sex (Additional file 1: Fig. S8a, Additional file 3: Table S4). While we did not observe any significant relationship between maternal H3K9me3 and a child’s ΔHAZ (1 year) score, or a strong correlation between the differential analysis results from children and mothers (Fig. 5b), we noticed that the majority of the maternal H3K9me3 regions were upregulated in mothers whose children had low ΔHAZ (1 year) scores, partially capturing the global pattern observed in 18-week-old infants (Fig. 5c, Additional file 1: Fig. S8B). Further exploration of global relationships between H3K9me3 profiles of mothers and children through correlation analysis of H3K9me3 profiles resulted in identification of two distinct clusters separating mothers from infants (Fig. 5d) suggesting that H3K9me3 carries strong age-related signatures that are possibly masking weaker signatures of stunting. Interestingly, PCA separated maternal and infant samples with PC2, while PC1 still captured 81% of the variance in the data and preserved the strong relationship of the infant H3K9me3 profiles to ΔHAZ (1 year) score (Additional file 1: Fig. S8c,d). The separation of maternal H3K9me3 profiles along PC1 did not reveal a strong correlation with any of the biomarkers (Additional file 1: Fig. S8d).
To determine if subtler patterns in the maternal H3K9me3 profiles are potentially contributing to stunting, we tested whether there were regions shared only between mothers of stunted children and stunted children, here termed shared in stunting (SiS). We also tested whether likewise there were maternal H3K9me3 regions in healthy mothers that were identified in healthy children (SiH) (Fig. 5e). We identified 110 SiS and 97 SiH H3K9me3 regions (Additional file 1: Fig. S9a) that were not cell-type specific (Additional file 1: Fig. S9b) and did not have a distinct signature in terms of distribution across the genome (Additional file 1: Fig. S9c,d). However, functional annotation revealed distinct set of pathways specific for each region set. SiS regions were associated with genes involved in multiple developmental processes, lipid binding, and sugar metabolism, specifically fructose and mannose metabolism, and pentose and glucuronate interconversions (Fig. 5f). In contrast, SiH regions were associated with pathways involved in signaling, regulation, development, growth factor regulation, and endothelial cell migration (Additional file 1: Fig. S9e).
To gain more comprehensive insight into epigenetic changes associated with stunting, we combined H3K9me3 results from this study with previously published results describing H3K4me3 [35] and H3K27ac [36] changes associated with stunting in 18-week-old and one-year-old children from the same cohort (Additional file 1: Fig. S10a-d). First, we identified overlaps between H3K9me3 regions and H3K4me3 or H3K27ac regions in both age groups (Fig. 6a). While we observed a decrease in the overlap of H3K9me3 and H3K27ac peaks between 18 weeks and one year, there was an intriguing increase in the overlaps between H3K9me3 and H3K4me3 regions during infancy, especially in the H3K4me3 regions that were upregulated in stunted children. By further narrowing our focus on significantly affected H3K4me3 and H3K27ac regions in stunting, we found that in 18-week-old children, these regions are almost exclusively distinct from the H3K9me3 regions, compared to the significantly affected H3K4me3 and H3K27ac regions in one-year-old children that tended to colocalize with H3K9me3 signal in 18-week-old children (Fig. 6b, Additional file 1: Fig. S10e-h). Notably, there was an overlap of almost 25% of the small upregulated H3K4me3 regions that were previously suggested to be redistributed from the larger H3K4me3 regions at TSSs [35], and nearly 30% of the H3K27ac regions downregulated in one-year-old stunted children overlapped with H3K9me3 peaks. The downregulated H3K27ac regions in one-year-old stunted children did not have a very specific location relatively to the overlapping H3K9me3 regions, while the upregulated H3K4me3 regions in one-year-all children were predominantly consumed by H3K9me3 regions (Fig. 6c,d, see Additional file 1: Fig. S11a,b for overlaps with all H3K4me3 and H3K27ac regions). These observations suggest a possible mechanism in which the methylation machinery compensates for the high levels of H3K9me3 observed in 18-week-old infants through increased H3K4me3 levels at one year of age.
Functional annotation of overlaps with H3K4me3 regions upregulated in stunted children at one year suggests that these are involved in regulating transcription, cell adhesion and developmental processes (Additional file 1: Fig. S12a), and there was also a relatively weak colocalization of these regions with H3K36me3 (Additional file 1: Fig. S12b). In contrast, the overlaps with H3K27ac peaks that were downregulated in stunting at one year showed enrichment only in cell activation pathways (Additional file 1: Fig. S12c), in addition to strong colocalization with both activating and repressing histone marks (Additional file 1: Fig. S12d), as well as with a broad range of DBFs (Additional file 1: Fig. S12e).
This study was motivated by two goals. The first goal was to uncover epigenetic changes in early infancy associated with the development of stunting. This goal arose from our previously published results in which we observed large-scale stunting-associated changes to H3K4me3 [35] and H3K27ac [36] in one-year-old children suggesting immune system exhaustion and alterations to one carbon-metabolism. However, in those studies we saw no significant changes to H3K4me3 in 18-week-old infants and only relatively smaller changes to H3K27ac which suggested activation of stress responses and genes involved in viral infection. We hypothesized that if there were epigenetic changes in early infancy predisposing children to become stunted, we would need to investigate a more stable mark associated with development, such as H3K9me3 [49]. The second goal was to determine if there were relationships between maternal and child H3K9me3 profiles that could shed light on the propagation of stunting, including transgenerational inheritance, which contributes to the development of stunting via H3K9me3 and other factors in animal models [40].
The results presented here along with prior work on H3K4me3 and H3K27ac are summarized in Fig. 7. The major finding here is that globally elevated H3K9me3 levels in 18-week-old infants are associated with poor growth between birth and one year of age as indicated by the ΔHAZ (1 year) score (Fig. 2). Notably, H3K9me3 levels show much weaker association with a child’s current health status in terms of the HAZ score at 18 weeks (Fig. 2), suggesting that altered H3K9me3 levels do not reflect immediate changes in a child’s environment, but instead play role in programming developmental pathways over time. Overall, these results suggest the possibility that high H3K9me3 levels could be used as a biomarker for poor growth before the overt manifestation of stunting.
Through the detailed analyses presented here we explored molecular mechanisms through which elevated H3K9me3 levels might predispose children to poor growth. Genes residing
Top panel shows that globally increased H3K9me3 levels at 18 weeks of age are associated with poor growth through model illustrated in the bottom panel. In this model, increased H3K9me3 levels restrict expression of ZAP genes and regulatory activity of key TEs resulting in immature immune responses, and thus high pathogen load and overall stress in children destined to be stunted. These changes then trigger increased stress responses and activation of pathways involved in viral infection as indicated by elevated H3K27ac. As an adaptation to a long-term stresses, H3K4me3 and H3K27ac undergo large-scale changes, as described previously [35, 36], which affect the highlighted pathways by one year of age. A significant proportion of H3K4me3 and H3K27ac changes observed in one-year-old children fall into misregulated H3K9me3 regions identified in 18-week-old children.
within misregulated H3K9me3 regions were involved in Herpes simplex virus 1 infection pathway (Fig. 3f,g). These include a large number of ZAP genes which encode proteins that target viral mRNA for degradation [47]. Thus, there their repression resulting from elevated H3K9me3 potentially predisposes children to be more vulnerable to viral infection. TEs within the top significant H3K9me3 regions may also contribute to H3K9me3 regulatory mechanisms. While the functional role of TEs is largely unexplored, it has been thought that TEs, such as ERVs, are marked by H3K9me3 to ensure they remain transcriptionally silent. Recent studies also show that TEs can act as binding sites for transcription factors (TFs) [50] and furthermore that they have roles in development of innate immune responses [37]. Here we show that TEs residing within the top significant H3K9me3 regions are in proximity to genes involved in immune system activation, where we observed an especially high enrichment in cytokine production pathways, as well as DNA damage pathways, catabolic pathways, and other pathways with less significant enrichment (Fig. 4b, Fig S7b-d). It has been recently shown that non-repressed ERVs can create competition for TF binding, and in addition, decreased heterochromatin levels at ERVs were associated with overall decreased H3K27ac in other genomic regions [39]. This model fits well with our data in which increased H3K9me3 at ERV sites and increased H3K27ac levels at other sites in the genome are suggesting a deleterious masking of ERV sites. Taken together, these results suggest a model (Fig. 7) in which increased H3K9me3 levels at key TEs and ZAP genes paralyze activation of immune system responses in early infancy, leading to an increased pathogen load and overall stress in children destined to become stunted. This is consistent with the apparent activation of genes associated with viral infection and stress response pathways through elevated H3K27ac.
Our results show that high levels of H3K9me3 in 18-week-old infants are associated with poor ΔHAZ (1 year) growth score and potentially hamper cytokine production. A recent study performed using samples from the same cohort shows that by one year of age stunting is inversely associated with high cytokine production, specifically of interleukin 8 (IL8) and transforming growth factor-β (TGFβ) [51]. The combined results are consistent with the emerging understanding that changes to immune system function are highly dynamic during early development [51], and further suggest that the development of a healthy immune system may involve precise timing in the developmental capacities of specific immune responses in response to pathogen burden.
We also discovered that the top differentially affected H3K9me3 regions are enriched in SPI1/PU.1 TF binding sites (Additional file 1: Fig. S4d). SPI1/PU.1 is a TF crucial for immune cell differentiation [45, 46] and for this reason we posit that H3K9me3-mediated repression of these sites leads to immune cell immaturity in early infancy. The gene targets of the top differentially affected H3K9me3 regions were further identified in pathways associated with altered circulating IFN-γ levels. Interestingly, these have been previously associated with stunting and susceptibility to amebiasis [47].
Integrated analysis with H3K4me3 [35] and H3K27ac [36] results from children from the same cohort further showed that, as expected, there was not much of overlap between the H3K9me3 regions and the activating H3K4me3 and H3K27ac marks at 18 weeks of age. However, a relatively high proportion of significantly misregulated H3K4me3 and H3K27ac regions in one-year-old stunted children overlap with the upregulated H3K9me3 regions at 18-weeks (Fig. 6b). About 25% of distal ectopic H3K4me3 regions occur within H3K9me3 regions and almost 30% of the upregulated H3K27ac regions overlap with H3K9me3 regions. The misregulated regions were previously described to regulate processes highlighted in Fig. 7. These findings suggest that elevated H3K9me3 levels in 18-week-old children preset a detrimental chromatin environment and that predisposes stunted children to mislocalization of H3K4me3 and misregulation of H3K27ac by one year of age. The H3K4me3 pattern is perhaps indicative of a compensating mechanism for previous over-suppression of these important regulatory regions.
We did not identify any significant H3K9me3 regions in mothers with a strong relationship to the child’s ΔHAZ (1 year) score. However, the overall directionality of the H3K9me3 changes was in concordance with changes observed in the 18-week-old infants (Fig. 5b,c, Additional file 1: Fig. S8b). This suggests that there may be a relationship between the maternal and child profiles, but one that would probably require a much larger sample size to detect. Additional insight would likely come from analysis of paternal data, however, at the moment we are limited in both the number and types of samples available from this highly vulnerable population.
Analysis of the heterochromatin-associated histone mark H3K9me3 in PBMCs of 18-week-old infants showed that globally elevated H3K9me3 levels are associated with poor growth between birth and one year. Importantly, these changes were detectable before the overt appearance of the stunted phenotype and therefore present an opportunity for future work aimed at understanding mechanisms that may be manipulated to avert stunting as well as development of new biomarkers to detect it early. Moreover, these results contribute to the understanding of how immune system dysfunction develops in stunted children.
Deidentified PBMC samples from 18-week-old children (n = 15) and mothers (n = 14) enrolled in the PROVIDE study (n = 700) [42] were obtained in collaboration with icddr,b (International Centre for Diarrhoeal Disease Research, Bangladesh) in Dhaka, Bangladesh. The number of samples was established based on previous work [35, 36], and samples were selected to represent the PROVIDE spectrum of HAZ and ΔHAZ scores, as well as to be balanced in sex representation (Additional file 1: Fig. S1,2). ChIP-seq was performed as previously described in detail in [35]. Briefly, PBMC samples were fixed with formaldehyde, chromatin isolated and sheared, 0.02% (microgram/microgram) Drosophila melanogaster chromatin (cat #53083, Active Motif, Carlsbad, CA, USA) was added for spike-in normalization [52]. H3K9me3 antibodies (12.5 µl per 100 µg chromatin protein solution, Abcam Cat# AB8898, RRID: AB_30684) were used for DNA fragment selection by immunoprecipitation. Sequencing libraries were constructed using the Illumina TruSeq ChIP Library Preparation Kit following the manufacturer instructions. Libraries were sequenced using an Illumina NextSeq500 instrument with high-capacity cartridge in the University of Virginia Genome Analysis and Technology Core, RRID:SCR_018883, yielding between 52 million to 142 million 150 bp single-end reads per sample (Additional file 1: Table S1).
Bowtie2 (2.2.6) [53] with default settings was used to map raw FASTQ files to the hg19 reference genome obtained with refgenie (0.9.3) [54]. SAMtools (0.1.19-44428cd) [55] were used to convert alignment SAM files to BAM format (function: view -S -b), sort (function: sort), and index (function: index) BAM files, report number of mapped (function: view -c -F 4) and unmapped (function: view -c -f 4) reads and to filter out unmapped reads (function: view -h -F 4 -b). ENCODE blacklisted sites [56] were removed with bedtools (v2.26.0) [57] intersect function with argument -v. The quality of both raw FASTQ files and final BAM files was checked with FastQC (v0.11.5) [58] together with multiQC (1.11) [59]. To further visually inspect the quality of the data, BAM alignment files were converted to bigWig files (initially without normalizing) and browsed with Integrative Genomics Viewer (IGV_2.7.2) [60]. To generate bigWig files, BAM files were first converted to BED files with coverage information using bedtools bamtobed, these files were then converted to COV files using bedtools genomecov, and finally bigWig files were generated from the COV files with UCSC bedGraphToBigWig (v4) [61]. Chromosome sizes for bedtools genomecov and bedGraphToBigWig were obtained with refgenie (0.9.3) [54].
Regions of enrichment, i.e., peaks, were generated from BED coverage files using SICER (v1.1) [62] against input (separate input files for children and mothers) with redundancy threshold = 1, window size = 1000, fragment size = 425, effective genome fraction = 0.74, gap size = 5000, and FDR = 0.01.
Count tables with the number of mapped reads for each identified region were created for children and mothers separately. Identified peaks were first merged (mothers and children separately) with bedtools merge and quantification was done with bedtools multicov using BAM alignment files and merged regions.
Code is available from: https://github.com/AubleLab/H3K9me3_stunting.
Drosophila spike-in reads were used for normalization to account for possible global changes in H3K9me3 levels (Fig, S3a). To retrieve the number of spike-in reads within each sample, raw FASTQ files were mapped to the Drosophila dm6 reference genome (obtained with refgenie) using bowtie2 with default parameters. SAMtools (0.1.19-44428cd) [55] were used to convert alignment reads to BAM format (function: view -S -b), sort (function: sort), and index (function: index) BAM files, ENCODE blacklisted sites [56] were removed with bedtools (v2.26.0) [57] intersect function with argument -v, and finally SAMtools were used to report number of mapped reads (function: view -c -F 4). The final normalization factors were then determined by dividing the number of reads that mapped to dm6 by an arbitrary constant (500,000). These normalization factors were then used in differential peak analysis with DESeq2 (1.30.1) [63]. Inverted values of the normalization factors (1/NF) were used to generate normalized bigWig files, specifically bedtools genomecov function with argument -scale set to the inverted normalization factor was used to create normalized bedGraph files from BAM alignment files, which were further converted to normalized bigWig files with UCSC wigToBigWig tool (v4) [61].
Correlations between the percentage of reads mapped to the dm6 reference genome (indicative of globally misregulated H3K9me3 levels, see Additional file 1: Fig. S3d) with all available biomarkers/measurements were calculated using R cor function with method = “spearman”.
The relationship between the percentage of reads mapped to dm6 and a child’s ΔHAZ (1 year) score was determined by fitting a linear model with the R lm function, where the p-value < 0.05 was considered significant.
DESeq2 (1.30.1) [63] was used to identify differentially regulated H3K9me3 regions in both child and maternal datasets. Raw count tables along with drosophila normalization factors were passed to DESeq2 with a formula set to child’s sex + child’s ΔHAZ (1 year) score. Regions with FDR corrected p-values (padj) less than 0.05 were identified as significantly different. Other combinations of formulas combining a child’s sex with other available biomarkers were tested (data not shown), however, no significant regions were identified.
PCA was performed using prcomp R function applied to spike-in- and regularized logarithm- (DESeq2 (1.30.1) [63] rlog function) normalized count tables. Counts from all available regions were used for PCA. Results were plotted using ggplot2 (3.3.6) [64] and points colored based on child’s ΔHAZ (1 year) score with a palette from viridis package (0.6.2) [65]. PCA performed on both children’s and maternal data together was applied on spike-in-normalized counts over merged (bedtools merge function) children’s and maternal H3K9me3 regions. Correlations between PC1 with all available biomarkers/measurements were calculated using R cor function with method = “spearman”.
Average spike-in-normalized H3K9me3 profiles were obtained by first using the deepTools (3.3.1) [66] computeMatrix function to quantify spike-in-normalized bigWig files over H3K9me3 regions identified in children’s datasets. The computeMatrix parameters were set to scale-regions mode with -a 2000, and -b 2000 parameters to expand average profiles by additional ± 2 kb surrounding the H3K9me3 regions. From computeMatrix output, the values for average profiles were obtained with the deepTools plotProfile function with the outFileNameData parameter included, to get the actual average profile values in addition to the figure output. The resulting average profile values were then plotted using ggplot2 (3.3.6) [64], where smoothing of the profiles was done with geom_smooth function with parameters set to method = "loess", span = 0.2 and colored based on child’s ΔHAZ (1 year) score using viridis package (0.6.2) [65].
Average profiles using read count per million mapped reads normalization were obtained by mapping BAM alignment files onto the H3K9me3 regions with ngs.plot (2.61) [67] with default settings, which automatically returns average profiles normalized to read count per million mapped reads. These profiles were then replotted with ggplot2 to include coloring based on a child’s ΔHAZ (1 year) score with viridis package.
Correlations between the centers of average profiles with all available biomarkers/measurements were calculated using the R cor function with method = “spearman”.
The top 10% of H3K9me3 regions (based on FDR-corrected p-value, followed by p-value) that were identified in children’s datasets as differentially misregulated versus child’s ΔHAZ (1 year) score were compared with all of the H3K9me3 regions identified in the children’s datasets using GenomicDistributions (1.5.5) [68]. CalcFeatureDistRefTSS function with default parameters was used to compare the distances of the regions to the nearest TSSs (transcription start sites), and the distances were plotted with plotFeatureDist function with parameters featureName="TSS", size = 1000000, infBins = T, nbins = 300. CalcPartitions function was used to identify overlaps with genomic classes. The genomic classes used as an input to the function were the default classes provided by the GenomicDistributions package with PBMC enhancer regions added from EnhancerAtlas 2 [69]. To create a list of PBMC enhancer regions, enhancer regions from relevant cell types were downloaded from EnhancerAtlas 2, namely CD4+, CD8+, CD14+, CD19+, CD20+, GM10847, GM12878, GM12891, GM12892, GM18505, GM18526, GM18951, GM19099, GM19193, GM19238, GM19239, GM19240, and PBMC cells. Finally, the regions were merged using the bedtools (v2.26.0) [57] merge function. The parameters to the calcPartitions function were set to bpProportion = T, and the overlaps were plotted with plotPartitions function with default parameters. The cell-type specificity of the regions was determined by using calcSummarySignal function with provided H3K9me3 cell-type specific signal matrix (see “Normalized cell-type specific H3K9me3 signal matrix” section of Methods) and plotted with plotSummarySignal function with parameters plotType = "violinPlot", metadata = cellTypeMetadata, colorColumn = "tissueOfOrigin". A modified version of calcCumulativePartitions function was used to calculate cumulative distribution of the overlaps with genomic classes. The original function sorts regions based on their size; the modified function accepts presorted regions. The genomic classes used were identical with the ones used in calcPartitions function. The log2(fold-change) of cumulative distribution values of intergenic / intronic regions were calculated and smoothed for plotting using R loess function with parameter span = 0.15.
Histone mark and DBF (DNA-binding factor) enrichment of the top 10% significantly misregulated regions in children was calculated using LOLA (1.20.0) [70] against the CISTROME [71, 72] database of histone marks and transcription factors (here referred to as DBFs) filtered for blood cell types. The userUniverse parameter in LOLA was set to all defined children’s H3K9me3 regions, otherwise all default parameters were used. Significant enrichments were defined as those with q-values < 0.05. The most significant cell-type/antibody combinations were plotted in form of a dot plot, where dot sizes and dot colors reflect the significance of the enrichment for a given cell-type/antibody combination. Note that CISTROME provides only regions for the hg38 genome assembly, therefore all H3K9me3 regions used here were first converted to hg38 using UCSC liftOver tool [61].
Gene set enrichment of the top 10% significantly misregulated regions in children was done with GREAT [73]. Results were then reported as bar plots. Due to a larger number of identified GO:BP (Gene Ontology: Biological Process) terms, those were not reported in the bar plot and instead were first processed with REVIGO [74] with default parameters. The resulting network was further reorganized in Cytoscape (3.8.0) [75] and terms were manually classified into groups.
A matrix containing normalized H3K9me3 values across different cell types was used to infer cell-type specificity of regions of interest with GenomicDistributions (1.5.5) [68] calcSummarySignal function. To create the H3K9me3 cell-specific matrix, ENCODE [76] data was used. First, BAM alignment files from all available H3K9me3 ChIP-seq experiments on primary cells were used (ENCODE filters: Assay title – Histone ChIP-seq / Target of assay – H3K9me3 / Organism – Homo sapiens / Biosample classification – primary cell / Genome assembly – hg19 / Available file types – BAM). The downloaded BAM files were first sorted and indexed with SAMtools (0.1.19-44428cd) [55] sort and index functions, respectively. To define H3K9me3 regions across genome, BED files were also obtained from ENCODE (ENCODE filters: Assay title – Histone ChIP-seq / Target of assay – H3K9me3 / Organism – Homo sapiens / Biosample classification – primary cell / Genome assembly – hg19 / Available file types – BED narrowPeak). BED files were sorted (sort -k1,1 -k2,2n) and merged with bedtools (v2.26.0) [57] merge function. Raw signal values were quantified by using bedtools multicov function on H3K9me3 processed BAM files with merged H3K9me3 BED file. The resulting raw count tables were further processed in R. First, samples from the same cell-type origin were merged by summing the cell-type replicate signal values, followed by normalization of the signal values for individual cell types. The normalization was performed in three steps: i) signal values above the 99th percentile for a given cell type were set to 1; ii) signal values bellow the 99th percentile for a given cell type were normalized to range between 0 and 1 using the following formula: \({n}_{i}=\frac{{x}_{i}-min\left(x\right)}{max\left(x\right)-min\left(x\right)}\), where ni is the normalized signal value for region i, xi is the raw signal value for a region i, and x is a vector of raw signal values for a given cell type; and iii) signal values ranging between 0 and 1 within a given cell type were further quantile-normalized with normalize.quantiles function from preprocessCore package (1.52.1) [77].
Heatmaps with spike-in-normalized coverage over genes were generated using deepTools (3.3.1) [66] computeMatrix function with scale-regions mode, and parameters --sortRegions descend, -b 2000, -a 2000. Mapped were spike-in-normalized bigWig files onto a BED file containing hg19 coordinates of protein-coding genes along with information about strandness. The gene coordinates were obtained with EnsDb.Hsapiens.v75 (2.99.0) [78], ensembldb (2.14.1) [79], and AnnotationFilter (1.14.0) [80] packages, where ensembldb genes function was applied to EnsDb.Hsapiens.v75 object with AnnotationFilter set up to ~ gene_biotype == "protein_coding". Only standard chromosomes were kept using GenomeInfoDb (1.26.7) [81] keepStandardChromosomes function. The output from the computeMatrix function was then plotted as a heatmap with deepTools plotHeatmap function with parameters -- kmeans 2 --whatToShow 'heatmap and colorbar'.
Genes reported in cluster 1 were then passed to g:Profiler [82] for gene set enrichment analysis. Only KEGG, Reactome, WikiPathways and GO:BP terms were searched, as suggested by Reimand et al [83]. Significant terms (padj < 0.05) were plotted as bar plots showing -log10(padj) values, except for GO:BP terms (due to a larger number of identified GO:BP terms), which were first processed by REVIGO [74] with default parameters, resulting network was further reorganized in Cytoscape (3.8.0) [75] and terms were manually classified into groups.
A library of TEs (transposable elements) was obtained from RepeatMasker [84] library 20140131 for the hg19 genome assembly. Genomic coordinates for individual TE classes were extracted into BED files using R. Unambiguous classes such as RNA, DNA, or classes with a question mark in them were removed from further analysis. Overlaps with H3K9me3 regions were calculated using GenomicDistributions (1.5.5) [68] calcExpectedPartitions function with arguments bpProportion = T, and genomeSize set to sum of hg19 chromosome sizes, as instructed in the GenomicDistributions manual. Overlaps were then plotted with the plotExpectedPartitions function with added ggplot2 layer to show color intensity based on the overall size of overlap in number of base pairs.
To functionally annotate interesting TEs, intersections of all ERVs, LINE L1, SVA retrotransposons, and telomeric satellite DNAs with the top 10% significant, as well as all defined H3K9me3 regions, were obtained with the bedtools (v2.26.0) [57] intersect function. Intersections with the top 10% H3K9me3 regions were passed to GREAT [73], where intersections with all H3K9me3 regions were used as background. Maximum top 20 GO:BP terms, as displayed by GREAT, were then plotted with ggplot2 (3.3.6) [64] as bar plots showing negative log10 of FDR-corrected p-values.
Gene targets with significantly increased H3K27ac in 18-week-old children were obtained from Kupkova et al, 2018 [36] and compared to the gene targets of all ERVs within the top 10% significant H3K9me3 regions annotated by GREAT. Results were plotted as an upset plot using the upset function from the UpSetR package (1.4.0) [85]. Gene targets shared between ERVs and H3K27ac regions were visualized and functionally annotated with STRING [86].
PCA applied to both sets of data was performed as described in the PCA section of Methods.
Results from differential analysis (negative log2(fold-change) per unit change of child’s ΔHAZ score for both children and mothers) were plotted as box plots using ggplot2 (3.3.6) [64] and two-sided one-sample t-tests were used to determine any global unidirectional changes (H0: µlog2FC = 0, H1: µlog2FC ≠ 0, α = 0.05). To show direct comparisons of log2(fold-changes) between children and mothers, H3K9me3 regions from mothers were overlapped with H3K9me3 regions from children with the bedtools (v2.26.0) [57] intersect function with parameter -wao, which returns the original coordinates for overlapping regions. Log2(fold-changes) from the overlapping regions were then plotted as a scatter plot with ggplot2 and the Pearson’s correlation coefficient was calculated with the R cor function.
Correlation analysis between maternal and children’s samples was performed by first merging all maternal and child regions with bedtools merge followed by quantification of all maternal and child samples with bedtools multicov. A correlation matrix was then created by passing the count table to the R cor function with method = “pearson” and plotted as a heatmap using the Heatmap function from ComplexHeatmap (2.6.2) [87], where both rows and columns were clustered using average linkage.
To identify regions specific to mothers of stunted children / mothers of healthy children / stunted children / healthy children, all regions from all child and maternal datasets were first merged with the bedtools merge function to create a list of “master regions”. With the same function, regions of individuals belonging to a given category were merged, creating “category regions”. To compare the regions shared among or specific to categories, overlaps were found between “master regions” and individual “category regions” using bedtools intersect with the -wao parameter, which reports the original entries rather than the intersections. “Master regions” that were identified as overlapping with a given category were then passed to the upset function from UpSetR package (1.4.0) [85] to reveal the number of regions shared between a given combination of categories. Regions “shared in stunting” were defined as those “master regions” that had overlaps found with regions from mothers of stunted children and with regions from stunted children, but no other category, and analogically were defined regions “shared in health”. Summarization of the regions shared in health and shared in control individuals was performed using GenomicDistributions (1.5.5) [68], specifically, the calcFeatureDistRefTSS function was used to compare the distances to the nearest TSS. The log10-transformed distances were then plotted with ggplot2 as density plots, as these provided an easier visual comparison for a smaller number of regions compared to the default histograms offered by GenomicDistributions. Cell-type specificity inference and distribution across genomic classes was calculated as described in the “Functional annotation of children’s H3K9me3 regions” section of Methods.
Gene set enrichment analysis of regions shared in stunting and regions shared in health was done by first assigning the regions to genes with GREAT [73] followed by passing the genes to g:Profiler [82] for identification of enriched biological terms (sources of biological terms were filtered to GO:BP, GO:MF (Gene Ontology: Molecular Function), KEGG, WikiPathways, Reactome). Significant terms were those with adjusted p-value < 0.05. Results with < 10 significantly enriched terms were plotted as bar plots showing -log10(padj) values. Larger sets of terms were passed as GEM files produced by g:Profiler to the EnrichmentMap app [88] in Cytoscape (3.8.0) [75], where terms were organized into networks followed by manual organization of related terms into clusters.
Differential analysis results for H3K4me3 and H3K27ac data from 18-week-old and one-year-old children were obtained from Uchiyama et al [35] and Kupkova et al [36] respectively. These results show association of the H3K4me3 and H3K27ac peaks with a child’s ΔHAZ (18wk) score for 18-week-old-children and ΔHAZ (1 year) score for one-year-old children. Significantly affected regions were considered those with padj < 0.05. Regions were extracted from the result tables and converted to BED files. Overlaps with H3K9me3 regions from this study were found with the bedtools (v2.26.0) [57] intersect function with parameter -wao (to report original regions instead of just intersections). Proportions of overlaps were calculated as the ratio between an overlap and the size of a given region.
Gene set enrichment analysis of overlaps was done as described in the “Relationships between H3K9me3 in mothers and children” section above and histone mark / DBF enrichment analysis was performed as described in the “Functional annotation of children’s H3K9me3 regions” section above. The userUniverse in LOLA was set to overlaps between H3K9me3 and all upregulated H3K4me3 regions in one-year-old stunted children for H3K4me3 analysis, and to overlaps between H3K9me3 and all downregulated H3K27ac regions in one-year-old stunted children for H3K27ac analysis. For H3K27ac, only highly significant enrichments (padj < 0.001) were reported due to the large number of enriched histone marks and DBFs.
Tidyverse package (1.3.1) [89] was used to process data in R. Figures were assembled with Inkscape (1.0.2) [90], Illustrations were created with BioRender [91], and icons were obtained from the Noun Project [92].
ChIP-seq chromatin immunoprecipitation followed by sequencing
DBF DNA binding factor
EED environmental enteric dysfunction
ERVs endogenous retroviruses
GO:BP gene ontology: biological process
GO:MF gene ontology: molecular function
H3K4me1 histone H3 lysine 4 monomethylation
H3K4me3 histone H3 lysine 4 trimethylation
H3K9me3 histone H3 lysine 9 trimethylation
H3K27ac histone H3 lysine 27 acetylation
H3K36me3 histone H3 lysine 36 trimethylation
HAZ height-for-age z
IFN-γ interferon gamma
IL8 interleukin 8
LINE L1 long interspersed nuclear elements-L1
NF normalization factor
PBMCs peripheral blood mononuclear cells
PC1 first principal component
PC2 second principal component
PCA principal component analysis
PROVIDE performance of rotavirus and oral polio vaccines in developing countries (study)
PU.1 hematopoietic transcription factor PU.1 (SPI1)
SiH regions shared in health
SiS regions shared in stunting
SPI1 hematopoietic transcription factor PU.1
SVA SINE-R-VNTR-Alu
TEs transposable elements
TF transcription factor
TSS transcription start site
TGFβ transforming growth factor-β
WAZ weight-for-age z
ZAP zinc-finger antiviral protein
Ethics approval and consent to participate
The study was approved by the Ethical Review Board of icddr,b (FWA 00001468) and the Institutional Review Boards of the University of Virginia (FWA 00006183) and the University of Vermont (FWA 00000727). All samples were analyzed in a de-identified fashion. Within seven days after giving birth, screening for eligibility and study consenting occurred in the household by trained Field Research Assistants. Informed consent was obtained for all participants (trial registration: ClinicalTrials.gov NCT01375647).
Consent for publication
Not applicable.
Availability of data and materials
All raw sequencing FASTQ files generated in this study are available in dbGaP repository, phs001073.v3.p1, https://www.ncbi.nlm.nih.gov/gap/. DbGaP identifiers for the samples used in this study are listed in Additional file 1:Table S1-2.
Competing interests
The authors declare that they have no competing interests.
Funding
Funding for this work was provided by National Institutes of Health (Grant R01 AI043596 to W.A.P) and Biomedical Sciences Graduate Program, University of Virginia (Wagner fellowship to K.K.). The funders did not have any role in design of the study and collection, analysis, and interpretation of data and in writing of the manuscript.
Authors’ contributions
DA conceived of the study and supervised the analyses; SJS isolated chromatin and prepared libraries for sequencing; KK conducted the analyses; RH and WAP, Jr, conceived of and established the PROVIDE Study. All authors contributed to writing of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We are grateful to all mothers and children who participated in the PROVIDE study without who this work would not be possible. We are also grateful to Dr. Stefan Bekiranov for suggesting analysis of transposable elements and Dr. Laura Banaszynski to providing further motivation and valuable insights regarding transposable elements. This work was funded by National Institutes of Health (Grant R01 AI043596 to W.A.P); Biomedical Sciences Graduate Program, University of Virginia (Wagner fellowship to K.K.).