ATAC-Seq validation
To determine whether chromatin accessibility patterns differed between islet endocrine cell types, principal component analysis (PCA) was applied to peak calls across all samples. This confirmed that replicates clustered by cell type (Fig-1A), a finding that was further validated by heatmaps using all defined peaks across replicates (Fig-1B). Alongside quality control applied through the generation and analysis of this dataset, the fraction of reads in peaks (FRiP) score was in excess of the commonly applied benchmark of 30% (Fig-1C). Furthermore, the FRiP score was independent of variability in unique read depth, indicating that peak calls were reproducible across all replicates within cell types independent of read depth range.
Validation of islet cell chromatin accessibility data coupled to companion transcriptomes
We then checked for the presence of chromatin peak enrichment for established alpha, beta, and delta marker genes. We expected enriched ATAC signal at promoter-proximal regions near the transcription start site (TSS) as a reflection of chromatin accessibility. Indeed, cell type-specific chromatin accessibility correlated with gene expression of Ins2, Gcg, and Sst genes for beta, alpha, and delta cells, respectively (Fig-1D-F) [4, 6, 68]. After confirming chromatin accessibility in key cell-identity markers, we sought to compare and contrast select regions identified from prior groups that evaluated chromatin accessibility in human islets [5, 12, 38], as well as to further query whether chromatin was always uniquely enriched on a panel of cell type-specific genes across alpha (Arx, Ttr, Gc), beta (Ucn3, MafA, Pdx1), and delta cells (Pdx1, Hhex, Rbp4, Ghsr) (Fig-2; Fig-S2). Each of these genes demonstrated overall strong concordance between cell type-enriched gene expression and cell type-specific enrichment of available chromatin. This validated the utility of ATAC-Seq data to detect epigenetic determinants of gene expression.
The chromatin landscape of the annotated genome across cell types
As genes make up a small fraction of the entire genome, we determined the overall distribution of peaks across the annotated genome within each cell type. We defined five regions of interest to further explore – promoter-proximal, intronic, exonic, downstream, or distal-intergenic (Fig-3A). We identified a consensus set of 124,494 peaks marking open chromatin through the R package DiffBind. This number is comparable to the number of open regions found in previous studies of pancreatic islet chromatin accessibility [11, 38–40] (Dataset-S1). We then evaluated the distribution of called peaks present in at least one replicate within 3kb upstream of the TSS and confirmed that a majority of genes enriched in each islet cell type were accompanied by promoter-proximal peaks (Fig-3B-D). The distribution of ATAC-Seq peaks across different pre-defined genomic areas was overall similar across alpha, beta, and delta cells. For each endocrine cell type between 21.98–24.88% of open chromatin was promoter-proximal, whereas promoter-proximal areas account for 2.41% of the mouse genome. A further 34.92–38.33% of peaks for all cell types were found on distal-intergenic regions, which was proportional to the fraction of the genome that falls into this category (Fig-3E-G). Finally, we noted that between 33.07–33.65% of peaks occurred on intronic regions (first or other), relative to the 37.7% of the mouse genome classified as intronic [69].
Regional differences and characteristics of differentially enriched chromatin
As our overall distribution of ATAC-Seq peaks across different genomic regions was consistent across alpha, beta, and delta cells, we compared differential chromatin accessibility between these cell types in greater detail. To this end, we performed pairwise differential ATAC-Seq peak enrichment testing across alpha, beta, and delta cells. Out of 124,494 identified consensus regions of open chromatin across the three cell types, 18,409 (14.8%) differentially enriched peaks (p-value < = 0.05) were identified between alpha and beta (Fig-4A), 12,722 (10.2%) between alpha and delta (Fig-4B), and 16,913 (14.6%) between beta and delta cells (Fig-4C).
After performing differential peak enrichment testing, we discovered that 22.89% of all differentially enriched peaks between alpha and beta cells were promoter-proximal (0-3kb) (Fig-5A). A further 33.22% of differential peaks were linked to distal-intergenic regions and another 33.61% of differential peaks were intronic (first and other combined) (Fig-5A). This assessment of differential peaks without considering the direction of enrichment revealed no major difference with overall peak distribution described earlier (Fig-3). However, when factoring in the direction of enrichment we observed that 35.08% of alpha cell-enriched peaks was promoter-proximal. In contrast, only 12.5 % of bta cell enriched peaks occurred in promoter-proximal areas (Fig-5B). Instead, a majority of ATAC peaks enriched in beta cells were located at distal-intergenic regions (45.41%) (Fig-5B).
Between alpha and delta cells, we identified that 21.29% of differentially enriched peaks occurred promoter-proximally. Another 36.4% of peaks occurred on distal intergenic regions and 35.5% on intronic regions (Fig-5C). A similar preference of alpha cell-enriched peaks in promotor-proximal regions was evident when comparing alpha to delta cells, with 30.33% of all enriched alpha peaks occurring promoter-proximally, but only 9.56% of delta cell peaks. Instead, 38.41% of delta cell enriched peaks were distal-intergenic (Fig-5D).
Lastly, between beta and delta cells, 26.62% of all differentially enriched peaks were promoter-proximal, 32.2% distal intergenic, and 34.33% on intronic regions (Fig-5E). Further break down revealed a bias towards distal-intergenic enriched peaks within beta cells (42.86%), as opposed to promoter-proximal peaks in delta cells (28.20%) (Fig-5F).
Differential chromatin enrichment in the majority of cases correlates with gene expression
So far, we detected a disproportionate fraction of peaks associated with promoter-proximal regions in general, compared to the fraction of the genome that consists of this type of region (Fig-3). Moreover, ATAC-Seq peaks that were differentially enriched in alpha and - to a lesser extent - delta cells were considerably more likely to occur at promoter-proximal sites. Instead, peaks enriched in beta cells more likely occurred at distal intergenic regions (Fig-5). Therefore, we determined whether the enrichment of promoter-proximal peaks correlated with increased expression of the corresponding genes. Genes with increased expression in a cell type accompanied by a significantly enriched ATAC-Seq peak proximal to its TSS were considered ‘congruent’ genes (Fig-6A). The underlying mechanism in such a scenario might be the presence of transcriptional activators at the promoter-proximal site that promote gene expression. Conversely, genes with a significantly enriched ATAC-Seq peak proximal to its TSS accompanying a reduction in corresponding gene expression were considered ‘incongruent’ genes (Fig-6B). The underlying mechanism for these genes might be the presence of transcriptional repressors at the promoter that prevent gene expression (Dataset-S2) [70–73]. Finally, genes that had significantly enriched chromatin in either cell type, but no evidence of mRNA expression were considered ‘unexpressed’ (Fig-6C).
When we compared differentially enriched TSS-associated chromatin against corresponding gene expression between alpha and beta cells, we found that in the majority of cases (86%), differential chromatin enrichment on TSS regions successfully captured the epigenetics of gene regulation. Exactly, 50% of genes with differentially enriched chromatin at the TSS had a corresponding increase in gene expression within the same cell type (congruent genes; Fig-6D). 36% showed TSS chromatin accessibility enrichment, but with a reduction in gene expression for each cell (incongruent genes - either alpha repressed (33%), or beta repressed (3%)). Strikingly, a substantial majority of the incongruent genes in this comparison were alpha repressed. Finally, only 14% of all genes with differentially enriched TSS chromatin showed no expression in either cell type (unexpressed) (Fig-6D).
We highlight the beta-specific gene MafA as an example of an incongruent gene that is putatively alpha repressed or alpha cell poised (Fig-2E; Fig-S3A).
We observed a similar distribution between congruent (55%), incongruent (24%), and no expression genes (20%), between alpha and delta cells. We noted a more uniform distribution between alpha (14%) and delta (10%) repressed genes. (Fig-6E). Upon visualizing gene expression and chromatin accessibility, we confirmed congruent gene expression and TSS chromatin accessibility of key transcription factors known to regulate both alpha – MafB, Ttr, and Arx – and delta – Pdx1 and Hhex – cell fate (Fig-S3B).
For the pairwise comparison between beta and delta cells, we again found a similar fraction of congruent (57%), incongruent (32%), and no expression (11%) genes (Fig-6F). We noted a minor fraction of repressed genes with open chromatin in beta cells (1.5%), with the majority of repressed genes corresponding to delta cells (30.45%), similar to the pattern seen in alpha repressed genes between alpha and beta cells. Further visualization of select marker gene expression against chromatin accessibility showed generally good congruence between chromatin accessibility at the TSS and gene expression (Fig-S3C).
Poised genes are enriched in beta cells with a non-beta cell lineage history
Since most of the incongruent genes are assigned to alpha and delta cells in their respective comparisons with beta cells, we further interrogated whether these alpha- or delta- repressed genes could be poised beta cell genes. In order to do so, we incorporated transcriptome data from beta cells with an alpha- or delta- cell lineage history [67] – also from our companion RNA-Seq experiment. Unfortunately, there is no accompanying ATAC-Seq data for these cells. These cells, termed “transdifferentiated,” are beta cells (defined by the presence of Ins1-driven mCherry expression), but have either a Gcg- or Sst-Cre lineage label, reflective of a lineage history as an alpha or delta cell, respectively. We reasoned that if alpha- or delta- repressed genes are poised beta cell genes, we should expect to observe a stepwise transition in gene expression levels, showing little or no expression in either alpha or delta cells, to intermediate expression in alpha- or delta- transdifferentiated cells, and full expression in beta cells. We confirmed that the majority (83.6%) of alpha-repressed genes showed intermediate expression in the alpha-to-beta-transdifferentiated population, and the highest expression in beta cells (beta cell genes). A subset of genes (16.4%) showed the highest expression in the alpha-transdifferentiated population (Fig-7A). We observed a similar pattern between delta, delta-transdifferentiated, and beta cells; however, only half (50.18%) of delta repressed genes demonstrated an intermediate expression in the delta-to-beta-transdifferentiated population and the highest in beta (Fig-7B). The remainder of the genes showed the highest expression in delta-to-beta-transdifferentiated cells.
Differential meta-chromatin enrichment testing
Given that for a majority of genes, TSS-associated chromatin recapitulated the underlying regulation of gene expression, we inquired whether differentially enriched chromatin peaks were associated with genes concentrated in pathways or gene networks that would better reflect our understanding of the biology across these different islet endocrine cell types. Between alpha and beta cells, KEGG set pathway testing of differentially accessible chromatin identified pathways related to protein digestion and absorption and cell adhesion molecules unique to beta cells; Hippo, Wnt, and ubiquitin-mediated proteolysis unique to alpha cells; and MAPK, axon guidance, and cAMP pathways enriched within both (Fig-S4A-B). Upon comparing the differentially accessible chromatin between alpha and delta cells, adherens junctions were enriched in delta cells, while no pathways were enriched specifically in alpha cells. MAPK, axon guidance, and Ras signaling pathways showed general enrichment of associated peaks within both alpha and delta cells (Fig-S5A-B). Lastly, in beta and delta cells, a pairwise analysis of differentially accessible chromatin identified the Glycosaminoglycan (GAG) biosynthesis pathway as unique to beta cells - where GAG metabolism and biosynthesis impairment has been linked to beta cell dysregulation [74], adherens junctions and Rap1 signaling pathways unique to delta cells, and MAPK, axon guidance, and cAMP signaling pathways enriched within both, as well as identified between alpha and beta cells (Fig-S6A-B).
Islet transcription factor ChIP-Seq binding correlates with open chromatin
After exploring the interrelationship between accessible chromatin and gene expression, we expanded our approach to include additional controls to the regulation of islet cell gene expression. We therefore aggregated high-quality, mouse pancreatic islet transcription factor binding data via ChIP-Seq - Pdx1 [75], Nkx6-1 [76], Neurod1 [77], Insm1 [77], Foxa2 [77], Nkx2-2 [78], Rfx6 [79], MafA [24], Isl1 [80], Kat2b [81], Ldb1 [80], and Gata6 [82] - and asked what fraction of open chromatin – as defined by our consensus ATAC-Seq peak set – contains binding sites for each respective transcription factor. Of all open chromatin sites, individual transcription factors occupied a significant percent with Foxa2 (29.07%), Insm1 (28.40%), and Neurod1 (20.09%) showing the highest percentage of ChIP-confirmed binding. This provided further support that our ATAC-Seq data was of high quality and suggested that open chromatin is a reliable indicator of epigenetic regulation (Table-S3A). To further explore whether these aggregated transcription factor ChIP data convey epigenetic relevance, we queried what fraction of total ChIP binding sites overlapped with open chromatin. Indeed, we observed that in several cases, over 50% of transcription factor ChIP binding sites overlapped with open chromatin, with the transcription factors Nkx2.2 (63.79%), Neurod1 (55.07%), and Insm1 (51.49%) showing the greatest degree of overlap (Table-S3B). These results supported our findings that open chromatin reflects epigenetic regulation in alpha, beta, and delta cells, in part through the binding of transcription factors.
Transcription factor motif finding suggests genomic preferences at differentially enriched chromatin between cell types
After observing a strong degree of overlap of known islet transcription factor binding on open chromatin, we conducted an unbiased evaluation whether DNA motifs for their respective transcription factor proteins were differentially enriched on ATAC peaks in promoter, intronic, exonic, downstream, or distal regions. We included only transcription factors with known DNA-binding motifs to determine if they were more likely to occur at specific areas of the genome. We required that the transcription factor associated with the DNA sequence motif considered is expressed (RPKM > 0) in the cell type with chromatin-motif association.
Motifs for key transcription factors involved in beta cell identity, such as MafA, were present ubiquitously across most functional regions we defined (promoter-proximal, intronic, exonic, downstream, and distal intergenic) (Fig-8A). In contrast, the motifs for alpha cell-identity driver Irx2 [4] were concentrated at the promoter-proximal regions of chromatin peaks associated with genes differentially expressed by alpha cells. Insm1 [77] motifs were concentrated at the promoter-proximal regions of chromatin peaks associated with genes differentially expressed by beta cells. In another example, DNA-binding motifs associated with the ubiquitous islet transcription factor Pax6 [83] were concentrated on intronic chromatin.
We performed the same transcription factor footprinting test between alpha and delta cells (Fig-8B). Of note, the motif for Pbx3, a transcription factor driving Sst expression in delta cells [84], was enriched in accessible chromatin at promoter-proximal regions. The motif for Stat4, recently implicated in establishing alpha cell identity [85] was concentrated in exonic chromatin. Lastly, the motif for Ptf1a, a transcription factor identified in early pancreatic endocrine cell development [86], was preferentially associated with areas of open chromatin at distal intergenic regions.
Between beta and delta cells, no single transcription factor motif overlapped across all five functional categories, nor were there any unique to downstream or exonic regions, as we observed in the prior alpha and delta comparisons (Fig-8C). Of note, the motif for Smad3, a transcription factor important for islet development [87], as well as the negative regulation of insulin secretion in beta cells via occupancy of the insulin promoter [88], was concentrated in promoter-proximal accessible chromatin. Motifs for Insm1 [77] and Nkx6.1 [89], both key beta cell identity transcription factors, were preferentially associated with accessible chromatin at intronic regions. Lastly, motifs for Fev – recently identified as important for the development and differentiation of the endocrine lineage [90] - and Atf3 – linked to enhancer regions in EndoC-bH1 cells [11] - were enriched in accessible chromatin at distal regions.
Validating motif calls against aggregated islet ChIP datasets
As we observed motif binding site preferences across promoter, intronic, exonic, downstream, or distal chromatin regions, we wished to confirm how accurately predictive DNA motif binding sites conveys true transcription factor binding. To do so, we once again turned to our aggregated pancreatic islet ChIP datasets. We applied the same motif detection method as above on individual ChIP datasets and on all open chromatin – as derived from our ATAC-Seq consensus peak set –and assessed how well predicted motif binding overlapped with true ChIP peaks from our selected list of ChIP-Seq data. One transcription factor, Rfx6, showed strong (57%) true positive and low (8.34%) false negative values for its DNA-motif’s ability to predict all Rfx ChIP binding sites (Table-S4A). We then limited this same test to Rfx6 ChIP-Seq binding sites (35.65%) that were shared across 1.19% of all open chromatin ATAC peaks (Table-S3A-B). Notably, when comparing DNA-motif predictions for Rfx6 against these shared Rfx6 ChIP-Seq binding sites, we observed that 65.10% were true positives, while 8.81% were false predictions (Table-S4B). When we expanded this analysis to other transcription factors, we observed a broad distribution in true positive values across these DNA-binding motifs (4.71–65.10%), and also noted a relatively low range of false positives (1.68–8.81%). This is reflective of a limitation in the ability for DNA-motifs to consistently predict true transcription factor binding.
Determining overlap of differentially enriched chromatin with islet ChIP and histone datasets
Given that many of the DNA-motifs associated with transcription factors do not have available ChIP-Seq datasets derived from mouse pancreatic islets, we sought to understand whether differential chromatin between cell types could be associated to transcription factors and histone markers integral to pancreatic islet cell fate that do have available ChIP data, as opposed to relying only on predictive motifs. As we previously observed strong enrichment of transcription factor binding sites across all open islet chromatin, we wanted to confirm if this overlap is augmented in differentially enriched chromatin associated with pancreatic islet transcription factor binding sites via aggregated islet ChIP data - Pdx1, Nkx6-1, Neurod1, Insm1, Foxa2, Nkx2-2, Rfx6, and MafA - and select, key histone marks — H3K27ac [91], H3K4me3 [91], and H3K4me1 [12]. Our intention in integrating these data was in anticipation that they may help delineate whether enhancer regions are poised (defined by: H3k4me1) [92], or active (defined by: H3k27ac and H3k4me1) [93], and whether promoter regions are active (defined by: H3k4me3) [94].Indeed, we observed that differentially enriched ATAC-Seq peaks within our comparisons occurred at much higher rates than random chance across a majority of transcription factor ChIP data associated with islet cell identity (Fig-S7A-C) as well as all predictive histone marker regions (Fig-S7D-F). This further supported our hypothesis that open chromatin, and now specifically differentially enriched chromatin, would be directly associated with the transcription factors responsible for shaping islet cell-specific gene expression patterns and identity. As differentially enriched chromatin is associated with cell-identity regulatory networks, we inquired to selectively evaluate these regions for enhancers that may be relevant to pancreatic islet cell identity.
Identifying and visualizing putative islet cell-type specific enhancers via epiRomics
These transcription factor and histone ChIP datasets were then fed into an R package named epiRomics that we developed to identify putative enhancer regions involved in pancreatic islet cell identity. We defined enhancer regions by the co-localization of H3k27ac and H3k4me1 (activating) histone modifications within islet cell chromatin. We narrowed our definition further by requiring these regions to also have transcription factor binding sites - Pdx1, Nkx6-1, Neurod1, Insm1, Foxa2, Nkx2-2, Rfx6, MafA, Isl1, Kat2b, Ldb1, or Gata6 – defined by islet ChIP data that are either ubiquitously or selectively expressed across the three islet cell types (Fig-2E-F; Fig-S2E; Fig-S8A-I). This first pass resulted in 28,645 putative enhancer regions (Dataset-S3). We then filtered this list against chromatin accessible regions from our ATAC-Seq data sets of alpha, beta, and delta cells, resulting in 16,651 putative active enhancers (Fig-9A). To further increase our confidence in these enhancer calls, we crossed our putative enhancer regions against the FANTOM5 curated enhancer database [95]. This resulted in a conservative list of 3,535 putative enhancer regions. Of these, 3,203 were inaccessible to at least two out of three islet endocrine cell types (Fig-9B) (Dataset-S4).
In both putative enhancer lists, we found that 39.8–43.2% of the enhancer regions we identified were common across all cell types, supporting the theory that related cell types of a common origin would have a sizeable commonality of similar regulatory regions involved in development and maintenance (Fig-9A-B). Interestingly, between 1.53–10.1% of called enhancers were associated with accessible chromatin unique to each cell type. Enhancer regions selective to beta cells were identified at the highest frequency (~ 10%), while alpha and delta enhancers made up ~ 2% of the list.
As a reality check, we cross referenced our putative enhancer list with several established enhancer sites that were experimentally confirmed. Our approach correctly predicted an established intronic enhancer on the Slc30a8 gene, which was demonstrated to be regulated in part by the Pdx1 transcription factor (Fig-9C) [96]. Our approach also correctly predicted a previously identified promoter-proximal enhancer region targeting Pdx1, with co-occurring binding sites for islet transcription factors Insm1, Neurod1, and Foxa2 (Fig-9D) [77]. This validated the use of the epiRomics approach to predict novel, not previously identified enhancer regions.
We investigated cell-specific or common putative enhancer candidates, starting with those with the highest number of transcription factor co-binding sites from our list. One of the top predicted beta cell-unique putative enhancer regions is located on the sixth exon of the Slc35d2 gene and aligns with eight different ChIP co-localization binding sites (Fig-10A). An alpha cell-unique putative enhancer located at a distal-intergenic region ~ 30kb upstream of Dusp10, overlapped precisely with six sites of co-binding from various transcription factors (Fig-10B). A delta cell-unique region at a distal-intergenic enhancer region ~ 21kb upstream of Gm20745 aligned closely with no fewer than 12 sites of co-binding from multiple transcription factors. (Fig-10C). And finally, a common enhancer located ~ 32kb upstream of Snap25 – a gene expressed in alpha, beta, and delta cells, associated with a total of 17 co-binding sites of aggregated transcription factors. (Fig-10D).
We noted further examples of enhancer regions that are inaccessible to beta, but present in both alpha and delta (Fig-S9A) cells, or others with chromatin accessibility across all cell types with an adjacent, intronic enhancer region uniquely available to beta cells alone (Fig-S9B). In particular, the Slc2a2 gene shares common open chromatin across alpha, beta, and delta cells. However, beta cells have a gained accessible chromatin region on the first intron identified as a putative enhancer and which overlaps with six co-binding sites. Finally, we noted more putative regions that were enriched in both alpha and beta cells, and present in delta cells as well (Fig-S9C-D).