Core and disruptive genomic compartments of T. cruzi have distinct open chromatin profiles
To investigate whether the lack of transcriptional regulation of protein-coding genes is reflected in the open chromatin profile of T. cruzi, we performed genome-wide FAIRE-seq. Open chromatin regions were isolated from epimastigotes in exponential growth. Biological duplicates were processed together with input control samples (cross-links reversed before DNA extraction) to account for any bias in genome sequencing and assembly. FAIRE-seq reads showed high duplication levels (66.2-72.1% in epimastigotes; Figure S1A) and high Phred quality scores (>30; Figure S1B). After mapping (Figure S1C), only reads with mapping quality scores (MAPQ) above 10 were retained, yielding approximately 116 million mapped reads (Figure S1D). Spearman correlation analysis (Figure S1E) and PCA (Figure S1F) of genome-wide read coverage in 50-bp windows showed a high correlation between biological replicates, indicating good reproducibility. To avoid bias at repetitive regions, we removed multimapped reads, which resulted in reduced (or zero) read coverage at repetitive regions of the genome (Figure S2). Most repetitive regions (92.75%) were located in intergenic sections of the genome (Supplementary Table S1). Of note, strand switch regions (SSRs) were not considered intergenic regions; nevertheless, no repeat annotations were found in them. Only 309 CDSs (5.65% of total repetitive regions, in bp) were located in annotated repetitive regions.
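The replicate-agreement check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the pipeline used in this study: reads are assumed to be already parsed into hypothetical `(start, mapq)` tuples for a single contig, read starts (rather than full read coverage) are counted per 50-bp window, and Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. On real BAM files, deepTools `multiBamSummary bins` and `plotCorrelation --corMethod spearman` perform comparable steps.

```python
MIN_MAPQ = 10  # reads must have MAPQ above 10, as in the text
WINDOW = 50    # genome-wide signal summarized in 50-bp windows


def window_counts(reads, contig_length, min_mapq=MIN_MAPQ, window=WINDOW):
    """Count read start positions per fixed-size window, keeping reads with MAPQ > min_mapq.

    `reads` is an iterable of (start, mapq) tuples for one contig; this simplified
    record format is a hypothetical stand-in for parsed alignments.
    """
    n_windows = (contig_length + window - 1) // window
    counts = [0] * n_windows
    for start, mapq in reads:
        if mapq > min_mapq and 0 <= start < contig_length:
            counts[start // window] += 1
    return counts


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (vx * vy) ** 0.5


# Toy example: two replicates on a 200-bp contig (four 50-bp windows).
rep1 = window_counts([(5, 30), (12, 40), (60, 5), (75, 42), (120, 60), (130, 60), (140, 60)], 200)
rep2 = window_counts([(8, 35), (70, 20), (80, 25), (125, 50), (135, 50), (145, 50), (160, 50)], 200)
print(rep1, rep2, round(spearman(rep1, rep2), 3))  # -> [2, 1, 3, 0] [1, 2, 3, 1] 0.632
```

In the toy data, the read at position 60 is dropped because its MAPQ (5) is not above 10; returning `nan` for a constant vector avoids a division by zero.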
To obtain an overview of the open chromatin distribution across large genomic regions, we compared epimastigote FAIRE-seq data (E) with their corresponding control samples (C). Figure 1 shows the distribution of reads from both samples on two Dm28c contigs (PRFA1000011, 594 kb, and PRFA1000027, 454 kb). Datasets were normalized to reads per genomic content (RPGC) to account for different sequencing depths. Visual inspection of RPGC levels indicates that the FAIRE sample has higher overall levels along the contigs, with some clear enrichments relative to the control sample. The log2(E/C) ratios indicated that most regions were enriched in FAIRE samples (Figure 1A). Repetitive regions are depleted in both FAIRE and control samples, likely because of the mapping and filtering steps applied (Figure S2).
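RPGC normalization and the log2(E/C) track can be illustrated with the definition used by deepTools `bamCoverage --normalizeUsing RPGC`, which rescales coverage so that the genome-wide mean is 1x. The sketch below assumes that definition; the effective genome size, fragment length and pseudocount are illustrative values, not parameters reported in this section, and the function names are hypothetical.

```python
import math


def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Factor that rescales raw coverage so the genome-wide average equals 1x.

    Expected 1x depth = mapped_reads * fragment_length / effective_genome_size,
    so multiplying raw per-base coverage by the inverse puts samples of
    different sequencing depth on the same scale (reads per genomic content).
    """
    return effective_genome_size / (mapped_reads * fragment_length)


def rpgc(coverage, mapped_reads, fragment_length, effective_genome_size):
    """Apply the RPGC scale factor to a list of per-window raw coverage values."""
    factor = rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size)
    return [c * factor for c in coverage]


def log2_ratio(faire, control, pseudocount=0.01):
    """Per-window log2(E/C); the pseudocount avoids log(0) in zero-coverage windows."""
    return [math.log2((e + pseudocount) / (c + pseudocount)) for e, c in zip(faire, control)]


# Toy numbers (not data from this study): 50-Mb effective genome, 100-bp fragments.
faire = rpgc([20, 40, 0], mapped_reads=1_000_000, fragment_length=100, effective_genome_size=50_000_000)
control = rpgc([10, 10, 0], mapped_reads=500_000, fragment_length=100, effective_genome_size=50_000_000)
print(faire, control, [round(r, 2) for r in log2_ratio(faire, control)])
# -> [10.0, 20.0, 0.0] [10.0, 10.0, 0.0] [0.0, 1.0, 0.0]
```

Although the FAIRE library was sequenced twice as deeply, normalization brings both samples to the same scale, so only the second window shows a (roughly twofold) enrichment. Windows with zero coverage in both samples, such as filtered repetitive regions, yield a ratio of 0 rather than an error.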
Regions covered by the core compartment (composed of conserved and hypothetical conserved genes, green tracks) had higher RPGC-normalized counts than regions of the disruptive compartment (red tracks) (Figure 1A and B, Tables S2-S3). Differences in RPGC counts between these two compartments were also observed in control samples; however, the difference was even more evident in FAIRE samples, as reflected by higher log2(E/C) ratios in the core than in the disruptive compartment (Figure 1B). Notably, removing multimapped reads (see Methods; Figure S2B) only slightly affected this difference, since the same criteria were applied to experimental and control samples. Instead, the difference can be explained by less open chromatin at multigenic family genes, in contrast to a more open chromatin profile at CDSs outside multigenic families.
Because the distribution of open chromatin differed between genome compartments, we asked whether their landscapes also differed at the gene level. Strikingly, the open chromatin landscape differs greatly between virulence factors (located in the disruptive compartment) and the remaining protein-coding genes, as well as among the virulence factor families themselves. Figure 1C shows that for most core genes, the regions coding for the 5' and 3' UTRs are enriched in open chromatin, consistent with the nucleosome depletion previously observed by MNase-seq (Figure S3B). In contrast, no clear enrichment of open chromatin was found around the genomic regions coding for the 5' UTRs of virulence factors, whereas the regions coding for the 3' UTRs differed greatly among them: TS and GP63 showed an enrichment similar to that of other CDSs, while MASPs showed a clear depletion followed by an enrichment a few base pairs downstream (Figure 1C). Nucleosome occupancy partially explains the open chromatin distribution around virulence factor genes, as some regions (such as the upstream regions of mucins, GP63, DGF-1 and RHS) show enrichment or depletion in both FAIRE and MNase data (Figure S3A).
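Gene-level profiles like those in Figure 1C can be built by scaling each CDS body to a fixed number of bins, adding fixed-length flanks, and averaging across genes. The sketch below shows this "scale-regions" approach under stated assumptions: the flank length and bin counts are hypothetical defaults, flanks stand in for the UTR-coding regions (real UTR annotations could replace them), and the helper names are illustrative. deepTools `computeMatrix scale-regions` builds a comparable matrix from bigWig files.

```python
def mean(values):
    """Arithmetic mean; empty input (e.g. a flank truncated at a contig edge) returns 0.0."""
    return sum(values) / len(values) if values else 0.0


def bin_mean(values, n_bins):
    """Split values into n_bins contiguous, near-equal chunks and average each chunk."""
    n = len(values)
    return [mean(values[i * n // n_bins:(i + 1) * n // n_bins]) for i in range(n_bins)]


def gene_profile(signal, start, end, strand, flank=500, flank_bins=5, body_bins=10):
    """Strand-aware 'scale-regions' profile: upstream flank | scaled CDS body | downstream flank.

    signal: per-base values for one contig (e.g. log2 E/C); start/end: 0-based,
    end-exclusive CDS coordinates. Flanks approximate the 5' and 3' UTR regions.
    """
    left = signal[max(0, start - flank):start]
    body = signal[start:end]
    right = signal[end:end + flank]
    if strand == "-":
        # On the minus strand the 5' end is on the right: reverse so profiles read 5' -> 3'.
        left, body, right = right[::-1], body[::-1], left[::-1]
    return bin_mean(left, flank_bins) + bin_mean(body, body_bins) + bin_mean(right, flank_bins)


def metaprofile(signal, genes, **kwargs):
    """Average per-gene profiles bin by bin; genes are (start, end, strand) tuples."""
    profiles = [gene_profile(signal, *gene, **kwargs) for gene in genes]
    return [mean(column) for column in zip(*profiles)]


# Toy contig: high signal on the left, intermediate over the CDS, none on the right.
signal = [2.0] * 10 + [1.0] * 10 + [0.0] * 10
print(gene_profile(signal, 10, 20, "+", flank=10, flank_bins=2, body_bins=2))
# -> [2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(gene_profile(signal, 10, 20, "-", flank=10, flank_bins=2, body_bins=2))
# -> [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
```

Orienting every gene 5' to 3' before averaging matters here, because polycistronic units in T. cruzi run on both strands; without the reversal, 5' and 3' features of minus-strand genes would be mixed.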
Open chromatin is enriched at divergent SSRs (dSSRs) and uniformly distributed along PTUs
In trypanosomatids, transcription of protein-coding genes initiates mainly at dSSRs and terminates at convergent SSRs (cSSRs); however, transcription might also start at some non-SSR sites, mainly close to tRNA genes (6, 27, 28). Because the latter has not yet been described in T. cruzi, we used dSSRs as a proxy for transcription start sites and found that they were enriched in open chromatin compared to PTUs (Figure 2A and 2B). Combining a previous nucleosome mapping and occupancy profile (25) with the FAIRE-seq data along PTUs revealed complementary, opposite profiles: open chromatin regions mainly reflect nucleosome-depleted regions (Figure 2A).
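Using dSSRs as a proxy for transcription start sites requires locating them from gene orientation. A minimal sketch, assuming the CDSs of one contig are given as `(start, end, strand)` tuples sorted by start: consecutive same-strand genes are merged into PTUs, and each boundary is labeled divergent or convergent from the strands on either side. This simplification ignores non-coding genes (such as tDNAs) and any gap-length rules; the functions are hypothetical and not necessarily the annotation procedure used in this study.

```python
from itertools import groupby


def call_ptus(genes):
    """Group genes (start, end, strand), sorted by start, into PTUs.

    A PTU is approximated here as a maximal run of consecutive same-strand genes.
    """
    ptus = []
    for strand, run in groupby(genes, key=lambda gene: gene[2]):
        run = list(run)
        ptus.append((run[0][0], max(gene[1] for gene in run), strand))
    return ptus


def call_ssrs(ptus):
    """Classify the gap between each pair of adjacent PTUs.

    Left PTU on '-' and right PTU on '+': both are transcribed away from the gap,
    so it is a divergent SSR (dSSR, putative initiation site). The opposite
    arrangement ('+' then '-') is a convergent SSR (cSSR, termination).
    """
    ssrs = []
    for (_, left_end, left_strand), (right_start, _, right_strand) in zip(ptus, ptus[1:]):
        kind = "dSSR" if (left_strand, right_strand) == ("-", "+") else "cSSR"
        ssrs.append((left_end, right_start, kind))
    return ssrs


genes = [(0, 100, "-"), (150, 300, "-"), (400, 500, "+"), (600, 700, "+"), (800, 900, "-")]
ptus = call_ptus(genes)
print(ptus)             # -> [(0, 300, '-'), (400, 700, '+'), (800, 900, '-')]
print(call_ssrs(ptus))  # -> [(300, 400, 'dSSR'), (700, 800, 'cSSR')]
```

Because `groupby` only merges consecutive items, adjacent PTUs always differ in strand, so every boundary is either divergent or convergent.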
PTU regions were hierarchically clustered into three groups based on their log2(E/C) RPGC levels (Figure 2C). Clusters 2 and 3 are very similar, showing a nearly flat pattern with low overall RPGC levels. In contrast, Cluster 1 showed higher overall levels of open chromatin, with a decrease at the edges, mainly at the end of the PTU. Clusters 2 and 3 contained significantly more multigenic family genes (per bp) than Cluster 1, which was enriched mainly in genes of the core compartment (Figure 2D).
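Clustering binned PTU profiles into three groups can be sketched with a plain agglomerative (average-linkage) procedure. The distance metric, linkage method and binning are assumptions, since they are not specified in this section, and all names are hypothetical; for real data, `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="maxclust")` does the same job far more efficiently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length binned profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between members of two clusters (UPGMA linkage)."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hclust(profiles, k):
    """Agglomerative clustering: merge the closest pair of clusters until k remain.

    Returns one label (0..k-1) per profile; clusters are numbered by their
    smallest member index. This naive version recomputes all pairwise linkages
    at every merge, so it only suits small inputs.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    labels = [0] * len(profiles)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


# Toy binned log2(E/C) profiles for five PTUs.
profiles = [[2.0, 2.1, 1.9], [2.1, 2.0, 2.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0], [-1.0, -1.1, -1.0]]
print(hclust(profiles, 3))  # -> [0, 0, 1, 1, 2]
```

Cutting the tree at a fixed number of clusters (here three) mirrors the grouping described in the text; the cluster numbering is arbitrary and carries no ranking.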
Levels and relative nuclear positions of eu- and heterochromatin change during metacyclogenesis
Many morphological changes occur during the differentiation of epimastigotes into MTs, including nuclear elongation and repositioning of the kinetoplast to the posterior end of the parasite. It has previously been reported that heterochromatin near the nuclear envelope in epimastigotes spreads progressively throughout the nucleus in intermediate forms, reaching a higher level of compaction in MTs (15). Here, for the first time, we performed a systematic evaluation and 3D reconstruction of the two traditional chromatin classes, eu- and heterochromatin, to assess chromatin remodeling during parasite differentiation. The results indicate that in T. cruzi epimastigotes, euchromatin resides in the central area, whereas heterochromatin is mainly distributed close to the nuclear envelope and around the nucleolus (Figure 3A, epimastigote). The euchromatin volume is higher in epimastigotes and intermediate I forms and decreases during differentiation into MTs. In epimastigotes, heterochromatin spreads throughout the nucleus, and as metacyclogenesis advances, its percentage increases and its location becomes increasingly peripheral. These results indicate that progressive chromatin remodeling occurs during parasite differentiation. Notably, during metacyclogenesis the nucleolus decreases in size as it disassembles and disperses throughout the nuclear matrix, which may be related to decreased ribosome biogenesis (Figure 3, Supplemental Videos S1-S4).
Genome-wide analysis of open chromatin regions in epimastigotes and MTs
Three-dimensional reconstruction of chromatin areas during epimastigote-to-MT differentiation indicated a significant reduction in euchromatin regions (Figure 3), which are considered open chromatin areas. To gain more insight into these changes, we compared FAIRE-seq data from epimastigote and MT forms. Visual inspection of normalized RPGC levels along the T. cruzi genome confirmed a greater abundance of open chromatin, with a few clear enrichments, in epimastigotes than in MTs (Figure 4A).
As mentioned above, the disruptive compartment is enriched in virulence factors that are mainly expressed in infective forms (23). To address whether the disruptive compartment would be enriched in open chromatin in MTs relative to epimastigotes, RPGC-normalized FAIRE-seq data were compared between life forms as log2 ratios. Median RPGC values of both the disruptive and core compartments were lower in MTs (Figure 4B, Tables S2-S3). The difference between compartments was very similar in both life forms: the core compartment has 3.9 times more open chromatin than the disruptive compartment in both epimastigotes and MTs. In the MNase data, core and disruptive compartments also had different RPGC levels, with the former being significantly higher (Figure S3B). Overall, virulence factors (TS, GP63, MASP, and mucins) and the core compartment have 2.6- and 2.8-fold (median values) less open chromatin, respectively, in MTs than in epimastigotes (Figure S3A), in agreement with global changes in other genomic features (Figure 4C, Table S4). The open chromatin landscape of virulence factor genes is similar in both life forms, with a slight difference in the region upstream of mucin CDSs (Figure S3D). Taken together, these results indicate that virulence factor genes have an open chromatin pattern distinct from that of other CDSs (discussed below), which appears to be maintained between life forms and is consistent with predominantly posttranscriptional control of these genes.
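The "times more" comparisons above (3.9-fold between compartments; 2.6- and 2.8-fold between life forms) can be read as ratios of median RPGC values, as the "median values" note on the latter indicates; this sketch assumes the same definition for all of them. It uses a hypothetical nested layout of per-window RPGC values, and the toy numbers are illustrative, not data from this study.

```python
from statistics import median


def median_fold_change(numerator, denominator):
    """Fold change as a ratio of medians, e.g. core/disruptive or epimastigote/MT."""
    return median(numerator) / median(denominator)


def compartment_summary(rpgc):
    """rpgc[life_form][compartment] -> list of RPGC values (hypothetical layout).

    Returns the core/disruptive ratio within each life form and the E/MT ratio
    for each compartment.
    """
    within = {form: median_fold_change(c["core"], c["disruptive"]) for form, c in rpgc.items()}
    between = {comp: median_fold_change(rpgc["E"][comp], rpgc["MT"][comp])
               for comp in ("core", "disruptive")}
    return within, between


# Toy values only, chosen so both life forms show the same 4x compartment difference.
toy = {
    "E": {"core": [4.0, 8.0, 12.0], "disruptive": [1.0, 2.0, 3.0]},
    "MT": {"core": [2.0, 3.0, 4.0], "disruptive": [0.5, 0.75, 1.0]},
}
within, between = compartment_summary(toy)
print(within, {k: round(v, 2) for k, v in between.items()})
# -> {'E': 4.0, 'MT': 4.0} {'core': 2.67, 'disruptive': 2.67}
```

The toy case reproduces the qualitative pattern in the text: a constant within-form ratio between compartments can coexist with a global decrease from epimastigotes to MTs. Medians are used because RPGC distributions are typically skewed by a few highly enriched windows.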
We previously found that dSSRs of TCTs were enriched in nucleosomes compared with those of epimastigotes (25). We therefore asked whether dSSRs in epimastigotes would be more enriched in open chromatin. Indeed, dSSRs are approximately 3.6-fold more enriched in open chromatin in epimastigotes (Figures 4C and S4A), which corroborates the lower nucleosome occupancy in replicative forms compared with nonreplicative forms.
Upon hierarchical cluster analysis, the enrichment of open chromatin at dSSRs can be further visualized in a PTU context (Figure 4D, top; Table S5). Most notably, Cluster 1 showed enrichment at dSSRs followed by a decreasing signal toward the cSSRs. A detailed inspection of Cluster 1 revealed that approximately 10% of its PTUs are located downstream of tDNA loci. Cluster 2 comprises PTUs whose dSSRs have levels of openness similar to those of their adjacent PTUs. Interestingly, Cluster 3 encompasses PTUs whose levels of open chromatin are equal in epimastigotes and MTs (log2 E/MT = 0). Taken together, these data indicate that different PTUs have distinct levels of openness at their transcription initiation regions. Clusters 2 and 3 were enriched in genes from the disruptive compartment (data not shown).
Considering only the PTU region, epimastigotes showed a significantly higher signal (2.9-fold) than MTs (Figure 4C). Hierarchical cluster analysis revealed PTUs with different levels of open chromatin between life forms and, importantly, a nearly flat enrichment along the PTU region (Figure 4D, bottom; Table S6). Consistent with the hierarchical clustering of PTUs with SSRs (Figure 4D, top), Clusters 2 and 3 were also enriched in multigenic family genes (data not shown).
FAIRE-seq data correlate with steady-state gene expression levels in both life forms
Given the different open chromatin profiles observed among PTUs (Figure 4D), we investigated whether individual CDSs show a similar pattern between life stages. CDSs were hierarchically clustered into three groups based on their log2 RPGC levels (Figure S4D). Open chromatin was increased mainly at genomic regions encompassing the 5' and 3' UTRs in Clusters 1 and 2. Consistent with the results above, Cluster 1, which has the highest log2 ratio, was enriched in genes from the core compartment, whereas Clusters 2 and 3 were enriched in genes from the disruptive compartment and repetitive genes (data not shown).
We then explored whether regions enriched in open chromatin harbor transcripts expressed at higher levels. CDSs were first classified as high, medium or low expressed in each life form based on their TPM values from a previously published transcriptomic study (19) (Figure S5). RPGC-normalized counts were retrieved for each CDS according to its expression class and life form (Figure 5). Surprisingly, open chromatin correlated positively with steady-state transcript levels. For epimastigotes, FAIRE enrichment differed significantly between all expression classes. In contrast, for MTs, significant differences were observed mainly in comparisons with low-expressed genes (high versus low and medium versus low).
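The expression-class comparison can be sketched as follows. This section does not state how the high/medium/low thresholds were set or which statistical test was used, so the sketch assumes TPM tertiles and uses the Mann-Whitney U statistic as one example of a rank-based comparison between classes. It computes only the statistic, not a p-value, and all identifiers and values are hypothetical.

```python
from statistics import median


def expression_classes(tpm):
    """Assign each CDS to 'low', 'medium' or 'high' by TPM tertile (an assumed scheme).

    tpm: dict mapping CDS id -> TPM in one life form.
    """
    ranked = sorted(tpm, key=tpm.get)
    n = len(ranked)
    return {cds: "low" if i < n / 3 else "medium" if i < 2 * n / 3 else "high"
            for i, cds in enumerate(ranked)}


def median_by_class(classes, faire):
    """Median FAIRE RPGC signal per expression class."""
    groups = {}
    for cds, label in classes.items():
        groups.setdefault(label, []).append(faire[cds])
    return {label: median(values) for label, values in groups.items()}


def mann_whitney_u(x, y):
    """Mann-Whitney U for x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


tpm = {"g1": 1, "g2": 2, "g3": 10, "g4": 20, "g5": 100, "g6": 200}
faire = {"g1": 0.5, "g2": 0.7, "g3": 1.0, "g4": 1.2, "g5": 2.0, "g6": 2.4}
classes = expression_classes(tpm)
print(classes)
print(median_by_class(classes, faire))
print(mann_whitney_u([faire["g5"], faire["g6"]], [faire["g1"], faire["g2"]]))  # -> 4.0
```

Here U equals its maximum (2 x 2 = 4), meaning every high-class CDS has more open chromatin than every low-class CDS. For p-values, `scipy.stats.mannwhitneyu` implements the full test.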
Open chromatin is developmentally regulated at tDNA loci, likely reflecting translational regulation
FAIRE-seq analysis revealed a global decrease in open chromatin in MTs compared with epimastigotes. Beyond this global trend, we wondered whether the open chromatin profile could provide further clues about the differences in gene expression and phenotype between life forms, especially those related to their differentiation program. Comparing all genomic features, we detected striking differences in open chromatin enrichment at the regions encoding small nuclear RNAs (snRNAs) and tRNAs. This enrichment was significantly higher in epimastigotes than in MTs (fold changes of 13.4 and 9.3 for tDNAs and snDNAs, respectively; Figures 4C and 6A-B).
Hierarchical cluster analysis of the E/MT ratio revealed that most tDNAs had at least six times more open chromatin in epimastigotes than in MTs (Figure 6C, Figure S6, Table S7). Clusters were not related to tRNA isoacceptor type or class (Figure S6B); however, their distribution reflected the location of the tDNAs relative to the transcription direction of the adjacent PTUs (Figure S6C). For example, 60% and 30% of the tDNAs in Clusters 1 and 2, respectively, were located between codirectional PTUs, and 10% and 40%, respectively, between divergent PTUs (Figure 6D). This distribution suggests that chromatin alterations at tDNAs may affect transcription initiation at adjacent PTUs.
Finally, we addressed whether the increase in open chromatin at tDNAs is associated with tRNA expression levels. Evaluating tRNA expression by PCR is not straightforward because these molecules are highly modified. We therefore evaluated tRNA expression in both life forms using RNA FISH. A previous study (29) showed that epimastigotes have much higher amounts of tRNAs than MTs. To confirm this finding, epimastigotes and MTs were probed simultaneously with one of four probes against the 5' and 3' regions of tRNA-Asp and tRNA-Glu. Figures 4E and S6D show that tRNA levels are much lower in MTs, with tRNAs approximately 50% less abundant in these nonreplicative forms.