Distinct functions of genome-wide chromatin remodeling in the differentiation of T helper Type 1 and 2 cells

T helper type 1 and 2 (Th1 and Th2) cells play critical roles in infectious, autoimmune and allergic diseases. Here we mapped genome-wide distribution of DNase I hypersensitive (DHS) sites in Th1, Th2 and their precursors naïve CD4 T cells. DHS sites were found unevenly distributed in the genomes with highest densities within 2kb of the transcription start sites (TSS). At the whole genome level, the DHS values, representing chromatin openness, but not the numbers of DHS sites showed strong positive correlation with gene expression. Th1 and Th2 differentiations were accompanied by changes of genome-wide distribution of DHS sites. The differentiated cells assumed more open chromatin structures than their precursors. During Th1 differentiation changes of DHS values could be statistically positively or negatively associated with changes of gene expression depending on the locations of the DHS sites, whereas only positive association was found during Th2 differentiation.


Introduction
Successful clearance of different infections requires tailor-made immune responses to the specific infectious agents. To accomplish this, conventional naïve CD4 T cells differentiate into different effector T helper subsets, each of which produces a distinct set of cytokines that mediate their effector functions. The first described subsets are the T helper type 1 and type 2 (Th1 and Th2) cells (1). The Th1 cells produce their signature cytokine IFN-γ, whereas the Th2 cells produce IL-4, 5 and 13. Th1 responses are required for the clearance of intracellular bacterial infection (2), and Th2 responses are required for the clearance of helminth infections (3). While the protective immunities provided by Th1 and Th2 responses to infections benefit the hosts, Th1 and Th2 responses to self-antigens and allergens cause detrimental immunopathology in organ specific autoimmune and allergic diseases, respectively (4,5).

Differentiation of naïve CD4 T cells into Th1 or Th2 cells is induced by the
combination of TCR stimulation and polarizing signals. IL-12 and IL-4 provide strong polarizing signals for Th1 and Th2 differentiation, respectively (6,7). Other cytokines such as IL-33 and TSLP, as well as Notch ligands on the cell surface, also provide polarizing signals (8)(9)(10)(11). These signals cooperate to induce Th1 and Th2 specific master transcription factors to further drive Th1 and Th2 differentiations, which are mutually exclusive (12,13). Once differentiated, Th1 and Th2 cells gain the ability to produce their effector cytokines upon re-stimulation by the TCR without the original polarizing signals.
Chromatin remodeling is a widely recognized feature of cell differentiation.
Chromatin remodeling changes the distribution of "open" chromatin regions, which are often referred to as DNase I hypersensitive (DHS) sites for their high sensitivity to digestion by DNase I (14). The open chromatin structures at the DHS sites allow transcription factors to access cis regulatory sequences. In fact, various cis regulatory elements such as promoter, locus control region (LCR), enhancer, boundary element and even silencer are located at the DHS sites (10,15,16).
Previous studies of chromatin remodeling during Th1 and Th2 cell differentiation have focused on DHS sites at the genomic loci of the effector cytokine genes. Subset specific DHS sites are detected at the Th2 cytokine and IFN-γ gene loci, and they play critical roles in the specific expression of the effector cytokines (10,(17)(18)(19). However, it remains unclear whether chromatin remodeling occurs only at the local levels or there is genome wide chromatin remodeling during the Th subset differentiation. In the present study, we compared genome-wide distribution of DHS sites in naïve CD4 T cells, Th1 and Th2 cells, and analyzed the relations between genome wide chromatin remodeling and global gene expression profiles.

Genome-wide identification of DHS sites
To gain understanding of genome-wide distribution of open chromatin regions, we identified DHS sites in Th1, Th2 and their precursors naïve CD4 T cells using the DNase-Seq protocol developed by Boyle et al (20). Figure 1 shows representatives of the 140bp PCR products of the DHS libraries that contain 20bp of genomic DNA sequences derived from the DNA ends generated by DNase I digestion of the nuclei of naïve CD4 T cells, Th1 and Th2 cells. For each cell type, multiple independent replica libraries were constructed. The library read counts ranged from 21.3 to 60.2 millions.
The reads were mapped to the mouse genome, and DHS peaks were called using F-Seq (21). DHS peaks of a representative library at the CD4, IFN-γ and IL-4/Il13 loci are shown for naïve CD4, Th1 and Th2 cells, respectively ( Fig S1).

Characteristics of genome-wide distribution of DHS sites
We assumed that the location of a DHS site is a key determinant of whether the DHS site plays a role in regulating its associated gene. We therefore analyzed the distribution of DHS sites at distances relative to their associated genes. We plotted the numbers of DHS sites against the distances to the transcriptional start sites (TSS) or transcriptional termination sites (TTS) of their associated genes. Large numbers of DHS sites were concentrated near the TSS, and to lesser degrees near TTS (Fig S2B, C). To gain better resolutions, we limited the genomic distances to cover only 80% of the DHS sites.
As shown in Fig 2B, sharp spikes of DHS peaks were found near the TSS in all cell types. The numbers of peaks decreases as the distance to TSS increases in both the up-and downstream directions. Similarly, the numbers of DHS sites decrease when the distances to the TTS increase ( Fig 2C). Based on these observations, we divided the DHS sites into 6 location categories, including -5kb to -2kb from TSS, -2kb to TSS, TSS to +2kb, +2kb to TTS, TTS to TTS +5kb, and intergenic regions (Fig 3, left). To better show the distribution of the DHS sites in genome wide landscape, we calculated DHS peak densities (peaks/kb) in each location category. Highest peak densities were found in the -2kb to TSS and TSS to +2kb categories. The regions with the second highest peak densities were the regions from -5kb to -2kb upstream of the TSS and within 5kb downstream of TTS. On the other hand, the regions from +2kb to TTS and the intergenic regions showed relatively low peak densities (Fig 3, right).

Relations between DHS and gene expression in each cell type
To quantitatively measure the "openness" of the chromatin structure of a DHS site, we assigned a DHS value to each DHS site as the read counts of 600bp genomic interval centered at the position of the DHS peak normalized to the average read counts of all DHS sites in a cell type (22). On the other hand, gene expression level was determined as RPKM. We divided the genes in each cell type according to their RPKM values (top 25%, 50% and bottom 50% and 25%), and compared the DHS values of the DHS sites associated with the genes. When all DHS sites were analyzed regardless of their locations relative to the genes, DHS values decreased along with RPKM ( Fig 4A). When the DHS sites in different location categories were examined, strong positive correlations between DHS values and RPKM were found in DHS sites in the location categories of -2kb to TSS and TSS to +2kb, whereas the correlation was much weaker with DHS sites of other location categories ( Fig 4C). In contrast to the DHS values, the total numbers of DHS sites showed only marginal, if any, positive correlation with the RPKM of their associated genes ( Fig 5A). Nonetheless, the positive correlation was more notable in the +2kb to TSS and intergenic location categories than other categories ( Fig 5C).

Correlation coefficients between DHS and RPKM values
As an independent method of determining the correlation between DHS values and RPKM, we calculated Kendall rank correlation coefficients (tau values). In agreement with Fig 4A, positive correlation coefficient values were obtained for all 3 cell types when all DHS sites were considered (Table 1). Similarly, highest positive correlation coefficients were obtained with DHS sites in the -2kb to TSS and TSS to +2kb location categories ( Table 1).

Dramatic changes of DHS site during Th1 and Th2 differentiations
To determine whether there is genome wide chromatin remodeling during the

Correlations between the changes of DHS values and gene expression during Th1 and Th2 differentiation
Kendall rank correlation coefficients were calculated between ΔDHS of DHS sites and the changes of expression levels, i.e., ΔRPKM, of their associated genes ( Table 2).
When all DHS sites were considered, we detected positive correlations between ΔDHS and ΔRPKM during Th1 and Th2 differentiation. When DHS sites were divided into different location categories, strongest positive correlations were detected in the categories of -2kb to TSS and TSS to +2kb. For the remaining categories during Th1 differentiation, 1 category (TTS to TTS + 5kb) showed somewhat weaker positive correlation, but more strikingly 2 categories (+2kb to TTS and intergenic) showed statistically significant negative correlations; and 1 category (-5 to -2kb) showed negative correlation that was not statistically significant. The negative correlations suggest that overall DHS sites in these location categories may be more likely to function as gene silencers during Th1 differentiation. In contrast, all the remaining categories of DHS sites showed positive correlations with changes of gene expression during Th2 differentiation.

Discussion
Although many studies have analyzed chromatin remodeling or DHS sites at the genomic loci of Th1 and Th2 effector cytokine genes, there have been no studies that have analyzed genome wide chromatin remodeling, and how genome wide chromatin structures correlate with global gene expression in these cells and during their differentiation from naïve CD4 T cells. The present study fills these gaps of knowledge.
We used DNase-Seq to map genome wide distribution of DHS sites in naïve CD4 T cells, Th1 and Th2 cells. We first determined how the DHS sites are distributed genome wide. As the location of a DHS is important for determining whether the DHS site may regulate the expression of its associated gene, we divided the DHS sites into 6 location categories based on their locations relative to the gene bodies. We found that in all cell types, the genomic regions within 2kb surrounding the TSS of the associated genes have the highest densities of DHS sites. To lesser degrees, the regions from -5kb to -2kb and within 5kb downstream of TTS also have relatively high densities of DHS sites. In contrast, the region from +2kb to TTS and intergenic region have low densities of DHS sites. These findings provided genome wide views of the chromatin structure landscapes of these cells.
For each DHS site, we assigned a DHS value to quantitatively measure its DNase I sensitivity or the "openness" of the chromatin. When all DHS sites were

DNase Seq and data processing
The DHS libraries were gel purified and sequenced on an Illumina HiSeq 1000 genome analyzer at the Marshall University Genomic Core Facility. Adaptor sequences were trimmed from the sequence reads using Trimmomatics (28). The trimmed reads were aligned to the mouse reference genome mm10 using Bowtie (29). DHS peaks were called with F-Seq (21). Gene annotation of the DHS peaks was performed using the Perl script annotatePeaks.pl in HOMER (30). For intergenic DHS peaks, we compared the distances from a DHS peak to the TTS of its upstream gene and to the TSS of its downstream gene, then chose the gene that is closer to the DHS peak as the associated gene. For quantitative measurement, we used a formula similar to the previously described (22) to define the DHS value of a DHS site as !i/( where Ni is the tag count in a 600bp interval centered on the i-th DHS peak and m is the total number of peaks in a cell type. ΔDHS values between 2 cell types were determined as previously described (22).

RNA-Seq data processing
RNA-Seq analyses of gene expression in naïve CD4 T cells, Th1 and Th2 cells were described previously (24). The RNA-Seq raw data files (2 replicas for each cell type) in FASTQ format were summarized in Table S3. All the data files passed quality control test by FastQC (31). The RNA-Seq reads were aligned to the mouse reference genome mm10 using TopHat (32). Gene expression quantification was performed using HTseqcount pipeline (33). We calculated the RPKM value for each gene using locally

Analyses of relationship between DHS sites and gene expression
Kendall tests were performed between the DHS values and the RPKM values for all or each of the location categories of the DHS sites using the Kendall R package (34) to calculate correlation coefficient or "tau" values. P values were obtained using the GeneNet R package (35) to determine whether the correlations were statistically significant. Relations between ΔDHS and ΔRPKM were analyzed similarly.     Table S1.