Experimental
LNCaP Cell culture and DHT treatment
LNCaP cells (#CRL-1740, ATCC) were grown in phenol red-free RPMI (#11835030, GIBCO) with 10% charcoal-stripped FBS (#100–119, GemBio) for 3 days and then were stimulated with 10nM dihydrotestosterone (DHT) (A8380, Sigma) for 0.5, 4, 16 or 72 hours. For the vehicle samples, the cells were treated with 100% EtOH. Subsequently, cells were collected for further experiments (ChIP-seq, RNA-seq, ATAC-seq, HiChIP, or Start-seq) accordingly38. LNCaP cells were authenticated by sequencing and comparing short tandem repeats to parental LNCaP cells in the ATCC database. Prior to experiments, cells were tested for mycoplasma contamination with LookOut Mycoplasma PCR Detection Kit (Sigma-Aldrich #D9307).
Chromatin Immunoprecipitation Assays with Sequencing (ChIP-seq)
ChIP-seq in LNCaP was performed as previously described38. Briefly, 10 million cells were fixed with 1% formaldehyde at room temperature for 10 min, quenched, and collected in lysis buffer (1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS and protease inhibitor [#11873580001, Roche] in PBS). Chromatin was sonicated with a Covaris E220 sonicator (140PIP, 5% duty cycle, 200 cycle burst). The sample was then incubated with antibodies (AR: Abcam ab133273, FOXA1: Abcam ab23738, H3K27ac: Diagenode 3C15410196; H3K4me3: Diagenode 3C15410003) coupled with Dynabeads protein A and protein G beads (Life Technologies) at 4 oC. Incubated chromatin was washed with RIPA wash buffer (100 mM Tris pH 7.5, 500 mM LiCl, 1% NP-40, 1% sodium deoxycholate) for 10 min six times and rinsed with TE buffer (pH 8.0) once. DNA was purified using a MinElute column followed by incubation in the de-crosslinking buffer (1% SDS, 0.1 M NaHCO3 with Proteinase K and RNase A) at 65 oC. Eluted DNA was prepared as sequencing libraries with the ThruPLEX-FD Prep Kit (Takara bio, # R400675). Libraries were sequenced with 150-BP PE on an Illumina HiSeq 2500 Sequencing platform at Novogene.
RNA-Seq
LNCaP cells (5x10^5) were harvested for RNA-seq39. Total RNA was collected from the cells using an RNeasy kit (Qiagen, #74104,) with DNase I treatment (Qiagen, #79254). The library preparations, quality control, and sequencing on HiSeq 2500 Sequencing platforms (150-BP PE) were performed by Novogene.
Assay for transposase-accessible chromatin with sequencing (ATAC-seq)
LNCaP cells were isolated and subjected to modified ATAC–seq as previously described38. Briefly, 50,000 nuclei were pelleted and resuspended in ice-cold 50µl of lysis buffer (10 mM Tris-HCl, pH 7.5, 10mM NaCl, 3mM MgCl2, 0.1% NP-40, 0.1% Tween20, and 0.01% Digitonin). The subsequent centrifugation was performed at 500 g for 10 min at 4°C. The nuclei pellets were resuspended in 50µl of transposition buffer (25µl of 2× TD buffer, 22.5µl of distilled water, 2.5 µl of Illumina Tn5 transposase) and incubated at 37°C for 30 min with shaking at 1000 rpm for fragmentation. Transposed DNA was purified with the MinElute PCR Purification kit (Qiagen). DNA was purified using Qiagen MinElute (#28004), and the library was amplified up to the cycle number determined by 1/3rd maximal qPCR fluorescence. ATAC-seq libraries were sequenced with 150-BP PE high-throughput sequencing on an Illumina HiSeq 2500 Sequencing platform (Novogene).
HiC combined with capture ChIP-seq (HiChIP)
HiChIP was performed as previously described39. Trypsinized LNCaP cells (10 million) were fixed with 1% formaldehyde at room temperature for 10 min and quenched. The sample was lysed in HiChIP lysis buffer and digested with MboI (NEB) for 4 h. After 1 h of biotin incorporation with biotin dATP, the sample was ligated with T4 DNA ligase for 4 h with rotation. Chromatin was sonicated using Covaris E220 (conditions: 140 PIP, 5% DF, 200 CB) to 300–800 bp in ChIP lysis buffer (1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS and protease inhibitor in PBS) and centrifuged at 13,000 rpm. for 10 minutes at 4°C. Preclearing 30µl of Dynabeads protein A/G for 1 h at 4°C was followed by incubation with antibodies (H3K27ac, Diagenode, 3C15410196; H3K4me3, Diagenode 3C15410003). Reverse-crossed IP sample were pulled down with streptavidin C1 beads (Life Technologies), treated with Transposase (Illumina) and amplified with reasonable cycle numbers based on the qPCR with a 5-cycle pre-amplified library. The library was sequenced with 150-BP PE reads on the Illumina HiSeq 2500 Sequencing platform (Novogene).
Small Capped Nascent RNA Sequencing (Start-seq)
For Start-seq, LNCaP cells were grown and collected as described as above. Cell pellets were washed with ice-cold 1x PBS. 1 million cells were treated with 1.5mL of Nun Buffer (0.3M NaCl, 1M Urea, 1% NP-40, 20mM HEPES pH 7.5, 7.5mM MgCl2, 0.2M EDTAm, protease inhibitors and 20U/mL SUPERase-IN) for 30 min on ice with frequent vortexing. Chromatin was pelleted by centrifugation at 12500 g for 30 min for 4°C. After 3 times-washing with 1mL ice-cold chromatin washing buffer (50mM Tris-HCL pH7.5 and 40U/mL SUPERase-In) and additional 0.5mL of Nun buffer. After centrifugation for 5 min at 500 g, 0.5m TRIzol was added to the remaining pellet. Libraries were prepared according to the TruSeq Small RNA Kit (Illumina). To normalize samples, 15 synthetic capped RNAs were spiked into the Trizolpreparation at a specific quantity per 10^7 cells, as in40. The library was sequenced with 150-BP PE reads on the Illumina HiSeq 2500 Sequencing platform (Novogene).
CRISPRi
Guide RNAs (gRNAs) were designed for each enhancer and promoter region using CRISPR-SURF41. Cas-OFFinder was used to eliminate off-target gRNAs42. LNCaP cells stably expressing dCas9-KRAB (Addgene #89567) were seeded in a 6-well plate at a density of 200K cells per well. For transfection, a total of 500-1500ng DNA was used, divided according to the number of available gRNAs. Transfection was performed using Mirus TransIT-X2. After transfection, the media was replaced and 2ng/µl of puromycin was added for selection. Following 72 h of selection, the medium was changed to charcoal-stripped serum for androgen deprivation. After 48 h, the cells were treated with 10nM DHT for 4 h and then trypsinized. RNA extraction and cDNA preparation was performed using the LunaScript® RT SuperMix Kit. The androgen-induced expression was quantified using qRT-PCR and each experiment was conducted in triplicate. The sequences of gRNAs and qRT-PCR primers can be found in Supp Table 1.
Bioinformatics Analyses
RNA-seq analysis
Reads were aligned to the hg19 human genome with STAR (v2.7.9)43 with quant mode (--quantMode TranscriptomeSAM). Next, “toTranscriptome” bam files were fed into Salmon (v0.14.1)44 to quantify TPM values for (Gencode v19)45 transcripts. To generate the signal track files from RNA-seq, VIPER46 is used. Briefly, STAR as the default aligner, converts BAM files into BigWig files using Bedtools (v2.30.0)47.
ChIP-seq and ATAC-seq analyses
ChIP-seq and ATAC-seq were processed through the ChiLin pipeline48. Briefly, Illumina Casava1.7 software used for base calling and raw sequence quality and GC content were checked using FastQC (v0.10.1). The Burrows–Wheeler Aligner (BWA49, v0.7.10) was used to align the reads to the human genome hg19. Then, MACS250 (v2.1.0.20140616) was used for peak calling with an FDR q-value threshold of 0.01. Bed files and Bigwig files were generated using bedGraphToBigWig51 and the union of narrow and broad peaks from ChIP-seq were used as anchors to call loops. The following quality metrics were assessed for each sample: (i) percentage of uniquely mapped reads, (ii) PCR bottleneck coefficient to identify potential over-amplification by PCR, (iii) FRiP (fraction of non-mitochondrial reads in peak regions), (iv) peak number, (v) number of peaks with 10-fold and 20-fold enrichment over the background, (vi) fragment size, (vii) the percentage of the merged peaks with promoter, enhancer, intron, or intergenic, and (viii) peak overlap with DNaseI hypersensitivity sites. For datasets with replicates, the replicate consistency was checked by two metrics: 1. Pearson correlation of reads across the genome using UCSC software wigCorrelate after normalizing signal to reads per million and 2. percentage of overlapping peaks in the replicates.
HiChIP Loop calling
After trimming adaptors from the HiChIP datasets using Trim Galore (v0.5.0) (https://github.com/FelixKrueger/TrimGalore), we used HiC-Pro (v3.1.0)52 as previously described in Giambartolomei and Seo et al.39. This aligned the reads to the hg19 human genome, assigned reads to MboI restriction fragments, and removed duplicate reads. We used the following options: <MIN_MAPQ = 20, BOWTIE2_GLOBAL_OPTIONS = –very-sensitive–end-to-end–reorder, BOWTIE2_LOCAL_OPTIONS = –very-sensitive–end-to-end–reorder, GENOME_FRAGMENT = MboI_resfrag_hg19.bed, LIGATION_SITE = GATCGATC, LIGATION_SITE = “GATCGATC,” BIN_SIZE = “5000.”> All other default settings were used. To build the contact maps, the HiC-Pro pipeline selects only uniquely mapped valid read pairs involving two different restriction fragments. We applied FitHiChIP (v10.0)53 for bias-corrected peak and DNA loop calls. FitHiChIP models the genomic distance effect with a spline fit, normalizes for coverage differences with regression, and computes statistical significance estimates for each pair of loci. We used the FitHiChIP loop significance model to determine whether interactions are significantly stronger than the random background interaction frequency. As anchors to call loops in the HiChIP analyses, we used 842367 regions for H3K27ac and 136939 regions for H3K4me3, which resulted from merging ChIP-seq narrow and broad peaks comprised 2 replicates for each broad and narrow peak and each of the five-time points (0m, 30m, 4h, 16h, 72h) for H3K27ac, and 1 replicate for each broad and narrow peak and each of the same time points for H3K4me3. We used a 5 kb resolution and considered only interactions between 5 kb and 3 Mb. We used the peak to all for the foreground, meaning at least one anchor needed to be in the peak rather than both. The corresponding FitHiChIP options specification is < IntType = 3 > For the global background estimation of expected counts (and contact probabilities for each genomic distance), FitHiChIP can use either peak-to-peak (stringent) or peak-to-all (loose) loops for learning the background and spline fitting. We specified the suggested option to merge interactions close to each other to represent a single interaction when their originating bins are closer. The corresponding FitHiChIP options specifications are < UseP2PBackgrnd = 0 > and < MergeInt = 1> (FitHiChIP (L + M)). We used the default FitHiChIP q value < 0.01 to identify significant loops. We used hicpro2higlass (v3.1.0) to convert allValidPairs to .cool files after having specified the hg19 chromosome sizes, using the following command: <hicpro2higlass.sh -i sample.allValidPairs -r 5000 -c chrsizes -n > The reads from HiChIP data were merged from every time point for H3K27ac and H3K4me3 separately, and the reference loop sets were called with the same parameters above (resulting loop number, n = 296,326; n = 278,491 respectively). Next, the contact frequency values of each time point at loops are captured from the .cool files using the Python package, Cooler (v0.9.3). The count table was then TMM normalized among time points to reduce the between-sample variation.
Annotation of Cis-Regulatory Elements / Defining Background Genes
CREs were defined as +/- 2.5 kb around the summit of the accessible region from ATAC-seq peaks at any given time point. Promoters were defined with a multi-step process. First, we identified the highest expression transcript (Gencode v19)45 isoforms using Salmon (v0.14.1)44, see RNA-seq analysis. Next, the start locations -according to strand information- of the highest expressing transcripts were collected and the summit was extended +/2.5 kb. If they overlap with a defined CRE, we assigned the overlapping CRE as the active promoters. From these annotated active promoters, the genes of HALLMARKS_ANDROGEN_RESPONSE from the Molecular Signature Database (MSigDB)54 were defined as AR-regulated genes. The transcriptome was divided into four quartiles and AR-regulated genes were compared with similarly expressing genes (Supp Fig. 1A, B). Overlapping CREs to an AR ChIP-seq peaks at any time point were defined as potential AR-bound enhancers. The median number AR-bound enhancers (E + AR; ~2) of AR-regulated genes were less than AR-free enhancers (E-AR; ~14) (Supp Fig. 1E). Similarly, a CRE was considered FOXA1-bound, if it intersects with a FOXA1 peak at any time point (Supp Fig. 1C). To identify down-regulated genes, we calculated the log2foldChange induction between 16h and 0m, considering those with a value below − 1 as downregulated genes (Supp Fig. 3A, B).
Constructing the Graph Network
When we called significant loops from each time point separately, there were few called loops overlapping between time points potentially due to both the loop calling methodologies and experimental noise present in HiChIP data55. As we could observe the matching loops in the contact matrices of all time points (Supp Fig. 5), we instead generated a reference loop set, normalized the count matrix, and compared contact frequencies similar to published work20. A custom R (v4.1.1) script (https://github.com/lacklab/AR_transcription) was used to extract interaction pairs of annotated CRE regions (BED) according to the reference loop set (BEDPE; both H3K27ac and H3K4me3) using `GenomicInteractions` (v1.34)56. The graph structure is built in custom Python (v3.9.7) script (link) using the NetworkX (v3.1) package57. Briefly, any pair of CREs are included as nodes in the network with TMM normalized HiChIP contact frequency as weighted edges, for both H3K27ac and H3K4me3 at every time point.
Epigenetic changes of CREs and kinetic changes of enhancers of AR-regulated genes
The average signal (AR, FOXA1, H3K27ac, H3K4me3, ATAC) at the promoters and CREs were collected using a custom python package (https://github.com/breambio/bluegill). To reduce the batch effect across time, TMM normalization was applied individually for each epigenetic factor58. To visualize the temporal change upon androgen stimulation, line plots were drawn (Fig. 2). For the selected six AR-regulated genes (Fig. 3F), both the average signal (AR, FOXA1, H3K27ac, H3K4me3, ATAC) over the first-degree interacting enhancers of each gene promoter and contact frequency (CF) between every gene promoter and corresponding enhancer were collected for every time point. The standard deviation (SD) of each feature along the time domain was calculated and min-max normalized.
Calculation of chromatin contact frequency change
Promoter-centric analysis was done with the query loop sets, AR-regulated genes’ promoter loops (P + AR), and random highly expressed genes’ promoter loops (P-AR; n = 100 ; seed = 7). This was compared to reference loop sets (n = 100) that was randomized (1000 iterations; seed = 7) from highly expressed genes’ promoter loops (Fig. 3B). Fold change was calculated for each iteration between the average contact frequency of loops in the query and reference. The same approach was used to calculate the contact frequency fold change from the enhancer viewpoint. The loop sets were selected according to the E-P pairs. While selecting the query and reference loops, the number of randomly selected promoters was fixed (n = 100). The loops between AR-regulated gene promoters to AR-bound (E + AR) and AR-free enhancers (E-AR), and between AR-independent gene promoters to AR-free enhancers (E-AR) were compared to randomized loops from an AR-free enhancer to highly expressed gene promoters (1000 iterations; seed = 7).
Start-seq analysis
Adaptor sequences and low-quality 3′ ends were removed from paired-end reads of all samples using cutadapt (v1.2.1). Reads shorter than 20 nucleotides were discarded (-m 20 -q 10), and a single nucleotide was trimmed from the 3′ end of all remaining reads to enable successful alignment with bowtie (v1.1.1). The first in pair flagged reads were filtered to generate signal (bigwig) files for each strand using bedtools genomecov (-bg -5 -strand +/-, respectively).
The transcripts (Gencode v19) were extracted GTF file. The maximum signal within a range of +/- 500 bp of TSS was gathered from the forward Start-seq track for plus-stranded transcripts, and the reverse for minus-stranded transcripts. Next, the log2foldchange (LFC), compared to 0m, was calculated for every time point (30m, 4h, 16h, 72h). If a transcript was found to be LFC > 1 at any time point, it was considered as differential up regulation. Next, the nascent expression levels at all time points for those transcripts (the union of differential up-regulation) were z-normalized to capture the highest expression time point, which is assigned to time-based expression groups (Fig. 4A). Similar to enhancer viewpoint analysis, the first-degree AR-bound enhancer contact frequency to the gene promoters was compared to randomly selected contact frequencies (1000 iterations) between the highly expressed gene promoters (n = 100) and their first-degree AR-free enhancers.
Contact frequency, expression correlation
To explore the contact frequency vs. expression, we performed three first-degree summarization models (average, maximum, and sum). Briefly, we identified all first-degree contacts of every gene promoter, then we summarized the contact frequencies with the corresponding function of the models for every gene. To reduce the signal noise, we first sorted genes according to expression, binned them into equal-sized sets (k = 25), plotted bins as a scatter plot with average log expression on the x-axis, and averaged summarized contact frequency on the y-axis (Fig. 5B, Supp Fig. 7).
Random Forest Regressor
The binned (k = 25) contact frequency and expression data (see above) were used to train a random forest regressor from the Sklearn (v1.3.1)59 with “random_state = 0”. The permutation importance of each feature is also calculated within the Sklearn60.
Gini Index Analysis
The chromatin contact frequency between gene promoter and first-degree interacting elements was identified and the Gini index was calculated for a 16h time-point (Fig. 5C). To calculate the Gini index a custom python function was utilized (https://github.com/lacklab/AR_transcription). The mean absolute difference is the average absolute difference between all pairs of items in the population. The relative mean absolute difference is obtained by dividing the mean absolute difference by the population's average (\(\overline{x}\)) to account for differences in scale (e.q 1), where \({x}_{i}\) is contact frequency of a loop i, and there are n loops of a promoter.
$$G= \frac{\sum _{i=1}^{n}\sum _{j=1}^{n}\left|{x}_{i}-{x}_{j}\right|}{2{n}^{2}\overline{x}}$$
Circle Plots
Circle plots were drawn with NetworkX and Matplotlib libraries57,61 (https://github.com/lacklab/AR_transcription).
Statistical analysis
The probability of finding k loops from each CREs was determined by hypergeometric distribution (Supp Fig. 4). When comparing the target set with the background set, a non-parametric Mann-Whitney-U test was applied. Correlation coefficients and their p-values were calculated with Spearman’s test. All statistical tests were performed using the SciPy (v1.11.1) Python package62.