Chromatin Immunoprecipitation
A full-length wild-type AR was transfected into PC3 cells, a prostate cancer cell line that does not express the AR. The efficiency of the transfection was measured by Western blotting using an antibody against the human AR. The transfected PC3 cells showed a ten-fold increase in AR protein expression compared with the baseline level. For chromatin immunoprecipitation (ChIP), two anti-AR antibodies from BD Pharmingen were tested and the better-performing one (BD Pharmingen Cat No. 554224) was used. A 'no antibody' ChIP control was included. The specificity of the chromatin IP was confirmed by PCR amplification of the promoter of the prostate specific antigen (PSA) gene, a gene shown to be directly regulated by the AR[24]. The DNA isolated by anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNA isolated by 'no antibody' chromatin IP did not.
In detail, PC3 cells were grown in two T-75 culture flasks in RPMI + 10% FBS + 1X L-glutamine. After two days, the cells were at 85% confluency and were transfected with a plasmid carrying the human androgen receptor gene driven by a CMV promoter. The transfection was carried out according to Invitrogen's Lipofectamine 2000 protocol. The efficiency of the transfection was measured by Western blot using antibodies against the human androgen receptor (BD Pharmingen Cat No. 554224 and 554225). In Western blot analysis, antibody 554224 reacted about 20-fold more strongly than antibody 554225 against the AR-DNA complexes, and antibody 554224 was therefore used for chromatin IP. AR expression in the transfected PC3 cells showed a ten-fold increase compared with the baseline level. Forty-eight hours after transfection, protein was cross-linked to DNA by adding formaldehyde according to the protocol of Upstate's chromatin immunoprecipitation kit. A 'no antibody' control chromatin IP was also performed.
ChIP-Seq data analysis
ChIP-Seq has gained popularity since its development in 2007[25–27].
To date, a large number of ChIP-Seq data analysis programs have been developed by researchers around the world[28–33].
Broadly, ChIP-Seq peak-callers can be divided into three categories:
(A) Those that can perform only 'one-sample' ChIP-Seq data analysis, including F-Seq[34], GeneTrack[35], and others;
(B) Those that can perform both 'one-sample' and 'two-sample' ChIP-Seq data analysis, including QuEST[36], MACS[37], SISSRs[38], CisGenome[39], ERange[40], MOSAICS[41], and others;
(C) Those that can perform only 'two-sample' ChIP-Seq data analysis, including CCAT[42], PeakSeq[43], CMT[33], and others.
In this research, the chromatin IP DNA for the androgen receptor (AR) in PC3-AR cells was digested with proteinase K and then purified with the Qiagen Qiaquick PCR purification kit. ChIP DNA end repair, adaptor ligation, and amplification were performed as described earlier[22, 25]. Fragments of about 100 bp (without linkers) were isolated from an agarose gel and sequenced on the Solexa/Illumina 1G Genome Analyzer. Solexa Pipeline Analysis was performed as described earlier[22, 25]. Sequence reads that mapped to multiple sites in the human genome were removed. The output of the Solexa Analysis Pipeline was converted to browser extensible data (BED) files for viewing the data in the UCSC genome browser.
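The read filtering and BED conversion can be illustrated with a minimal Python sketch. It is not the Solexa Analysis Pipeline itself: the input layout (one tuple of read ID, chromosome, 0-based start, and strand per alignment), the function names, and the read length used in the toy example are assumptions made only for illustration.

```python
from collections import Counter


def keep_unique_reads(alignments):
    """Drop every read that aligns to more than one genomic site.

    alignments: iterable of (read_id, chrom, start, strand) tuples,
    one tuple per alignment (hypothetical layout for this sketch).
    """
    alignments = list(alignments)
    hits = Counter(read_id for read_id, _, _, _ in alignments)
    return [a for a in alignments if hits[a[0]] == 1]


def to_bed_lines(alignments, read_length):
    """Format alignments as 6-column BED lines (0-based start, exclusive end)."""
    lines = []
    for read_id, chrom, start, strand in alignments:
        end = start + read_length
        lines.append(f"{chrom}\t{start}\t{end}\t{read_id}\t0\t{strand}")
    return lines


# Toy input: read r2 maps to two sites and is therefore removed.
example = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 500, "-"),
    ("r2", "chr7", 900, "-"),
    ("r3", "chrX", 250, "+"),
]
# read_length=25 is a placeholder value, not the read length of this dataset.
bed = to_bed_lines(keep_unique_reads(example), read_length=25)
```

Each resulting line follows the BED column order (chromosome, start, end, name, score, strand), which is what the UCSC genome browser expects for a custom track.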
To systematically identify AR binding 'islands' or 'peaks' in the androgen receptor ChIP-Seq dataset (Supplementary_File_6) in the PC3-AR cells, 'one-sample' ChIP-Seq data analysis was adopted in this study.
There are two reasons for choosing the 'one-sample' ChIP-Seq data analysis in this study:
Firstly, the IgG control ChIP-Seq dataset for the androgen receptor (AR) in the PC3 cell line was not properly backed up after the initial publication[22]. It is therefore not possible to perform 'two-sample' ChIP-Seq data analysis in this study, since only the AR treatment ChIP-Seq dataset remains. In addition, although incorporating a control sample into ChIP-Seq data analysis can eliminate some false positive peaks in some circumstances[33, 42, 43], an open question is whether the peaks called by a ChIP-Seq peak-caller in 'two-sample' mode are equivalent to the complete set of genuine peaks in a standard ChIP-Seq experiment. If not, there may still be genuine peaks that are predicted by a peak-caller in 'one-sample' mode but not in 'two-sample' mode. In this sense, 'one-sample' ChIP-Seq data analysis may still be valuable, especially when no control ChIP-Seq experiment was performed or the control ChIP-Seq dataset has been lost (e.g., not backed up).
Secondly, the numbers of sequencing tags in the AR treatment and IgG control ChIP-Seq datasets from the previous publication by Lin et al.[22] are highly unbalanced (for example, 5,354,469 sequencing tags for the AR treatment versus 1,089,089 for the IgG control). Usually, a ChIP-Seq peak-caller in 'two-sample' mode (such as CCAT[42], MACS[37], or CisGenome[39]) normalizes the treatment and control ChIP-Seq datasets to make them more balanced for the analysis. It has been argued that ChIP-Seq data analysis in 'two-sample' mode can eliminate false positives. This raises another open question: are false positive peaks really eliminated after the sequencing reads of the treatment and control ChIP-Seq datasets are normalized in 'two-sample' mode?
For example, CCAT[42] is one of the most popular ChIP-Seq peak-callers; it performs only 'two-sample' ChIP-Seq data analysis and is based on a normalization method. However, CCAT sometimes predicts peaks with very few sequencing tags (as few as 2) as putative significant peaks. Although one may argue that peaks with as few as 2 sequencing tags represent weak transcription factor (TF) binding or histone modification (HM) signals, other researchers consider that such peaks may not differ from background noise, and ChIP-Seq data analysis programs such as CisGenome[39], TIP[44], and iTAR[45] usually filter them out as insignificant. In this sense, since 'one-sample' ChIP-Seq data analysis usually reports many more peaks than 'two-sample' analysis, in a number of cases 'one-sample' analysis may not be worse than 'two-sample' analysis in terms of the total number of genuine peaks found, if the false-positive and false discovery rate (FDR) reducing effects of 'two-sample' analysis are disregarded.
In this study, four ChIP-Seq peak-callers (MACS14, SISSRs_v1.4, CisGenome_v2.0, and ERange 2.1) were run in 'one-sample' mode on the androgen receptor ChIP-Seq dataset alone in the PC3-AR cells.
MACS14 was run with default parameters.
SISSRs_v1.4 was run with the '-u' option turned on, a p-value cutoff of 0.05, and an FDR cutoff of 0.05; all other parameters were left at their defaults.
CisGenome_v2.0 was run with default parameters.
ERange 2.1 was run with 'minHits' set to 5 and all other parameters at their defaults.
The peak-calling results of different ChIP-Seq peak-calling programs usually vary to some extent for the same treatment ChIP-Seq dataset (in 'one-sample' mode) or for the same treatment and control ChIP-Seq datasets. In PeakFinderMetaServer (PFMS)[46], to account for this heterogeneity, the authors introduced a voting mechanism to identify putative significant peaks. The number of votes in PFMS[46] is defined as the number of peak finders that called the region. Generally speaking, the more votes a region receives, the more likely it is to be a significant peak.
In this study, a similar voting mechanism is proposed. Rather than voting on the peaks called by the different ChIP-Seq peak-calling programs, the genes in the human genome are voted on instead. If a gene is supported by at least half of the ChIP-Seq peak-callers used in the analysis (that is, at least 2 of 4 in this study), the gene is regarded as a putative significant gene in the ChIP-Seq data analysis (majority voting).
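The gene-level voting rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each peak-caller's peaks are assumed to have already been assigned to genes upstream (the text does not specify how), and the function name, the `min_fraction` parameter, and the placeholder gene labels are hypothetical.

```python
from collections import Counter
from math import ceil


def vote_genes(genes_by_caller, min_fraction=0.5):
    """Return {gene: votes} for genes supported by enough peak-callers.

    genes_by_caller: dict mapping a peak-caller name to the collection of
    genes supported by its peaks (peak-to-gene assignment is assumed done).
    min_fraction: fraction of callers that must support a gene (0.5 here,
    i.e. at least 2 of 4 callers).
    """
    min_votes = ceil(min_fraction * len(genes_by_caller))
    votes = Counter()
    for genes in genes_by_caller.values():
        votes.update(set(genes))  # one vote per caller per gene
    return {gene: n for gene, n in votes.items() if n >= min_votes}


# Placeholder gene labels, not results from this study.
calls = {
    "MACS14": {"GENE_A", "GENE_B"},
    "SISSRs_v1.4": {"GENE_A", "GENE_C"},
    "CisGenome_v2.0": {"GENE_A", "GENE_B"},
    "ERange 2.1": {"GENE_A"},
}
selected = vote_genes(calls)
```

In this toy input, GENE_A receives four votes and GENE_B two, so both pass the threshold of two votes, while GENE_C, supported by a single caller, is dropped.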
ChIP on chip analysis
The chromatin IP DNA was digested with proteinase K and then purified with the Qiagen Qiaquick PCR purification kit. The DNAs were then blunt-ended with T4 DNA polymerase (in a 75 µl volume: 5X T4 DNA pol buffer, 0.7 µl of 10 mg/ml BSA, 0.2 µl of 5 U/µl T4 DNA pol, and 1 µl of 10 mM each dNTP) for 5 minutes at 37°C. The reactions were stopped by incubating at 75°C for 10 minutes, and the fragments were purified using the Qiagen Qiaquick PCR purification kit. The DNA was eluted with 100 µl dH2O, dried in a speed vacuum, and resuspended in 7.5 µl dH2O. Equimolar chip linkers, Linker A: 5'-CTGCTCGAATTCAAGCTTCT-3' and Linker B: 5'-AGAAGCTTGAATTCGAGCAGTCAG-3', were annealed according to the IDT protocol. To each sample, 0.5 µl of 2.5 µM annealed linker, 1 µl of 10X ligase buffer, and 1 µl of ligase were added. The ligation reactions were incubated at 16°C for two days, and the DNA was purified using the Qiagen Qiaquick PCR purification kit, eluting with 80 µl dH2O. The specificity of the chromatin IP was confirmed by PCR amplification of the PSA promoter. The PCR primer sequences are available upon request. The DNAs isolated by anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNAs isolated by 'no antibody' chromatin IP did not. The DNAs from the AR-ChIP were labeled with Cy3-dCTP and the DNAs from the 'no antibody' ChIP were labeled with Cy5-dCTP by PCR. The PCR product was purified using the Qiagen Qiaquick PCR purification kit. The labeled DNAs were mixed in equal amounts and hybridized to the promoter chip. Hybridization was performed by NimbleGen Inc. using NimbleGen's human HGS17 Promoter 1.0K Chip. This promoter chip contains 37,364 promoter regions from human genome build 35. For each promoter region, 10 probes of 50 mers were placed on the 1.0 kb promoter region, with a probe spacing of about 100 bp.
ChIP-chip data analysis
Eight NimbleGen ChIP-chips (four for the IP-enriched sample and four for the whole cell lysate) were used to probe the DNA segments that the AR binds.
The promoter array includes 37,215 promoters (corresponding to 19,028 genes), each of which is covered by 10 probes spanning 1 kb. In many cases, multiple promoters were used to monitor AR binding to a single gene. Promoters with significant differences between the IP-enriched and whole cell lysate samples were selected as follows. First, the eight datasets were normalized using quantile normalization[47]. Second, the log2 ratio test was applied using the empirical distribution from kernel density estimations as explained above. Finally, for each promoter, an overall p-value was calculated by combining the resulting p-values of the 10 probes using Fisher's combined probability test, and the promoters were ranked by p-value in ascending order.
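The final step, combining the 10 per-probe p-values of a promoter and ranking promoters, can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: Fisher's method assumes the combined p-values are independent, which may hold only approximately for neighbouring probes, and the function names and promoter labels are hypothetical.

```python
from math import exp, log


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom its survival
    function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return min(1.0, exp(-half_x) * total)


def rank_promoters(probe_pvalues):
    """Return (promoter, combined p-value) pairs sorted in ascending order."""
    combined = {
        name: fisher_combined_pvalue(ps) for name, ps in probe_pvalues.items()
    }
    return sorted(combined.items(), key=lambda item: item[1])


# Placeholder probe p-values, not data from this study.
ranked = rank_promoters({
    "promoter_2": [0.5] * 10,
    "promoter_1": [0.01] * 10,
})
```

The closed-form survival function avoids a dependency on a statistics library; with SciPy available, `scipy.stats.combine_pvalues` offers the same method.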
Since there is no consensus in the scientific community on the p-value cutoff for determining significance (http://www.upcscavenger.com/wiki/p-value/#page=wiki), a less stringent cutoff of p-value = 0.1 was used in this research. Genes with p-values less than or equal to 0.1 were selected as putative significant genes in the androgen receptor (AR) ChIP-chip data analysis in the PC3-AR cells.