DOI: https://doi.org/10.21203/rs.3.rs-1977193/v1
Background:
TMPRSS2 and ERG, which can form the TMPRSS2-ERG gene fusion, are two important genes in prostate cancer cells.
Previous work by others has found that ERG can disrupt the androgen receptor (AR) signaling pathway and that the TMPRSS2-ERG gene fusion plays a pivotal role in prostate cancer progression.
Results: In this study, an androgen receptor negative prostate cancer cell line (PC3) was transfected with wild-type androgen receptor (AR), and both AR ChIP-Seq and ChIP-chip data were generated in the resulting advanced PC3-AR cells. Bioinformatics analysis of both datasets found that TMPRSS2 and ERG are AR-targeted putative highly significant genes in PC3-AR cells.
Conclusions: The identification of TMPRSS2 and ERG as AR-targeted putative highly significant genes in advanced PC3-AR cells could serve the international scientific community in biomarker identification and in developing novel therapeutic strategies for prostate cancer.
The TMPRSS2 gene encodes a protein that belongs to the serine protease family. The protein contains a type II transmembrane domain, a receptor class A domain, a scavenger receptor cysteine-rich domain and a protease domain. Serine proteases are involved in many physiological and pathological processes, and TMPRSS2 has been demonstrated to be up-regulated by androgenic hormones in prostate cancer cells and down-regulated in androgen-independent prostate cancer tissue (https://www.genecards.org/cgi-bin/carddisp.pl?gene=TMPRSS2).
The ERG gene encodes a member of the erythroblast transformation-specific (ETS) family of transcription factors. All members of this family are key regulators of embryonic development, cell proliferation, differentiation, angiogenesis, inflammation, and apoptosis. The ERG gene is involved in chromosomal translocations that result in different fusion gene products, such as TMPRSS2-ERG and NDRG1-ERG in prostate cancer (https://www.genecards.org/cgi-bin/carddisp.pl?gene=ERG).
TMPRSS2 and ERG, which can form the TMPRSS2-ERG gene fusion, are two of the important genes in prostate cancer cells[1–21].
Previous work by Yu et al found that ERG can disrupt androgen receptor (AR) signaling and that the TMPRSS2-ERG gene fusion plays a pivotal role in prostate cancer progression, for example by blocking AR expression, binding to specific loci in the human genome, and inducing direct activation of the polycomb group protein EZH2[2].
In this study, an androgen receptor negative prostate cancer cell line (PC3) was transfected with wild-type androgen receptor, and both ChIP-Seq and ChIP-chip datasets were generated for the androgen receptor (AR) in the resulting PC3-AR cell line.
This study adopts two working hypotheses. First, if a gene is targeted by at least one putative significant peak within 50 kb of its transcription start site (TSS) in the ChIP-Seq or ChIP-chip data analysis, the gene is regarded as a putative significant gene. Second, the putative significant genes found by both technologies are regarded as putative highly significant genes. Under these hypotheses, TMPRSS2 and ERG are identified as AR-targeted putative highly significant genes in the AR ChIP-Seq and ChIP-chip data from advanced PC3-AR cells. This suggests that TMPRSS2 and ERG, which can form the TMPRSS2-ERG fusion pair, are potentially important in AR-mediated transcriptional regulation and could be exploited for biomarker identification and for developing new therapeutic strategies in prostate cancer.
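The two hypotheses above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, TSS positions and region coordinates are invented toy values, the function name `genes_near_peaks` is hypothetical, and distance is measured here from the TSS to the nearest edge of a region, which the text does not specify.

```python
from typing import Dict, List, Set, Tuple

WINDOW_BP = 50_000  # distance cutoff from the TSS, as stated in the text

Tss = Tuple[str, int]          # (chromosome, TSS position)
Region = Tuple[str, int, int]  # (chromosome, start, end)


def genes_near_peaks(tss: Dict[str, Tss], peaks: List[Region],
                     window: int = WINDOW_BP) -> Set[str]:
    """Return genes with at least one region within `window` bp of the TSS."""
    hits: Set[str] = set()
    for gene, (chrom, pos) in tss.items():
        for p_chrom, start, end in peaks:
            if p_chrom != chrom:
                continue
            # Distance to the nearest region edge; 0 if the TSS lies inside.
            # (Assumption: the paper may measure distance differently.)
            dist = max(start - pos, pos - end, 0)
            if dist <= window:
                hits.add(gene)
                break
    return hits


# Toy annotation and regions, invented purely for illustration.
toy_tss = {
    "GENE_A": ("chr1", 100_000),
    "GENE_B": ("chr1", 500_000),
    "GENE_C": ("chr2", 100_000),
}
chipseq_regions = [("chr1", 120_000, 120_400), ("chr2", 160_000, 160_300)]
chipchip_regions = [("chr1", 99_500, 100_500), ("chr1", 480_000, 481_000)]

chipseq_genes = genes_near_peaks(toy_tss, chipseq_regions)
chipchip_genes = genes_near_peaks(toy_tss, chipchip_regions)

# Second hypothesis: genes found by both technologies.
highly_significant = chipseq_genes & chipchip_genes
```

In this toy case only GENE_A is supported by both region sets, so it alone would count as a putative highly significant gene.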
Identification of TMPRSS2 and ERG as androgen receptor targeted putative highly significant genes in PC3-AR cells
The results of the androgen receptor (AR) ChIP-Seq 'one-sample' data analysis with the MACS14, SISSRs_v1.4, CisGenome_v2.0 and ERange 2.1 programs are presented in Supplementary Files 1–4.
All 4 ChIP-Seq peak-callers in the 'one-sample' data analysis support the ERG gene, and 2 of the 4 (CisGenome_v2.0 and SISSRs_v1.4) support the TMPRSS2 gene. Both genes therefore pass the majority voting cutoff (>= 50% of peak-callers) described in the Materials and Methods section and are recognized as putative significant genes in the AR ChIP-Seq 'one-sample' data analysis in PC3-AR cells.
After ChIP-chip data analysis, 8880 putative significant promoters have p-values below the 0.1 cutoff; they correspond to 6347 putative significant genes, including the TMPRSS2 and ERG genes (Supplementary_File_5).
Based on the hypotheses defined above, both the TMPRSS2 and ERG genes are recognized as AR-targeted putative highly significant genes in PC3-AR prostate cancer cells, supported by both the ChIP-Seq and the ChIP-chip data analysis in this research.
Androgens and the androgen receptor (AR) play important biological roles in the development of urogenital tissues and the male phenotype and in the initiation and progression of a number of human diseases. Alterations in AR sequence and expression level and disruptions of AR signaling pathways have significant roles in the progression of prostate cancer[2, 22].
Previous studies by Lin et al have shown that the growth inhibition program of the AR is quite different from its proliferation program, and 3 novel AR motifs were found in the advanced PC3-AR cell line by integrating expression profiling and ChIP-seq analyses[22].
Yu et al used ChIP-Seq to comprehensively probe the relationships among the androgen receptor (AR), the TMPRSS2-ERG gene fusion and other factors in a compendium of 57 ChIP-Seq datasets, and found that the TMPRSS2-ERG gene fusion is important in the AR regulatory network and signaling pathway[2].
Pflueger et al discovered several novel non-ETS gene fusions in human prostate cancer. Interestingly, the non-ETS fusions were all present in prostate cancers harboring the TMPRSS2-ERG gene fusion, although they occur at low frequency[23].
In this study, an androgen receptor negative prostate cancer cell line (PC3) was transfected with wild-type androgen receptor, and both ChIP-Seq and ChIP-chip datasets were generated for the androgen receptor (AR) transcription factor in the advanced PC3-AR cells. Bioinformatics analysis of both datasets found that TMPRSS2 and ERG are AR-targeted putative highly significant genes in PC3-AR cells. This identification could shed more light on the molecular basis of AR-mediated regulatory pathways and networks, and could guide researchers in finding biomarkers and developing novel prostate cancer therapeutic strategies.
A full-length wild-type AR was transfected into PC3 cells, which are prostate cancer cells that do not express the AR. The efficiency of the transfection was measured by Western blotting using an antibody against the human AR. The transfected PC3 cells showed a ten-fold increase in AR protein expression compared with the baseline level. For chromatin immunoprecipitation (ChIP), two anti-AR antibodies from BD Pharmingen were tested and the better-performing one (BD Pharmingen Cat No. 554224) was used. A 'no antibody' ChIP control was included. The specificity of the chromatin IP was confirmed by PCR amplification of the promoter of the prostate specific antigen (PSA) gene, a gene shown to be directly regulated by the AR[24]. The DNA isolated with the anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNA isolated with the 'no antibody' chromatin IP did not.
In detail, PC3 cells were grown in two T-75 culture flasks in RPMI + 10% FBS + 1X L-glutamine. After two days, the cells were at 85% confluency and were transfected with a plasmid carrying the human androgen receptor gene driven by a CMV promoter. The transfection was carried out according to Invitrogen's Lipofectamine 2000 protocol. Transfection efficiency was measured by Western blot using antibodies against the human androgen receptor (BD Pharmingen Cat No. 554224 and 554225). Antibody 554224 reacted about 20-fold more strongly than antibody 554225 against the AR-DNA complexes, as detected by Western blot analysis, and was therefore used for chromatin IP. AR expression in the transfected PC3 cells showed a ten-fold increase compared with the baseline level. At 48 hours after transfection, protein was cross-linked to DNA by adding formaldehyde according to the protocol of Upstate's chromatin immunoprecipitation kit. A 'no antibody' control chromatin IP was also performed.
ChIP-Seq data analysis
ChIP-Seq has gained popularity since its development in 2007[25–27].
To date, a large number of ChIP-Seq data analysis programs have been developed by researchers around the world[28–33].
Briefly, ChIP-Seq peak-callers can be divided into three categories:
(A) Those which can only perform 'one-sample' ChIP-Seq data analysis, including F-Seq[34], GeneTrack[35] and others;
(B) Those which can perform both 'one-sample' and 'two-sample' ChIP-Seq data analysis, including QuEST[36], MACS[37], SISSRs[38], CisGenome[39], ERange[40], MOSAICS[41] and others;
(C) Those which can only perform 'two-sample' ChIP-Seq data analysis, including CCAT[42], PeakSeq[43], CMT[33] and others.
In this research, the chromatin IP DNA for the androgen receptor (AR) in PC3-AR cells was digested with proteinase K and then purified with the Qiagen Qiaquick PCR purification kit. ChIP DNA end repair, adaptor ligation and amplification were performed as described earlier[22, 25]. Fragments of about 100 bp (without linkers) were isolated from an agarose gel and sequenced on the Solexa/Illumina 1G Genome Analyzer. For ChIP-Seq data analysis, the Solexa Analysis Pipeline was run as described earlier[22, 25]. Sequence reads that mapped to multiple sites in the human genome were removed. The output of the Solexa Analysis Pipeline was converted to browser extensible data (BED) files for viewing in the UCSC genome browser.
To systematically identify AR binding 'islands' or 'peaks' in the androgen receptor ChIP-Seq dataset (Supplementary_File_6) from the PC3-AR cells, the 'one-sample' ChIP-Seq data analysis was adopted.
There are two reasons for choosing the 'one-sample' ChIP-Seq data analysis in this study:
Firstly, the IgG control ChIP-Seq dataset for the androgen receptor (AR) in the PC3 cell line was not well backed up by the authors after the initial publication[22]. A 'two-sample' ChIP-Seq data analysis is therefore impractical in this study, because only the AR treatment ChIP-Seq dataset is available. In addition, although incorporating a control sample into ChIP-Seq data analysis can eliminate some false positive peaks in a number of circumstances[33, 42, 43], an open question remains: is the set of peaks called by a ChIP-Seq peak-caller in 'two-sample' mode equivalent to the full set of genuine peaks in a standard ChIP-Seq experiment? If not, there may still be genuine peaks that are predicted by a peak-caller in 'one-sample' mode but not by a peak-caller in 'two-sample' mode. In this sense, the 'one-sample' ChIP-Seq data analysis may still be valuable, especially when no control ChIP-Seq experiment has been performed or the control ChIP-Seq dataset has been lost (e.g., not well backed up).
Secondly, the numbers of sequencing tags in the AR treatment and IgG control ChIP-Seq datasets from the previous publication by Lin et al[22] are highly unbalanced (5,354,469 sequencing tags for the AR treatment vs 1,089,089 for the IgG control). Usually, a ChIP-Seq peak-caller in 'two-sample' mode (such as CCAT[42], MACS[37] or CisGenome[39]) normalizes the treatment and control ChIP-Seq datasets to make them more balanced for the data analysis. Some researchers may argue that ChIP-Seq data analysis in 'two-sample' mode eliminates false positives. This raises another open question: are false positive peaks really eliminated after the sequencing reads of the treatment and control ChIP-Seq datasets are normalized in 'two-sample' mode?
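To make the imbalance concrete, the short sketch below computes a simple library-size scaling factor from the two tag counts quoted above. Linear scaling by total tag count is only one common simplification and is an assumption here; it is not claimed to be the normalization used by CCAT, MACS, CisGenome or any other program named in the text, and the window count of 2 is an invented example.

```python
AR_TAGS = 5_354_469   # AR treatment sequencing tags (from the text)
IGG_TAGS = 1_089_089  # IgG control sequencing tags (from the text)


def scale_factor(treatment_tags: int, control_tags: int) -> float:
    """Factor by which control counts would be multiplied to match treatment depth."""
    return treatment_tags / control_tags


factor = scale_factor(AR_TAGS, IGG_TAGS)

# Hypothetical window holding 2 control tags: after scaling it counts as
# roughly 2 * factor tags, so sparse control noise is amplified as well.
scaled_control = 2 * factor
print(f"scale factor = {factor:.3f}, scaled control count = {scaled_control:.2f}")
```

With nearly five times more treatment tags than control tags, every control tag carries the weight of almost five treatment tags after such scaling, which is one way to see why a shallow control may not reliably remove false positives.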
For example, CCAT[42] is one of the most popular ChIP-Seq peak-callers; it can only perform 'two-sample' ChIP-Seq data analysis and is based on a normalization method. However, CCAT sometimes predicts peaks with only a few sequencing tags (as few as 2) as putative significant peaks. Although one may argue that peaks with as few as 2 sequencing tags could represent weak transcription factor (TF) or histone modification (HM) binding signals, other researchers consider such peaks indistinguishable from background noise, and ChIP-Seq data analysis programs such as CisGenome[39], TIP[44] and iTAR[45] usually filter them out as insignificant. In this sense, since the 'one-sample' ChIP-Seq data analysis usually reports many more peaks than the 'two-sample' analysis, in a number of cases it may be no worse than the 'two-sample' analysis in terms of the total number of genuine peaks found, if the effects on total false positives and the false discovery rate (FDR) are disregarded.
In this study, four ChIP-Seq peak-callers (namely MACS14, SISSRs_v1.4, CisGenome_v2.0 and ERange 2.1) were run in 'one-sample' mode on the androgen receptor ChIP-Seq dataset alone from the PC3-AR cells.
For the MACS14 program, default parameters were used.
For the SISSRs_v1.4 program, the '-u' option was turned on, the p-value cutoff and the FDR cutoff were both set to 0.05, and all other parameters were left at their defaults.
For the CisGenome_v2.0 program, default parameters were used.
For the ERange 2.1 program, 'minHits' was set to 5 and all other parameters were left at their defaults.
The peak-calling results from different ChIP-Seq peak-calling programs usually vary to some extent for the same treatment ChIP-Seq dataset (in 'one-sample' mode) or for the same treatment and control ChIP-Seq datasets (in 'two-sample' mode). To account for this heterogeneity, the authors of PeakFinderMetaServer (PFMS)[46] introduced a voting mechanism to identify putative significant peaks. The number of votes in PFMS[46] is defined as the number of peak finders that called the region. Generally speaking, the more votes a region receives, the more likely it is to be a putative significant region.
In this study, a similar voting mechanism is proposed. Rather than voting on the peaks called by the different ChIP-Seq peak-calling programs, this research votes on the genes in the human genome. If a gene is supported by at least half of the ChIP-Seq peak-callers used in the data analysis (i.e., at least 2 out of 4 in this study), the gene is regarded as a putative significant gene in the ChIP-Seq data analysis (majority voting).
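The gene-level majority vote can be sketched as follows. The function name `majority_vote` is hypothetical, and the per-caller gene sets are a toy arrangement: the support pattern for ERG (all four callers) and TMPRSS2 (CisGenome_v2.0 and SISSRs_v1.4) follows the Results, while `GENE_X` is an invented gene added only to show a failing case.

```python
from collections import Counter
from math import ceil
from typing import Dict, Set


def majority_vote(calls: Dict[str, Set[str]], min_fraction: float = 0.5) -> Set[str]:
    """Return genes supported by at least `min_fraction` of the peak-callers."""
    needed = ceil(min_fraction * len(calls))  # 2 of 4 callers in this study
    votes = Counter(gene for genes in calls.values() for gene in genes)
    return {gene for gene, n in votes.items() if n >= needed}


# Genes supported by each peak-caller (toy arrangement, see the note above).
calls = {
    "MACS14": {"ERG", "GENE_X"},
    "SISSRs_v1.4": {"ERG", "TMPRSS2"},
    "CisGenome_v2.0": {"ERG", "TMPRSS2"},
    "ERange 2.1": {"ERG"},
}

significant_genes = majority_vote(calls)
```

Here ERG receives 4 votes and TMPRSS2 receives 2, so both reach the threshold of 2, whereas GENE_X with a single vote does not.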
ChIP on chip analysis
The chromatin IP DNA was digested with proteinase K and then purified with the Qiagen Qiaquick PCR purification kit. The DNA was then blunt-ended with T4 DNA polymerase (in a 75 µl volume, with 5X T4 DNA pol buffer, 0.7 µl of 10 mg/ml BSA, 0.2 µl of 5 U/µl T4 DNA pol, and 1 µl of 10 mM each dNTP) for 5 minutes at 37°C. The reactions were stopped by incubating at 75°C for 10 minutes, and the fragments were purified with the Qiagen Qiaquick PCR purification kit. The DNA was eluted with 100 µl dH2O, dried in a speed vacuum and resuspended in 7.5 µl dH2O. Equimolar amounts of the chip linkers, Linker A: 5'-CTGCTCGAATTCAAGCTTCT-3' and Linker B: 5'-AGAAGCTTGAATTCGAGCAGTCAG-3', were annealed according to the IDT protocol. To each sample, 0.5 µl of 2.5 µM annealed linker, 1 µl of 10X ligase buffer and 1 µl of ligase were added. The ligation reactions were run at 16°C for two days, and the DNA was purified with the Qiagen Qiaquick PCR purification kit, eluting with 80 µl dH2O. The specificity of the chromatin IP was confirmed by PCR amplification of the PSA promoter. The PCR primer sequences are available upon request. The DNA isolated with the anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNA isolated with the 'no antibody' chromatin IP did not. The DNA from the AR-ChIP was labeled with Cy3-dCTP and the DNA from the 'no antibody' ChIP was labeled with Cy5-dCTP by PCR. The PCR product was purified with the Qiagen Qiaquick PCR purification kit. The labeled DNAs were mixed in equal amounts and hybridized to the promoter chip. Hybridization was performed by NimbleGen Inc. NimbleGen's human HGS17 Promoter 1.0K Chip was used. This promoter chip contains 37,364 promoter regions from human genome build 35. For each promoter region, 10 probes of 50 mers were placed on the 1.0 kb promoter region, with a probe spacing of about 100 bp.
ChIP-chip data analysis
Eight NimbleGen ChIP-chips (four for IP enriched and four for the whole cell lysate) were used for probing DNA segments that AR binds.
The promoter array includes 37,215 promoters (corresponding to 19,028 genes), each of which is covered by 10 probes spanning 1 kb. In many cases, multiple promoters were used to monitor AR binding to a single gene. Promoters with significant differences between the IP enriched and the whole cell lysate samples were selected as follows. First, the eight datasets were normalized using quantile normalization[47]. Second, the log2 ratio test was applied using the empirical distribution from kernel density estimation as explained above. Finally, for each promoter, an overall p-value was calculated by combining the resulting p-values of its 10 probes using Fisher's combined probability test, and the promoters were ranked by p-value in ascending order.
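The final step, combining the 10 probe-level p-values of a promoter and ranking promoters, can be illustrated with a minimal sketch. It assumes the probe p-values are independent (a requirement of Fisher's method), uses invented toy p-values, and does not reproduce the quantile normalization or the kernel density step; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fisher_combined_p(p_values: List[float]) -> float:
    """Combine independent p-values with Fisher's method."""
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    # with 2k degrees of freedom under the null; `half` is X / 2.
    half = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function is
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no stats library is needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def rank_promoters(probe_p: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank promoters by combined p-value, smallest first."""
    combined = {name: fisher_combined_p(ps) for name, ps in probe_p.items()}
    return sorted(combined.items(), key=lambda item: item[1])


# Toy probe-level p-values (invented for illustration, 10 probes per promoter).
toy_probe_p = {
    "promoter_a": [0.01] * 10,
    "promoter_b": [0.5] * 10,
}
ranked = rank_promoters(toy_probe_p)
```

A promoter whose probes are consistently small in p-value (promoter_a) ranks ahead of one whose probes hover around 0.5 (promoter_b). Neighbouring probes on a real array may be correlated, in which case the combined p-values from this sketch would be too optimistic.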
There is no consensus in the international scientific community on the p-value cutoff that determines significance (http://www.upcscavenger.com/wiki/p-value/#page=wiki). In this research, a less stringent cutoff of p-value = 0.1 is used. Genes with p-values equal to or less than 0.1 are selected and deemed putative significant genes in the androgen receptor (AR) ChIP-chip data analysis in the PC3-AR cells.
Acknowledgement
The author thanks Profs Biaoyang Lin and Daehee Hwang for technical assistance.
Author Contributions
XH conceived the study, performed the data analysis and wrote the paper.
Competing Interests
None declared.
Funding
This work was funded in part by the MOST (http://www.most.gov.cn/eng/) under grant numbers 2006AA02A303, 2006AA02Z4A2, 2006DFA32950 and 2007DFC30360.