Identification of TMPRSS2 and ERG as androgen receptor targeted putative highly significant genes in PC3-AR cells from AR ChIP-Seq and ChIP-chip data

Introduction

The TMPRSS2 gene encodes a member of the serine protease family. The encoded protein contains a type II transmembrane domain, a receptor class A domain, a scavenger receptor cysteine-rich domain and a protease domain. Serine proteases are involved in many physiological and pathological processes, and TMPRSS2 has been shown to be up-regulated by androgenic hormones in prostate cancer cells and down-regulated in androgen-independent prostate cancer tissue (https://www.genecards.org/cgi-bin/carddisp.pl?gene=TMPRSS2).

The ERG gene encodes a member of the erythroblast transformation-specific (ETS) family of transcription factors. All members of this family are key regulators of embryonic development, cell proliferation, differentiation, angiogenesis, inflammation, and apoptosis. The ERG gene is involved in chromosomal translocations that result in different fusion gene products, such as TMPRSS2-ERG and NDRG1-ERG in prostate cancer (https://www.genecards.org/cgi-bin/carddisp.pl?gene=ERG).

TMPRSS2 and ERG, which can form the TMPRSS2-ERG gene fusion, are two important genes in prostate cancer cells[1–21].

Previous work by Yu et al. found that ERG can disrupt androgen receptor (AR) signaling and that the TMPRSS2-ERG gene fusion plays a pivotal role in prostate cancer progression, for example by blocking AR expression, binding to specific loci in the human genome, and directly activating the polycomb group protein EZH2[2].

In this study, an androgen receptor negative prostate cancer cell line (PC3) was transfected with wild-type androgen receptor, and both ChIP-Seq and ChIP-chip datasets were generated for the AR in the resulting PC3-AR cell line.

It is hypothesized that a gene targeted by at least one putative significant peak within 50 kb of its transcription start site (TSS) in the ChIP-Seq or ChIP-chip data analysis is a putative significant gene. It is further hypothesized that the putative significant genes found by both technologies are putative highly significant genes. Under these hypotheses, TMPRSS2 and ERG are identified as AR-targeted putative highly significant genes in the AR ChIP-Seq and ChIP-chip data from PC3-AR cells. This suggests that TMPRSS2 and ERG, which can form the TMPRSS2-ERG fusion pair, are potentially important in AR-mediated transcriptional regulation and could be exploited for biomarker identification and for developing new therapeutic strategies in prostate cancer.
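The two hypotheses can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the gene names, TSS coordinates and peak coordinates are hypothetical toy values, and measuring the distance from the nearest peak edge (rather than from a summit or center) is an assumption, since the text only specifies the 50 kb window.

```python
from typing import Dict, List, Set, Tuple

Peak = Tuple[str, int, int]   # (chromosome, start, end)
Tss = Tuple[str, int]         # (chromosome, TSS position)

WINDOW_BP = 50_000            # 50 kb window, as stated in the text


def distance_to_tss(peak: Peak, tss: Tss) -> float:
    """Distance in bp from the nearest peak edge to the TSS (0 if the peak spans it).

    Using the peak edge is an assumption made for this sketch.
    """
    chrom, start, end = peak
    tss_chrom, tss_pos = tss
    if chrom != tss_chrom:
        return float("inf")
    if start <= tss_pos <= end:
        return 0.0
    return float(min(abs(start - tss_pos), abs(end - tss_pos)))


def significant_genes(peaks: List[Peak], tss_by_gene: Dict[str, Tss],
                      window: int = WINDOW_BP) -> Set[str]:
    """Genes with at least one peak within `window` bp of their TSS."""
    return {
        gene
        for gene, tss in tss_by_gene.items()
        if any(distance_to_tss(peak, tss) <= window for peak in peaks)
    }


# Hypothetical toy annotation and peak lists (not real coordinates).
tss_by_gene = {
    "GENE_A": ("chr1", 100_000),
    "GENE_B": ("chr1", 500_000),
    "GENE_C": ("chr2", 100_000),
}
chipseq_peaks = [("chr1", 120_000, 120_400), ("chr2", 140_000, 140_300)]
chipchip_peaks = [("chr1", 99_500, 100_500), ("chr1", 700_000, 700_500)]

chipseq_genes = significant_genes(chipseq_peaks, tss_by_gene)
chipchip_genes = significant_genes(chipchip_peaks, tss_by_gene)

# Putative highly significant genes: supported by both technologies.
highly_significant = chipseq_genes & chipchip_genes
print(sorted(highly_significant))
```

In this toy example only GENE_A has a nearby peak in both lists, so it alone would be reported as putative highly significant.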

Materials And Methods

Chromatin Immunoprecipitation

A full-length wild-type AR was transfected into PC3 cells, a prostate cancer cell line that does not express the AR. The efficiency of the transfection was measured by Western blotting using an antibody against the human AR. The transfected PC3 cells showed a ten-fold increase in AR protein expression compared with the baseline level. For chromatin immunoprecipitation (ChIP), two anti-AR antibodies from BD Pharmingen were tested and the better-performing one (BD Pharmingen Cat No. 554224) was used. A 'no antibody' ChIP control was included. The specificity of the chromatin IP was confirmed by PCR amplification of the promoter of the prostate specific antigen (PSA) gene, a gene shown to be directly regulated by the AR[24]. The DNA isolated with the anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNA isolated with the 'no antibody' chromatin IP did not.

In detail, PC3 cells were grown in two T-75 culture flasks in RPMI + 10% FBS + 1X L-glutamine. After two days, the cells were at 85% confluency and were transfected with a plasmid carrying the human androgen receptor gene driven by a CMV promoter. The transfection was carried out according to Invitrogen's Lipofectamine 2000 protocol. The efficiency of the transfection was measured by Western blot using antibodies against the human androgen receptor (BD Pharmingen Cat No. 554224 and 554225). Antibody 554224 reacted about 20-fold more strongly than antibody 554225 against the AR-DNA complexes, as detected by Western blot analysis, and was therefore used for chromatin IP. Forty-eight hours after transfection, protein was cross-linked to DNA by adding formaldehyde according to the protocol of Upstate's chromatin immunoprecipitation kit.

ChIP-Seq data analysis

ChIP-Seq has grown in popularity since its development in 2007[25–27].

To date, a large number of ChIP-Seq data analysis programs have been developed[28–33].

Broadly speaking, ChIP-Seq peak-callers can be divided into three categories:

(A) Those that can perform only 'one-sample' ChIP-Seq data analysis, including F-Seq[34], GeneTrack[35] and others;

(B) Those that can perform both 'one-sample' and 'two-sample' ChIP-Seq data analysis, including QuEST[36], MACS[37], SISSRs[38], CisGenome[39], ERange[40], MOSAICS[41] and others;

(C) Those that can perform only 'two-sample' ChIP-Seq data analysis, including CCAT[42], PeakSeq[43], CMT[33] and others.

In this study, the chromatin IP DNA for the AR in PC3-AR cells was digested with proteinase K and then purified with the Qiagen QIAquick PCR purification kit. ChIP DNA end repair, adaptor ligation and amplification were performed as described earlier[22, 25]. Fragments of about 100 bp (without linkers) were isolated from an agarose gel and sequenced on the Solexa/Illumina 1G Genome Analyzer. Solexa Pipeline analysis was performed as described earlier[22, 25]. Sequence reads that mapped to multiple sites in the human genome were removed. The output of the Solexa Analysis Pipeline was converted to browser extensible data (BED) files for viewing in the UCSC Genome Browser.

To systematically identify AR binding 'islands' or 'peaks' in the AR ChIP-Seq dataset (Supplementary_File_6) from the PC3-AR cells, 'one-sample' ChIP-Seq data analysis was adopted in this study.

There are two reasons for choosing the 'one-sample' ChIP-Seq data analysis in this study:

Firstly, the IgG control ChIP-Seq dataset for the AR in the PC3 cell line was not adequately backed up after the initial publication[22]. It is therefore impractical to perform 'two-sample' ChIP-Seq data analysis in this study, since only the AR treatment ChIP-Seq dataset remains. In addition, although incorporating a control sample into ChIP-Seq data analysis can eliminate some false positive peaks in some circumstances[33, 42, 43], it remains an open question whether the peaks called by a ChIP-Seq peak-caller in 'two-sample' mode are equivalent to the full set of genuine peaks in a standard ChIP-Seq experiment. If they are not, some genuine peaks may be predicted by a peak-caller in 'one-sample' mode but missed in 'two-sample' mode. In this sense, 'one-sample' ChIP-Seq data analysis may still be valuable, especially when no control ChIP-Seq experiment was performed or the control dataset has been lost (e.g., not adequately backed up).

Secondly, the numbers of sequencing tags in the AR treatment and IgG control ChIP-Seq datasets from the previous publication by Lin et al.[22] are highly unbalanced (5,354,469 sequencing tags for the AR treatment versus 1,089,089 for the IgG control). A ChIP-Seq peak-caller in 'two-sample' mode (such as CCAT[42], MACS[37] or CisGenome[39]) usually normalizes the treatment and control datasets to make them more balanced for the analysis. It has been argued that ChIP-Seq data analysis in 'two-sample' mode can eliminate false positives. This raises another open question: are false positive peaks really eliminated after the sequencing reads are normalized between the treatment and control datasets in 'two-sample' mode?

For example, CCAT[42] is one of the most popular ChIP-Seq peak-callers; it performs only 'two-sample' ChIP-Seq data analysis and is based on a normalization method. However, CCAT sometimes predicts peaks with very few sequencing tags (as few as 2) as putative significant peaks. Although one may argue that peaks with as few as 2 sequencing tags represent weak transcription factor (TF) or histone modification (HM) binding signals, other researchers consider such peaks indistinguishable from background noise, and ChIP-Seq data analysis programs such as CisGenome[39], TIP[44] and iTAR[45] usually filter them out as insignificant. In this sense, because 'one-sample' ChIP-Seq data analysis usually reports many more peaks than 'two-sample' analysis, in a number of cases it may not be worse than 'two-sample' analysis in terms of the total number of genuine peaks found, if the effects on the total number of false positives and the false discovery rate (FDR) are disregarded.

In this study, four ChIP-Seq peak-callers (MACS14, SISSRs_v1.4, CisGenome_v2.0 and ERange 2.1) were run in 'one-sample' mode on the AR ChIP-Seq dataset alone from the PC3-AR cells.

MACS14 was run with default parameters.

SISSRs_v1.4 was run with the '-u' option turned on, a p-value cutoff of 0.05 and an FDR cutoff of 0.05; all other parameters were left at their defaults.

CisGenome_v2.0 was run with default parameters.

ERange 2.1 was run with 'minHits' set to 5 and all other parameters at their defaults.

The peak-calling results from different ChIP-Seq peak-calling programs usually vary to some extent for the same treatment ChIP-Seq dataset (in 'one-sample' mode) or for the same treatment and control ChIP-Seq datasets. To account for this heterogeneity, the authors of PeakFinderMetaServer (PFMS)[46] introduced a voting mechanism to identify putative significant peaks. The number of votes in PFMS[46] is defined as the number of peak finders that called the region. Generally speaking, the more votes a region receives, the more significant it is considered to be.

In this study, a similar voting mechanism is proposed. Rather than voting on the peaks called by the different ChIP-Seq peak-calling programs, the genes in the human genome are voted on instead. If a gene is supported by at least half of the ChIP-Seq peak-callers used in the analysis (2 out of 4 in this study), it is regarded as a putative significant gene in the ChIP-Seq data analysis (majority voting).
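The gene-level voting rule can be sketched in a few lines of Python. This is an illustration only: the per-caller gene sets are hypothetical toy values, and the function name `vote_genes` and its `min_fraction` parameter are introduced here for the example rather than taken from any of the cited programs.

```python
from collections import Counter
from math import ceil
from typing import Dict, Set


def vote_genes(calls: Dict[str, Set[str]], min_fraction: float = 0.5) -> Set[str]:
    """Return genes supported by at least `min_fraction` of the peak-callers.

    `calls` maps a peak-caller name to the set of genes it supports.
    With four callers and min_fraction=0.5, a gene needs two votes.
    """
    votes_needed = ceil(min_fraction * len(calls))
    votes = Counter(gene for genes in calls.values() for gene in genes)
    return {gene for gene, n in votes.items() if n >= votes_needed}


# Hypothetical gene sets reported by the four peak-callers.
calls = {
    "MACS14": {"GENE_A", "GENE_B"},
    "SISSRs_v1.4": {"GENE_A"},
    "CisGenome_v2.0": {"GENE_B", "GENE_C"},
    "ERange 2.1": {"GENE_A"},
}

voted = vote_genes(calls)
print(sorted(voted))
```

Here GENE_A receives three votes and GENE_B two, so both pass the 2-out-of-4 threshold, while GENE_C with a single vote does not.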

ChIP on chip analysis

The chromatin IP DNA was digested with proteinase K and then purified with the Qiagen QIAquick PCR purification kit. The DNA was then blunt-ended with T4 DNA polymerase (in a 75 µl volume: 5X T4 DNA polymerase buffer, 0.7 µl of 10 mg/ml BSA, 0.2 µl of 5 U/µl T4 DNA polymerase, and 1 µl of 10 mM each dNTP) for 5 minutes at 37°C. The reactions were stopped by incubating at 75°C for 10 minutes, and the fragments were purified using the Qiagen QIAquick PCR purification kit. The DNA was eluted with 100 µl dH₂O, dried in a speed vacuum and resuspended in 7.5 µl dH₂O. Equimolar chip linkers, Linker A: 5'-CTGCTCGAATTCAAGCTTCT-3' and Linker B: 5'-AGAAGCTTGAATTCGAGCAGTCAG-3', were annealed according to the IDT protocol. To each sample, 0.5 µl of 2.5 µM annealed linker, 1 µl of 10X ligase buffer and 1 µl of ligase were added. The ligation reactions were run at 16°C for two days, and the DNA was purified using the Qiagen QIAquick PCR purification kit, with 80 µl dH₂O used for elution. The specificity of the chromatin IP was confirmed by PCR amplification of the PSA promoter. The PCR primer sequences are available upon request. The DNA isolated with the anti-AR chromatin IP showed a specific band corresponding to the PSA promoter, whereas the DNA isolated with the 'no antibody' chromatin IP did not. The DNA from the AR ChIP was labeled with Cy3-dCTP and the DNA from the 'no antibody' ChIP was labeled with Cy5-dCTP by PCR. The PCR product was purified using the Qiagen QIAquick PCR purification kit. The labeled DNAs were mixed in equal amounts and hybridized to the promoter chip. Hybridization was performed by NimbleGen Inc. NimbleGen's human HGS17 Promoter 1.0K Chip was used. This promoter chip contains 37,364 promoter regions from human genome build 35. For each promoter region, 10 probes of 50 mers were placed on the 1.0 kb promoter region, with a probe spacing of about 100 bp.

ChIP-chip data analysis

Eight NimbleGen ChIP-chips (four for the IP-enriched samples and four for the whole cell lysate) were used to probe DNA segments bound by the AR. The promoter array includes 37,215 promoters (corresponding to 19,028 genes), each of which comprises 10 probes covering 1 kb. In many cases, multiple promoters were used to monitor AR binding to a single gene. Promoters with significant differences between the IP-enriched and whole cell lysate samples were selected as follows. First, the eight datasets were normalized using quantile normalization[47]. Second, a log2 ratio test was applied using the empirical distribution from kernel density estimation. Finally, for each promoter, an overall p-value was calculated by combining the p-values of its 10 probes using Fisher's test, and the promoters were ranked by p-value in ascending order.
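For readers unfamiliar with the first step, quantile normalization[47] can be sketched as follows. This is a simplified illustration, not the code used in this study: it uses two hypothetical three-probe arrays instead of eight real arrays, and it breaks ties by input order rather than averaging tied ranks.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Force every array to share the same empirical distribution.

    Each array is one list of probe intensities; all arrays must have the
    same length. The value at rank r in every array is replaced by the mean
    of the rank-r values across arrays. Ties are broken by input order,
    which is a simplification made for this sketch.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(a[rank] for a in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
print(normalized)
```

After normalization both toy arrays contain the same set of values (the rank-wise means), while each probe keeps its original rank within its own array.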

Because there is no consensus in the scientific community on a p-value cutoff for determining significance (http://www.upcscavenger.com/wiki/p-value/#page=wiki), a less stringent cutoff of p = 0.1 was used in this study. Genes with p-values less than or equal to 0.1 were selected and deemed putative significant genes in the AR ChIP-chip data analysis in PC3-AR cells.
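The combination of probe-level p-values and the 0.1 cutoff can be sketched as below. The sketch assumes that "Fisher's test" refers to Fisher's combined probability test, in which the statistic -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom for k independent p-values. The promoter names and probe p-values are hypothetical, and p-values must be strictly positive.

```python
import math
from typing import Dict, List

P_CUTOFF = 0.1  # cutoff stated in the text


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square survival function for an even number of degrees of freedom.

    For df = 2k the closed form is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = df // 2
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_pvalue(pvalues: List[float]) -> float:
    """Combine independent p-values (each > 0) with Fisher's method."""
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))


# Hypothetical probe-level p-values, 10 probes per promoter.
probe_pvalues: Dict[str, List[float]] = {
    "promoter_1": [0.01] * 10,
    "promoter_2": [0.9] * 10,
}

combined = {name: fisher_combined_pvalue(p) for name, p in probe_pvalues.items()}
ranked = sorted(combined, key=combined.get)            # ascending p-value
selected = [name for name in ranked if combined[name] <= P_CUTOFF]
print(selected)
```

Fisher's method assumes the combined p-values are independent; neighboring probes on a promoter may be correlated, so the combined value in this sketch should be read as approximate.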

References

Wright ME, Tsai MJ, Aebersold R. Androgen receptor represses the neuroendocrine transdifferentiation process in prostate cancer cells. Mol Endocrinol. 2003;17:1726–37.
Yu J, et al. An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression. Cancer Cell. 2010;17:443–54.
Soller MJ, et al. Confirmation of the high frequency of the TMPRSS2/ERG fusion gene in prostate cancer. Genes Chromosomes Cancer. 2006;45:717–9.
Tomlins SA, et al. Role of the TMPRSS2-ERG gene fusion in prostate cancer. Neoplasia. 2008;10:177–88. Tomlins SA, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310:644–8. Wang J, et al. Pleiotropic biological activities of alternatively spliced TMPRSS2/ERG fusion gene transcripts. Cancer Res. 2008;68:8516–24.
Yoshimoto M, et al. Three-color FISH analysis of TMPRSS2/ERG fusions in prostate cancer indicates that genomic microdeletion of chromosome 21 is associated with rearrangement. Neoplasia. 2006;8:465–9.
Hermans KG, et al. TMPRSS2:ERG fusion by translocation or interstitial deletion is highly relevant in androgen-dependent prostate cancer, but is bypassed in late-stage androgen receptor-negative prostate cancer. Cancer Res. 2006;66:10658–63.
Hsu T, Trojanowska M, Watson DK. Ets proteins in biological control and cancer. J Cell Biochem. 2004;91:896–903.
Oikawa T, Yamada T. Molecular biology of the Ets family of transcription factors. Gene. 2003;303:11–34. Perner S, et al. TMPRSS2:ERG fusion-associated deletions provide insight into the heterogeneity of prostate cancer. Cancer Res. 2006;66:8337–41.
Klezovitch O, et al. A causal role for ERG in neoplastic transformation of prostate epithelium. Proc Natl Acad Sci USA. 2008;105:2105–10.
King JC, et al. Cooperativity of TMPRSS2-ERG with PI3-kinase pathway activation in prostate oncogenesis. Nat Genet. 2009;41:524–6.
Zong Y, et al. ETS family transcription factors collaborate with alternative signaling pathways to induce carcinoma from adult murine prostate cells. Proc Natl Acad Sci USA. 2009;106:12465–70.
Zhou F, et al. TMPRSS2-ERG activates NO-cGMP signaling in prostate cancer cells. Oncogene. 2019;38(22):4397–411.
Väänänen RM, et al. Altered PCA3 and TMPRSS2-ERG expression in histologically benign regions of cancerous prostates: a systematic, quantitative mRNA analysis in five prostates. BMC Urol. 2015;15:88.
Bowen C, et al. NKX3.1 Suppresses TMPRSS2-ERG Gene Rearrangement and Mediates Repair of Androgen Receptor-Induced DNA Damage. Cancer Res. 2015;75(13):2686–98.
Scaravilli M, et al. Androgen-Driven Fusion Genes and Chimeric Transcripts in Prostate Cancer. Front Cell Dev Biol. 2021;9:623809.
Song CJ, et al. Predictive significance of TMRPSS2-ERG fusion in prostate cancer: a meta-analysis. Cancer Cell Int. 2018;18:177.
García-Perdomo HA, et al. Association between TMPRSS2:ERG fusion gene and the prostate cancer: systematic review and meta-analysis. Cent Eur J Urol. 2018;71(4):410–9.
Luedeke M, et al. Prostate cancer risk regions at 8q24 and 17q24 are differentially associated with somatic TMPRSS2:ERG fusion status. Hum Mol Genet. 2016;25(24):5490–9.
Urbinati G, et al. Knocking Down TMPRSS2-ERG Fusion Oncogene by siRNA Could be an Alternative Treatment to Flutamide. Mol Ther Nucleic Acids. 2016;5(3):e301.
Kobelyatskaya AA, et al. Impact TMPRSS2–ERG Molecular Subtype on Prostate Cancer Recurrence. Life (Basel). 2021;11(6):588.
Urbinati G, et al. Antineoplastic Effects of siRNA against TMPRSS2-ERG Junction Oncogene in Prostate Cancer. PLoS One. 2015;10(5):e0125277.
Lin BY, et al. Integrated expression profiling and ChIP-seq analyses of the growth inhibition response program of the androgen receptor. PLoS One. 2009;4(8):e6589.
Pflueger D, et al. Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing. Genome Res. 2011;21(1):56–67.
Riegman PH, et al. Identification and androgen-regulated expression of two major human glandular kallikrein-1 (hGK-1) mRNA species. Mol Cell Endocrinol. 1991;76(1–3):181–90.
Barski A, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823–37.
Johnson DS, et al. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316(5830):1497–502.
Robertson G, et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007;4(8):651–7.
Laajala TD, et al. A practical comparison of methods for detecting transcription factor binding sites in ChIP-seq experiments. BMC Genomics. 2009;10:618.
Wilbanks EG, et al. Evaluation of algorithm performance in ChIP-seq peak detection. PLoS One. 2010;5(7):e11471.
Szalkowski AM, et al. Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Brief Bioinform. 2011;12(6):626–33.
Xing HP, et al. Genome-wide localization of protein-DNA binding and histone modification by a Bayesian change-point method with ChIP-seq data. PLoS Comput Biol. 2012;8(7):e1002613.
Wu H, Ji HK. PolyaPeak: detecting transcription factor binding sites from ChIP-seq using peak shape information. PLoS One. 2014;9(3):e89694.
Rezaeian I, Rueda L. CMT: a constrained multi-level thresholding approach for ChIP-Seq data analysis. PLoS One. 2014;9(4):e93873.
Boyle AP, et al. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008;24(21):2537–8.
Albert I, et al. GeneTrack: a genomic data processing and visualization framework. Bioinformatics. 2008;24(10):1305–6.
Valouev A, et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods. 2008;5(9):829–34.
Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
Jothi R, et al. Genome-wide identification of in vivo protein–DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 2008;36(16):5221–31.
Ji HK, et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol. 2008;26(11):1293–300.
Mortazavi A, et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.
Kuan PF, et al. A Statistical Framework for the Analysis of ChIP-Seq Data. J Am Stat Assoc. 2011;106(495):891–903.
Xu H, et al. A signal-noise model for significance analysis of ChIP-seq with negative control. Bioinformatics. 2010;26(9):1199–204.
Rozowsky J, et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009;27(1):66–75.
Cheng C, et al. TIP: a probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles. Bioinformatics. 2011;27(23):3221–7.
Yang CC, et al. iTAR: a web server for identifying target genes of transcription factors using ChIP-seq or ChIP-chip data. BMC Genomics. 2016;17(1):632.
Kruczyk M, et al. Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data. BMC Bioinformatics. 2013;14:280.
Bolstad BM, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185–93.
