Integration of RNA-seq and Ribosome Proling to Characterize Transcriptional and Translational Estrogen Responses of Genes in Human Breast Cancer

Estrogen is a hormone that is frequently essential in breast cancer to drive key transcriptional programs by interacting with the estrogen receptor alpha that upregulates proliferative and oncogenic genes and represses apoptotic and tumor suppressor genes. Protein-coding targets of estrogen regulation in breast cancer are well-dened. However, long non-coding RNA (lncRNA) genes account for the majority of human gene catalogs. The coding status of these genes – their accidental, or regulated, translation by ribosomes, under the inuence of estrogen – remains a controversial topic.


Abstract Background
Estrogen is a hormone that is frequently essential in breast cancer to drive key transcriptional programs by interacting with the estrogen receptor alpha that upregulates proliferative and oncogenic genes and represses apoptotic and tumor suppressor genes. Protein-coding targets of estrogen regulation in breast cancer are well-de ned. However, long non-coding RNA (lncRNA) genes account for the majority of human gene catalogs. The coding status of these genes -their accidental, or regulated, translation by ribosomes, under the in uence of estrogen -remains a controversial topic.

Methods
Here, we performed comprehensive transcriptome analysis using RNA-Seq, as well as ribosome pro ling using Ribo-Seq, on the same samples: biological replicates of human estrogen receptor alpha (ERa) positive MCF7 breast cancer cells before and after estrogen treatment. We correlated these two datasets, globally highlighting protein-coding and lncRNA differentially expressed genes and transcripts that were positively as well as negatively responsive to estrogen, separately at the transcriptional level and the translational (as approximated by ribosome binding) level.

Results
Our data showed that some transcripts were more robustly detected in RNA-Seq than in the ribosome-pro ling data, and vice versa, suggesting distinct gene-speci c estrogen responses at the transcriptional and the translational level, respectively. Certain differentially expressed transcripts may point to the regulation of alternative splicing by estrogen.
Several pseudogenes were co-and anti-regulated with their cancer-functional parental genes. Gene ontology analysis highlighted cancer-relevant pathways enriched after estrogen treatment in cells.

Conclusions
Our study represents a signi cant advance in the estrogen receptor biology, because we demonstrated global effects of estrogen on splicing and translation that are distinct from, and not always correlated with, its effects on transcription, and that differ globally for protein-coding and lncRNA genes. We have also highlighted for the rst time the transcriptional and translational response of expressed pseudogenes to estrogen, pointing to new perspectives for biomarker and drug-target development for breast cancer in future.

Background
Breast cancer is the most prevalent cancer for women in the world, with over two million case events and approximately 630,000 deaths in 2018 (Bray et al., 2018). Clinically, several different subtypes of breast cancer have been de ned (Russnes et al., 2017): hormone receptor positive (includes estrogen receptor, i.e. ER, positive, and progesterone receptor positive, breast cancer), HER2 positive, and Basal. Some basal breast cancers are triple negative (no ER, PR, or HER2), though some triple negatives are not basal (Rakha et al., 2009;Gazinska et al., 2013) Approximately 80% of breast cancers are estrogen-receptor (ER) positive, and these are of enormous clinical signi cance worldwide. Estrogen (17-β-estradiol; abbreviated as E2) activates the estrogen receptor α (ER). ER translocates from the cellular surface to the cytoplasm and then to the nucleus, where it directly regulates its target genes. In ER positive cancer cells, ER transcriptional regulation features massive activation of cell-proliferation genes, including oncogenes, and suppression of pro-apoptotic genes, ultimately stimulating the proliferation of cancer cells.
Moreover, ERα dominates overall ER activity, and is expressed in in most breast cancers, underscoring its signi cant role in both oncogenesis and progression (Ali & Coombes, 2000). Interrogating the effects of E2-induced ERαactivation on the gene regulatory program in breast cancer cells can improve our understanding of breast cancer, and can point to more effective, less side-effect-ridden, more individualized potential therapies in the future.
High-throughput RNA sequencing (RNA-Seq) provides precise quanti cation of gene expression in living cells (Conesa et al., 2016) and yields accurate and comprehensive transcriptome pro les of tumors and normal cells, and helps to de ne and characterize gene-regulatory responses to treatments (Lin et al., 2018). Intriguingly, the correlation of mRNA and protein levels in mammalian cells is imperfect, indicating that the study of gene regulation solely from RNA-Seq data is inherently insu cient to understand the adaptive landscape of cells to stimuli (Haider & Pal, 2013).
Therefore, here we deployed distinct but complementary approaches that distinguish transcribed RNAs from those that are bound by ribosomes and translated. This technique provides quanti cations of both transcriptional and posttranscriptional responses to estrogen in MCF7 cells, one of the most commonly used human ER-positive breast cancer cell line model systems. Ribo-Seq analysis is based on the deep sequencing exclusively ribosome-protected RNA fragments that are presumably being translated, lling the gap between transcriptome and proteome (mass spectrometry) data (Ingolia et al., 2009), so as to better understand the functions and regulatory responses of speci cally the ribosome-bound, presumed-to-be-translated, fraction of the transcriptome (Calviello & Ohler, 2017).
The re nement of the human gene catalogs based on transcriptome data, such as the Gencode effort (www.gencodegenes.org), has revealed that protein-coding genes comprise only less than one-third, of human genes, while the remainder of genes are non-coding RNA genes, which include long non-coding RNA (lncRNA) genes (Derrien et al., 2012). LncRNA genes are as abundant as protein-coding genes in mammalian genomes, but aside from a hundred or so mechanistically well-characterized examples that point to diverse, epigenetic and post-transcriptional, gene-speci c and global, nuclear and cytoplasmic as well as extracellular, positive and negative mechanisms that differ drastically between different lncRNAs, their functions remain surprisingly obscure. Classically, lncRNAs were assumed to not be translated, and our prior work found that translation is extremely rare in two human cell line models using deep shotgun proteomics (Banfai et al., 2012). However, many lncRNAs are cytoplasmic, and if ribosome entry sites and open reading frames are present, then ribosomes might not be able to distinguish these noncoding transcripts from conventional messenger RNAs, and might therefore translate proteins from short open reading frames in some lncRNAs. In fact, a commonly used computational de nition of lncRNAs stipulates that these transcripts may contain small Open Reading Frames (smORFs) of less than 100 amino acids, provided that such smORFs lack sequence homologies to known proteins (Dinger et al., 2008). Furthermore, all cytoplasmic RNAs, including non-coding RNAs, may be accessed by ribosomes during the "pioneer round of translation," a key step of the Nonsense-Mediated Decay (NMD) mechanism (lshigaki et al., 2001). In summary, the possibility of ribosomal translation of lncRNAs, and the potential for regulation driven by response to hormones, cannot be excluded.
Pseudogenes are truncated or full-length copies of functional genes that have been generated through retrotransposition and/or gene duplication mechanisms over evolutionary time (Zheng et al., 2007;Ruan et al., 2007).
Formerly thought to be non-functional, pseudogenes are now increasingly recognized as important functional components of the transcriptome, and they can be sequence-speci c regulators of their parental genes, such as antisense lncRNA transcription from a pseudogenic locus that regulates the parental gene ( We recently showed that estrogen-responsive lncRNAs regulate proliferation, viability, and death in breast cells during estrogen response (Lin et al., 2016). Transcribed pseudogenes are diagnostic as well as prognostic markers in cancer (Poliseno et al., 2015), and the role of pseudogenes in estrogen response merits further and directed study. Therefore, we set out to identify, and study the functions of the differentially expressed genes and transcripts, with an emphasis on alternative splicing, lncRNAs, and pseudogenes, in breast cancer cells before and after estrogen exposure, and to correlate transcriptomics with ribosome-pro ling data, to identify mRNA and lncRNA genes with exclusively/primarily transcriptional and exclusively/primarily translational responses to estrogen, and to distinguish them from the majority of genes that had well-correlated transcriptional and translational responses to estrogen stimulation.

Methods
Cell culture and RNA extraction

Data pre-processing
The quality of all raw reads generated from the transcriptomic and ribosome-pro ling libraries were evaluated using a FASTQC v0.11.7 software (Andrews, 2018). Read ltering was performed using FASTP v0.19.7 (Chen et al., 2018), based on the following criteria: Adaptor sequences within the reads were automatically ltered out; reads with global quality score < Q20 for RNA-Seq and 15 for Ribo-Seq were discarded; nucleotides from 5 ' or 3 ' with quality score < Q20 were trimmed from the read.

Read alignment
The pre-processed reads from the RNA-Seq and Ribo-Seq data were aligned to the reference human genome sequence (hg19 assembly) using HISAT2 v2.1.0 (Kim et al., 2015). For RNA-Seq data, default parameters were used for HISAT, with the exception of setting strandedness to be strand-speci c to the rst strand. For Ribo-Seq data, the rst of paired reads was aligned to reference genome with unstranded-speci c option, with the mate read discarded due to the redundant nature derived from two overlapping reads. All SAM les derived from HISAT2 were converted into the BAM les using SAMTOOLS v1.9 for downstream analyses (Li et al., 2009), with multi-mapped reads speci cally removed from Ribo-Seq alignments due to the reduced speci city of these short (~ 30nt) reads.

Normalization and expression level measurement
For gene level counts of both RNA-Seq and Ribo-Seq, the pre-processed reads from all samples mapping to human exons within gencode.v19.annotation.gtf were counted using the "summarizeOverlaps" function in R package: GenomicAlignments (Lawrence et al., 2013). The counting mode of "union" was used for both RNA-Seq and Ribo-Seq and un-normalized gene counts were normalized using DEseq2 package (Love et al., 2014).
For transcript level analysis, read counts mapping to individual transcripts were assigned by Cuffdiff2 v2.2.1 (Trapnell et al., 2012). Raw counts were converted into normalized FPKM (fragments per kilobase of exon per million fragments mapped) expression levels, followed by scaling via the median of the geometric means of fragment counts across all samples.

Differential expression analysis
To identify differentially expressed genes (DEGs), DEseq2 was employed, using raw gene counts. Genes with |log2 FC|>1.0 and P adj <0.05 were considered as signi cantly DEGs. To study differential expression between the estrogen treatment and the control (no estrogen treatment) groups at the transcript level in both RNA-Seq and Ribo-Seq, Cuffdiff2 was used to determine individual transcript abundance based on the assignment of reads to most likely transcript models. Transcripts showing |log2 FC|> 1.0 and q < 0.05 were considered as signi cantly differentially expressed transcripts (DETs).

Quantitative correlation of RNA-Seq and Ribo-Seq
Lowly expressed genes are automatically ltered out by "independent ltering" function embedded within the DEseq2 package. The presence of discrepancy in sequencing depths between the RNA-Seq and Ribo-Seq could lead to the inaccurate analysis, therefore we performed statistical adjustments on fold changes in terms of the expression level made to two datasets in order to reduce this gap. To compare fold change at the gene level between the transcriptomic and ribosome pro ling datasets, we took advantage of SE (standard error) of fold change provided by DEseq2. A differential fold change (between RNA-Seq and Ribo-Seq) was calculated if unadjusted fold change in RNA-Seq is more than in Ribo-Seq, as (FC RNA−Seq -1.0*SE) -(FC Ribo−Seq + 1.0*SE). Genes with a cutoff of differential fold change of 2.0 and P adj < 0.05 were considered to be expressed more in RNA-Seq than in Ribo-SEq. The corresponding genes whose unadjusted fold change in Ribo-Seq is more than in RNA-Seq were calculated in a similar manner.

Differential pseudogenes expression analysis
To identify putative pseudogenes, we used very stringent criteria by only considering uniquely mapped reads in both RNA-Seq and ribosome-pro ling datasets. All multi-mapped reads were discarded from the original bam les using SAMTOOLS, followed by the re-counting of read mapping to exons. Differential gene expression analysis was performed using DEseq2 as described above in order to obtain the differentially expressed pseudogenes (DEPs).
To identify the parental genes of each pseudogene, the psiDR database (Pei et al., 2012) was used. The parental gene name of candidate pseudogenes de ned by in psiDR database to be extracted for downstream analyses. For those remaining genes whose parental gene could not be de ned by this method, NCBI-BLASTN (Altschul et al., 1990) was employed to search for potential parental genes using a e-value cutoff of 10 − 4 and a sequence similarity > 80%.
Due to the high sequence similarity between the pseudogenes and their parental genes, we also manually examined and visualized the mapped reads to pseudogenes using the UCSC Genome Browser  in order to ascertain that the reads are accurately mapped with greater con dence to pseudogenes rather than their parental genes. After aligning these reads to genome using BLAT software , was used to con rm that these reads mapped uniquely to pseudogenes or with a higher similarity to the pseudogenes compared to their parental genes ( Fig. 5).

Functional and KEGG enrichment analysis
Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed for up-regulated genes with log 2 (fold change) > 1.0 and P adj < 0.05 and down-regulated genes with log 2 (fold change) <-1.0 using Metascape (Zhou et al., 2019). The signi cantly enriched GO terms and KEGG pathways were identi ed with P adj < 0.05.

Differential Gene Expression Analysis
To investigate gene regulation during estrogen response, differential gene expression analysis at the gene level was performed on the transcriptome data as well as ribosome-pro ling datasets. We identi ed 1162 transcriptionally upregulated genes after estrogen treatment, mainly consisting of 989 protein-coding genes and 97 lncRNA genes, in RNA-Seq data, compared to 659 genes with up-regulated ribosome binding after estrogen treatment, mainly consisting of 567 protein-coding genes and 15 lncRNA genes, in Ribo-Seq data ( Table 1; Supplementary Tables 3 & 4). Similarly, we identi ed 1286 transcriptionally down-regulated genes (mainly consisting of 856 protein-coding genes and 261 lncRNA genes) in RNA-Seq data, and 793 genes with down-regulated ribosome binding after estrogen treatment (mainly consisting of 626 protein-coding genes and 93 lncRNA genes), in Ribo-Seq data ( Table 1; Supplementary   Tables 3 & 4). As anticipated, the functional-category enrichment analysis of the estrogen-induced protein-coding genes revealed they were signi cantly enriched in biological processes directly relevant to cancer such as cell division,   Table 1a; p < 1.4*10 − 12 in Table 1b).
To investigate the speci cs of E2-mediated regulatory events that may affect only differential expression or only differential ribosome occupancy, but not both, we started from the lists of genes that were exclusively detected in only one of RNA-Seq and Ribo-Seq, respectively. Unsurprisingly, the number of genes exclusively detected in RNA-Seq is substantially greater than the number of genes exclusively detected in Ribo-Seq (Figs. 1B & 1D), given that Ribo-Seq, by design, surveys only the actively translated subset of the transcriptome. The numbers for lncRNAs were comparable to those of all-Ribo-Seq-detected vs. exclusively-Ribo-Seq-detected protein-coding gene mRNAs (Figs. 1F,  1H, 1J, & 1L). We further examined the relationship of gene expression pro le (before and after E2 treatment) between the transcriptomic and ribosome pro ling datasets, the majority of genes that were expressed in RNA-Seq were also detected in Ribo-Seq for both the total gene set (comprised mostly, though not solely, of protein-coding genes) and the protein-coding genes ( Figs. 2A & 2B). Most of the differentially expressed protein-coding genes found in Ribo-Seq were also found to be differentially expressed in RNA-Seq, although approximately half of the genes differentially expressed in RNA-Seq were not differentially expressed in Ribo-Seq (Figs. 2D & 2E), implying that some genes may be transcriptionally, but not translationally, regulated by estrogen. Interestingly, approximately 67.6% lncRNA genes are un-detectable in ribosome pro ling datasets (Fig. 2C), and most differentially expressed lncRNA genes are only differentially expressed in RNA-Seq, not in Ribo-Seq (Fig. 2F), implying that many lncRNAs may never encounter a ribosome and that, for those that do, only transcriptional but not translational regulation takes place -consistent with these ncRNAs exerting RNA-mediated functions that do not depend on ribosome binding or translation. Hence, the vast majority lncRNAs appear untranslated in our data, consistent with the previous work by Banfai et al (Banfai, et al. 2011).

Quantitative correlation of RNA-Seq and Ribo-Seq
To investigate canonical (transcriptional) and noncanonical (translational) estrogen responses, we compared differential expression and differential ribosome occupancy at the whole-gene level (not for individual transcripts if more than one per gene) for genes that exhibited both types of responses, as measured by RNA-Seq and Ribo-SEq. As anticipated, we observed a strong positive correlation (R = 0.82) between transcriptomic data and ribosome pro ling data (Fig. 3A); however, there were 106 genes that were differentially expressed to a far greater extent in RNA-Seq than in Ribo-Seq (transcriptome), and 129 genes that were differentially expressed to a far greater extent in Ribo-Seq than in Ribo-Seq ("translatome"). Those genes potentially represent speci c estrogen-regulated differential expression events independent of ribosome occupancy, and speci c estrogen-regulated translation differences independent of gene expression levels. This biological nding may help further illuminate the mechanisms of gene regulation in breast cancer and should be explored in future work.
To systemically examine the difference of protein-coding and lncRNA genes in the extent of their correlation between transcriptional and translational estrogen response, we analysed protein-coding and lncRNA genes separately for the above criteria. The strong correlation (R = 0.88) in the protein-coding genes demonstrated that most of these genes have a concordant transcriptional and translational response to estrogen (Fig. 3B). In contrast, we observed a substantially weaker correlation (R = 0.69) between lncRNA genes' transcriptional and translational changes in response to estrogen, indicating that many lncRNA genes do not exhibit consistent transcriptional and translational responses to estrogen, even though they are ribosome-bound (Fig. 3C).
To further investigate the expression level of protein-coding and lncRNA genes with ribosome, we took advantage of RPKM normalization because it takes gene length into account, in order to examine relative expression pro les within and across each expression-level bin. Compared with the protein-coding genes, it is evident that majority of lncRNA genes were not ribosome bound, and that the result was not con ned to the lowest-expressed lncRNA genes, especially for those genes with RPKM > 10 ( Fig. 4). Together with the weak correlation of expression on gene level for lncRNA genes, our results suggest that many lncRNA genes are not ribosome bound and that -even if they have Open Reading Frames (ORFs) -they are only rarely, if at all, translated into peptides in breast cancer cells, and that the levels of these rare peptides rarely respond to estrogen (Fig. 3).

Differential pseudogene expression analysis
To investigate pseudogene differential transcription and translation within the framework of the estrogen response, we searched all our RNA-Seq and Ribo-Seq datasets for the expression of the pseudogenes, with a high degree of stringency and relying on unambiguous sequencing reads that aligned to the pseudogenes at single loci better than to their putative parental genes and better than to other pseudogene copies. We identi ed 15 estrogen-induced pseudogenes in transcriptomics data, 31 estrogen-induced pseudogenes in ribosome-pro ling data, 22 estrogenrepressed pseudogenes in RNA-Seq data, and 5 estrogen-repressed pseudogenes in ribosome-pro ling data (Supplementary Tables 5 & 6). As a window into these expressed and ribosome-bound pseudogenes' potential functions, we manually examined the pseudogenes' parental genes for known relevance to cancer by searching the literature, based on the premise the functions of expressed pseudogenes may be related to or may involve the direct regulation of their parental genes (Johnsson et al., 2013). Interestingly, of the parental genes that gave rise to these differentially expressed pseudogenes, 16 are related to cancer as supported by the evidence from published literature, and 10 of those 16 were shown in previous studies to be speci cally functionally relevant to breast cancer (Tables 2 &  3). There were low numbers of overlapping differentially expressed pseudogenes (four pseudogenes) with both differential expression and differential ribosome occupancy, a low number perhaps explainable by the contention that not many expressed-pseudogene transcripts are expected to bind to ribosomes since their ORFs are missing or severely disrupted and hence they may not have ribosome entry sites or start codons, may only bind ribosomes once during the pioneer round of translation, and are generally not be translated into proteins (Supplementary Table 7).   (Poliseno et al., 2010;Johnsson et al., 2013). Correlation of expression levels between the pseudogene andits parental gene may indicate the potential mechanism in gene regulation (Gupta, et al., 2015). Our data reveals signi cant pairs of potentially co-regulated genes, such as AC005336.4:CYP4F11 and RP11-424C20.2:UHRF1 in transcriptomics data as well as RP11-83J16.3:TUBA1B and RP11-417L14.1:KPNA2 in ribosome-pro ling data, that hint at the possibility of speci c pseudogene:gene regulatory interaction events after estrogen treatment. Another function of pseudogenes in the regulation of parental-gene mRNA stability is by acting as miRNA molecular sponges, and can render a situation where the pseudogene transcript is "sponging" microRNAs away from the parental gene's mRNA, stabilizing or upregulating that mRNA (Gupta, et al., 2015). Interestingly, the pairs RP11-564A8.4:RPL13A and RP11-72P19.2:TATDN2 in the transcriptomics data, where the pseudogene RNA and the parental-gene mRNA are discordantly regulated in each pair, are consistent with such a regulatory outcome during estrogen response.
Another interesting example is the KHSRP (KH-type splicing regulatory protein) pseudogene (RP11-473O3.1). KHSRP is oncogenic, and plays an important role in promote tumour growth and metastasis (Yan et al.,201). It also takes part in the maturation of miRNAs in breast cancer (Santarosa et al., 2010;Trabucchi et al., 2009). The expression of this pseudogene was signi cantly downregulated, whereas its parental gene was signi cantly upregulated, at the transcript level after estrogen treatment (Fig. 6a). Interestingly, our data showed that the antisense strand of the KHSRP pseudogene was transcribed, therefore it might regulate the expression of its parental gene by binding to the sense strand of the parent gene (Fig. 6a) or by a PTENP1-like mechanism. In other words, the downregulation of the pseudogene might reduce the number of its RNA molecules that can bind and inhibit its parental gene. This might upregulate the expression of the parental gene and make more ribosome-bound mRNA molecules available for protein translation, promoting tumour growth and metastasis. Our data showed that the NONO pseudogene was signi cantly downregulated, in contrast the expression of the transcripts of its parental gene particularly the ribosome-bound transcripts were signi cantly up-regulated after estrogen treatment (Fig. 6b). Interestingly, manual examination of the transcript sequence of the pseudogene in the IGV genome browser revealed that both of its sense and antisense strand sequences were transcribed (generally more reads from the antisense strand compared to the sense strand i.e. 16 antisense vs 3 sense reads). We cannot rule out the possibility that the antisense transcripts may regulate the pseudogene or its parental gene, similarly to the PTEN scenario. Therefore, the E2-driven downregulation of the NONO pseudogene might increase the expression level of its parent gene and ribosome-bound mRNAs, promoting E2-dependent breast cancer tumorigenesis and progression.

Discussion
The difference between protein-coding genes (Fig. 1E & 1G) and lncRNA genes (Fig. 1I & 1K), as well as the absence of many expressed lncRNAs from ribosome pro ling data (Fig. 2C), led us to conclude that, unlike mRNAs, the vast majority lncRNAs are not ribosome-bound in MCF7 cells, regardless of estrogen treatment. Interestingly, differentially regulated lncRNAs are overwhelming repressed rather than induced, whereas for protein-coding genes about half are induced and half repressed. We identify a small collection of differentially ribosome-bound lncRNAs, (Fig. 1I & 1K), which correlate with differential transcriptional regulation more weakly than their protein-coding counterparts (Fig. 3).
These lncRNAs may encode short peptides, and whether they are ectopically translated, or part of a regulatory program, will be an intriguing direction for future study.
These ndings challenge numerous reports that most lncRNAs are ribosome associated in human cells (Ingolia et al., 2014), but are consistent with our previous ndings using shotgun proteomics (Banfai et al., 2011). The considerable numbers of transcripts not bound by ribosomes suggest at least two hypotheses that future work should distinguish experimentally: (i) the lncRNAs are cytoplasmic, but not bound by ribosomes (outside of pioneer-round-ofproofreading events), or (ii) the lncRNAs are nuclear (or sequestered in other cellular compartments); this could also be true for some of the differentially expressed mRNAs that are Ribo-Seq-negative, as nuclear retention of mRNAs is a stress response mechanism in human cells (Prasanth et al., 2005).
Our data showed only a fairly weak correlation between lncRNA genes' transcriptional and translational changes in response to estrogen, in contrast to protein-codinggenes. These ndings directly contradict the still-prevailing view in the eld that many lncRNAs are translated (Ji et al., 2015). That emerging de-facto dogma was never previously tested in a controlled experiment in the same human cells before and after a speci c nuclear hormone treatment, until our experiments, which suggest that most lncRNAs, including those with robust positive and negative transcriptional responses to estrogen, do not associate with ribosomes and hence are translationally inactive.
Differential transcript analysis revealed that alternative promoter usage and alternative splicing of certain proteincoding genes are responsive to estrogen treatment, and that estrogen may separately regulate the transcription and translation of other genes. The mechanism of this apparent preferential translational regulation by estrogen remains unknown, and these ndings need to be validated by independent experiments such as isoform-speci c quantitative RT-PCR and targeted mass spectrometry.
Our data showed that most of the differentially expressed protein-coding genes found in Ribo-Seq were also found to be differentially expressed in RNA-Seq, but most differentially expressed lncRNA genes are only differentially expressed in RNA-Seq, not in Ribo-SEq. There are at least three possible explanations for these observations. First, as expected, most protein-coding genes expressed in RNA-Seq were also found in Ribo-Seq-a reasonable nding, as protein-coding transcripts bound on ribosomes would be translated into proteins. Whereas, the majority of lncRNA genes are absent from the ribosome pro ling datasets, indicating a likelihood that they are not ribosome-boundconsistent with proteomics evidence (Banfai et al., 2012) as well as, importantly, with other Ribo-Seq efforts (Guttman et al., 2013). Second, because our Ribo-Seq libraries were not as deep as, and (due to the difference in protocols) had shorter reads than, our RNA-Seq libraries, we cannot rule out the possibility that it was generally more di cult to detect both the presence and the differential expression of transcripts expressed at low levels (which are more likely to be lncRNAs than mRNAs [Derrien et al., 2012]) in Ribo-Seq than in RNA-sEq. Hence, some of the "RNA-Seq-only" and "differentially expressed in RNA-Seq but not in Ribo-Seq" hits might be false negatives, although this can be corrected by separating all gene sets into bins -ordered and ranked by gene expression level -and comparing mRNAs and lncRNAs only within same-expression-range bins (i.e. highly expressed mRNAs with equally highly expressed lncRNAs, not with all lncRNAs) (Fig. 4). Third, unlike mRNAs, many lncRNAs are nuclear or otherwise restricted to speci c subcellular compartments inaccessible to ribosomes (Lennox & Behlke, 2016;Cabili et al., 2015); this may explain that underrepresentation of lncRNAs in Ribo-Seq-detectable and Ribo-Seq-differentially-expressed datasets. Despite these biological limitations, the results suggest that, if a gene is differentially ribosome-bound upon estrogen treatment, then that gene is likely to also be differentially expressed.
We also present a global study of the expressed-pseudogene transcriptome response to a nuclear hormone. We identi ed several pairs of differentially expressed pseudogenes and their parental genes, some with positive and others with negative correlations between the pseudogene RNA and the parental gene mRNA levels. It is possible that these transcribed pseudogenes regulate their parental genes, instead of being merely differentially co-expressed with them. The estrogen responsive pseudogene transcripts in the positively-correlated pairs may function as ceRNAs (competing endogenous RNAs), sponging miRNAs that would otherwise bind to, and repress, the co-expressed parental genes' mRNAs.
Most of the protein-coding genes' functional categories signi cantly up-and down-regulated by estrogen in RNA-Seq are, expectedly, broadly associated with cancer ( Supplementary Figs. 3 & 4). The estrogen up-regulated genes engaged in cell division and DNA replication con rm cell proliferation after estrogen treatment. Other up-regulated genes involved in microtubule cytoskeleton organization and regulation of cell adhesion facilitate invasion and metastasis, key hallmarks of cancer (Hanahan & Weinberg, 2011). Additionally, the down-regulated genes were engaged in ECM organization, which is usually deregulated during proliferation and metastasis (Walker, et al., 2018).
Combined with the regulation of actin lament-based process in cell migration (Yamaguchi & Condeelis, 2007), the gene-ontology ndings were fully consistent with estrogen's known role in cancer progression. In KEGG analysis, besides the up-regulated and down-regulated genes involved in similar processes (cell cycle and DNA replication; ECM-receptor interaction), the PPAR signalling pathway was detected and is known to contribute to cell growth in cancer (Tachibana et al., 2008). Activation of cAMP signalling inhibits proliferation of cancer cells (Fajardo et al., 2014), implicating that the down-regulated cAMP signalling may stimulate cancer progression.

Conclusion
Our ndings revealed that, unlike protein-coding gene mRNAs, most lncRNAs are not ribosome-bound and therefore are not translated into proteins in human breast cancer cells, whereas for many other lncRNAs, E2 induction affects their expression level, but not their association with ribosomes. Only a small subset of lncRNAs occupied by ribosomes and displaying transcriptional and/or ribosome-occupancy changes was found, while the majority of lncRNAs was not detectably bound by ribosomes, but did respond to estrogen transcriptionally. The results are consistent with mass spectrometric evidence against widespread lncRNA translation, and indicate systematic biological distinctions between the estrogen response of the protein-coding and the non-coding RNA transcriptomes.  Figure 1 Volcano plots in RNA-Seq and Ribo-Seq. (A) ~ (L) evaluate the differential expression and differential ribosome occupancy changes for genes as its title indicates. The x-axis corresponds to the log2 (fold change) in the estrogen treatment versus control group. The y-axis represents the negative log10 of Padj value. A cutoff with Padj value of 0.05 and a fold change of 2 are addressed by dashed lines within the plots. The number of upregulated, downregulated, and "undifferentiated" (detected and expressed, but not differentially expressed) genes is indicated next to each plot.  Correlation of gene expression between transcriptomic and ribosome pro ling data. The y-axis corresponds to thelog2 of adjusted fold change value in Ribo-Seq and the x-axis corresponds to the log2 of adjusted fold change value in RNA-Seq.The gene whose discrepancy in adjusted fold change between in both datasets > 2 or < -2 or between -2 and 2, are marked by "Transcriptome" -the gene is disproportionately more differentially expressed in RNA-Seq, "Translatome" -the gene is disproportionately more differentially expressed in Ribo-Seq, "Undifferentiated" -no evident discrepancy in expression level between the two datasets. Disproportionately greater differential expression refers to the magnitude of absolute-value fold change and can be up or down; we only considered genes that are up in both RNA-Seq and Ribo-Seq after estrogen treatment, and genes that are down in both RNA-Seq and Ribo-Seq after estrogen treatment, in this analysis.