LINE-1 and Alu Promote Breast Tumor Progression Via Forming Chimeric Transcripts

Background: LINE-1 (L1) and Alu have been reported to regulate tumor development and progression by forming chimeric transcripts with diverse sequences. The landscape of L1 and Alu chimeric transcripts in breast tumors has not yet been reported, so we investigated it in this study by analyzing more than 50 billion RNA sequences of breast tumor tissues and adjacent normal tissues from the TCGA database. Results: The expression of L1 and Alu in breast tumor tissues was significantly higher than that in non-tumor tissues. High expression of L1 and Alu in breast tumors predicted poor prognosis. Further exploration demonstrated that L1 and Alu were extensively chimerically expressed with adjacent transcripts. In total, 651 L1-mRNA chimeric transcripts and 1525 Alu-mRNA chimeric transcripts showed significantly different frequencies between non-tumor and tumor tissues, as did 1009 L1-lncRNA chimeric transcripts and 2575 Alu-lncRNA chimeric transcripts. Functional cluster analysis demonstrated that these differentially chimerically expressed genes were involved in multiple facets of tumorigenesis, including metabolism, signal transduction, immune reaction, cell cycle and apoptosis. Conclusions: Chimeric expression with different sequences might be important and


Background
Transposons are DNA elements that can move within (or between) genomes. They can be divided into DNA transposons and retrotransposons according to their mechanism of transposition [1,2]. Long Interspersed Element-1 (LINE-1, L1) and Alu are two of the most active retrotransposons in primates [3,4]. L1 is the only active autonomous mobile DNA element in humans, constituting ~17% of the human genome.
Intact L1s are about 6 kb long. Most L1s are inactivated by truncations, rearrangements or mutations, and only about 100 copies remain active in a typical human genome. Typical Alu elements are ~300 bp long. There are more than one million copies of Alu elements, accounting for ~11% of the human genome.
Alu is a non-autonomous transposon whose transposition relies on proteins expressed by autonomous transposons such as L1. Activation of L1 and Alu not only causes genome instability but also induces abnormal gene expression [5], two of the most basic characteristics of tumors [6,7]. L1 and Alu are hypomethylated and actively transcribed in most tumors [8][9][10]. Moreover, their activation is often accompanied by abnormal expression of tumor-related genes [11,12].
Breast cancer is among the top five malignant tumors and is the second leading cause of cancer-related death in women [13]. L1 and Alu are hypomethylated in breast tumors and can regulate the expression of adjacent genes, including by forming chimeric transcripts with them to promote tumorigenesis [14][15][16]. Yet there has been no comprehensive report on the profile of L1 or Alu chimeric expression in breast cancer. By analyzing breast cancer transcriptome data from The Cancer Genome Atlas (TCGA) database, we found that L1 and Alu were highly expressed in breast tumor tissue and that this high expression predicted poor prognosis; further exploration demonstrated that L1 and Alu extensively formed chimeric transcripts in breast tumors.

Data resources
The raw transcriptome sequencing data (fastq format) of 606 breast tumors and 59 adjacent normal tissues, comprising more than 50 billion pairs of sequences, were obtained from the TCGA database (Table S1). The mRNA sequence dataset and the sequences of L1 and Alu were downloaded from the NCBI database; the lncRNA dataset was downloaded from the NONCODE database (http://www.noncode.org/index.php). The mRNA expression dataset of breast cancers was obtained from the Dashboard Stddata database of the Broad Institute (https://conference.broadinstitute.org/display/gdac/dashboard-stddata). The lncRNA expression dataset of breast cancers was obtained from the TANRIC database of TCGA [31].

Data analyses
The RNA sequencing data were mapped against the index of L1 and Alu using Bowtie2, and the mapping outputs were filtered. For the single-end matches, the sequences of the other end were extracted from the original sequence files and then mapped against the index of mRNAs and lncRNAs to obtain the chimeric mRNA and lncRNA datasets, on which multiple statistical analyses were performed. All computer programs were written in Perl. Violin plots and survival analyses were generated with GraphPad software. Cluster analysis was performed with ConsensusPathDB [32], and plots were generated with RStudio.
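The mate-extraction step described above can be sketched as follows. This is a minimal Python illustration, not the authors' Perl code. It assumes the first-round alignments are available as SAM text, and it interprets a "single-end match" as a read pair in which one mate aligns to the L1/Alu index while the other does not; the source does not spell out its filtering criteria. The function name `candidate_mates` and the toy records are hypothetical.

```python
# SAM FLAG bits used below (from the SAM specification).
PAIRED = 0x1          # read is part of a pair
READ_UNMAPPED = 0x4   # this read did not align
MATE_UNMAPPED = 0x8   # the other mate did not align
FIRST_IN_PAIR = 0x40  # this read is mate 1


def candidate_mates(sam_lines):
    """Yield (read_name, mate_number) for mates to re-map in the second round.

    A mate is a candidate when its partner aligned to the retrotransposon
    index but the mate itself did not (assumed meaning of "single-end match").
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue
        read_mapped = not flag & READ_UNMAPPED
        mate_unmapped = bool(flag & MATE_UNMAPPED)
        if read_mapped and mate_unmapped:
            # The unaligned partner is mate 2 if this record is mate 1.
            yield name, (2 if flag & FIRST_IN_PAIR else 1)


# Toy SAM records (hypothetical): pairA and pairC are single-end matches,
# pairB has both mates aligned and is therefore skipped.
toy_sam = [
    "@SQ\tSN:L1\tLN:6000",
    "pairA\t73\tL1\t5000\t42\t4M\t=\t5000\t0\tACGT\tIIII",
    "pairA\t133\tL1\t5000\t0\t*\t=\t5000\t0\tACGT\tIIII",
    "pairB\t99\tL1\t100\t42\t4M\t=\t300\t204\tACGT\tIIII",
    "pairB\t147\tL1\t300\t42\t4M\t=\t100\t-204\tACGT\tIIII",
    "pairC\t137\tAlu\t20\t42\t4M\t=\t20\t0\tACGT\tIIII",
]
to_remap = list(candidate_mates(toy_sam))
```

The sequences of the mates listed in `to_remap` would then be pulled from the original fastq files and aligned against the mRNA and lncRNA index.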

Overview of L1 and Alu expressions in breast tumor
We mapped the transcriptome sequence reads against an index of full-length L1 or Alu sequences with Bowtie2 (Fig. 1). The expression of L1 and Alu was significantly higher in White women than in Black or Asian women, and the variability of expression was also much higher in White women (Fig. 1A). To our knowledge, no similar observation has been reported so far, and it merits further study. Figure 1B shows the RNA sequencing coverage of both retrotransposons. In L1, coverage of the 3'-end is significantly higher than that of the 5'-end, which is due to the large proportion of 5'-truncated L1 copies in the genome. The highest coverage of Alu is at 20-50 bp for a similar reason.
This result also shows that the data quality of the RNA sequencing meets the requirement for analyzing these two retrotransposons.
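A per-position coverage profile like the one summarized for Figure 1B can be computed from alignment intervals. The sketch below is illustrative only: the interval representation (0-based, half-open) and the function name `coverage` are assumptions, not details taken from the source.

```python
def coverage(length, alignments):
    """Return per-base read depth along a reference of the given length.

    `alignments` is an iterable of (start, end) pairs, 0-based and half-open.
    A difference array keeps the cost linear in length plus read count.
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        diff[start] += 1   # depth rises where a read begins
        diff[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


# Toy example on a 10-bp reference with three reads (hypothetical data).
toy_depth = coverage(10, [(0, 4), (2, 6), (5, 10)])
```

On real data, a profile skewed toward the 3'-end of L1 would show up as larger values at the end of the returned list.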

High expression of L1 and Alu in breast tumor predicted poor prognosis
Next, we compared the expression of L1 and Alu between tumor tissues and adjacent normal tissues based on the mapping results. The expression of both L1 and Alu was significantly higher in cancer tissues than in adjacent tissues (Fig. 2A, Fig. 2B; Student's t-test).
Beyond the differences in means or medians between cancer and normal tissues, the extraordinarily large variances in cancer tissues are even more remarkable: the expression levels of these two retrotransposons are similar across normal tissues, whereas they vary drastically across cancer tissues, highlighting the complex regulation of transposon expression in tumorigenesis. Patients were further divided into high and low groups according to L1 and/or Alu expression in breast tumor tissue, and analysis of overall prognosis demonstrated that patients with high L1 and/or Alu expression had a poorer prognosis than those with low expression (Fig. 2C-2F).
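The grouping and survival comparison can be illustrated with a short sketch. Two assumptions are made here that the source does not confirm: patients are split at the median expression value, and survival is summarized with the Kaplan-Meier estimator. The function names and toy values are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Split patient IDs into (high, low) groups around the median value."""
    cutoff = median(expression.values())
    high = sorted(p for p, v in expression.items() if v > cutoff)
    low = sorted(p for p, v in expression.items() if v <= cutoff)
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    `events[i]` is 1 if the patient died at `times[i]`, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy data (hypothetical): expression per patient, follow-up times and status.
groups = split_by_median({"P1": 5.0, "P2": 1.0, "P3": 9.0, "P4": 2.0})
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one curve per group and comparing them (for example with a log-rank test, which is not shown) would reproduce the kind of comparison reported in Fig. 2C-2F.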

Widespread chimeric expression of mRNAs with L1 or Alu in breast tumor
To uncover how L1 and Alu promote breast tumor progression, we first examined the correlation of L1 and Alu expression with transcriptome expression in breast tumors. The correlations were moderate or low, with the highest absolute value of the correlation coefficient below 0.7 (Figure S1), which implies that L1 and Alu do not affect breast tumor progression by acting on a single gene. As L1 and Alu often affect the expression of adjacent genes, we scanned for their chimeric expression with adjacent genes. In total, we identified 101,846 mRNA sequences chimerically expressed with L1, 651 of which were differentially expressed between tumor tissues and adjacent tissues (P < 0.05, chi-square test) (Fig. 3A, Fig. 4A, Table S2). For example, the chimeric expression of ESR1, SPC25, MET, FAM111B, CASP8, etc. with L1 was significantly higher in tumor tissues than in adjacent tissues, whereas the chimeric expression of LPP, CFLAR, ABCC9, ABCA9, LEP, BAG1, GBP5, OXCT1, etc. with L1 was significantly lower in tumor tissues than in adjacent tissues. Similarly, we identified 129,462 chimeric transcripts of mRNAs and Alu, 1,525 of which were differentially expressed between tumor tissues and adjacent tissues (P < 0.05, chi-square test) (Fig. 3B, Fig. 4B, Table S3). The chimeric expression of CD24, GPI, BRCA1, B3GALNT2, ERBB2, TP53, FLT1, CWF19L1, NUSAP1, etc. with Alu was significantly higher in tumor tissues than in adjacent tissues, and the chimeric expression of MCAM, PIGR, CLDN11, CCL28, ADIPOQ, KIAA1324, SLC2A3, ECI2, EP300, etc. with Alu was significantly lower in tumor tissues than in adjacent tissues. Interestingly, more chimeric transcripts were downregulated than upregulated (Fig. 3E, Fig. 3F), indicating that many L1/Alu chimeric transcripts may help maintain homeostasis under physiological conditions and that their downregulation may contribute to tumorigenesis.
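The chi-square comparison used above can be sketched for a single chimeric transcript. The source does not state how the contingency table was built, so the 2x2 layout here (chimeric versus other counts, in tumor versus adjacent tissue) is an assumption, as are the function name and the toy counts. The statistic is the Pearson chi-square without continuity correction, and the p-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Assumed layout:
        a = chimeric count in tumor,    b = other count in tumor
        c = chimeric count in adjacent, d = other count in adjacent
    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts (hypothetical): 30 of 100 in tumor versus 10 of 100 in adjacent.
toy_chi2, toy_p = chi_square_2x2(30, 70, 10, 90)
is_different = toy_p < 0.05
```

Applied to every chimeric transcript in turn, a filter of this kind would yield the lists of differentially expressed chimeras; with tens of thousands of tests, a multiple-testing adjustment would normally be considered, although the source reports only the P < 0.05 threshold.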

Genes differentially chimerically expressed with L1 and Alu are involved in multiple facets of tumorigenesis
Tumorigenesis is a multi-step process in which many elements are involved. We found a large number of genes chimerically expressed with L1 and Alu in breast tumors in this study. We therefore performed functional cluster analysis and found that the functions of these chimeric genes were related to various aspects of tumorigenesis, including pathways related to metabolism, protein metabolism, signal transduction, gene transcription, the immune system, cell cycle and apoptosis (Fig. 5A-5G). These results imply that L1 and Alu might promote breast cancer progression by influencing multiple facets of tumorigenesis.

Comparison of the chimeric expression levels and the overall expression levels of related genes
Theoretically, there are two potential reasons for changes in chimeric expression with L1 and Alu in cancer cells: 1) L1 or Alu is adjacent to these genes, and expression changes of the genes themselves cause the changes in chimeric expression; 2) changes in L1 and Alu activity cause the changes in chimeric expression. If 1) is the major cause, chimeric expression should correlate strongly with the overall expression of the related genes; otherwise, the correlation should be weak. We downloaded gene expression data from previous transcriptome studies of breast cancer and compared them with the chimeric expression data from the current study. Figure 6 shows no correlation between mRNA or lncRNA chimeric expression with L1 or Alu and the total expression of these genes, which indicates that the expression differences of these chimeric RNAs are not a "passenger effect" of overall expression; rather, they are likely regulated by an independent mechanism.
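The reasoning above amounts to checking whether chimeric expression tracks total expression across genes. A minimal sketch follows, assuming the Pearson coefficient is the measure of correlation (the source does not name the coefficient used); the function name and toy vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy fold changes (hypothetical). Under explanation 1) the chimeric change
# follows the total change, so r is near 1; under explanation 2) r is near 0.
total_change = [1.0, 2.0, 3.0, 4.0]
passenger_like = [2.0, 4.0, 6.0, 8.0]
independent_like = [1.0, -1.0, -1.0, 1.0]
r_passenger = pearson(total_change, passenger_like)
r_independent = pearson(total_change, independent_like)
```

A coefficient close to zero across the differentially expressed chimeras is the pattern the text attributes to Figure 6.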

Discussion
By analyzing the RNA sequencing data of hundreds of breast tumors and dozens of adjacent normal tissues, we found that the expression of L1 and/or Alu in tumors was significantly higher than that in normal tissue, and that higher expression of L1 and/or Alu in breast tumors predicted poorer prognosis, which is consistent with previous reports on the hypomethylation and upregulation of L1 in breast cancers [14,18,19]. In addition to the upregulation of the average expression level, expression in tumor tissues showed extraordinarily high diversity, highlighting the heterogeneity of the activation and regulation of retrotransposons in tumors.
To explore the mechanism by which L1 and Alu promote breast cancer progression, we investigated the expression of L1 and Alu chimeric transcripts in breast tumors. We found significant differences between tumor and normal tissues in the chimeric expression of L1 or Alu with a large number of protein-coding genes and long noncoding RNA genes. In addition to changing expression levels, the formation of chimeric RNAs with L1 or Alu may also endow genes with new functions and influence tumor progression. Miglio U et al. reported that the chimeric expression of L1 and MET in certain breast tumors greatly enhanced the invasion of tumor cells [16]. We also found that the chimeric expression of MET with L1 was significantly higher in tumors than in adjacent normal tissue. BRCA1 gene mutation is one of the most notorious poor prognostic factors for breast cancer. It has been reported that frequent Alu insertion is the main cause of BRCA1 mutation, and that abnormal Alu insertion not only greatly increases the incidence of breast cancer but also induces chemotherapy resistance [20,21]. TP53 is one of the most important tumor suppressor genes. It has been reported that TP53 is mutated and inactivated due to Alu insertion in most tumors. TP53 mutation caused by high-frequency Alu insertion is related not only to the occurrence of breast cancer but also to the formation of secondary malignant tumors and a chemotherapy-resistant phenotype [22][23][24]. Our large-sample, high-throughput analysis is consistent with these previous reports that frequent Alu insertions affecting BRCA1 or TP53 are common in breast cancer. These results suggest that more attention should be paid to the role of retrotransposons in breast cancer.
In addition, many lncRNAs showing significantly different chimeric expression with L1 or Alu between cancer and adjacent normal tissue have also been shown to be involved in the incidence and progression of breast cancer. For example, Cai C. et al. found that SNHG16 (NONHSAT056037.2) promoted the migration of breast cancer cells by competitively binding miR-98 and E2F5 [25]. Jiang M. et al. reported that LINC01296 (NONHSAT035480.2) was highly expressed in breast cancer and that its overexpression was related to poor prognosis [26]. Tan BS et al. discovered that the expression of NORAD (NONHSAT079548.2) was low in breast tumors and that its knockdown could promote breast tumor metastasis [27]. Xing F. et al. reported that XIST (NONHSAT137541.2) was expressed at low levels in breast tumors and that its knockout promoted breast tumor brain metastasis [28]. All these lncRNAs showed significantly different chimeric expression with Alu between tumor tissue and normal tissue in our study. According to Feng W. et al., KCNQ1OT1 (NONHSAT017523.2) was highly expressed in breast cancer and promoted tumor growth [29]. DeVaux RS. et al. reported that the expression of BHLHE40-AS1 (NONHSAT246212.1) was increased in breast cancer, where it could modulate interleukin-6/STAT3 activity to promote breast tumor progression [30]. We also found that the chimeric expression of these two lncRNAs with L1 increased significantly in breast tumor tissues.
In addition to the above examples, many well-known genes that play important roles in the genesis and development of breast tumors, such as ESR1, LPP, CFLAR, LEP, BAG1, GBP5, OXCT1, CD24, GPI, B3GALNT2, ERBB2, FLT1, CWF19L1, NUSAP1, MCAM, KIAA1324, SLC2A3, ECI2, etc., show significantly different chimeric expression with L1 or Alu between cancer and normal tissues. Moreover, by comparing L1 and Alu chimeric expression with the total expression of the related genes in breast cancer, we found no correlation between them, indicating that this chimeric expression is not a "passenger effect" of overall expression changes. Instead, chimeric expression is more likely to be regulated by independent mechanisms, which may also endow these genes with new functions.
Although the expression of L1 and Alu is significantly upregulated in breast cancer, there are more downregulated chimeric mRNAs than upregulated ones (Fig. 3E, 3F), which suggests that many tumor suppressors may function as chimeric transcripts with L1 or Alu under physiological conditions. Therefore, the changes in L1 and Alu that contribute to tumorigenesis include not only activation but also suppression.
Furthermore, the functions of many genes identified in this study that are differentially chimerically expressed with L1 or Alu between cancer and normal tissue remain elusive (especially a large number of lncRNA genes), suggesting a potential role for these genes in the genesis of breast cancer that warrants further investigation.

Conclusions
By analyzing a large amount of RNA sequencing data, we found that L1 and Alu were highly expressed in breast tumors and that their high expression predicted poor prognosis for patients. We also found that chimeric expression of genes with L1 or Alu in breast cancer is a very common phenomenon rather than a few special cases. The chimeric expression of these genes may be one of the factors promoting breast cancer formation and progression. Functional cluster analysis of L1 and Alu chimeric genes demonstrated that these chimeric transcripts are involved in various aspects of tumorigenesis, including metabolism, immune reaction, signaling pathways, cell cycle and apoptosis. In general, there are not yet enough studies on the relationship between human endogenous transposons and cancer, and the role of transposons in cancer may be largely underestimated. We believe that in-depth research in this field will provide new ideas and targets for cancer diagnosis and treatment.