Overview of L1 and Alu expressions in breast tumor
We mapped the transcriptome sequencing reads against an index of full-length L1 or Alu sequences with Bowtie2 (Fig. 1). The expression of L1 and Alu was significantly higher in white women than in black or Asian women. The variability of expression was also much higher in white women than in black or Asian women (Fig. 1A). To our knowledge, no similar observation has been reported so far, and it merits further study. Figure 1B shows the RNA sequencing coverage along both retrotransposons. In L1, the coverage of the 3'-end is significantly higher than that of the 5'-end, which is due to the large proportion of 5'-truncated L1 copies in the genome. For a similar reason, the highest coverage of Alu is at 20–50 bp. This result also shows that the quality of the RNA sequencing data meets the requirements for analyzing these two retrotransposons.
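The coverage profile described for Fig. 1B can be illustrated with a minimal sketch. It assumes that each aligned read has already been reduced to a 0-based, half-open interval (start, end) on the reference element; the function name per_base_coverage and the toy intervals are hypothetical and are not taken from the study's pipeline.

```python
def per_base_coverage(ref_len, alignments):
    """Return read depth at each position of a reference of length ref_len.

    alignments: iterable of (start, end) pairs, 0-based and half-open
    (an assumed input format; the study's actual alignment format is not stated).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_len + 1)
    for start, end in alignments:
        start = max(0, start)          # clip to the reference boundaries
        end = min(ref_len, end)
        if start >= end:
            continue                   # skip empty or out-of-range intervals
        diff[start] += 1
        diff[end] -= 1

    # Running sum of the difference array gives the depth per base.
    coverage = []
    depth = 0
    for delta in diff[:ref_len]:
        depth += delta
        coverage.append(depth)
    return coverage


# Toy example on a 10 bp reference (made-up intervals).
print(per_base_coverage(10, [(0, 5), (3, 10), (8, 10)]))
# -> [1, 1, 1, 2, 2, 1, 1, 1, 2, 2]
```

With real data, a profile that rises toward the 3'-end of L1 would be consistent with the 5'-truncation explanation given above.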
High expression of L1 and Alu in breast tumor predicted poor prognosis
Next, we compared the expression of L1 and Alu between tumor tissues and adjacent normal tissues based on the mapping results. The expression of both L1 and Alu is significantly higher in cancer tissues than in adjacent tissues (Fig. 2A, Fig. 2B, Student's t-test). Even more remarkable than the differences in means or medians between cancer and normal tissues are the extraordinarily large variances in cancer tissues: the expression levels of these two retrotransposons are similar across normal tissues, whereas they vary drastically across cancer tissues, highlighting the complex regulation of transposon expression in tumorigenesis. Patients were further divided into high and low groups according to L1 and/or Alu expression in breast tumor tissue. Overall prognostic analysis demonstrates that patients with high L1 and/or Alu expression have a poorer prognosis than those with low expression (Fig. 2C-2F).
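As a rough illustration of the two comparisons made in this paragraph (difference of means and difference of variances), the sketch below computes a pooled-variance Student's t statistic and the ratio of sample variances for two groups. The function name compare_groups and the expression values are hypothetical; the sketch returns only the test statistic and degrees of freedom, not a p-value.

```python
import math
import statistics


def compare_groups(tumor, normal):
    """Return (t_statistic, degrees_of_freedom, variance_ratio).

    t is the classic Student's (pooled-variance) statistic for tumor vs normal;
    variance_ratio is var(tumor) / var(normal) using sample variances.
    """
    n1, n2 = len(tumor), len(normal)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")

    m1, m2 = statistics.mean(tumor), statistics.mean(normal)
    v1, v2 = statistics.variance(tumor), statistics.variance(normal)

    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    if pooled == 0:
        raise ValueError("both groups are constant; t is undefined")

    t = (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    ratio = v1 / v2 if v2 > 0 else math.inf
    return t, df, ratio


# Made-up expression values: tumor samples spread widely, normal samples are tight.
tumor = [2.0, 8.0, 5.0, 11.0, 4.0]
normal = [3.0, 3.2, 2.8, 3.1, 2.9]
t, df, ratio = compare_groups(tumor, normal)
print(f"t = {t:.3f}, df = {df}, variance ratio = {ratio:.1f}")
```

Because the pooled statistic assumes equal variances in the two groups, a very large variance ratio is a reason to read the t statistic with some caution.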
Widespread chimeric expression of mRNAs with L1 or Alu in breast tumor
To uncover how L1 and Alu promote breast tumor progression, we first examined the correlation of L1 and Alu expression with transcriptome expression in breast tumor. The correlations are moderate or low, with the highest absolute value of the correlation coefficient below 0.7 (Figure S1), which implies that L1 and Alu do not affect breast tumor progression by acting on any single gene. As L1 and Alu often affect the expression of adjacent genes, we scanned for their chimeric expression with adjacent genes. In total, we identified 101,846 mRNA sequences chimerically expressed with L1, 651 of which were differentially expressed between tumor tissues and adjacent tissues (P < 0.05, chi-square test) (Fig. 3A, Fig. 4A, Table S2). For example, the chimeric expression of ESR1, SPC25, MET, FAM111B, CASP8, etc. with L1 is significantly higher in tumor tissues than in adjacent tissues, whereas the chimeric expression of LPP, CFLAR, ABCC9, ABCA9, LEP, BAG1, GBP5, OXCT1, etc. with L1 is significantly lower in tumor tissues than in adjacent tissues. Similarly, we identified 129,462 chimeric transcripts of mRNAs and Alu, 1,525 of which were differentially expressed between tumor tissues and adjacent tissues (P < 0.05, chi-square test) (Fig. 3B, Fig. 4B, Table S3). The chimeric expression of CD24, GPI, BRCA1, B3GALNT2, ERBB2, TP53, FLT1, CWF19L1, UNSAP1, etc. with Alu is significantly higher in tumor tissues than in adjacent tissues, and the chimeric expression of MCAM, PIGR, CLDN11, CCL28, ADIPOQ, KIAA1324, SLC2A3, ECI2, EP300, etc. with Alu is significantly lower in tumor tissues than in adjacent tissues. Interestingly, more chimeric transcripts are downregulated than upregulated (Fig. 3E, Fig. 3F), indicating that many L1/Alu chimeric transcripts may play roles in maintaining homeostasis under physiological conditions and that their downregulation may contribute to tumorigenesis.
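The chi-square comparison used above can be sketched for a single gene as a 2x2 table. The layout of the table (chimeric vs non-chimeric read counts, in tumor vs adjacent tissue) is an assumption made for illustration, since the exact construction of the test is not stated here; the function name chi_square_2x2 and the counts are hypothetical. The p-value uses the fact that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table

                    chimeric   non-chimeric
        tumor          a            b
        adjacent       c            d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")

    # Closed form of the Pearson statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts for one gene: 30 of 1000 reads chimeric in tumor, 10 of 1000 in adjacent tissue.
chi2, p = chi_square_2x2(30, 970, 10, 990)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

When many genes are tested in this way, some correction for multiple testing would normally be considered; that step is outside this sketch.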
The chimeric expression of lncRNAs with L1 or Alu in breast tumor
Besides coding genes, long non-coding RNAs (lncRNAs) are increasingly recognized to play important roles in evolution and disease progression. Therefore, we analyzed the chimeric expression of lncRNAs with L1 or Alu. In total, 16,817 lncRNA sequences are chimerically expressed with L1, 1,009 of which are differentially expressed between tumor tissues and adjacent tissues (P < 0.05, chi-square test) (Fig. 3C, Fig. 4C, Table S4). The chimeric expression of LOC286437 (NONHSAT138089.2), KCNQ1OT1 (NONHSAT017523.2), LINC00393 (NONHSAT034309.2), LINC01410 (NONHSAT131526.2), LINC01087 (NONHSAT074491.2), BHLHE40-AS1 (NONHSAT246212.1), etc. with L1 is significantly higher in tumor tissues than in adjacent tissues, and the chimeric expression of LCAT_00100281 (NONHSAT174269.1), LCAT_00113317 (NONHSAT176719.1), CECR7 (NONHSAT191948.1), LINC01088 (NONHSAT097104.29), etc. with L1 is significantly lower in tumor tissues than in adjacent tissues. Similarly, we detected 46,457 lncRNA genes chimerically expressed with Alu, 2,575 of which were differentially expressed between tumor and adjacent tissues (P < 0.05, chi-square test) (Fig. 3D, Fig. 4D, Table S5). For example, the chimeric expression of LINC01296 (NONHSAT035480.2), SNHG16 (NONHSAT056037.2), LINC00514 (NONHSAT147902.2), LINC01441 (NONHSAT080425.2), etc. with Alu is higher in tumor tissues than in adjacent tissues, and the chimeric expression of LINC00894 (NONHSAT138981.2), LINC00702 (NONHSAT156543.1), LINC02202 (NONHSAT104813.2), LCAT_00037989 (NONHSAT158544.1), NORAD (NONHSAT079548.2), XIST (NONHSAT137541.2), etc. with Alu is significantly lower in tumor tissues than in adjacent tissues.
Genes differentially chimerically expressed with L1 and Alu are involved in multiple facets of tumorigenesis
Tumorigenesis is a multi-step process in which many elements are involved. In this study, we found a large number of genes chimerically expressed with L1 and Alu in breast tumor. We therefore performed functional cluster analysis of these genes and found that their functions are related to various aspects of tumorigenesis, including pathways related to metabolism, metabolism of proteins, signal transduction, gene transcription, the immune system, the cell cycle, and apoptosis (Fig. 5A-5G). These results imply that L1 and Alu might promote breast cancer progression by influencing multiple facets of tumorigenesis.
Comparison of the chimeric expression levels and the overall expression levels of related genes
Theoretically, there are two potential reasons for the change in chimeric expression with L1 and Alu in cancer cells: 1) L1 or Alu is adjacent to these genes, and expression changes of the genes themselves cause the changes in chimeric expression; 2) changes in L1 and Alu activity cause the changes in chimeric expression. If 1) is the major cause, chimeric expression should correlate strongly with the overall expression of the related genes; otherwise, the correlation should be weak. We downloaded gene expression data from previous transcriptome studies of breast cancer and compared them with the chimeric expression data from the current study. Figure 6 shows that there is no correlation between the chimeric expression of mRNAs or lncRNAs with L1 or Alu and the total expression of these genes. This indicates that the expression differences of these chimeric RNAs are not due to a "passenger effect" of overall expression; rather, they are likely regulated by an independent mechanism.
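The reasoning above rests on a correlation between two per-gene quantities: chimeric expression and overall expression. A minimal sketch of that comparison is given below using the Pearson correlation coefficient. Which correlation measure was used for Figure 6 is not stated here, so Pearson is an assumption, and the function name pearson_r and the values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired values")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-gene values: tumor-vs-adjacent change in overall expression
# and the corresponding change in chimeric expression.
overall_change = [1.0, 2.0, 3.0, 4.0]
chimeric_change = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(overall_change, chimeric_change))
```

Under explanation 1), the coefficient would be expected to be close to 1 across genes; a value near 0, as in this toy input, is the pattern that argues against a passenger effect.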