Identification of a novel lncRNA‐miRNA‐mRNA competing endogenous RNA network associated with prognosis of breast cancer

Recently, the effects of competing endogenous RNA (ceRNA) on molecular biological mechanism of cancer have aroused great interest. In this study, long noncoding RNA‐microRNA‐messenger RNA (lncRNA‐miRNA‐mRNA) ceRNA network was screened and constructed based on the Cancer Genome Atlas (TCGA) database, and its efficacy in predicting the prognosis of breast cancer patients was evaluated. The RNA‐sequencing, miRNA‐sequencing, and corresponding clinical information were downloaded from the TCGA database, and differentially expressed genes were screened after data searching. The similarity between two groups of genes was analyzed by weighted correlation network analysis (WGCNA). Next, the interaction among lncRNA, miRNA, and mRNA was predicted followed construction of the lncRNA‐miRNA‐mRNA ceRNA network. Finally, univariate and multivariate Cox regression analysis was used to screen prognostic factors to construct prognostic risk model. Receiver operating characteristic (ROC) curve was used to evaluate the efficacy of this model in predicting the prognosis of breast cancer patients. In total 5056 differentially expressed lncRNAs, 712 differentially expressed miRNAs, and 9878 differentially expressed mRNAs were identified in breast cancer tissues. WGCNA predicted that 823 lncRNAs and 1813 mRNAs were closely related to breast cancer. The lncRNA‐miRNA‐mRNA ceRNA network involved in breast cancer was constructed based on 27 lncRNA, 14 miRNAs, and 4 mRNAs. ZC3H12B, HRH1, TMEM132C, and PAG were the possible independent risk factors for the prognosis of breast cancer patients with the area under the signal characteristic curve under ROC curve of 0.609. This study suggested that the prognosis risk model based on ZC3H12B, HRH1, TMEM132C, and PAG1 accurately predicted the prognosis of breast cancer patients.


| INTRODUCTION
Breast cancer, the most frequently diagnosed malignancy in women, remains the second reason for cancer-related in female around the world. [1] It is reported that there are three major therapies for breast cancer, including chemotherapy, endocrine therapy, and targeted therapy. [2] Breast cancer, a heterogeneous disease, varies greatly among different patients and even within each individual patient. [3] Although the survival rate has gradually increased due to the early detection and advances of therapeutic strategy in recent years, there are still many obstacles in the treatment of breast cancer. [4] Since we still do not fully understand how breast cancer occurs and progresses, it is imperative and meaningful to explore the underlying molecular mechanisms and identify novel prognostic biomarkers and potential therapeutic targets for breast cancer.
It is well known that competing endogenous RNAs (ceRNAs) could indirectly regulate each other by reducing the amount of microRNAs (miRNAs) available to target messenger RNAs (mRNAs), thereby affecting the progression of various disease conditions, including cancers. [5] A recent study has demonstrated that long noncoding RNAs (lncRNAs) acting as miRNA decoys modulating the derepression of miRNA targets have been found in multiple human cancers. [6] Moreover, noncoding RNA including miRNA and lncRNA could regulate gene expression at transcriptional and posttranscriptional level. [7] The aberrant expression of miRNA and lncRNA has been widely reported to be implicated in tumorigenesis and development of various cancers, including breast cancer. [8,9] A prior study has proposed a ceRNA hypothesis that mRNA, miRNA and lncRNA could achieve cross-talk between each other by forming a regulatory network. [10] Accumulating evidence has elucidated that lncRNAs could serve as sponges for miRNAs through miRNA response elements, resulting in alterations in the miRNAs-regulated mRNA levels, and the lncRNA-miRNA-mRNA ceRNA network is reported to be involved in the development and progression of some tumors, including breast cancer. [11] Although ceRNAs have been applied for biological function annotations, few of them have investigated the ceRNA network on specific disease systematically in breast cancer. [12] Nevertheless, understanding of the pivotal lncRNA-miRNA-mRNA ceRNA networks associated significantly with prognosis of breast cancer remain very limited. Therefore, the present study is aimed to explore the role of the discovered lncRNA-miRNA-mRNA ceRNA network in the pathogenesis of breast cancer, and its possible mechanisms are explored, which may help to provide a novel direction for treating breast cancer.

| Data acquisition
The RNA-sequencing, miRNA-sequencing, and clinical data of breast cancer were retrieved from TCGA database (https://tcga-data.nci.nih. gov/) and downloaded using data transmission tools (provided by GDC apps). The RNA sequencing data consisted of 1109 breast cancer tissues and 113 normal tissues, which were combined into a matrix file by Perl script (http://www.perl.org/), and then the ensemble ID of the sample was transformed by the annotation file of GENCODE Gene Set-09.2019 version. The LncRNA and mRNA ensemble ID not included in the GENCODE database are excluded. Finally, the expression matrix of lncRNA and mRNA was obtained. The miRNA-Seq data included 1103 breast cancer tissues and 104 normal tissues, which were combined into a matrix file using Perl language (http://www.perl.org/). The downloaded clinical data contained 1097 cases. Due to the availability of public data in TCGA database, this study does not need moral approval or informed consent.

| Screening of differentially expressed genes (DEGs)
The "edge" R software package was employed to screen differentially expressed lncRNAs, miRNAs, and mRNAs in breast cancer tissues and normal samples. All q values were corrected for statistical significance of multiple tests using false discovery rate (FDR) with |log 2 (fold change [FC])| > 0.5 and FDR < 0.05. Volcano maps of differentially expressed lncRNAs, mRNAs, and miRNAs were drawn with ggplot2 of R package.

| Functional enrichment analysis
The gene ontology (GO) functional enrichment analysis, Kyoto Encyclopedia of Genes and Genomes-Gene Set Enrichment Analysis (KEGG-GSEA) pathway enrichment analysis were conducted using the "Cluster-Profiler" R software. Go function enrichment analysis was carried out from three aspects of biological process (BP), cell composition, and molecular function. KEGG-GSEA pathway enrichment analysis was carried out at the significant level of p < 0.05.

| Weighted correlation network analysis (WGCNA)
WGCNA is an algorithm for gene co-expression network recognition through mRNA or lncRNA with different characters in high-throughput expression spectrum, which can distinguish genes into multiple clusters. The pairwise Pearson correlation analysis was utilized to evaluate the weighted co-expression relationship among all datasets in the adjacency matrix. In this study, WGCNA was adopted to analyze differentially expressed mRNAs and lncRNAs to obtain the differentially expressed mRNAs or lncRNA most relevant to prognosis of breast cancer patients.

| Construction of ceRNA network
According to the relationship among differentially expressed lncRNAs, miRNAs, and mRNAs, the ceRNA network was

| RESULTS
3.1 | Screening of differentially expressed lncRNAs, miRNAs, and mRNAs in breast cancer tissues and adjacent normal tissues To explore the expression levels of differentially expressed lncRNAs, miRNAs, and mRNAs in breast cancer, RNA-sequencing data of breast cancer and adjacent normal tissues were obtained from TCGA database, which displayed the clinicopathological characteristics of 1097 patients with breast cancer (Supporting Information: Table S1).
As shown in Figure 1 (6906 upregulated mRNAs and 2972 downregulated mRNAs) were obtained through differential expression analysis.

| Differentially expressed mRNAs play an important role in occurrence and development of breast cancer
To explore the biological functions of the identified differentially expressed mRNAs in breast cancer tissues and adjacent normal tissues, we performed GO enrichment analysis through "ClusterPro-  Figure 2E). These data implied that these significantly differentially expressed mRNAs play an important role in occurrence and development of breast cancer through these enriched pathways.

| Validation of gene modules using WGCNA
WGCNA uses the idea of system biology to find the similarity of gene expression, and identify the highly related gene as a gene module. [13] In this study, WGCNA was performed to analyze the first 80% differentially expressed mRNAs (mean absolute deviation [MAD] > 0.01). Besides, β = 2 (scale free R2 = 0.91) was selected as the soft threshold parameter to ensure scale-free network ( Figure 3A). According to β = 2, the proximity matrix was obtained to further verify that the selected memory network approximated scale free ( Figure 3B). There were 11 differentially expressed mRNAs by WGCNA. The characteristic genes of all modules were calculated and clustered according to their correlation. The dendrogram of module characteristic gene showed that 11 differentially expressed mRNA modules were mainly divided into two clusters ( Figure 3C). Due to the large difference coefficient between the modules, there was no need to merge the modules, and finally 11 modules were determined ( Figure 3D). As shown in Figure 3E, the thermal map of the topological overlap matrix (TOM) presented that there was a high degree of independence between these 11 differentially expressed mRNA modules. Subsequently, we analyzed the similarity and adjacency (breast cancer tissues and adjacent normal tissues) of 11 differentially expressed mRNA modules. It was found that yellow, brown, and black module genes were positively correlated with breast cancer, while purple, blue and red module genes were negatively correlated with breast cancer, including 1813 differentially expressed mRNA ( Figure 3F). Analysis of GO-GSEA ( Figure 3G,H) exhibited that these 1813 differentially expressed mRNAs were significantly related to the BPs of anatomical structure development, cell macromolecule metabolism process, cell metabolism process, development process, and multicellular tissue process.
Moreover, KEGG analysis showed that these 1813 differentially expressed mRNAs were mainly enriched in the neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, renin secretion, the interaction between viral protein and cytokine and cytokine receptor, legionellosis, chemokine signaling pathway, and calcium signaling pathway and so on ( Figure 3I). 3.4 | Seven differentially expressed lncRNA modules were related to occurrence of breast cancer through WGCNA analysis WGCNA was performed to analyze the first 80% differentially expressed lncRNAs (MAD > 0.01). β = 2 (scale free R2 = 0.92) was selected as the soft threshold parameter to ensure scale-free network ( Figure 4A). According to β = 2, the proximity matrix was obtained to further verify that the selected memory network approximated scale free ( Figure 4B). Dendrogram of modular characteristic gene ( Figure 4C) showed that eight differentially expressed lncRNA modules were identified by WGCNA. The characteristic genes of all modules were calculated and clustered according to their correlation. It was found that the coefficient of dissimilarity among some modules was very small. MEDissThres was set as 0.25 to merge similar modules, and finally, seven differentially expressed lncRNA modules were determined ( Figure 4D).
As shown in Figure 4E, the heatmap of the TOM presented that there was a high degree of independence between these seven differentially expressed lncRNA modules. Then, we analyzed the similarity and adjacency of the characteristics and the characters (breast cancer tissues and adjacent normal tissues) of differentially expressed lncRNA modules of seven colors, which revealed that yellow modules were positively correlated with breast cancer, while brown, red, and black modules were negatively correlated with breast cancer ( Figure 4F). The number of differentially expressed lncRNAs in each module is shown in Figure 4G. There are 823 differentially expressed lncRNAs in yellow, brown, red, and black modules.

| Occurrence of breast cancer was related to 355 lncRNA-miRNAs and 68 miRNA-mRNAs
The   Figure 6A-D). The correlation between four genes in the gene model is shown in Figure 6E,F.
Subsequently, the prognostic risk model based on the above four key differentially expressed mRNAs (ZC3H12B, HRH1, TMEM132C, and PAG1) was constructed and then the prognosis risk score was calculated. Patients from TCGA were divided into low-risk and high-risk groups ( Figure 6G). In addition, the expression heat map of four key differentially expressed mRNAs in the high-risk and low-risk groups is shown in Figure 6H.
Kaplan-Meier survival curve showed that OS of high-risk patients was significantly shorter than that of low-risk patients (p = 0.012).
After ROC analysis, we found that the prognostic risk model

| DISCUSSION
Breast cancer is a global health threat with its increasing incidence in female population. [14] In spite of recent advances in early detection and cancer therapeutics of breast cancer, contributing to decrease in breast cancer mortality rates, it still remains the main cause for public health concern. [15] Thus, it is of great importance to unveil the molecular switch controlling malignant transformation of breast cancer, which may also contribute to excavate novel prognostic indicators. Taken together, the current study reveals the lncRNA-miRNA-mRNA ceRNA network related to the pathogenesis of breast cancer.
The data in the current study predicted that there are 5056 lncRNAs and 712 miRNAs dysregulated in breast cancer. Similarly, a great deal of research has shown that miRNAs and lncRNAs play a vital role in the pathogenesis and progression of cancers, including breast cancer. [8,16] Nowadays, a growing body of evidence has also implicated that there is a complex and close relationship between miRNAs and lncRNAs. [10] Moreover, emerging evidence has also indicated that ceRNA network could play crucial role in various pathological processes of tumorigenesis, including breast cancer. [17] For instance, it has been reported that lncRNA CCAT1 exerted as a ceRNA and negatively regulated miR-148b expression, thereby reducing cancer cell colony formation and promoting apoptosis to regulate the progression of breast cancer. [18] Lu et al. have exhibited that LINC00511 function as sponge for miR-185-3p to positively recover E2F1/Nanog axis, hence contributing to breast cancer tumorigenesis and stemness. [19] LncRNA ARNILA is proved to promote the invasion and metastasis of triple-negative breast cancer by sponging miR-204 to facilitate expression of its target gene Sox4. [20] On the contrary, various studies also identified some ceRNA networks which have a suppressive role in breast cancer. [21] Nevertheless, there is still a paucity of studies on comprehensive analysis of ceRNA network in breast cancer, and integrated research is needed to be done.
To systemically explore potential ceRNA regulating these hub genes mentioned above, the upstream miRNAs were predicted first based on an experimentally validated miRNA-target interactions  [22] A recent study has revealed that miR-93 contributes to inducing EMT and drug resistance of breast cancer cells by targeting PTEN. [23] miR-338p is also found to be downregulated in breast cancer and act as tumor inhibitor in breast cancer by targeting SOX4. [24]  proliferation, [25] while its role in breast cancer has not been reported.
Existing literature has indicated that HRH1 is correlated with the better prognosis in patients with breast cancer. [26] TMEM132C is reported to be act as the promising diagnostic and prognostic markers in breast cancer. [27] PAG1 is also demonstrated to stimulate tumor progression and chemoresistance of breast cancer. [28] In this present study, we successfully constructed a novel lncRNA-miRNA-mRNA ceRNA network involved in breast cancer with 27 differentially expressed lncRNA, 14 differentially expressed miRNAs, and 4 differentially expressed mRNAs. The network of lncRNA-miRNA-mRNA interactions has been investigated to serve as the refine biomarker predictions for developing novel therapeutic approaches in breast cancer. [17] The breast-cancer-specific ceRNA network based on the mRNA and miRNA expression data has also been demonstrated in the breast cancer. [12] Moreover, bioinformatics analysis of ceRNA network associated with breast cancer has greatly emerged, a meaningful lncRNA-miRNA-mRNA regulatory axis, namely, LINC00466-hsa-mir-204-NTRK2, plays a vital roles in the prognosis of breast cancer, [29] which further hinted that our results are reliable.
In conclusion, we successfully constructed a potential lncRNA-

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The datasets generated/analyzed during the current study are available.