Relationship construction and screening
We have a total of 14370 expression profiles of lncRNAs. Using the miRcode database, we obtained 7098 lncRNA-miRNA pairs containing 1449 lncRNAs and 35 miRNAs. Also, we had 2253 expression profiles of miRNAs and removed samples with a ratio of 0 to 75%, 875 miRNAs were left. Using miRDB, miRTarBase and TargetScan databases, we performed target gene prediction on our miRNAs and got 19571 relationship pairs. Among the three databases, 5311 genes and 780 miRNAs were predicted as target genes at the same time.
Based on the ceRNA hypothesis, lncRNAs can competitively bind miRNAs to prevent them from binding to mRNA in the model of ceRNA. Therefore, the expression level of lncRNA that can theoretically exert ceRNA function is inversely related to the expression level of miRNA. The mRNA expression level in the ceRNA relationship is also negatively correlated with the miRNA expression. We calculated the Pearson correlation coefficients of lncRNAs and miRNAs in lncRNA-miRNA relationship pairs, and screened out 240 pairs with significantly negative correlations (P < 0.05). Similarly, we calculated the Pearson correlation coefficients and their P-values of miRNAs and mRNAs in miRNA-mRNA relationship pairs, and screened out 2883 relationship pairs with significant negative correlations (P < 0.05).
The ceRNA network is a lncRNAs and mRNAs network using miRNA as a medium. In this network, each miRNA must be linked to lncRNA and mRNA simultaneously. That is to say, in our relationship files, only the lncRNA-miRNA relationship files or the miRNA-mRNA relationship files exist. We further filtered out the incompliant relationships in the two relationship pairs, combined with filtered lncRNA-miRNA and miRNA-mRNA pairs, finally we got 227 lncRNA-miRNA relationship pairs and 171 miRNA-mRNA relationship pairs. Some of the relationships are shown in Table 1–2.
Table 1
Several of the lncRNA–miRNA pairs obtained from this analysis.
lncRNA | miRNA |
FAM182A | hsa-miR-139-5p |
FAM182A | hsa-miR-17-5p |
FAM182A | hsa-miR-20b-5p |
FAM182A | hsa-miR-107 |
PART1 | hsa-miR-22-3p |
PART1 | hsa-miR-24-3p |
PART1 | hsa-miR-125a-5p |
PART1 | hsa-miR-129-5p |
SMAD5-AS1 | hsa-miR-455-5p |
AC068020.1 | hsa-miR-22-3p |
Table 2
Several of the miRNA–mRNA pairs obtained from this analysis.
miRNA | mRNA |
hsa-miR-24-3p | TAOK1 |
hsa-miR-139-5p | STAMBP |
hsa-miR-17-5p | PAFAH1B1 |
hsa-miR-301b-3p | CDADC1 |
hsa-miR-24-3p | PDXK |
hsa-miR-24-3p | CNNM3 |
hsa-miR-17-5p | LDLR |
hsa-miR-125a-5p | ESRRA |
hsa-miR-17-5p | RAPGEF4 |
hsa-miR-129-5p | ETV6 |
CeRNA network construction and visualization
Combined with the 227 pairs of lncRNA-miRNA and 171 pairs of mRNA-miRNA obtained above, we used Cytoscape software to visualize the network and, and finally got a ceRNA network diagram. There were 398 edges and 326 nodes in this network, including 146 lncRNAs, 20 miRNAs, and 160 mRNAs (Fig. 1).
Topology analysis and stability analysis of ceRNA networks
For the topology analysis of the constructed ceRNA network, Fig. 2A showed the distribution of node degrees. The degrees of most nodes are very low, and only a small part of the nodes is high, which suggests that those nodes with higher degrees act as hubs for the entire network.
The closeness (Centrality, CC) of a node can calculate the number of connection steps between a given node and other nodes. A more concentrated node will get a higher score, so CC means the shortest path. As shown in Fig. 2B, nodes with the relatively lower number of connections are relatively concentrated, and some points with higher degrees of connectivity are relatively scattered.
The path can reflect the combination of all nodes in the network. Figure 2C showed the shortest path length distribution of the network. It can be seen in the figure that the distribution of the path length is more concentrated, with fewer extremes at both ends, an upper limit of 10, and a lower limit of 1, indicating that most of the nodes in the network can be connected by short path correlation.
Figure 2D showed the degree distribution density graph of nodes. We find that as node degrees increase, the density of nodes decreases sharply, suggesting that most of the nodes in the network show isolation. In the process of disease, it may be due to changes of several key nodes, and then interact with neighboring nodes to induce coexpression, and finally affect the downstream biological processes.
MRNA enrichment analysis (GO and KEGG)
GO (Gene Ontology) analysis shows that most of the enriched biological processes (BP) are all associated with the cell cycle, suggesting that changes in the cell cycle and regulation are involved in the development of breast cancer (Fig. 3A).
MAPK signaling pathway ranked first in the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis. Studies have shown that MAPK signaling pathway mainly increases the expression of cyclin D1, promotes the transition from cell cycle G1 to S phase and accelerates cell division and proliferation(23, 24). It can also inhibit the expression of cell cycle arrest proteins, release cell cycle arrest and promote cell division, which is consistent with the results of our GO analysis. In addition, bladder cancer and pancreatic cancer also appeared in other items, suggesting that mRNAs are closely related to cancer (Fig. 3B).
Network node survival analysis and potential markers of lncRNA
We performed Cox survival analysis on 146 lncRNAs in the network, and finally screened 6 lncRNAs with significant prognostic significance. The result was shown in Table 3. Next, we carried out univariate regression analysis on 6 prognostically differentiated lncRNAs. A forest map was shown in Fig. 4, the high expression of 6 prognostically significant lncRNAs was associated with unfavorable prognosis.
Table 3
Prognosis-related lncRNAs in ceRNA networks.
LncRNA | P value | HR | Low 95%CI | High 95%CI |
PSORS1C3 | 0.0495213 | 0.10516734 | 0.01111244 | 0.99529603 |
AC010536.2 | 0.03919404 | 0.26422563 | 0.07457465 | 0.93617851 |
FGD5-AS1 | 0.01875322 | 1.0110719 | 1.00183077 | 1.02039826 |
IQCF5-AS1 | 0.03728005 | 0 | 0 | 5.41E-54 |
PART1 | 0.04086366 | 1.18655672 | 1.00713105 | 1.39794801 |
AL021707.1 | 0.02160282 | 0.22725254 | 0.06419527 | 0.80447853 |
The prognosis model of lncRNA
Further, we used these 6 lncRNA expression profiles for multivariate Cox regression to construct a prognostic model. Risk Score = − 2.67×PSORSIC3 − 0.71×AC010536.2 + 0.01×FGD5-AS1 − 1.46×IQCF5-AS1 + 0.18×PART1 − 0.89×AL021707.1. We substituted each lncRNA expression profile into the model to calculate the risk coefficient of each sample, and classified the samples according to the median risk factor. It can be seen from the figure that the number of deaths in high-risk group was significantly higher than that in the low-risk group, and the higher the risk coefficient, the lower the expression of the lncRNAs in PSORS1C3, AC010536.2, IQCF5-AS1, AL021707.1, FGD5-AS1, and PART1 (Fig. 5).
To observe the prognostic differences between the high-risk and low-risk samples of the model, we performed survival analysis and plotted K-M curves for the high-risk group and the low-risk group (Fig. 6A). There are significant prognostic differences between the two groups of samples. To further observe the heterogeneity and stability of the model, we analyzed the ROC curve of the model (Fig. 6B). The prognostic model constructed by the 6 lncRNAs has a high AUC area and can significantly distinguish the risk of prognosis, suggesting that the 6 lncRNAs can be served as good prognostic markers.