DOI: https://doi.org/10.21203/rs.3.rs-21667/v1
Sarcomas are rare types of malignant neoplasms of mesenchymal origin and usually spread to other tissues in the body (1). Sarcomas can be classified into more than 100 distinct subtypes, which are typically divided into two heterogeneous groups, including bone sarcomas and soft tissue sarcomas (2, 3). Despite the use of extensive surgical resection, intensive and multiagent chemotherapy, radiotherapy, and emerging treatment strategies consisting of molecular targeting agents, immune checkpoint inhibitors, and adoptive T-cell therapy, the overall 5-year survival rate have not significantly improved over the past few decades for sarcomas (2, 4, 5). Unfortunately, sarcomas patients accompanied with metastatic show a lower 5-year overall survival rate (6). Further investigations into the pathogenesis and identified the novel prognostic biomarkers that facilitate the improvement of therapy and prognosis of sarcomas are urgently needed.
Although ncRNAs (non-coding RNAs) such as miRNA and lncRNA, function as key regulators of gene expression involving in various human diseases is being gradually revealed, the multilayered regulatory networks formed by cross-linked ncRNAs and mRNAs seem to provide new insights into the regulatory mechanism of them in both physiology and pathology. ceRNA network has been suggested to involve in essentially biological processes and play crucial roles in the initiation and development of neoplasms and potentially serve as diagnostic and prognosis markers or therapeutic target (7). MicroRNAs (miRNAs) are small RNAs of 19–25 nucleotides in length that can guide the post-transcriptional repression of protein-coding genes by binding to their mRNAs (8). Long non-coding RNAs (lncRNAs) that harbouring miRNA response element can compete with other RNA transcripts and thus theoretically function as competing endogenous RNAs (ceRNAs) (9). A single miRNA can regulate multiple target RNAs containing the specific miRNA response element and these RNAs can be regulated by multiple miRNAs, which lays lay the foundation for the construction of ceRNA network (10). Recent studies have showed novel roles of ceRNA network in lung cancer, breast cancer, gastric cancer, esophageal adenocarcinoma, esophageal squamous cell carcinoma, cholangiocarcinoma, ovarian cancer, uterine corpus endometrial carcinoma (11–19). While the lncRNA-miRNA-mRNA ceRNA network of sarcomas has not been reported.
In the present study, the RNA expression profles of sarcomas in GEO database were integrated and identified the differentially expressed genes (DEGs), lncRNAs (DELs) and miRNAs (DEMs). Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed to screen out significant functional groups of DEGs. Thereafter, a ceRNA network consisting of DEGs, DELs and DEMs was constructed. In addition, a protein-protein interaction (PPI) network analysis for the DEGs that included in the ceRNA network and prognostic analysis based on the information of sarcoma patients in TCGA database were conducted to explore the effect of these RNAs on sarcomas.
Microarray data including high-throughput genes or miRNAs expression profiles of sarcoma patients and normal samples were acquired from GEO database (www.ncbi.nlm.nih.gov/geo/). Besides, the Affymetrix Human Genome U133 Plus 2.0 chip was re-annotatedaccording to the latest release of the updated gene database to filter lncRNAs probes (20). The process of lncRNAs annotation as following. Gene transcription sequences beginning with NM and XM in the RefSeq database were selected as a mRNA database, and ncRNAs with a length above 200 nucleotides from RefSeq that gene transcription sequences beginning with NR and XR, Ensembl and NONCODE2016 were selected as a lncRNA database. Sequence alignments between 54675 probes downloaded from the annotation file of Affymetrix Human Genome U133 Plus 2.0 Array and the mRNA database were performed by using BLAST, and the E value was set as 10− 5. Probes that match the respective gene were included in the mRNA probe set, and others were compared with the lncRNA database. The E value was set as 10− 5, and probes matched the respective gene were included in the lncRNA probe set, which was used for the analysis of DELs.
DEGs and DELs were identified in 44 Ewing sarcoma patients compared with 18 normal muscle tissue samples in GSE17674 dataset. DEMs were identified in 10 synovial sarcoma samples and 5 Ewing sarcoma samples compared with muscle tissue samples in GSE18546 dataset. The RVM t-test was applied to filter the differentially expressed RNAs for sarcoma samples and muscle tissue samples because it can raise degrees of freedom effectively in the cases of small samples (21, 22). P < 0.05, the false discovery rate (FDR) < 0.05 and FC > 3 were used as the screening criteria for DEGs and DELs, while P < 0.05, FDR < 0.05 and FC > 3 were used for DEMs. Hierarchical clustering was performed using EPCLUST to represent the relationships among the samples based upon the expression data matrix of DEGs, DELs and DEMs respectively.
GO enrichment analysis of DEGs was performed to identify the potential functional genes according to the Gene Ontology which can organize genes into hierarchical categories and uncover the gene regulatory network on the basis of biological process and molecular function (23). Two-side Fisher’s exact test and χ2 test were used to classify the GO category, and FDR was calculated to correct the P-value (24). Pathways analysis of DEGs was carried out to determine the potential function of these genes that participated in the pathways based on the KEGG database. Fisher’s exact test and multiple comparisons were done to identify the significant genes through P-value and FDR (25–27).
DEMs were identified in 10 synovial sarcoma and 5 Ewing sarcoma compared with muscle tissue samples respectively. Here, we obtained the intersection of DEMs that
up and down regulated simultaneously.
Three algorithms, miRanda (http://www.microrna.org/), Targetscan (http://www.targetscan.org/) and miRWalk (http://129.206.7.150/) were used to inspect interactions between DEMs that identified above and their target genes. The target genes that overlapped with genes identified by GO and pathway enrichment analysis were considered as the prediction results. MiRanda and PITA (https://genie.weizmann.ac.il/pubs/mir07/mir07_exe.html) were used to predict target lncRNAs and that overlapped with DELs between sarcoma samples and muscle tissue samples were used to construct the ceRNA network.
The relationship among miRNAs, mRNAs and lncRNAs was assessed according to their differential expression values and miRNA-lncRNA interaction pairs and miRNA-mRNA interaction pairs were identified based on above. Further, miRNAs negatively regulating intersection expression of lncRNAs and mRNAs were chose to construct the lncRNA-miRNA-mRNA ceRNA network. The ceRNA networks were constructed and visualized using Cytoscape v3.0 (28). The flow chart, shown in Fig. 1, represents the overall process of ceRNA network construction.
The PPI network of the encoded proteins of the DEGs that included in ceRNA network was constituted by using the STRING database version 11.0 (https://string-db.org/). Cytoscape software was used to visualize the PPI network with a combined score of protein pairs > 0.4 as the cut-off value.
High-throughput experimental data with survival profiles of sarcoma patients was extracted from The Cancer Genome Atlas (TCGA) database. The association between the expression levels of mRNAs, miRNAs and lncRNAs that included in the ceRNA network and overall survival of sarcoma patients were confirmed by using univariable Cox regression analysis. The p values less than 0.05 were recognized as statistically significant.
Four datasets (GSE55625, GSE31045, GSE17674 and GSE18546) were obtained from GEO, while GSE55625 and GSE31045 datasets with multiple zero and negative expression values were excluded. The GSE17674 dataset contains Ewing sarcoma patients and 18 normal muscle tissue samples, and we extracted probes for lncRNA annotation based on the Affymetrix Human Genome U133 Plus 2.0 array platform. GSE18546 contains two subtypes of bone sarcoma, Ewing sarcoma and synovial sarcoma, and 5 muscle tissue samples. 5 Ewing sarcoma samples and 10 synovial sarcoma samples were allocated to two case-control groups, respectively.
DERs between sarcomas and control groups were analyzed. Results showed that 3415 mRNAs (2554 up-regulated and 861 down-regulated genes), 338 lncRNAs (234 up-regulated and 104 down-regulated lncRNAs) and 52 miRNA (39 up-regulated and 13 down-regulated miRNAs) differentially expressed in Ewing sarcoma comparison to skeletal muscle. In addition, there were 145 (109 up-regulated and 36 down-regulated miRNAs) DEMs between synovial sarcoma and skeletal muscle. Hierarchical clustering of the identified DEGs, DELs and DEMs were displayed as a heatmap (Fig. 2).
GO enrichment analysis and KEGG pathways analysis of 3415 DEGs were performed to identify the potential functional genes. The upregulated and down regulated genes were analyzed, respectively. The top 25 significant enrichment of up-regulated DEGs were presented in Fig. 3A, including transcription, DNA-templated (GO:0006351; P = 2.64 × 10− 104), cell division (GO:0051301; P = 2.57 × 10− 82) and regulation of transcription, DNA-templated (GO:0006355; P = 7.99 × 10–75). Most significant enrichment of down-regulated DEGs were presented in Fig. 3B, including muscle contraction (GO:0006936; P = 1.89 × 10− 57), muscle filament sliding (GO:0030049; P = 5.78 × 10− 46) and sarcomere organization (GO:0045214; P = 8.40 × 10− 38) (Table S1).
The main pathways for both the up-regulated genes and the down-regulated genes are metabolic pathways. The top 25 significant pathways of up-and down-regulated DEGs were presented in Fig. 3C,D (Table S2). Finally, 1296 intersection DEGs were extracted from the significant enrichment genes in GO and KEGG pathway analysis involved in up and down regulated genes.
There were two different subtypes of sarcomas, Ewing sarcoma and synovial sarcoma, in GSE18546 dataset and two different expression miRNAs sets were identified. We did the Venn diagram, intersected the two DEMs sets and 26 up-regulated miRNAs and 10 down-regulated miRNAs were identified (Figure S1).
In this study, we have identified 36 sarcomas associated miRNAs, and we focused on whether these miRNAs would target 1296 DEMs identified by GO and pathway analysis or 338 DELs between sarcoma samples and muscle tissue samples. Based on the prediction about targets of DEMs, 448 miRNA-mRNA interaction pairs (including 34 miRNAs and 269 mRNAs) and 454 miRNA-lncRNA interaction pairs (including 36 lncRNAs and 117 lncRNAs) were obtained.
According to the miRNA-mRNA and miRNA-lncRNA interactions, we constructed a lncRNA-miRNA-mRNA ceRNA network through negative regulation and used the shared miRNA as a junction. Finally, the ceRNA network consisted of 1440 interactions, including 29 miRNAs, 69 lncRNAs, 113 mRNAs (Fig. 4;Table S3).
To further explore the most significant clusters of DEGs in the ceRNA network, we performed a PPI network analysis through STRING database version 11.0 and visualized it by Cytoscape. In the PPI network, the most significant hub up-regulated proteins were IGF1, PRKCB and GNAI3, and the most significant hub down-regulated proteins were AR, CYCS and PPP1CB (Fig. 5; Table S4).
Furthermore, we performed the survival analysis based on the miRNAs, mRNAs and lncRNAs that involved in the ceRNA network. Results showed that three miRNAs, three mRNAs and one lncRNA were significantly associated with overall survival of sarcoma patients (P < 0.05). Among them, the high expression levels of SMARCC1, SRSF10, PRPF38A, JARID2, GNAI3, miR-301a-3p, miR-106b-5p, miRNA-130b-3p, miR-423-3p and LINC01296 and the low expression levels of ARF3 and PRKCB were associated with shorter overall survival in sarcomas. Kaplan-Meier survival plots of these significant miRNAs, lncRNA and mRNAs were shown in Fig. 6.
Increasing evidence shows that lncRNAs and miRNAs were differentially expressed and implicated in series of molecular processes, including differentiation, proliferation, metastasis, and transcriptional regulation in sarcomas (29, 30). While the whole regulatory network that links the functions of coding and non-coding RNAs has not been extensively discussed. In the present study, bioinformatics analysis was utilized to integrate available sequencing data sets of sarcoma and 1296 DEGs were identified in sarcoma samples by combining the GO and pathway enrichment analysis, 338 DELs were discovered after the probes were re-annotated, and 36 DEMs were ascertained through intersecting two different expression miRNAs sets. Further, 448 miRNA-mRNA interactions and 454 miRNA-lncRNA interactions were obtained through target gene prediction, and then, we constructed a lncRNA-miRNA-mRNA ceRNA network that contained 29 miRNAs, 69 lncRNAs and 113 mRNAs.
The ceRNA network identified in our study provided useful clues for further study. According to DEGs in the ceRNA network, we constructed PPI network which showed the up-regulated hub nodes including IGF1, PRKCB and GNAI3, and the down-regulated hub nodes including AR, CYCS and PPP1CB. In addition, there were twelve RNAs in the ceRNA network associated with prognosis of sarcomas based on the TCGA database. Among the seven overall survival associated mRNAs, the high expression levels of SMARCC1, SRSF10, PRPF38A, JARID2 and GNAI3 were significantly associated with shorter overall survival in sarcomas (P = 0.0018, P = 0.037, P = 0.0058, P = 0.0093, P = 0.0234, respectively). While the high expression of ARF3 and PRKCB were significantly associated with longer overall survival (P = 0.0018 and P = 0.0162), indicating that ARF3 and PRKCB overexpression could be positive prognostic factors in sarcoma patients. In these overall survival associated mRNAs, GNAI3 and PRKCB with the highest degree of connectivity, were hubs and tended to be essential (31). Moreover, PPI network showed that GNAI3 and PRKCB interacted with each other. The protein encoded by PRKCB is one of the PKC family members that has been reported to be involved in many different cellular functions. Surdez et al.’s study showed that PRKCB is strongly overexpressed in Ewing sarcoma. PRKCB inhibition significantly increased apoptosis in Ewing sarcoma cells and prevented tumor growth in vivo (32). GNAI3 encodes an alpha subunit of Guanine nucleotide-binding proteins and involves in various transmembrane signaling pathways. While their function and role in the prognosis of sarcoma patients remain to be further defined. Besides, in the ceRNA network, both GNAI3 and ARF3 can be regulated by miR-133b and miR-133a-3p, which are located in the center of regulatory network and interact with lncRNAs, including NONHSAT001973.2, NONHSAT203034.1, SNHG12 and ZNF37BP. Recent researches have demonstrated that SNHG12 was significantly over-expressing in osteosarcoma and high expression of SNHG12 tended to have a poor prognosis of osteosarcoma patients. Researchers further confirmed that SNHG12 promoted tumorigenesis and metastasis by activating the Notch signaling pathway, which functioned as a ceRNA, and modulated Notch2 expression of Notch2 by competing with miR-195-5p (33). Therefore, the interaction of SNHG12, miR-133a-3p/miR-133b and GNAI3 in sarcoma was deserved to be explored. Additionally, ARF3 also could be regulated by miR-378a-3p and miR-422a. Among them, miR-378a-3p target with LINC01296, and the high expression of LINC01229 showed a significant shorter overall survival of sarcoma patients (p = 0.0297).
The high expression of miRNA-301a-3p, miRNA-106b-5p, miRNA-130b-3p, and miRNA-423-3p were significantly associated with shorter overall survival of sarcoma patients (P < 0.0001, P = 0.0046, P = 0.0128, P = 0.0147, respectively). Overexpression of miR-301a was showed in Ewing sarcoma cell lines. Additionally, transfection of anti-miR-301a inhibited the proliferation and cell cycle progression of Ewing sarcoma cell (34). In the ceRNA network, the NEAT1/miR-301a-3p, XIST/miR-301a-3p, miR-301a-3p/NR3C2, miR-301a-3p/AR interactions were identified. XIST was significantly up-regulated in osteosarcoma tissues and cell lines, and increased XIST expression was associated with poor overall survival of patients (35). Nakatani et al. have defined miR-130b as an independent predictor of risk for disease progression and survival by Microarray analysis (36). Up to now, there is no study reporting that miRNA-423-3p and miRNA-106b-5p have association with sarcomas. NEAT1/miR-301a-3p/AR, XIST/miR-301a-3p/AR, NEAT1/miR-301a-3p/NR3C2, XIST/miR-301a-3p/AR axis predicted by the construction of ceRNA network may provide a more precise research direction for exploring biological mechanisms extending from this ceRNA network.
However, there are several limitations in the present study. Sarcomas contain multiple distinct subtypes and only two subtypes were involved in our study. More RNAs sequencing datasets of all subtypes of sarcomas were required for the construction of ceRNA network. Furthermore, regulatory ways of ceRNA network are very complex and ceRNA activity is influenced by multiple factors such as the abundance and subcellular location of ceRNA components, miRNA/ceRNA affinity, RNA editing and RBPs (7). This study only investigated the putative interaction of lncRNAs, miRNAs and genes, which require to be validated further.
In conclusion, our work successfully constructed a ceRNA network by bioinformatics analysis based on the GEO database, providing a comprehensive resource for investigating the ceRNA regulation in sarcomas. Importantly, we screened out candidate prognostic biomarkers involving in the ceRNA network, which may be exhibit important roles in the therapeutic target and prognosis analysis in sarcoma patients.
Author contributions:
Conception and design: Lu Gao and Ling Zhang.
Methodology and Software: Lu Gao, Yu Zhao and Xuelei Ma.
Writing-Original draft preparation: Lu Gao, Yu Zhao, Xuelei Ma and Ling Zhang.
Manuscript revision/review: Lu Gao and Ling Zhang.
Supervision: Ling Zhang.
Acknowledgements
This study was supported by grants from the Science and Technology Program of Sichuan Province (2018JY0165). We thank all of the participants in this study for their contributions.
Conflict of interests
The authors have no conflicts of interest to declare.