Prognostic lncRNAs, miRNAs, and mRNAs Form a Competing Endogenous RNA Network in CESC

Background: Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) tumorigenesis involves a combination of multiple genetic alteration processes. Constructing a survival-associated competing endogenous RNA (ceRNA) network and a multi-mRNA-based prognostic signature model can help us better understand the complexity and genetic characteristics of CESC. Methods: The RNA-seq data and clinical information of CESC patients were downloaded from The Cancer Genome Atlas (TCGA) database. Differentially expressed mRNAs, lncRNAs and miRNAs were identied by edgeR package. Constructing prognostic model used the differentially expressed RNAs. The Kaplan-Meier method and log-rank test were performed to assess survival rates. The relationships between overall survival (OS) and clinical parameters were evaluated by Cox regression analysis. A survival-associated ceRNA network was constructed by multiMiR package and miRcode database. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and Gene Ontology (GO) were used to identify the functional role of the ceRNA network in the prognosis of CESC. Results: Differentially expressed 298 mRNAs, 8 miRNAs, and 29 lncRNAs were signicantly associated with the prognosis of CESC. The prognostic signature model based on 4 mRNAs (OPN3, DAAM2, HENMT1, and CAVIN3) was constructed. The prognostic ability was 0.726 for this model. Patients in the high-risk group were signicantly associated with worse OS. The KEGG pathways were signicantly enriched in the TGF-β and Cell adhesion molecules signaling pathways. Conclusion: This study identied several potential prognostic biomarkers to construct a multi-mRNA-based prognostic model for CESC.


Background
Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) is one of the most common malignancy worldwide, with more than 570,000 new cases and 274,000 deaths per year [1, 2]. The longterm survival rates of patients with early-stage disease have been greatly improved in recent years.
However, 5-year survival rate remains less than 16.8% in recurrent or metastatic patients. At present, the prediction of CESC prognosis mainly depends on the tumour-node-metastasis (TNM) stage. However, the TNM stage is based on anatomical information and does not re ect the biological heterogeneity of CESC. Hence, it is urgent to nd novel biomarkers based on transcriptomics that can act as prognostic indicators to guide precise individualized treatment.
Recently, the competing endogenous RNA (ceRNA) hypothesis have provided novel insights into the cancer research eld. The ceRNA hypothesis is proposed to be a special pathway to regulate the expression of RNAs [3]. In that hypothesis, miRNA acts as the hub gene that suppresses mRNA translation, while lncRNA competes to bind one or multiple sites of miRNA to suppress the function of miRNA and participates in post-transcription control [4]. Previous studies reported that the ceRNA networks might act as the biomarkers for prognosis in CESC. Song et al [5] constructed a CESC-associated ceRNA network which consisted of 50 lncRNAs, 81 mRNAs and 18 miRNAs, and found that several RNAs were associated with the prognosis. However, that study did not construct a prognostic model based on the prognostic RNAs for CESC. CESC is a heterogeneous disease with multiple gene alterations and interactions. Hence, it is of great signi cance to construct a CESC-associated ceRNA network and develop a multi-gene prognosis model based on the ceRNA network.
In this study, we analyzed the relationships between differentially expressed RNAs and overall survival (OS), and constructed a survival-associated ceRNA network and a prognostic signature model based on multi-mRNA for CESC. In addition, we utilized the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and Gene Ontology (GO) function annotation to further investigate the function of the survival-associated RNAs. This study aimed to provide a novel biomarker to guide personalized medicine and to facilitate understanding of the molecular mechanisms for CESC.

Data processing
The clinical information and RNA-seq data of CESC patients were downloaded from The Cancer Genome Atlas (https://portal.gdc.cancer.gov) (TCGA) database. The inclusion criteria were as follows: (1) Patients with complete clinical information, including T stage, M stage, N stage, pathological stage, survival status, age, and histological type. (2) Patients with complete lncRNA-seq, miRNA-seq, and mRNA-seq data. Finally, 240 CESC samples and 3 adjacent non-tumor samples were examined.

Differentially expressed RNAs analysis
Using the edgeR package of R software to identify differentially expressed mRNAs, lncRNAs, and miRNAs, respectively, between CESC and adjacent non-tumor samples [6,7]. P < 0.05 and the absolute value of fold change |FC| > 4 were considered to indicate statistical signi cance. The expression pro les of lncRNAs, miRNAs, and mRNAs were converted to log2(count + 1) values after normalization by edgeR.
Survival analysis and prognosis-associated signature model construction Survival R package was used for univariate Cox regression to assess the relationships between differentially expressed RNAs and OS [8]. mRNAs with a P < 0.001 in the univariate Cox regression were included into the multivariate Cox regression. P < 0.05 was considered to indicate statistical signi cance.
According to the cut-off of the median RS, CESC patients were divided into low-risk group and high-risk group. The time-dependent receiver operating characteristic (ROC) curve was applied to assess the prognostic accuracy of the model, and the area under curve (AUC) was used to assess the speci city and sensitivity of the model. Using R software (version 4.0.2) and the "survivalROC" package to perform the time-dependent ROC curve analysis.

Analysis of the protein expression levels of the 4 key mRNAs
The Human Protein Atlas (https://www.proteinatlas.org) provides tissue and cellular distribution information for approximately 26000 human proteins. The information was obtained via immunoassay techniques (immunohistochemistry, immuno uorescence, and western blotting) to detect proteins expression in 64 cell lines, 48 normal human tissues and 20 tumour tissues [9]. In this study, we compared the expression levels of the 4 key genes (OPN3, DAAM2, HENTM1, and CAVIN3) between normal and CESC tissues using immunohistochemistry images.

Functional annotation and pathway enrichment analysis
Using the ClusterPro ler R package [10] to process KEGG pathway enrichment analysis and GO function annotation. Using the ggplot2 R package [11] to visualize the results of the KEGG and GO analyses. P < 0.05 was considered to indicate statistical signi cance.

ceRNA network construction
We constructed a ceRNA network according to the identi ed prognostic lncRNAs, miRNAs, and mRNAs.
The multiMiR R package [12] and miRcode (http://www.mircode.org/) database were applied to predict miRNA-mRNA interactions and lncRNA-miRNA interactions. Finally, Cytoscape (version 3.7.0) software was utilized to construct a ceRNA network based on the lncRNA-miRNA-mRNA axis by combining lncRNA-miRNA interactions with miRNA-target gene interactions. In the ceRNA network, lncRNA and mRNA act as natural miRNA sponges to suppress miRNA function by binding one or multiple sites of miRNA.

Statistical analysis
Using the t-test to access the relationships between different groups and the gene expression pro les. The Kaplan-Meier method and log-rank test were performed to assess survival rates. Univariate Cox regression and multivariate Cox regression were used to analyze the relationships between the Clinicopathological parameters and OS. R software (version 4.0.2) was used to plot the gures and statistical analyses. P < 0.05 was considered to indicate statistical signi cance.

Survival-associated RNAs
The relationships between the differentially expressed lncRNAs, miRNAs, and mRNAs and OS were evaluated in 240 CESC patients. The univariate Cox regression revealed that 298 mRNAs, 8 miRNAs, and 129 lncRNAs were signi cantly associated with OS, respectively. The top 15 mRNAs, miRNAs, and lncRNAs ranked by p-value are presented in Fig. 2a

Predictive model for overall survival
To construct a mRNA based prognosis signature model for CESC, 8 mRNAs of P < 0.0001 in the univariate Cox regression were included into the multivariate Cox regression analysis. Finally, 4 mRNAs (OPN3, DAAM2, HENMT1, and CAVIN3) were identi ed. The information of those mRNAs are presented in Table 1. In addition, we evaluated the transcript and protein levels of above 4 genes between adjacent normal and CESC tissues using t-test and immunohistochemistry (IHC). The transcript levels of OPN3 (P = 0.013) and HENMT1 (P = 0.0095) were signi cantly upregulated, while the transcript levels of DAAM2 (P < 0.0001) and CAVIN3 (P < 0.0001) were signi cantly downregulated in CESC tissues compared with adjacent normal tissues (Fig. 5). On the other hand, the Human Protein Atlas database was used to evaluate the protein expression levels of above 4 genes in adjacent normal and CESC tissues by IHC. Based on the immunohistochemical staining images, the protein expression levels of OPN3 and HENMT1 were higher, while the protein expression level of CAVIN3 was lower in CESC samples than those in normal samples, which was consistent with the transcriptomics results (Fig. 6). A mRNA based prognosis signature RS model was constructed based on above 4 mRNAs. The RS was calculated with the following formula: RS= (Exp OPN3 × β OPN3 ) + (Exp DAAM2 × β DAAM2 ) + (Exp HENMT1 × β HENMT1 ) + (Exp CAVIN3 × β CAVIN3 ). The "Exp" value represents the expression level and the "β" value represents regression coe cient derived from the multivariate cox regression model.
The RS of each patient were calculated according to the above formula. Then, CESC patients were divided into low-risk group and high-risk group according to the cut-off of the median RS. Using t-test to compare the expression levels of the 4 mRNAs between low-risk group and high-risk group. The expression levels of OPN3 (P < 0.0001), DAAM2 (P < 0.0001), and CAVIN3 (P < 0.0001) were higher, while the expression levels of HENMT1 (P < 0.0001) was lower in the high-risk group than those in the low-risk group (Fig. 7). Figure 8 shows the performance of the mRNA-based model. Figure 8a shows the rank of Page 7/18 patients according to the RS. The scatter plot presents the OS of CESC patients decrease along with the increase of RS (Fig. 8b). The heat map shows the expression level of HENMT1 decrease along with the increase of RS, while the expression levels of OPN3, DAAM2 and CAVIN3 increase (Fig. 8c).
The Kaplan-Meier method and log-rank test were performed to identify the relationship between different RS groups and OS, and the ROC curve was used to exam the sensitivity and speci city of the model. CESC patients in the high-risk group were signi cantly associated with shorter OS (P < 0.0001) (Fig. 9a). The AUC of the RS revealed that the model showed prognostic assessment ability was 0.726 (Fig. 9b).
Several clinical parameters were found to have some prognosis value with univariate analysis, such as M stage (P = 0.02), N stage (P = 0.01), T stage (P = 0.001), pathological stage (P = 0.001) and the signature RS (P = 1.5e-6); however, only the signature RS remained statistically signi cant with the con rmation by multivariate analysis (HR = 6.35, P = 0.01, Table 2).

Discussion
In this study, distinct mRNAs, lncRNAs, and miRNAs were identi ed to further understand the molecular events related to CESC prognosis. In addition, a prognostic ceRNA network provides new insights for the prognosis of CESC.
CESE tumorigenesis involves a combination of multiple genetic alteration processes rather than simply due to a single gene alteration. However, almost all studies focused on a single or a single cluster 'driver' gene of CESC so far. Daniel et al [13] reported that Keratin-17 is a prognostic biomarker of CESC. Li et al [14] reported that FAM83A is a potential biomarker regulated by miR-20, which promotes the development of CESC through the PI3K/AKT/mTOR signaling pathway. At present, no single pivotal driver gene or gene cluster was reported to be superior to evaluate the prognosis of CESC. Moreover, TNM stage, as a major prognostic indicator, is based on anatomical information and does not re ect the biological heterogeneity of CESC. Hence, it is of great signi cance to identify the effective biomarkers and construct a multi-mRNA-based model based on survival-associated biomarkers to predict the prognosis of CESE.
Other studies constructed a ceRNA network of CESC, but there are some limitations. Song  constructed a multi-mRNA prognosis model which were comprised of ITGA5, HHEX, and S1PR4. Similarly, our study also constructed a CESC-associated prognosis model based on OPN3, DAAM2, HENMT1, and CAVIN3, and the evaluation ability was 0.726 of this model with ROC. Moreover, in our study, the Cox proportional hazard model was used, and we corrected for the following confounding factors that can greatly affect prognosis: T stage, N stage, M stage, and pathological stage. The RS signature of this model was signi cantly associated with the clinical outcome of CESC patients. These results indicate that the model we constructed is an independent prognostic factor.
The KEGG pathways analysis in the ceRNA network indicated that the targeted RNAs are signi cantly enriched in the TGF-beta signaling pathway and Cell adhesion molecules pathway. TGF-beta is a critical cancer-associated signaling pathway, which is involved in proliferation, apoptosis, differentiation, migration, and epithelia-mesenchymal transition (EMT) of cancer [18,19]. TGF-beta signaling pathway plays vital role in CESC as well. Deng et al [20] reported that CD36 and TGF-β interact to promote the EMT of CESC. Yang et al [21] found that downregulation of SEMA4C could inhibit EMT, invasion, and metastasis of CESC via inhibiting TGF-beta. Cell adhesion molecules pathway plays a critical role in the development of CESC. Carvalho et al [22] reported that L1 cell adhesion molecule expression is associated with a poor prognosis. The biological process analysis revealed that the main disturbed biological process by those survival-associated RNAs in the ceRNA network is the epithelial cell proliferation. The cytokinetic homeostasis is controlled by the balance of cell proliferation and apoptosis. Previous studies have reported that excessive cell proliferative activity is a precancerous lesion in some types of tumors. T Obara indicated that the epithelial cell proliferation activity is signi cantly stepwise increased from normal gallbladder mucosa to cancer [23]. However, different pathways are usually perturbed by different molecules and thus need to be further investigated in the laboratory.
Our study still has some limitations. Firstly, although the effects of OPN3, DAAM2, HENTM1, and CAVIN3 in our prognosis model on tumorigenesis have been reported in other types of cancer [24][25][26][27], the exact effects on CESC have yet to be fully elucidated and need to be veri ed by experiments. Secondly, the research data came from a single online database, and another independent cohort is needed to verify above results in the future. Thirdly, the information of cervical cancer patients from the TCGA should be assessed with another experimental method.

Conclusion
In conclusion, this study constructed a ceRNA network that provides novel insights for conducting the study of gene complexity in CESC. This study may provide novel biomarkers and assist in the design of molecular targeted therapy for CESC.    The prognostic ceRNA network in CESC. The connection indicates the interactions among lncRNA, miRNA, or mRNA.

Figure 6
Representative immunohistochemistry images of the 4 key mRNAs in CESC and adjacent normal tissues.
The protein expression levels of OPN3 and HENMT1 were higher, while the protein expression level of CAVIN3 was lower in CESC samples than those in normal samples (Human Protein Atlas).

Figure 7
Comparison of the transcript levels of the 4 key mRNAs between low-risk group and high-risk group. The expression levels of (a) OPN3, (b) DAAM2, and (d) CAVIN3 were higher, while the expression level of (c) HENMT1 was lower in the high-risk group than those in the low-risk group. * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.