Construction of a lncRNA-associated ceRNA network and identification of a potential six-lncRNA prognostic signature for esophageal cancer

BACKGROUND: Mounting evidence has shown that long noncoding RNAs (lncRNAs) can function as competing endogenous RNAs (ceRNAs) which participate in the initiation and progression of cancers. In the ceRNA network, lncRNAs, microRNAs (miRNAs) and mRNAs, communicate with and co-regulate each other. Rarely there is a systematic lncRNA-mediated ceRNA network and potential specific ceRNA pairs or triples of esophageal cancer (EC). In this study, we investigate the lncRNA-mediated ceRNA network in EC and screen the potential prognostic lncRNA biomarkers. METHODS: We obtained mRNA, miRNA, and lncRNA expression data and relevant clinical features on patients with EC from The Cancer Genome Atlas (TCGA), and used the edgR package to identify differentially expressed mRNAs, lncRNAs and miRNAs between EC samples and normal samples. The EC ceRNA network was constructed based on miRNA target prediction through the databases of miRcode, miRDB, miRTarBase and TargetScan. And then Pearson’s correlation analysis was adopted to identify co-expression mRNA-lncRNA pairs. Finally, the robust likelihood-based survival analysis and Cox regression models were used to identify prognosis-related lncRNAs, which was evaluated by Kaplan-Meier and receiver operating characteristic (ROC) curve analysis. RESULTS: A total of 3,200 mRNAs, 131 miRNAs and 1,338 lncRNAs were identified as significantly differentially expressed in EC, of which, 30 mRNAs, 15 lncRNAs, and 8 miRNAs were incorporated in the ceRNA network. According to the ceRNA network node degrees, lncRNA MAGI2-AS3, hsa-mir-93 and TGFBR2 were the key genes. Also, the ceRNA network revealed some important ceRNA pairs and triples, such as SNX29P2-TGFBR2 and MAGI2-AS-hsa-mir-143-COL1A1. Finally, we developed a six-lncRNA signature (ZNF341-AS1, CONCLUSIONS: present sheds new light on the tumorigenesis roles of lncRNA-mediated ceRNA network in EC and identifies a model that


Introduction
Esophageal cancer (EC) was the eighth most commonly occurring cancer overall and the sixth most common cause of cancer-related deaths with a mortality of 400,000 in 2012. The major histologic types of EC are esophageal squamous cell carcinoma and esophageal adenocarcinoma [1]. Thus, it is urgent to explore the molecular mechanisms underlying EC for developing effective diagnostic and therapeutic targets.
Long noncoding RNA (lncRNA), with transcripts ≥200 bp in length, can modulate gene expression at the transcriptional, posttranscriptional and epigenetic levels [2]. In 2011, Salmena et al. proposed a ceRNA hypothesis in which lncRNAs, mRNAs and other RNAs act as miRNA sponges via shared miRNA response elements (MREs) to suppress miRNA functions. While competing for binding to shared MREs, these RNA transcripts interact with and coregulate one another [3]. lncRNAs can be viewed as master regulators that modulate the expression of target genes.
Accumulating evidence shows that lncRNA-miRNA-mRNA networks are involved in the pathogenesis and progression of tumors [4][5][6][7]. However, our understanding of RNA transcript interactions in EC remains obscure.
Advancements in high-throughput sequencing technology and bioinformatics provide the opportunity to explore tumorigenic mechanisms [8]. The Cancer Genome Atlas (TCGA) database has profiles from more than 11,000 patient samples, obtained through largescale genome sequencing, together with clinicopathological information derived from over 30 human cancer types [9]. In this study, a lncRNA-related ceRNA network for EC was constructed, and the prognostic value of lncRNAs was subsequently evaluated. 4 Methods Data EC profiles including miRNA-seq and RNA-Seq data, together with patient clinical information, were downloaded from TCGA database in November 2018. All of the RNA-Sequencing (RNA-Seq) data in the present work were level 3 in TCGA, which has been normalized and processed [10].

Differentially expressed lncRNAs, miRNAs and mRNAs
The significantly differentially expressed mRNAs (DEmRNAs), miRNAs (DEmiRNAs) and lncRNAs (DElncRNAs) were identified in EC samples in comparison with adjacent nontumorous tissues using the edgeR package [11], and mRNAs and lncRNAs were defined and annotated by The Genome Research Project of ENCyclopedia of DNA Elements (GENCODE) (GRCh38) (v25) catalog [12].
Identification of competing lncRNA-miRNA pairs in the ceRNA network based on gene expression According to the ceRNA hypothesis, lncRNAs may positively regulate mRNAs by competing for their target miRNAs. To identify co-expressed lncRNA-mRNA pairs, we evaluated expression correlations among the above DEmRNAs and DElncRNAs using Pearson's correlation coefficients. A coefficient parameter >0.50 with a p-value < 0.05 was the criterion for retaining the lncRNA-mRNA pairs for further analysis [15].
Construction of the lncRNA-mediated ceRNA network and topological analysis The construction of the lncRNA-miRNA-mRNA ceRNA network was based on potential lncRNA-miRNA and miRNA-mRNA interactions and co-expression lncRNA-mRNA pairs.
The ceRNA network was visualized using Cytoscape software [16], and the NetWorkAnalyzer toolkit [17]was used to analyze the topology of the ceRNA network.
NetWorkAnalyzer is usually used to analyze the clustering coefficient, network node degree and average clustering coefficient. In this study, the network node degree (ND) was calculated. The number of edges connected to the node was defined as the node degree, displaying the local centrality of the node in the network [18]. A higher node degree indicated a greater importance of the node in the network.
Analyzing the relationship between the expression of DElncRNAs in the ceRNA network and EC pathological stage The relationship between DElncRNA expression and pathological tumor stage in EC was estimated using the Kruskal-Wallis test in R [19].

Functional enrichment analyses
To explore the function of lncRNAs in the ceRNA network, we performed enrichment analyses of the DEmRNAs involved in the network to reveal the potential biological functions and pathways ( p-value < 0.05) [20] in Gene Ontology (GO) and Kyoto

Survival analysis
To develop a lncRNA-based signature to improve prognostic prediction in EC, we randomly divided a total of 159 EC samples in TCGA into training and test cohorts. When the lncRNA expression level was above 0 and appeared in 50% of the total samples, it was defined as an abundance of expression. Univariate Cox regression survival analysis, followed by robust likelihood-based survival analysis, was performed to screen survival-related lncRNAs in the training cohort using the survival and rbsurv packages in R, respectively[23]. The procedure was in accordance with the report by Zhiqiang Wang [24].
Finally, the lncRNAs significantly associated with overall survival (OS) in the multivariate Cox survival analysis (p-value < 0.05) were selected to generate a risk-score formula.
The risk score for each patient in the training cohort was calculated using the formula. A time-dependent receiver operating characteristic (ROC) curve plotted using survivalROC package in R was adopted to assess the predictive capacity of the risk-score model, and the area under the curve (AUC) was calculated. An optimal cut-off point for the risk score was determined with maximal specificity and sensitivity. Based on the optimal cut-off point, survival differences in patients between the high-risk and compared by low-risk subgroups were assessed by the Kaplan-Meier method and log-rank test using the survival package in R [25].

Patient characteristics
EC miRNA-seq and RNA-Seq data derived from 170 samples, including 159 EC samples and 11 adjacent nontumorous samples, were downloaded from TCGA. A total of 159 EC cases were pathologically confirmed. The median age was 62 years old for the 159 EC patients.
The detailed clinical characteristics and pathological characteristics of the 159 EC patients are shown in Table 1. Identifying DEmRNAs, DEmiRNAs, and DElncRNAs A total of 3,200 DEmRNAs, 131 DEmiRNAs and 1,338 DElncRNAs (Table S1) (Table S2). Based on multivariate Cox regression analysis with those featured lncRNAs, six prognostic lncRNAs with statistical significance (p-value < 0.05) were further selected, and the risk coefficients were also calculated. The risk scoring system for each patient was calculated as follows: risk score = (0.66) × ZNF341- ROC analysis was used to estimate the sensitivity and specificity of the six-lncRNA signature for survival prediction. An optimal cut-off point of 0.589 was selected, which yielded the maximal specificity and sensitivity as evidenced by the AUC 0.93 ( Figure 7A).
Additionally, the risk scores for each patient in the training cohort were calculated. Using the optimal cut-off point, patients were classified into a high-risk group (n = 46) and a low-risk group (n = 34). Figure 7B shows the distribution of patient survival status and survival time. Survival differences, estimated with the Kaplan-Meier method and compared by log-rank test, demonstrated that patients with a high-risk score had a shorter survival time (p-value <0.0001) ( Figure 7B). Subsequently, this risk score formula was verified in the test cohort and the total cohort ( Figure 7C, D).

Discussion
In recent years, researchers have observed that noncoding RNAs have a striking variety of biological functions [22].
Understanding the crosstalk among lncRNAs, miRNAs and mRNAs on the basis of the ceRNA hypothesis will lead to significant insights into gene regulatory networks with implications on tumorigenesis. However, there have been relatively few studies on the roles of lncRNAs, miRNAs and mRNAs as ceRNAs in EC.
In the present study, through bioinformatics miRNA target prediction and co-expression analysis of lncRNA-mRNA pairs, we constructed a ceRNA network with 15 DElncRNAs, 30 DEmRNAs and 8 DEmiRNAs in EC. In our ceRNA network, lncRNAs and mRNAs interacted with each other via miRNAs, forming a complex regulatory network.
ceRNAs can affect the abundance of miRNAs and their impact on targets and impose an additional level of posttranscriptional regulation. MiRNAs act as key regulators in ceRNA networks [27]. In this study, hsa-mir-93 represented the node with the highest degree in the EC network and targeted 7 lncRNAs and 14 mRNAs (TGFBR2, OSR1, SLC2A4, etc.).
These results were not consistent with the EC ceRNA network proposed by Li-Ping Chen et al., as hsa-mir-93 was not in their ceRNA network [28]. In previous studies, hsa-mir-93 was shown to be an important oncogene in prostate cancer [29] and to function as biomarkers for the diagnosis and prognosis of esophageal cancer. [30] Additionally, many studies have confirmed that hsa-mir-93 shares several lncRNA-binding sites [31][32][33]. Our results may shed new light on the interactions of hsa-mir-93 with lncRNAs and mRNAs in EC.
In our ceRNA network, SNX29P2-TGFBR2 pair was the common target of several miRNAs, such as hsa-mir-93, hsa-mir-372, hsa-mir-17 and hsa-mir-145. TGFBR2 was also the mRNA node with the highest degree in the EC network. TGFBR2 is a member of the TGFB receptor subfamily. In previous research, TGF-β signaling was shown to play an important role in carcinogenesis [34], and hsa-mir-17 can impede migration and invasion via TGF-β pathway in esophageal squamous cell carcinoma [35]. Our data implied that lncRNA SNX29P2 might function as a ceRNA to impede the TGF-beta pathway in EC.
In the EC ceRNA network, sponging of overexpressed lncRNA AC009093.1 and under expressed lncRNA MAGI2-AS3, relieved their targets COL1A1 and COL5A2 from being suppressed by hsa-mir-143. Previous studies have suggested that hsa-mir-143 functions as a tumor suppressor in several cancer types [36,37], and COL1A1 has been shown to act as an oncogene to regulate tumors [38][39][40]. Our findings implied that hsa-mir-143 may function as a tumor suppressor by targeting COL1A1 and COL5A2 in EC and competing with lncRNA MAGI2-AS3/AC009093.1. Our study has provided a global view of regulatory ceRNA interaction networks in EC and laid a foundation for further functional research in EC. Because of their functional roles, lncRNAs have attracted much attention in the discovery of prognostic biomarkers [42].
Some studies have reported that lncRNAs may serve as biomarkers for EC, however, there are no commonly accepted prognostic biomarkers.
In survival analysis, the lncRNA expression data may have led to problems in predictive modeling because these data indicate an imbalance between a large number of covariates involving lncRNAs and a small number of samples, denoted as an issue with dimensionality [43].
In this study, to overcome the limitation involving high-dimensionality of expression data for 514 lncRNAs, we used a robust likelihood-based survival model utilizing a crossvalidation technique to reduce dimensionality [44,45]and finally developed a six-lncRNA signature to predict OS for EC patients. There have been very few studies on ZNF341-AS1, AC130324.2, AC027271.1, AL591212.1, AL732314.4 and LOC105372352. The six-gene signature was identified as an indicator for EC with high prognostic significance and demonstrated that lncRNAs have significant prognostic value for EC patients [46].
Our results were not consistent with the findings of Guo-Wei Huang et al. [47], in which a three-lncRNA signature including RP11-366H4.1.1, LINC00460 and AC093850.2 was identified as a potential prognostic indicator for esophageal squamous cell carcinoma. The discrepancy may be mainly ascribed to data and prediction method.

Conclusions
Taken together, through construction the ceRNA network of EC with 15 lncRNAs competed with 30 mRNAs for 8 miRNAs, we revealed important gens and the specific lncRNA-miRNA-mRNA interactions, shedding new light on the functional roles of lncRNA-mediated ceRNA network in esophageal tumorigenesis. Moreover, we identified a six-lncRNA model that could be used as candidate prognostic signature.
However, some limitations should be taken into consideration. ceRNA network construction was performed using bioinformatics predictions and lacked functional validation of ceRNA pairs (lncRNA-mRNA) and triples (lncRNA-miRNA-mRNA). Additionally, the EC sample size was not large, and validation of this six-lncRNA signature in larger cohorts of EC patients and clinical trials is essential for further investigation. In conclusion, we identified a six-lncRNA signature that might predict the prognosis of EC patients. Further study is required to explore the functional roles of these lncRNAs in esophageal carcinogenesis.

Availability of data and material
The data that support the findings of this study are openly available in TCGA at https://portal.gdc.cancer.gov. Detailed information is available in main text.

Competing interests
The authors declare no conflict of interest.

Funding
No funding. wrote codes. Wenfeng Ning revised the manuscript, designed and guided the project. All authors read and approved the final manuscript. Tables   Table 1 Clinicopathological      Top 10 genes with high node degree in the ceRNA network. X and Y axes represent genes and node degrees, respectively.

Figure 5
The difference between DElncRNA expression levels in different pathological stages. Stage III/IV represented stage III and stage IV; p-value < 0.05 was considered statistically significant.