An Integrated mRNA-lncRNA Signature for Overall Survival Prediction in Cholangiocarcinoma


 BackgroundThe incidence and mortality rate of cholangiocarcinoma (CCA) have been rising globally. Patients with CCA have extremely poor prognosis, partly due to the silent clinical character and hence diagnosed at advantage stage without effective treatments. There is growing evidence showing that aberrant expression of messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs) are involved in tumorigenesis and development of CCA. It is essential to establish an integrated mRNA-lncRNA signature to improve the ability of prognostic prediction in CCA patients.MethodsWe collected a training dataset of 45 patients from The Cancer Genome Atlas dataset and a validation cohort (GSE107943) of 57 patients from Gene Expression Omnibus. An integrated mRNA-lncRNA risk score was established by a univariate and a multivariate Cox regression analyses. Time-dependent receiver operating characteristic (ROC) analysis was used to evaluate prognostic performance. Moreover, we conducted a correlation analysis between the signature and different clinical characteristics, and preformed weighted gene co-expression network analysis (WGCNA) and functional enrichment analysis to investigate functional roles of the integrated signature.ResultsA total of two mRNAs (CFHR3 and PIWIL4) and two lncRNAs (AC007285.1 and AC134682.1) were identified to construct the integrated signature through a univariate Cox regression (P-value = 1.35E-02) and a multivariable Cox analysis (P-value = 1.12E-02). The ROC curve suggested the integrated mRNA-lncRNA signature possessed a high specificity and sensitivity of prognostic prediction with an area under the curve (AUC) of 0.872 and 0.790 at 1-year and 3-years, respectively. Subsequently, the signature was validated in GSE107943 cohort and combined dataset, and an area under the ROC curve reached up to 0.750 and 0.819 at 1-year. The signature was not only independent from different clinical features (P-value= 1.12E-02), but also outperformed other clinical characteristics as prognostic biomarkers with AUC of 0.781 at 3 years. These molecules in the integrated signature may associated with metabolic-related biological process and lipid metabolism pathway, which was highly involved in CCA carcinogenesis. ConclusionThese results showed that the integrated mRNA-lncRNA signature had an independent prognostic value for risk stratification, and further facilitated personalized treatment for CCA patients.


Abstract Background
The incidence and mortality rate of cholangiocarcinoma (CCA) have been rising globally. Patients with CCA have extremely poor prognosis, partly due to the silent clinical character and hence diagnosed at advantage stage without effective treatments. There is growing evidence showing that aberrant expression of messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs) are involved in tumorigenesis and development of CCA. It is essential to establish an integrated mRNA-lncRNA signature to improve the ability of prognostic prediction in CCA patients.

Methods
We collected a training dataset of 45 patients from The Cancer Genome Atlas dataset and a validation cohort (GSE107943) of 57 patients from Gene Expression Omnibus. An integrated mRNA-lncRNA risk score was established by a univariate and a multivariate Cox regression analyses. Time-dependent receiver operating characteristic (ROC) analysis was used to evaluate prognostic performance. Moreover, we conducted a correlation analysis between the signature and different clinical characteristics, and preformed weighted gene co-expression network analysis (WGCNA) and functional enrichment analysis to investigate functional roles of the integrated signature.

Results
A total of two mRNAs (CFHR3 and PIWIL4) and two lncRNAs (AC007285.1 and AC134682.1) were identi ed to construct the integrated signature through a univariate Cox regression (P-value = 1.35E-02) and a multivariable Cox analysis (P-value = 1.12E-02). The ROC curve suggested the integrated mRNA-lncRNA signature possessed a high speci city and sensitivity of prognostic prediction with an area under the curve (AUC) of 0.872 and 0.790 at 1-year and 3-years, respectively. Subsequently, the signature was validated in GSE107943 cohort and combined dataset, and an area under the ROC curve reached up to 0.750 and 0.819 at 1-year. The signature was not only independent from different clinical features (P-value= 1.12E-02), but also outperformed other clinical characteristics as prognostic biomarkers with AUC of 0.781 at 3 years.
These molecules in the integrated signature may associated with metabolic-related biological process and lipid metabolism pathway, which was highly involved in CCA carcinogenesis.

Conclusion
These results showed that the integrated mRNA-lncRNA signature had an independent prognostic value for risk strati cation, and further facilitated personalized treatment for CCA patients.
Background Cholangiocarcinoma (CCA) is an aggressive biliary epithelial malignancy arising from within the liver termed as intrahepatic cholangiocarcinoma (ICCA) or more commonly from the extrahepatic bile ducts known as extrahepatic cholangiocarcinoma (ECCA) [1]. According to epidemiological reports in the past few decades, CCA is the second most common primary hepatic neoplasm, and its incidence and mortality rate have been rising globally [2][3][4]. Surgical resection is the only potentially curative treatment for CCA patients, however, the 5-year survival rate of CCA patients remains poor (<20%) [5]. It is also worth noting that most of the patients diagnosed at advanced stage resulting from asymptomatic conditions have a worse prognosis with a median overall survival (OS) of 12-15 months [2,3,6,7]. Furthermore, due to molecular heterogeneity and complex etiology of CCA patients, the commonly used tumor-node-metastasis (TNM) staging system has shown valuable but insu cient accuracy for prognostic evaluation [8]. Therefore, there is an urgent and critical need to develop novel and promising prognostic signatures for CCA patients to distinguish risk strati cation and consequently contribute to personalized treatments and follow-up plans.
It is widely acknowledged that messenger RNAs (mRNAs) and long noncoding RNAs (lncRNAs) play important roles in the development and progression of various tumors [9]. Numerous reports have indeed detected molecular signatures to predict survival outcome for different cancers, for example, identi ed mRNAs signatures for non-small cell lung cancer [10], colorectal cancer [11], and glioblastoma [12] and identi ed lncRNAs signatures for head and neck squamous cell carcinoma (HNSCC) [13], gastric cancer [14], and hepatocellular carcinoma [15]. Given the functional roles of mRNAs and lncRNAs in carcinogenesis and progression, combining them as a prognosis signature has become a better classi er as reported in triplenegative breast cancer [16] and colon cancer [17]. Regarding CCA, currently no effective clinical biomarker is available for classifying risk strati cation. Literature about clinical biomarkers is currently scarce. According to our current knowledge, a three-miRNA signature for prognosis [18] and a 7-mRNA biomarker for recurrence-free survival prediction [19] have been identi ed recently. However, investigation into the potential of combining both mRNAs and lncRNAs expression across whole transcriptome to predict overall survival in CCA has not been conducted.
To discover novel potential biomarkers and improve the accuracy of prognosis prediction in CCA patients, we rstly identi ed the differentially expressed mRNAs and lncRNAs between cholangiocarcinoma and normal tissues by analyzing high-throughput data from The Cancer Genome Atlas (TCGA) database. For these candidate mRNAs and lncRNAs, we further developed an independent mRNA-lncRNA signature using a univariate Cox regression analysis and a stepwise multivariable Cox analysis. Moreover, we also evaluated expression levels of the detected biomarkers across various datasets using meta-analysis and assessed the effective prognostic performance of the signature in an external dataset (GSE107943) as an independent biomarker. Finally, the module eigengene (ME) related to prognostic RNAs were determined by weighted gene co-expression network analysis (WGCNA), and biological functions related to the signature were investigated through GO and KEGG enrichment analysis. Taken together, the mRNA-lncRNA signature identi ed in the current study would contribute to improve prognosis accuracy, and thus facilitate individualized treatment in CCA patients.

CCA datasets and patient information
The training data termed as TCGA_CHOL contained 45 samples (36 CCA tumor tissues and 9 adjacent normal tissues) that were collected from the TCGA portal on July 3, 2019 [20]. The external validation cohort (GEO Accession: GSE107943) [21] with 30 CCA tumor tissues and 27 adjacent normal tissues was retrieved from Gene Expression Omnibus (GEO) database [22]. All samples from the two cohorts were sequenced across the whole transcriptome by RNA-seq high-throughput sequencing platform. The read counts and corresponding clinical information were publicly available and used in the current study. All the samples possessed mRNA and lncRNA expression data and complete survival information including survival status, survival time, and classic clinicopathological features in both training and validation datasets. The data collection and processing followed the publication guidelines provided by TCGA (http://cancergenome.nih.gov/publications/publicationguidelines) and GEO database, thus ethical approval was not required for this study.

Screening of differentially expressed RNAs
The expression pro les of 20271 mRNAs and 14852 lncRNAs in total were obtained from TCGA database.
We then re-annotated all RNAs in training and validation cohorts based on Gencode V30 (https://www.gencodegenes.org/). After removing the RNAs with mean expression value lower than one and a median read counts equal to zero across all samples, we obtained 17010 mRNAs and 6390 lncRNAs for differential expression RNAs screening. The edgeR and DESeq2 R packages were independently utilized for detecting the differentially expressed RNAs (DERNAs) between tumor tissues and normal or adjacent tissues of CCA patients. We adjusted the P-value by false discovery rate (FDR) as proposed by the Benjamini-Hochberg procedure to limit the occurrence rate of false positives [23]. The RNAs were considered as differently expressed at a threshold of |log 2 (fold change)| ≥ 1.5 and FDR < 0.05. The overlapped DERNAs obtained by edgeR and DESeq2 were employed to further analysis.
The mRNA-lncRNA signature construction and validation To uncover candidate prognostic RNAs related to OS of CCA patients, we performed a univariate Cox regression analysis for the overlapping DERNAs through the survival R package. The candidate DERNAs with a signi cant P-value (P-value < 0.01) were subjected to a stepwise multivariable Cox analysis to select the optimal combination for predicting survival outcome. Subsequently, we then combined the expression levels of those RNAs with the multivariable Cox regression coe cients as a weight to construct a risk score model for each patient. The formula was listed as follows: Where n is the number of molecules used to predict risk scores including mRNAs and lncRNAs, coe cient(RNA i ) corresponds to multivariable Cox regression coe cient of the i th RNA, and expression(RNA i ) represents expression level of the i th RNA. To con rm the expression level of selected markers among other dependent datasets, we retrieved the whole GEO database and collected dataset meeting two criteria: rstly, it possessed mRNA or lncRNA expression data of normal or adjacent tissues; secondly, their sample size was more than 3. The six datasets (GEO Accessions: GSE26566, GSE57555, GSE31370, GSE76297, and GSE32879) with a total of 413 individuals including 228 tumor tissues and 185 normal or adjected tissues were collected. However, there was no appropriate cohort for con rmation of lncRNAs expression status. Comprehensive meta-analyses for the selected cohorts were performed by the meta R package [24]. The inconsistency (I²) test and the Cochran's Q test were utilized to assess heterogeneity. When I 2 was greater than 50% or P-value was lower than 0.01, the random effect model was applied, otherwise, the xed effect model was implemented to weight the standard mean difference (SMD). Finally, the overall SMD and a 95% con dence interval (CI) were employed to measure generally expression differences of selected biomarkers cross various CCA groups.
The optimal cut-off selection of risk scores to classify CCA patients into the high or low risk score groups in the training dataset was determined by "cutp-function" of the survMisc package in R. The cut point was chosen by hazard function with the maximal sensitivity and speci city for survival rate [25]. Kaplan-Meier (KM) method was applied to evaluate survival differences between the high-risk and low-risk groups, and statistical signi cance was obtained by the log-rank test. Time-dependent receiver operating characteristic (ROC) analysis was conducted by the timeROC package [26] to compare prognostic performance for predicting OS through calculating the area under the curve (AUC) value. Besides, we validated the risk score model in the external independent dataset and the combined data comprised of TCGA_CHOL training cohort and GSE107943 validation dataset through the KM method and ROC analysis, respectively.

Weighted gene co-expression network analysis and functional enrichment analysis
To investigate the functional roles behind the integrated mRNA-lncRNA signature, we conducted Pearson's correlation coe cient between the prognostic RNAs and all mRNAs across whole transcriptome in TCGA_CHOL dataset. P-value < 0.05 and correlation coe cient > 0.6 were chosen as the threshold to lter the co-expression mRNAs with identi ed mRNAs and lncRNAs in the signature. Subsequently, weighted gene co-expression network analysis (WGCNA) was performed on co-expression mRNAs and prognostic mRNAs to explore co-expression modules associated with the risk model using WGCNA package [27]. Soft power parameter with 12 was used to construct the topological overlap matrix (TOM), and a dynamic hybrid cut method with a minimum module size of 30 genes was implemented to detect co-expression clusters. The relationship between module eigengene (ME) and risk scores was evaluated and further plotted as a heatmap. The genes from co-expression modules signi cantly related to the mRNA-lncRNA integrated signature were then submitted to functional enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) through the clusterPro ler R package [28]. The cut-off of qvalue < 0.05 was used to identify both signi cantly enriched GO terms and KEGG pathways.

Results
Clinical characteristics of CCA datasets Two independent datasets of CCA were collected in this study. The clinical characteristic of all CCA patients was summarized in Table 1. The training set from TCGA contained 36 CCA patients with a mean follow-up time of 806 days, ranging from 10 to 1976 days. The mean age of individuals from TCGA was 64. There were 18 (50%) patients alive at the time of the last follow-up. The 30 CCA patients from GSE107943 selected as validation set had a mean follow-up time of 334 days (ranging from 14 to 1140), the average age of 66 at initial pathologic diagnosis, and more than half of patients (17) dead during follow-up times.
Differentially expressed mRNAs and lncRNAs in CCA Based on the expression pro les of the TCGA_CHOL dataset, we compared mRNAs and lncRNAs expression level between 36 CCA tumor and 9 adjacent normal samples. A total of 4787 DEmRNAs ( Figure 1A) and 1950 DElncRNAs ( Figure 1B) were identi ed by DESeq2, whereas 4907 DEmRNAs ( Figure 1C) and 2216 DElncRNAs ( Figure 1D) were detected by edgeR. The 4628 DEmRNAs ( Figure 1E) and 1810 DElncRNAs ( Figure 1F) identi ed by both packages were utilized for further downstream analyses. The heatmaps showed that CCA samples were clearly distinguished from normal tissues based on the top 200 DEmRNAs and DElncRNAs ( Figure S1).

Development of integrated mRNA-lncRNA signature in the training cohort
To detect prognostic biomarkers in CCA patients, we carried out a univariate Cox proportional hazards regression analysis for 4628 DEmRNAs and 1810 DElncRNAs in the discovery set. A total of six mRNAs (ACRV1, TMEM121B, PIWIL4, GOLGA8M, CFHR3, and FUT4) and three lncRNAs (AC134682.1, AC007285.1, and AC138430.1) with P-value < 0.01 were determined to be signi cantly associated with overall survival (Table S1). The six mRNAs and three lncRNAs were further subjected to a stepwise multivariate Cox regression analysis. The optimally integrated mRNA-lncRNA signature was determined with the lowest value of Akaike information criterion [29], which included two mRNAs (CFHR3 and PIWIL4) and two lncRNAs (AC007285.1 and AC134682.1). The chromosomal position, hazard ratio, P-value, and coe cient of these four prognostic RNAs in CCA are provided in Table 2. Among these four RNAs, only CFHR3 had a positive coe cient, which suggested that higher expression was related with shorter survival, whereas the remaining three RNAs were protective factors as their negative coe cients implied that higher expression level was associated with higher survival rate. We subsequentially performed a meta-analysis for CFHR3 and PIWIL4 using random effect model due to the P-value of heterogeneity test less than 0.05 as shown in Figure S2. The pooled SMD of CFHR3 and PIWIL4 were -2.80 (95% CI: -4.13 to -1.47) and 1.46 (95% CI: 0.51 to 2.41), respectively, which provided additional con dence of prognostic value as these mRNAs that were also differently expressed cross various independent CCA cohorts.
To build integrated mRNA-lncRNA signature for survival prediction in CCA patients, we calculated the risk scores for each individual using expression level of the two mRNAs and the two lncRNAs weighted by their regression coe cients from above multivariate Cox analysis as follows: risk score = (3.18 × expression value of CFHR3) + (-1.62 × expression value of PIWIL4) + (-2.97 × expression value of AC007285.1) + (-1.95 × expression value of AC134682.1). The patients then were classi ed into high-risk group (23 patients) and low-risk group (13 patients) based on the optimal cut-off point (-0.14) determined by "cutp" function from survMisc package (Figure 2A). The survival status and the expression pattern of the four prognostic RNAs for each CCA patient in the discovery cohort are presented in Figure 2A as well. The Kaplan-Meier curve with a log-rank test suggested that patients in the low-risk group have signi cantly longer survival time compared to the patients in a high-risk group ( Figure 2B). Additionally, the univariate Cox regression model (Table 3) showed a 6.46-fold increase (P-value = 1.35E-02) of hazard ratio in the high-risk group compared with the low-risk group for OS. Time-dependent ROC curve for the risk score model in the training cohort is shown in Figure 2C, with an area under the curve (AUC) of 0.872 and 0.790 for 1-and 3-year OS prediction, respectively, which implied that the integrated mRNA-lncRNA signature possessed a high speci city and sensitivity.
Validation for the prognostic prediction value of integrated mRNA-lncRNA signature in the independent validation cohort and the combined dataset To evaluate robustness of the integrated mRNA-lncRNA signature for prognosis in CCA patients, we validated its prognostic ability in an independent cohort (GSE107943) obtained from GEO database and yielded similar results as we obtained from the training dataset. Individuals in the validation dataset were divided into high-risk group (16 patients) and low-risk group (14 patients) according to the threshold determined by the same method as for the training dataset. The survival outcome of high-risk group patients was signi cantly worse than that of low-risk group (P-value = 1.71E-04) as shown in Figure 3B.
Notably, there were 14 deaths in patients with high-risk scores and only three death events in low-risk group ( Figure 3A). The hazard ratio of high-risk group was 8.04 folds compared with that of low-risk group (95% CI= 2.26 -28.62, P-value= 1.29E-03) in the univariable analysis ( Table 3). The AUC of time-dependent ROC curve was 0.750 and 0.729 for 1-and 3-year overall survival prediction ( Figure 3C), representing the risk score model has a good performance in CCA patients' OS prediction.
Furthermore, we assessed prognostic performance of the integrated mRNA-lncRNA signature in the combined dataset, which was consistent with the ndings from the discovery or validation cohort. The principal components analysis (PCA) for the combined samples showed that there was clearly divergence between the training and validation individuals ( Figure S3A), which suggests that the batch effect existed in the combined data. To eliminate bias caused by the batch effect, we tted it as a xed effect in the formula design generated by DESeq2 package. As a result, the batch effect was adjusted properly ( Figure S3B-C). The patients were then divided into high-risk group (n =34) and low-risk group (n = 32) according to the same risk score model and criteria. Kaplan-Meier survival curves between two risk groups were signi cantly different in the combined dataset with a P-value of 5.51E-06 ( Figure 4B). The survival rates at 3-and 5-years were 26.47% and 23.53% for patients in the high-risk group, compared to 87.50% and 75.00% survival rate for patients in the low-risk at 3-and 5-years respectively. Patients with high-risk scores exhibited a 5.27-fold increased risk than patients in low-risk group ( Table 3). Results of time-dependent ROC curve analysis similar to those obtained from the training and validation datasets are presented in Figure 4C with AUC equal to 0.819 for 1-year and 0.781 for 3-years OS prediction.
Correlation between the integrated mRNA-lncRNA signature and other clinicopathologic characteristics To investigate independence of the integrated mRNA-lncRNA signature in survival prediction, a multivariate Cox regression analysis was performed including risk scores, age, gender, tumor stage, residual tumor, and histologic grade. In the training cohort, the integrated mRNA-lncRNA signature was the most signi cant (P-value= 1.12E-02) compared with the other clinical characteristics. Furthermore, after adjusting the age, gender, and tumor stage, we found the hazard ratios of overall survival in high-risk versus low-risk group were 7.70 and 5.27 in the validation and the combined dataset, respectively (Table 3).
Besides, the tumor stage was relatively associated with OS in the validation (P-value= 5.65E-02) and combined (P-value= 5.88E-02) dataset (Table 3). The strati cation analysis was carried out to estimate the relationship of the mRNA-lncRNA signature with tumor stage. All patients were classi ed into two subgroups: I/II stage with 49 samples and III/IV stage with 17 individuals. As shown in Figure 5A-B, KM curve observed patients with high-risk scores have signi cantly shorter survival time than that with low-risk scores in stage I/II (P-value = 4.71E-04) and stage III/IV (P-value = 4.97E-03) subgroup. Accordingly, the multivariate Cox and strati cation analysis demonstrated that prognostic power of the integrated mRNA-lncRNA signature was independent from other clinical features.
We also compared prognostic performance of the mRNA-lncRNA signature with other clinical features by calculating the AUC of time-dependent ROC. In the combined set, the AUC of mRNA-lncRNA risk scores at 3 years was 0.781, which was higher than that of tumor stage (AUC = 0.673), gender (AUC = 0.541), and age (AUC = 0.505) ( Figure 5C). These results demonstrated that the mRNA-lncRNA signature had a better prognostic power than other factors including tumor stage, age, and gender.
Functional roles of the integrated mRNA-lncRNA signature in the CCA biology We performed WGCNA for 1067 co-expressed mRNAs to cluster genes that highly correlated with the risk scores. A total of 5 modules were identi ed including a turquoise module with 522 mRNAs, a blue module with 272 mRNAs, a brown module with 139 mRNAs, a yellow module with 131 mRNAs and a grey module with three mRNAs. The turquoise module showed a higher correlation with risk score model than other modules, which had a high correlation coe cient of 0.87 (P-value=8.00E-12) with risk scores (Figure 6A-B). We then carried out GO and KEGG enrichment analyses based on the 522 genes from the turquoise module. These 522 genes signi cantly enriched in 567 GO terms and 48 KEGG pathways. The top 10 GO biological processes and KEGG pathways are shown in Figures 6C-D. These genes were mostly enriched in catabolic or metabolic biological processes, such as small molecule catabolic process, organic acid catabolic process, carboxylic acid catabolic process, and lipid related metabolic processes. The enriched KEGG pathways included metabolic-related pathways involved in cholesterol metabolism, drug metabolismcytochrome P450, glycine, serine and threonine metabolism, and primary bile acid biosynthesis. In addition, complement and coagulation cascades and PPAR signaling pathways were also enriched by these turquoise module genes.

Discussion
The TNM staging system is the most common indicator to predict survival time of patients with malignancy worldwide. Unfortunately, due to high molecular heterogeneity in CCA patients, it is di cult to predict OS by clinical features [30]. Up to the time of conducting this current study, only a few studies have performed using high-throughput sequencing data to identify more powerful molecular biomarkers for CCA prognosis. For example, miRNAs have been identi ed as prognostic markers by Cao et al. [18]. They discovered three miRNAs (miR-10b, miR-22, and miR-551b) that showed a relatively precise prediction with an AUC of 0.715 for 1-year and 0.723 for 3-years ( Figure S4). However, no study has endeavored to investigate candidate mRNAs and lncRNAs as an integrated prognostic signature for CCA. In our study, we identi ed an integrated prognostic signature consisting of two mRNAs (CFHR3 and PIWIL4) and two lncRNAs (AC007285.1 and AC134682.1) that could potentially be used for CCA patients' prognosis. The signature was further con rmed in the independent validation and complete dataset, and performed well in 1-and 3-year survival prediction according to time-dependent ROC curve ( Figure 2C, 3C, and 4C). Compared with the previous miRNA prognostic signature [18], prediction accuracy of our model showed improvement compared to that of [18] with higher AUC of 1-year (0.872 vs. 0.715) and 3-year (0.790 vs. 0.723) survival prediction in CCA patients from TCGA_CHOL cohort ( Figure S4). The multivariable Cox regression and strati ed analysis revealed that our integrated mRNA-lncRNA signature had the independent prognostic ability from other clinical features.
The relationship between these prognosis biomarkers and OS of CCA patients implied the signature's potentially vital roles in underlying mechanism of carcinogenesis and progression of CCA. Published records of mRNAs identi ed in our signature indicate that overexpressed CFHR3 (Complement Factor H-Related Protein 3) would suppress proliferation and promote apoptosis of hepatocellular carcinoma (HCC) cells [31] and is a potential prognosis biomarker for HCC [32]. PIWIL4 (piwi like RNA-mediated gene silencing 4) belongs to Piwi-like (Piwil) proteins and is aberrantly expressed in various human cancers, including breast cancer [33], retinoblastoma [34], and hepatocellular carcinoma [35]. As for the two antisense lncRNAs identi ed in this study, their functional roles have not been elucidated in any cancer. Therefore, to infer potential biological roles of the integrated mRNA-lncRNA signature, we performed WGCNA analysis for mRNAs strongly co-expressed with the discovered prognostic mRNAs and lncRNAs. The 522 genes from turquoise module signi cantly correlated with risk score model were mainly enriched in metabolic-related biological pathways and PPAR signaling pathway. These pathways are well documented as participating in the carcinogenesis and progression of CCA [36]. Various studies based on multiple CCA independent cohorts [37][38][39] also detected that metabolic-related biological processes including small molecule and lipid metabolic processes relate to energy metabolism were pivotal for CCA development. Increasing evidence has demonstrated that fatty acids synthesis related genes (FASN and SLC27A1) [40,41], fatty acid transport proteins (FATP2, FATP1, FATP5, and CD36), and fatty acid binding proteins (FABP1, FABP4, and FABP5) [42] contribute to CCA carcinogenesis. Additionally, PPARγ ligands suppressed cholangiocarcinoma cell growth [43,44] and induced the cholangiocarcinoma cell apoptosis [45], which suggested potentially vital roles of the PPAR signaling pathway in CCA pathogenesis.
To our knowledge, this study is the rst attempt to develop a prognostic signature for CCA patients through combining the expression pro les of both mRNA and lncRNA at genome-wide gene expression level. However, there are a few limitations to the current study. Firstly, the small size sample and the mismatched number of individuals between tumor and normal group may cause the false positive rate of DERNAs. Secondly, the RNA expression level was quanti ed based on CCA tissues, which may not precisely predict prognosis when body uids (saliva, serum, urine, or stool) are commonly used in clinical application. Hence, collecting more CCA samples and verifying prognostic value of the risk score model in samples from body uids are necessary for further research endeavors.

Conclusion
In conclusion, we performed a comprehensive analysis to develop an integrated mRNA-lncRNA signature for CCA patients' prognosis. The signature consisting of two mRNAs (CFHR3 and PIWIL4) and two lncRNAs (AC007285.1 and AC134682.1) was independent of other clinical characteristics including age, gender, tumor stage, residual tumor, and histologic grade. WGCNA indicated that the mRNAs strongly co-expressed with our signature were enriched in numerous metabolic processes and pathways, some of which have been reported to be involved in different cancers. Our ndings revealed that the integrated mRNA-lncRNA signature may serve as a valuable and alternative biomarker for CCA patients.

Declarations
Availability of data and materials The raw data involved in the current study are publicly available in TCGA (https://portal.gdc.cancer.gov/) and GEO (https://www.ncbi.nlm.nih.gov/geo/).

Con ict of Interest
The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could be construed as a potential con ict of interest.

Funding
This study was supported by the National Natural Science Foundation of China (No.31601034 to XDR).

Author Contributions
ZF conceived and designed the study; ZLP, MR, and XDR analyzed the data; ZF, ZLP, MR and XHB wrote and revised the manuscript. All authors read and approved the nal manuscript.