CCNA2 Expression and Its Prognostic Significance in Cholangiocarcinoma

Background: Cholangiocarcinoma (CCA) is a rare malignancy that has become a significant health burden worldwide. An increasing number of studies have demonstrated the importance of changes in immunophenotypic characteristics in resected CCA specimens. However, accurate prognostic markers for CCA are still lacking.
Methods: Gene expression profiles and clinical data of CCA were downloaded from the Gene Expression Omnibus (GEO) database. GO and KEGG analyses were applied to differentially expressed genes in cholangiocarcinoma, and a PPI network was constructed in Cytoscape. The difference in Cyclin A2 (CCNA2) expression between CCA tissues and adjacent noncancerous tissues was analyzed with R software and verified by comprehensive analysis. The relationship between CCNA2 expression and immune infiltration was assessed in The Cancer Genome Atlas (TCGA) database. Kaplan–Meier survival analysis was used to assess the effect of CCNA2 expression on survival. Gene set enrichment analysis (GSEA) was used to screen the signaling pathways that differed between the low and high CCNA2 expression groups.
Results: In the GEO database, the expression of CCNA2 in CCA was significantly higher than that in adjacent noncancerous tissues (P < 0.001). The top 10 hub genes were mined by the Degree, MCC, and Closeness methods based on the PPI network, and CCNA2 was selected as the candidate for further analysis and validation. The Kaplan–Meier curves suggested that patients with high CCNA2 expression had a poor prognosis. Multivariate analysis showed that high expression of CCNA2 was an important independent predictor of poor overall survival (P = 0.033). CIBERSORT analysis showed that the fractions of naive CD4 T cells (P < 0.0001), memory CD4 T cells (P = 0.0055), follicular helper T cells (P = 0.0063), NK cells (P = 0.0149), dendritic cells (P = 0.029), and neutrophils (P = 0.0113) were significantly correlated with the expression of CCNA2, highlighting the relationship between CCNA2 expression and immune infiltrates. GSEA indicated that 12 signaling pathways were enriched in samples with the high-CCNA2 phenotype.
Conclusions: CCNA2 might act as an oncogene in the progression of CCA and could be regarded as a potential prognostic indicator and therapeutic target for CCA.


Introduction
Cholangiocarcinoma (CCA) is the most common malignant tumor of the intrahepatic biliary epithelium. It accounts for about 3% of all gastrointestinal tumours and is the second most common primary liver tumour after hepatocellular carcinoma [1]. The onset of CCA is insidious, with no obvious symptoms or signs in the early stages [2]. Delayed clinical diagnosis limits the benefit of surgical treatment and curative management options, contributing to the poor outcome of CCA patients [3]. At present, surgical resection or liver transplantation is the main treatment for CCA, but the prognosis after surgical resection is poor, and donor livers for transplantation are scarce. The long-term survival rate remains low, which causes great suffering and economic burden for patients and their families [4]. Therefore, early detection of CCA is essential for offering patients a curative therapeutic approach and maximizing clinical outcomes. Prognostic biomarkers might help guide treatment decisions with respect to patient life expectancy and support the development of personalized treatments for individual CCA patients.
High-throughput genomics and epigenomics have greatly increased our understanding of the underlying biology of CCA; however, its pathogenesis remains largely unknown. CCA is characterized by a highly desmoplastic microenvironment containing stromal cells, mainly cancer-associated fibroblasts, infiltrating the tumor epithelium [5]. The tumor microenvironment in CCA is highly dynamic and, besides stromal and endothelial cells, also encompasses an abundance of immune cells of both the innate and adaptive immune systems (including tumor-associated macrophages, neutrophils, and T and B lymphocytes) and abundant proliferative factors [6,7]. It is orchestrated by multiple soluble factors and signals that eventually define a tumor growth-permissive microenvironment [8,9]. Through complicated interactions with CCA cells, the tumor microenvironment profoundly affects the proliferative and invasive abilities of epithelial cancer cells and plays an important role in accelerating neovascularization and preventing apoptosis of neoplastic cells [10].
The present study combined bioinformatics, transcriptomics and epidemiology to address this gap and explore the molecular mechanism of CCA. Based on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets, we sought to create a comprehensive "atlas" of CCA and to discover major cancer-causing genomic alterations [11]. In addition to the expression of these genes, their associated regulatory network was investigated. The data may help elucidate the underlying etiology of CCA and provide candidate molecular targets for drug therapy. We further discuss innovative treatment approaches, including immunotherapy, and how the identification of factors secreted in CCA by immune cell subsets is leading toward precision medicine in CCA.

Identification of DEGs in cholangiocarcinoma (CCA)
We collected two datasets from GEO, GSE26566 and GSE32225, and used the SVA package to remove batch effects and other unwanted variation, obtaining an integrated GEO dataset. The limma package in R was then used to screen for DEGs in the integrated dataset. In total, 588 DEGs were identified, comprising 216 upregulated and 372 downregulated genes (Fig. 1A), at a cutoff of |log2 fold change| > 1 and padj < 0.05 (p-value adjusted for multiple testing using the Benjamini-Hochberg method). The volcano plot of the DEGs is shown in Fig. 1B.

Functional enrichment analysis
GO and KEGG analyses were performed to examine the biological functions of the DEGs in detail [12]. The results showed that complement and coagulation cascades, chemical carcinogenesis, and steroid hormone biosynthesis were significantly enriched (Fig. 2A, B). GO analysis also highlighted three terms: monocarboxylic acid metabolic process, organic hydroxy compound metabolic process, and gland development (Fig. 2A). The bubble plot offers a visual representation of these pathways.

PPI network construction and identification of hub genes
PPI network analysis was further used to study the molecular mechanism of the disease and to identify novel drug targets from a systematic perspective. The PPI network was built with STRING to explore the interactions between the proteins encoded by these DEGs. The resulting network, visualized in Cytoscape, consisted of 447 nodes and 1301 edges according to the STRING database (Fig. 3A). Moreover, CytoHubba provides a user-friendly interface to examine the interactions among hub nodes in a biological network by topological analysis, which can aid the identification of the essential networks involved. The top 10 hub genes were mined by three network topology parameters: Degree, MCC, and Closeness (Fig. 3B-D). Finally, CCNA2 was selected as the candidate for further analysis and validation (Fig. 3E).

The Difference in Cyclin A2 (CCNA2) expression in CCA
The Cyclin A2 (CCNA2) expression data at the mRNA level were obtained from the GEO database (including 253 CCA tissues and 12 adjacent non-tumor tissues) and the TCGA database (including 36 CCA tissues and 9 adjacent non-tumor tissues). We found that the expression of CCNA2 was upregulated in CCA tissues compared with adjacent noncancerous tissues, particularly in the TCGA database (P < 0.0001). The scatter plot shows the mRNA expression profiles of CCNA2 in CCA tissues and adjacent noncancerous tissues.

High Expression of CCNA2 in CCA Is Related to Poor Overall Survival
We evaluated the prognostic value of high CCNA2 expression in CCA patients from TCGA using Kaplan-Meier estimates. Compared with low CCNA2 expression, high CCNA2 expression was significantly associated with poor overall survival (P = 0.033, Fig. 5). The median OS was 21 months in the high-CCNA2 expression group and 31 months in the low-CCNA2 expression group. As depicted in Fig. 5, overexpression of CCNA2 is significantly correlated with poor prognosis in CCA patients (P < 0.05).

The Co-expression Analysis
According

Identification of CCNA2-Related Signaling Pathways by GSEA
On the basis of the KEGG and GO data, we explored the function of CCNA2 and its related signal transduction pathways through GSEA. Significantly enriched signaling pathways were selected according to NES, FDR q-value, and nominal p-value. In this study, 12 signaling pathways, including cytokine-cytokine receptor interaction, complement and coagulation cascades, natural killer cell mediated cytotoxicity, cell cycle, ribosome, and calcium signaling pathway, were differentially enriched in the high-CCNA2 expression phenotype (Table 2-

Discussion
Cell cycle progression is regulated by the interaction of cyclins and cyclin-dependent kinases [13]. Detecting their expression is therefore important for understanding the progression and control of the cell cycle in many tumours [14]. Several studies have demonstrated high cyclin B1 expression in different carcinomas, which was associated with tumour aggressiveness and might serve as a prognostic marker [15]. Cyclin B1 can be present in the cytoplasm and in the nucleus, the latter being associated with poorer prognosis (as demonstrated in breast and oesophageal carcinomas) [16,17]. Furthermore, the cyclin A2 (CCNA2)-CDK2 complex is essential for the progression of G2 phase into mitosis [18]. CCNA2 is localised in the nucleus during S phase, where it controls DNA synthesis [19].
In this study, we sought to determine the role of CCNA2 expression in CCA progression, especially as a prognostic factor for CCA. In addition, we also tried to screen signaling pathways related to CCNA2 in CCA to understand the underlying mechanism involved in the regulation of CCA development by CCNA2.
First, we analyzed the data in the GEO database and identified the genes differentially expressed between CCA and adjacent noncancerous tissues. PPI network analysis was then used to further screen hub genes with a significant p value. Notably, CCNA2 stood out: it was not only highly correlated with CCA grade but may also be a potential biomarker for prognosis. It is also noteworthy that we established Kaplan-Meier risk estimates to predict the survival of CCA patients based on tumor-infiltrating immune cells; the survival rate of patients in the low-CCNA2 expression group was higher than that of patients in the high-CCNA2 expression group.
The tumor microenvironment often affects invasive processes. Extracellular matrix molecules and secreted growth factors are involved in the transition of tumor cells to an invasive phenotype. The invasion and metastasis of tumor cells may be unrelated to their proliferation and may already occur at an early developmental stage of the tumor [20]. Previous studies of tumors and immune infiltration mostly focused on immune cell types. For example, Li B et al. inferred the abundance of six immune cell types (B cells, CD4 T cells, CD8 T cells, neutrophils, macrophages, and dendritic cells) using constrained least squares fitting and found many significant associations between immune cell abundance and patient outcome across 23 cancer types [21]. For instance, in addition to their association with prolonged patient survival, CD8 T cells may also play an important role in preventing tumor recurrence (in melanoma, colorectal cancer, and cervical cancer).
Notably, the focus of our research was to estimate the degree of infiltration of 6 immune cell types and, finally, to determine their most promising co-expression patterns associated with CCNA2. These differences may indicate that CCNA2 affects the immune microenvironment of CCA to a certain extent.
Based on that, in this study, GSEA was implemented to calculate the immune cell infiltration levels for each sample. Since previous studies had shown that immune infiltration is associated with better prognosis in different carcinomas, we analyzed CCNA2 expression in the different groups. CCA-related tumor samples were collected from various databases and the literature. Functional analysis showed that these samples were closely associated with the expression of CCNA2, for example via cytokine-cytokine receptor interaction, natural killer cell mediated cytotoxicity, cell cycle, and the calcium signaling pathway.
Our research also has some limitations. First, the clinical information is incomplete, and some important information, such as tumor size, was not provided. Second, specific details, such as surgical treatment and surgical details, which are crucial to patient prognosis, are lacking. Finally, the protein level and direct mechanism of CCNA2 in CCA cannot be evaluated from the GEO and TCGA databases.
In conclusion, our study first analyzed the GEO and TCGA databases and found that the expression of CCNA2 in CCA tissues is higher than that in adjacent noncancerous tissues. The upregulation of CCNA2 is closely correlated with some clinicopathological features of CCA, which are related to the occurrence and development of CCA. Importantly, the fraction of 6 immune-related cell types is significantly correlated with the expression of CCNA2, which suggests that immunotherapy might change the therapeutic landscape of CCA. In summary, we found that the expression level of CCNA2 may be a marker for the diagnosis and prognosis of CCA. In future analyses, further clinical studies will be needed to verify these results and to confirm the prognostic value of CCNA2 in CCA.

Data collection and processing
We obtained the expression profiling datasets GSE26566 and GSE32225 from the publicly available Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/). The GSE26566 dataset, based on the GPL6104 platform (Illumina HiSeq 2000), contained 111 pairs of CCA tumors and matched non-tumor tissues. The GSE32225 dataset, generated on the GPL8432 platform (Illumina HumanMethylation27 BeadChip), included a total of 155 primary CCA tissues and matched adjacent normal tissues (Table 4). The GEOquery R package was used to download the expression profiles [22]. The SVA package was used to remove batch effects and other unwanted variation [23]. The differentially expressed genes (DEGs) of CCA were identified using the "DESeq2" R package at a cutoff of |log2 fold change| > 1 and padj < 0.05 (p-value adjusted for multiple testing using the Benjamini-Hochberg method).
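The cutoff described above can be illustrated with a minimal Python sketch. It is not the R pipeline used in this study: the function names, the tuple layout, and the toy gene names are hypothetical, and only the Benjamini-Hochberg adjustment and the |log2 fold change| > 1, padj < 0.05 filter are shown.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    genes: list of (name, log2_fold_change, raw_p_value) tuples
    (a hypothetical layout chosen for this illustration).
    """
    padj = benjamini_hochberg([g[2] for g in genes])
    up, down = [], []
    for (name, lfc, _), q in zip(genes, padj):
        if q < padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(name)
    return up, down


# Toy input with made-up values, for illustration only.
toy = [("GENE_A", 2.0, 0.005), ("GENE_B", -1.5, 0.01),
       ("GENE_C", 0.5, 0.03), ("GENE_D", 3.0, 0.04)]
up_genes, down_genes = select_degs(toy)
print(up_genes, down_genes)
```

In this sketch a gene must pass both thresholds, so a gene with a small adjusted p-value but a modest fold change (GENE_C in the toy input) is not reported as a DEG.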

Functional enrichment analysis
Gene Ontology (GO) analysis provides a controlled vocabulary to describe gene and gene product attributes in any organism (http://www.geneontology.org) [24]. Pathway analysis maps genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/). In this paper, KEGG pathway enrichment analysis and GO annotation were performed using the R package "clusterProfiler". GO terms and KEGG pathways with P value < 0.05 were considered significant.
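Over-representation analysis of this kind is commonly based on a hypergeometric test. The following Python sketch illustrates that test under this assumption; it is not the clusterProfiler implementation, and the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_degs, n_overlap):
    """One-sided p-value P(X >= n_overlap) for term over-representation.

    n_universe: number of background genes
    n_term:     background genes annotated to the GO term or KEGG pathway
    n_degs:     number of DEGs drawn from the background
    n_overlap:  DEGs annotated to the term
    """
    upper = min(n_term, n_degs)
    total = comb(n_universe, n_degs)
    # Sum the hypergeometric probabilities of the observed overlap or more.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts, for illustration only.
print(hypergeom_enrichment_p(n_universe=10, n_term=4, n_degs=3, n_overlap=2))
```

A term would then be called enriched when this p-value falls below the chosen threshold (P < 0.05 in this paper).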

PPI network construction, module analysis, and identification of hub genes
The PPI network of DEGs was constructed using the STRING online database with default parameters [25]. Cytoscape (http://www.cytoscape.org/) software was used to visualize the network [26]. Hub genes were identified using CytoHubba according to three network topology parameters: Degree, MCC, and Closeness [27].
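To make two of these topology parameters concrete, the sketch below computes degree and a textbook form of closeness centrality on a small undirected graph. It is an illustration only: CytoHubba's exact formulas may differ, MCC is not shown, and the node names and edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of direct interaction partners of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness(adj):
    """Closeness = reachable nodes / sum of shortest-path distances (BFS)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_hubs(scores, k):
    """Top-k nodes by score, ties broken alphabetically."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical toy network, not the STRING network from this study.
toy_edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")]
adj = build_adjacency(toy_edges)
print(top_hubs(degree(adj), 2), top_hubs(closeness(adj), 2))
```

Ranking nodes by several measures and keeping those that score highly under each is one simple way to shortlist candidate hub genes.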

CCNA2 Expression Analysis and Survival Analysis
First, we extracted the CCNA2 expression data from the integrated GEO dataset and the TCGA dataset. Student's t-test was used to compare gene expression between tumor tissues and adjacent nontumorous tissues. Clinical information of patients in TCGA and CCNA2 expression data were combined into a matrix. The samples were divided into two groups (high-CCNA2 expression group and low-CCNA2 expression group) according to an appropriate cutoff value of CCNA2 expression. The survival package in R was used to draw survival curves visualizing the impact of CCNA2 expression on patients' overall survival.
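The grouping and survival-curve steps can be sketched in Python as follows. This is a minimal illustration of the Kaplan-Meier product-limit estimator, not the R survival package; the function names are hypothetical, the cutoff is left as a parameter because the text does not specify it, and the toy follow-up data are made up.

```python
def split_by_cutoff(expression, cutoff):
    """Return (high, low) lists of sample indices for a given cutoff."""
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(time, survival)] at each distinct event time.

    times:  follow-up durations (for example, months)
    events: 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Made-up follow-up data, for illustration only.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(curve, median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at observed death times.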

CIBERSORT estimation
We uploaded the gene expression data with standard annotation to the CIBERSORT web portal (http://cibersort.stanford.edu/), and the algorithm was run using the LM22 signature and 1000 permutations [28]. Cases with a CIBERSORT output of p < 0.05, indicating that the inferred fractions of immune cell populations produced by CIBERSORT are accurate, were considered to be eligible for further analysis [29].

Gene Set Enrichment Analysis
GSEA was used to explore the signaling pathways related to CCNA2 in CCA. Enrichment analysis was carried out between samples with low and high CCNA2 mRNA expression. The phenotype was determined by the expression level of CCNA2 based on the TCGA database. The annotated gene set c2.cp.kegg.v6.2.symbols.gmt was selected as the reference gene set. Gene set permutations were performed 1,000 times for each analysis to identify significantly different pathways. The normalized enrichment score (NES), nominal p-value, and false discovery rate (FDR) q-value were used to assess the significance of the association between gene sets and pathways.
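The statistic underlying the NES is a weighted running-sum enrichment score. The Python sketch below shows that score for a single gene set on a pre-ranked gene list, as the statistic is commonly described for GSEA. It is a simplified illustration rather than the GSEA software: permutation-based normalization, p-values, and FDR are not shown, and the function name, gene names, and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: genes sorted by their ranking metric (high to low)
    scores:       ranking metric of each gene, in the same order
    gene_set:     set of gene names forming the pathway
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                 # step down on a miss
        if abs(running) > abs(best):
            best = running                          # keep max deviation from 0
    return best


# Made-up ranked list, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
metric = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, metric, {"g1", "g2"}))
```

A positive score indicates that the gene set is concentrated at the top of the ranked list (here, the high-CCNA2 end), and a negative score that it is concentrated at the bottom.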

Statistical analysis
The difference in CCNA2 expression between CCA tissues and adjacent noncancerous tissues was tested with the Mann-Whitney U test. The Wilcoxon rank-sum test, a nonparametric test, was used to compare two groups, and the Kruskal-Wallis test was used for comparisons among two or more groups. Kaplan-Meier analysis and the log-rank test were used to compare survival between the high- and low-CCNA2 expression groups. All statistical analyses were performed with IBM SPSS statistical software (version 23.0) and R software (version 2.15.3), and P < 0.05 was considered statistically significant.
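For readers unfamiliar with the log-rank test, the following Python sketch shows the standard two-group form of the statistic. It is an illustration with hypothetical function names and made-up data, not the SPSS or R routines used in this study.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 d.f.

    times_*:  follow-up durations; events_*: 1 = death, 0 = censored.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        # Deaths occurring exactly at time t, per group.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("not enough events to compare the groups")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Made-up survival times, for illustration only.
print(logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1]))
```

The test compares, at every death time, the deaths observed in one group with the number expected if both groups shared the same survival curve.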

Declarations
We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "CCNA2 Expression and Its Prognostic Significance in cholangiocarcinoma".
We confirm that we would like support with depositing and managing our data, and we can also submit datasets to Springer Nature as part of our publisher's Data Support Services. All data generated or analysed during this study are included in this published article.
Ethics approval and consent to participate There are no animal or human trials involved.
Consent for publication Not applicable.
Competing interests The authors declare that they have no competing interests.
Funding Not applicable.
Authors' contributions All authors analyzed and interpreted the research data. Jie Zhang, Jingjun Zhang, Hairong Liu and Tao Ren were the major contributors in writing the manuscript. All authors read and approved the final manuscript.