Expression of Coiled-Coil Domain Containing Family mRNA and prognostic value in hepatocellular carcinoma

Background: The CCDC family plays a significant role in the development and progression of malignant tumors. However, the relationship between CCDC family members and HCC progression is incompletely understood. This study used bioinformatics analysis to investigate the expression and clinical prognostic value of CCDC family members in HCC and to predict the role of the CCDC family in the development and progression of HCC. Methods: This study used data from two databases to explore the diagnostic value and prognostic significance of CCDC family members by Cox proportional hazards regression analysis, Kaplan-Meier curves and the log-rank test, ROC analysis, and nomogram-based diagnostic and prognostic analysis. GSEA and tumor microenvironment analysis were employed to investigate the underlying mechanisms and cell-cell interactions of the CCDC family in the development and progression of HCC. The relationship between mutational signatures and the CCDC family was evaluated in HCC patients with somatic mutation data. Results: The mRNA expression of five CCDC family members (CCDC34, CCDC137, CCDC77, CCDC93 and CCDC21) was significantly higher in HCC tissues than in normal tissues, and high expression levels of these genes predicted poor prognosis in HCC patients. The combined effect analysis of the five CCDC family prognostic markers suggested that the prognosis difference for the combination of CCDC family members was more significant than that for any individual CCDC family gene. We then developed a risk score model that could predict the prognosis of HCC, and a nomogram was constructed to visualize the predicted probability of HCC prognosis from gene expression and clinical factors. GSEA revealed that combined high expression of the five CCDC family members was associated with increased cell cycle progression, whereas low expression was associated with the complement activation pathway.
Mutation analysis showed that the combined high expression group had a higher TP53 mutation rate than the combined low expression group, and the high expression group showed higher TMB; low TMB was associated with a better prognosis than high TMB. Conclusions: Our data suggest that the expression of CCDC34, CCDC137, CCDC77, CCDC93 and CCDC21 may be potential prognostic markers in HCC and that in combination they have a strong interaction and better predictive value for HCC prognosis.

In this study, we investigated the relationship between the expression of individual CCDCGs and the OS of HCC patients. In addition, we investigated the joint effects of the five OS-DECCDCGs, constructed an effective CCDCGs-related risk signature model and constructed a nomogram to predict the prognosis of HCC patients. The combined model was statistically significant in predicting the prognosis of HCC. We also investigated the potential mechanism of CCDCGs in the prognosis of HCC patients using GSEA. These results provide insight into the role of CCDCGs in clinical outcomes and suggest that they may serve as biomarkers for predicting the prognosis of HCC patients. Our study also has some limitations: basic experiments are needed to validate the CCDCGs prognostic signature, and further studies with larger sample sizes are needed to confirm the prognostic signature and its relationship with the CCDCGs.

Background
Liver cancer is one of the most common malignancies worldwide, and hepatocellular carcinoma (HCC) is its most frequent type and a major contributor to global cancer mortality [1]. HCC is highly malignant, invasive and metastatic, and has a poor prognosis, posing a serious threat to human health [2]. Although an increasing number of tumor markers are used for cancer diagnosis and prognosis, there is not yet a highly discriminative, sensitive, and specific tumor marker that can be used to evaluate HCC condition, treatment efficacy, and prognosis [3,4]. Accordingly, there is an increasingly urgent need to identify useful biomarkers that accurately reflect the biological features and prognostic outcomes of patients with HCC.
The coiled-coil domain containing (CCDC) protein family consists of homo- or oligomeric proteins composed of two or more coiled-coil domains. Because of this special coiled-coil structure, CCDC proteins are prone to spatial conformational changes, through which they exert multiple cell biological functions. Recently, an increasing number of studies have identified that CCDC proteins play important roles in the development and progression of various malignancies, including thyroid, pancreatic, and breast cancer [5][6][7]. Moreover, the specific molecular mechanisms of the CCDC family in tumor malignant progression and distal metastasis are being further explored. Studies have reported that abnormal expression of CCDC family members can promote tumor cell motility and migration by regulating the rearrangement of the cytoskeleton and by activating intracellular signaling through phospholipase C and mitogen-activated proteases [8].
However, the potential application value and mechanism of the CCDC family in predicting the prognosis of HCC patients are not fully understood. The main objective of this study was to investigate this relationship by collecting data from public databases and performing a series of bioinformatics analyses.

Data acquisition and collation
RNA-sequencing data recorded as FPKM values and the corresponding clinical data of 371 HCC patients were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/repository) database.
The RNA-Seq mRNA expression data and clinicopathological data of 243 samples from HCC patients were obtained from the International Cancer Genome Consortium (ICGC, https://dcc.icgc.org/projects/LIRI-JP) database. Coiled-coil domain containing genes (CCDCGs) were then retrieved from the GeneCards website (https://www.genecards.org/) and are provided in the supplementary table.
Identification of differentially expressed CCDCGs associated with HCC patient prognosis
In both the TCGA dataset and the ICGC dataset, the expression profile data were log2(FPKM + 1) transformed to identify the differentially expressed genes between tumor tissues and normal tissues. A false discovery rate (FDR) less than 0.05 and a |log2 FC| greater than 1 were set as cutoff values. The union of the two datasets was then taken to screen for differentially expressed CCDCGs (DECCDCGs). The DECCDCGs in the two datasets were analyzed with RStudio software (version 3.6.3), using the "limma" package. Univariate Cox analyses of overall survival (OS) were performed in the TCGA and ICGC datasets to screen for DECCDCGs with prognostic value. In both datasets, differentially expressed genes with a P-value less than 0.05 were considered to be associated with HCC prognosis.
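The screening thresholds described above can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it only shows the log2(FPKM + 1) transform, the |log2 FC| > 1 and FDR < 0.05 cutoffs, and the union across two datasets. The function names, the gene names and all numeric values are hypothetical, and the FDR values are assumed to have been computed elsewhere.

```python
import math


def log2_fold_change(tumor_fpkm, normal_fpkm):
    """Mean log2(FPKM + 1) in tumor minus mean log2(FPKM + 1) in normal."""
    tumor = [math.log2(v + 1) for v in tumor_fpkm]
    normal = [math.log2(v + 1) for v in normal_fpkm]
    return sum(tumor) / len(tumor) - sum(normal) / len(normal)


def passes_cutoffs(log2_fc, fdr, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Apply the |log2 FC| > 1 and FDR < 0.05 thresholds from the text."""
    return abs(log2_fc) > fc_cutoff and fdr < fdr_cutoff


def screen_union(*datasets):
    """Genes passing the cutoffs in at least one dataset.

    Each dataset maps gene name -> (log2_fc, fdr).
    """
    hits = set()
    for results in datasets:
        hits |= {g for g, (fc, fdr) in results.items() if passes_cutoffs(fc, fdr)}
    return hits


# Hypothetical values for illustration only (not results from the study).
tcga = {"GENE_A": (1.8, 0.001), "GENE_B": (0.4, 0.010), "GENE_C": (-1.3, 0.200)}
icgc = {"GENE_A": (1.2, 0.020), "GENE_B": (0.9, 0.030), "GENE_C": (-1.5, 0.040)}
print(sorted(screen_union(tcga, icgc)))
```

In this toy input, GENE_B fails the fold-change cutoff in both datasets, while GENE_C is retained because it passes in one of them, which is what taking the union implies.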
Construction and validation of a prognostically relevant DECCDCGs family signature
Using data from the TCGA dataset, a Cox regression model for OS was constructed and further validated in the ICGC dataset. The prognosis-related DECCDCGs of the TCGA dataset were subjected to multivariate Cox regression analysis and lasso Cox regression, respectively, to screen candidate genes. After model building, the signature genes involved in the model were further validated in several ways: (1) HCC patients of the TCGA dataset were divided into low-risk and high-risk prognostic groups according to the median value of their calculated risk scores. Survival analyses of the low- and high-risk groups were performed in the TCGA dataset using Kaplan-Meier curves and the log-rank test. The model was then validated with the HCC patient data of the ICGC dataset. (2) Time-dependent receiver-operating characteristic (ROC) analysis was performed using the "survivalROC" R package in the TCGA and ICGC datasets, and AUC values were calculated for 1-, 3-, and 5-year survival. Univariate and multivariate Cox analyses were also used to define independent risk variables. In addition, stratified analysis was performed according to clinicopathological characteristics.
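The median split and the Kaplan-Meier estimate used in validation step (1) can be sketched in plain Python as follows. This is an illustrative sketch rather than the R code used in the study: the function names and the toy data are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Scores equal to the median are labelled 'low' (an assumption).
    """
    cutoff = median(risk_scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in risk_scores.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(time, survival probability)] at each event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical patients for illustration only.
groups = split_by_median({"p1": 0.5, "p2": 1.5, "p3": 2.5, "p4": 3.5})
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(groups)
print(curve)
```

Computing one curve per risk group and comparing them with a log-rank test would complete the analysis described in the text; the test itself is omitted here.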

Building and validating a predictive nomogram
The independent prognostic factors were further integrated, and nomograms were established to predict the 1-, 3-, and 5-year OS of HCC patients. The nomogram was evaluated and validated by plotting its calibration curve and examining the agreement between nomogram-predicted probabilities and observed rates. The nomograms including all independent prognostic factors were subsequently compared using ROC curves.
Gene set enrichment analysis (GSEA) and HCC tumor microenvironment analysis
GSEA (http://software.broadinstitute.org/gsea/index.jsp) was used to determine the enrichment of ranked genes in pre-defined biological processes. According to the prognostic model, the HCC samples of the TCGA dataset were divided into a high-risk group and a low-risk group. The potential biological functions of key genes were defined by comparing the enriched biological processes in the two groups. The c2.cp.kegg.v6.0.symbols.gmt and c5.bp.v7.1.symbols.gmt collections from the Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) were selected as the reference gene sets in the GSEA software. The threshold of significance was set as |NES| > 1, FDR < 0.25, and P value < 0.05. Stromal scores and immune scores were calculated with the ESTIMATE algorithm from the downloaded TCGA HCC gene expression data.
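To make the enrichment statistic and the significance thresholds concrete, the following Python sketch computes a GSEA-style running-sum enrichment score for one gene set and applies the |NES| > 1, FDR < 0.25 and P < 0.05 filter. It is a simplified illustration, not the GSEA software used in the study: the permutation step that produces the normalized score (NES), P value and FDR is not implemented, and the function names and toy ranking are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the running-sum statistic.

    ranked_genes: genes ordered from most to least associated with the phenotype.
    scores: ranking metric of each gene, in the same order.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must overlap the ranking partially, with nonzero scores")
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up at a gene in the set
        else:
            running -= 1.0 / n_miss  # step down at a gene outside the set
        if abs(running) > abs(best):
            best = running
    return best


def is_significant(nes, fdr, p_value):
    """Thresholds stated in the text: |NES| > 1, FDR < 0.25, P < 0.05."""
    return abs(nes) > 1 and fdr < 0.25 and p_value < 0.05


# Hypothetical ranking for illustration only.
genes = ["A", "B", "C", "D"]
metric = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(genes, metric, {"A", "B"}))
print(enrichment_score(genes, metric, {"C", "D"}))
```

A gene set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which corresponds to enrichment in the high-risk or low-risk group respectively.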

Mutation analysis
TCGA data were downloaded from the Genomic Data Commons Data Portal (GDC portal, https://portal.gdc.cancer.gov/). Somatic mutation data from 350 HCC patients in the TCGA database were analyzed. The quantity and quality of gene mutations were analyzed in the low-risk and high-risk groups using the maftools package of R software. Tumour mutational burden (TMB) was defined as the number of somatic, coding, base substitution, and indel mutations per megabase (Mb) of genome examined. The exome size was estimated as 38 Mb. For studies reporting mutation number from whole exome sequencing, the normalized TMB = (whole exome non-synonymous mutations)/(38 Mb).
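The TMB normalization above amounts to a single division, sketched here in Python. The function name and the example mutation count are hypothetical; only the 38 Mb exome size comes from the text.

```python
EXOME_SIZE_MB = 38.0  # exome size estimate stated in the text


def tumor_mutational_burden(n_nonsynonymous, exome_size_mb=EXOME_SIZE_MB):
    """Normalized TMB in mutations per megabase."""
    if exome_size_mb <= 0:
        raise ValueError("exome size must be positive")
    return n_nonsynonymous / exome_size_mb


# Hypothetical sample with 76 non-synonymous mutations.
print(tumor_mutational_burden(76))
```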

Results
Collation of samples and clinical data
HCC and normal samples were downloaded from the TCGA and ICGC databases, comprising 370 tumour tissues and 50 normal tissues in the TCGA dataset and 240 tumour tissues and 202 normal tissues in the ICGC dataset. Samples without clinical information were excluded from the analysis of clinical and survival prognosis.
Detailed clinical information of the HCC patients included in this study is shown in Table 1.

Identification of prognostic DECCDCGs in HCC
In the TCGA data matrix, 93 differentially correlated genes (FDR < 0.05) were obtained, of which 19 were differentially expressed genes that satisfied the |log2 FC| > 1 condition (Fig. 1a). In the ICGC data matrix, 85 differentially correlated genes (FDR < 0.05) were obtained, of which 19 were differentially expressed genes that satisfied the |log2 FC| > 1 condition (Fig. 1b). Twenty-one DECCDCGs were identified in at least one of the two datasets (|log2 FC| > 1 and FDR < 0.05) (Fig. 1c). Seventeen prognostic DECCDCGs were identified in the TCGA database (Fig. 1d), and eight prognostic DECCDCGs were identified in the ICGC database (Fig. 1e). Eight prognostic DECCDCGs were shared between the TCGA and ICGC databases (Fig. 1f).

DECCDCGs signature
The lasso Cox method and the multivariate Cox regression method were applied to select candidate genes using the univariate Cox regression results for the 8 genes in the TCGA dataset. Five OS-related prognostic DECCDCGs (OS-DECCDCGs) were obtained by the lasso Cox method (Fig. 2a). After extracting the coefficients from the results, an individualized risk score was calculated from the coefficient-weighted expression levels of the five OS-DECCDCGs as follows: risk score = e^(0.094 * expression level of CCDC34 + 0.344 * expression level of CCDC137 + 0.258 * expression level of CCDC77 + 0.047 * expression level of CCDC93 + 0.301 * expression level of CCDC21). Three OS-DECCDCGs were derived by multivariate Cox analysis. An individualized risk score was calculated from the coefficient-weighted expression levels of the three OS-DECCDCGs as follows: risk score = 0.4426 * expression level of CCDC137 + 0.3979 * expression level of CCDC21 + 0.3936 * expression level of CCDC77. Based on these prognostic genes, a prognostic risk signature for OS was established.
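As a worked illustration of the two risk scores, the Python sketch below applies the reported coefficients to one patient's expression values. The coefficients are taken from the text; reading the five-gene formula as an exponential of the weighted sum is an assumption based on the extracted notation, and the function names and the example expression values are hypothetical.

```python
import math

# Coefficients reported in the text for the lasso Cox (five-gene) model.
LASSO_COEFFICIENTS = {
    "CCDC34": 0.094,
    "CCDC137": 0.344,
    "CCDC77": 0.258,
    "CCDC93": 0.047,
    "CCDC21": 0.301,
}

# Coefficients reported in the text for the multivariate Cox (three-gene) model.
COX_COEFFICIENTS = {"CCDC137": 0.4426, "CCDC21": 0.3979, "CCDC77": 0.3936}


def weighted_sum(expression, coefficients):
    """Coefficient-weighted sum of expression levels."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def lasso_risk_score(expression):
    """Five-gene score, assuming the formula is e raised to the weighted sum."""
    return math.exp(weighted_sum(expression, LASSO_COEFFICIENTS))


def cox_risk_score(expression):
    """Three-gene score: the weighted sum itself."""
    return weighted_sum(expression, COX_COEFFICIENTS)


# Hypothetical expression levels for one patient.
patient = {"CCDC34": 2.0, "CCDC137": 3.0, "CCDC77": 1.0, "CCDC93": 4.0, "CCDC21": 2.5}
print(lasso_risk_score(patient))
print(cox_risk_score(patient))
```

Because the exponential is monotonic, a median split ranks patients identically whether or not it is applied; it only changes the scale of the score.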
To evaluate the prognostic ability of the risk model, the ICGC dataset was used for model validation. Comparison of the two models showed that their predictive accuracy was comparable in the TCGA dataset (Fig. 2b), but in the ICGC dataset the model derived by the lasso Cox method had higher predictive accuracy than the model derived by multivariate Cox regression analysis (Fig. 2c).
Based on the five-gene OS-DECCDCGs model, a risk score was calculated and ranked for patients with HCC in the TCGA dataset. Based on the median risk score, patients in the TCGA dataset were divided into high- and low-risk groups. Kaplan-Meier and log-rank analysis demonstrated a significant difference in OS between the two risk groups in the TCGA and ICGC datasets (P < 0.0001) (Fig. 2d, g). The risk score, survival status, and expression profiles of the five OS-DECCDCGs of HCC patients in the TCGA and ICGC datasets are shown in Fig. 2e, h. For OS, the areas under the ROC curve (AUC) at 1, 3 and 5 years were 0.760, 0.693 and 0.642, respectively, in the TCGA group and 0.755, 0.779 and 0.694, respectively, in the ICGC group (Fig. 2f, i). To evaluate whether the model was independent of other clinical variables in patients with HCC, univariate and multivariate Cox regression analyses including clinical factors and risk scores were performed. The results showed that the combined five-gene OS-DECCDCGs model remained significantly associated with overall survival after adjusting for tumor grade, clinical stage, age and gender (Table 2). The HR for overall survival in the high-risk versus low-risk group was 2.31 (95% CI = 1.56-3.4; P < 0.001) in the TCGA dataset and 2.53 (95% CI = 1.19-5.38; P = 0.016) in the ICGC dataset. The overall survival of patients with HCC was compared between the low- and high-risk groups in the TCGA and ICGC datasets. In each subgroup of the TCGA dataset, patients classified as low risk had a better OS than patients classified as high risk (P < 0.05). In all subgroups of the ICGC dataset, except for the early stage and tumor histologic grade G3-G4 subgroups, patients in the low-risk group also had a significantly better OS than patients in the high-risk group (P < 0.01) (Table 3).
Predictive nomogram construction and performance assessment
A nomogram integrating two independent prognostic factors, stage and the risk score, was established to predict OS in patients with HCC (Fig. 3a). The calibration curves indicated that the nomogram was able to accurately predict 1-, 3- and 5-year OS (Fig. 3b-d). Compared with single clinical factors, the combined model showed good accuracy and stability in predicting the 1-year (AUC = 0.778), 3-year (AUC = 0.737) and 5-year (AUC = 0.701) OS of HCC patients (Fig. 3e-g). These results suggest that the nomogram constructed using the combined model may predict the survival of HCC patients better than nomograms established using single prognostic factors, which might be helpful for clinical management.
Gene set enrichment analysis (GSEA) and tumor microenvironment analysis
GSEA was used to analyze the transcriptional information of patients stratified by risk score into high- and low-risk groups. Both KEGG pathway (Fig. 4a) and GO biological process (Fig. 4b) enrichment analyses found that the cell cycle was enriched in the high-risk group. Pathways enriched in the low-risk group were related to the complement activation pathway, suggesting that complement regulatory processes may play a protective role in low-risk patients. Stromal and immune scores were calculated with the ESTIMATE algorithm from the TCGA HCC gene expression data and compared between the high- and low-risk groups. No difference in immune scores was found between the high- and low-risk groups (p = 0.124), but the high-risk group had lower stromal scores than the low-risk group (p < 0.05) (Fig. 4c). Patients with a lower stromal score had a worse prognosis (Fig. 4d). These results suggest that complement may influence tumor growth by altering the tumor microenvironment.
Mutational profile of CCDCGs related risk signature
The relationship between mutational signatures and the risk signature was assessed in TCGA HCC patients with available somatic mutation data. The most frequently mutated genes in the high-risk group (Fig. 5a) and the low-risk group (Fig. 5b) are shown. TP53, GRIA1 and other genes were found to be differentially mutated between the high- and low-risk groups (Fig. 5c), and TP53 was mutated significantly more frequently in the high-risk group than in the low-risk group (Fig. 5d). High-risk patients had a higher TMB in the TCGA somatic mutation data (Fig. 5e), and a higher TMB was associated with a worse prognosis (Fig. 5f).

Discussion
Coiled coils are a prevalent protein domain, and proteins with a coiled-coil structure include structural proteins, membrane proteins, enzymes, and transcription factors [9]. The spatial folding of coiled-coil domain containing (CCDC) proteins is variable, leading to different spatial conformations and enabling many different molecular biological functions, including regulation of gene expression, cell division, membrane fusion, and controlled release of drugs [10]. An increasing number of studies in recent years have begun to focus on the role of the CCDC family of proteins in tumors, with studies reporting that abnormal expression of CCDC proteins may promote tumor cell motility and migration by regulating rearrangements of the cytoskeleton [8,11]. Previous studies have confirmed aberrant expression of CCDC family proteins in a variety of malignancies, such as non-small cell lung cancer [12], gastric cancer [13], colorectal cancer [14] and hepatocellular carcinoma [15]. In this study, the association between CCDCGs and HCC was explored in both the TCGA and ICGC databases.
Our results showed that high expression levels of CCDC34, CCDC137, CCDC77, CCDC93 and CCDC21 were associated with poor prognosis of HCC patients in the TCGA and ICGC databases. In addition, the combination of the five OS-DECCDCGs had a good predictive value in patients with HCC. Therefore, the combination of the five OS-DECCDCGs identified in this study may serve as a potential prognostic biomarker for HCC.
Currently, several CCDCGs have been reported to be involved in intracellular signaling, transcription of genetic signals, molecular recognition, cell cycle regulation, cell differentiation, apoptosis and other biological processes [16]. Coiled-coil domain containing 34 (CCDC34), also known as NY-REN-41, comprises 373 amino acids and is located on chromosome 11p14.1. Previous studies have found that CCDC34 is overexpressed in several human malignancies, including renal cell carcinoma [17], non-small cell lung cancer (NSCLC) [18], bladder cancer [19], pancreatic adenocarcinoma [20], esophageal squamous cell carcinoma [21], colorectal cancer [22] and cervical cancer [23]. Recently, it was shown that CCDC34 overexpression in liver cancer tissues is associated with poor prognosis of HCC patients [24]. That study confirmed by in vitro and in vivo experiments that knockdown of CCDC34 could effectively inhibit the proliferation and metastasis of HCC cells and found that inhibition of CCDC34 could affect the activation of protein kinase B (PKB or Akt) as well as the epithelial-mesenchymal transition (EMT) process. The remaining four CCDCGs (CCDC137, CCDC77, CCDC93 and CCDC21) have not been studied experimentally in relation to tumour development. However, the regression models constructed for the five genes and the expression profiles of the genes in HCC tissues indicate that they are closely related to HCC and may be involved in the molecular mechanisms of HCC pathogenesis, making them candidate target genes for future studies.
This study also performed GSEA to explore the potential mechanisms underlying the difference in HCC prognosis between the high- and low-risk groups defined by the CCDCGs combination. The results showed that cell cycle related pathways were significantly enriched in the high-risk group and complement activation pathways were significantly enriched in the low-risk group, indicating that complement regulatory processes may play a protective role in low-risk patients. TP53 is one of the most frequently mutated genes in HCC. Many studies have revealed the role of TP53 as a biomarker for certain molecular features and as a prognostic factor for unfavorable survival in HCC [25,26]. In this study, the mutation rates of TP53 in the high- and low-risk groups were compared, and the high-risk group was found to have a higher TP53 mutation rate than the low-risk group. The high-risk group also showed a higher TMB, and patients with a lower TMB had a better prognosis than those with a high TMB, suggesting that a lower TMB may confer a prognostic benefit in HCC patients.