Identification of CDC20 as an immune infiltration-correlated prognostic biomarker in hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is a malignancy with a poor prognosis. E3 ubiquitin-protein ligases play essential roles in HCC, such as regulating progression, migration, and metastasis. We aimed to identify a hub E3 ubiquitin-protein ligase gene and verify its association with prognosis and immune cell infiltration in HCC. Cell division cycle 20 (CDC20) was identified as a hub E3 ubiquitin-protein ligase in HCC by intersecting the differentially expressed genes (DEGs) in HCC data from the International Cancer Genome Consortium (ICGC) with a list of 919 E3 ubiquitin-protein ligases and analysing the protein-protein interaction (PPI) network of the intersecting genes. CDC20 expression and its correlations with clinicopathological features were explored in The Cancer Genome Atlas (TCGA), ICGC, and Gene Expression Omnibus (GEO) databases via the Wilcoxon signed-rank test. The prognostic value of CDC20 was assessed with Kaplan-Meier (K-M) curves and Cox regression analyses. Subsequently, the correlation between CDC20 and immune infiltration was examined via the Tumor Immune Estimation Resource (TIMER) and Gene Expression Profiling Interactive Analysis (GEPIA). CDC20 expression was significantly higher in HCC than in normal tissues (all P < 0.05). High CDC20 expression predicted a poor prognosis and might be an independent risk factor in HCC (P < 0.05). Additionally, CDC20 was correlated with the immune infiltration of CD8+ T cells, T cells (general), monocytes, and exhausted T cells. This study reveals the potential prognostic value of CDC20 in HCC and suggests that, given its correlation with immune infiltration, CDC20 may be an immune-associated therapeutic target in HCC.


Introduction
Hepatocellular carcinoma (HCC) is the sixth most common primary malignancy and the fourth leading cause of cancer-related death worldwide [1]. Surgical resection is the preferred treatment for HCC, and several new therapeutic strategies, such as targeted therapy and immunotherapy, have become available in recent years [2]. However, despite these advances, the prognosis of HCC remains unsatisfactory because of tumor invasion, metastasis, and immune escape [3,4]. Therefore, it is necessary to explore the mechanisms of HCC progression and to identify novel therapeutic targets that can improve the prognosis of HCC.
E3 ubiquitin-protein ligases are a family of enzymes that transfer ubiquitin to their substrates, thereby targeting these proteins for degradation [5]. An increasing number of E3 ubiquitin-protein ligases have been shown to play oncogenic or tumor-suppressive roles, illustrating their importance in the development and carcinogenesis of malignancies. Some E3 ubiquitin-protein ligases are correlated with chemoresistance and may be potential targets for cancer treatment [6,7]. Therefore, exploring hub E3 ubiquitin-protein ligases in HCC may reveal new biomarkers and treatment targets.
The immune microenvironment, a crucial part of the tumor microenvironment, comprises many immune cells, such as T cells, natural killer (NK) cells, and macrophages [8]. Importantly, it is associated with cancer prognosis; for example, tumors with a regulatory immune microenvironment are usually associated with a dismal prognosis [9]. Recently, immunotherapy has been more widely used because of its excellent curative effect, which is associated with the components and mechanisms of the immune microenvironment. The targets of immunotherapy, such as PD-1, PD-L1, and CTLA4, are biomarkers of relevant immune cells in the immune microenvironment [10,11]. Thus, exploring novel therapeutic targets associated with the immune microenvironment is critical to identify new treatment strategies for cancer.
Chen Xiong and Zhihuai Wang contributed equally to this work.
In the present study, we identified cell division cycle 20 (CDC20) as a hub E3 ubiquitin-protein ligase in HCC. Subsequently, we explored the prognostic value of CDC20 in HCC by analysing mRNA-sequencing data and clinical prognostic information downloaded from several databases. Next, we analysed the relationship between CDC20 and immune cells in the tumor immune microenvironment using online databases. The results suggest an oncogenic role for CDC20 and a possible association between CDC20 and the tumor immune microenvironment in HCC.

Data and information
In total, 919 E3 ubiquitin-protein ligases were obtained from the Integrated annotations for Ubiquitin and Ubiquitin-like Conjugation Database (IUUCD) (http://iuucd.biocuckoo.org/), which contains information on different ubiquitinating and deubiquitinating enzymes [12]. The gene identifiers in this list were converted to official gene symbols using Perl. The gene expression data and corresponding clinical information of 231 HCC patients were acquired from the International Cancer Genome Consortium (ICGC) database (https://dcc.icgc.org/projects/LIRI-JP) [13]. All the patients in the ICGC cohort were from Japan. Additional RNA-sequencing data on CDC20 and prognostic data on another 371 HCC patients were obtained from The Cancer Genome Atlas (TCGA) database up to January 3, 2021 (https://portal.gdc.cancer.gov/) [14]. Additionally, CDC20 expression data and the clinicopathologic features of HCC patients were downloaded from the "GSE14520", "GSE45267", "GSE76427", and "GSE121248" datasets in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo) [15]. Immunohistochemical images were obtained from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/).

Identifying hub E3 ubiquitin-protein ligases
We identified differentially expressed E3 ubiquitin-protein ligase genes as the intersection between the E3 ubiquitin-protein ligase list and the differentially expressed genes (DEGs) in the ICGC cohort (log FC > 2). The protein-protein interaction (PPI) network of the intersecting genes was analysed using STRING (https://www.string-db.org/) [16]. The degree of each intersecting gene in the PPI network was calculated using Cytoscape software (https://cytoscape.org/) [17]. The gene with the highest degree was identified as the hub E3 ubiquitin-protein ligase in HCC.
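The selection rule above reduces to a set intersection followed by a degree count on the PPI graph. The sketch below illustrates that logic under stated assumptions: PPI edges are supplied as pairs of gene symbols (for example, an edge list exported from STRING), duplicate and self edges are ignored, and ties are broken alphabetically. The function name `hub_gene` and the toy gene names are hypothetical; in the study itself, degrees were computed in Cytoscape.

```python
def hub_gene(deg_symbols, e3_symbols, ppi_edges):
    """Return (hub, degree) for the genes shared by the DEG and E3-ligase lists.

    ppi_edges: iterable of (gene_a, gene_b) pairs; only edges whose two ends
    are both in the intersection contribute to a gene's degree.
    """
    candidates = set(deg_symbols) & set(e3_symbols)
    degree = {gene: 0 for gene in candidates}
    seen = set()
    for a, b in ppi_edges:
        if a == b or a not in candidates or b not in candidates:
            continue  # skip self-loops and edges leaving the candidate set
        edge = frozenset((a, b))
        if edge in seen:
            continue  # an undirected edge listed twice counts once
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return None, degree
    # Hypothetical tie-break: first gene in alphabetical order wins.
    hub = max(sorted(degree), key=degree.get)
    return hub, degree


# Toy example with hypothetical gene names (not data from the study).
hub, degree = hub_gene(
    deg_symbols=["G1", "G2", "G3", "G4", "G5"],
    e3_symbols=["G1", "G2", "G3", "G6"],
    ppi_edges=[("G1", "G2"), ("G1", "G3"), ("G2", "G1"), ("G4", "G1"), ("G3", "G3")],
)
print(hub, degree["G1"])
```

Counting each undirected edge once matters when an exported edge list contains both A-B and B-A; otherwise degrees would be inflated.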

Differential expression analysis
CDC20 was identified as the hub E3 ubiquitin-protein ligase in HCC. First, datasets from the TCGA, ICGC, and GEO databases ("GSE14520", "GSE45267", "GSE76427", and "GSE121248") were used to compare the expression levels of CDC20 between HCC samples and normal hepatic samples. The Wilcoxon signed-rank test and the "limma" package in R software 4.0.0 (https://www.r-project.org/) were used for the analysis, and the results were plotted with the "beeswarm" package in R. Additionally, we acquired immunohistochemical images of CDC20 in HCC tissues and normal hepatic tissues from the HPA database.
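The comparison above is a rank-based test on two groups of expression values. Note that the signed-rank form of the Wilcoxon test applies to paired samples (for example, matched tumor and adjacent tissue from the same patient), whereas independent tumor and normal groups call for the rank-sum form, which is what R's `wilcox.test()` computes with its default `paired = FALSE`. The sketch below implements the rank-sum (Mann-Whitney U) test with a normal approximation and tie correction. The function names and toy values are hypothetical, and its p-values will differ slightly from `wilcox.test()`, which applies a continuity correction by default and uses exact p-values for small samples without ties.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(tumor, normal):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test for independent groups.

    Normal approximation with tie correction and no continuity correction.
    Returns (U, p), where U is computed for the tumor group.
    """
    n1, n2 = len(tumor), len(normal)
    pooled = list(tumor) + list(normal)
    n = n1 + n2
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # tie-correction term
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Toy log2 expression values (hypothetical, not from TCGA/ICGC/GEO).
u, p = rank_sum_test([5.1, 6.0, 6.3, 7.2, 8.0, 9.4], [1.2, 2.5, 3.1, 4.0])
print(u, round(p, 4))
```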

Evaluating the prognostic value of CDC20
Based on CDC20 expression and overall survival data of HCC patients obtained from the TCGA, ICGC, and GEO (GSE14520) databases, Kaplan-Meier (K-M) curves and scatterplots were plotted using the "survival" and "pheatmap" packages in R software to show the relationship between CDC20 expression and HCC prognosis. Additionally, receiver operating characteristic (ROC) curves were plotted with the "survival", "survminer", and "timeROC" packages in R software to determine the areas under the curve (AUCs) of CDC20 at 1 year, 2 years, and 3 years.
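The median split, K-M curves, and the log-rank comparison named in the Statistical analysis section can be sketched in plain Python as follows. This is an illustrative reimplementation of the textbook product-limit estimator and two-group log-rank test, not the R "survival" package code used in the study. Assigning samples exactly at the median to the low group is an assumption, and all function names and toy data are hypothetical.

```python
import math
from statistics import median


def median_split(expression):
    """Label samples 'high' if above the median, else 'low' (ties go low)."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time; events: 1 death, 0 censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def log_rank(times, events, groups, group="high"):
    """Two-group log-rank test; returns (chi-square, p) with 1 degree of freedom."""
    o_minus_e = var = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:  # still at risk just before t
                n += 1
                n1 += int(gi == group)
                if ti == t and ei:
                    d += 1
                    d1 += int(gi == group)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in `group`
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


# Hypothetical toy cohort: survival in months, 1 = death, 0 = censored.
expr = [9.1, 8.7, 8.2, 3.0, 2.4, 1.9]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = median_split(expr)
print(kaplan_meier(times, events))
print(log_rank(times, events, groups))
```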

Analysing the expression of CDC20 according to clinical features
To further explore the prognostic value of CDC20, we evaluated CDC20 expression together with clinical features in univariate and multivariate Cox regression analyses using the "survival" package in R software. The AUCs of the ROC curves were used to compare the prognostic values of CDC20 and clinical features using the "survivalROC" package. Next, we investigated the correlations between the expression level of CDC20 and the clinical characteristics of patients using the "ggpubr" package in R software and the Perl language (https://www.perl.org/). All the data and information for these analyses were obtained from the TCGA database.
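For a single covariate such as CDC20 expression (or a high/low indicator), a univariate Cox model can be fitted by maximizing the partial likelihood with Newton-Raphson. The sketch below is an illustration only: it handles tied event times with the Breslow approximation (R's `coxph()` uses the Efron approximation by default, so estimates can differ slightly when event times are tied), reports a two-sided Wald p-value, does not guard against separation (where the estimate diverges), and does not cover the multivariate case, which needs a matrix version of the same update. Function names and toy data are hypothetical.

```python
import math


def score_and_information(beta, x, times, events):
    """Score U(beta) and observed information I(beta) of the Cox partial
    log-likelihood for one covariate (Breslow handling of tied event times)."""
    score = info = 0.0
    for xi, ti, di in zip(x, times, events):
        if not di:
            continue  # censored subjects only contribute through risk sets
        risk = [xj for xj, tj in zip(x, times) if tj >= ti]  # at risk at ti
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        score += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def cox_univariate(x, times, events, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (beta, hazard ratio, two-sided Wald p)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, x, times, events)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, x, times, events)
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(I(beta))
    return beta, math.exp(beta), math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data: 1 = CDC20-high, 0 = CDC20-low.
x = [1, 1, 1, 0, 0, 0]
times = [1, 2, 5, 3, 4, 6]
events = [1, 1, 1, 1, 1, 0]
beta, hr, p = cox_univariate(x, times, events)
print(round(hr, 2), round(p, 3))
```

A hazard ratio above 1 means the high-expression group has a higher instantaneous risk of death, which is how "high CDC20 expression predicts a poor prognosis" is expressed in Cox terms.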

Gene set enrichment analysis (GSEA) of CDC20
We divided all the HCC samples in the TCGA database into a CDC20-low group and a CDC20-high group according to the median expression level of CDC20. Single-gene GSEA was performed with GSEA_4.0.1 software (https://www.gsea-msigdb.org/gsea/index.jsp) to identify biological pathways that differed significantly between the two groups. All the results were plotted with the "ggplot2" package.
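After the median split, GSEA ranks genes by how strongly they distinguish the CDC20-high and CDC20-low groups and then asks whether a pathway's genes cluster at the top or bottom of that ranking. The core statistic is a weighted running-sum enrichment score (ES), sketched below with weighting exponent 1. This is a simplified illustration: the GSEA software additionally normalizes the ES (NES) and estimates significance by permutation, which are omitted here, and the ranking metric (the GSEA desktop default for two phenotype classes is signal-to-noise) is assumed to be precomputed. Gene names and the function name are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: list of (gene, metric) pairs sorted by metric in descending order.
    Hits step up by |metric|**weight (normalized over the set); misses step
    down by 1 / (number of genes outside the set). The ES is the running
    sum's maximum deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(m) ** weight for (_, m), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or norm == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running = best = 0.0
    for (_, metric), hit in zip(ranked, hits):
        running += abs(metric) ** weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set at the top: positive ES
print(enrichment_score(ranked, {"D", "E"}))  # set at the bottom: negative ES
```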

Immune infiltration analysis of CDC20
The Tumor Immune Estimation Resource (TIMER) (https://cistrome.shinyapps.io/timer/) is a web resource for systematically analysing the correlation between gene expression and immune cell infiltration across the cancer types in TCGA [18]. We used the Gene module of TIMER to analyse the associations between CDC20 and 6 types of immune cells (CD4+ T cells, CD8+ T cells, B cells, macrophages, dendritic cells (DCs), and neutrophils) in the tumor microenvironment of HCC. The corresponding TIMER plots also show the correlation between the CDC20 expression level and tumor purity, a crucial factor that can affect the analysis of immune infiltration via genomic approaches [19,20]. The magnitude of the partial correlation coefficient (partial cor) was used to describe the degree of association between CDC20 expression and immune cell infiltration. Subsequently, multivariate Cox regression analysis was performed with the Survival module in TIMER to compare the prognostic value of CDC20 with that of these 6 types of immune cells. The correlations between CDC20 and biomarkers of tumor-infiltrating immune cells (TIICs) were also explored using the Correlation module. These immune cells comprised CD8+ T cells, T cells (general), B cells, monocytes, tumor-associated macrophages (TAMs), M1 macrophages, M2 macrophages, neutrophils, T helper 1 (Th1) cells, T helper 2 (Th2) cells, T helper 17 (Th17) cells, follicular helper T (Tfh) cells, regulatory T cells (Tregs), NK cells, DCs, and exhausted T cells, as previously described [19,21-24]. Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn) is an online database that contains RNA-sequencing data on 369 HCC samples and 160 normal liver samples from the TCGA and Genotype-Tissue Expression (GTEx) projects [25].
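TIMER reports gene-infiltration correlations after adjusting for tumor purity. One standard way to obtain such a purity-adjusted coefficient, and the one this sketch assumes, is a first-order partial Spearman correlation: rank-transform each variable, compute the pairwise Spearman coefficients, and remove the part of the CDC20-marker association explained by purity with r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The code is a plain-Python illustration with hypothetical function names and toy values, not TIMER's implementation, and it omits the p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z (e.g. tumor purity)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values for six tumors.
cdc20 = [5.2, 6.8, 7.1, 8.4, 9.0, 9.9]
cd8a = [2.1, 2.0, 3.3, 3.1, 4.2, 4.0]
purity = [0.91, 0.85, 0.80, 0.78, 0.70, 0.72]
print(round(spearman(cdc20, cd8a), 3), round(partial_spearman(cdc20, cd8a, purity), 3))
```

Comparing the raw and adjusted coefficients shows how much of an apparent gene-immune association may simply reflect differences in tumor purity between samples.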
Using GEPIA and Spearman's correlation, we further examined the associations between CDC20 expression and the gene biomarkers of the four immune cell types most significantly correlated with CDC20 in the TIMER analysis. In the GEPIA results, R denotes the correlation coefficient between CDC20 and the gene biomarkers of these 4 immune cell types.

Statistical analysis
All statistical analyses were performed using the Perl language and R software 4.0.0. The expression data of CDC20 from the 3 databases were normalized by log2 transformation. K-M survival curves were compared using log-rank tests. The associations of CDC20 with immune infiltration and with gene markers of different immune cells were calculated using Spearman's correlation. The strength of each association with immune infiltration was interpreted using the following guide for the absolute partial cor value: 0.00-0.19, "very weak"; 0.20-0.39, "weak"; 0.40-0.59, "moderate"; 0.60-0.79, "strong"; 0.80-1.0, "very strong" [26]. P values less than 0.05 were considered statistically significant.
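The interpretation guide above can be encoded as a small lookup on the absolute correlation. The sketch assumes half-open intervals (for example, 0.20 <= |r| < 0.40 is "weak"), because the published ranges leave values such as 0.195 unassigned. It also shows a log2 transformation with a pseudocount of 1, a common convention that is assumed here because the section does not state the offset used. Function names are hypothetical.

```python
import math

# Upper bounds (exclusive) of each band in the guide, applied to |r|.
STRENGTH_BANDS = [(0.20, "very weak"), (0.40, "weak"), (0.60, "moderate"), (0.80, "strong")]


def correlation_strength(r):
    """Map a (partial) correlation coefficient to the verbal label in the guide."""
    magnitude = abs(r)
    for upper, label in STRENGTH_BANDS:
        if magnitude < upper:
            return label
    return "very strong"


def log2_normalize(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero values finite."""
    return [math.log2(v + pseudocount) for v in values]


for r in (0.15, -0.45, 0.62):
    print(r, correlation_strength(r))
print(log2_normalize([0, 1, 3]))  # [0.0, 1.0, 2.0]
```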

CDC20 is identified as the hub gene
The entire research process is shown in Fig. 1. In total, 548 DEGs obtained from the ICGC (log FC > 2) are shown in a volcano plot (Fig. 2a). After identifier conversion, the list of 919 E3 ubiquitin-protein ligase genes was reduced to 859 genes with official gene symbols. Twenty-one intersecting genes between the ICGC DEGs and the E3 ubiquitin-protein ligases were identified (Fig. 2b). Among these 21 genes, the PPI network of the 16 genes with degrees > 0 is shown in Fig. 2c. From this network, we identified CDC20 as the hub gene, with the most connections among these 21 genes.

Higher expression of CDC20 in HCC samples
To analyse the differential expression of CDC20 between HCC and normal samples, expression data from the TCGA, ICGC, and GEO databases were analysed. The mRNA levels of CDC20 in the three databases are shown in Fig. 3, together with the immunohistochemical staining of CDC20 in HCC from the HPA database. As shown in this figure, the CDC20 expression level in all 4 databases was significantly lower in normal hepatic samples than in HCC samples (P < 0.05).

CDC20 predicts a poor prognosis in HCC
After exploring CDC20 expression in HCC, we verified the prognostic value of CDC20 in HCC by K-M survival analysis. Patients were ranked in ascending order by CDC20 expression level and divided into a high-expression group and a low-expression group according to the median CDC20 expression level. The K-M curves in Fig. 4a-c show the prognostic value of CDC20 in the TCGA, ICGC, and GEO datasets. Survival analysis revealed that, in all three cohorts, patients with low CDC20 expression had a much better prognosis than those with high CDC20 expression (P < 0.05). The survival times and survival status of all patients are shown in scatterplots (Fig. 4d-f). Patients in the CDC20-low group had a markedly longer survival time and a higher survival rate than those in the CDC20-high group. Additionally, the ROC curves demonstrated the ability of CDC20 to predict overall survival at 1 year, 2 years, and 3 years in the 3 datasets (Fig. 4g-i). These data demonstrate that high CDC20 expression predicts a poor prognosis in HCC patients.

Correlations between CDC20 and clinical features
We analysed the correlations between CDC20 expression and clinical features using the TCGA data; the statistically significant results are shown in Fig. 5a-c. CDC20 expression was associated with pathological grade (Fig. 5a), TNM stage (Fig. 5b), and T stage (Fig. 5c), being higher in samples with a higher pathological grade, TNM stage, and T stage (all P < 0.01). These results suggest that CDC20 plays an oncogenic role in HCC.

Comparing the prognostic values of CDC20 and clinical features
To further validate the prognostic value of CDC20, we compared it with that of other clinical features. First, univariate and multivariate Cox regression analyses were performed (Fig. 5d, e). CDC20 was significant in both analyses (P < 0.05), and its significance in the multivariate analysis indicates independent predictive value. Additionally, CDC20 had a higher AUC (0.728) than the other clinical features (Fig. 5f). Together, these results indicate that CDC20 may be an independent prognostic risk factor in HCC.

GSEA of CDC20
The potential biological functions of CDC20 in HCC were preliminarily explored using GSEA (Fig. 6). High expression of CDC20 might be involved in several signalling pathways related to cancers, such as the P53 signalling pathway, mTOR signalling pathway, and NOTCH signalling pathway. High expression of CDC20 was also associated with the cell cycle, ubiquitin-mediated proteolysis, and T-cell receptor signalling pathways.
Fig. 1 Processes of the entire study. The part inside the box identifies CDC20 as a hub E3 ubiquitin-protein ligase in HCC

Correlations between CDC20 and gene markers of immune cells in HCC
The immune cell infiltration analysis suggested an association between CDC20 and immune cells in HCC. To further investigate the association between CDC20 and immune cell infiltration, the correlations between CDC20 and the gene markers of TIICs were analysed using TIMER; the results are listed in Table 1. CDC20 expression was significantly correlated with the gene biomarkers of CD8+ T cells (CD8A and CD8B) after adjusting for tumor purity (Table 1; Fig. 8a; all P < 0.05), suggesting that CDC20 is related to a specific immune response. CDC20 was also associated with the gene biomarkers of two other immune cell types, T cells (general) (CD3D, CD3E, and CD2) and monocytes (CD86 and CSF1R), after adjusting for tumor purity (Table 1; Fig. 8b, c; all P < 0.05). Additionally, the gene biomarkers of exhausted T cells, including CTLA4, PD-1, LAG3, GZMB, and TIM-3, were correlated with CDC20 expression in HCC (Table 1; Fig. 8d; P < 0.05). Some of these biomarkers, such as PD-1 and CTLA4, are targets of immunotherapy, suggesting an association between CDC20 and immunotherapy in HCC. Subsequently, we validated the associations between CDC20 and the gene biomarkers of the 4 immune cell types using the GEPIA website; all the correlations were statistically significant (all P < 0.05; Table 2).
Fig. 3 Differential expression of CDC20 between HCC and normal tissues in different databases. a Differential expression of CDC20 in TCGA. b Differential expression of CDC20 in ICGC. c Differential expression of CDC20 in GEO (GSE14520). d Differential expression of CDC20 in GEO (GSE45267). e Differential expression of CDC20 in GEO (GSE76427). f Differential expression of CDC20 in GEO (GSE121248). g-h Immunohistochemical figures of CDC20 in HCC and normal tissue from the HPA database

Discussion
HCC is a global problem because of its high mortality and morbidity. Identifying new targets for the diagnosis and therapy of HCC is critical.
Many E3 ubiquitin-protein ligases play essential roles in several carcinomas, including HCC, such as regulating progression, migration, and metastasis [5,29-31]. These studies suggest that E3 ubiquitin-protein ligases might be potential therapeutic targets in cancers. Immunotherapy is a widely used treatment for many malignancies, including HCC. Immune cell infiltration in the immune microenvironment is associated with immunosurveillance and immunotherapy in HCC [32]. Ubiquitin signalling is associated with the immune response [33]; for example, the E3 ubiquitin-protein ligase MDM2 maintains STAT5 stability to control T-cell immunity in tumors [34]. However, the association between E3 ubiquitin-protein ligases and immune cell infiltration has not been explored in HCC. Therefore, identifying a hub E3 ubiquitin-protein ligase and analysing its association with immune cell infiltration in HCC may provide new insights.
In the present study, we identified CDC20 as a hub E3 ubiquitin-protein ligase in HCC and demonstrated its prognostic value in HCC using data from the TCGA, ICGC, GEO, and TIMER databases. The results consistently suggested an oncogenic function of CDC20, and HCC patients with high CDC20 expression had a poor prognosis.
CDC20 is a hub gene in cell cycle progression and contributes to the progression or metastasis of several carcinomas, such as pancreatic cancer and breast cancer [35]. Alfarsi et al. verified that high CDC20 expression in oestrogen receptor-positive breast cancer predicted a poor prognosis and no response to endocrine therapy [36]. High expression of CDC20 also predicts a poor prognosis in patients with gastric cancer, prostate cancer, bladder cancer, and colorectal cancer [37-40]. CDC20 induces the radioresistance of bladder cancer by degrading FoxO1 [41]. Decreasing the CDC20 [...] [42]. In HCC, Shi et al. [43] demonstrated that CDC20 promotes PHD3 ubiquitination and activates HIF-1 signalling to accelerate the proliferation of cancer cells. Additionally, Li et al. [44] showed that increased CDC20 expression is associated with HCC proliferation. We performed GSEA to explore the potential biological functions and signalling pathways associated with CDC20 in HCC. The major functions and pathways were enriched in the P53 signalling, mTOR signalling, NOTCH signalling, cell cycle, ubiquitin-mediated proteolysis, and T-cell receptor signalling pathways. Among these pathways, the P53, mTOR, and NOTCH signalling pathways are correlated with HCC metastasis, which is a major cause of poor prognosis [45,46].
Fig. 4 Prognostic value of CDC20 in HCC. a-c K-M curves of CDC20 using the data of HCC patients in the TCGA, ICGC, and GEO (GSE14520) datasets. d-f Scatterplots of the HCC patient survival time and survival condition from the TCGA, ICGC, and GEO (GSE14520) datasets. g-i ROC curves of the AUC values of CDC20 at 1 year, 2 years, and 3 years based on data from the TCGA, ICGC, and GEO (GSE14520) datasets
Fig. 7 Analysis of CDC20 using 6 types of immune cells in HCC via TIMER. a Immune infiltrate analysis between CDC20 and 6 types of immune cells. b Multivariate Cox regression analyses between CDC20 expression and 6 kinds of immune cells
Furthermore, the NOTCH and T-cell receptor signalling pathways are involved in some cellular elements of the tumor microenvironment [47]. Therefore, the possible associations between CDC20 and certain malignant phenotypes, including metastasis and immune infiltration, should be explored and verified.
Previous studies have mainly explored the prognostic value and oncogenic function of CDC20 in cancers, whereas the possible association between CDC20 and immune cell infiltration in the tumor microenvironment has not been explored. In the present study, we analysed the correlation between CDC20 and immune cell infiltration in HCC and revealed a possible relationship between CDC20 and the immune microenvironment of HCC.
At the onset of tumor metastasis, tumor cells escape the anti-tumor immune response and adapt to the microenvironment of the next metastatic site [48]. Many immune cells in the immune microenvironment play crucial roles in tumor metastasis. For example, TAMs and neutrophils promote tumor proliferation and metastasis [49], and targeting TAMs may be a potential treatment strategy for HCC [50]. In our study, we examined the possible correlations between CDC20 expression and immune cell infiltration in HCC. The results suggested significant associations between CDC20 and the infiltration of several immune cell types, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and DCs. Additionally, the correlations between CDC20 and the gene biomarkers of 4 immune cell types were verified using TIMER and GEPIA. These gene markers include several genes targeted by immunotherapy, such as PDCD1 and CTLA4 [51]. These correlations suggest that the CDC20 expression level may be related to the effect of immunotherapy, implying that CDC20 is a potential therapeutic target associated with tumor immunology.

Conclusions
In summary, a high CDC20 expression level in HCC indicates a poor prognosis. Additionally, high CDC20 expression is significantly correlated with immune infiltration. The correlations between CDC20 and gene biomarkers of several immune cell types indicate a potential role for CDC20 in the immunology of HCC. Further experiments are needed to explore the biological mechanisms by which CDC20 may be involved in the metastasis and immune infiltration of HCC. CDC20 may be an innovative prognostic biomarker and potential therapeutic target in HCC.
Treg, regulatory T cell. Note: Partial cor (partial correlation coefficient) represents the correlation coefficient between CDC20 expression and gene markers of infiltrating immune cells; Purity represents the correlation adjusted by purity; None represents the correlation without adjustment; * p < 0.01; ** p < 0.001; *** p < 0.0001
Fig. 8 Analysis of the associations between CDC20 and the gene biomarkers of tumor-infiltrating immune cells in HCC using TIMER. a Scatter plots of the associations between CDC20 and the gene biomarkers of CD8+ T cells (CD8A and CD8B). b Scatter plots of the associations between CDC20 and the gene biomarkers of T cells (general) (CD3D, CD3E, and CD2). c Scatter plots of the associations between CDC20 and the gene biomarkers of monocytes (CD86 and CSF1R).