Identification of CCNB2 expression in triple-negative breast cancer based on bioinformatics results

Background

Current epidemiological data show that the incidence of breast cancer is increasing year by year and that patients are being diagnosed at younger ages. Triple-negative breast cancer (TNBC) is the most malignant breast cancer subtype. Bioinformatics is increasingly used in tumor research. This study provides research ideas and a basis for exploring potential gene therapy targets for TNBC.

Methods
We analyzed three gene expression profiles (GSE64790, GSE62931 and GSE38959) selected from the Gene Expression Omnibus (GEO) database. The GEO2R online analysis tool was used to screen for differentially expressed genes (DEGs) between TNBC and normal tissues. Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were applied to the DEGs. The protein-protein interaction network of these DEGs was visualized with the Metascape gene-list analysis tool to find the protein complex containing the core genes. Subsequently, we investigated the transcriptional data of the core genes in patients with breast cancer using the Oncomine database. Moreover, the online Kaplan-Meier plotter survival analysis tool was used to evaluate the prognostic value of core gene expression in TNBC patients. Finally, immunohistochemistry (IHC) was used to evaluate the expression level and subcellular localization of CCNB2 in TNBC tissues.

Results
A total of 66 DEGs were identified, including 33 up-regulated and 33 down-regulated genes. Among them, a potential protein complex containing five core genes was screened out. High expression of these core genes, especially CCNB2, correlated with poor prognosis in patients with breast cancer. CCNB2 protein was expressed in the cytoplasm, and its expression in TNBC tissues was significantly higher than that in adjacent tissues.

Conclusions
CCNB2 may play a crucial role in the development of TNBC and has potential as a prognostic biomarker of TNBC.

Background
Breast cancer is the most common malignant tumor in women [1]. According to reports, its incidence has increased year by year over the past few decades, and it now ranks first among female malignancies. Although treatment has been explored for many years and diagnosis has advanced, around 400,000 patients still die from breast cancer worldwide each year [2,3]. Triple-negative breast cancer (TNBC) is a special clinical subtype characterized by negative expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER-2), and it accounts for 12-17% of all invasive breast cancers [4]. Because TNBC shows highly malignant features such as strong invasiveness, early metastasis, frequent recurrence and short survival, it has attracted widespread attention [5].
Owing to the abundance of molecular information in public databases such as the Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/geo/) and ONCOMINE (https://www.oncomine.org) [6], the mechanisms of cancer progression can be studied in ways that were not previously possible. Differentially expressed genes (DEGs) between cancer and normal tissues can be screened out by bioinformatics analysis, and identifying these oncogenes or tumor suppressor genes may point to potential biomarkers for the related cancers and suggest new therapeutic strategies. Among bioinformatics methods, DEG analysis is a widely used tool for studying gene up-regulation and down-regulation [7].
In this study, we sought novel therapeutic targets for improving the overall prognosis of patients with TNBC. First, we analyzed gene expression profiling data downloaded from the GEO database to detect DEGs between TNBC and normal breast tissues. Then, GO and KEGG pathway enrichment analyses were performed on the screened DEGs. Furthermore, we established a protein-protein interaction (PPI) network to identify the potential protein complex containing core genes related to TNBC. Next, we investigated the transcriptional and survival data of the core genes using the Oncomine database and the Kaplan-Meier Plotter. Finally, we used IHC to detect the expression and localization of CCNB2 for verification.

Source of data
The GEO database stores microarray data, next-generation sequencing data, and other high-throughput sequencing data. It is currently the largest and most comprehensive public gene expression database, holding various forms of data such as genomic DNA, gene expression profiles and protein molecular data. In this study, the GEO database was used to download the original data, including the expression profiles of triple-negative breast cancer and non-triple-negative breast cancer samples. A search for 'TNBC' in the GEO DataSets database retrieved 4,442 results. After careful reading and comparison, we finally selected three TNBC-related gene expression profiles (GSE64790, GSE62931 and GSE38959). All data resources were freely available on the Internet, and this study did not involve human or animal experiments.

Screening of DEGs
In each profile, the data were divided into TNBC samples and non-TNBC samples. The online analysis tool GEO2R on the GEO website (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to analyze these data. Genes with an adjusted P-value < 0.05 and |logFC| ≥ 1.5 were defined as DEGs. We then compared the results from the three data sets and used the Venn diagram web tool InteractiVenn (http://www.interactivenn.net/index.html) to find their intersection, so as to enhance the reliability of the DEG selection.
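The screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not the GEO2R or InteractiVenn implementation. It assumes each dataset's result table has already been reduced to (gene symbol, adjusted P-value, logFC) records. The names `GeneStat`, `select_degs` and `shared_degs` are hypothetical, and requiring the same direction of change in every dataset is an assumption that the text does not state.

```python
from typing import Iterable, NamedTuple


class GeneStat(NamedTuple):
    symbol: str    # gene symbol
    adj_p: float   # adjusted P-value
    log_fc: float  # log fold change, TNBC vs. control


def select_degs(rows: Iterable[GeneStat],
                max_adj_p: float = 0.05,
                min_abs_log_fc: float = 1.5) -> dict[str, str]:
    """Return {symbol: 'up' | 'down'} for genes passing both cutoffs."""
    degs = {}
    for row in rows:
        if row.adj_p < max_adj_p and abs(row.log_fc) >= min_abs_log_fc:
            degs[row.symbol] = "up" if row.log_fc > 0 else "down"
    return degs


def shared_degs(per_dataset: list[dict[str, str]]) -> dict[str, str]:
    """Keep genes called as DEGs, in the same direction, in every dataset
    (the same-direction requirement is an assumption of this sketch)."""
    first, *rest = per_dataset
    return {
        gene: direction
        for gene, direction in first.items()
        if all(other.get(gene) == direction for other in rest)
    }
```

In use, `select_degs` would be called once per dataset (GSE64790, GSE62931 and GSE38959) and the three resulting dictionaries passed to `shared_degs`, which plays the role of the Venn intersection.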

GO/KEGG enrichment analysis
Functional analysis of large gene lists derived from high-throughput genomics, proteomics and bioinformatics scanning methods (such as expression microarrays, promoter microarrays and ChIP-on-chip) is a challenging task. DAVID (the Database for Annotation, Visualization and Integrated Discovery) is an efficient online analysis tool with which GO functional annotation and KEGG pathway enrichment analysis can be performed simultaneously. In GO analysis, P < 0.01 and count ≥ 10 were defined as statistically significant. In KEGG pathway analysis, P < 0.01 was taken as statistically significant.
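To make these thresholds concrete, the sketch below computes a one-sided hypergeometric over-representation P-value for a single term and applies the P < 0.01 and count ≥ 10 cutoffs. It is an illustration only: DAVID reports its own EASE score, a more conservative variant of the Fisher exact test, so values will differ from DAVID's output. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def enrichment_p(hits: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= hits) when list_size genes are drawn without replacement from
    `background` genes, of which term_size are annotated to the term."""
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def is_significant_go_term(hits: int, list_size: int, term_size: int,
                           background: int, max_p: float = 0.01,
                           min_count: int = 10) -> bool:
    """Apply both GO cutoffs: a minimum gene count and a P-value threshold."""
    if hits < min_count:
        return False
    return enrichment_p(hits, list_size, term_size, background) < max_p
```

For example, with the 66 DEGs as the gene list and made-up values for the other arguments, `is_significant_go_term(12, 66, 300, 20000)` passes both cutoffs. The same call with 9 hits fails on the count threshold regardless of its P-value.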

PPI network construction and DisGeNET analysis
Metascape (http://metascape.org/) is a free, well-maintained gene annotation and analysis tool that can visualize enrichment results as bar charts, heat maps or networks [8,9]. In this study, Metascape was applied to analyze pathway and process enrichment of the differential genes and of neighboring genes significantly related to them. The Gene Ontology (GO) terms for the cellular component (CC), biological process (BP) and molecular function (MF) categories were obtained from the Metascape online tool, as were the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Only terms with a P-value < 0.01, an enrichment factor > 1.5 and a minimum count of 3 were considered meaningful. A subset of the enriched terms was selected and rendered as a network plot to further determine the relationships between terms. The following databases were used for protein-protein interaction enrichment analysis: BioGrid, InWeb_IM and OmniPath. Moreover, the Molecular Complex Detection (MCODE) algorithm was used to identify tightly connected network components. Metascape also provides DisGeNET analysis; DisGeNET is a versatile platform containing one of the largest publicly available collections of genes and variants associated with human diseases, which makes it easier to study and predict disease-related genes.

Oncomine database analysis
ONCOMINE (www.oncomine.org) is an online tumor data analysis platform that facilitates research by providing genome-wide expression analysis [10]. Using the Oncomine database, we examined the mRNA expression levels of the SKA1, CCNB2, CENPF, CENPA and BIRC5 genes in various cancers.

The Kaplan-Meier plotter analysis
Kaplan-Meier plotter (www.kmplot.com) is an online database containing survival information and microarray gene expression data. These data come from GEO, TCGA and the Cancer Biomedical Information Grid. It contains gene expression data and overall survival information for 1402 clinical breast cancer patients. In this study, Kaplan-Meier plotter was used to evaluate the prognostic value of the mRNA expression of the core genes. Patient samples were divided into high-expression and low-expression groups according to the median expression, and relapse-free survival (RFS) of TNBC patients was compared between the groups using Kaplan-Meier survival plots, with the hazard ratio (HR), 95% confidence interval and log-rank p-value reported.
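The median split and survival estimate described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the Kaplan-Meier plotter's own code. Each patient is a hypothetical (expression, follow-up time, relapsed) record, and patients at or below the median are assigned to the low-expression group. The hazard ratio and log-rank test are omitted.

```python
from statistics import median


def split_by_median(patients):
    """patients: list of (expression, time, event); event is True on relapse.
    Returns (low, high) lists of (time, event) pairs."""
    cutoff = median(expr for expr, _, _ in patients)
    low = [(t, e) for expr, t, e in patients if expr <= cutoff]
    high = [(t, e) for expr, t, e in patients if expr > cutoff]
    return low, high


def kaplan_meier(samples):
    """samples: list of (time, event) pairs.
    Returns [(time, survival probability)] at each time with an event."""
    at_risk = len(samples)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in samples}):
        events = sum(1 for time, e in samples if time == t and e)
        leaving = sum(1 for time, _ in samples if time == t)
        if events:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Relapsed and censored patients both leave the risk set.
        at_risk -= leaving
    return curve
```

Running `kaplan_meier` on each group returned by `split_by_median` gives the two step curves that a Kaplan-Meier plot compares.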

Immunohistochemistry (IHC)
TNBC specimens were obtained from the Pathology Department of the First Affiliated Hospital of Bengbu Medical College and confirmed by pathological diagnosis. The expression of CCNB2 protein was detected by immunohistochemical staining. The tissue sections were routinely deparaffinized and dehydrated, and endogenous peroxidase activity was inactivated with 3% H2O2 in methanol. The primary, secondary and tertiary antibodies were added in turn. The tissue sections were stained with diaminobenzidine (DAB) and counterstained with hematoxylin. Cells were judged positive when obvious brown granules were present in the cell membrane or cytoplasm.

Functional enrichment analyses
The DEG list was submitted to the DAVID database for GO and KEGG pathway enrichment analysis. The enriched GO terms were divided into CC, BP and MF. In the BP category, the DEGs were mainly enriched in terms including mitosis and cell proliferation; in the CC category, they were mainly enriched in nucleus and nucleoplasm; and in the MF category, they were mainly enriched in protein binding. In addition, KEGG pathway analysis showed that the DEGs were mainly enriched in the progesterone-mediated oocyte maturation, oocyte meiosis and cell cycle pathways (Table 2).

ONCOMINE Analysis
The graph shows the number of datasets with statistically significant mRNA over-expression (red) or down-regulated expression (blue) of the target gene. The threshold was set to a p-value of 1E-3 and a fold change of 1.5. We compared the transcription levels of the core genes in cancers with those in normal samples using the ONCOMINE database (Figure 3 and Figure 4). ONCOMINE analysis showed that the mRNA expression of BIRC5, CCNB2, CENPA, CENPF and SKA1 was upregulated in patients with breast cancer. In Curtis's dataset, BIRC5 was upregulated in medullary breast carcinoma compared with normal samples, with a fold change of 6.014 and a p-value of 9.13E-17. In Turashvili's dataset, CCNB2 was overexpressed in invasive ductal breast carcinoma, with a fold change of 4.653 and a p-value of 6.05E-6. In Curtis's dataset, CENPA was overexpressed in invasive ductal breast carcinoma compared with normal samples, with a fold change of 2.183 and a p-value of 1.27E-115. In the TCGA dataset, the transcription level of CENPF was significantly higher in patients with invasive lobular breast carcinoma than in normal specimens, with a fold change of 6.980 and a p-value of 1.31E-21. In Turashvili's dataset, SKA1 mRNA was overexpressed in invasive ductal breast carcinoma, with a fold change of 7.501 and a p-value of 2.48E-6.

The Kaplan-Meier Plotter Analysis
Five genes (CCNB2, CENPF, SKA1, CENPA and BIRC5) were found to be associated with relapse-free survival (RFS) in TNBC through the Kaplan-Meier Plotter analysis. Patients with higher expression of these genes had worse RFS than those with lower expression. Among them, overexpression of CCNB2 was the most unfavorable prognostic factor for RFS in TNBC patients (HR=1.98; 95% CI: 1.28-3.06; P=0.0018; n=255), with the lowest log-rank p-value. To date, the TNBC cases in the database are still insufficient for overall survival analysis (Figure 5).
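As a quick consistency check on the reported CCNB2 estimate, the standard error of log(HR) can be recovered from the 95% confidence interval and converted to an approximate Wald P-value. This is a back-of-the-envelope sketch that assumes the interval is symmetric on the log scale. It is not how the Kaplan-Meier plotter computes its log-rank P-value, so only rough agreement with the reported P=0.0018 should be expected. The function name `wald_from_ci` is hypothetical.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(hr: float, ci_low: float, ci_high: float):
    """Recover SE(log HR), the Wald z statistic, a two-sided P-value and the
    geometric midpoint of the interval from a reported HR and 95% CI."""
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    midpoint_hr = exp((log(ci_low) + log(ci_high)) / 2)
    return se, z, p, midpoint_hr


# Values reported in the text for CCNB2 in TNBC.
se, z, p, midpoint_hr = wald_from_ci(1.98, 1.28, 3.06)
print(f"SE(log HR)={se:.3f}, z={z:.2f}, Wald p={p:.4f}, CI midpoint HR={midpoint_hr:.2f}")
```

The geometric midpoint of the interval falls very close to the reported HR of 1.98, and the approximate Wald P-value is of the same order as the reported log-rank value, so the three reported numbers are mutually consistent.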

Immunohistochemistry staining
Immunohistochemical staining showed that CCNB2 protein expression in TNBC tissues was significantly higher than that in adjacent tissues. The protein was located in the cytoplasm and appeared mainly as brown-yellow granular staining in TNBC tissues.

Discussion
Breast cancer is a malignant tumor that arises in the epithelial tissue of the breast [11]. Because breast cancer cells are loosely connected, they detach easily, and the free cancer cells spread readily through the blood or lymph fluid. These cells then disperse throughout the body, form metastases and endanger life. All of this makes breast cancer a serious threat to women's health. In particular, triple-negative breast cancer (TNBC) is a unique subtype of breast cancer. It expresses neither hormone receptors (ER: estrogen receptor, PR: progesterone receptor) nor HER-2, which makes clinical targeted therapy and endocrine therapy unsatisfactory [12].
Chemotherapy is currently the main adjuvant treatment for patients with TNBC. However, its efficacy is limited compared with comprehensive therapy, especially in patients who show resistance to chemotherapy drugs [13]. Therefore, reliable biomarkers and effective targets for TNBC are urgently needed to improve the prognosis of patients.
In our study, gene and protein expression analyses based on publicly available bioinformatics databases were performed to screen out potential key genes related to TNBC. DEGs between TNBC tissues and normal human breast tissues were identified from gene expression profiling data in the GEO database. We identified 66 DEGs, including 33 upregulated and 33 downregulated DEGs. These DEGs were associated with GO terms (mitotic nuclear division, cell proliferation, nucleus, nucleoplasm and protein binding) and KEGG pathways such as progesterone-mediated oocyte maturation, oocyte meiosis and cell cycle. We then constructed a PPI network to find the potential protein complex and obtained five core genes (BIRC5, CCNB2, CENPA, CENPF and SKA1). The GEO analysis showed that the five core genes were all up-regulated in TNBC, and the ONCOMINE database showed that they were significantly overexpressed in breast cancer compared with normal tissues. Finally, the Kaplan-Meier plotter was used to estimate the relationship between core gene expression and the prognosis of TNBC patients. Among the five core genes, CCNB2 showed the strongest association with poor prognosis in TNBC. In addition, the cytoplasmic localization of the CCNB2 protein was detected by immunohistochemistry staining.
Cyclin family proteins, which include cyclin B1 (CCNB1) and cyclin B2 (CCNB2), regulate the activities of cyclin-dependent kinases (CDKs), and different cyclins act in specific phases of the cell cycle [14-16]. As a member of the cyclin family, CCNB2 plays an important role in regulating the cell cycle. During interphase and mitosis, CCNB2 is located in the Golgi apparatus and participates in its disassembly [17]. According to previous reports, CCNB2 usually triggers the G2/M transition by activating CDK1 kinase, and downregulation of CCNB2 helps to inhibit cell proliferation and promote cell cycle arrest in the G2/M phase [18-20]. Relevant studies have shown that metformin can down-regulate the expression of CCNB2 to increase apoptosis and cell cycle arrest [21]. A high level of CCNB2 protein is positively correlated with poor differentiation, tumor size, lymph node metastasis, distant metastasis and clinical stage. In the past few years, overexpression of CCNB2 in tumor tissues has been shown to be an unfavorable prognostic biomarker in many human cancers, for example gastric cancer [22], breast cancer [23], pituitary adenoma [24], nasopharyngeal carcinoma [25] and adrenocortical carcinoma [26]. Our research found that, compared with normal breast tissues, CCNB2 was upregulated in TNBC, and overexpression of CCNB2 was associated with unfavorable relapse-free survival in TNBC patients. CCNB2 therefore shows potential as a therapeutic target and prognostic factor for TNBC.
In our study, BIRC5, SKA1, CENPA and CENPF also showed high expression in breast cancer compared with normal breast tissues, and CENPA, CENPF and SKA1 were significantly correlated with poor recurrence-free survival in TNBC (log-rank p < 0.05). However, the role of these genes in TNBC is still unclear, and more experiments are needed.

Conclusions
In summary, our bioinformatics analysis detected 66 DEGs from the GEO database and clustered the functions and pathways of these genes. Transcriptional and prognostic analyses were then performed on the genes in the protein complex identified by the PPI network. We found that CCNB2 protein expression was significantly increased in TNBC tissues and was associated with the malignant status and prognosis of TNBC patients. The clinical value of CCNB2 has yet to be confirmed by further research. Still, CCNB2 shows promise as a potential therapeutic target for TNBC.

Declarations
Competing interest The authors declare that they have no competing interests in this work.
Authors' contributions JC analyzed the data, drafted the initial manuscript and revised it for important content. ZF and NL led the conception and review of the paper. SS, RL, RM, XF, YH made contributions to the collection of relevant literature.

The transcription levels of core genes in different types of cancers (ONCOMINE).

Figure 5
The prognostic value of mRNA level of core genes in TNBC patients (Kaplan-Meier plotter).

Figure 6