Exploration of Core Genes Signature and Prognostic Implication for Ductal Carcinoma In Situ of the Breast


Background: Following the implementation of breast screening programs, the detection of ductal carcinoma in situ (DCIS), an early form of neoplasm, has risen. Although the prognosis is good, 20-50% of DCIS patients will develop invasive ductal carcinoma (IDC) if left untreated. It is therefore important to look for promising biomarkers for predicting DCIS prognosis.
Methods: Three microarray profile datasets were obtained from the Gene Expression Omnibus (GEO) database. Genes differentially expressed between DCIS and normal tissue were investigated. Enrichment analysis was used to describe their biological roles and intrinsic pathways. Hub genes were identified with CytoHubba and MCODE, two Cytoscape plugins, and the findings were further verified in The Cancer Genome Atlas (TCGA) Breast Cancer Dataset. The prognostic ability of the core genes signature was determined through time-dependent receiver operating characteristic (ROC) analysis, Kaplan-Meier survival curves, and the Oncomine and UALCAN databases. In addition, the prognostic value of the core genes was verified in proliferation assays.
Results: We identified 217 common DEGs, with 110 up-regulated and 107 down-regulated genes. The top genes were obtained from the protein-protein interaction (PPI) network. For DCIS prognosis prediction, a novel six-gene signature (GAPDH, CDH2, BIRC5, NEK2, IDH2, and MELK) was developed. In the TCGA cohort, the ROC curves showed strong performance in prognosis prediction. The six core genes are frequently overexpressed in DCIS, and their overexpression is associated with a poor prognosis. Furthermore, downregulation of core gene expression by transfection with small interfering RNAs significantly inhibited breast cancer cell proliferation, implying a great potential for using the core genes in DCIS prognosis.
Conclusions: The six core genes signature was validated as a promising set of DCIS biomarkers in our research, which may assist in clinical decision-making for individual care.

With the advancement of genome-sequencing technologies, evidence has accumulated that differentially expressed genes hold a great deal of promise for the diagnosis and prognosis of DCIS. Gene microarray profiles can be analyzed in an integrated manner and are a useful tool for identifying novel biomarkers to assist in diagnosis and personalized treatment. In this study, microarray profile datasets obtained from the Gene Expression Omnibus (GEO) database were used to perform an integrated analysis of DCIS, and six genes were identified as possible biomarkers. The Cancer Genome Atlas Breast Cancer Dataset was used to confirm the expression of these genes, and enrichment analysis was used to clarify their biological roles and intrinsic pathways. Additionally, the prognostic value of the core genes was verified in proliferation assays. Our findings indicate that the six genes are suitable for use as biomarkers in the diagnosis of DCIS.

DEGs identification
Differentially expressed genes (DEGs) between DCIS samples and noncancerous tissues were determined using GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/). GEO2R is an R-based analysis tool that compares two or more groups of samples within a GEO series using the t-test or analysis of variance. An adjusted P-value < 0.05 and |log FC| > 1 were set as the cut-off criteria. BioDBnet (https://biodbnet-abcc.ncifcrf.gov/db/db2db.php) was used to convert identifiers from Gene ID to Gene symbol. The DEGs overlapping across the three microarray profile datasets were identified using FunRich software (version 3.1.3).
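The selection and overlap steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, logFC values, and adjusted P-values are invented, the helper `select_degs` is hypothetical, and the study itself used GEO2R and FunRich rather than custom code.

```python
# Hypothetical GEO2R-style rows: (gene symbol, logFC, adjusted P-value).
# All values are invented for illustration only.
def select_degs(rows, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split rows passing both cut-offs into up- and down-regulated gene sets."""
    up, down = set(), set()
    for gene, log_fc, adj_p in rows:
        if adj_p < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).add(gene)
    return up, down

dataset_a = [("GENE1", 2.1, 0.001), ("GENE2", -1.5, 0.01), ("GENE3", 0.4, 0.001)]
dataset_b = [("GENE1", 1.3, 0.02), ("GENE2", -2.2, 0.03), ("GENE4", 1.8, 0.2)]
dataset_c = [("GENE1", 1.7, 0.04), ("GENE2", -1.1, 0.001), ("GENE5", -3.0, 0.001)]

selected = [select_degs(d) for d in (dataset_a, dataset_b, dataset_c)]

# A gene counts as overlapping only if it changes in the same direction
# in all three datasets.
common_up = set.intersection(*(up for up, _ in selected))
common_down = set.intersection(*(down for _, down in selected))
print(sorted(common_up), sorted(common_down))
```

Keeping the up- and down-regulated sets separate before intersecting avoids counting a gene that is up-regulated in one dataset and down-regulated in another as a shared DEG.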

Pathway enrichment analysis and Gene ontology
The Database for Annotation, Visualization, and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov/) and FunRich software (version 3.1.3) were used to determine the biological roles of the candidate DEGs and their possible pathway enrichment. DAVID is a web-based bioinformatics resource for gene annotation, visualization, and integrated discovery. A P-value < 0.05 was set as the cut-off criterion for significant functions and pathways.
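To make the P-value cut-off concrete, the following sketch computes an over-representation P-value with a plain hypergeometric upper tail. This is a stand-in for illustration, not the exact statistic reported by DAVID or FunRich, and every count in the example (background size, annotated genes, hits) is an assumption.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Hypothetical counts: 20,000 background genes, 200 annotated to a term,
# 217 genes in the DEG list, 10 of them annotated to the term.
p_value = hypergeom_upper_tail(k=10, n=217, K=200, N=20000)
print(f"enrichment p-value = {p_value:.3g}")
print("significant at 0.05:", p_value < 0.05)
```

With about two annotated genes expected by chance in a list of 217, observing ten gives a P-value far below the 0.05 cut-off used in the text.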

PPI network construction and hub genes identification
To predict the protein-protein interaction (PPI) network of the candidate DEGs, we used the Search Tool for the Retrieval of Interacting Genes database (STRING, version 11.0, http://string-db.org), with a medium confidence score of 0.400 as the cut-off criterion and network edges representing confidence. The PPI network of the proteins encoded by the candidate DEGs was then built and analyzed using Cytoscape software (version 3.7.2, http://www.cytoscape.org/). CytoHubba and MCODE, two Cytoscape plugins, were used to explore the hub genes of the PPI network, and node degree (the number of interconnections of a node) was calculated to filter the hub genes.
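Node degree, as used above to rank hub candidates, is simply the number of edges attached to each node. A minimal sketch follows, assuming an edge list already filtered at the confidence cut-off; the node labels and edges are invented and do not represent real STRING interactions.

```python
from collections import Counter

# Hypothetical undirected edges between proteins P1..P6 (invented data).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P5"), ("P2", "P6"), ("P5", "P6")]

# Each edge contributes one to the degree of both of its endpoints.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Rank by descending degree, breaking ties alphabetically.
ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[:3])
```

In Cytoscape the same quantity is available through the network analysis tools and the CytoHubba Degree ranking, so this code only shows what the number means.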
Validation of the identified hub genes
Hub genes signatures and prognosis analysis
The Oncomine database (https://www.oncomine.org/resource/login.html) was used to examine the transcription levels of the hub and core genes in various cancer types. starBase (http://starbase.sysu.edu.cn/index.php), which provides gene expression data for 32 cancer types, was used to estimate the expression levels of the hub genes in breast cancer. Hub gene expression across molecular subtypes and nodal metastasis status was evaluated with UALCAN (http://ualcan.path.uab.edu/index.html). The prognostic significance of the hub genes was evaluated using the Kaplan-Meier plotter (https://kmplot.com/analysis/), an online database for assessing the effect of genes on survival in various cancer types.

RNA extraction and reverse transcription quantitative polymerase chain reaction (RT-qPCR)
Trizol reagent (Takara, Japan) was used to extract total RNA from cultured cells. For reverse transcription, the SuperScript RT kit (Takara, Japan) was used. The SYBR Green PCR kit (Takara, Japan) was used for the RT-qPCR assay. The sequences of the PCR primers are listed in Supplementary Table S1. Relative expression levels were calculated using the 2^−ΔΔCt method. The Ct value of each gene was the average of triplicate reactions.
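The 2^−ΔΔCt calculation mentioned above can be written out as a short worked example. The Ct values are invented, the function name is hypothetical, and the reference gene is left unspecified because the text does not name it.

```python
from statistics import mean

def relative_expression(target_ct, ref_ct, target_ct_ctrl, ref_ct_ctrl):
    """Return 2^(-ddCt) from replicate Ct values of a sample and a control."""
    # dCt: target gene Ct minus reference gene Ct, using replicate means.
    d_ct_sample = mean(target_ct) - mean(ref_ct)
    d_ct_control = mean(target_ct_ctrl) - mean(ref_ct_ctrl)
    # ddCt: sample dCt minus control dCt.
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)

# Invented triplicate Ct values for illustration only.
fold = relative_expression(
    target_ct=[22.0, 22.0, 22.0], ref_ct=[18.0, 18.0, 18.0],
    target_ct_ctrl=[24.0, 24.0, 24.0], ref_ct_ctrl=[18.0, 18.0, 18.0],
)
print(fold)  # 4.0
```

Here the target gene crosses threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a four-fold higher relative expression.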
Transfection and small interfering RNA (siRNA)
siRNAs targeting the hub genes and the corresponding scrambled control were purchased from RiboBio (Shanghai, China). The siRNA target sequences are listed in Supplementary Table S2. Cells were transfected with the siRNAs using FuGENE HD Transfection Reagent (Promega, Madison, WI, USA) according to the manufacturer's instructions.

Proliferation assays
Cell proliferation potential was assessed using MTT, colony formation, and EdU assays. For the MTT assay, 2 × 10^3 cells were seeded in 96-well plates 24 h after transfection. At the indicated times, 20 µL of MTT (0.5 mg/mL, Solarbio) was added to the cells, which were then incubated for 4 h at 37°C; the medium was removed and the formazan precipitate was dissolved in 150 µL of DMSO. A microplate reader was used to measure the activity of viable cells at 570 nm. For the colony formation assay, 8 × 10^2 cells were seeded in 6-well plates and cultured for 2 weeks. The colonies were fixed for 30 minutes in 4% paraformaldehyde, stained with hematoxylin, counted, and compared with the control group. The EdU assay was performed with the Cell-Light EdU Apollo488 In Vitro Imaging Kit (Ribobio) following the manufacturer's instructions. The percentage of EdU-positive cells was determined under a fluorescence microscope.

Statistical analysis
The results are presented as the mean ± SD of at least three independent experiments. Student's t-test was used to assess significance between the experimental and control groups. P < 0.05 was considered statistically significant. SPSS version 24.0 was used to perform all calculations.
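For readers who want to see what the comparison computes, here is a minimal sketch of the two-sample Student's t statistic with pooled variance. The measurements are invented and the function name is hypothetical; the study used SPSS, and the P-value would come from the t distribution with the reported degrees of freedom, which this sketch does not compute.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Invented relative proliferation values from three independent experiments.
control = [1.00, 1.10, 0.90]
knockdown = [0.50, 0.60, 0.40]
t_stat, dof = student_t(control, knockdown)
print(f"t = {t_stat:.3f}, df = {dof}")
```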

Identification of overlapping DEGs in DCIS
According to the cut-off criteria of P < 0.01 and |logFC| > 1 for selecting DEGs, a total of 5586, 3042, and 1757 DEGs were identified as up-regulated, and 3532, 2917, and 2409 DEGs were identified as down-regulated in the GSE7882, GSE21422, and GSE59246 microarray profile datasets, respectively (Fig. 1a-c). As shown in Fig. 1d and Fig. 1e, 110 up-regulated and 107 down-regulated genes overlapped across the three datasets. The names of the overlapping genes are shown in Table 1.

Functional enrichment analysis of overlapped DEGs in DCIS
GO analysis showed that, in terms of cellular component (CC), the 217 overlapping DEGs were enriched in cytoplasm, cell surface, nucleus, collagen type I trimer, and catalytic step 2 spliceosome for the up-regulated genes, and in protein complex, nuclear speck, lamellipodium, and focal adhesion for the down-regulated genes (Fig. 2a and Table 2). In terms of molecular function (MF), the up-regulated DEGs were enriched in microtubule binding, procollagen-proline 4-dioxygenase activity, extracellular matrix structural constituent, ATP binding, and transcription corepressor activity, while the down-regulated DEGs were enriched in transcriptional activator activity, transcription factor activity, and RNA polymerase II core promoter proximal region sequence-specific binding (Fig. 2b and Table 2). For the biological process (BP) terms, the up-regulated DEGs were enriched in protein autophosphorylation, blood vessel development, negative regulation of transcription from RNA polymerase II promoter, mitotic spindle assembly, and response to UV, whereas the down-regulated DEGs were enriched in negative regulation of intrinsic apoptotic signaling pathway in response to DNA damage, cellular response to calcium ion, renal tubule morphogenesis, microglial cell activation, and cell morphogenesis involved in neuron differentiation (Fig. 2c and Table 2).
Furthermore, signaling pathway enrichment of the DEGs was analyzed. The up-regulated genes were mainly associated with BH3-only protein activation, NGF signaling, the C-MYB transcription factor network, the Intrinsic Pathway for Apoptosis, and ERK signaling. The down-regulated genes were mostly involved in TRIF-mediated TLR3 signaling, CDC42 activity regulation, MAPK targets/Nuclear events mediated by MAP kinases, Toll Receptor Cascades, and Integrin-linked kinase signaling (Fig. 2d and Table 3).

Module analysis and PPI network construction
The overlapping DEGs revealed a distinct set of networks and interactions. Using the STRING online database, 178 of the 217 commonly altered DEGs (92 up-regulated and 86 down-regulated genes) were mapped into the PPI network complex. A total of 39 DEGs were excluded from the PPI network. Furthermore, Cytoscape analysis identified 456 edges among the overlapping DEGs. With the default filter, node degree in the PPI network complex ranged from 1 to 64 (Fig. 3a).
The entire PPI network was then analyzed with the MCODE plug-in of Cytoscape, and the most significant module, comprising sixteen nodes, was identified using degree cut-off = 2, k-core = 2, node score cut-off = 0.2, and maximum depth = 100 as the criteria (Fig. 3b). Afterward, as shown in Fig. 3c and Fig. 3d, the top 27 genes of the PPI network were selected using the CytoHubba plug-in, and node degrees were analyzed. Twelve core candidate genes were chosen after combining the results of MCODE, CytoHubba, and node degree, and all of them were up-regulated DEGs, in the following order: GAPDH, CDH2, COL1A2, HNRNPA2B1, POLR2H, COL1A1, IDH2, NEK2, BIRC5, TRA2B, HNRNPH1, and MELK. They may have a significant impact on the progression and prognosis of DCIS.
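The k-core = 2 setting above restricts module detection to nodes that keep at least two neighbours once weakly connected nodes are pruned. The sketch below shows only that pruning step on an invented toy graph; it is not an implementation of MCODE, which additionally weights nodes by local density and grows modules from seed nodes.

```python
def k_core(adjacency, k):
    """Iteratively drop nodes with fewer than k neighbours; return surviving nodes.

    `adjacency` maps each node to the set of its neighbours and is assumed
    to be symmetric (undirected graph).
    """
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            # Remove the node and detach it from its remaining neighbours.
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return set(adj)

# Hypothetical toy network: a triangle A-B-C, a pendant node D, an isolated node E.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C"}, "E": set()}
print(sorted(k_core(graph, 2)))
```

The pruning repeats because removing one node can push its neighbours below the threshold as well.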
Validation of prognostic effectiveness of the hub genes
Based on TCGA samples, the expression of the six selected hub genes with prognostic significance was further investigated. The findings revealed that hub gene expression was substantially increased in breast cancer tissues (Fig. S2). The six up-regulated hub genes were subjected to ROC analysis. The ROC curves of all six hub genes showed favorable prognostic values for DCIS. The areas under the curve (AUC) of GAPDH, CDH2, BIRC5, NEK2, IDH2, and MELK were 0.8876, 0.7552, 0.7499, 0.8457, 0.7841, and 0.9664, respectively (Fig. 5a-f).
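As a reminder of what an AUC value measures, it can be computed as the probability that a randomly chosen positive sample has a higher expression score than a randomly chosen negative sample. The sketch below uses invented expression values and a hypothetical function name; it is not the procedure used to obtain the AUCs reported above.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented expression values for one gene in tumour and normal samples.
tumour = [8.1, 7.4, 9.0, 6.2]
normal = [5.9, 6.5, 7.0, 5.5]
auc = roc_auc(tumour, normal)
print(auc)  # 0.875
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the six reported values should be read.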
Core gene signatures and prognostic analysis
Six core genes were linked to DCIS. Using the Oncomine database, the transcriptional levels of the core genes in pan-cancer samples were compared with those in normal samples. In four to twenty-one datasets, the mRNA expression of the core genes was significantly up-regulated in breast cancer patients (Fig. S3). In addition to the overall increase in core gene expression in breast cancer samples, the degree of up-regulation differed among breast cancer molecular subtypes. The expression of GAPDH, NEK2, BIRC5, and MELK was higher in triple-negative breast cancer, the subtype with the worst prognosis, whereas the expression of CDH2 and IDH2 was higher in the HER2-positive subtype. Overall, the degree of up-regulation was higher in subtypes with a poor prognosis.
Additionally, high expression of the core genes was associated with an increased risk of lymph node metastasis (Fig. 6a-f).
Downregulation of core genes expression inhibits the proliferation
T47D, MDA-MB-231, SK-BR-3, MCF-7, BT474, and BT549 are common breast cancer cell lines, and MCF10A is a normal mammary epithelial cell line. RT-qPCR was used to compare core gene expression between the breast cancer and normal cell lines. The results showed that GAPDH, BIRC5, and MELK expression was highest in MDA-MB-231, CDH2 was highly expressed in T47D, NEK2 was highly expressed in MCF-7, and IDH2 was highly expressed in SK-BR-3 (Fig. S4). Subsequently, core gene expression was down-regulated in the corresponding cells by siRNA transfection (Fig. S5). Breast cancer cell proliferation was markedly inhibited when core gene expression was down-regulated, as shown by MTT (Fig. 7a-f), colony formation (Fig. 8a-f), and EdU (Fig. 9a-f) assays. These data support the use of a six-gene signature (GAPDH, CDH2, BIRC5, NEK2, IDH2, and MELK) for DCIS diagnosis and prognosis prediction.

Discussion
DCIS is a heterogeneous disease that represents the stage of breast cancer before it becomes invasive (17). While the majority of DCIS patients have excellent long-term outcomes, some can still develop invasive breast cancer. Regrettably, current clinical practice results in overtreatment of some women with DCIS because of uncertainty about which lesions are at risk of progressing to invasive cancer. As a result, the identification of novel prognostic biomarkers is important. Gene signatures based on aberrant mRNAs have recently shown great promise in cancer prognosis prediction.
We analyzed the gene expression profiles of 148 DCIS patients and identified 217 common DEGs, including 110 up-regulated and 107 down-regulated genes. According to the functional enrichment analysis, the DEGs were mostly associated with protein autophosphorylation, cytoplasm, microtubule binding, negative regulation of intrinsic apoptotic signaling pathway in response to DNA damage, protein complex, transcriptional activator activity, and RNA polymerase II core promoter proximal region sequence-specific binding. Signaling pathway enrichment analysis showed associations with activation of BH3-only proteins and TRIF-mediated TLR3 signaling. With the help of the PPI network, twelve hub genes were chosen for further study. Among them, GAPDH, CDH2, BIRC5, NEK2, IDH2, and MELK were found to be negative prognostic genes in DCIS patients. ROC and signature analyses demonstrated that the core genes can be a useful indicator for DCIS. Additionally, downregulation of core gene expression by transfection with small interfering RNAs significantly inhibited the proliferation of breast cancer cells, also suggesting a great potential for using the core genes in DCIS prognosis.
GAPDH, or glyceraldehyde-3-phosphate dehydrogenase, is a housekeeping gene that is often used as an internal control in experiments. Increased GAPDH levels, however, are seen in a range of human cancers and are often linked to shorter survival times (18-21). The evidence suggests that GAPDH functions, such as its roles in tumor cell survival, angiogenesis, and posttranscriptional regulation of tumor cell mRNA, are related to poor prognosis and increased tumor progression (22,23). However, the role and mechanism of aberrant GAPDH in DCIS remain unknown.
CDH2 (Cadherin 2), a member of the cadherin superfamily, encodes the N-cadherin protein and plays an important role in epithelial-mesenchymal transition (EMT). Elevated expression of CDH2 has been associated with poor prognosis in various cancers such as lung cancer (24), prostatic cancer (25), and glioma (26). Notably, CDH2 was found to be overexpressed in DCIS with invasion, and it may serve as an early marker in the absence of histological signs and as a marker of short-term local recurrence after treatment (27).
BIRC5 (also known as Survivin) is an apoptosis inhibitory protein that acts both to inhibit cell death and to promote cancer cell survival (28). Studies have shown that BIRC5 expression is significantly increased in many cancers, including lung, breast, and colon cancers (29,30). Owing to its accumulation, BIRC5 can be used as a predictive marker in different tumors. As a result, increased survivin expression could be regarded as a prognostic marker associated with increased lymph node invasion, recurrence, and metastasis (31,32).
NEK2, Never in Mitosis (NIMA) Related Kinase 2, plays a key role in regulating microtubule stabilization, centrosome separation and duplication, the spindle assembly checkpoint, and kinetochore attachment (33). Accumulating evidence has shown that NEK2 is up-regulated in primary tumor tissues and cancer cell lines (34-36). Furthermore, NEK2 overexpression has been linked with advanced tumor stage, distant metastases, and lymph node invasion, suggesting that it may be used to predict tumor progression and disease prognosis (37-39).
IDH2, isocitrate dehydrogenase 2, catalyzes the oxidative decarboxylation of isocitrate to α-ketoglutarate (α-KG). IDH2 is the most commonly mutated metabolic gene in cancer, and its mutation disrupts metabolic and epigenetic regulation, promoting tumorigenesis in humans (40). Interestingly, IDH2 frequently shows overexpression rather than mutation in bladder, breast, and lung cancers. According to the findings of Li et al., up-regulated wild-type IDH2 promotes proliferation and tumor formation in lung cancer cells and is linked to a lower overall survival rate (41). IDH2 has been related to the recurrence of DCIS and progression to invasive disease, and is expressed differently in recurrent and non-recurrent DCIS. Furthermore, high wild-type IDH2 expression was linked to a poor patient outcome in DCIS (42-44).
According to microarray and TCGA dataset studies, the expression of MELK, maternal embryonic leucine zipper kinase, is higher in many cancer cells and tissues than in their normal counterparts (45-47). MELK expression levels are also linked to high-grade tumors, increased aggressiveness, and poor patient outcomes (48-50). MELK has been recognized as an efficient therapeutic target and prognostic factor in the treatment of cancer.
It is important to look for promising biomarkers for DCIS diagnosis and prognosis prediction. When studying genes as DCIS biomarkers, a crucial point is to identify a panel of deregulated genes, rather than individual genes, so as to increase the sensitivity and specificity of the biomarker. According to our findings, the six core genes may be useful prognostic and diagnostic biomarkers for DCIS. However, more research into the expression and prognostic function of the six genes at the protein level is required, and the underlying mechanisms of the six genes must be clarified by functional experiments.

Conclusions
The six core genes signature was validated as a promising set of DCIS biomarkers in our research, which may assist in clinical decision-making for individual care.
Declarations
analysis. Xuchen Cao and Zhenzhen Liu revised the manuscript. All authors approved the final manuscript.

Funding
None.

Availability of data and materials
This published article contains all of the data analyzed or produced during this research.
Ethics approval and consent to participate
None.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Data are shown as mean ± SD of at least three independent experiments. Significance was determined by Student's t-test. **, p < 0.01; ***, p < 0.001.

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementaryTablesandFigures.docx