Identification of overlapping DEGs in DCIS
According to the cut-off criteria of P < 0.01 and |logFC| > 1 for selecting DEGs, a total of 5586, 3042, 1757 DEGs were recognized as up-regulated, and 3532, 2917, 2409 DEGs were recognized as down-regulated from GSE7882, GSE21422 and GSE59246 microarray profile datasets (Fig. 1a-c), respectively. As shown in Fig. 1d and Fig. 1e, 110 up-regulated and 107 down-regulated genes overlapped across the three datasets. The names of the overlapping genes are shown in Table 1.
Functional enrichment analysis of overlapped DEGs in DCIS
GO enriched functions for the 217 overlapped DEGs were involved in various cellular components (CC), including cytoplasm, cell surface, nucleus, collagen type I trimer, and catalytic step 2 spliceosome for the up-regulated genes, and protein complex, nuclear speck, lamellipodium, and focal adhesion for the down-regulated genes (Fig. 2a and Table 2). Microtubule binding, procollagen-proline 4-dioxygenase activity, extracellular matrix structural constituent, ATP binding, and transcription corepressor activity were included in the up-regulated DEGs in terms of molecular function (MF), while transcriptional activator activity, transcription factor activity, and RNA polymerase II core promoter proximal region sequence-specific binding included in the down-regulated DEGs (Fig. 2b and Table 2). For the biological processes (BP) terms, the DEGs (up-regulated) were enriched for protein autophosphorylation, blood vessel development, RNA polymerase II promoter based negative transcription regulation, mitotic spindle assembly, and response to UV, while on the other hand, negative regulation in response to DNA damages of the inherent apoptotic signaling pathway, cellular response to calcium ion, renal tubule morphogenesis, microglial cell activation, and cell morphogenesis involved in neuron differentiation were all found to be down-regulated DEGs (Fig. 2c and Table 2).
Furthermore, the DEGs' signaling pathways were enriched. The up-regulated genes were reported to be increased association with BH3-only protein activation, NGF signaling, the C-MYB transcription factor network, the Intrinsic Pathway for Apoptosis, and ERK signaling. The genes that were down-regulated were mostly involved in TRIF-mediated TLR3 signaling, CDC42 activity regulation, MAPK targets/Nuclear events mediated by MAP kinases, Toll Receptor Cascades, and Integrin-linked kinase signaling (Fig. 2d and Table 3).
Module analysis and PPI network construction
The DEGs overlapping revealed a unique set of networks and interactions. The online database of STRING was used to filter 178 DEGs (92 up-regulated and 86 down-regulated genes) from the 217 usually altered DEGs belonging to the PPI network complex. A total of 39 DEGs were excluded from the PPI network. Furthermore, by using Cytoscape software analysis, 456 edges were identified in overlapping DEGs. The degree of the PPI network complex defaulter filter ranged from 1 to 64 (Fig. 3a).
Furthermore, the entire PPI network was analyzed through the Cytoscape's MCODE plug-in, the most significant module and sixteen nodes were identified using degree cut-off = 2, k-core = 2, node score cut-off =0.2, and maximum depth = 100 as the criterion (Fig. 3b). Afterward, as shown in Fig. 3c and Fig. 3d, the first 27 PPI network genes were chosen using CytoHubba plug-in and nodes degree were analyzed. Twelve core candidate genes were chosen after combining the results of MCODE, CytoHubba, and nodes degree, and all of them were up-regulated DEGs, in the following order: GAPDH, CDH2, COL1A2, HNRNPA2B1, POLR2H, COL1A1, IDH2, NEK2, BIRC5, TRA2B, HNRNPH1, and MELK. They may have a significant impact on the progression of DCIS and the prognosis.
Kaplan-Meier survival analysis
The Kaplan-Meier survival plot was used to evaluate the prognostic details of the twelve core candidate genes. The twelve genes, including GAPDH, CDH2, COL1A2, HNRNPA2B1, POLR2H, COL1A1, IDH2, NEK2, BIRC5, TRA2B, HNRNPH1 and MELK, were used to plot the survival curves by uploading them to database. High expression of GAPDH (P = 7.9e-07), CDH2 (P = 0.0016), BIRC5 (P = 1.6e-08), NEK2 (P = 1.3e-06), IDH2 (P = 0.0076) and MELK (P = 9.2e-11) were correlated with poor patient’s OS, while COL1A2 (P = 0.53), HNRNPA2B1 (P = 0.16), POLR2H (P = 0.051), COL1A1 (P = 0.28), TRA2B (P = 0.55) and HNRNPH1 (P = 0.46) expression were not relevant to OS (Fig. 4 and Fig. S1). Survival curves also showed that high expression of GAPDH (P = < 1e-16 and 2.9e-06), CDH2 (P = 0.011 and 0.0098), BIRC5 (P = < 1e-16 and 6e-10), NEK2 (P = < 1e-16 and 5.3e-10), IDH2 (P = 7.8e-14 and 0.00049) and MELK (P = < 1e-16 and 4.9e-14) were significantly associated with worse RFS and DMFS (Fig. 4).
Validation of prognostic effectiveness of the hub genes
Based on TCGA samples, the expression of the selected six hub genes with prognostic significance was further investigated. The findings revealed that hub gene expression was substantially increased in breast cancer tissues (Fig. S2). The six up-regulated hub genes were subjected to ROC research. These six hub genes' ROC curves all showed favorable prognostic values for DCIS. Moreover, the area under curve (AUC) of GAPDH, CDH2, BIRC5, NEK2, IDH2 and MELK were 0.8876, 0.7552, 0.7499, 0.8457, 0.7841 and 0.9664 (Fig. 5a-f), respectively.
Core gene signatures and prognostic analysis
DCIS has been linked to six core genes. Using Oncomine databases, researchers compared the transcriptional levels of core genes in pan-cancers to those in normal samples. In four to twenty-one datasets, the expression level of core genes mRNA was expressively up-regulated in breast cancer patients (Fig. S3). The increased expression of core genes in breast cancer samples has been illuminated, there were also differences in the up-regulated degree of core genes in diverse breast cancer molecular types. The expression of GAPDH, NEK2, BIRC5 and MELK was higher in triple-negative breast cancer, a subtype with the worst prognosis, CDH2, and IDH2 was higher in HER2 positive subtype. The up-regulated degree was higher in the subtype with a poor prognosis. Additionally, high expression of core genes also increases the risk of lymph node metastasis (Fig. 6a-f).
Downregulation of core genes expression inhibits the proliferation
T47D, MDA-MB-231, SK-BR-3, MCF-7, BT474, and BT549 are common breast cancer cells, MCF10A is the normal mammary epithelial cell. RT-qPCR was used to compare the core gene expressions in breast cancer and normal cell lines. The result shows that GAPDH, BIRC5, and MELK expressions were highest in MDA-MB-231, CDH2 was highly expressed in T47D, NEK2 was highly expressed in MCF-7 and IDH2 was highly expressed in SK-BR-3 (Fig. S4). Subsequently, transfection with siRNAs for the downregulation of core gene expression in the corresponding cells (Fig. S5). As shown in the figures, the cell proliferation of breast cancer was obviously inhibited when the expression of core genes was down-regulated through MTT (Fig. 7a-f), Colony formation (Fig. 8a-f), and EdU (Fig. 9a-f) assays. Data to support the use of a six-gene signature for DCIS diagnosis and prognosis prediction include GAPDH, CDH2, BIRC5, NEK2, IDH2, and MELK.