In this study, GSE54140 (15,612 genes) composed of gene expression data from 63 samples including HER2-positive BC, Luminal-A BC and Luminal-B BC was selected and conducted by the R software function, and then microarray data were standardized. Using platform file GPL10152, the probe name was converted into the gene name, finally, the microarray data with row name as sample name and column name as gene name and the top 25% most variant genes by analysis of variance (11,709 genes) were selected for the construction of co-expression network.
Co-expression network construction and key modules identification
The co-expression analysis was carried out to construct the co-expression network. Select the appropriate weighting coefficient β to ensure a scale-free network, in this study, the power of β value of 3 was selected to construct a co-expression network, as shown in Figure 1. We then calculated the TOM for each gene pair, 60 modules were displayed by hierarchical clustering according to degree of TOM's difference as shown in Figure 2. Each module contained a group of coordinately expressed genes with high TOM, and was potentially involved in shared biological processes. To distinguish the modules individually, each module was assigned a unique color. 400 genes were randomly selected to make TOM heat map, as shown in Figure 3, which shows the high degree of independence between modules and the relative independence of gene expression in each module. The association analysis between tumor characteristics and co-expression modules was carried out to identify the correlation between MEs and tumor characteristics. As shown in Figure 4, Module-trait relationships heat map, the eigengene of the skyblue3 module (93 genes) had significant correlation with HER2+ BC (cor=0.74, P=3e-12). By calculating the correlation coefficient of GS and MM in skyblue3 module, and cluster analysis of traits and modules, the significant correlation between skyblue3 module and HER2+ BC was determined, and the credibility of the results was verified, as shown in Figure 5 and 6. Hence, genes in the skyblue3 module were selected for further analysis.
Functional enrichment analysis of genes in modules
All the name of genes in the skyblue3 module was submitted to DAVID bioinformatics tool. For the BP, the genes were mainly enriched in nuclear-transcribed mRNA catabolic process, response to organophosphorus, positive regulation of hepatocyte proliferation, regulation of microtubule-based process, response to drug, tetrahydrofolate metabolic process, regulation of phosphatidylinositol 3-kinase signaling, cellular aldehyde metabolic process, lysine catabolic process, as shown in Table 1. The genes in the CC group were mainly enriched in cytosol, cytoplasm, ribosome, cytosolic large ribosomal subunit, membrane, neuronal cell, as shown in Table 2. For the MF, the genes were mainly enriched in oxidoreductase activity, protein binding, actin filament binding, aldehyde dehydrogenase [NAD(P)+] activity, large ribosomal subunit rRNA binding, 3-chloroallyl aldehyde dehydrogenase activity, as shown in Table 3. The KEGG pathway analysis was performed and showed that genes are mainly involved in Regulation of actin cytoskeleton (P > 0.05).
Hub genes identification and analysis
We found that the skyblue3 module had significant correlation with HER2+ BC, which suggests that hub genes may exists in skyblue3 module. Then we used Cytoscape software to visualize the network of the skyblue3 module and the intra-modular connectivity, which was calculated by WGCNA. Using MCOD function and correlation degree analysis, the MCODE score >59.5 were presented, and then the hub genes that degree > 74 were regarded as the hub gene represented by a red dot, as shown in Figure 7. The hub genes in the skyblue3 module included PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, STARD3, and NEUROD2. Moreover，based on The Human Protein Atlas database, the protein levels PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, and STARD3 gene in tumor tissues were significantly higher than those in normal tissues, as shown in Figure 8. Then, all the hub genes underwent survival analysis using cBioPortal datasets. As shown in Figure 9, PGAP3, PNMT, ERBB2, TCAP, and STARD3 were negatively associated with the overall survival.