Identification of Potential Genes in Her2+ Breast Cancer by Co-expression Network Analysis

Breast cancer (BC) is one of the most common malignancies in women all over the world. For patients with human epidermal growth factor receptor 2 positive (Her2+) BC, although the widespread use of anti-Her2 therapy has make for survival increased, improved survival has led to an increased demand for better management and diminish toxic side effects of anticancer treatments. This study aimed to identify the potential biomarkers associated Her2 + BC in order to improve the overall management of BC treatment. Our research downloaded GSE54140 gene expression datasets from the Gene Expression Omnibus data sets, and used weighted gene co-expression network analysis (WGCNA) to developed a scale-free gene co-expression network to explore the associations between gene sets and clinical features. A total of 60 modules were analyzed, and found that the skyblue3 module was significantly related to Her2 + BC. The function of 93 genes in the skyblue3 module was annotated by DAVID bioinformatics tool, and it was demonstrated that the function of the module was mainly related to nuclear-transcribed mRNA catabolic process, cytosol, and oxidoreductase activity. Based on the WGCNA and Cytoscape software analysis, 9 hub genes (PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, STARD3, and NEUROD2) were identified. The Human Protein Atlas database detected that the protein level of PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, and STARD3 in tumor tissues were different from normal tissues. And survival analysis shows that PGAP3, PNMT, ERBB2, TCAP, and STARD3 were negatively associated with the overall survival (P < 0.05).


Abstract Background
Breast cancer (BC) is one of the most common malignancies in women all over the world. For patients with human epidermal growth factor receptor 2 positive (Her2+) BC, although the widespread use of anti-Her2 therapy has make for survival increased, improved survival has led to an increased demand for better management and diminish toxic side effects of anticancer treatments. This study aimed to identify the potential biomarkers associated Her2 + BC in order to improve the overall management of BC treatment.

Results
Our research downloaded GSE54140 gene expression datasets from the Gene Expression Omnibus data sets, and used weighted gene co-expression network analysis (WGCNA) to developed a scalefree gene co-expression network to explore the associations between gene sets and clinical features.
A total of 60 modules were analyzed, and found that the skyblue3 module was significantly related to Her2 + BC. The function of 93 genes in the skyblue3 module was annotated by DAVID bioinformatics tool, and it was demonstrated that the function of the module was mainly related to nucleartranscribed mRNA catabolic process, cytosol, and oxidoreductase activity. Based on the WGCNA and Cytoscape software analysis, 9 hub genes (PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, STARD3, and NEUROD2) were identified. The Human Protein Atlas database detected that the protein level of PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, and STARD3 in tumor tissues were different from normal tissues. And survival analysis shows that PGAP3, PNMT, ERBB2, TCAP, and STARD3 were negatively associated with the overall survival (P < 0.05).

Conclusions
In total, 9 candidate biomarkers were identified by comprehensive analysis, among which, the coexpansion of PGAP3, CRKRS, STARD3 and NEUROD2 related to ERBB2 may be associated with the occurrence of Her2 + BC. In addition, PPP1R1B, CRKRS and TCAP are related to drug resistance and adverse reactions in the treatment of Her2 + BC.

Background
Breast cancer (BC) is the most common cancer of women, which severely intimidates women's health [1]. Nearly 1.3 million women worldwide are pronounced with BC every year, and more than 400,000 4 network comply a scale-free distribution, in this section, the power of β value of 3 was decided to develop a co-expression network, as shown in Figure 1. We then calculated the TOM of each gene pair, 60 modules were displayed by hierarchical clustering according to degree of TOM's difference as shown in Figure 2. Each module had a group of coordinately expressed genes with high TOM, meaning that the intrinsic genes may participate in the same biological process. In order to distinguish the various modules, each module was assigned a unique color. 400 genes were randomly selected to make a TOM heat map. As shown in Figure 3, the darker the color, the higher the TOM, which suggests the high degree of independence between modules and the relative independence of gene expression in each module. The association analysis between tumor characteristics and co-expression modules was carried out to identify the correlation between MEs and tumor characteristics. As shown in Figure 4, Module-trait relationships heat map, the eigengene of the skyblue3 module (93 genes) were undoubtedly correlated with Her2+ BC (cor=0.74, P=3e -12 ). By calculating the correlation coefficient of GS and MM in skyblue3 module, and performing cluster analysis on the traits and modules, the significant correlation between skyblue3 module and Her2+ BC was determined, and the credibility of the results was confirmed, as shown in Figure 5 and 6. Consequently, further analysis was performed in module skyblue3 gene.

Functional enrichment analysis of genes in modules
The names of all genes in the skyblue3 module were uploaded to DAVID bioinformatics tool. For the BP, the genes were mainly denriched in nuclear-transcribed mRNA catabolic process, response to organophosphorus, positive regulation of hepatocyte proliferation, regulation of microtubule-based process, et al, as shown in Table 1. The genes in the CC group were mainly concentrated in cytosol, cytoplasm, ribosome, cytosolic large ribosomal subunit, et al, as shown in Table 2. For the MF, the genes were mainly enriched in oxidoreductase activity, protein binding, actin filament binding, aldehyde dehydrogenase [NAD(P)+] activity, et al, as shown in Table 3.

Hub genes identification and analysis
Our research caught that the skyblue3 module had powerful correlation with Her2+ BC, which suggests that hub genes may exists in skyblue3 module. Then we used Cytoscape software to 5 visualize the network of the skyblue3 module which was connected into the module through WGCNA calculation. Using MCOD function and correlation degree analysis, the MCODE score > 59.5 was presented, with the hub genes that degree > 74 as the hub gene, represented by red dots, as shown in Figure 7. The hub genes in the skyblue3 module included PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, STARD3, and NEUROD2. Moreover based on The Human Protein Atlas database, the protein levels of PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, and STARD3 in tumor tissues were different from normal tissues, as shown in Figure 8. Then, all the hub genes underwent survival analysis by cBioPortal datasets. As shown in Figure 9, PGAP3, PNMT, ERBB2, TCAP, and STARD3 were negatively associated with the overall survival.

Discussion
BC is a life-threatening disease for females, especially Her2 + BC with rapid tumor progression, easy recurrence and metastasis and poor prognosis. Despite of the great progress in Her2 + BC treatment options in the past decades, much more remains to be done, more cancer-driving genes need to be identified. Therefore it is critical to find more of the potential genes involved in the development and progression of Her2 + BC.
In this work, we used WGCNA to dynamically study genes co-expression in Her2 + BC, Luminal-A BC, and Luminal-B BC, and to explore the related modules and hub genes. We concluded that the skyblue3 module had the highest correlation with Her2 + BC. Skyblue3 module related to Her2 + BC has well defined functions including nuclear-transcribed mRNA catabolic process, cytosol, and oxidoreductase activity. In the genes of the skyblue3 module, PGAP3, PPP1R1B, PNMT, ERBB2, CISD3, CRKRS, TCAP, STARD3, and NEUROD2 were regarded as the hub genes.
ERBB2, generally named Her2, is amplified or overexpressed in 20-30% of invasive breast carcinomas [7]. Moreover, amplification and overexpression of Her2 gene has also been detected in ovarian and gastric cancer [8][9][10]. Her2 integrates firmly to other epidermal growth factor receptor family members to assemble a heterodimer, which reinforce kinase-mediated activation of downstream signaling pathways, such as PI3K/AKT activation and RET signaling, so as to facilitate the increment and metastasis of cancer cells [11]. As a cell surface related protein, the most critical treatment for Her2 -positive cancer is anti-Her2 strategy, but the subsequent Her2 resistance is a new clinical puzzler [12]. Now, an increasing number of teams have found that many oncogenes co-amplify with the Her2 gene.
PGAP3, CRKRS, STARD3 and NEUROD2 were co-amplified with the Her2 gene. PGAP3, glycosylated phosphatidylinositol (GPI)-specific phospholipase members, is principally involved in the protein distribution and pack [13]. This group confirmed that concurrent amplification of copy number variation at PGAP3 and Her2 loci were detected in BC tissues [14]. Since Her2 gene is significantly associated with the occurrence and invasion of BC, the amplification of PGAP3 gene may affect the effect of Her2 gene on BC. CRKRS, cyclin-dependent kinase 12, also known as CDK12, which phosphorylates RNA polymerase II and modulates transcription factors [15]. Recent studies confirmed that genomic stability needs to be maintained by Cdk12 regulating DNA repair genes by suppressing intron polygadenylation [16][17][18]. This group study found that CRKRS could induce drug resistance in breast cancer endocrine therapy, which was related to its silent activation of mitogen-activated protein kinase signaling pathway, leading to the loss of estrogen receptor dependence [19]. In breast cancers, CDK12, as a carcinogene, is constantly co-amplified with the ERBB2 oncogene, which suggests that the tumor is prone to invasion and metastasis and poor prognosis [20,21]. This group study found that CDK12 can not only regulate RNA processing and DNA repair, but also the overexpression of CDK12 can enhancement the oncogenicity of BC cells [22]. STARD3-related protein located on the cell membrane of late endosomes may be involved in fostering the carriage or assignment of cholesterol and sphingolipid to the intracellular membrane compartments and the catabolism of steroid hormones [23,24]. Recent publication confirmed that the expression level of STARD3 protein is forcefully associated with Her2 amplification, and the high expression of STARD3 may boost BC invasion by increasing membrane cholesterol and intensifying oncogene signal [25].
NEUROD2-related protein is part of the family of neurogenic basic helix-loop-helix proteins, which affect the adjust and control of glutamate and GABA genes [26]. Miangela's research discovered that the copy number of Her2 and NEUROD2 increased in male BC tissue, and merely NEUROD2 amplification appear to have an independent prognostic effect [27].
PPP1R1B and TCAP are related to anti-Her2 resistance and treatment. PPP1R1B encodes multiple transcripts and two experimentally-documented proteins Darpp-32 and t-Darpp [28]. Recent publications suggested that the protein encoded by PPP1R1B was overexpressed in BC, esophagus cancer, lung cancer, et al [29,30]. Professional team proved that the resistance of trastuzumab, a key drug in anti-Her2 therapy, was related to the regulation of IGF-1R and AKT signal pathways by t-Darpp [31,32]. The protein encoded by TCAP gene, widely distributed in cardiomyocytes, is mainly involved in cardiac conduction and striated muscle contraction [33]. Although doxorubicin and anti-Her2

Conclusion
In summary, our study used a systems biology based WGCNA approach to determine the module highly related to Her2 + BC, whose function was mainly concentrated in nuclear-transcribed mRNA catabolic process, cytosol, and oxidoreductase activity. Although most of the hub genes accentuated in this study have been reported in previous reports, there has been no comprehensive and all-round analysis. Undoubtedly, ERBB2 plays an important role in the occurrence and development of BC. In addition, we found that PGAP3, CRKRS, STARD3 and NEUROD2 were associated with co-amplification of ERBB2, PPP1R1B may mediate anti-ERBB2 drug resistance by activating IGF-1R and AKT signal pathway, and TCAP may be related to cardiomyopathy caused by doxorubicin or Trastuzumab. As 8 endocrine therapy is the most important treatment for hormone receptor positive breast cancer. It has been found that CRKRS mediates endocrine drug resistance by affecting mitogen-activated protein kinase signal pathway. Hub genes such as PGAP3 and STARD3 were highly correlated with Her2 + BC development. However, PPP1R1B, CRKRS, and TCAP may be brought new insights in BC study in treatment target. Further investigation about these genes is warranted.

Materials And Methods
Microarray data and data pre-processing www.rstudio.com) backed by R language platform (3.6.2 version; www.r-project.org) and relevant software packages were developed to arithmetic data. Before WGCNA, this study pre-processed the microarray Data: the probe name was transformed to the gene name based on the platform information, and the microarray data were standardized through the R function. After standardization and probe switch, the dataset accommodating 15,612 genes was further processed, and the top 25% of genes with the highest degree of variation (11,709 genes) were screened by analysis of variance for co-expression network analysis.

Co-expression network construction
After pre-processing, the selected expression data profiles were developed to a gene co-expression network applying WGCNA package in R (www.cran.r-project. org/web/packages/WGCNA /index.html).
The gradient method was employed to weigh the connection strength between each pair of nodes and assess the independence and average degree connectivity of the assorted modules with different power values (range [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. In this work, the value whose independence degree was greater than 0.8 for the first time was seems as the appropriate power value (β) to ensure a scale-free network. The hierarchical clustering dendrograms were developed utilizing correlation coefficients between genes, and genes with analogous expression profiles were divided into the same gene module. In the cluster dendrograms, different branches mean different gene modules, and different colors refer to different modules. Module indicates a gene set with high topological overlap. Subsequently, the soft threshold power was applied to translate the adjacency matrix into topological overlap measure (TOM), and 400 genes were randomly distributed to make a TOM heat map to validate the high independence between modules and the relative independence of gene expression in each module.

Module and clinical trait association analysis
Module Eigengenes (MEs) were described as the first principal component of each gene module, and the expression of MEs was considered as a representative of all genes in a given module. The

Enrichment analysis of module
The function enrichment analysis of modules was developed by Gene Ontology (GO) pathway enrichment analysis employing the DAVID bioinformatics tool (version 6.8; www.david.ncifcrf.gov/) to explore the biological function. Go term enrichment analysis refer to biological process (BP), cellular component (CC), and molecular function (MF). P < 0.05 as the threshold was considered statistically significant.

Hub genes identification and analysis
The corresponding heat map was procured by assaying the correlation between each module and the clinical traits of the sample, and the module with the highest correlation was imported into Cytoscape software (version 3.7.2; www.cytoscape.org/) for analysis and visualization. MCOD, a plug in Cytoscape, was applied to screen the module. Module with highest MCODE score (53.9) were presented, and then the hub genes that degree > 74 were identified by analysis. The Human Protein Atlas database (www.proteinatlas.org) was employed to identify the protein expression of hub genes by comparing normal tissue and cancer tissue. Subsequently, to validate the hub genes, we took the Breast Cancer (METABRIC, Nature 2012 & Nat Commun 2016) from cBioPortal database (www.cbioportal.org/), consisting of 2,509 breast cancers. Upload hub gene name to survival analysis of cBioPortal to determine the importance of genes in biological processes.

Statistical method
This work ran R language software and WGCNA for statistical calculation. WGCNA was used to structure free-scale gene co-expression networks to determine the communication between genes, thereby enabling the identification of modules (clusters) of highly correlated genes, and the hub gene in each module. P < 0.05 was considered statistically significant. Hypergeometric test was used for enrichment analysis, and Kaplan-Meier statistical method was used for survival analysis.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
Not applicable.

Consent for publication
All the authors have consented for the publication.    Hierarchical cluster dendrograms of all expressed genes based on topological overlap.
19 Figure 3 TOM heatmap between genes. The heat map describes the topological overlap matrix between 400 random genes. A darker red color indicates higher overlap.
20 Figure 4 Heatmap of the correlation between module eigengenes and clinical traits of breast cancer.
Each cell contains the correlation and p value of the corresponding module and character.
21 Figure 5 Scatterplot of gene significance of HER2+ BC versus intramodular module membership in the skyblue3 module.
22 Figure 6 Eigengene dendrogram representing the relationships between modules and HER2+ BC traits.
23 Figure 7 Visualization of weighted gene co-expression network analysis (WGCNA) network connections of the intramodular hub genes. The hub genes represented by red dots have the highest degree of association with other genes in the module.