Quality control of microarrays
Quality control was performed on the three microarray data sets. The overall chip quality of GSE7904 was good, with little RNA degradation (Figure S1). The overall chip quality of GSE10797 was moderate, with little RNA degradation (Figure S2). The chip quality of GSE103512 was slightly worse, with obvious RNA degradation (Figure S3). Based on these quality control analyses, 4 chips from GSE10797 and 14 chips from GSE103512 were excluded from the subsequent analyses (see Supplementary Information).
Normalization of the three data sets showed that the expression of most detected genes was consistent between the experimental and control groups, and that consistency was stronger between replicate experiments. Relative logarithmic expression (RLE) box plots reflected these trends: the overall trend of the fitting curves was consistent, indicating that chip preprocessing was effective (Figure 1). The normalized data sets were therefore suitable for the analysis of differentially expressed genes (DEGs).
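As an illustration of what an RLE box plot summarizes, the following minimal sketch computes RLE values as each gene's log expression minus that gene's median across all chips. It is not the pipeline used in this study; the function names and the toy matrix are hypothetical, and the input is assumed to be a genes-by-samples matrix of log2 expression values.

```python
import numpy as np

def relative_log_expression(log_expr):
    """RLE for a genes x samples matrix of log2 expression (assumed layout)."""
    log_expr = np.asarray(log_expr, dtype=float)
    # Median of each gene (row) across all samples (columns).
    gene_medians = np.median(log_expr, axis=1, keepdims=True)
    return log_expr - gene_medians

def rle_summary(log_expr):
    """Per-sample median and interquartile range of the RLE values."""
    rle = relative_log_expression(log_expr)
    medians = np.median(rle, axis=0)
    q75, q25 = np.percentile(rle, [75, 25], axis=0)
    return medians, q75 - q25

# Invented toy data: 3 genes x 4 chips, with the last chip shifted upward.
toy = np.array([
    [8.0, 8.1, 7.9, 9.5],
    [5.0, 5.2, 4.8, 6.4],
    [10.0, 10.1, 9.9, 11.6],
])
medians, iqrs = rle_summary(toy)
# Index of the chip whose RLE median deviates most from zero.
suspect_chip = int(np.argmax(np.abs(medians)))
```

A well-behaved chip has an RLE box centered near zero with a narrow spread; a chip whose median sits far from zero, like the last column of the toy matrix, would be a candidate for exclusion.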
Significant Expression Changes of mRNAs in Breast Tumor
To determine the mRNA levels of genes in breast tumors, we evaluated the mRNA data in GSE7904, GSE10797, and GSE103512 using a case-control design (Figure 2A, B, C). Compared with the normal group, 338 genes were upregulated and 843 genes were downregulated in GSE7904, 4 genes were upregulated and 80 genes were downregulated in GSE10797, and 48 genes were upregulated and 142 genes were downregulated in GSE103512 (Figure 3A, B, C). The top ten DEGs ranked by fold change are listed in Table 1. Furthermore, the levels of MYH11, IGFBP6, WLS, ANXA1, FOSB, KIT, and FOS differed between tumor and normal tissues in all three data sets, indicating their potential involvement in the progression of breast cancer (Figure 2D). We selected these 7 genes, whose differential mRNA expression was conserved among the three data sets, for further analysis. The cBioPortal tools were used to examine the 7 genes in the BRCA database (Figure 3D). The results confirmed that the 7 genes were present in BRCA, with MYH11 being the most significant.
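The selection of genes that are differentially expressed in every data set can be sketched as a set intersection. The example below is a minimal illustration only: the cutoffs (`lfc_cutoff`, `p_cutoff`), the function names, and the placeholder gene names and values are all hypothetical and are not taken from this study.

```python
def classify_degs(log2_fc, adj_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and downregulated sets (cutoffs are assumptions)."""
    up, down = set(), set()
    for gene, lfc in log2_fc.items():
        if adj_p[gene] >= p_cutoff:
            continue  # not significant at the chosen threshold
        if lfc >= lfc_cutoff:
            up.add(gene)
        elif lfc <= -lfc_cutoff:
            down.add(gene)
    return up, down

def shared_degs(deg_sets):
    """Genes that appear in every DEG set."""
    return set.intersection(*deg_sets)

# Invented toy results for three hypothetical data sets.
datasets = [
    ({"GENE_A": -2.1, "GENE_B": 1.5, "GENE_C": 0.2},
     {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.5}),
    ({"GENE_A": -1.4, "GENE_B": 0.3, "GENE_C": -1.8},
     {"GENE_A": 0.02, "GENE_B": 0.4, "GENE_C": 0.03}),
    ({"GENE_A": -1.1, "GENE_B": 1.2, "GENE_C": -0.1},
     {"GENE_A": 0.04, "GENE_B": 0.2, "GENE_C": 0.9}),
]

per_dataset = []
for log2_fc, adj_p in datasets:
    up, down = classify_degs(log2_fc, adj_p)
    per_dataset.append(up | down)  # all DEGs regardless of direction

common = shared_degs(per_dataset)
```

Intersecting the combined up and down sets keeps a gene even if its direction differs between data sets; intersecting the up sets and down sets separately would be the stricter choice.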
Enrichment analysis of DEGs and its visualization
GO and KEGG pathway analyses were performed to elucidate the biological functions of the DEGs from the three data sets (Figure 4). In GSE7904, the significantly enriched GO terms in the biological process domain were: chromosome segregation; organelle fission; nuclear division; nuclear chromosome segregation; sister chromatid segregation; mitotic nuclear division; mitotic sister chromatid segregation; regulation of mitotic nuclear division; regulation of chromosome separation; and regulation of sister chromatid segregation. The most prominent pathways were: complement and coagulation cascades; ECM-receptor interaction; focal adhesion; pathways in cancer; and cell cycle. In GSE10797, the significantly enriched GO terms in the biological process domain were: gland development; leukocyte migration; response to radiation; response to steroid hormone; cognition; learning or memory; digestive system development; digestive tract development; cell-substrate junction assembly; and response to progesterone. The pathways were: pathways in cancer; focal adhesion; hematopoietic cell lineage; protein digestion and absorption; and ECM-receptor interaction. In GSE103512, the enriched GO terms were: extracellular structure organization; extracellular matrix organization; reactive oxygen species metabolic process; response to reactive oxygen species; cellular response to reactive oxygen species; glomerulus development; regulation of nitric oxide biosynthetic process; renal system vasculature development; glomerulus vasculature development; and metanephric glomerulus development. The pathways were: PPAR signaling pathway; tyrosine metabolism; arachidonic acid metabolism; complement and coagulation cascades; and cytokine-cytokine receptor interaction.
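GO and KEGG over-representation analysis is commonly based on a hypergeometric (one-sided Fisher) test. The text does not state which tool or test was used here, so the following is only a sketch of that general idea. The function name and the toy numbers are hypothetical, and no multiple-testing correction is shown.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background (assumed universe)
    K: background genes annotated to the term
    n: number of DEGs
    k: DEGs annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated DEGs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Invented toy example: 10 background genes, 4 in the term,
# 3 DEGs of which 2 fall in the term.
p_value = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

In practice the p-values of all tested terms would be adjusted for multiple testing before terms are reported as significantly enriched.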
DEG alterations may play an important role in BRCA patient survival
Survival analysis is widely used in clinical and epidemiological research. In this study, survival analysis was performed to assess the association of the 7 genes with patient survival (Figure 5). High expression of FOS, FOSB, IGFBP6, KIT, MYH11, and WLS was associated with better overall survival (OS) in BRCA patients (HR>0.6, log-rank P<0.0001), whereas high expression of ANXA1 was associated with worse overall survival (HR = 1.2 [1.06-1.35], log-rank P = 0.0044; Figure 5). To determine the correlation between the DEGs and BRCA, the expression levels of the 7 genes were evaluated with GEPIA. The results showed that the expression of all 7 genes was downregulated in BRCA (P<0.05, Figure 6).
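Overall survival curves of this kind are typically drawn with the Kaplan-Meier estimator, one curve per expression group. The sketch below shows that estimator on invented data. It is an assumption that such curves underlie the reported results, the function name is hypothetical, and the code does not compute the hazard ratio or the log-rank test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient
    events: True if death was observed, False if the patient was censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve

# Invented toy cohort: four patients, the second one censored.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Comparing a high-expression and a low-expression group would mean computing one such curve per group and then testing the difference, for example with a log-rank test.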
Gene-drug associations for the DEGs reveal the 7 genes as potential targets for BRCA treatment
To evaluate whether these DEGs could be targeted for BRCA therapy, a web resource integrating the PhosphoSite, KEGG Drugs, PID, HumanCyc, Reactome, PANTHER, and DrugBank databases was used to analyze the associations between these genes and drugs. The results showed that 22 FDA-approved drugs target ANXA1, 11 FDA-approved drugs and 10 non-FDA-approved drugs target KIT, 2 FDA-approved drugs target MYH11, and 1 FDA-approved drug, nadroparin, targets FOS (Figure 7). However, no drugs targeting IGFBP6, FOSB, or WLS have been reported to date.