Identification of differentially expressed genes
We performed differential expression analysis comparing 255 luminal A breast cancer samples with 97 normal breast tissue samples. Using the criteria |logFC|>2.199 and P<0.05, we identified 1,114 differentially expressed genes (453 upregulated and 661 downregulated) among a total of 20,531 genes (Figure 1a). We likewise compared 87 basal-like breast cancer samples with the same 97 normal breast tissue samples. Using the criteria |logFC|>2.799 and P<0.05, we identified 1,042 differentially expressed genes (435 upregulated and 607 downregulated) among the 20,531 genes (Figure 1b).
Figure 1 Differentially expressed genes in luminal A and basal-like breast cancer versus normal breast tissue. a: Volcano plot of the differentially expressed genes identified by comparison of 255 luminal A breast cancer samples and 97 normal breast tissue samples. The 453 upregulated genes are shown in red (Up) and the 661 downregulated genes are shown in green (Down). b: Volcano plot of the differentially expressed genes identified by comparison of 87 basal-like breast cancer samples and 97 normal breast tissue samples. The 435 upregulated genes are shown in red (Up) and the 607 downregulated genes are shown in green (Down). Genes that were not differentially expressed are shown in black (Equal).
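The selection rule behind the volcano plots can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the `GeneStat` record, the `classify` helper, and the toy rows are hypothetical, and the cutoffs are simply the luminal A values quoted above.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log_fc: float
    p_value: float


def classify(stat: GeneStat, lfc_cutoff: float, p_cutoff: float = 0.05) -> str:
    """Label a gene 'Up', 'Down', or 'Equal' as in the volcano plot legend."""
    if stat.p_value < p_cutoff and abs(stat.log_fc) > lfc_cutoff:
        return "Up" if stat.log_fc > 0 else "Down"
    return "Equal"


# Toy rows for illustration only; these are not values from the study.
toy = [
    GeneStat("GENE_A", 3.10, 0.001),   # passes both cutoffs, positive logFC
    GeneStat("GENE_B", -2.50, 0.020),  # passes both cutoffs, negative logFC
    GeneStat("GENE_C", 2.10, 0.001),   # significant, but |logFC| too small
    GeneStat("GENE_D", -4.00, 0.200),  # large change, but not significant
]

# 2.199 is the luminal A cutoff; the basal-like comparison used 2.799.
labels = {s.gene: classify(s, lfc_cutoff=2.199) for s in toy}
print(labels)
```

Both conditions must hold for a gene to be counted as differentially expressed, which is why GENE_C and GENE_D fall into the "Equal" group.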
We compared the differentially expressed genes in the 2 breast cancer subtypes and found that the subtypes shared 500 differentially expressed genes, whereas 614 were unique to the luminal A breast cancer samples and 542 were unique to the basal-like breast cancer samples. Of the shared genes, 15 showed opposite expression patterns in the luminal A and basal-like breast cancer samples. These relationships are shown in Figure 2.
Figure 2 Expression patterns of the differentially expressed genes in luminal A and basal-like breast cancer. a: Luminal A breast cancer had 614 unique differentially expressed genes. b: Basal-like breast cancer had 542 unique differentially expressed genes. c: Of the differentially expressed genes shared by the subtypes, 15 had opposite expression patterns (updownoppo).
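The partition into unique, shared, and oppositely regulated genes amounts to a few set operations. The sketch below assumes each subtype's result is held as a mapping from gene symbol to direction; the function name `compare_subtypes` and the toy gene symbols are hypothetical.

```python
def compare_subtypes(lum_a: dict, basal: dict) -> dict:
    """Partition two DEG lists given as {gene: 'Up' or 'Down'} mappings."""
    shared = lum_a.keys() & basal.keys()
    return {
        "unique_lum_a": lum_a.keys() - basal.keys(),
        "unique_basal": basal.keys() - lum_a.keys(),
        "shared": shared,
        # Shared genes whose direction of change differs between subtypes.
        "opposite": {g for g in shared if lum_a[g] != basal[g]},
    }


# Toy input for illustration only; these are not genes from the study.
lum_a_degs = {"G1": "Up", "G2": "Down", "G3": "Up", "G4": "Down"}
basal_degs = {"G3": "Down", "G4": "Down", "G5": "Up"}

result = compare_subtypes(lum_a_degs, basal_degs)
print({name: sorted(genes) for name, genes in result.items()})
```

The reported counts are consistent with such a partition: 1,114 - 500 = 614 genes unique to luminal A and 1,042 - 500 = 542 genes unique to basal-like breast cancer.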
Function and pathway enrichment analyses
GO enrichment analysis (Figure 3) revealed that the 614 differentially expressed genes unique to luminal A breast cancer were mainly involved in biological processes such as antimicrobial humoral response, epidermis development, glial cell differentiation, and hormone metabolic process. In terms of cellular components, these genes were significantly associated with the sarcolemma, apical plasma membrane, ion channel complex, transmembrane transporter complex, and neuronal cell body. The significantly enriched molecular functions included cation channel activity, substrate-specific channel activity, metal ion transmembrane transporter activity, and passive transmembrane transporter activity. In addition, the significantly enriched KEGG pathways comprised the oxytocin signaling pathway, neuroactive ligand-receptor interaction, ovarian steroidogenesis, vascular smooth muscle contraction, dopaminergic synapse, Staphylococcus aureus infection, and the estrogen signaling pathway.
Figure 3 Enrichment analyses of the differentially expressed genes in luminal A breast cancer. a: GO enrichment analysis of biological processes. b: GO enrichment analysis of cellular components. c: GO enrichment analysis of molecular functions. d: KEGG pathway analysis. P.adjust denotes the adjusted P-value.
GO enrichment analysis (Figure 4) revealed that the 542 differentially expressed genes unique to basal-like breast cancer were mainly involved in biological processes such as organelle fission, nuclear division, nuclear chromosome segregation, sister chromatid segregation, mitotic nuclear division, chromosome segregation, and DNA-dependent DNA replication. In terms of cellular components, these genes were significantly associated with collagen-containing extracellular matrix, postsynaptic membrane, collagen trimer, and chromosomal and centromeric regions. The significantly enriched molecular functions included aromatase activity, RNA polymerase II-specific DNA-binding transcription activator activity, oxidoreductase activity, and G protein-coupled peptide receptor activity. In addition, the significantly enriched KEGG pathways were cell cycle, neuroactive ligand-receptor interaction, oocyte meiosis, melanoma, and Cushing syndrome.
Figure 4 Enrichment analyses of the differentially expressed genes in basal-like breast cancer. a: GO enrichment analysis of biological processes. b: GO enrichment analysis of cellular components. c: GO enrichment analysis of molecular functions. d: KEGG pathway analysis. P.adjust denotes the adjusted P-value.
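For readers unfamiliar with how an enrichment P-value and its adjusted counterpart (P.adjust) are typically obtained, the following sketch shows one common approach: a hypergeometric over-representation test followed by Benjamini-Hochberg correction. The text does not state which test or adjustment method was used, so both are assumptions here, and the function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn).

    M: background genes, n: background genes annotated to the term,
    N: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(p_values: list) -> list:
    """Return BH-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 10 background genes, 4 annotated to a term, 3 query genes,
# 2 of which carry the annotation.
p_term = hypergeom_sf(k=2, M=10, n=4, N=3)
p_adjust = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(p_term, p_adjust)
```

Because hundreds of GO terms and pathways are tested at once, ranking terms by the adjusted value rather than the raw P-value limits the share of false positives among the reported terms.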
PPI network construction
Next, we examined the functional modules in the PPI networks of the differentially expressed genes unique to luminal A and basal-like breast cancer to identify the key genes for each subtype. We used the MCODE plugin for Cytoscape to identify functional modules in the PPI network of the differentially expressed genes unique to luminal A breast cancer and selected modules with scores >5. The module shown in Figure 5 has a score of 6.182 and contains 12 nodes and 24 edges.
Figure 5 Module in the PPI network of differentially expressed genes in the luminal A breast cancer subtype.
We similarly constructed functional modules in the PPI network of the differentially expressed genes unique to basal-like breast cancer. Module 1 has a score of 25.812 and contains 33 nodes and 413 edges; module 2 has a score of 5.818 and contains 12 nodes and 32 edges (Figure 6).
Figure 6 Modules in the PPI network of differentially expressed genes in the basal-like breast cancer subtype. a: PPI network module 1. b: PPI network module 2.
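To make the module scores easier to interpret, the sketch below computes a score from node and edge counts, assuming the commonly described MCODE definition (graph density multiplied by the number of nodes, with self-loops excluded). The text does not give the MCODE settings that were used, so this formula is an assumption and the function name `module_score` is hypothetical.

```python
def module_score(nodes: int, edges: int) -> float:
    """Density times node count for an undirected module without self-loops."""
    if nodes < 2:
        raise ValueError("a module needs at least 2 nodes")
    possible_edges = nodes * (nodes - 1) / 2
    density = edges / possible_edges
    return density * nodes


# Node and edge counts reported for the two basal-like modules.
module_1 = module_score(nodes=33, edges=413)
module_2 = module_score(nodes=12, edges=32)
print(round(module_1, 4), round(module_2, 4))
```

Under this assumption the two basal-like modules come out at about 25.81 and 5.82, in line with the reported 25.812 and 5.818. The same formula applied to the luminal A module (12 nodes, 24 edges) gives about 4.36 rather than the reported 6.182, so that module may have been scored with different settings, or one of its reported values may merit a check.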
We used the cytoHubba Cytoscape plugin (settings: Hubba_nodes=8, Ranking Method=“DMNC”) to screen for 8 key genes in each set of differentially expressed genes unique to luminal A breast cancer and basal-like breast cancer (Figure 7). The key genes identified for luminal A breast cancer were GRM4, GRM8, KRT18, NMUR1, MUC1, CX3CL1, GATA3, and NCAM1. Of these, GRM4, NMUR1, and GRM8 were enriched in the neuroactive ligand-receptor interaction pathway (P<0.05). The key genes identified for basal-like breast cancer were CENPI, CENPK, CDC7, CCNE2, KIF18A, STIL, CDCA7, and CKS2. Of these, CCNE2 and CKS2 were enriched in the small cell lung cancer pathway, and CDC7 and CCNE2 were enriched in the cell cycle pathway (P<0.05). The key genes of each subtype were primarily located in module 1 of the corresponding PPI network.
Figure 7 Key subtype-specific genes. a: Key genes for luminal A breast cancer. b: Key genes for basal-like breast cancer.
Analysis of prognostic value
We generated ROC curves for the 2 sets of key genes (Figure 8). The area under the ROC curve was greater than 90% for every gene, indicating that these genes had good prognostic value for their associated cancer subtypes.
Figure 8 ROC curves of the key genes. a: ROC curves of the key genes for luminal A breast cancer (GRM4, GRM8, KRT18, NMUR1, MUC1, CX3CL1, GATA3, and NCAM1). b: ROC curves of the key genes for basal-like breast cancer (CENPI, CENPK, CDC7, CCNE2, KIF18A, STIL, CDCA7, and CKS2).
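The area under a ROC curve can be read as the probability that a randomly chosen positive sample has a higher value than a randomly chosen negative sample. The sketch below computes it that way for a single gene. It assumes, for illustration only, that expression is used to separate tumor from normal samples, which the text does not state; the function name and toy values are hypothetical.

```python
def roc_auc(scores: list, labels: list) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy expression values for one gene; these are not data from the study.
expression = [5.1, 6.3, 4.8, 7.0, 2.0, 3.1, 5.0]
is_tumor = [1, 1, 1, 1, 0, 0, 0]

auc = roc_auc(expression, is_tumor)
print(auc)
```

In this toy example 11 of the 12 tumor-normal pairs are ordered correctly, giving an area of about 92%. For a gene that is downregulated in the positive class, the scores would be negated before the calculation so that a larger value again indicates the positive class.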
The prognostic value of the key genes unique to luminal A breast cancer was analyzed using the PROGgeneV2 online tool [19]. We retrieved survival data from the TCGA database for patients with the corresponding breast cancer subtype and analyzed survival according to the expression levels of the key genes (Figure 9). Of the key genes unique to the luminal A breast cancer subtype, only the expression levels of NMUR1 and NCAM1 were associated with patient survival time (P<0.05). Higher expression levels of these 2 genes were associated with shorter survival time in patients with luminal A breast cancer.
Figure 9 Survival of patients with luminal A breast cancer by expression of key genes. a: NMUR1. b: NCAM1. Survival and gene expression data were retrieved from TCGA [16]. The cohort was divided at the median gene expression. P<0.05 for each gene shown.
Next, we used the same methodology to analyze the prognostic value of the key genes unique to basal-like breast cancer (Figure 10). Of these key genes, only the expression levels of CDC7, KIF18A, STIL, and CKS2 were associated with patient survival time (P<0.05). Lower-than-median expression of these 4 genes was associated with better prognosis.
Figure 10 Survival of patients with basal-like breast cancer by expression of key genes. a: CDC7. b: KIF18A. c: STIL. d: CKS2. Survival and gene expression data were retrieved from TCGA [16]. The cohort was divided at the median gene expression. P<0.05 for each gene shown.
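The median split described in the captions, followed by a comparison of the two survival curves, can be sketched as follows. The log-rank test used here is a widely used choice for comparing two groups and is swapped in for illustration; it is not claimed to be the test that PROGgeneV2 applies. Function names and the toy cohort are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def split_at_median(expression: list) -> list:
    """True for samples above the median (high group), False otherwise."""
    cutoff = median(expression)
    return [x > cutoff for x in expression]


def logrank_test(times: list, events: list, high: list) -> tuple:
    """Two-group log-rank test; events: 1 = death observed, 0 = censored.

    Returns (chi-square statistic, P-value with 1 degree of freedom).
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                       # at risk
        n1 = sum(1 for ti, h in zip(times, high) if h and ti >= t)  # high group
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, h in zip(times, events, high)
                 if e and h and ti == t)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups cannot be compared (zero variance)")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Toy cohort for illustration only; these are not data from the study.
times = [5, 8, 12, 20, 25, 30, 34, 40]   # follow-up time
events = [1, 1, 1, 0, 1, 0, 1, 0]        # 1 = death, 0 = censored
expression = [9.1, 8.7, 8.9, 4.0, 7.5, 3.2, 4.4, 2.9]

high = split_at_median(expression)
chi2, p_value = logrank_test(times, events, high)
print(chi2, p_value)
```

In this toy cohort the deaths occur earlier in the high-expression group and the test gives P<0.05, which mirrors the direction of the associations reported for the prognosis-related key genes.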