DOI: https://doi.org/10.21203/rs.3.rs-2188589/v1
Poor prognosis and a low survival rate have long hindered research on the mechanism and pathology of triple-negative breast cancer (TNBC). With the rapid development of sequencing technology, sequencing data for TNBC are becoming more widely available. This study attempted to reveal the potential biology of TNBC at both the transcriptome and single-cell sequencing levels. Changes in the differentially expressed genes (DEGs) of TNBC were identified at the transcriptome level using the dataset GSE62931, and changes in cell proportions were examined at the single-cell level using six samples from the dataset GSE161529. TNBC was thus characterized at both the transcriptome and single-cell levels. Compared with the non-TNBC group, 475 DEGs were obtained at the transcriptome level in the TNBC group. The DEGs were mainly enriched in microtubule binding, chromosome segregation, and response to xenobiotic stimulus, as well as in pathways in cancer, tyrosine metabolism, and mucin type O-glycan biosynthesis. A highly correlated sub-module was further identified by screening the TNBC-related DEGs. Compared with the non-TNBC group, the proportions of natural killer T cells, luminal epithelial cells, B cells, and basal cells in the TNBC group were significantly decreased at the single-cell level, whereas the proportions of T cells, monocytes, and neural progenitor cells were significantly increased. The transcriptome results could be combined with the single-cell sequencing results through the submodule, on the basis of which we studied the key genes related to the prognosis of TNBC patients, including RRM2, TPX2, CENPF, and TOP2A. We found that the expression of these key genes in individual cell types differed from their expression at the overall cellular level. To conclude, the expression of RRM2, TPX2, CENPF, and TOP2A is heterogeneous at the cellular level and inconsistent with their expression in TNBC at the overall cellular level.
Therefore, it is necessary to combine the changes at the gene level and the cellular level for research.
Triple-negative breast cancer (TNBC) is a specific subtype of breast cancer that is negative for estrogen receptor, progesterone receptor, and human epidermal growth factor receptor-2 (HER2) in immunohistochemical tests [1]. Poor prognosis and a low survival rate are two major problems that impede the understanding of TNBC progression and pathogenesis [2]. As high-throughput sequencing technology develops rapidly, large amounts of sequencing data related to TNBC have been generated. A current challenge is how to use these sequencing data to better reveal the biological significance of TNBC.
Bioinformatics algorithms have been widely used to explore the molecular mechanism of TNBC. By identifying key markers and pathways, we can better understand the occurrence and development of TNBC at the transcriptome level and predict its pathogenesis [3–7]. However, gene-level studies of this kind cannot resolve TNBC at the cellular level. Single-cell sequencing technology can therefore help reveal the development of TNBC at the cellular level and explore the heterogeneity between tumors and the alteration of the tumor microenvironment during the progression of TNBC [8–10].
In this study, several samples from the dataset GSE62931 at the transcriptome level and the dataset GSE161529 at the single-cell level were collected and used to explore differentially expressed genes (DEGs) in TNBC and their expression changes at these two levels based on bioinformatics algorithms. This study attempted to uncover the potential biology of TNBC, thereby providing a theoretical basis for the prognosis, treatment, and precise prevention of TNBC.
In this study, the dataset GSE62931 was selected after filtering TNBC-related datasets in the GEO database. GSE62931 became available in the GEO database on December 9, 2019. It contains 100 samples, including 53 non-TNBC samples and 47 TNBC samples. Further details are available at: https://ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62931. The limma package was used to screen DEGs, and the screening criteria were |log2(Fold Change)| > 1 and adj.P value < 0.05.
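The screening rule above can be sketched in a few lines. The following is a minimal illustration in Python, not the R limma workflow used in this study. The gene names and statistics are hypothetical placeholders, and the Benjamini-Hochberg adjustment is an assumption made for the sketch, since the adjustment method is not stated here.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str        # hypothetical gene identifier
    log2_fc: float   # log2 fold change, TNBC vs. non-TNBC
    p_value: float   # raw (unadjusted) P value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes passing |log2FC| > cutoff and adj.P < cutoff into up/down."""
    adj = benjamini_hochberg([s.p_value for s in stats])
    up, down = [], []
    for s, q in zip(stats, adj):
        if q < p_cutoff and abs(s.log2_fc) > lfc_cutoff:
            (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Toy input: values are made up purely to exercise the filter.
toy_stats = [
    GeneStat("gene_a", 2.3, 0.001),
    GeneStat("gene_b", 0.4, 0.002),
    GeneStat("gene_c", -1.8, 0.010),
    GeneStat("gene_d", 1.5, 0.400),
]
up_genes, down_genes = screen_degs(toy_stats)
```

In this toy input, gene_b is excluded by the fold-change cutoff and gene_d by the adjusted P value, which shows that both criteria must hold at once.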
To further clarify the biological significance of DEGs in TNBC, GO and KEGG enrichment analyses were performed to determine the enriched functions and pathways of DEGs in TNBC. The GO analysis classified DEGs in TNBC into three categories: molecular function, cellular component, and biological process. The functional annotations of DEGs in TNBC were obtained from the DAVID website. Functions and pathways with a P value of < 0.05 were considered significant.
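The idea behind such an enrichment P value can be illustrated with a plain hypergeometric over-representation test. This is a simplified sketch under stated assumptions: DAVID applies its own modified Fisher exact test, so the function below is not the exact procedure used in this study, and the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_degs, n_overlap):
    """P(X >= n_overlap) under the hypergeometric null.

    n_background: genes in the background set
    n_annotated:  background genes annotated to the term
    n_degs:       DEGs submitted for enrichment
    n_overlap:    DEGs annotated to the term
    """
    upper = min(n_annotated, n_degs)
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_degs)


# Toy numbers, not taken from the study.
p = enrichment_p_value(n_background=20, n_annotated=5, n_degs=4, n_overlap=2)
```

A term would be reported as enriched when this probability falls below the chosen threshold (0.05 in the text above).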
After screening, the DEGs in TNBC were further examined for their associations with one another. Sub-modules composed of DEGs highly related to TNBC were then determined, as these may be significant for the progression of TNBC. The associations between DEGs were obtained from the STRING website and visualized using Cytoscape software. The key submodule was then determined with the MCODE plug-in, and its functions were displayed using the ClueGO plug-in.
In this study, the single-cell sequencing dataset GSE161529 was retrieved from the GEO database. The dataset became available in the GEO database on March 19, 2021. Six single-cell sequencing samples containing all cell types were selected, including three normal samples (GSM4909265, GSM4909266, and GSM49092683) and three TNBC samples (GSM4909281, GSM4909283, and GSM4909284). Further details are available at: https://ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529. The Seurat package was used for single-cell sequencing analysis of TNBC, and the main steps included data preprocessing, data integration, cell clustering, and cell annotation.
The expression of the key prognostic genes of TNBC, selected through sub-module screening and a review of the relevant literature, was examined at the single-cell level. Changes in the relative expression of these genes in cell subsets were compared between the TNBC and non-TNBC groups, showing their expression in each cell type in both states. Relative expression combines the number of cells in which a gene is detected with its expression level in those cells to summarize the expression of each gene in each cell type.
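One possible reading of this summary is sketched below: for a single gene, report per group and cell type both the fraction of cells in which the gene is detected and its mean expression. This is an assumption made for illustration, not the exact formula used in this study, and the function name and toy values are hypothetical.

```python
from collections import defaultdict


def summarize_expression(cells):
    """Summarize one gene per (group, cell type).

    cells: iterable of (group, cell_type, expression) tuples for a single gene.
    Returns {(group, cell_type): (fraction_expressing, mean_expression)}.
    """
    values = defaultdict(list)
    for group, cell_type, expression in cells:
        values[(group, cell_type)].append(expression)
    summary = {}
    for key, exprs in values.items():
        fraction = sum(1 for e in exprs if e > 0) / len(exprs)
        mean = sum(exprs) / len(exprs)
        summary[key] = (fraction, mean)
    return summary


# Toy cells for one hypothetical gene; the numbers are invented.
toy_cells = [
    ("TNBC", "B cell", 2.0),
    ("TNBC", "B cell", 0.0),
    ("TNBC", "B cell", 4.0),
    ("TNBC", "B cell", 2.0),
    ("non-TNBC", "B cell", 0.0),
    ("non-TNBC", "B cell", 1.0),
]
summary = summarize_expression(toy_cells)
```

Keeping the two numbers separate makes it possible to tell a gene expressed weakly in many cells from one expressed strongly in a few.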
A total of 475 DEGs related to TNBC, including 262 downregulated DEGs and 211 upregulated DEGs, were obtained after screening (Fig. 1A). The screening criteria were |log2(Fold Change)| > 1 and adj.P value < 0.05. The expression of these DEGs in the samples is shown in Fig. 1B.
GO functional and KEGG pathway enrichment analyses identified 171 GO terms and 28 KEGG pathways, respectively, at a P value of < 0.05. The top 15 functions and pathways are displayed in Fig. 2. In the GO functional annotation analysis, the top terms comprised five biological processes, eight cellular components, and two molecular functions, including microtubule binding, chromosome segregation, response to xenobiotic stimulus, extracellular region, and apical plasma membrane. In the KEGG pathway analysis, the DEGs were mainly enriched in pathways in cancer, tyrosine metabolism, mucin type O-glycan biosynthesis, the p53 signaling pathway, the Rap1 signaling pathway, and chemical carcinogenesis - receptor activation.
Associations between DEGs in TNBC were screened through the protein-protein interaction network. MCODE, a Cytoscape plug-in, was used to identify key sub-networks and genes. A submodule of 37 highly correlated DEGs was finally obtained (Fig. 3A). Ten key DEGs in the module were determined using the cytoHubba plug-in by ranking the genes on their maximal clique centrality: ASPM, RRM2, NCAPG, CDCA8, KIF2C, TPX2, KIF4A, CENPF, TOP2A, and CDK1 (Fig. 3A). The DEGs in the submodule were further functionally annotated using ClueGO and were mainly enriched in cell cycle, microtubule motor activity, histone kinase activity, and DNA replication origin binding (Fig. 3B).
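Maximal clique centrality (MCC) scores a node v as the sum of (|C| - 1)! over every maximal clique C that contains v, so nodes sitting in large, densely connected groups rank highest. The sketch below is a small pure-Python illustration of that definition, not the cytoHubba implementation. The node names and edges are hypothetical and do not come from the STRING network of this study.

```python
from math import factorial


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C that contain v."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        for node in clique:
            scores[node] += factorial(len(clique) - 1)
    return scores


# Toy network: a triangle (g1, g2, g3) plus one extra edge g3-g4.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4")]
scores = mcc_scores(toy_edges)
top_node = max(scores, key=scores.get)
```

In the toy network, g3 belongs to both the triangle and the extra edge, so it receives the highest score and would be ranked first.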
To reveal the potential biological significance of TNBC at the single-cell level, we analyzed single-cell sequencing data of TNBC. First, the Seurat package was used for sample preprocessing, and the six samples were integrated into one dataset. Second, the dataset was processed by dimensionality reduction, cell clustering, and cell annotation. Finally, a single-cell profile of TNBC composed of 12 cell types was obtained (Fig. 4A). After data integration, the cells in the two states overlapped closely, indicating good sample integration (Fig. 4B). We further compared the proportions of the 12 cell types between the TNBC and non-TNBC groups. Compared with the non-TNBC group, the proportions of natural killer (NK) T cells, luminal epithelial cells, B cells, and basal cells were significantly decreased in the TNBC group, while the proportions of T cells, monocytes, and neural progenitor cells were significantly increased (Fig. 4C). To support the cell annotation results, a heat map of the marker genes of each cell type is shown in Fig. 4D.
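The proportion comparison can be sketched as follows, assuming each cell already carries an annotated cell-type label. The sketch only computes differences in proportion and does not reproduce any significance test. The function names and toy labels are hypothetical.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Fraction of cells assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def proportion_changes(case_labels, control_labels):
    """Difference in proportion (case minus control) for every cell type."""
    case = cell_type_proportions(case_labels)
    control = cell_type_proportions(control_labels)
    return {
        cell_type: case.get(cell_type, 0.0) - control.get(cell_type, 0.0)
        for cell_type in sorted(set(case) | set(control))
    }


# Toy annotations, invented for illustration only.
toy_tnbc = ["T cell", "T cell", "monocyte", "B cell"]
toy_normal = ["B cell", "B cell", "T cell", "basal cell"]
changes = proportion_changes(toy_tnbc, toy_normal)
```

A positive value marks a cell type that makes up a larger share of the TNBC group, and a negative value marks one whose share is smaller.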
We grouped the cell types with the most evident changes in proportion and examined those with decreased and increased proportions separately. First, we performed differential gene analysis within each cell type whose proportion decreased, intersected the DEGs of NK T cells, luminal epithelial cells, B cells, and basal cells, and obtained one common DEG, TNFAIP6 (Fig. 5A). We then intersected the DEGs of the cell types with an increased proportion, namely T cells, monocytes, and neural progenitor cells, and finally identified 11 common genes (TM4SF1, HSPA6, SOD2, PLCG2, IFI44L, PLIN2, CXCL1, IL7R, MUCL1, SRGN, and CCR7) (Fig. 5B).
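One reading of the comparison described above is a set intersection of the per-cell-type DEG lists, as in a Venn diagram. The sketch below illustrates that reading under this assumption. The function name, cell-type keys, and gene names are hypothetical placeholders.

```python
def common_degs(degs_by_cell_type):
    """Genes that are differentially expressed in every listed cell type."""
    gene_sets = [set(genes) for genes in degs_by_cell_type.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Toy DEG lists, invented for illustration only.
toy_degs = {
    "cell_type_1": ["gene_a", "gene_b", "gene_c"],
    "cell_type_2": ["gene_b", "gene_c", "gene_d"],
    "cell_type_3": ["gene_c", "gene_b"],
}
shared = common_degs(toy_degs)
```

Only genes present in all of the lists survive, which is why adding more cell types can only shrink the set of common DEGs.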
Based on the transcriptome results and relevant literature retrieval, RRM2, TPX2, CENPF, and TOP2A genes were identified to be closely related to the prognosis of TNBC patients. Therefore, we further explored the distribution of prognostic genes of TNBC at the single-cell level (Fig. 6A-D). Compared with the normal group, the expression of these four prognostic genes was significantly increased in B cells, endothelial cells and luminal epithelial cells, but significantly decreased in neural progenitor cells.
Because of its poor prognosis and survival, TNBC places a serious burden on patients and society. Many researchers have studied the mechanisms and treatment strategies of TNBC at the molecular, genetic, and cellular levels and have made considerable progress. However, studies that use different methods and address different aspects of TNBC reach different results. Although the understanding of TNBC has been greatly enriched, the treatment of TNBC has become more complex. This study used bioinformatics algorithms to study TNBC at two levels, examined its potential biological significance, and compared and integrated the two approaches in an attempt to reveal more clearly the similarities and differences of TNBC at the two levels.
First, this study used bioinformatics algorithms to examine the potential biological significance of TNBC at the transcriptome level. A total of 475 DEGs in TNBC were initially screened at the transcriptome level, followed by functional enrichment analysis and submodule screening. The DEGs were mainly enriched in functions and pathways including microtubule binding, chromosome segregation, response to xenobiotic stimulus, pathways in cancer, tyrosine metabolism, mucin type O-glycan biosynthesis, and the p53 signaling pathway. The key submodule in TNBC progression was then identified. Among its key genes, ASPM encodes a cell cycle regulator that is related to the survival rate of TNBC [11] and can be used as a new molecular target for TNBC treatment [12]. NCAPG encodes a component of the condensin complex, a major molecular effector of chromosome condensation and segregation during mitosis, and is also closely associated with prognostic survival [13]. CDCA8 and KIF2C are key genes in TNBC, and the expression of CDCA8 is significantly correlated with TNBC [14]. A circular RNA derived from KIF4A may act as a prognostic factor and progression mediator of breast cancer [15]. CDK1 mainly regulates the centrosome cycle and controls the eukaryotic cell cycle, and it can be used as an effective therapeutic target in cancer treatment. The submodule was mainly enriched in cell cycle, microtubule motor activity, histone kinase activity, and DNA replication origin binding.
Afterward, the single-cell sequencing data of TNBC were collected to reveal TNBC progression at the single-cell level. Compared with the non-TNBC group, the proportions of NK T cells, luminal epithelial cells, B cells, and basal cells were significantly decreased in the TNBC group, while the proportions of T cells, monocytes, and neural progenitor cells were significantly increased. By combining the DEGs of the cell subsets with a decreased or an increased proportion with the DEGs identified at the transcriptome level, we determined the common DEGs TNFAIP6 and TM4SF1, respectively. TNFAIP6 encodes an inflammatory response factor that regulates anti-inflammatory responses and immune mechanisms. Because cancer cells can generate an inflammatory microenvironment that enhances tumor metastasis, TNFAIP6 may serve as a prognostic factor for breast cancer [16]. TM4SF1 is a strong mediator of breast cancer metastasis and reactivation and is involved in the migration and invasion of TNBC [17]. HSPA6 has inhibitory effects on the growth, migration, and invasion of TNBC cells [18]. SOD2 is associated with the prognosis and survival of TNBC [19]. PLCG2 is involved in the immune response [20]. IFI44L is a type I interferon-stimulated gene involved in the innate immune response. PLIN2 is overexpressed in tumors [21]. CXCL1 encodes a chemokine. IL7R is associated with prognosis and treatment and is involved in the progression of TNBC. MUCL1 is an attractive tumor-associated antigen and potential therapeutic target [22]. SRGN can interact with TGFβ2, which regulates the metastasis of TNBC through autocrine and paracrine pathways [23]. CCR7 promotes the metastasis of TNBC and may act as a target for breast cancer diagnosis and treatment [24].
Finally, we combined the transcriptome and single-cell sequencing results to examine the key prognostic genes of TNBC, including RRM2, TPX2, CENPF, and TOP2A, which are closely related to the prognosis and treatment of TNBC. Ribonucleotide reductase (RR) is a rate-limiting enzyme that produces 2′-deoxyribonucleoside 5′-diphosphates, which are essential for DNA replication and repair. RRM2 is a critical RR subunit and has received significant attention in carcinoma research because its expression is dysregulated in multiple cancer types, including breast cancer [25]. High RRM2 expression is associated with a worse prognosis in patients with breast cancer with specific features [26]. Pathway-centric integrative analysis identified RRM2 as a prognostic marker in breast cancer associated with poor survival and tamoxifen resistance [27]. RRM2 expression and its correlation with clinicopathological parameters could help in evaluating outcome in breast cancer [28]. In addition, RRM2 is significantly associated with the prognostic survival of TNBC and can be used as a biomarker for the prognosis of TNBC [29, 30]. Targeting protein for Xenopus kinesin-like protein 2 (TPX2) plays a critical role in the chromosome segregation machinery during mitosis [31]. TPX2 silencing negatively regulates the PI3K/AKT pathway and activates the p53 signaling pathway, thereby inhibiting the proliferation of breast cancer cells and accelerating their apoptosis, which suggests that TPX2 may be a potential target for anticancer therapy in breast cancer [32]. TPX2 is a key gene that is closely related to the survival time of TNBC patients [33, 34]. Significantly upregulated TPX2 expression is observed in breast cancer tissue and cells and promotes the proliferation, migration, and invasion of breast cancer cells [35]. CENP-F is a cell cycle-regulated protein associated with kinetochores, the site at which chromosome-microtubule interactions are monitored and the source of checkpoint signals [36].
CENPF interacts with microtubules and participates in cell cycle progression, and it is a reliable indicator of poor survival in breast cancer [37, 38]. Some investigators have reported that the CENP-F expression level, assessed immunohistochemically, correlates with highly proliferative cancer cells and poorer prognosis [39]. Topoisomerase 2 alpha (TOP2A) is a key enzyme in DNA replication and a target of various cytotoxic agents such as anthracyclines. As such, it has been widely investigated for potential applications in breast cancer detection and management [40]. TOP2A expression was an independent prognostic indicator of 5-year disease-free survival (DFS) in TNBC [41]. TOP2A mainly assists DNA replication and transcription and controls and changes the topological state of DNA, which is closely related to tumor proliferation and invasion [42]. The expression of these prognostic genes was significantly increased in B cells, endothelial cells, and luminal epithelial cells. Regarding these cell types, B cells generate antibodies and play a crucial role in tumor immunity [43]. Meanwhile, endothelial cells play an active role in the growth and metastasis of solid tumors [44], and luminal epithelial cells are the tumor-originating cells that define the subtype of breast cancer. In neural progenitor cells, the expression of the prognostic genes was significantly lowered. The increased proportion of neural progenitor cells is consistent with reports that nerve fibers infiltrating during tumorigenesis release nerve signals that promote tumor growth and metastasis [45, 46]. Therefore, changes in these prognostic genes should be considered not only at the transcriptome level but also at the single-cell level.
This study used bioinformatics algorithms to explore the progression of TNBC and examined the potential biological significance of TNBC at the gene and single-cell levels. The DEGs, differential gene functions, and key submodule of TNBC were identified at the transcriptome level. The submodule was closely related to the prognosis and survival of TNBC patients. The single-cell profile of TNBC was also obtained, indicating the changes of cell subsets in TNBC. We further found that four prognostic genes of TNBC in the submodule, RRM2, TPX2, CENPF, and TOP2A, showed cellular heterogeneity at the single-cell level. Their expression in individual cell types was inconsistent with their expression in TNBC at the overall cellular level. Therefore, the prognosis and treatment of TNBC should be considered at both the gene and single-cell levels.
Author contributions BJ, ZZ, and JM contributed to the design and conception of the study. BJ and ZZ carried out the collection and assembly of data. ZZ and JM performed the data analysis and interpretation. BJ wrote the manuscript, ZZ revised the manuscript, and JM gave final approval of the manuscript.
Data availability All data generated or analyzed during this study are included in this article and its Supplementary Information files. The datasets GSE62931 and GSE161529 are from the website: https://ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62931, https://ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529 .
Compliance with ethical standards
Conflict of interest We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.