Identification and characterization of candidate SE-associated genes in breast cancer.
In order to find genes associated to SEs, first we identified non-coding genomic regions carrying SEs from a dataset generated by applying the Rank Ordering of Super-Enhancers (ROSE) algorithm on H3K27Ac enrichment data (12, 16, 28). H3K27Ac data was obtained from chromatin immunoprecipitation experiments followed by high throughput sequencing analyses (ChIP-Seq) performed in 4 patient-derived Xeno transplant (PDX) breast tumors from a previous study (18). We found that a tumor sample classified as hormone-receptor positive (ER + PR+) has 1194 DNA regions identified as SEs associated to 1128 genes, while the remaining three TN tumors contained 640, 527 and 651 SEs associated to 610, 458, and 587 genes, respectively (Fig. 1A and Additional file 1). Our annotation analyses revealed that from the 3012 SEs identified, the majority are located either in chromosome 1 (14%), 8 (9%), 2 (9%), or 6 (9%) (Fig. 1B, panel I). Since chromosome gene number widely varies across chromosomes, SE enrichment based on chromosome location was normalized by chromosome gene number (Fig. 1B, panel II). Interestingly, when the number of genes per chromosome is considered, we found that SEs are also enriched in chromosome 8 (10%). From the total SE-associated genes, we identified 26 SE-associated genes common to all breast tumors analyzed (Fig. 1A). Again, we found that most of these 26 SE-associated genes are located in chromosomes 8 (27%) or 1 (23%) (Fig. 1B, panel III). Moreover, when gene number per chromosome is considered, common breast cancer SE-associated genes remain particularly enriched in chromosome 8 (28%) (Fig. 1B, panel IV). Given that chromosome 8 is known to contain many genes with tumor progression roles (29, 30), our findings suggest SEs may be a frequent mechanism that dysregulates expression of oncogenes in breast tumors.
In order to investigate whether within these 26 SE-associated genes there are groups of genes that share mechanisms of gene regulation or interact in similar pathways, we performed a correlation matrix using hierarchical clustering of gene expression information from breast tumors. For this, we used bulk RNA-sequencing (RNA-Seq) data from breast tumors available for 23 out of the 26 genes on The Cancer Genome Atlas (TCGA) database (5). Our correlation analysis identified four clusters of genes whose expression levels are significantly correlated (Fig. 1C). From the hierarchical clustering we noted (1) a strong positive correlation between expression of CYTOR and MIR4435-2HG – two homolog non-coding RNA genes reported to be correlated with patients’ poor prognosis in glioblastoma and in low-grade glioma (31) – and (2) an anticorrelation between expression of genes in the second and third clusters. Next, we performed a pathway enrichment analysis using Metascape (32), a web tool that combines functional enrichment, interactome (e.g., protein to protein, protein to DNA) analysis and gene annotation leveraging over 40 independent knowledgebases. Through Metascape, we found that genes grouped in the third cluster are target genes for the transcription factor NF1, a tumor suppressor gene that has been reported to be associated with an increase in breast cancer risk when lost (33).
To further investigate the causes of gene expression correlation among the SE-associated genes in breast tumors, we analyzed co-occurrence of genetic alterations of genes within each cluster. For this, we used cBioPortal (23), an open-source platform that facilitates the interactive exploratory analysis and visualization of large-scale cancer genomics datasets. Using cBioPortal on TCGA genomic data, we observed that highly co-expressed genes in cluster 1 (CAPN2, ATP2B4 and PKP1) and cluster 4 (EIF3H, MAL2, NSMCE2, YWHAZ, VXN and AGO2) have a frequency of genetic alterations between 8 to 10% and 6 to 14%, respectively, mainly due to co-occurring DNA amplifications (Supp Fig. 1). CAPN2, ATP2B4 and PKP1 are located on chromosome 1 between the cytogenetic bands 1q32.1 and 1q41, while genes in cluster 4 (except for VXN) are located on chromosome 8 in a region spanning cytogenetic bands 8q22.3 to 8q24.3. Importantly, these regions on chromosomes 1 and 8 are frequently amplified in cancer disease (5, 34, 35). In line with our findings, recent reports show that SEs are frequently associated with genes located within large tandem duplications in breast cancers (36). Overall, our gene correlation analyses suggest that genes found in clusters 1 and 4 have SE-mediated enhanced transcription that help increase transcript levels of genes that are highly amplified and potentially involved in breast cancer development.
High gene expression levels of NSMCE2 and MAL2 SE-associated genes in breast tumors are linked to patients’ poor prognosis in breast cancer.
Previously it has been reported that SEs can enhance the expression of tumor-associated genes. For instance, in T-cell acute lymphoblastic leukemia, somatic mutations create a SE upstream of the TAL1 oncogene (17). Recently, by comparing breast tumor samples with healthy breast tissue samples, we found that the immunosuppressive signal CD47 is transcriptionally upregulated by an SE (18). Thus, we decided to investigate whether the gene expression levels of our 26 SE-associated candidate genes are higher in breast tumors when compared to healthy samples by using the TCGA database and the Genotype-Tissue Expression (GTEX) study (24). TCGA compiles RNA-Seq data of tumors or matched adjacent tissue (considered “healthy” tissue due to absence of cancerous characteristics) obtained from cancer patients, for whom clinical information is also available. Such information includes histological grade and survival data. GTEX provides open access to bulk RNA-Seq gene expression data from non-diseased tissues obtained from donors. Mining these datasets, we found RNA-Seq data available for 21 out of the 26 SE-associated genes of our interest (RNA-Seq data was not available for a pseudogene nor for four non-coding RNA genes). From these 21 genes, we found that 9 showed significantly higher expression in breast primary tumors when compared to matched adjacent normal tissue or to breast normal tissue samples from the GTEX study (Fig. 2A and Supp Table 1). These data suggest that SEs could be driving the overexpression of these 9 genes in breast tumors.
Next, to investigate whether high levels of gene expression in breast tumors for any of the 26 SE-associated genes has clinical relevance, we performed a univariate Kaplan-Meier analysis using RNA-Seq data linked to patients’ survival outcomes available for 1097 breast primary tumors through TCGA. We found that high transcript levels of 3 genes – NSMCE2, MAL2 and EIF3H – significantly correlate with worse overall survival in breast cancer patients (Log-rank test, p-value < 0.05) (Fig. 2B). To validate these results on an independent dataset, we performed the same univariate Kaplan-Meier analysis using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which contains microarray gene expression data from 2000 breast primary tumors along with long-term clinical follow-up (22). On the METABRIC dataset, high expression of MAL2 significantly associates with worse overall survival of breast cancer patients, while high NSMCE2 follows the same trend without reaching statistical significance (Fig. 2B). However, on the METABRIC patient cohort, high expression levels of both genes are significantly associated with reduced probability of relapse-free survival – the length of time a cancer patient survives without any signs or symptoms of cancer after the end of primary treatment – thus, confirming that NSMCE2 and MAL2 expression are linked to poor prognosis in breast cancer. On the other hand, we did not observe a correlation between high EIF3H expression and breast cancer patients’ negative prognosis on the METABRIC dataset, even though both TCGA and METABRIC datasets do not differ on the probability of overall survival for their respective patients’ cohorts (Supp Fig. 2A) nor on other demographic characteristics (37). Since we found that high levels of NSMCE2 and MAL2 gene expression highly correlate with patients’ negative prognosis, we focused our subsequent analyses on these two genes.
To further confirm that the association of high levels of gene expression with patients’ negative prognosis is independent of other risk factors (covariates), we constructed a multivariate Cox model of survival using TCGA data, which includes age, stage and RNA expression subtype as covariates. We observed that, after adjusting for these clinical covariates, higher NSMCE2 and MAL2 gene expression levels remained significantly associated with patients’ poor prognosis (Fig. 2C). NSMCE2 encodes a member of a family of E3 small ubiquitin-related modifier (SUMO) ligases, part of the structural maintenance of chromosomes 5/6 (SMC 5/6) complex. This protein complex plays key roles in DNA double-strand break repair by homologous recombination and in chromosome segregation during cell division (38–40). MAL2 is a multispan transmembrane protein required for transcytosis, an intracellular transport pathway used to deliver membrane-bound proteins and exogenous cargos from the basolateral to the apical surface (41).
Next, we investigated in other cancer types (apart from breast cancer) whether expression of NSMCE2 and MAL2 is higher in tumor samples when compared to normal tissue and whether high levels of any of these two genes associate with poor survival. To approach this, we compared RNA levels from an array of cancer samples from TCGA Pan-Cancer with RNA levels from corresponding normal tissue samples from GTEX and TCGA studies as we did for breast cancer. In line with what we observed in breast cancer, NSMCE2 or MAL2 are significantly higher in the cancer samples compared to the normal corresponding samples in most of the cancer types studied (Supp Fig. 2B). When we asked whether their overexpression is also correlated with patients’ poor survival in other cancer types from the TCGA Pan-Cancer dataset, we observed that high NSMCE2 RNA expression levels associate with worse overall survival in 6 other cancer types (apart from breast cancer). For MAL2, we found that, while high gene expression is linked to shorter survival in 4 cancer types (apart from breast cancer), it is linked to better survival in 2 cancer types. Thus, although NSMCE2 and MAL2 are highly expressed in Pan-Cancer tumors, their association to patients’ negative survival outcomes is specific to certain cancer types, including breast cancer (Supp Table 2).
NSMCE2 and MAL2 are regulated by SEs in ER + PR+, HER2 + and TN breast cancer cells.
Since we found that SEs are associated to NSMCE2 and MAL2 expression in breast tumors, we hypothesized that in breast cancer cells, expression of these genes should be reduced when disrupting the SE-enhancing function by blocking the binding of BRD4 to SEs with the inhibitors JQ1 or IBET151 (42, 43). To test this, we used 5 breast cancer cell lines representing different breast cancer subtypes: MCF7 (ER + PR+), HCC1954 (HER2+), BT549, MDAMB231 and Hs578T (TN). We also chose these cell lines because NSMCE2 and MAL2 are amplified in all except in MDAMB231 (visualized using cBioPortal on data from the Cell Line Encyclopedia (25), Supp Fig. 3A), which recapitulates our observation in breast cancer patients’ datasets where these two genes were frequently amplified.
Treatment of ER + PR+, HER2+, and TN cancer cell lines with non-toxic concentration of either JQ1 or IBET151 (Supp Fig. 3B) had the following effects on gene expression levels (Fig. 3): We observed a reduction on NSMCE2 RNA in all cell lines treated with BET inhibitors, confirming regulation of NSMCE2 by SEs. MAL2 expression was only reduced by BET inhibition in cell lines representing HER2 + and TN breast cancer subtypes, but it was curiously increased after 24 hours of treatment in the TN BT549, perhaps as an indirect effect of BET inhibition on downregulating MAL2 repressors. In summary, pharmacological blockade of SEs confirmed NSMCE2 and MAL2 are associated with SEs in breast cancer cells.
High gene expression levels of NSMCE2 correlate with breast cancer patients’ poor response to chemotherapy.
Given that our data shows that high NSMCE2 and MAL2 expression are associated with breast cancer patients’ poor survival outcomes, we next investigated if high levels of these markers could also be negatively associated with response to standard cancer therapies. To investigate this, we used the online platform ROC Plotter (44), which integrates multiple transcriptome-level gene expression datasets available through GEO into a single database containing information for 3104 breast cancer patients linked to corresponding response data for a range of treatments, including endocrine therapies, anti-HER2 therapies and chemotherapeutic drugs (26). The transcriptome data is obtained from patients’ biopsies prior to treatment and patients are grouped into responders and non-responders for a given treatment based on clinical characteristics. Responder and non-responder patients are compared via two diverse statistical approaches: The Mann-Whitney test (non-parametric T-test) and the Receiver Operating Characteristic (ROC) test. The ROC test measures how much ‘the gene expression level’ model is capable of distinguishing between the two classes: responders and non-responders. ROC plots are useful to compute the Area Under the Curve value (AUC), which shows the predictive power of the gene. The higher the AUC value, the better the model is at distinguishing between patients who are responders versus non-responders. For cancer biomarkers with potential clinical utility, the AUC value should be higher than 0.6, while AUC values higher than 0.7 are obtained by top quality cancer biomarkers (26).
We focused our analysis on breast cancer patients who received chemotherapy due to the larger sample size in the dataset compared to that for patients who received endocrine or anti-HER2 therapies (n = 507 versus n = 160 and n = 151, respectively). Using ROC Plotter, we found that high levels of NSMCE2 gene expression correlate with lower pathological complete response to chemotherapy of breast cancer patients (Fig. 4A, Mann-Whitney test p-value = 0.000032 and AUC = 0.617). Moreover, we found that high levels of NSMCE2 gene expression strongly correlate with poor response to chemotherapy for patients diagnosed with grade III breast cancer (Fig. 4B, Mann-Whitney test p-value = 0.00099 and AUC = 0.655). We also performed the same analyses for the 21 SE-associated genes we found in breast cancer and for which gene expression data is available on ROC plotter. We did not see significant correlation (AUC < 0.6, Mann-Whitney test p-value < 0.05) between high levels of gene expression and breast cancer patients’ poor response to chemotherapy for any of the other SE-associated genes analyzed (including MAL2) when analyzing the entire database or when focusing on Grade III breast cancer (Supp Table 3). This indicates that the observed correlation between high NSMCE2 gene expression levels and poor response to chemotherapy is specific.
Since the drug regimen for breast cancer patients is highly determined by the tumor molecular subtype, we next analyzed the correlation between high NSMCE2 gene expression levels with response to chemotherapy for each ER + PR+, HER2 + or TN tumor subtypes. Here, we observed a stronger correlation between high levels of NSMCE2 and patients’ poor response to chemotherapy when the tumors were classified as grade III TN or grade III HER2 + tumors (Fig. 4C and D, AUC = 0.723, Mann-Whitney test p-value = 0.0027 and AUC = 0.735, Mann-Whitney test p-value = 0.0091, respectively). Thus, our results indicate that breast cancer patients with high NSMCE2 RNA expression in aggressive breast tumors of either TN or HER2 + molecular subtype respond poorly to chemotherapeutic drugs. Since the AUC values obtained for NSMCE2 are usually values observed for top quality cancer biomarkers (26), our findings suggest that NSMCE2 expression could potentially be used to pinpoint a group of patients (especially those diagnosed with grade III TN or HER2 + breast cancer) that may not respond to chemotherapy.
Lastly, using the ROC Plotter tool, we found that NSMCE2 high expression levels do not correlate with patients’ response to therapy for ovarian cancer, colorectal cancer, and glioblastoma (45, 46) (data not shown), thus, confirming that high levels of NSMCE2 gene expression are specifically associated with breast cancer patients’ poor response to chemotherapy.
Lowering NSMCE2 transcript levels sensitizes breast cancer cells to chemotherapeutic agents.
So far, our work shows that (1) NSMCE2 and MAL2 genes are regulated by SEs at the transcript level in breast cancers; (2) the expression of these genes can be reduced by BET inhibition; and (3) high NSMCE2 expression correlates to poor response to chemotherapy in grade III TN or HER2 + breast cancer patients. The latter result indicates that our findings may have important clinical implications. Most chemotherapeutic agents work by activating the program of apoptosis in cancer cells through DNA damage induction or through cell cycle inhibition. Since NSMCE2 is known to be required for DNA damage repair at different steps of the process and for chromosomal segregation during mitosis (38–40, 47–49), NSMCE2 may inhibit chemotherapy-induced apoptosis, thus contributing to the therapy resistance we observed in breast cancer patients showing high NSMCE2 gene expression. To test whether reduction of NSMCE2 transcript levels by SE disruption can sensitize breast cancer cells to chemotherapy-induced apoptosis, we treated breast cancer cells with BET inhibitor JQ1 and the standard chemotherapeutical drugs doxorubicin and paclitaxel – two antitumor agents widely used for the treatment of breast cancer. Doxorubicin is part of the anthracycline group of chemotherapeutic agents that exert antitumor action by both DNA intercalation and inhibition of the enzyme topoisomerase II, which results in DNA damage during DNA replication and the eventual induction of apoptosis. Paclitaxel, on the other hand, is a class of taxanes, an antimitotic drug that affects the stabilization of microtubules, thus, disrupting the cell’s ability to divide (50).
Even though we observed that NSMCE2 expression is reduced by SE blockade in all the breast cancer cell lines tested, we decided to use TN breast cancer cells, MDAMB231, BT549 and Hs578T, and the HER2 + cells HCC1954 for our experiments, since we found that high NSMCE2 gene expression levels significantly associate with poor response to chemotherapy for patients diagnosed with TN or HER2 + breast cancer. Initially, cells were pre-treated with the BET inhibitor JQ1 for 3 hours to block SE transcriptional function (this is a window of time we have previously found successfully inhibits SE function). After the 3 hours, cells were treated with JQ1 in combination with several concentrations of either of the chemotherapeutic drugs. We assessed cell viability by MTT assays. For all cell lines tested, we observed that the reduced cell viability resulting from the combined treatment is highly dependent on the chemotherapeutic agent used and on its concentration (Supp Fig. 4A). For instance, a combination of JQ1 and doxorubicin has an additive effect on cell survival reduction when using doxorubicin at 1 µM. Such additive effect is gradually lost at higher concentrations of doxorubicin, where we see strong decline in cell viability regardless of SE inhibition by JQ1, probably due to high toxicity and massive DNA damage caused by doxorubicin. On the other hand, for any of the breast cancer cell lines tested, paclitaxel effectiveness at reducing the number of viable cells did not improve when given in combination with JQ1; in fact, an antagonistic effect was observed upon combination of these drugs. Next, to replicate the scenario we observe when analyzing the correlation of NSMCE2 gene expression levels with response to chemotherapy in breast cancer patients (where we see that low NSMCE2 expression levels correlate with good response to chemotherapy), we pre-treated cells with JQ1 for 24 hours in order to allow enough time to reduce expression levels of SE-associated genes (in particular for NSMCE2, Fig. 3) before doxorubicin treatment. Here, we also observed an additive effect on cell viability reduction in response to doxorubicin when NSMCE2 levels are lowered by JQ1 treatment (Supp Fig. 4B). Since treating cells with JQ1 prior to doxorubicin treatment increases doxorubicin effectiveness at reducing cell viability, we wondered whether this reduction in viable cell numbers is due to increased apoptosis. To assess this, we measured apoptosis by Annexin V staining using flow cytometry. Breast cancer cells were pretreated with JQ1 for 24 hours prior to the addition of JQ1 and/or chemotherapeutic drugs. As shown in Fig. 5A, doxorubicin highly increases apoptosis of BT549 cells and of Hs578T to a lesser extent. Moreover, the combination of doxorubicin with JQ1 treatment strongly synergizes to induce apoptosis in both breast cancer cell lines. On the contrary, neither paclitaxel alone nor in combination with JQ1 increase apoptosis of the treated breast cancer cell lines (Fig. 5B).
In summary, our results demonstrate that JQ1 synergizes with doxorubicin to induce apoptosis suggesting that reducing NSMCE2 expression by SE inhibition may contribute to the increase in apoptosis of the breast cancer cell lines tested. To confirm that NSMCE2 downregulation by JQ1 SE blockade is specifically contributing to the stronger effect observed on cell viability reduction, we next generated BT549 or Hs578T cells stably expressing a short hairpin RNA against NSMCE2 (shNSMCE2) to knockdown gene expression. After checking NSMCE2 silencing was efficient (Supp Fig. 4C), we performed MTT assays to measure cell viability upon chemotherapeutic drug treatments. Here we found that silencing NSMCE2 in both cell lines significantly contributed to a greater reduction in cell viability after 48 hours of incubation with doxorubicin. Knocking down NSMCE2 had a significant but smaller contribution to cell viability reduction after 72 hours of paclitaxel treatment (Fig. 5C). Taken together, these results show that reducing NSMCE2 levels sensitizes breast cancer cells to chemotherapy, thus implying that NSMCE2 high levels contribute resistance to chemotherapeutic agents.
Since NSMCE2 is required for DNA damage repair, we hypothesized that more NSMCE2 is produced at the transcriptional level upon chemotherapy-induced DNA damage. Indeed, we observed that NSMCE2 RNA levels significantly increase 24 hours after doxorubicin treatment in BT549 and Hs578T, but not upon paclitaxel treatment (Fig. 5D). Given that NSMCE2 is associated with SEs in both cell lines, we next investigated if doxorubicin-induced NSMCE2 upregulation is mediated by SEs. Treatment of breast cancer cells with SE inhibitor JQ1 abrogates the effect of doxorubicin in NSMCE2 upregulation, indicating that NSMCE2 upregulation by doxorubicin is mediated by SEs (Fig. 5E). Collectively, our results show that NSMCE2 expression is transcriptionally upregulated upon doxorubicin treatment through SEs, and this could be a factor contributing to resistance to chemotherapy during breast cancer treatment. In sum, our experiments show that the correlation between high NSMCE2 expression levels in breast tumors and cancer patients’ poor response to chemotherapy is due, in part, to increased resistance to chemotherapeutic drugs driven by NSMCE2.