An Immunohistochemical Panel of Three SUMO Genes Predicts Outcomes of Patients With Triple-negative Breast Cancer

Yuxiang Lin, Fujian Medical University Union Hospital
Qingshui Wang, Fujian Normal University
Han Xiao, Fujian Medical University Union Hospital
Zhiwei Chen, Fujian Provincial Maternity and Children's Hospital
Meichen Jiang, Fujian Medical University Union Hospital
Jie Zhang, Fujian Medical University Union Hospital
Rongrong Guo, Fujian Medical University Union Hospital
Shaohong Kang, Fujian Medical University Union Hospital
Yao Lin, Fujian Normal University
Chuangui Song (Songcg1971@hotmail.com), Department of Breast Surgery, Fujian Medical University Union Hospital, No.29, Xin Quan Road, Gulou District, Fuzhou, Fujian Province, 350001, China


Introduction
Triple-negative breast cancer (TNBC) is defined as a specific type of breast cancer characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2) expression. Although TNBC constitutes only approximately 10-15% of all breast cancers [1,2], it has highly aggressive clinicopathological features and unfavorable outcomes [3]. Patients with TNBC generally develop distant metastasis within the first three years after initial treatment, with the mortality rate reaching about 40% in the first five years [4]. The lack of ER/PR and HER2 expression renders TNBC unresponsive to endocrine or anti-HER2 targeted therapies. Therefore, the most common treatment strategy for TNBC to date is a combination of surgery, chemotherapy and radiotherapy, and anthracycline- and taxane-based adjuvant chemotherapy is the standard regimen for TNBC patients after resection [5,6].
TNBC is highly sensitive to chemotherapy, as patients with TNBC have a higher neoadjuvant response rate than those with other subtypes of breast cancer [7,8]. However, some patients still develop rapid recurrence and have a poor prognosis, which is commonly referred to as the "triple-negative paradox" [9]. Thus, the identification of new predictive biomarkers for chemotherapy response and of promising therapeutic targets might be beneficial in the treatment of TNBC.
As an important post-translational protein modification, SUMOylation has attracted increasing attention. Four SUMO subtypes have been identified: SUMO1, SUMO2, SUMO3 and SUMO4 [10]. SUMO2 and SUMO3 are 95% identical to each other and only 50% identical to SUMO1 [11]. SUMO1/2/3 are ubiquitously expressed in human tissues, whereas SUMO4 is expressed only in the spleen, lymph nodes and kidney [12]. SUMOylation is catalyzed by a three-step enzymatic reaction comprising activation, conjugation and ligation [13]. The SUMO E1-activating enzyme is a protein that contains two subunits, SUMO-activating enzyme subunit 1 (SAE1) and SUMO-activating enzyme subunit 2 (SAE2). UBC9 is the only known SUMO-conjugating E2 enzyme required for SUMOylation, and its deletion abolishes SUMO conjugation [14]. SUMO E3 ligases are roughly divided into three categories: the protein inhibitor of activated STAT-1 (PIAS) protein family, the nucleoporin Ran binding protein 2 and the human polycomb protein Pc2. Although SUMO is similar to ubiquitin, SUMOylation does not directly lead to protein degradation; instead, it regulates cell functions such as protein-protein interactions, maintenance of genome integrity, subcellular localization, transcriptional regulation, DNA repair and the cell cycle [11,15]. Dysregulation of SUMOylation can result in tumor progression, and components of the pathway are considered novel biomarkers and possible therapeutic targets for cancers [16].
In this study, we explored the expression and prognostic utility of SUMOs and aimed to build a prognosis prediction model based on SUMO1/2/3 protein expression. Potential mechanisms that regulate the SUMOylation pathway in TNBC were also explored.

Patients and specimens
A total of 212 TNBC patients treated at Fujian Medical University Union Hospital between June 2013 and August 2017 were retrospectively reviewed. All patients had histologically confirmed TNBC and ranged from 27 to 77 years of age (median: 51 years), with 4-77 months of follow-up information. Clinicopathological data, including age, tumor size, nodal status, tumor grade, lymphovascular invasion, type of surgery, and chemotherapy and radiotherapy information, were obtained from medical records. Disease-free survival (DFS) was defined as the time from diagnosis to the date of clinical relapse (with histopathological confirmation or radiological evidence of tumor recurrence). Overall survival (OS) was defined as the time from diagnosis until death from any cause. The follow-up deadline was August 30, 2020.
The inclusion criteria were: (a) no history of other malignant tumors, no bilateral breast cancer and no de novo stage IV disease; (b) total mastectomy or breast-conserving surgery without neoadjuvant chemotherapy or radiotherapy; (c) primary tumor size of pT1c-pT2; (d) complete demographic, clinicopathological and follow-up information. Patients who received at least 3 cycles of anthracycline-based and 3 cycles of taxane-based regimens (4EC-4T, 3FEC-3T, 6TEC) were considered as having received chemotherapy, while those with insufficient cycles of chemotherapy were excluded from this study.
This study was approved by the Research Ethics Committee of Fujian Medical University Union Hospital. Informed consent was obtained from each participant. Patient information is shown in Supplementary Table 1.

Immunohistochemistry (IHC) staining analysis
IHC staining was performed to measure SUMO1/2/3 protein expression in all TNBC tissues and adjacent normal breast tissues according to the standard immunoperoxidase staining procedure. Slides were incubated with anti-SUMO1 (ab32058, abcam, diluted 1:150), anti-SUMO2 (ab233222, abcam, diluted 1:300) and anti-SUMO3 (ab203570, abcam, diluted 1:300) according to the manufacturer's instructions. To ensure quality, a negative control was prepared by substituting the primary antibody with phosphate-buffered saline (PBS, 5% BSA). The IHC staining scores of SUMO1/2/3 were assessed by two independent pathologists. The percentage of positively stained cells was scored from 1 to 4: 1, 0-25%; 2, 26-50%; 3, 51-75%; and 4, 76-100%. The staining intensity was scored from 0 to 3: 0, no staining; 1, weak staining; 2, moderate staining; and 3, strong staining. The final score was the sum of these two scores. A score greater than 5 was defined as high expression, and a score of 5 or less was defined as low expression.
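The scoring rule described above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, and assigning a boundary value such as exactly 75% to the lower bin is an assumption, since the text does not specify how boundaries were handled.

```python
from typing import Tuple


def percentage_score(percent_positive: float) -> int:
    """Map the percentage of positively stained cells to a score of 1-4."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: boundary values (25, 50, 75) fall into the lower bin.
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def ihc_expression(percent_positive: float, intensity: int) -> Tuple[int, str]:
    """Return the final IHC score and the high/low expression call.

    intensity: 0 (none), 1 (weak), 2 (moderate) or 3 (strong).
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    final_score = percentage_score(percent_positive) + intensity
    # A final score above 5 is high expression; 5 or below is low.
    level = "high" if final_score > 5 else "low"
    return final_score, level
```

For example, a sample with 80% positive cells and moderate staining receives a final score of 4 + 2 = 6 and is classified as high expression.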

Gene set variation analysis (GSVA) and LASSO analysis
GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. In this study, the pathway activity of protein SUMOylation and 50 oncogenic pathways in TNBC was analyzed. GSVA was performed using the R package 'GSVA'. We used the LASSO Cox regression model to construct a three-SUMO-based classifier (SB classifier) for predicting the DFS of TNBC patients. The LASSO analysis was performed using the R package 'glmnet'.

Statistical analysis
In this study, the t test was used to compare continuous variables between two groups using GraphPad Prism 5.0 software. Correlations between SUMO1/2/3 expression and clinicopathological characteristics were assessed by the chi-squared test. DFS and OS were estimated by the Kaplan-Meier method using SPSS, and differences between groups were tested by the log-rank test. Cox regression analysis was used for univariate and multivariate survival analysis. All p values < 0.05 were considered statistically significant.
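For readers unfamiliar with the Kaplan-Meier method, the following Python sketch shows the product-limit calculation on toy data. It is an illustration only and does not reproduce the SPSS analysis used in this study; the function name and the example follow-up times are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (relapse or death) was observed, 0 if censored
    Returns (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Toy data (months of follow-up); not taken from the study cohort.
toy_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored patients contribute to the number at risk until their last follow-up, which is why the estimate only drops at observed event times.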

Protein SUMOylation pathway was activated in TNBC
To explore the pathway activity of protein SUMOylation in TNBC, the TCGA database and four related GEO datasets (GSE31448, GSE45827, GSE53752 and GSE65216) were used. The TCGA database contained 166 TNBC tissues and 113 adjacent normal breast tissues. Of the four GEO datasets, GSE31448 contained 98 TNBC tissues and 31 adjacent normal breast tissues, GSE45827 contained 41 and 11, GSE53752 contained 51 and 25, and GSE65216 contained 55 and 10, respectively. GSVA was performed to conduct GO analysis and assign protein SUMOylation pathway activity estimates to individual samples from the TCGA and GEO datasets. The protein SUMOylation pathway exhibited a higher enrichment score in TNBC tissues than in adjacent normal breast tissues (GSE45827, GSE65216, p<0.001; GSE31448, GSE53752, p<0.01; TCGA, p<0.05) (Figure 1A-E). Moreover, a meta-analysis of 603 tissues from the five TNBC datasets mentioned above further demonstrated that the protein SUMOylation pathway was activated in TNBC (p<0.001; Figure 1F).
To further validate the TCGA and GEO expression data, we performed an IHC study using local patient samples to determine the protein expression of SUMO1, SUMO2 and SUMO3 in TNBC. The IHC analysis showed that SUMO1, SUMO2 and SUMO3 were significantly upregulated in 212 TNBC tissues compared with paired adjacent normal breast tissues (Figure 3). The clinicopathological characteristics of the patients in the study cohort are summarized in Table 1. Most of the patients (93.4%) were treated with adjuvant chemotherapy. We then assessed the correlations of SUMO1/2/3 expression with relevant clinicopathological factors. No associations between SUMO1 expression and clinicopathological features were observed. SUMO2 expression was significantly associated with tumor size (p=0.032), while SUMO3 expression correlated significantly with lymph node metastasis (p=0.033) and lymphovascular invasion (p=0.028).
Survival analysis was conducted to explore the relationship between SUMO1/2/3 protein expression, clinicopathological factors and the survival of these 212 TNBC patients. Kaplan-Meier curves for OS and DFS were generated according to SUMO1/2/3 protein expression (Figure 4A-F) and showed that TNBC patients with higher expression of SUMO1/2/3 had worse OS (Figure 4A-C) and DFS (Figure 4D-F). Univariate and multivariate Cox regression analyses were employed to clarify the independent factors affecting OS and DFS of TNBC patients.
In multivariate Cox analysis, lymph node metastasis, radiotherapy, and SUMO1, SUMO2 and SUMO3 protein expression were determined to be independent prognostic factors for OS of TNBC patients (Figure 4G). For DFS, tumor size, lymph node metastasis, tumor grade, lymphovascular invasion and SUMO3 protein expression were determined to be independent prognostic factors (Figure 4G).

Construction of a prognostic scoring model based on SUMO1/2/3 proteins
To construct a risk score model for predicting DFS in TNBC, we used the LASSO Cox regression model to build a SUMO protein-based prognostic classifier that included SUMO1, SUMO2 and SUMO3, which we named the SB classifier (Figure 5A&B). Using the LASSO Cox regression model, we calculated a risk score for each patient based on the individual IHC scores of the three proteins: Risk score = (SUMO1×0.3746) + (SUMO2×0.3290) + (SUMO3×0.8217). The SB classifier showed significantly higher prognostic accuracy than any single SUMO protein alone (Figure 5C). When we assessed the distribution of risk scores and recurrence status, TNBC patients with higher risk scores generally had a higher recurrence rate than those with lower risk scores (Figure 5D). TNBC patients were then assigned to an SB classifier high-level group (75 patients) or low-level group (137 patients) using the cut-off value (5.87). The Kaplan-Meier curve showed that patients in the SB classifier high-level group had a significantly worse DFS (HR 2.8, 95% CI 1.73-4.53, p < 0.01) (Figure 5E). For predicting DFS of TNBC patients at 1, 3, and 5 years, the areas under the curve (AUC) of the ROC curves obtained from the risk-based prediction model were 0.84, 0.7 and 0.7, respectively (Figure 5F). The total cohort was randomly divided into equal training and validation sets using X-tile plots. Based on cut-points of the risk score, TNBC patients in the training cohort were divided into SB classifier low-level and high-level groups, and patients with poor DFS displayed higher risk scores than those with a good prognosis (Figure 5G). Similar prognostic results were found in the validation cohort and the total cohort (Figure 5H&I).
Survival analysis using the SB classifier showed that patients in the classifier-defined low-score group had a favourable response to chemotherapy (HR=4.04, 2.14-7.63; p<0.0001) (Figure 6A), which indicated that the SB classifier could identify patients with TNBC who might benefit from chemotherapy. To provide clinicians with a quantitative method for predicting the probability of cancer recurrence in TNBC patients receiving chemotherapy, we constructed a nomogram that integrated both the SB classifier and clinicopathological factors (Figure 6B).
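As a worked illustration of the risk score formula and cut-off reported above, the following Python sketch computes the score from three IHC scores and assigns a group. The coefficients and the cut-off of 5.87 are taken from the text; the function names are hypothetical, and treating a score strictly greater than the cut-off as high-level is an assumption, because the handling of a score exactly equal to 5.87 is not stated.

```python
from typing import Dict

# LASSO Cox coefficients and cut-off value as reported in the text.
SB_COEFFICIENTS: Dict[str, float] = {
    "SUMO1": 0.3746,
    "SUMO2": 0.3290,
    "SUMO3": 0.8217,
}
SB_CUTOFF = 5.87


def sb_risk_score(ihc_scores: Dict[str, float]) -> float:
    """Weighted sum of the SUMO1/2/3 IHC scores."""
    return sum(coef * ihc_scores[protein]
               for protein, coef in SB_COEFFICIENTS.items())


def sb_group(ihc_scores: Dict[str, float]) -> str:
    """Assign a patient to the high-level or low-level group.

    Assumption: scores strictly above the cut-off are high-level.
    """
    return "high" if sb_risk_score(ihc_scores) > SB_CUTOFF else "low"


# Hypothetical patient with an IHC score of 4 for each protein:
# 4*0.3746 + 4*0.3290 + 4*0.8217 = 6.1012, which is above 5.87.
example = {"SUMO1": 4, "SUMO2": 4, "SUMO3": 4}
example_group = sb_group(example)
```

Because SUMO3 carries the largest coefficient, a change of one point in the SUMO3 IHC score shifts the risk score more than the same change in SUMO1 or SUMO2.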

Oncogenic pathways positively correlated with protein SUMOylation were activated in the tumors of TNBC patients
Using the GSVA method and the Molecular Signatures Database hallmark gene set collection, we analyzed the mRNA expression data of TNBC in the TCGA, GSE53752, GSE65216 and GSE31448 datasets.
The correlation between protein SUMOylation and the 50 hallmark gene sets in TNBC was analyzed by Pearson correlation analysis. In the tumor samples, the intersection of the TCGA, GSE53752, GSE65216 and GSE31448 datasets revealed a positive correlation between protein SUMOylation and E2F-targets, MYC-targets-V1, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response (r>0.3, Figure 7A-E). In the normal tissue samples, the intersection of the same four datasets revealed a positive correlation between protein SUMOylation and Mitotic-spindle, G2M-checkpoint and unfolded protein response (r>0.3, Figure 7F-J). The intersection of these two pathway lists is shown in Figure 7K; three overlapping pathways were found, namely Mitotic-spindle, G2M-checkpoint and unfolded protein response. Next, we analyzed the pathway activity of E2F-targets, MYC-targets-V1, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response in the TCGA and GEO datasets. These six pathways were upregulated in TNBC tissues compared with adjacent normal breast tissues in TCGA, GSE53752, GSE65216 and GSE31448 (Figure 8A-D). Finally, meta-analysis revealed that the pathway activity of E2F-targets, MYC-targets-V1, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response was increased in TNBC (Figure 8E-J).
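The correlation-and-intersection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the toy enrichment scores are hypothetical, and the only element taken from the text is the r>0.3 threshold applied within each dataset before intersecting.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def correlated_pathways(sumo_scores: Sequence[float],
                        pathway_scores: Dict[str, Sequence[float]],
                        threshold: float = 0.3) -> Set[str]:
    """Pathways whose per-sample scores correlate with SUMOylation (r > threshold)."""
    return {name for name, scores in pathway_scores.items()
            if pearson_r(sumo_scores, scores) > threshold}


def shared_pathways(datasets: List[Tuple[Sequence[float],
                                         Dict[str, Sequence[float]]]],
                    threshold: float = 0.3) -> Set[str]:
    """Pathways that pass the threshold in every dataset."""
    per_dataset = [correlated_pathways(sumo, paths, threshold)
                   for sumo, paths in datasets]
    return set.intersection(*per_dataset) if per_dataset else set()


# Toy enrichment scores for two hypothetical datasets.
toy = [
    ([1, 2, 3, 4], {"A": [2, 4, 6, 8], "B": [4, 3, 2, 1]}),
    ([1, 2, 3], {"A": [1, 2, 4], "B": [1, 3, 2]}),
]
toy_shared = shared_pathways(toy)
```

In the toy data, pathway "B" passes the threshold in the second dataset only, so it is dropped by the intersection, which mirrors how a pathway had to be positively correlated in all four datasets to be reported.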

Discussion
TNBC is characterized by high invasiveness and has a worse prognosis than other subtypes of breast cancer. Because of the lack of ER, PR and HER2 expression, no specific systemic treatment such as endocrine therapy or anti-HER2 targeted therapy is available. Currently, the basis of TNBC treatment is surgery, chemotherapy and radiotherapy. An anthracycline- and taxane-based chemotherapy regimen is the standard of care for preventing TNBC recurrence and improving survival [5]. The EBCTCG analysis demonstrated a moderate reduction in the 5-year and 10-year risk of recurrence and death with dose-intense adjuvant chemotherapy, especially for TNBC patients [6]. However, some patients still develop rapid recurrence and have a poor prognosis after conventional chemotherapy. Thus, the identification of novel biomarkers that could be used to predict chemotherapy response, and of promising therapeutic targets, might be beneficial in the treatment of TNBC.
Previous studies indicated that SUMOylation is closely related to carcinogenesis, tumor proliferation and metastasis, and is significantly up-regulated in most cancers [20][21][22][23]. Therefore, SUMOylation may become a potential target for cancer treatment. However, the expression and underlying mechanisms of SUMOylation remain poorly understood in TNBC. The present study advances the knowledge of the role of SUMOylation in TNBC. Using the TCGA and GEO databases, we demonstrated that the pathway activity of protein SUMOylation and the expression of SUMO1/2/3 mRNA were increased in TNBC tissues compared with adjacent normal breast tissues. Meanwhile, our immunohistochemistry staining results suggested that the expression of SUMO1/2/3 proteins was significantly increased in the tumor tissues of 212 TNBC patients. According to survival analysis, SUMO1/2/3 protein expression levels were all associated with disease-free survival and overall survival of TNBC patients. In addition, we developed a novel prognostic tool based on the IHC scores of SUMO1/2/3 to improve the prediction of disease recurrence for TNBC patients. Further use of the SB classifier might allow better identification of TNBC patients who are most likely to benefit from chemotherapy. Therefore, the classifier is both a prognostic and a predictive tool for TNBC patients. Patients with an SB classifier-defined low score might have both a lower likelihood of recurrence and a clear benefit from chemotherapy.
Meanwhile, we analyzed the pathways associated with SUMOylation in TNBC. Our data showed that E2F-targets, MYC-targets-V1, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response were positively correlated with SUMOylation in the tumor tissues of TNBC patients. However, only Mitotic-spindle, G2M-checkpoint and unfolded protein response were positively correlated with SUMOylation in the normal tissues of TNBC patients. MYC is an important transcription factor. MYC mutations lead to uncontrolled expression of many genes, some of which are involved in cell proliferation and contribute to the development of cancer. MYC protein activates the transcription of SUMO-activating enzyme subunit 1 (SAE1) by directly binding to the classic E-box sequence located near the SAE1 transcription start site [24]. Inhibition of SUMOylation was reported to disable MYC-induced cell proliferation and trigger G2/M cell cycle arrest in mouse and human MYC-driven lymphomas [25]. In addition, there is accumulating evidence that SUMO directly and indirectly regulates protein localization within the mitotic spindle. AMP-activated protein kinase (AMPK) inhibits protein synthesis through suppression of mammalian target of rapamycin complex 1 (mTORC1). SUMOylation of AMPKα1 attenuates AMPK activation and thereby promotes restoration of mTORC1 signaling [26]. Retinoblastoma protein (Rb) is a prototypical tumor suppressor. Hypophosphorylated Rb induces G0/G1 arrest by inhibiting the activity of E2F transcription factors, while hyperphosphorylated Rb releases E2F and drives the cell cycle from G0/G1 into S phase. SUMOylation of Rb causes Rb hyperphosphorylation and E2F-1 release [27].
X-box-binding protein 1 (XBP1) is a key transcription factor that regulates the endoplasmic reticulum (ER) stress response, a cytoprotective mechanism that deals with the accumulation of unfolded protein in the ER. When endoplasmic reticulum stress occurs, unspliced XBP1 mRNA is converted into mature mRNA, the transcription factor pXBP1 is translated, and the transcription of endoplasmic reticulum-related genes is activated to process unfolded proteins [28]. The SUMO-conjugating enzyme UBC9 specifically binds to the leucine zipper motif of pXBP1 and increases the stability of pXBP1. Our analysis provides insights into a possible mechanism in which MYC activation causes the activation of SUMOylation, which eventually results in the activation of E2F-targets, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response.
The major strengths of the present study are that it had a sufficiently large sample of TNBC patients to perform survival analysis based on SUMO1/2/3 proteins and that it developed a prognostic nomogram. In addition, some small-molecule drugs that inhibit SUMOylation have been considered for the treatment of cancer. The SUMO E1 inhibitor ML-792 is currently being tested in a phase 1 clinical trial for patients with metastatic solid tumors and lymphomas. In the current era of precision medicine, using prognostic biomarkers to select eligible patients and administer specific treatments is a promising strategy. Our findings suggest that the inhibition of SUMOylation could be a promising therapeutic strategy for the treatment of TNBC patients.
This study has several limitations. Firstly, all TNBC patients were Chinese and from a single center, so the findings of the present study may not be generalizable to all populations. Secondly, further research is warranted to elucidate the mechanisms underlying the regulation of SUMOylation in TNBC.

Conclusion
In summary, our findings showed that the pathway activity of SUMOylation and SUMO1/2/3 mRNA and protein levels were up-regulated in TNBC patients, based on TCGA, GEO and 212 TNBC specimens. The three-SUMO-based prognostic model can effectively classify TNBC patients into groups at low and high risk of disease recurrence. Moreover, our study showed that the SB classifier might be a useful predictive tool for TNBC patients treated with chemotherapy (Figure 9). Thus, the SB classifier potentially offers clinical value in directing personalized therapeutic regimen selection for TNBC patients. Meanwhile, our analysis provides insights into a possible mechanism in which MYC activation leads to the activation of SUMOylation, which eventually causes the activation of E2F-targets, mTORC1-signaling, Mitotic-spindle, G2M-checkpoint and unfolded protein response.

Tables
Due to technical limitations, table 1 is only available