The early-stage triple-negative breast cancer landscape derives a novel prognostic signature and therapeutic target

Triple-negative breast cancer (TNBC) is a highly heterogeneous disease. Patients with early-stage TNBCs have distinct likelihood of distant recurrence. This study aimed to develop a prognostic signature of early-stage TNBC patients to improve risk stratification. Using RNA-sequencing data, we analyzed 189 pathologically confirmed pT1-2N0M0 TNBC patients and identified 21 mRNAs that were highly expressed in tumor and related to relapse-free survival. All-subset regression program was used for constructing a 7-mRNA signature in the training set (n = 159); the accuracy and prognostic value were then validated using an independent validation set (n = 158). Here, we profiled the transcriptome data from 189 early-stage TNBC patients along with 50 paired normal tissues. Early-stage TNBCs mainly consisted of basal-like immune-suppressed subtype and had higher homologous recombination deficiency scores. We developed a prognostic signature including seven mRNAs (ACAN, KRT5, TMEM101, LCA5, RPP40, LAGE3, CDKL2). In both the training (n = 159) and validation set (n = 158), this signature could identify patients with relatively high recurrence risks and served as an independent prognostic factor. Time-dependent receiver operating curve showed that the signature had better prognostic value than traditional clinicopathological features in both sets. Functionally, we showed that TMEM101 promoted cell proliferation and migration in vitro, which represented a potential therapeutic target. Our 7-mRNA signature could accurately predict recurrence risks of early-stage TNBCs. This model may facilitate personalized therapy decision-making for early-stage TNBCs individuals.


Introduction
Triple-negative breast cancer is the most aggressive type of breast cancer, characterized by low/no expression of estrogen receptor (ER), progestogen receptor (PR) and human epidermal growth factor receptor 2 (HER2) [1]. Recent advances in breast cancer screening have significantly improved the diagnosis of early-stage breast cancer [2]. According to breast cancer registry, stage I breast cancer accounted for 25.8% of breast cancer patients in China [3]. However, outcomes for these patients may vary by tumor subtype and biological behavior [4]. For early-stage breast cancer, TNBC patients had a higher risk of mortality compared with other molecular subtypes [4][5][6]. Currently, adjuvant chemotherapy remains the only systematic treatment option for earlystage TNBCs. Chemotherapy is often considered even in pT1a-bN0M0 patients, but few benefits of survival could be achieved from chemotherapy [7]. In addition, due to the intertumor heterogeneity of TNBCs, similar chemotherapy may evoke different treatment response. Therefore, there is an urgent need to develop prognostic signatures to stratify early-stage TNBCs according to prognosis.
Previous studies have tried to use mRNA, micro-RNA, long noncoding RNA, or proteome profile to build prognostic signatures to assess the risk of recurrence and magnitude of chemotherapy benefit of node-negative breast cancer [8][9][10]. Nonetheless, these studies had two main drawbacks: first, most of the studies used Cox proportional hazards regression analysis to build prognostic models, which is not suitable for high-dimensional data (datasets with thousands of genes) [11]; Second, these studies enrolled both nodenegative and -positive TNBC patients. With a relatively small sample size of node-negative TNBC patients, it is challenging to generate a precise model for node-negative TNBCs.
In this study, we developed a 7-mRNA signature by LASSO (Least Absolute Shrinkage and Selection Operator) Cox regression model following all-subsets regression program to predict relapse-free survival (RFS) in a large cohort of early-stage TNBCs. We hypothesized that this signature could identify the high-risk subpopulation among early-stage TNBC and provide instructions for individual treatments.

Patients and samples
Previously we have constructed a multiomics TNBC cohort, containing 386 consecutive TNBC patients who underwent surgeries at the Department of Breast Surgery, Fudan University, Shanghai Cancer Center (FUSCC), from 1 January 2007 to 31 December 2014 [12]. To identify candidate mRNAs for prognostic signature development, we reviewed the 386 TNBC patients with high-throughput transcriptome data (GSE76250 and SRP157974) and 189 patients (50 patients had paired adjacent normal tissues) meet the following inclusion criteria were included as developing set in this prospective observational study, which was initiated in 2017 (Fig. S1). The inclusion criteria were the following: (1) Female patients with unilateral breast tumors and histologically confirmed invasive ductal carcinoma (IDC) and triplenegative subtype (ER-negative, PR-negative, and HER2negative); (2) Pathology confirmed stage pT2 or below (tumor size ≤ 5 cm); (3) Pathology confirmed pN0 (no cancer cells in the lymph nodes). Patients with isolated tumor cells (pN0i+) and micro metastases (pN1mi) in lymph nodes were excluded. (4) Patients who did not receive any type of treatment prior to surgery; (5) Follow-up time > 1 year. The follow-up of patients in developing set was completed on 30 June, 2017, and the median follow-up length was 47.6 months (interquartile range 38.5-59.3 months). To construct the training and validation set, a total of 317 consecutive early-stage TNBC patients treated in FUSCC between January 1, 2008 and December 31, 2015 were enrolled according to the inclusion criteria described above. A total of 330 frozen samples (317 tumor tissues and 13 paired normal tissues) from 317 TNBC patients preserved by the Department of Pathology in FUSCC were collected. There were 79 patients who were overlapped between the developing set and the above 317 patients.
The ER, PR and HER2 status were measured by immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) were confirmed by two experienced pathologists (Ruo-Hong Shui and Wen-Tao Yang), and the cut-off points were set as the guideline from the American Society of Clinical Oncology and the College of American Pathologists [13]. The pathologic stage was assigned based on the 8th edition of the AJCC anatomic staging system [14].
To train and validate a multi-gene signature, we randomly assigned the 317 patients into a training set (n = 159) and validation set (n = 158). The follow-up was completed on 31 December, 2017 and the median follow-up length was 39.3 months (interquartile range 25.6-55.2 months) in training set and 38.5 months (interquartile range 26.8-52.7 months) in the validation set. RFS events were defined as the first recurrence of invasive disease at a local, regional, or distant site; contralateral breast cancer; and death from any cause. Patients without events were censored at the last follow-up.
The independent ethics committee/institutional review board of FUSCC (Shanghai Cancer Center Ethics Committee) approved our study and written informed consent from patients in our study was obtained before enrollment.

RNA purification, reverse transcription and RT-qPCR
The RNeasy Plus Mini Kit (QIAGEN) was used to isolate total RNA from frozen tissues and cells and reverse transcribed to cDNA using GoScript Reverse Transcription Kit (Promega) according to the manufacturer's protocol. Realtime quantitative-polymerase chain reaction (RT-qPCR) was performed using SYBR Premix Ex Taq (TAKARA) and amplified on QuantStudio 7 Flex System (Applied Biosystems). The primers used for RT-qPCR were listed in Table S1. The relative expression value of each mRNAs was calculated as CT mRNA − (CT U6 + CT GAPDH )/2 (CT, cycle threshold).

Identification of candidate mRNAs
The detailed filtration process is illustrated in Fig. S1. The mRNAs that were differentially expressed between tumor and paired normal tissues and related to RFS were selected for further research using transcriptome data in the developing set (N = 189 containing 50 patients with paired adjacent normal tissues). In detail, the paired differentially expressed mRNA analysis was processed using the Bioconductor package 'Limma' (Linear models for microarray analysis); significant differentially expressed genes were defined as false discovery rate < 0.05 and fold change > 1.5 or < 0.33 for up or downregulation, respectively. The univariate Cox proportional hazards regression model was used to select mRNAs that were significantly correlated with RFS. Next, the LASSO Cox regression model analysis was processed to select the most useful prognostic mRNAs as candidate mRNAs with 200 bootstrap replicates [15].

Development and validation of the prognostic signature
The expression of the candidate mRNAs was measured using RT-qPCR in the training set (N = 159) and internal validation set (N = 158). In order to simplify the prognostic signatures, we used R package 'glmulti' to perform automated screening for the best combination of mRNAs based on Akaike information criterion (AIC) value and calculate the coefficients of each mRNAs in the training set [16]. The coefficients of each mRNAs were used to construct a risk score formula. The accuracy of this signature was assessed by time-dependent receiver operating characteristic (ROC) analysis, and the best cut-off score was selected according to Youden Index [17]. To validate this signature, the risk score of each patient in the validation set was calculated using the formula generated in training set. In addition, we used transcriptome data from METABRIC database as a preliminary external validation. As the gene expression level in METABRIC database was profiled using RNA-sequencing (RNA-seq), we could not use the same coefficients derived from training set to calculate the risk score of patients from METABIC database. Therefore we used Cox regression model to recalculate the coefficients of 7 mRNAs and the cut-off of risk score was selected according to Youden Index.

Gene set enrichment analysis (GSEA)
GSEA (version 2.2.0) was performed using the hallmark gene set and a preranked differential expression gene list [18]. The differential gene analysis between early-stage and late-stage was conducted using DEseq2 package.

Cell cultures and siRNA transfection
Human breast cancer cell lines MDA-MB-231, BT-549 and human embryonic kidney cells HEK293T were obtained from American Type Culture Collection. These human cell lines have been authenticated using STR profiling and monitoring mycoplasma contamination. The Lipofectamine RNAiMAX Transfection Reagent (Thermo Fisher Scientific) was used to transfect small interfering RNAs (siRNAs). The target sequences are listed in Table S2. The transfection procedure was performed according to the manufacturer's instructions.

Proliferation assay
For the proliferation assay, cells were cultured in 6-well plates for 24 h and transfected with siRNAs. After 24 h, 1000 cells per well were seeded in 96-well plates. The cell confluence was monitored using an IncuCyte Live-Cell Analysis Systems (IncuCyte ZOOM System, ESSEN Bioscience) and imaged every 12 h. The cell confluence was measured by IncuCyte ZOOM software, and the assays were performed in triplicate.

Migration assay
For the migration assay, 5 × 10 4 cells in serum-free medium were seeded on the top of the transwell chamber (pore size, 8um, BD Bisosciences) and 600 μl medium containing 10% FBS was added to the bottom chamber. After incubation for 8-10 h, cells that migrated to the opposite side of the membrane were fixed in 4% paraformaldehyde for 30 min and stained with crystal violet for additional 30 min. ImageJ was used to quantify the number of cells migrated to the opposite side per field.

Immunohistochemistry (IHC)
A total of 56 primary TNBC specimens were obtained from the Department of Pathology, Fudan University Shanghai Cancer Center. IHC staining was performed as previously described [19]. The anti-TMEM101 (Abscitech, AB-35251-2, 1:100) primary antibody was used. Slides were evaluated using light microscopy and scored with a microscope by pathologists with the H-score, which evaluated staining intensity (0 to 3) and the percentage of positively stained cells (0 to 1), with a final score ranging from 0 to 3.

Statistical analysis
The t-test was used to compare the differences between the two groups for continuous variables and χ 2 test for categorical variables. We used the Kaplan-Meier method to perform univariate survival analysis and log-rank test to calculate the hazard ratio and compare two survival curves. For multivariate survival analysis, the multivariate Cox proportional hazards regression model was used to test whether the signature was an independent prognostic factor associated with RFS. All statistical analyses were performed with R software version 3.5.3. or GraphPad Prism 8. A P value < 0.05 was considered to be statistically significant.

Transcriptome characteristic of early-stage TNBCs
We reviewed 17,877 breast cancer patients diagnosed in our center from January 1, 2008 and December 31, 2015 and 663 of them were at pT1-2N0M0 TNBC which accounted for 28.8% of all TNBC patients. Survival analysis showed that pT1-2N0M0 TNBCs (early-stage) showed great survival benefits compared to other TNM stage (late-stage) (Fig. 1A). According to our previous research, TNBC could be classified into 4 mRNA-based TNBC intrinsic subtype, including basal-like and immune-suppressed (BLIS), luminal androgen receptor (LAR), immunomodulatory (IM) and mesenchymal stem-like (MSL) subtype [20]. We found that more early-stage TNBCs were classified as BLIS (Fig. 1B,   Fig. 1 The transcriptome features of early-stage TNBCs. A Kaplan-Meier curves of relapse-free survival in early-stage and late-stage TNBCs. B, C Distribution of TNBC molecular subtypes (B) and HRD scores (C) between early-stage and late-stage TNBCs. D, Representative gene set enrichment analysis plot showing gene set unregulated in the early-stage versus the late-stage TNBCs. RFS relapse-free survival; BLIS basal-like and immune-suppressed; IM immunomodulatory; LAR luminal androgen receptor; MES mesenchymal-like; HRD homologous recombination deficiency; EMT epithelial-mesenchymal transition P = 0.0048, early-stage vs late-stage, 46.5% vs 28.0%). BLIS patients had generally high homologous recombination deficiency (HRD) score a copy-number-based biomarker to identify patients with have DNA repair defects [20,21]. As we expected early-stage patients had higher HRD score compared with patients of late-stage (Fig. 1C, P = 0.0008). Further GSEA analysis showed that gene set of cell cycle and epithelial-mesenchymal transition was enriched in earlystage TNBCs (Fig. 1D).

Construction of a multi-gene prognostic signature for early-stage TNBCs
We identified 996 mRNAs differently expressed between 50 paired tumor and normal breast samples that were significantly related to RFS in developing set (n = 189) (Fig. S1). LASSO Cox regression model was used to select candidate mRNAs with 200 bootstrap replications. Twenty-one candidate mRNAs were eligible for developing a prognostic signature after the filtrating procedure (Fig. S2A, B). Next, the expressions of the 21 mRNAs were further evaluated in training (n = 159) and validation set (n = 158) by RT-qPCR.
In order to further simplify the prognostic signature, the genetic algorithm was used to perform automated screening for the best signature based on the AIC value (Methods). A total of 542,358 candidate signatures were identified, and finally, a 7-mRNA signature with the lowest AIC value was returned as the best signature to calculate the risk score of each patient based on the relative expression level of the seven mRNAs in the training set (Fig. S2C). The formula used to calculate the risk score was the following: Risk score = 1 .

Prognostic value of the 7-mRNA signature
With the risk score formula, patients in the training set were divided into high-(11.3%, N = 18) or low-risk (88.7%, N = 141) groups according to the optimum cut-off score (− 0.87) determined by the time-dependent ROC analysis. The distribution of the risk score and RFS status showed that patients with higher risk scores had a higher recurrence rate and worse survival ( Fig. 2A, B). Yet, there was no significant difference in traditional clinicopathological factors between the two groups ( Table 1). The time-dependent ROC analysis demonstrated that the prognostic accuracy of this signature was similar at different follow-up times (Fig. 2C). Patients in the validation set were then stratified into the low-risk group (84.2%, N = 133) and high-risk group (15.8%, N = 25) using the same procedure as in the training cohort. Similarly, the high-risk group had worse survival compared to the low-risk group in the validation set ( Fig. 2D-F). In addition, the 7 mRNAs also had good prognostic value in the METABRIC cohort and stratified patients into two groups with different outcomes (Fig. S3A and B).

Multivariate analysis and subgroup analysis
Multivariate Cox proportional hazards regression analysis showed that after adjustment of traditional clinicopathological features, the 7-mRNA signature was still an independent prognostic factor associated with RFS (Table 2; HR 15.68; 95% CI 5.50-44.70; P < 0.001). In addition, none of the traditional clinicopathological features was associated with RFS in the multivariate analysis except for lymphovascular invasion in the training set. Similar results were observed in the validation set. Patients with high-risk had a higher risk of recurrence (Table 2, HR 5.87; 95% CI 2.07-16.66; P = 0.001), and lymphovascular invasion was not a significant prognostic factor in the validation set.
Subgroup analysis was proposed to verify the performance of this signature in different subpopulations. The forest plot illustrated that the 7-mRNA signature had good prognostic value in predicting RFS independent of tumor size, Ki-67, tumor grade, and lymphovascular invasion (Fig. S3C).

Comparison with other prognostic factors
To compare the performance of the 7-mRNA signature and other traditional clinicopathological features, we conducted time-dependent ROC analyses (the time was set as 3 years) and calculated the area under curve (AUC) on both the signature and traditional clinicopathological features. The results showed that the 7-mRNA signature had better prognostic value in predicting RFS (Fig. 3A). In line with the current statistical consensus, the predictor model with AUC over 0.7 could be considered as adequate for clinical implications [22,23]. Traditional clinicopathological features including tumor size, tumor grade, Ki-67 status and lymphovascular invasion had poor prognostic value in earlystage TNBCs with AUC lower than 0.7. This finding was further validated in the validation set, as the 7-mRNA signature had higher AUC; however, when compared to tumor size and lymphovascular invasion, the difference was not statistically significant (Fig. 3B). Therefore, the 7-mRNA signature could be a better prognostic model than clinicopathological prognostic features.

The biological function of TMEM101 in the 7-mRNA signature
To further study the biological function of 7 mRNAs in the prognostic signature, we explored their role in cell migration. Knockdown of each mRNAs was conducted by a pool of three target-specific siRNAs. The transwell assay revealed that the downregulation of TMEM101 or LAGE3 restrained the migration of the MDA-MB-231 cell line (Fig. S4A). Furthermore, Kaplan-Meier analysis demonstrated that high expression of TMEM101 was significantly associated with shorter RFS (Fig. 4A; P = 0.019), while LAGE3 was not related to patients survival (Fig. S4B, C). Therefore, TMEM101 could be a good candidate target or prognostic biomarker of TNBC.    The exact function of TMEM101 in human still remains unknown. Some studies have suggested an association between the NF-κB pathway and TMEM101 [24,25]. In this study, we found higher expression of TMEM101 in tumor samples than paired normal samples ( Fig. 4B; P = 0.003) and TMEM101 showed great mRNA-protein correlations in TNBC tumor samples (Fig. S4D). We also designed three siRNAs targeting TMEM101, and two siRNAs with the highest interference efficiently inhibited the migration ability of MDA-MB-231 and BT-549 cell lines (Fig. 4C, D, F). Similarly, the downregulation of TMEM101 significantly impaired the proliferation of tumor cells in vitro (Fig. 4E, G). In addition, TMEM101 depletion could also decrease the mRNA level of NF-κB downstream genes [26] (Fig. S4E). Collectively, these data suggest that knocking down TMEM101 could restrain migration and proliferation of TNBC cell lines in vitro, thus revealing it as a potential therapeutic target.

Discussion
In this study, we developed and validated a 7-mRNA signature based on transcriptome analysis to predict prognosis for node-negative TNBC. The prognostic value of this signature was more efficient than traditional clinicopathological features, including tumor size, tumor grade, Ki-67 status, and lymphovascular invasion status. Furthermore, we identified that TMEM101 was related to poor prognosis and promoted of TNBC progression.
The prognosis of node-negative TNBC patients is rarely associated with clinical factors, including age, histological grade, and tumor size [27]. Previous retrospective research showed that lymphovascular invasion was the only traditional parameter associated with the survival of pT1a-b node-negative TNBC [28]. Our results also revealed that lymphovascular invasion could reflect worse survival of Several studies constructed prognostic signatures to stratify node-negative TNBC into different risk groups, to provide de-escalating or escalating treatments for earlystage TNBCs. Liu and colleagues constructed an 11-protein signature from 126 node-negative and adjuvant chemotherapy-naive TNBCs [8]. According to this signature, 60% early-stage TNBC were with relatively low recurrence risks, while over 80% of patients in our cohort were identified to have low recurrence risks (Table 1). However, it is difficult to compare our results with those of Liu and colleagues [8] because of the different inclusion criteria. Most of the patients enrolled in our study received adjuvant chemotherapy. Furthermore, Wang and colleagues constructed a 76-gene signature (16 genes for ER-negative; 60 genes for ER-positive) to predict distant metastasis for node-negative breast cancer patients in two cohorts Representative images and the number of migrated MDA-MB-231 (D) and BT-549 cells (F) interfered with TMEM101 siRNAs. In vitro growth curves of MDA-MB-231 (E) and BT-549 (G) interfered with TMEM101 siRNAs for 5 days. Scale bar, 100 µm. NC negative control; TM TMEM101. *P < 0.05; **P < 0.01; ***P < 0.001 1 3 (contained 286 and 180 lymph node-negative breast cancer patients, respectively); however, the number of patients with ER-negative tumors (77 and 16 patients, respectively) in their study was too small to perform a subgroup analysis [10,29]. Also, in this study, ER-negative patients included both HER2-positive and TNBC subtype, which had entirely different biological features [30]. Therefore, the first strength of our prognostic signature compared with previous studies is the large cohort of node-negative TNBC patients. Second, we used RT-qPCR to construct the signature in order to apply it in clinical practice. Profiling of multiple gene expression by RT-qPCR or microarray, such as PAM50 assay, 21-gene Oncotype DX and 70-gene MammaPrint signature, has been widely used in clinical practice [31]. High-throughput data are a timeconsuming and costly assay which might not be easily used in clinical practice. Liu and colleagues constructed an mRNA-lncRNA signature by RT-qPCR to identify highrisk TNBC patients, and a clinical trial (NCT02641847) based on this signature was conducted in our center [32]. Therefore, from technical aspects, our signature could be widely used in our center.
The literature review showed that the seven mRNAs incorporated in our signature and their potential roles in cancer have already been reported. For example, CDKL2 was reported to promote breast cancer epithelial-mesenchymal transition through the up-regulation of ZEB1 expression in breast cancer [33]. A two-gene epigenetic signature revealed that higher methylation of CDKL2 was related to non-response to neoadjuvant chemotherapy in triple-negative breast cancer [34]. KRT5 encodes a member of the keratin gene family, which was identified as a basal marker in cancer research and related to worse survival, especially in node-negative breast cancer [30,[35][36][37][38]. In our study KRT5 served as a predictor of worse survival in early-stage TNBC, but the in vitro migration assay illustrated that KRT5 had no impact on the migration of TNBC cell lines. We speculated that it may regulate other biological functions in TNBC, which need to be further investigated. Aggrecan encoded by ACAN is an important component for cartilage structure, which may facilitate extracellular matrix (ECM) remodeling [39,40]. Although the function of aggrecan in cancer is unknown, in breast cancer, ECM components are significantly dysregulated and associated with tumor progression [41]. TMEM101 has been rarely reported in cancer research. Moreover, using a cDNA library, Matsuda and colleagues revealed that 83 genes could activate the NF-κB pathway, including TMEM101 [24]. Recently, Khan and colleagues demonstrated that TMEM101 could increase the expression of NF-κB in zebrafish ovarian samples [25]. In our study, we first demonstrated that the expression of TMEM101 in TNBC was higher in tumors than in adjacent normal tissues, which suggests that TMEM101 could be used as a prognostic biomarker at early stage of TNBCs. Furthermore, interference expression of TMEM101 could inhibit the migration and proliferation abilities of TNBC cell lines.
This study has a few limitations. First, most of the patients enrolled in our study received adjuvant chemotherapy containing taxane and (or) anthracycline. Therefore, it was not accurate to identify the subpopulation of early-stage TNBCs in whom chemotherapy might be entirely omitted. However, according to our previous research, addition of taxane to an anthracycline-based adjuvant chemotherapy cannot further improve the RFS of pT1N0M0 TNBC patients. Therefore early-stage TNBC patients with low recurrence risk identified by this signature had the opportunity to omit taxane in the adjuvant chemotherapy [28]. Further randomized prospective studies should verify our hypotheses that patients with a high risk of relapse could benefit from adding chemotherapy such as capecitabine, which has been shown to bring additional survival benefits for TNBCs in clinical trials [42,43], whereas patients in low-risk group could avoid taxanecontained chemotherapy. Second, as for using the RT-qPCR platform, it is inappropriate to use public high-throughput data sets as independent external validation. Therefore, to further validate the accuracy and stability of our signature, we have already started collecting patients' samples and follow-up information at our center since 2018.

Conclusion
In summary, our 7-mRNA prognostic signature could identify early-stage TNBC patients at high recurrence risks. The 7-mRNA signature could be a powerful tool for oncologists to individualize treatment to patients' early-stage TNBCs. Subsequent clinical trials are needed to further validate the signature.