Transcriptome Analysis Derives a Novel Prognostic 7-mRNA Signature in Early-Stage Triple-Negative Breast Cancer

Background: Triple-negative breast cancer (TNBC) is a highly heterogeneous disease and patients with early-stage TNBCs have distinct likelihood of distant recurrence. Methods: In this study, We extracted transcriptome data for 189 pathologically con�rmed pT1-2 node-negative TNBC patients at Fudan University Shanghai Cancer Center. Candidate mRNAs were �ltered, which was followed by differential expressed mRNAs analysis, survival analysis, and LASSO Cox regression model. All-subsets regression program was used for constructing a multi-mRNA signature in the training set (n=159); the accuracy and prognostic value were then validated using an independent validation set (n=158). Results: Here, we pro�led the transcriptome data from 189 early-stage TNBC patients along with 50 paired normal tissues, and developed a prognostic signature based on seven mRNAs (ACAN, KRT5, TMEM101, LCA5, RPP40, LAGE3, CDKL2).In both the training (n=159) and validation cohorts (n=158), the signature could identify patients with relatively high recurrence risks and serve as an independent prognostic factor. Furthermore, the signature had better prognostic value than traditional clinicopathological features in both sets. Among the seven mRNAs, TMEM101 was identi�ed as a prognostic biomarker of early-stage TNBC. Additional cell experiments suggested that TMEM101 could facilitate migration and proliferation of TNBC cells. Conclusions: Our 7-mRNA signature could accurately predict recurrence risks of early-stage TNBCs. Clinical and genomic low risk TNBC patients may safely avoid adjuvant chemotherapy.


Background
Triple-negative breast cancer (TNBC), which is a breast cancer subtype that lacks estrogen receptor (ER), progestogen receptor (PR) and human epidermal growth factor receptor 2 (HER2), is the most aggressive type of breast cancer [1].Recent advances in breast cancer screening have signi cantly improved the diagnosis of early breast cancer, such as pT1-2 and node-negative breast cancers [2].However, outcomes for these patients may vary by tumor subtype and biological behavior [3].Early-stage TNBC represents a clinically high-risk group associated with a more aggressive clinical course compared with other breast cancer subtypes [3][4][5].Due to the absence of effective tools, genomic low-risk early-stage TNBC has not been identi ed who might not be bene ted from chemotherapy.So far, many efforts have been dedicated to identifying low clinical risk and high genomic risk subtypes based on genomic pro ling.The multi-gene signature assay, Oncotype DX, is a novel genomic tumor pro ling tool that determines the expression of 21 tumor-associated genes which has a high value in guiding node-negative hormone-receptor (HR)-positive breast cancer patients' individualized treatments [6].However, there has no precise approaches to tailoring therapy of node-negative TNBCs [7].Retrospective researches have reported the opportunity of de-escalating chemotherapy in pT1N0M0 TNBC, such as offering chemotherapy regimen with low toxicity [8], eliminating taxane [9], or even eliminating chemotherapy altogether [10].For pT2N0M0, HR-patients have worse survival than HR+ patients, which implied that some high-risk patients need escalating chemotherapy or frequent follow-up [4].Thus, developing an accurate and stable prognostic signature could add signi cant value to individual treatments of node-negative TNBCs.
Previous studies have tried to use mRNA, micro-RNA, long-noncoding RNA, or proteome pro le building prognostic signatures to assess the risk of recurrence and magnitude of chemotherapy bene t of nodenegative breast cancer [11][12][13].Nonetheless, these studies had two main drawbacks: rst, most of the studies used Cox proportional hazards regression analysis to build prognostic models, which is not suitable for data with the high variable to sample size ratios [14].Second, these studies enrolled both node-negative and -positive TNBC patients.With a relatively small sample size of node-negative patients, it is challenging to generate a precise model for node-negative TNBCs.
In this study, we developed a new gene-expression signature by LASSO (Least Absolute Shrinkage and Selection Operator) Cox regression model to predict relapse-free survival (RFS) in a large cohort of nodenegative TNBCs.We hypothesized that this signature could identify the high-risk subpopulation among node-negative TNBC and provide instructions for individual treatments.

Patients and samples
A total of 317 consecutive patients treated in the Department of Breast Surgery at Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008 and December 31, 2015 were enrolled in this prospective observational study, which was initiated in 2017.The inclusion criteria were the following: (1) female patients with unilateral breast tumors and histologically con rmed invasive ductal carcinoma (IDC) and triple-negative subtype (ER-, PR-and HER2-); (2) tumor size < 5cm and no lymph-node metastasis or distant metastasis con rmed by pathology; (3) patients who did not receive any type of treatment prior to surgery; (4) su cient frozen tissues for RNA puri cation.
The ER, PR and HER2 status measured by immunohistochemistry (IHC) and uorescence in situ hybridization (FISH) were con rmed by two experienced pathologists (Ruo-Hong Shui and Wen-Tao Yang), and the cut-off points were set as the guideline from the American Society of Clinical Oncology and the College of American Pathologists [15].The pathologic stage was assigned based on the 8 th edition of the AJCC anatomic staging system [16].All of the breast cancer specimens were con rmed to contain more than 80% of tumor cells.A total of 330 frozen samples (including 13 paired normal tissues) from 317 TNBC patients preserved by the Department of Pathology in FUSCC (Shanghai, P.R. China) were examined.
To develop candidate mRNAs from high-throughput transcriptome sequencing data, we enrolled another 386 consecutive patients who underwent surgical treatment in the Department of Breast Surgery at Fudan University Shanghai Cancer Center (FUSCC) between 1 January 2007 and 31 December 2014 with Affymetrix GeneChip Human Transcriptome Array 2.0 (HTA 2.0) (n=141) and RNA-seq (n=245) data from our previous researches [17,18].Using the same criteria as described above (except for su cient frozen tissues), 189 patients (50 patients had paired adjacent normal tissues) were enrolled as developing set in this study.The follow-up was completed on 30 June, 2017, and the median follow-up length was 47.6 months (interquartile range, 38.5-59.3months).There were 79 patients who overlapped between the developing set and the above 317 patients.
To train and validate a multi-gene signature, we randomly assigned the 317 patients into a training set (n=159) and validation set (n=158).The follow-up was completed on 31 December, 2017 and the median follow-up length was 39.3 months (interquartile range, 25.6-55.2months) in training set and 38.5 months (interquartile range, 26.8-52.7 months) in the validation set.RFS events were de ned as the rst recurrence of invasive disease at a local, regional, or distant site; contralateral breast cancer; and death from any cause.Patients without events were censored at the last follow-up.
The independent ethics committee/institutional review board of FUSCC (Shanghai Cancer Center Ethics Committee) approved our study and written informed consent from patients in our study was obtained before enrollment.

RNA puri cation, reverse transcription and RT-qPCR
The RNeasy Plus Mini Kit (QIAGEN) was used to isolate total RNA from frozen tissues and cells and reverse transcribed to cDNA using GoScript Reverse Transcription Kit (Promega) according to the manufacturer's protocol.Real-time quantitative-polymerase chain reaction (RT-qPCR) was performed using SYBR Premix Ex Taq (TAKARA) and ampli ed on QuantStudio 7 Flex System (Applied Biosystems).The primers used for RT-qPCR were listed in Table S1.The relative expression value of each mRNAs was calculated as CT mRNA -(CT U6 +CT GAPDH )/2 (CT, cycle threshold).

Identi cation of candidate mRNAs
The detailed ltration process is illustrated in Figure 1.The mRNAs that were differentially expressed between tumor and paired normal tissues and related to RFS were selected for further research using transcriptome data in the developing set (N=189 containing 50 patients with paired adjacent normal tissues).The differentially expressed mRNA screening was processed using the Bioconductor package 'Limma' (Linear models for microarray analysis); signi cant differentially expressed genes were de ned as false discovery rate < 0.05 and fold change > 1.5 or < 0.33 for up or down-regulation, respectively.The univariate Cox proportional hazards regression model was used to select mRNAs that were signi cantly correlated with RFS.Next, the LASSO Cox regression model analysis was processed to select the most useful prognostic mRNAs as candidate mRNAs with 200 bootstrap replicates [19].

Development and validation of the prognostic signature
The expression of the candidate mRNAs was measured using RT-qPCR in the training set (N=159) and internal validation set (N=158).In order to simplify the prognostic signatures, we used R package 'glmulti' to perform automated screening for the best combination of mRNAs based on Akaike information criterion (AIC) value and calculate the coe cients of each mRNAs in the training set [20].The coe cients of each mRNAs were used to construct a risk score formula.The accuracy of this signature was assessed by time-dependent receiver operating characteristic (ROC) analysis, and the best cut-off score was selected according to Yoden Index [21].To validate this signature, the risk score of each patient in the validation set was calculated using the formula generated in training set.

Cell cultures and siRNA transfection
Human breast cancer cell lines MDA-MB-231(RRID:CVCL_0062), BT-549 (RRID:CVCL_1092) and human embryonic kidney cells HEK293T (RRID: CVCL_0063) were obtained from American Type Culture Collection.These human cell lines have been authenticated using STR pro ling and monitoring mycoplasma contamination.The Lipofectamine RNAiMAX Transfection Reagent (Thermo Fisher Scienti c) was used to transfect small interfering RNAs (siRNAs).The target sequences were listed in Table S2.The transfection procedure was performed according to the manufacturer's instructions.

Proliferation assay
For the proliferation assay, cells were cultured in 6-well plates for 24 hours and transfected with siRNAs.
After 24 hours, 1000 cells per well were seeded in 96-well plates.The cell con uence was monitored using an IncuCyte Live-Cell Analysis Systems (IncuCyte ZOOM System, ESSEN Bioscience) and imaged every 12 hours.The cell con uence was measured by IncuCyte ZOOM software, and the assays were performed in triplicate.

Migration assay
For the migration assay, 5x10 4 cells in serum-free medium were seeded on the top of the transwell chamber (pore size, 8um, BD Bisosciences) and 600μl medium containing 10% FBS was added to the bottom chamber.After incubation for 8-10 hours, cells that migrated to the opposite side of the membrane were xed in 4% paraformaldehyde for 30 minutes and stained with crystal violet for additional 30 minutes.
ImageJ was used to quantify the number of cells migrated to the opposite side per eld.

Western Blot
Cells were lysed in lysis buffer (50 mM tris [pH 8.1], 1 mM EDTA, 1% SDS, 1 mM fresh dithiothreitol, sodium uoride, and leupeptin).The cell lysates were boiled in SDS-PAGE loading buffer for 15 minutes.The proteins were separated by SDS-PAGE and transferred to polyvinylidene di uoride membranes (Millipore).

Statistical Analysis
The t-test was used to compare the differences between the two groups for continuous variables and χ2 test for categorical variables.We used the Kaplan-Meier method to perform univariate survival analysis and log-rank test to calculate the hazard ratio and compare two survival curves.For multivariate survival analysis, the multivariate Cox proportional hazards regression model was used to test whether the signature was an independent prognostic factor associated with RFS.All statistical analyses were performed with R software version 3.5.3.A P value < 0.05 was considered to be statistically signi cant.

Results
Construction of a multi-gene prognostic signature for early-stage TNBCs We identi ed 996 mRNAs differently expressed between 50 paired tumor and normal breast samples that were signi cantly related to survival in developing set (n=189) (Figure 1).LASSO Cox regression model was used to select candidate prognostic mRNAs with 200 bootstrap replications.Twenty-one candidate mRNAs were eligible for developing a prognostic signature after the ltrating procedure (Figure S1a).Next, the expressions of the 21 mRNAs were further evaluated in training (n=159) and validation set (n=158) by RT-qPCR.
In order to further simplify the prognostic signature, the genetic algorithm was used to perform automated screening for the best signature based on the AIC value (Methods).A total of 542,358 candidate signatures were identi ed, and nally, a 7-mRNA signature with the lowest AIC was returned as the best signature to calculate the risk score of each patient based on the expression level of the seven mRNAs in the training set (Figure S1b, c (88.7%, N=141) groups according to the optimum cut-off score (-0.87) determined by the time-dependent ROC analysis.The distribution of the risk score and RFS status showed that patients with higher risk scores had a higher recurrence rate and worse survival (Figure 2a, b).Yet, there was no signi cant difference in traditional clinicopathological factors between the two groups (Table 1).The time-dependent ROC analysis demonstrated that the prognostic accuracy of this signature was similar at different follow-up times (Figure 2c).Patients in the validation set were then strati ed into the low-risk group (84.2%,N=133) and high-risk group (15.8%,N=25) using the same procedure as in the training cohort.Similarly, the high-risk group had worse survival compared to the low-risk group in the validation set (Figure 2d-f).

Multivariate analysis and subgroup analysis
Multivariate Cox proportional hazards regression analysis showed that after adjustment of traditional clinicopathological features, the 7-mRNA signature was still an independent prognostic factor associated with RFS (Table 2; HR=15.68;95% CI, 5.50-44.70;P<0.001).In addition, none of the traditional clinicopathological features was associated with RFS in the multivariate analysis except for lymphovascular invasion in the training set.Similar results were observed in the validation set.Patients with high-risk had a higher hazard of recurrence (Table 2, HR=5.87;95% CI, 2.07-16.66;P=0.001), and lymphovascular invasion was not a signi cant prognostic factor in the validation set.
Subgroup analysis was proposed to verify the performance of this signature in different subpopulations.
The forest plot illustrated that the 7-mRNA signature had good prognostic value in predicting RFS independent of tumor size, Ki-67, tumor grade, and lymphovascular invasion (Figure S2).

Comparison with other prognostic factors
To compare the performance of the 7-mRNA signature and other traditional clinicopathological features, the area under curve (AUC) was calculated on signature and traditional clinicopathological features.The results showed that the 7-mRNA signature had better prognostic value in predicting RFS (AUC=0.766; Figure 3a).However, other clinicopathological features including tumor size, tumor grade, Ki-67 status and lymphovascular invasion had poor prognostic value in early stage TNBCs with AUC lower than 0.7.This nding was further validated in the validation set, as the 7-mRNA signature had higher AUC; however, when compared to tumor size and lymphovascular invasion, the difference was not statistically signi cant (Figure 3b).
The biological function of TMEM101 in the 7-mRNA signature To further study the biological function of 7 mRNAs in the prognostic signature, we explored their role in cell migration.Knockdown of each mRNAs was conducted by a pool of three target-speci c siRNAs.The transwell assay revealed that the downregulation of TMEM101 or LAGE3 restrained the migration of the MDA-MB-231 cell line (Figure S3a).Furthermore, Kaplan-Meier analysis demonstrated that high expression of TMEM101 was signi cantly associated with RFS (Figure 4a; P=0.019), while LAGE3 was not related to patients survival (Figure S3b).Therefore, TMEM101 resulted as a good candidate target or prognostic biomarker of TNBC.
The exact function of TMEM101 in human still remains unknown.Some studies have suggested an association between the NF-κB pathway and TMEM101 [22,23].In this study, we found higher expression of TMEM101 in tumor samples than in paired normal samples, which further suggested that TMEM101 might be a prognostic biomarker of early-stage TNBCs (Figure 4b; P=0.003).We also designed three siRNAs targeting TMEM101, and two siRNAs with the highest interference e cient inhibited the migration ability of MDA-MB-231 and BT-549 cell lines (Figure 4c, d, f).Besides, the downregulation of TMEM101 signi cantly impaired the proliferation of tumor cells in vitro (Figure 4e, g).Collectively, these data suggest that knocking down TMEM101 could restrain migration and proliferation of TNBC cell lines in vitro, thus revealing it as a potential therapeutic target.

Discussion
In this study, we developed and validated a 7-mRNA signature based on transcriptome analysis to predict prognosis for node-negative TNBC.The prognostic value of this signature was more e cient than traditional clinicopathological features, including tumor size, tumor grade, Ki-67 status, and lymphovascular invasion status.Furthermore, we identi ed that TMEM101 was related to poor prognosis and promoted of TNBC progression.
The prognosis of node-negative TNBC patients is rarely associated with clinical factors, including age, histological grade, and tumor size [24].Previous retrospective research showed that lymphovascular invasion was the only traditional parameter associated with the survival of pT1a-b node-negative TNBC [9].
Our results also revealed that lymphovascular invasion could re ect worse survival of node-negative TNBCs, but the 7-mRNA signature had a better prognostic value of RFS.
Several studies constructed prognostic signatures to stratify node-negative TNBC into different risk groups, to provide de-escalating or escalating treatments for early-stage TNBCs.Liu and colleagues constructed an 11-protein signature from 126 node-negative and adjuvant chemotherapy-naive TNBCs [11].According to this signature, 60% early-stage TNBC were with low recurrence risks, while over 80% of patients in our cohort were identi ed to have low recurrence risks (Table 1).However, it is di cult to compare our results with those of Liu and colleagues [11] because of the different inclusion criteria.Most of the patients enrolled in our study received adjuvant chemotherapy.Furthermore, Wang and colleagues constructed a 76-gene signature (16 genes for ER-negative; 60 genes for ER-positive) to predict distant metastasis for nodenegative breast cancer patients in a 286 cohort and a 180 cohort; however, the number of patients with ERnegative tumors (77 and 16 patients respectively) in their study was too small to perform a subgroup analysis [13,25].Also, in this study, ER-negative patients included both HER2-positive and TNBC subtype, which had entirely different biological features [26].Therefore, the rst strength of our prognostic signature compared with previous studies is the large cohort of node-negative TNBC patients.Second, we used RT-qPCR to construct the signature in order to apply it in clinical practice.High-throughput data was used to construct prognostic signatures, but it might not be widely used in clinical practice because of the relatively high costs of transcriptome sequencing.Liu and colleagues constructed an mRNA-lncRNA signature by RT-qPCR to identify high-risk TNBC patients, and a clinical trial (NCT02641847) based on this signature was conducted in our center [27].Therefore, from technical aspects, our signature could be widely used in our center.
The literature review showed that the seven mRNAs incorporated in our signature and their potential roles in cancer have already been reported.For example, CDKL2 was reported to promote breast cancer epithelialmesenchymal transition through the up-regulation of ZEB1 expression in breast cancer [28].A two-gene epigenetic signature revealed that higher methylation of CDKL2 was related to non-responders to neoadjuvant chemotherapy in triple-negative breast cancer [29].KRT5 encoded a member of the keratin gene family, which was identi ed as a basal marker in cancer research and related to worse survival, especially in node-negative breast cancer [26, [30][31][32][33].Aggrecan encoded by ACAN is an important component for cartilage structure, which may facilitate ECM remodeling [34,35].Although the function of aggrecan in cancer is unknown, in breast cancer, ECM components are signi cantly dysregulated and associated with tumor progression [36].TMEM101 has been rarely reported in cancer research.Moreover, using a cDNA library, Matsuda and colleagues revealed that 83 genes could activate the NF-κB pathway, including TMEM101 [22].Recently, Khan and colleagues demonstrated that TMEM101 could increase the expression of NF-κB in zebra sh ovarian samples [23].In our study, we rst demonstrated that the expression of TMEM101 in TNBC was higher in tumors than in adjacent normal tissues, which suggests that TMEM101 could be used as a prognostic biomarker at early stage of TNBCs.Furthermore, interference expression of TMEM101 could inhibit the migration and proliferation abilities of TNBC cell lines.
This study has a few limitations.First, most of the patients enrolled in our study received adjuvantchemotherapy containing taxane and (or) anthracycline.Therefore, it was not accurate to identify the subpopulation of early-stage TNBCs in whom chemotherapy might be entirely omitted.Yet, according to our previous research that anthracycline-based taxane-free regimen might be su cient for pT1N0M0 TNBCs so that patients with low recurrence risk had the opportunity to omit taxane [9].Further randomized prospective studies should verify our hypotheses that patients with a high risk of relapse could bene t from adding chemotherapy such as capecitabine, which has been shown to bring additional survival bene ts for TNBCs in clinical trials [37,38], whereas patients in low-risk group could avoid taxane contained chemotherapy.
Second, as for using the RT-qPCR platform, it is inappropriate to use public high-throughput data sets as independent external validation.Therefore to further validate the accuracy and stability of our signature, we have already started collecting patients' samples and follow-up information at our center since 2018.

Conclusion
In summary, since about 5-15% of node-negative TNBC patients developed tumor recurrence, our 7-mRNA prognostic signature could identify patients at high recurrence risks.The 7-mRNA signature could be a powerful tool for oncologists to escalate systemic therapy for high-risk node-negative TNBC.Subsequent clinical trials are needed to further validate the signature. The ).The formula used to calculate the risk score was the following: Risk score=1.108*ACAN-0.213*KRT5-0.315*TMEM101-0.464*LCA5+0.446*RPP40-0.373*LAGE3-0.257*CDKL2.Prognostic value of the 7-mRNA signature With the risk score formula, patients in the training set were divided into high-(11.3%,N=18) or low-risk

Figure 3 Comparison
Figure 3

Table 1 .
Clinicopathological features of three sets of pT1-2N0M0 TNBC patients strati ed according to the 7-mRNA signature

Table 2 .
Multivariate Cox regression analysis of 7-mRNA signature and traditional clinical features associated with RFS Abbreviations: RFS, relapse-free survival; HR, hazard ratio a Adjusted by Cox proportional hazards models including age, tumor size, tumor grade, Ki-67, lymphovascular invasion, chemotherapy, radiotherapy, and the 7-mRNA signature