An 8-lncRNA signature predicts survival of triple-negative breast cancer patients without germline BRCA1/2 mutation

Triple-negative breast cancer (TNBC) is a distinct breast cancer subtype with a poor prognosis owing to its aggressive biological behavior and strong heterogeneity. TNBC with germline BRCA1/2 mutation (gBRCAm) is more sensitive to DNA-damaging agents, including platinum-based chemotherapy and PARP inhibitors, but the treatment of TNBC without gBRCAm remains challenging. This study aimed to develop a long non-coding RNA (lncRNA) signature for TNBC patients without gBRCAm to improve risk stratification and optimize individualized treatment. Clinicopathological correlations of TONSL-AS1 and HAGLROS were analyzed in 30 paired clinical triple-negative breast cancer samples without gBRCAm.


Background
Breast cancer (BC) is the most common cancer and the leading cause of cancer-related death among women worldwide [1,2]. Triple-negative breast cancer (TNBC) is a particularly aggressive type of BC, defined by the lack of estrogen receptor (ER) and progesterone receptor (PR) expression and of human epidermal growth factor receptor 2 (HER2) amplification, and accounts for approximately 15-20% of all breast cancers. The treatment of TNBC relies mainly on a combination of surgery, radiotherapy, and chemotherapy. Recently, several targeted therapies have been approved for TNBC, including the poly(ADP-ribose) polymerase (PARP) inhibitors olaparib [3] and talazoparib [4] for TNBC with germline BRCA1/2 mutation (gBRCAm), and the checkpoint inhibitor atezolizumab [5] in combination with nab-paclitaxel for programmed death-ligand 1 (PD-L1)-positive advanced TNBC. Nevertheless, TNBC still has a poorer prognosis than other types of breast cancer [6][7][8].
TNBC is considered a single clinical entity, but molecular profiling has revealed an unexpectedly high level of heterogeneity, which results in different treatment sensitivities and outcomes among subtypes.
TNBC with gBRCAm is more sensitive to DNA-damaging agents, including platinum-based chemotherapy and PARP inhibitors [3,4,9,10]. These tumors also benefit more from immunotherapy owing to their genomic instability or relatively high mutational load. The treatment of TNBC patients without gBRCAm, however, remains challenging. Therefore, potential therapeutic or prognostic biomarkers are urgently needed for TNBC without gBRCAm.
Long non-coding RNAs (lncRNAs) are RNA transcripts longer than 200 nucleotides that are well known for their limited protein-coding potential. They comprise a heterogeneous class of intergenic transcripts, enhancer RNAs, and sense or antisense transcripts that overlap other genes [11]. lncRNAs have been proposed to carry out diverse functions, including transcriptional regulation in cis or in trans and specific interactions with other cellular factors, namely proteins, DNA, and other RNA molecules [12]. lncRNAs have been reported to play an important role in breast cancer pathogenesis, development, and metastasis. For instance, STAR1 is overexpressed in breast cancer and is significantly associated with gene modulation activity in the interferon signaling pathway during breast tumorigenesis [13]. LINC00673 promotes the proliferation of breast cancer cells via the miR-515-5p/MARK4/Hippo signaling pathway [14]. LINC00261 reduces the proliferation and migration of breast cancer cells via the NME1-EMT pathway [15]. lncRNAs are also implicated in the development and progression of TNBC. CCAT2 promotes oncogenesis in TNBC by regulating the stemness of cancer cells [16]. LINC00993 suppresses TNBC growth, and its higher expression indicates a better outcome [17]. The lncRNA FAM83H-AS1 promotes TNBC progression by regulating the miR-136-5p/metadherin axis [18]. DCST1-AS1 promotes cell proliferation and metastasis in TNBC by forming a positive regulatory loop with miR-873-5p and MYC [19].
In recent years, accumulating evidence has suggested that lncRNAs could be promising prognostic biomarkers in breast cancer, including TNBC [20]. However, very few studies so far have addressed lncRNAs in TNBC without gBRCAm. Here, we aimed to develop and validate a new lncRNA signature to predict the prognosis of TNBC patients without gBRCAm, using the clinical data and RNA-Sequencing (RNA-Seq) data acquired from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov/). We hope our findings may provide direction for the management of TNBC without gBRCAm and thereby improve the prognosis of these patients in the future.

Data collection and preprocessing
In the present study, we downloaded the clinical data and RNA-Seq data from the TCGA database. Of the 1098 breast cancer patients in TCGA, 825 patients without gBRCAm were identified by Kraya et al. using DNA-sequencing data [21]. Of these 825 patients, 99 were diagnosed with TNBC. We then excluded patients with an unknown survival time (N = 1), leaving 98 TNBC patients without gBRCAm in our study. Sixty percent of the patients (N = 59) were randomly selected as the training cohort, and all patients (N = 98) served as the validation cohort.

Identification and Validation of the Prognostic lncRNA Signature
First, a univariate Cox regression model was applied to the training cohort to detect prognostic lncRNAs, and lncRNAs with a P-value < 0.05 were identified. The top 20 lncRNAs were then analyzed in the training cohort with a LASSO Cox regression model, using R software (R version 3.6.2) and RStudio. A list of prognostic lncRNAs with their coefficients was obtained from the lncRNA expression profile and the patients' overall survival (OS) according to the best lambda value. The risk score of every patient was then calculated from the expression level of each prognostic lncRNA and its corresponding coefficient. The training cohort was divided into a high-risk and a low-risk group using the median risk score as the cut-off. The Kaplan-Meier method and the log-rank test were used to evaluate the OS difference between the high-risk and low-risk groups.
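The risk-score and median-split steps described above can be sketched as follows. This is only an illustration in Python, not the R code used in the study: the lncRNA names, coefficients, and expression values are made-up placeholders, and treating a score exactly equal to the median as low-risk is an assumption.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear risk score: sum of (coefficient * expression) over the signature lncRNAs."""
    return sum(coefficients[name] * expression[name] for name in coefficients)

def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, otherwise 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical two-lncRNA signature and four hypothetical patients.
coefficients = {"lncA": 0.5, "lncB": -0.2}
patients = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 0.0, "lncB": 5.0},
    "P3": {"lncA": 4.0, "lncB": 0.0},
    "P4": {"lncA": 1.0, "lncB": 0.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

With an even number of patients the median falls between two scores, so the split is balanced; with an odd number, the tie rule decides which group the middle patient joins.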
Meanwhile, the prognostic lncRNA signature was validated in the validation cohort. Based on the median risk score, the validation cohort was also split into a high-risk and a low-risk group, and the OS difference between the two groups was evaluated with Kaplan-Meier survival curves. Time-dependent receiver operating characteristic (ROC) analysis was further used to assess the prognostic value. A two-sided P-value < 0.05 was considered statistically significant.
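For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the product-limit estimator is given below. It covers only the survival curve, not the log-rank test or the time-dependent ROC analysis, and the follow-up times and event indicators are toy values chosen for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= removed
    return curve

# Toy cohort: five patients, two of them censored.
km = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
for t, s in km:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the curve themselves, which is why the estimate drops only at times 1, 2, and 4 in this toy example.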

Identification of lncRNA-related mRNAs
mRNAs related to the lncRNA signature were identified using the Pearson correlation, with COR > 0.3 and P < 0.05 as the cut-off. Functional enrichment analysis was then performed to predict the potential functions of these mRNAs.
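The correlation filter can be illustrated with a short Python sketch. The expression vectors and gene names are hypothetical. The code applies the threshold to the signed coefficient (r > 0.3), which is a literal reading of the cut-off, and it computes the t statistic but stops short of the P-value, since that requires a t-distribution function (n - 2 degrees of freedom) from a statistics library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0; compare to a t distribution with n - 2 df."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# Hypothetical expression of one signature lncRNA and two candidate mRNAs.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrnas = {
    "mRNA_up": [2.0, 4.0, 5.0, 4.0, 5.0],
    "mRNA_down": [5.0, 3.0, 4.0, 2.0, 1.0],
}

# Keep mRNAs whose correlation with the lncRNA exceeds the 0.3 cut-off.
related = [name for name, values in mrnas.items() if pearson_r(lncrna, values) > 0.3]
print(related)
```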

Functional Enrichment Analysis
Metascape (http://metascape.org/) was used to perform functional enrichment analysis, including Gene Ontology (GO) biological process and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. All genes in the genome were used as the enrichment background. Terms with a P-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) were collected and grouped into clusters based on their membership similarities. Kappa scores were used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 were considered a cluster. The most statistically significant term within a cluster was chosen to represent the cluster.
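The term filter described above can be made concrete with a small sketch. The expected count is assumed here to be the number of term genes one would draw by chance from the background (list size × term size / background size); this is a common way to compute it and is not taken from Metascape's implementation. All numbers are illustrative.

```python
def enrichment_factor(hits, list_size, term_size, background_size):
    """Observed hits divided by the hits expected by chance (assumed formula)."""
    expected = list_size * term_size / background_size
    return hits / expected

def passes_filters(p_value, hits, factor):
    """Cut-offs stated in the text: P < 0.01, at least 3 genes, enrichment factor > 1.5."""
    return p_value < 0.01 and hits >= 3 and factor > 1.5

# Illustrative numbers: 6 of 100 input genes fall in a 200-gene term,
# against a background of 20000 genes.
factor = enrichment_factor(hits=6, list_size=100, term_size=200, background_size=20000)
print(factor, passes_filters(0.001, 6, factor))
```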

Collection of breast cancer clinical specimens and clinicopathological analysis
All primary triple-negative breast cancer specimens without gBRCAm used in this study were collected from breast cancer patients who underwent mastectomy at the Affiliated Hospital of Guangdong Medical University. This study was reviewed and approved by the committee for ethical review of research involving human subjects at Guangdong Medical University. SPSS, standard version 16.0 (SPSS Inc., Chicago, IL), was used for statistical analysis. The lncRNA expression levels of the breast cancer clinical specimens were compared using paired Student's t-tests. Pearson's χ2 test was used for the clinical correlation analysis, and Kaplan-Meier plots and log-rank tests were used for survival assessment. Results were considered significantly different when P < 0.05.

Quantitative real-time PCR (qPCR)
For quantitative real-time PCR, a SYBR Green PCR Kit (Applied Biosystems, Carlsbad, CA) was used to amplify the cDNA of paired breast cancer clinical specimens on an ABI Prism 7900 System (Applied Biosystems, Carlsbad, CA). 18S rRNA was used as the endogenous reference. All data were analyzed using the ABI SDS v2.3 software (Applied Biosystems, Carlsbad, CA). The relative expression of lncRNAs in paired clinical samples was calculated with the formula $2^{-\Delta CT}$ ($\Delta CT = CT_{\text{target}} - CT_{\text{18S}}$). For normalization, the average value of the non-tumor tissues was defined as 1.0. For the estimation of lncRNA expression, the Ribo™ mRNA/lncRNA qRT-PCR Starter Kit (Ribobio, China) was used for quantitative real-time PCR.
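The 2^(-ΔCT) calculation and the normalization to non-tumor tissue can be sketched as below. The CT values are invented for illustration, and the function names are hypothetical. The 10-fold threshold is the overexpression criterion used for the clinical specimens in this study.

```python
def relative_expression(ct_target, ct_reference):
    """2^(-dCT), where dCT = CT_target - CT_reference (18S rRNA here)."""
    return 2.0 ** -(ct_target - ct_reference)

def fold_vs_nontumor(tumor_ct, nontumor_ct):
    """Scale tumor expression so that the mean non-tumor expression equals 1.0.

    Both arguments are lists of (ct_target, ct_18s) pairs.
    """
    nontumor = [relative_expression(t, r) for t, r in nontumor_ct]
    baseline = sum(nontumor) / len(nontumor)
    return [relative_expression(t, r) / baseline for t, r in tumor_ct]

# Invented CT values for two tumor samples and their non-tumor counterparts.
tumor_ct = [(24.0, 10.0), (27.0, 10.0)]
nontumor_ct = [(28.0, 10.0), (28.0, 10.0)]

folds = fold_vs_nontumor(tumor_ct, nontumor_ct)
overexpressed = [fold > 10 for fold in folds]
print(folds, overexpressed)
```

Because each CT unit corresponds to a doubling of template, a tumor ΔCT that is 4 cycles lower than the non-tumor baseline gives a 16-fold increase, which exceeds the 10-fold threshold.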

Demographic and clinical characteristics of TNBC patients without gBRCAm
A total of 98 TNBC patients without gBRCAm were enrolled in our study (Supplementary Table S1). The detailed demographic and clinical characteristics of these patients are summarized in Table 1. As shown in Table 1, the vast majority of patients (78.57%) were younger than 65 years, and most of them (65.31%) were postmenopausal women. Almost all patients were at an early or locally advanced stage (TNM I, II, or III; 98.98%). Sixty percent of the TNBC patients were randomly selected as the training cohort (N = 59), and all patients (N = 98) served as the validation cohort (also called the entire cohort). The demographic and clinical characteristics of the patients were similar between the two cohorts.

Generation of the prognostic lncRNA signature from the training cohort
Using univariate Cox regression analysis, a set of 173 prognostic lncRNAs was identified in the training cohort (P < 0.05). A LASSO Cox regression model was then applied to the top 20 lncRNAs to generate a prognostic signature (Table 2). As a result, we identified a new 8-lncRNA signature that was highly associated with OS in TNBC patients without gBRCAm (Figure 1A, 1B; Table 3). We calculated the 8-lncRNA signature risk score for every patient in the training cohort. Using the median risk score as the cut-off point, the patients were categorized into a low-risk group (N = 30) and a high-risk group (N = 29). Kaplan-Meier survival curve analysis showed that the overall survival rate of the high-risk group was lower, and the difference between the two groups was statistically significant (P = 0.00018, Figure 2A). The prognostic ability of the 8-lncRNA signature was also evaluated by calculating the area under the curve (AUC) of the time-dependent ROC curve. The ROC curve can be used to assess the specificity and sensitivity of the model; an AUC > 0.7 is taken to indicate good performance, and the higher the AUC, the better the prediction performance of the signature. For survival times of 1, 5, and 8 years, the AUCs of the 8-lncRNA signature in the training cohort were 1.000, 1.000, and 0.908, respectively (Figure 3A).

Validation of the prognostic ability of the 8-lncRNA signature in the validation cohort
To confirm the power of the 8-lncRNA signature in predicting the OS of TNBC patients without gBRCAm, we validated our results in the entire cohort. Using the same classification method, patients were classified into a high-risk group (N = 49) and a low-risk group (N = 49). Consistent with the findings in the training cohort, patients in the high-risk group had significantly worse OS than those in the low-risk group (P = 0.0068, Figure 2B). For survival times of 1, 5, and 8 years, the AUCs of the 8-lncRNA signature in the entire cohort were 0.785, 0.790, and 0.892, respectively (Figure 3B). These results indicate that the 8-lncRNA signature is highly sensitive and specific, and that its prognostic ability is time-dependent.

Overexpression of lncRNAs TONSL-AS1 and HAGLROS in triple-negative breast cancers without gBRCAm and their clinical significance
To gain general insight into the clinical association of the lncRNAs TONSL-AS1 and HAGLROS in triple-negative breast cancers without gBRCAm, qPCR was used to determine the expression level of these lncRNAs in 30 breast cancer samples and their paired adjacent non-tumorous tissues. Overexpression of TONSL-AS1 and HAGLROS (defined as a greater than 10-fold increase in tumor tissue) was observed in 10/30 (33.3%) and 16/30 (53.3%) of the primary breast tumors, respectively (Figure 5; P < 0.0001).

Discussion
As mentioned above, unlike TNBC with gBRCAm, the treatment of TNBC patients without gBRCAm remains challenging. Effective prognostic biomarkers that could help to stratify patients into different subgroups and guide individualized treatment are in great demand. On the other hand, lncRNAs have attracted increasing attention with the development of next-generation sequencing over the last decade. Accumulating research has demonstrated that lncRNAs play an important role in the development and progression of breast cancer, and that different lncRNA signatures can predict the prognosis of breast cancer [22][23][24][25][26][27][28]. However, studies of lncRNAs in TNBC without gBRCAm are limited. In the present study, we identified 98 TNBC patients without gBRCAm from the TCGA database. In the training cohort, we obtained an 8-lncRNA signature using univariate Cox regression analysis and a LASSO Cox regression model. Patients with higher 8-lncRNA signature risk scores showed worse OS than those with lower risk scores. High expression levels of HAGLROS, AL139002.1, AL391244.2, AP000696.1, AL391056.1, AL513304.1, TONSL-AS1, and AL031008.1 were correlated with poor prognosis. The prognostic ability of the 8-lncRNA signature was validated by the time-dependent ROC curve in both the training cohort and the entire cohort.
Among these 8 lncRNAs, all except HAGLROS and TONSL-AS1 have not been investigated thoroughly, and these 6 lncRNAs were identified as prognostic markers for the first time. HAGLROS, a lncRNA with a length of 699 bp, has been reported to be involved in the progression of various cancers [29][30][31][32][33][34][35][36], including lung cancer, colorectal cancer, lymphoma, ovarian cancer, osteosarcoma, nasopharyngeal carcinoma, and intrahepatic cholangiocarcinoma. HAGLROS regulates apoptosis and autophagy via the PI3K/Akt/mTOR signaling pathway [29,31,37], the miR-5095/ATG12 axis [36], or the miR-100/ATG5 axis [31,38]. The apoptosis and autophagy regulated by HAGLROS contribute to cancer cell proliferation and chemoresistance [35]. Patients with high expression levels of HAGLROS showed poor OS, which is consistent with our result. Because mTOR inhibitors are available in the clinic, HAGLROS may serve as a potential therapeutic target for the future treatment of TNBC patients without gBRCAm. Liu Y et al. demonstrated that TONSL-AS1 regulates miR-490-3p/CDK1 to affect ovarian epithelial carcinoma cell proliferation, that TONSL-AS1 is upregulated in ovarian epithelial carcinoma, and that its high expression level is correlated with poor survival [39]. Wang P et al. demonstrated that TONSL-AS1 regulates the progression of gastric cancer by activating TONSL [40]. Its high expression level was correlated with poor survival, which is also consistent with our result.
lncRNAs are non-coding transcripts, but they play important functional roles in modulating mRNA translation [41]. In the present study, mRNAs related to the 8-lncRNA signature were identified using the Pearson correlation. The top 10 related mRNAs were CALML6, BRICD5, ADCK5, C10orf143, GLI4, FBXL6, KIFC2, CDCP2, CCDC154, and TSTA3. CALML6 encodes a RAS pathway-related protein involved in calcium-mediated signaling and second-messenger-mediated signaling. BRICD5 is mainly involved in cell proliferation pathways. ADCK5 is a member of an atypical kinase family and is overexpressed in many carcinomas. Qiu M et al. demonstrated that ADCK5 might regulate the expression of the oncogene PTTG1 by phosphorylating the transcription factor SOX9, thereby enhancing the migration and invasion capabilities of lung cancer cells [42]. The ADCK5-SOX9-PTTG1 pathway might therefore also be a potential therapeutic target for TNBC without gBRCAm. GLI4, a zinc finger protein of unknown function, has been localized to chromosome 8q24.3, distal to c-MYC [43]. Shi W et al. found that FBXL6 governs c-MYC to promote hepatocellular carcinoma (HCC) through ubiquitination and stabilization of HSP90AA1 [44]. The FBXL6-HSP90AA1-c-MYC axis might contribute to the oncogenesis of HCC, and the authors proposed that inhibition of FBXL6 might be an effective therapeutic strategy for HCC treatment [44]. CCDC154 was reported to inhibit tumor cell growth [45]. Sun Y et al. showed that TSTA3 controls cell proliferation and invasion by regulating CXCR4 expression [46]. TSTA3 was highly expressed in breast cancer cells and served as an independent prognostic factor for BC patients. C10orf143, KIFC2, and CDCP2 are protein-coding genes whose functions in cancer have not been elucidated. Functional enrichment analysis revealed that the signature-related mRNAs were involved in RNA metabolic process and regulation of DNA repair terms, which are strongly associated with the proliferation and metastasis of tumor cells [47,48].
In the present study, we also found that the lncRNAs HAGLROS and TONSL-AS1 were frequently overexpressed in primary triple-negative breast cancer tissues without gBRCAm, and that this overexpression was significantly associated with poor prognosis. These findings are supported by the database analysis above, suggesting that HAGLROS and TONSL-AS1 play an important oncogenic role in the development and progression of triple-negative breast cancer without gBRCAm.
There were some limitations in this study. First, owing to the limited number of patients in our study, patients were not randomly divided into a training cohort and a validation cohort. Second, the 8-lncRNA signature has not been validated in an independent cohort. Third, the underlying functions of these 8 lncRNAs have not been fully elucidated in this study.

Conclusion
In conclusion, we constructed an 8-lncRNA signature that is significantly associated with the overall survival of TNBC patients without gBRCAm. Among these 8 lncRNAs, HAGLROS and TONSL-AS1 may be potential therapeutic targets whose functions need further exploration. Moreover, the 8-lncRNA signature-related mRNAs ADCK5 and FBXL6 may also be promising biomarkers for TNBC patients without gBRCAm. We hope these findings can provide direction for the future management of TNBC without gBRCAm.
The result of the time-dependent ROC analysis in the training cohort (A) and the entire cohort (B).

Figures

Figure 1 Construction

Table 3
a Breast cancer specimens with an expression level more than 10-fold higher than that of matched NT samples are classified as the "Overexpression" group.
* Statistical significance (P < 0.05) is shown in bold italic.

Table 5
Clinicopathological correlation of HAGLROS expression in breast cancer
a Breast cancer specimens with an expression level more than 10-fold higher than that of matched NT samples are classified as the "Overexpression" group.
* Statistical significance (P < 0.05) is shown in bold italic.