Bioinformatics-based analysis of TCGA to identify the prognostic biomarkers in HR-positive/HER2-negative tamoxifen-treated breast cancer CURRENT STATUS: POSTED

Background Patients with hormone receptor-positive (HR+) breast cancer, which accounts for approximately 70% of all breast cancers, receive tamoxifen therapy to inhibit recurrence and improve overall survival. However, the cumulative risk of distant recurrence in the past 20 years is still as high as 22 to 52% after 5 years of endotherapy. TNM stage, histological grade and age are important clinical factors related to recurrence. Differential gene analysis based on matching clinical factors will be helpful for understanding the molecular mechanism of recurrence; however, there have been no reports yet.Results In our study, we identified 647 DEGs in a TCGA dataset that included 20 patients, of which 10 patients had relapsed after tamoxifen treatment and 10 did not. Ten genes were found by the CytoHubba APP in Cytoscape. After analysis of the GO terms and KEGG pathways, we found that the genes were mainly enriched in the inflammatory response and cytokine-receptor interaction, and then these results were verified in the GEPIA2 and KM-plot websites. Finally, we identified CXCL1, CXCL2 and CX3CL1 as prognostic factors, and their overexpression in HR-positive/HER2-negative breast cancer predicted a longer disease-free survival.Conclusion In conclusion, the high expression of CXCL1, CXCL2, and CX3CL1 can predict longer disease-free survival in breast cancer after tamoxifen treatment and may be a biomarker for tamoxifen therapy.


Abstract
Background Patients with hormone receptor-positive (HR+) breast cancer, which accounts for approximately 70% of all breast cancers, receive tamoxifen therapy to inhibit recurrence and improve overall survival. However, the cumulative risk of distant recurrence in the past 20 years is still as high as 22 to 52% after 5 years of endotherapy. TNM stage, histological grade and age are important clinical factors related to recurrence. Differential gene analysis based on matching clinical factors will be helpful for understanding the molecular mechanism of recurrence; however, there have been no reports yet.Results In our study, we identified 647 DEGs in a TCGA dataset that included 20 patients, of which 10 patients had relapsed after tamoxifen treatment and 10 did not. Ten genes were found by the CytoHubba APP in Cytoscape. After analysis of the GO terms and KEGG pathways, we found that the genes were mainly enriched in the inflammatory response and cytokine-receptor interaction, and then these results were verified in the GEPIA2 and KM-plot websites. Finally, we identified CXCL1, CXCL2 and CX3CL1 as prognostic factors, and their overexpression in HR-positive/HER2-negative breast cancer predicted a longer disease-free survival.Conclusion In conclusion, the high expression of CXCL1, CXCL2, and CX3CL1 can predict longer disease-free survival in breast cancer after tamoxifen treatment and may be a biomarker for tamoxifen therapy.

Background
Breast cancer, which is the most common malignant tumor in women worldwide, can be classified into 4 main subtypes according to the molecular profile of the tumor, including hormone receptor (HR)positive (HR+)/HER2-negative, HR+/HER2-positive, HR-negative/HER2-positive and HRnegative/HER2-negative breast cancers [1]. Over 2/3 of patients are estrogen receptor (ER)-positive or progesterone receptor-positive and thus benefit from endocrine therapy, including the selective ER antagonist tamoxifen and aromatase inhibitors, which inhibit the estrogen product. Patients with HR+ breast cancer, which accounts for approximately 70% of all breast cancers, benefit from tamoxifen therapy in terms of the inhibition of breast cancer recurrence and the improvement of overall survival [2]. Tamoxifen, which is widely used in adjuvant therapy, has been indicated to substantially reduce the recurrence rate by approximately 40% and mortality by approximately one-third 3 throughout the first 15 years in HR+ patients [3]. However, 12.1% of patients still express high HR relapse even with 5 years of tamoxifen therapy, and the cumulative risk of recurrence in the past 20 years is still as high as 22 to 52% [4,5]. Thus, recurrence after tamoxifen therapy is still the main and critical clinical challenge for ER+ patients.
Accumulating efforts have been made to investigate the mechanism of relapse after tamoxifen treatment. The failure of endocrine therapy has been reported to involve ESR1 mutations [6,7], the downregulation of ER activation and the amplification of the PI3K/AKT/mammalian target of rapamycin  [16] and other signaling pathways. The ER signaling pathway is also regulated by membrane receptor tyrosine kinases, including epidermal growth factor receptor (EGFR), HER2, and insulin-like growth factor receptor (IGF1R) [17]. These membrane kinases activate signaling pathways that eventually result in the phosphorylation of ER as well as its coactivators and corepressors at multiple sites to influence their specific functions [18]. Evidence suggests that the overexpression of HER2 confers tamoxifen resistance in ER+ breast cancer through the MAPK signaling pathway [18]. Palbociclib [19], an inhibitor of CDK4/6, has been approved by the FDA as the first-line treatment for drug resistance in advanced endocrine therapy; however, unfortunately, its overall response rate (ORR) was only 42.5% and clinical benefit rate (CBR) was 84.5%, with some patients still progressing in the Paloma-2 trial [20]. The overexpression of NCOR1, a corepressor of ER, is associated with enhanced sensitivity to tamoxifen [21]. Therefore, searching for other direct mechanisms of drug sensitivity in endocrine therapy is of great clinical and scientific value.
Chemokines are small molecular weight proteins (~8 to 10 kDa) that are released in response to insult; they play double roles of recruiting neutrophils from the vasculature to tissue and then activating neutrophils in tissue for microbial killing and/or initiating tissue repair [22]. There are 7 neutrophil-activating chemokines (NACs) expressed in humans-CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL7, and CXCL8. NACs exist as monomers and dimers and exert their functions by 4 activating the CXCR1 and CXCR2 receptors. The role of chemokines in tumors has been controversial, and proinflammatory cytokines such as CXCL1, CXCL2, and CXCL3 (GRO-α, β, and γ) contribute to multiple protumorigenic activities and are involved in the occurrence, development, and metastasis of tumors. Studies have demonstrated the role of CXCL1 and 2 in treatment resistance and metastasis; CXCL1 and 2 are overexpressed in breast cancer cells and prime cells for survival in metastatic sites [23]. In contrast, several groups have reported a tumor-suppressive role for some of the factors secreted by senescent cells. One group showed that the proinflammatory cytokines and chemokines secreted by senescent cells also regulate senescent growth arrest and trigger an innate immune response that results in the clearance of senescent lesions relying on p53 [24,25]. This group also showed that the levels of CXCR2 are upregulated in preneoplastic lesions, and evidence suggests that its expression might subsequently decrease during cancer progression, as seems to be the case in prostate cancer [25].
Here, we aimed to find gene biomarkers for overexpressed HR patients and then assess the sensitivity to tamoxifen therapy, which can help evaluate the prognosis of patients who receive tamoxifen treatment and provide guidance for HR+/HER2-negative patients. TNM stage, histological grade and age are important clinical factors related to recurrence. Differential gene analysis based on clinical factor matching is helpful for understanding the molecular mechanism of recurrence, but there have been no reports yet. Herein, we screened The Cancer Genome Atlas (TCGA) RNA-seq datasets, including 1097 breast cancer samples, and matched 10 pairs of patients who were all HR+ /HER2-negative who received tamoxifen treatment according to their age, clinical stage and histological grade to find differentially expressed genes (DEGs). We then analyzed the enriched Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms of 647 DEGs. We found that the DEGs are mainly enriched in the inflammatory response and cytokine−cytokine receptor interaction, indicating that the mechanism of recurrence after tamoxifen therapy was associated with fractalkine in the tumor microenvironment. Finally, after validation in the GEPIA2 and KM-plot websites, the expression of CXCL1, CXCL2 and CX3CL1 was shown to predict the prognosis of breast cancer patients who were HR+/HER2-negative who received tamoxifen treatment. In addition, the overexpression of these chemokines was associated with longer disease-free survival. In brief, our results showed that the expression of CXCL1, CXCL2, and CX3CL1 could be used to assess the prognosis of HR+/HER2-negative breast cancer patients who received tamoxifen treatment and provided clues for the mechanism of relapse after taking tamoxifen. However, the findings we provided are limited, and the deeper underlying mechanism of relapse after tamoxifen still needs to be further explored.

TCGA analysis of the expression of DEGs in breast cancer patients
Processed and deidentified patient breast cancer (BRCA) data (TCGA "level 3" data designation) were downloaded from TCGA (2016) (http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/BRCA/20160128/). Ten pairs of HR+/her2-negative invasive ductal breast cancers, including 10 recurrence and 10 nonrecurrence samples after tamoxifen therapy, were matched according to the patient age and pathological characteristics. The clinical characteristics of those 20 patients are presented in Table 1. Differential expression analysis of two groups was performed using the DEG-seq R package. The p-value was adjusted using the q value. A q value <0.05 and |log2FC|>1 was set as the threshold for significantly differential expression.

GO and KEGG analysis of DEGs
The above DEGs were all entered in the DAVID website (https://david.ncifcrf.gov/) to explore the biological process, molecular function, and cellular component categories as well as KEGG pathways with the threshold of count>5 and p-value <0.05. The bubble diagram was illustrated with the ggplot2 R package and the Hmisc R package.

PPI network analysis
The PPI network explores the downstream relationship between proteins based on physical binding and genetic and functional relationships. Analyzing the functional interactions between proteins may provide insights into the mechanisms underlying the generation or development of diseases. We integrated 647 DEGs using the STRING website (https://string-db.org/cgi/input.pl? 6 sessionId=L1vQI87ngiIe&input_page_show_search=on) to explore the association between these DEGs. Cytoscape was used to visualize the PPI network from the STRING website, find the hub genes (with the CytoHubba APP) and beautify the below diagrams.
Validation of the expression of the top 10 hub genes After selecting the top 10 hub genes, we validated their expression in the GEPIA2 website (http://gepia2.cancer-pku.cn/#index), which provides multiple functions, including allowing users to compare their own data with TCGA and GTEx data, comparing the gene expression in different subtypes of cancers and predicting the patient survival.

Survival analysis
Only 10 pairs of patient data were selected from TCGA. The clinical data could not be used to analyze patient survival. Thus, we performed RFS analysis on the KM plotter website (http://kmplot.com/analysis/index.php?p=background), which is dedicated to constructing survival curves for breast, liver, lung, ovarian, and gastric cancer, and the data used were based on TCGA and GEO. Survival curves with p<0.05 were regarded as statistically significant.

1.
Identification of DEGs between the relapsed and nonrelapsed groups after tamoxifen treatment in HR+/HER2-negative breast cancer patients To identify gene expression related to tamoxifen resistance, we matched 10 pairs of breast cancer samples from the TCGA database of patients who received tamoxifen therapy according to their age and pathological characteristics ( Table 1); all of these patients relapsed in 5 years. Figure 1a shows the process of screening for suitable patients. In total, we identified 647 DEGs between recurrent and nonrecurrent patients with the selection criteria of q value < 0.05 and |log2(Fold Change (FC)) | >1.
Among the DEGs, 506 genes were downregulated, and 141 were upregulated. Figure 1b shows a volcano plot to examine the differences in the expression level of the genes in the two groups of samples and the statistical significance of these differences. shows that 50 genes were distributed in metabolic pathways, 16 genes were distributed in cytokine −cytokine receptor interactions, 7 genes were distributed in drug metabolism-cytochrome P450, and 6 genes were distributed in steroid hormone biosynthesis. Herein, the mechanism of tamoxifen therapy failure, whether associated with estrogen metabolism or tamoxifen metabolism, needs further investigation.

Protein-protein interaction network construction and module analysis
To analyze the genes associated with tamoxifen resistance, the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) website was used to investigate the protein-protein interaction (PPI) network associated with the ER pathway. The PPI network was drawn using Cytoscape, which is an open-source bioinformatics software platform for visualizing molecular interaction networks, and the most significant module in the PPI network was identified by using the cytoHubba plug-in of Cytoscape, which is an APP for clustering a given network based on a topology to find densely connected regions. Figure 3b shows the top ten hub genes-C3, PF4, SAA1, CXCL1, CX3CL1, CXCL13, GAL, GPER1, CXCL2, and CXCL-most of which are inflammatory factors, which implies that tamoxifen resistance may be associated with the inflammatory response, and the differentiated expression of these ten genes was shown in Table 2. A previous study has proven that mediators secreted by cancer-associated fibroblasts (CAFs)26, in addition to components of the inflammatory response27 such as cytokines and growth factors, exert an important role in drug resistance.

Validation of the expression of the top 10 hub genes
Considering that there were few samples from TCGA, we validated the expression of the top 10 hub genes in a different subgroup of breast cancer patients on the GEPIA2 website (http://gepia2.cancerpku.cn/#index). As shown in Figure 4, the expression of SAA1, CX3CL1, CXCL1, CXCL2, and GPER1 was higher in the adjacent normal breast tissue than in the luminal A subgroup of breast cancer, while the expression of CXCL13 was lower, and the differential expression was statistically significant. The expression of C3, PF4, GAL, and CXCL6 was not significant between the para-cancerous normal breast tissue and luminal A breast cancer tissue, but its general trend was in line with our results, especially for C3, PF4, SAA1, CXCL1, CX3CL1, CXCL13, GAL, GPER1, CXCL2, and CXCL6, most of which are inflammatory factors, thus indicating that tamoxifen resistance may be associated with the inflammatory response. The previous study has proven that mediators secreted by CAFs26, in addition to components of the inflammatory response27 such as cytokines and growth factors, exert an important role in drug resistance.

Survival analysis
Survival analysis was performed to explore the correlation between the 10 DEGs and patients' recurrence-free survival (RFS) with the KM-plot website (http://kmplot.com/analysis/). Figure 5(a-j) shows that the increased expression of C3, CX3CL1, CXCL1, CXCL2, CXCL6, GPER, and PF4 was closely related to improved RFS, while this result did not show that the other three genes (CXCL13, GAL, and SAA1) were associated with RFS in luminal A breast cancer.

Discussion
The development of recurrence after tamoxifen treatment is a serious challenge in breast cancer therapy. Thus, addressing this problem and determining the underlying mechanism are top priorities.
Our purpose was to identify the DEGs associated with resistance after tamoxifen therapy and discover genes that can serve as biomarkers of resistance after tamoxifen treatment for assessing prognosis.
First, we used the TCGA RNA-seq dataset and identified 647 DEGs. Then, we analyzed the GO and KEGG pathways on the DAVID website. According to the results, the DEGs were mainly enriched in epidemic development, cytoskeleton organization, cell-cell signaling, and inflammatory responses. In addition, the top 10 hub genes were mainly enriched in cytokine-cytokine receptor interactions, such 9 as C3, CXCL1, CX3CL1, CXCL13, CXCL2, and CXCL6. In our study, the log2FC of these genes was negative in the comparison of nonrelapse and relapse patients; in other words, these genes had low expression in relapse patients and high expression in nonrelapse patients. Hence, we checked the expression of these genes in GEPIA2, and the expression of SAA1, CX3CL1, CXCL1, CXCL2, and GPER1 coincided with our data from TCGA, while the other genes were not statistically significant between para-cancerous tissues and luminal A breast cancer tissues, though their expression trend was broadly consistent with our data. Then, we verified the prognosis of all top 10 hub genes on the KM plotter website ( Figure 5) and found that seven of the 10 genes (C3, CX3CL1, CXCL1, CXCL2, CXCL6, GPER, and PF4) were closely related to the patient RFS; specifically, the overexpression of these genes could predict a long RFS, while the other 3 genes were not associated with the prognosis of luminal A breast cancer patients (p-value >0.05). Combining the expression and prognosis results, only three genes (CX3CL1, CXCL1, and CXCL2) were consistent with our results from TCGA. CXCL1 and CXCL2 are proinflammatory cytokines, whose receptor is CXCR2, that have been proven to be involved in the acquisition of many malignant features, such as cancer cell proliferation, invasion, epithelial-mesenchymal transition (EMT) [28], migration [29] and chemoresistance in many tumors, including those of ERα-positive breast cancer patients treated with tamoxifen [30]. However, these chemokines and corresponding receptors have a role in either the promotion or inhibition of cancer, depending on their capacity to suppress or stimulate the action of the immune system, respectively, leading to their controversial role in cancer. Additionally, previous studies proved that high CXCR2 levels are associated with senescence in premalignant lesions and that the loss of CXCR2 expression contributes to reducing the severity of the arrest via the activation of the p53 pathway [25,31].
Moreover, the chemokine CXCL1 dimer is a potent agonist for the CXCR2 receptor [32]. CX3CL1, also a controversial inflammatory factor, is a CX3C chemokine that can chemoattract T lymphocytes [33]. To date, accumulating studies have evaluated the clinicopathological significance of CX3CL1 in human cancer. High CX3CL1 expression in colon cancer has been reported to result in a better prognosis compared with low CX3CL1 expression, which may be associated with fractalkine expressed in colon tumors that appears to recruit cytotoxic T cells and natural killer (NK) cells to the tumor site; thus, these cytotoxic cells result in a better prognosis mediated by tumor cell cytotoxicity using a perforin and granzyme B mechanism [34]. We hypothesized that the profile of CX3CL1 expression in breast carcinoma can also predict a good prognosis. Though CX3CL1 may have specific positive biological roles in breast cancer progression, the association between the chemokine receptors expressed in primary tumor cells and the site of metastatic relapse has been evaluated in patients with breast cancer [35,36]. Researchers found that CX3CR1, a specific receptor for CX3CL1, was associated with metastasis to the brain. Because the human brain constitutively expresses CX3CL1, it could attract In conclusion, the present study was designed to identify DEGs that may be involved in recurrence after tamoxifen treatment for HR+/HER2-negative breast cancer. CXCL1, CXCL2, and CX3CL1 may be regarded as diagnostic biomarkers for tamoxifen therapy, and their high expression predicts a good prognosis. However, further studies are needed to elucidate the biological function of these genes in breast cancer.

Conclusions
In conclusion, our study identified biomarkers to assess patient prognosis. Our results show that CXCL1, CXCL2, and CX3CL1 may be regarded as diagnostic biomarkers for tamoxifen therapy and that their high expression predicts a good prognosis. Our study may have some limitations considering the small sample size.