Identification of key genes controlling docetaxel resistance in patients with breast cancer

Breast cancer is the most common malignant tumor in women. Chemotherapy is a very important part of its comprehensive treatment. Docetaxel is a commonly used chemotherapy drug in breast cancer treatment. However, some patients will develop drug resistance during use. To further explore this issue, we conducted an in-depth analysis on gene expression data of 24 breast cancer patients from GEO. Key module and hub genes were searched through WGCNA network construction, KEGG and GO function analysis, PPI network construction and other methods. Functional analysis revealed that the key module mainly related to drug transport events such as phagocytosis, cell-cell adhesion, and extracellular exosome. TCGA data were used to further verify these hub genes. Finally, we screened out 4 most important key genes, RPS2, EEF2, RPL15 and RPLP1, from 55 hub genes. These findings may highlight some therapeutic targets for predicting, avoiding or overcoming docetaxel resistance.


Introduction
Breast cancer is the world's leading malignant tumor in women [1]. Although the early diagnosis, early treatment and individualized treatment have significantly reduced the mortality rate of breast cancer, its mortality rate still ranks among the highest female malignant tumors. Surgical resection is the radical treatment of breast cancer, but with the deepening of the understanding of breast cancer disease, the treatment of breast cancer patients has shifted from simple surgery to a holistic treatment combining surgery, chemotherapy, radiotherapy, targeted therapy and endocrine therapy [2][3][4][5]. Docetaxel is a kind of commonly used drugs in the chemotherapy for breast cancer, an M phase cell cycle specific drug, can promote tubular polymerization stability of microtubules, and suppress the depolymerization, leading to significantly reduction of small tube number and inhibition of cell proliferation. Moreover, docetaxel can destroy microtubules mesh structure, promote the cell apoptosis, thereby play an important role of anti-tumor in breast cancer [6]. Although docetaxel is effective in most breast cancer patients, some might eventually recur due to rapid proliferation of tumor cells and drug resistance, resulting in high mortality after disease progression [7][8][9]. Therefore, finding the key genes and potential pathways of docetaxel resistance in breast cancer is of great significance for individualized treatment and prognosis determination of breast cancer patients, and may also provide new therapeutic targets for future clinical practice.
The rapid development of gene microarray and sequencing technology has provided a new direction for the study of various cancers, especially the related pathological mechanism and the search for new molecular targets [10,11]. At present, most studies focus on screening for differentially expressed genes without paying sufficient attention to the association between genes. The development of tumors is due to the interaction of multiple genes. Therefore, biological analysis methods that can cover a large amount of genomic data have been widely accepted and applied.
Weighted correlation network analysis (WGCNA) establishes the connection between gene modules and clinical characters through scientific algorithms, establishes the regulatory network between gene concentration genes, and finally determines the key regulatory genes [12]. It is a mature analytical technique in gene data analysis and has been used to find hub gene in a variety of cancers.
For example, Zhou et al. identified 10 hub genes associated with the development and prognosis of pancreatic cancer using WGCNA [13]. Wu et al. also employed WGCNA to discover the potential biomarkers to differentiate malignant thyroid nodules [14].
In this study, we used gene expression data of 24 breast cancer patients from GEO database to construct WGCNA network and search for hub genes related to docetaxel resistance. We also performed functional analysis of the key module. In summary, we used a novel approach to identify the genes and possible pathways associated with docetaxel resistance in breast cancer.

Data collection
Gene chip data GSE349 and GSE350 [7,15] were downloaded from the public GEO database (www.ncbi.nlm.nih.gov/geo/), including 10 patients sensitive to docetaxel treatment, exhibiting less than 25% residual tumor, and 14 patients resistant to docetaxel treatment, exhibiting residual tumor of 25% or greater remaining volume. All samples were tested on the platform of Affymetrix Human Genome U95 Version 2 Array.

WGCNA network construction
We calculated the differentially expressed genes in the docetaxel resistant group, compared with the 4 sensitive group, and ranked them in descending order of significance. Weighted gene correlation network analysis (WGCNA) was performed using the WGCNA R package with the first 5,000 genes.
Soft thresholding power β defined the scale-free topology fit index and the mean connectivity. We selected a β value to build the co-expression similarity and adjacency, then turned adjacency into topological overlap matrix (TOM), and made the hierarchical clustering tree on TOM-based dissimilarity. The minimum module size of the gene group was 30. Then we applied the module eigengenes (ME) function to get the correlations of different modules and sample traits. The highest one is the key module. At last, the module membership (MM) >0.8 and the gene significance (GS) >0.8 were defined as the thresholds for the selection of hub genes.

Functional enrichment analysis
The DAVID website was selected to perform KEGG analysis and GO functional annotation to investigate and visualize the biological function of the key module (https://david.ncifcrf.gov/tools.jsp).
The ggplot package in R was used for plotting. A p-value 0.05 and an FDR 0.05 were considered statistically significant.

Protein-protein interaction (PPI) network analysis
The STRING website was selected to perform PPI network analysis to explore the relationships of the hub genes (https://string-db.org/). Nodes with higher PPI scores were considered to be more important key genes.

Overall Survival analysis
The Kaplan-Meier Plotter website was selected to perform overall survival analysis (https://kmplot.com/analysis/). TCGA database is the source of information for this website. A p-value 0.05 was considered statistically significant.

Results
Construction of co-expression network 24 breast cancer samples used in this study to construct the WGCNA network all contained partial clinical information (Supplementary Table 1). According to the variation of gene expression, the 5,000 genes with the most significant changes were included in the follow-up study. First, we determined the optimal soft threshold (β). In this study, the scale-free network construction standard is set as: β = 9 ( Figure 1). Based on this, network construction and gene module division were carried out. A total 5 of 15 gene modules were distinguished by dynamic shear tree method, with different colors representing different gene modules and gray indicating that they did not belong to any gene module ( Figure 1). Clinical information for breast cancer samples includes menopause, lymphatic metastasis, ER, PR, Her-2 status, and docetaxel resistance (demarcated by 25%). Pearson correlation coefficient (cor) was used to represent the correlation between module genes and clinical features, and the pvalue was used to determine the significant correlation (P < 0.05 represents significant correlation).
Molecule Turquoise, Brown, Midnightblue, Red, Cyan, Greenyellow, Tan were correlated with drug resistance. Among them, the brown module was most correlated with docetaxel resistance (cor = 0.85, p < 1e-200).   Table 3). Combined with the expression of these genes in patient samples, RPS2, EEF2, RPL15 and RPLP1 seemed to be the most important key genes for docetaxel resistance in BC patients, and should be paid more attention to in future studies (Figure 3).

Discussion
Breast cancer is the most common malignant tumor in women, accounting for about 24.2% of malignant tumors in women [1]. At present, the treatment of breast cancer patients has been a comprehensive treatment integrating surgery, chemotherapy, radiotherapy, targeted therapy, endocrine therapy and other treatment methods [2,4,16]. According to the immunohistochemical characteristics, it was clinically divided into Luminal A subtype, Luminal B subtype, HER-2 overexpression subtype and Basal-like subtype/triple negative breast cancer [17]. For patients who need chemotherapy, different chemotherapy regimens can be given to breast cancer patients according to different molecular types. Docetaxel is one of the most commonly recommended drugs.
Docetaxel, a drug targeting microtubules, can affect the composition of cytoskeleton and the formation of mitotic spindles, so as to play an anti-tumor role in universities [15]. However, many patients eventually develop resistance to it. We need to conduct in-depth research into the reasons for this, discover the genes and underlying mechanisms that play an important role in drug resistance, explain the occurrence of this phenomenon, and screen and predict such patients.
Moreover, these important genes are likely to become new clinical therapeutic targets.
In this study, we used chip data from 24 breast cancer patients from GEO database to construct WGCNA network. A total of 15 gene modules were found, among which the most related to Docetaxel resistance was the brown module, which included 735 genes. Through functional analysis of these genes, we found that the Pathways most associated with Docetaxel resistance were Phagocytosis, Estrogen signaling pathway and Pathways in cancer. These Pathways are mainly associated with oncogenesis and drug transport [18][19][20][21]. In addition, the cell-cell adhesion, extracellular exosome and protein binding were the most prominent ones in the GO analysis. It will help us in our subsequent RPLP1 was significantly increased in triple negative breast cancer tissues and associated with the tumor metastasis and patient prognosis [29]. According to the above reports, these four genes have not been directly studied in docetaxel resistance, but they all have a certain correlation with tumors.
8 However, we found that this correlation is not completely consistent with the effect found in this study, which may be because of different tumor types, or because tumorigenesis and chemotherapy resistance are two different biological processes, and the reasons for this need to be further studied.
In summary, in this study, four key genes for docetaxel resistance in breast cancer were identified through GEO, TCGA public databases and WGCNA network construction. This has very important guiding significance to our clinical practice.

Conflict of interest statement:
The authors declare that they have no competing interests.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.