Differential Gene-Based Predictors of Neoadjuvant Chemotherapy Efficacy in Breast Cancer

Background and objective: Chemotherapy is the most common treatment in breast cancer, and neoadjuvant chemotherapy (NAC) is widely used because of its efficiency and safety. This study aimed to identify significantly differentially expressed genes and to select the most suitable breast cancer patients for NAC before treatment. Methods: We collected a total of 60 breast cancer patient samples before and after NAC. All the samples were subjected to high-throughput RNA sequencing (RNA-seq). We then identified AHNAK, CIDEA, ADIPOQ, and AKAP12 as candidate genes related to tumour chemotherapeutic resistance. Next, we analysed the expression levels of AHNAK, CIDEA, ADIPOQ, and AKAP12 by logistic regression and, based on the result, constructed a predictive model visualized by a nomogram. Results: The RNA-seq results show that AHNAK, CIDEA, ADIPOQ and AKAP12 are upregulated in residual disease after NAC (P<0.05), and compared with the pathological complete response (pCR) group, the non-pCR group presented high AHNAK, CIDEA, ADIPOQ and AKAP12 expression levels (P<0.05). Logistic analysis showed that high AHNAK, CIDEA, ADIPOQ and AKAP12 expression levels significantly reduced the pCR rate of NAC for breast cancer (P<0.05). In addition, our prediction model, which included AHNAK, CIDEA, ADIPOQ and AKAP12, showed a good fit in the Hosmer-Lemeshow (H-L) test (χ2=6.3967, P=0.4945), and receiver operating characteristic (ROC) analysis indicated that AHNAK, CIDEA, ADIPOQ and AKAP12 predict poor treatment response in breast cancer patients treated with NAC. The efficacy prediction model based on these results is expected to be a new method to select the optimal population of breast cancer patients for NAC.


Introduction
Breast cancer is the most common cancer in women, and its morbidity and mortality are ranked first globally [1]. Chemotherapy is one of the most effective treatments in breast cancer, and neoadjuvant chemotherapy (NAC), or preoperative chemotherapy, increases the chance of breast-conserving surgery for those who have a large tumour at initial diagnosis and has equal efficacy compared to adjuvant chemotherapy [2].
More importantly, the tumour in vivo can be used to directly measure the cancer response to NAC, and information about the biology of breast cancer can be obtained as well. Altogether, NAC provides a platform for exploring biomarkers and predicting treatment prognosis and outcome [3]. The most reliable indicator of NAC efficacy is pathological complete response (pCR) [4], which is defined as ypT0 ypN0 or ypT0/is ypN0. Patients who are assessed as pCR after NAC have an 80% lower recurrence rate, regardless of their molecular subtype [5]; furthermore, pCR is associated with a better long-term outcome [6].
However, how to select the patients who are most likely to achieve pCR after NAC in breast cancer is still an unsolved question. With the development of medical techniques and the demands of precision medicine, traditional molecular subtyping by immunohistochemistry in breast cancer no longer meets the needs of individualized clinical treatment. Research shows that gene signatures help to predict outcomes [7], so it is important to detect genes that are differentially expressed before and after NAC and to identify resistance genes in breast cancer patients. Here, we compared gene expression before and after NAC as well as between the pCR and non-pCR groups of breast cancer. We found that AHNAK, CIDEA, ADIPOQ and AKAP12 most likely induce chemoresistance in NAC. Furthermore, we constructed a gene prediction model to select the optimal population of breast cancer patients for neoadjuvant therapy. Our work revealed that AHNAK, CIDEA, ADIPOQ and AKAP12 are efficacy-related genes causing chemoresistance in breast cancer.

Finally, the products were purified (AMPure XP system), and library quality was assessed on the Agilent Bioanalyzer 2100 system.

System using the HiSeq 4000 PE Cluster Kit (Illumina) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq 4000 platform, and 150 bp paired-end reads were generated.

The detailed grouping criteria
pCR [4] is defined as ypT0 ypN0 or ypT0/is ypN0. pCR was regarded as sensitive to NAC, and non-pCR was regarded as resistant to NAC.
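The grouping rule above can be written as a small function. This is only a minimal sketch: the function names `is_pcr` and `nac_response`, the `allow_in_situ` flag and the stage-string format ("ypT0", "ypTis", "ypN0") are assumptions made for illustration and are not taken from the study.

```python
def is_pcr(yp_t: str, yp_n: str, allow_in_situ: bool = True) -> bool:
    """Return True if the post-NAC pathological stage counts as pCR.

    allow_in_situ=False applies the stricter definition (ypT0 ypN0).
    allow_in_situ=True applies the broader definition (ypT0/is ypN0),
    which also accepts residual in situ disease in the breast.
    """
    t_stage = yp_t.strip().lower()
    n_stage = yp_n.strip().lower()
    allowed_t = {"ypt0", "yptis"} if allow_in_situ else {"ypt0"}
    return t_stage in allowed_t and n_stage == "ypn0"


def nac_response(yp_t: str, yp_n: str, allow_in_situ: bool = True) -> str:
    """Map pCR status to the labels used in the text: sensitive or resistant."""
    return "sensitive" if is_pcr(yp_t, yp_n, allow_in_situ) else "resistant"


if __name__ == "__main__":
    # Hypothetical patients, for illustration only.
    for stage in [("ypT0", "ypN0"), ("ypTis", "ypN0"), ("ypT1", "ypN0")]:
        print(stage, nac_response(*stage))
```

Keeping the in situ case behind a flag makes it explicit which of the two definitions a given analysis uses.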

Statistical analysis
Correlations of clinical characteristics between the pCR group and the non-pCR group were tested by the chi-square test. The Mann-Whitney U-test or Wilcoxon test was used to test differences in AHNAK, CIDEA, ADIPOQ, and AKAP12 expression before and after NAC and between the two groups. Univariate and multivariate logistic regression analyses were used to assess the associations between AHNAK, CIDEA, ADIPOQ, and AKAP12 expression and non-pCR. Receiver operating characteristic (ROC) curve analysis was used to assess the discriminative ability of the nomogram. Statistical results were considered significant at a P value <0.05. All statistical analyses were carried out using SPSS Statistics (version 25.0). The nomogram was drawn in R software (R version 3.6.0).

genes were downregulated more than 2-fold relative to before NAC treatment (P<0.05), and noncoding RNAs were excluded. Some differentially expressed genes are displayed in Figure 1.
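Two of the methods named under Statistical analysis, the Mann-Whitney U-test and ROC analysis, are closely linked: the area under the ROC curve (AUC) equals the U statistic divided by the number of between-group pairs. The sketch below illustrates that relationship in plain Python. It is not the SPSS procedure used in the study, it does not compute a P value, and the expression values are hypothetical.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y.

    Counts the (x_i, y_j) pairs in which x_i exceeds y_j;
    tied pairs contribute 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a positive case scores above a negative one."""
    n_pairs = len(scores_positive) * len(scores_negative)
    return mann_whitney_u(scores_positive, scores_negative) / n_pairs


if __name__ == "__main__":
    # Hypothetical expression values of one gene; non-pCR is the positive class.
    non_pcr = [8.0, 6.5, 7.2, 5.0]
    pcr = [4.0, 5.5, 3.1]
    print("U   =", mann_whitney_u(non_pcr, pcr))
    print("AUC =", round(roc_auc(non_pcr, pcr), 3))
```

An AUC of 0.5 corresponds to no discrimination between the groups, and values closer to 1 indicate that higher expression separates non-pCR from pCR more cleanly.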

Differential expression in RNA-seq in the pCR and non-pCR groups
Compared with the pCR group, 457 genes were expressed at more than 2-fold higher levels and 1361 genes at more than 2-fold lower levels in the non-pCR group (P<0.05), excluding noncoding RNAs. The two groups showed differentially expressed mRNAs (Figure 2).
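The 2-fold and P<0.05 criteria described above amount to a simple filter on each gene. The following sketch shows one way such a filter could look. The gene names, group means and P values are hypothetical, the pseudocount of 1 is an assumption added to avoid division by zero, and the P values are taken as given rather than computed, so this is not the pipeline used in the study.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of group A to group B; the pseudocount is an assumption."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def classify_genes(expression, min_fold=2.0, alpha=0.05):
    """Split genes into up- and downregulated lists.

    expression maps gene -> (mean_non_pcr, mean_pcr, p_value).
    A gene is kept only if its P value is below alpha and its
    fold change exceeds min_fold in either direction.
    """
    threshold = math.log2(min_fold)
    up, down = [], []
    for gene, (mean_non_pcr, mean_pcr, p_value) in expression.items():
        if p_value >= alpha:
            continue  # not statistically significant
        lfc = log2_fold_change(mean_non_pcr, mean_pcr)
        if lfc > threshold:
            up.append(gene)
        elif lfc < -threshold:
            down.append(gene)
    return up, down


if __name__ == "__main__":
    # Hypothetical genes: (mean in non-pCR, mean in pCR, P value).
    toy = {
        "GENE_A": (30.0, 9.0, 0.01),   # higher in non-pCR, significant
        "GENE_B": (4.0, 19.0, 0.02),   # lower in non-pCR, significant
        "GENE_C": (30.0, 9.0, 0.20),   # large change, not significant
        "GENE_D": (12.0, 10.0, 0.01),  # significant, change below 2-fold
    }
    up, down = classify_genes(toy)
    print("up:", up, "down:", down)
```

Working on the log2 scale makes the two directions symmetric: a 2-fold increase is +1 and a 2-fold decrease is -1.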

Potential drug resistance-related genes in breast cancer treated with NAC
Furthermore, we selected genes that were upregulated more than 2-fold in residual disease after chemotherapy in the non-pCR group and that were also expressed more than 2-fold higher in the non-pCR group than in the pCR group. In total, 28 genes met both requirements (Figure 3). We next selected the 4

Construction of the nomogram efficacy prediction model
AHNAK, CIDEA, ADIPOQ, and AKAP12 were used as factors in the efficacy prediction model, and a nomogram was constructed with R software (Figure 6).
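A nomogram is a graphical form of a regression model: each predictor's contribution to the log-odds is rescaled to a points axis, and the total points map back to a predicted probability. The sketch below illustrates the linear point scaling commonly used for this, under stated assumptions. The intercept, coefficients and expression ranges are hypothetical placeholders rather than the fitted values of the study, expression is expressed per 10 units, all coefficients are assumed positive, and the code is not the R routine used to draw Figure 6.

```python
import math

# Hypothetical model: log-odds of non-pCR per 10 units of expression.
INTERCEPT = -3.0
COEFS = {"AHNAK": 0.20, "CIDEA": 0.50, "ADIPOQ": 0.10, "AKAP12": 0.30}
# Hypothetical (min, max) expression range for each gene.
RANGES = {"AHNAK": (0, 100), "CIDEA": (0, 40), "ADIPOQ": (0, 300), "AKAP12": (0, 50)}
UNIT = 10.0


def linear_effect(gene, value):
    """Contribution of one gene to the log-odds, relative to its range minimum."""
    low, _ = RANGES[gene]
    return COEFS[gene] * (value - low) / UNIT


# The gene with the largest effect over its range defines the 100-point axis.
MAX_EFFECT = max(abs(linear_effect(g, high)) for g, (_, high) in RANGES.items())


def points(gene, value):
    """Nomogram points for one gene at a given expression value."""
    return 100.0 * linear_effect(gene, value) / MAX_EFFECT


def total_points(expression):
    """Sum of the points over all genes in the model."""
    return sum(points(g, expression[g]) for g in COEFS)


def probability_from_points(total):
    """Convert total points back to the predicted probability of non-pCR."""
    baseline = INTERCEPT + sum(COEFS[g] * RANGES[g][0] / UNIT for g in COEFS)
    log_odds = baseline + total * MAX_EFFECT / 100.0
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_direct(expression):
    """Same probability computed straight from the logistic model."""
    log_odds = INTERCEPT + sum(COEFS[g] * expression[g] / UNIT for g in COEFS)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    patient = {"AHNAK": 50, "CIDEA": 20, "ADIPOQ": 150, "AKAP12": 30}  # hypothetical
    total = total_points(patient)
    print("total points:", round(total, 1))
    print("P(non-pCR):", round(probability_from_points(total), 3))
```

Because the points are a linear rescaling of the log-odds, reading the probability from the total points gives the same result as evaluating the logistic model directly, which is what makes the chart usable without a computer.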
AHNAK, AKAP12, CIDEA, and ADIPOQ were used as continuous variables (10 was used as the unit because the expression levels were high). As shown in Figure 6, the

[11]. Cancer heterogeneity also means there are differences in gene expression and chemotherapeutic sensitivity; therefore, it is vital to refine molecular typing and advance targeted treatment in breast cancer [12].
RNA-seq is a powerful technology for genome-wide analysis of transcriptome information at single-nucleotide resolution, in addition to the quantification of gene expression [13].
RNA-seq can identify alternative splice sites, detect novel transcripts and tumour heterogeneity [14], and identify drug resistance biomarkers [15,16]. With these bioinformatics approaches, researchers have found that drug resistance genes exist in breast cancer before NAC, and that chemotherapy promotes the evolution of resistance genes and thereby leads to clinical resistance [17-20].