Identification of a Six-Gene Signature Predicting Overall Survival for Oxaliplatin and 5-Fluorouracil Resistant Colon Cancer and Potential Drug-Repurposing


 Background: Oxaliplatin (L-OHP) and 5-fluorouracil (5-FU) resistance in colorectal cancer (CRC) is a major medical problem. Therefore, detailed mechanisms and predictive markers are urgently needed. The aim of this study was to identify key pathways, a robust prognostic gene signature and potential drug-repurposing. Methods: In order to confirm the predictive markers and detailed molecular mechanisms of L-OHP and 5-Fu chemoresistant CRC, we performed weighted correlation network analysis (WGCNA), an unsupervised analysis method, to identify the chemoresistant CRC significantly related genes. Then, the gene prognostic model was conducted by Univariate Cox regression and Lasso penalized Cox regression analysis. Subsequently, the time-dependent receiver operating characteristic (ROC) and Kaplan-Meier survival curve were performed to assess the prognostic capacity of the model. Simultaneously, pathway enrichment was done to identify the key pathways involved in chemoresistant CRC. Moreover, we explored how the hub genes interacted with key pathways and transcription factors. Then, we found the potential drug target by the subcellular location fo hub genes. Finally, we identified the potential drug-repurposing by virtual screening for chemoresistant CRC according to ZINC 15 database. Results: We identified the key pathways using KEGG over-representation test and Gene Set Enrichment Analysis (GSEA): Ribosome KEGG pathway. Moreover, six hub-genes prognostic model was conducted by Univariate Cox regression and Lasso penalized Cox regression analysis. Additionally, the detailed interactions among the pathway and hub genes (RBM6, PNN, LEF1, ANO1, PAFAH1B3 and BHLHE41) were examined by protein-protein interaction (PPI) network and shortest-pathway analysis. Furthermore, ANO1 was considered the potential drug target based on the subcellular location and ZINC000018043251 was verified the potential drug by virtual screening Conclusions: Our study identified a novel six-gene resistant signature for CRC prognosis prediction and the molecular details of these interactions between hub genes (RBM6, PNN, LEF1, ANO1, PAFAH1B3 and BHLHE41) and Ribosome key pathways. Furthermore, ZINC000018043251 was verified the potential drug for ANO1 by virtual screening, which might help to improve the outcome of CRC patients.

Colorectal cancer (CRC) is the third most frequently diagnosed cancer worldwide with over one million cases annually, and is also the fourth commonest cause of cancer death worldwide [1]. Owing to the chemotherapy, the mortality rate has decreased during the past decades. The chemotherapy, especially, the combination of oxaliplatin (L-OHP) and 5-uorouracil (5-FU) was the primary treatment for CRC to improve overall survival and quality of life for patients [2,3].
However, despite receiving standard chemotherapy regimens (5-FU and L-OHP), most of CRC patients eventually generate chemoresistance and cause to treatment failure [2]. This highlights the need to better understand the mechanism of drug resistance and how can it be targeted to the patients' advantage.
Even though great efforts have spent to explore the cause of CRC tolerance to chemotherapy, the molecular mechanisms of L-OHP and 5-FU resistance in CRC remain unclear. Thus, the identi cation of predictive biomarkers for response and the mechanism involved in chemoresistance are of great importance.
Some studies were aimed to construct a prognosis model for predicting chemoresistance, but it has not been well established, since most previous relevant studies constructed their prognostic models using parameters from clinical baseline characteristics (such as blood sugar levels [4], body mass index [5] and age [6]) and single molecular biomarkers (CDO1 [7] and MASTL [8] ). Then, a thorough understanding of L-OHP and 5-FU mechanisms of action and interventions to overcome resistance are still relatively needed. Recently, the development of high-throughput sequencing technology and bioinformatics showed a great advantage for CRC prognosis prediction and the mechanism research [9,10]. But the genetic researches that could predict the recurrence of CRC with chemoresistance are still limited and need further study. Therefore, the aim of this study was to identify the molecular mechanisms in CRC with chemoresistance, and a robust prognostic gene signature, that would support the development of novel targeted therapies.

Data collection
The clinical information and the Fragments PerKilobase Million (FPKM) values of mRNA expression data containing 482 colon cancer and 42 normal control samples were downloaded from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/). According to the clinical data, we obtained 22 CRC patients who were only treated with L-OHP and 5-FU (Table S1). GSE17536 dataset was downloaded from the GEO database [11]. GSE17536 was performed on Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA, USA), including 177 colon cancer patients. Data were downloaded from the publicly available database hence it was not applicable for additional ethical approval.
Weighted gene correlation network analysis (WGCNA) WGCNA was performed to obtain the modules which was associated with the chemotherapeutic resistance (L-OHP and 5-FU). A total of 6175 genes in the top 35 % of variance were screened from the data set containing 22 CRC patients who were only treated with L-OHP and 5-FU, using the R package WGCNA [12]. In a brief, we chose the power = 6 for weighting the correlation matrix following an approximate scale free topology. Subsequently, gene expression modules with similar patterns were identi ed based on a gene cluster dendrogram and by using the dynamic tree cut method (minModuleSize = 50, mergeCutHeight = 0.3, deepSplit = 1). The unsigned network type was used to keep the relationships between modules and chemotherapeutic resistance. Generally, the correlation between the phenotype and module eigengenes was considered as the module-trait associations. Therefore, modules with p < 0.05 and person correlation value > 0.4 were considered signi cantly related to the chemoresistant traits. Additionally, the threshold of genes related to chemotherapeutic resistance was set at person correlation of gene vs. module-membership value > 0.5 and person correlation of gene vs. traitcorrelation value > 0.5.

Establishment of the prognostic gene signature
Only CRC patients with a follow-up period longer than 1 months were included for survival analysis. Univariate Cox regression analysis was performed by R package survival (https://CRAN.Rproject.org/package=survival) to identify prognostic genes, and genes were considered signi cant with a cut-off of p < 0.05. Then, Lasso-penalized Cox regression analysis was performed to further select prognostic genes for overall survival in patients with CRC. Then a prognostic gene signature was constructed based on a linear combination of the regression coe cient derived from the Lasso Cox regression model coe cients (β) multiplied with its mRNA expression level.

The risk score=
The optimal cut-off value was investigated by the R package survminer (http://www.sthda.com/english/rpkgs/survminer/) and two-sided log-rank test. Patients were classi ed into a high-risk and low-risk according to the threshold (medium of risk score). The time-dependent receiver operating characteristic (ROC) curve was drawn to evaluate the predictive value of the prognostic gene signature for overall survival using the R package survivalROC [13]. The Kaplan-Meier survival curve combined with a log-rank test was used to compare the survival difference in the high-and low-risk group using the R package survival. As the prognostic genes could predict the prognosis of CRC, so they were considered as the hub genes among the genes related to the chemotherapeutic resistance. Then the predictive value of the prognostic hub-gene signature was further investigated in the GSE17536 testing cohort.

Independent prognostic role of the gene signature
To investigate whether the prognostic hub-gene signature could be independent of other clinical parameters (including gender, age and stage), univariate and multivariate Cox analyses were performed by R package survival using the Cox regression model method with forwarding stepwise procedure with the cutoff of p < 0.01.

Pathway enrichment analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a data base for systematic analysis of gene function which links genomic information with higher-level systemic function [14]. After obtaining the genes related to the chemotherapeutic resistance by WGCNA, the R package clusterPro ler [15] was used to perform a KEGG over-representation test, at a cutoff of p < 0.05. Simultaneously, Gene Set Enrichment Analysis (GSEA) were performed [16] to explore the potential molecular mechanisms, with a cutoff of FDR < 0.05 and p < 0.05. Then, combining the two results to get the key pathway involved in the chemoresistance.

Protein-protein interaction
The Human protein-protein interaction (PPI) network used in this analysis was retrieved from the STRING database (medium con dence: 0.2) [17]. The PPI network was constructed with the genes related to the chemotherapeutic resistance and core genes involved in the key pathways were visualized using Cytoscape 3.7.1 [18].

Shortest pathway analysis
To nd possible connections between hub genes and the signaling pathways of interest, we performed a shortest-pathway analysis. The shortest pathway is de ned as the minimum number of edges required to travel from one node in the PPI network to another. The Python package NetworkX (http://networkx.github.io) was applied to compute all the shortest paths between hub genes and the key pathway genes.

Identi ng of Potential Drug-repurposing
The 3D structure of ANO1 was download from SWISS-MODEL (Q5XXA6, https://swissmodel.expasy.org/) and its biding site was found by Schrodinger maestro 2019-1 [19]. Then, we built a library of 2106 FDA approved drugs obtained from ZINC15 database [20]. Finally, we performed virtual screening and molecular docking by Schrodinger maestro 2019-1 to nd the potential drug-repurposing.

Statistical analysis
Statistical analysis was performed using R 3.5.3 (R Foundation for Statistical Computing, Vienna, Austria). Qualitative variables were analyzed using the Pearson test or Fisher's exact test; quantitative variables were analyzed using a t-test for paired samples. Multiple groups of normalized data were analyzed using one-way ANOVA. If not speci ed above, p < 0.05 was considered statistically signi cant.

Results
Construction of weighted expression network to identify chemotherapeutic resistant genes A total of 22 samples who were only treated with L-OHP and 5-FU were include in WGCNA. We selected β = 6 as the appropriate soft-thresholding value to ensure a scale-free network and 20 modules were identi ed. These modules were shown in distinct colors ( Figure 1A). Then, the correlations between module eigengenes and clinical traits were determined ( Figure 1B, Table S2). The cutoff to screen the chemoresistant-related modules was p < 0.05 and person correlation value > 0.4. Based on the cutoff, ve modules signi cantly related to chemotherapeutic resistance were brown module, magenta module, tan module, blue module and greenyellow module ( Figure 1B). A total of 262 genes (Table S3) Figure 2A). The overall survival was signi cantly poorer in the high-risk group than the low-risk group (p = 2.35e -3 < 0.05, Figure 2B). To validate the predictive value of the six-gene signature, we calculated risk score with the same formula for patients in GSE17536. The AUC for overall survival was 0.720 ( Figure 2C). What' more, consistent with the results in the TCGA cohort, patients in the high-risk group shown signi cantly poorer overall survival than patients in the low-risk group (P = 3.862e -3 < 0.05, Figure 2D). Independent prognostic role of the gene signature 22 patients with complete information including gender, age and stage were included for further analysis. Univariate and multivariate Cox regression analysis indicated that stage and our prognostic model were both an independent prognostic factor for overall survival (p < 0.01, Figure 3A and B). Moreover, according to the Hazard ratio, our prognostic was more sensitive than stage. Then, the results were valeted in the GSE17536 datasets. Stage and our prognostic model were also found to be an independent prognostic factor for overall survival (p < 0.01) and our prognostic model was more sensitive ( Figure 3C and D), which was consistent with the results from TCGA cohort.
Identi cation the key pathways through KEGG over-representation test and GSEA To identify the key pathways involved in CRC with chemotherapeutic resistance. KEGG over-expression pathway analyses were conducted by clusterPro ler to examine the function of previously identi ed chemoresistant genes through WGCNA. Based on the KEGG database, chemoresistant genes were highly enriched in eight pathways ( Figure 4A). As another method for pathway analysis, GSEA targets the whole genes expression other than chemoresistant genes, which was obtained by Arti cial threshold. GSEA analysis produced a total of 3 pathways which were enriched in resistant group and 19 pathways were in the sensitive groups (Table S4). Based on the KEGG over-representation test and GSEA results ( Figure  4B), one overlapping pathway Ribosome KEGG pathway was identi ed.

Protein-protein interaction (PPI) network and shortest-pathway analysis
To explore the interaction of hub genes and core genes involved in the Ribosome KEGG pathway, we subjected the genes to a search for the retrieval of string databases. The PPI network comprised 277 nodes and 4503 edges. Then, we applied a shortest-path analysis on the PPI network. We found that RBM6 could directly interact with PNN ( Figure 4C). What's more, as showed in the Figure 4D, PNN, LEF1, RBM6 and PAFAH1B3 could directly interact with genes involved in Ribosome pathway. ANO1 and BHLHE41 should through other genes to crosstalk with the Ribosome pathway.

Identify Potential drug-repurposing
According to the subcellular location information of UniPort database, we found that RBM6, PNN, LEF1, and BHLHE1 were located in the nucleus, PAFAH1B3 was in the cytoplasm, and ANO1 belonged to a transmembrane protein. So, we speculated that ANO1 was the potential drug target. Then, we utilized the virtual screening technique by Schrodinger maestro 2019-1 to identify potential drug-repurposing for ANO1. The 3D protein structure of ANO1 was downloaded from SWISS (Q5XXA6,) and the active site was found by Schrodinger maestro 2019-1 ( Figure S1). According to the glide scores (Table S5), the ZINC000018043251 was the top one hit from output of structure-based virtual screening process. And there were 6 H-bond and 1 Pi-pi interaction in the ligand -protein complex ( Figure 5).

Discussion
Although advances in diagnosis and cancer therapy were encouraging in the past decade, CRC remains a high-risk digestive tract tumor with the low overall survival rate due to CRC chemoresistance [21]. So, L-OHP and 5-FU resistance in CRC is a major medical problem. Therefore, there is currently an urgent need to explore the molecular mechanisms of chemoresistance in CRC and identify predictors of therapy in CRC.
The aim of the present study was to identify the molecular mechanisms in CRC with chemoresistance, and a robust prognostic gene signature, that would support the development of novel targeted therapies. Our initial genome-wide screening, conducted in a small panel (n = 22) of CRC patients treated with 1st line L-OHP and 5-FU chemotherapy, pointed to a list of 262 candidate mRNA associated with chemoresistance by WGCNA. Subsequently, six hub genes were identi ed by the prognostic signature module based on the TCGA data set (n = 524). And its e ciency was validated in GSE17536 (n = 177). Simultaneously, KEGG over-representation test and GSEA were performed to identify the key pathway. Furthermore, the molecular details of the interactions between hub genes and key pathways were explored by the PPI network and shortest pathway analysis. Moreover, ANO1 was considered the potential drug target based on the subcellular location and ZINC000018043251 was veri ed the potential drug by virtual screening Taking together, in this study, we explored the detailed mechanisms involved in the chemotherapeutic (L-OHP and 5-FU) resistance and constructed a six-gene prognostic model to predict the outcome of CRC.
Unfortunately, most of CRC patients eventually generate chemoresistance [22] and approximately 50% of treated patients do not obtain any bene t [23]. Thus, the identi cation of predictive biomarkers for response is of great importance. Considering the multi-faceted role of mRNAs in the development and prognosis of CRC [24], we screened the genes related to the L-OHP and 5-FU resistance by WGCNA and then constructed a prognostic risk assessment model based on multiple mRNAs to serve as the basis for clinical treatment. Unlike previous studies on CRC prognosis, the present study used the genes related to chemoresistance to construct the prognosis model, so it might help to improve the outcome of the CRC who suffered from chemoresistance. In terms of the validity and stability assessment of the prognostic model, we not only performed the standard ROC analysis used in most studies [25], but also validated the six-gene prognosis risk assessment model using the additional date sets. Combing the results of Fig. 3 and Fig. 5, the six-gene prognostic model could signi cantly distinct the outcome of CRC. This suggested that our analysis was accurate and robust and supported the development of novel targeted therapies.
Moreover, six hub genes (RBM6, PNN, LEF1, ANO1, PAFAH1B3 and BHLHE41) were identi ed by WGCNA and prognostic model, were consistent with previous studies. Such as, knockdown ANO1 expression could increase the apoptosis effects induced by L-OHP and 5-FU [26]. Moreover, RNA sequencing revealed PNN could be a predictive biomarker of clinical outcome in stage colorectal cancer patients treated with L-OHP and 5-Fu chemotherapy [27]. Additionally, it was reported that LEF1 was associated with 5-FU [28] and L-OHP caused bending of DNA produced targets for LEF1-binding [29]. And a study showed that LEF1 could regulate MACC1 to promote the drug resistance [30]. What' more, BHLE41 could be used to identify patients who would actually receive bene t from oxaliplatin treatment [31]. Though, there were no evidence that PAFAH1B3 was associated with L-OHP and 5-FU resistance in CRC, PAFAH1B3 was related to prognosis and played an important role in multiple type of cancers including Lung, breast, ovarian, melanoma [32,33]. In the present study, we also explored the potential drug-repurposing by virtual screening targeted ANO1, which might expand potential therapeutic strategies for L-OHP and 5-FU resistant CRC treatment. We found the ZINC000018043251and was the potential drugs for ANO1. Compared with previous studies, which were focused on the relationship between the expression of ANO1 and chemoresistance, we identi ed not only ANO1 was the therapeutic target but also its potential drugs.
In this way, we could accelerate drug development to improve the outcome of L-OHP and 5-FU resistant CRC patients. Taking together, previous studies support our nding that the hub genes (PNN, ANO1, LEF1 and BHLHE41) was associated with 5-Fu or L-OHP resistance, which suggested the accuracy and robust of our analysis.
CRC remains a high-risk digestive tract tumor with low overall survival rate due to CRC chemoresistance [21]. Therefore, a thorough understanding of L-OHP and 5-FU mechanisms of action and interventions to overcome resistance are still relatively required. In this study, the key pathway Ribosome KEGG pathway involved in chemoresistance, was identi ed by employing a KEGG over-representation test and GSEA. And the result was consistent with the previous studies. As was reported that L-OHP killed cells by inducing ribosome biogenesis stress, so the imbalance of Ribosome pathway might in uence the L-OHP resistance in CRC [34]. And Eukaryotic initiation factor 4A2 (EIF4A2) was required for mRNA binding to ribosome and promoted experimental metastasis and L-OHP resistance in CRC [35]. Generally speaking, 5-FU used in chemotherapy worked by hindering the process of ribosome biogenesis [36,37]. And emerging evidence supports targeting the Ribosome biogenesis could avoid chemoresistance [38]. Taken together, the previous studies suggested that Ribosome pathway might involve in the L-OHP and 5-FU resistance in CRC. However, the previous researches didn't show the detailed mechanisms of CRC chemoresistance to L-OHP and 5-FU. So, in order to solve the problems, based on the PPI of the hub genes and core genes involved in Ribosome pathway, we found the molecular details of these interactions by short pathway analysis. That suggested one most possible molecular mechanism involved in chemoresistance, which might help overcome the chemoresistance.

Conclusion
Taken together, our study identi ed a novel six-gene resistant signature for CRC prognosis prediction based on TCGA and GEO data set. That might help patients to obtain an objective response to rst-line treatment. What's more, the molecular details of these interactions between hub genes (RBM6, PNN, LEF1, ANO1, PAFAH1B3 and BHLHE41) and Ribosome key pathways might help a thorough understanding of its mechanisms of action and interventions to overcome resistance. Furthermore, ANO1 was considered the potential drug target based on the subcellular location and ZINC000018043251 was veri ed the potential drug by virtual screening, which might help to improve the outcome of CRC patients.