Exploiting Synthetic Lethal Network for Precision Treatment of Clear Cell Renal Cell Carcinoma

The emerging of targeted therapies has revolutionized the treatment modalities of advanced clear cell renal cell carcinoma (ccRCC) over the past fteen years. However, lack of personalized treatment limits the development of effective clinical guidelines and improvement of patient prognosis. In this study, large-scale genomic proles of ccRCC cohorts were exploited for conducting an integrative analysis. Method Based on synthetic lethality (SL), a concept that simultaneous losses of two genes cause cell death while a single loss does not, we sought to develop a computational pipeline to infer potential SL partners of ccRCC. Drug response prediction were received from three pharmacological databases to select agents which are likely to be effective in precisely treating patients with target gene mutation. We developed a credible method to identify SL pairs and determined a list of 72 candidate pairs which might be utilized to selectively eliminate tumors with genetic aberrations through SL partners of specic mutations. Further analysis identied BRD4 and PRKDC as novel medicine targets for patients with BAP1 mutations. After mapping these target genes to comprehensive drug datasets, two agents (BI-2536 and PI-103) were found to have considerable therapeutic potential in BAP1 mutant tumors. Overall, our ndings provide offer novel


Introduction
Renal cell carcinoma (RCC) is one of the most lethal malignancies in the genitourinary system. A recent study showed that there were 431,288 (2.2%) new cases and 179,368 (1.8%) deaths of RCC in 2020 [1].
Approximately 70% of renal cancer patients present with localized stage, suggesting a possible complete excision of tumor by radical nephrectomy [2,3]. Clear cell renal cell carcinoma (ccRCC) is the most prevalent subtype, accounting for more than 70% of all RCC [4]. Although most ccRCC can accept effective treatment, including surgery or ablation when early diagnosed, the rate of distant metastasis is up to a third after treatment [5]. Considering the poor prognosis of ccRCC patients, more efforts are required to develop optimal adjuvant or targeted therapies bene ting these patients.
were imputed using the K-nearest neighbors (KNN) method [18] located in R package Impute. Notably, expression pro les and molecular data of CCLs were downloaded from the same CCLE Project, and were used for subsequent PRISM and CTRP analyses. In order to investigate the cancer survival-essential genes, the genome-wide gene dependency scores, including CERES scores from clustered regularly interspaced short palindromic repeats (CRISPR) knockout screens [19] and DEMETER scores from RNA interference (RNAi) screens [20], were achieved from the Cancer Dependency Map (DepMap) portal (https://depmap.org/portal/download/), which lower CERES or DEMETER scores denote that relevant genes are more likely to be essential in cell survival and proliferation of CCLs.

BAP1 mutation prediction
Due to missing mutation data of part samples in E-MTAB-3267, E-MTAB-3218 and GSE29609, elastic net (EN)-based prediction model, a generalized linear model in the R package glmnet [21], was utilized to forecast BAP1 mutation status. The RNA-seq metadata mentioned above were then used as training cohort to construct the prediction model, and samples with mutation annotations in E-MTAB-1980 were considered as external validation for evaluating the performance of BAP1 prediction model. To select signi cant genes which were taken as input into EN model (abs (Log2FC) > 1.5 & adjust P < 0.05), differential expression analysis between the BAP1 mutant and WT samples from the training cohort was performed using the R package limma [22]. Additionally, the leave-one-study-out cross-validation was performed to evaluate the accuracy of EN model. Speci cally, after splitting a dataset into a training set and a testing set, and using all but one observation as part of the training set, the prediction model was built using data from the training set. Lastly, this process was repeated n times (where n is the total number of observations in the dataset), leaving out a different observation from the training set each time, which meant that it provided a much less biased measure of test mean squared error compared to other cross-validation methods. Notably, the penalty was set as 0.9 in tting a generalized linear model.
The predictive performance of the EN model in training and validation cohorts was evaluated using receiver operating characteristic (ROC) curve via the R package pROC [23].

Detection of cancer driver mutations
To discern likely DMs regulating gene network of tumor expression from thousands of mutations, the DriverNet algorithm [24] was applied in this study, which could evaluate the DM probability through integrating genome and expression data. Accordingly, a mutation matrix, a corresponding expression matrix and an in uence graph were taken as input documents of DriverNet. In this analysis, the in uence graph was derived from the Reactome Functional interactions [25], an updated protein functional interaction network (Version 2020). Notably, the results of DriverNet indicated the probabilities whether imported mutations belong to DMs, and genes with P value < 0.05 were deemed statistically signi cant. To make our prediction more reliable, we compiled a comprehensive list of cancer-associated driver genes which have been validated from prior studies and made a comparison between our prediction and previous results. These same DMs were saved for constructing the network between DMs and druggable genes (DGs) subsequently.

Collection of drug-target interactions
The medicine information about drug-target was acquired from the Drug Repurposing Hub [26] and DrugBank [27], respectively. The Drug Repurposing Hub (released March 2020, https://clue.io/repurposing#download-data ) contained 6798 unique compounds and 2,183 targeted genes, and DrugBank (Version 5.1.8, released January 2021, https://go.drugbank.com/releases/latest) comprised 7,540 compounds and corresponding 3,976 targeted genes. Then two drug data were merged into one meta-drug set, and a total of 11,875 compounds and 4,465 DGs were identi ed after removing duplicated medicine information. In order to identify genes with potential therapeutic implications, DGs were utilized to construct DM-DG-drug network.

Mutual exclusivity analysis
Under the SL hypothesis, no somatic alteration happens on both genes of candidate partners in ccRCC simultaneously. Based on the somatic mutation data of 1,211 patients, the analysis was performed by using the DISCOVER R package to determine signi cant mutual exclusivity [28]. Gene pairs with adjusted P value < 0.1 were considered statistically signi cant.

Connectivity map analysis
To identify potentially therapeutic compounds, connectivity map (CMap) analysis (https://clue.io/) was used for searching compounds of which gene expression patterns were opposite to the BAP1 mutant expression pattern. Differential analysis between BAP1 mutant and WT samples was performed to select 150 up-regulated and down-regulated genes with the most signi cant fold changes respectively. Through the CMap analysis, the standardized connectivity score for each perturbation was calculated, which ranges from -100 to 100. Compounds with the CMap score < -80 were considered to have a potential therapeutic effect for ccRCC.

Identi cation of ccRCC subclasses
Network-based strati cation (NBS) was performed to identify subclasses of ccRCC via Python package pyNBS [29], which divides tumor samples with available somatic mutation pro les into molecularly and clinically relevant subtypes on the basis of the mutation characters of the combined RNA-seq cohort [30].
Through integrating a high-quality cancer reference network from the recent study [29] and a mutation matrix of driver genes, we acquired the resulting data which contained the clustering information and corresponding consensus matrix from NBS. To evaluate the robustness of clusters k ranging from 2 to 5, the cophenetic correlation coe cient was calculated using the R package NMF [31] and the value of k that resulted in the maximum cophenetic correlation coe cient was considered as the optimal number of clusters. In addition, the nearest template prediction (NTP) analysis was conducted via R package CMScaller, which could predict the previously published RCC classi cations based on the provided subclass signatures [32].

Functional similarity analysis
In this study, GOSemSim, an R package for measuring semantic similarity among GO terms and gene products [33], was utilized to estimate the similarity of molecular function (MF) and cellular component (CC) among different genes. Gene pairs achieved from DM-DG network above were used to measure the functional similarity score (FSS), which was calculated based on the semantic similarity in MF (SsMF) and CC (SsCC), as following formula: Notably, gene pairs with FSS > 0.45 were considered to have high functional correlations and were used for further analysis.

Rank aggregation analysis
To obtain a consistent result across multiple sources, rank aggregation algorithm, an order statisticsbased method located in R package RobustRankAggreg proposed by Kolde et al [34], was applied in this study, of which the result (P value) indicates whether the ranking of a particular gene pair is statistically signi cant. In this analysis, we chose the order statistics method proposed by Stuart et al [35] by assigning the corresponding parameter to 'the Stuart' and de ned the rank aggregation score (RAS) as follows: The ranking of candidate gene pairs was determined by the RAS, and a higher RAS denoted a more concordant ranking.

Predicting drug response in clinical samples
Three large pharmacogenomic datasets, including CTRP, PRISM and GDSC, contained massive drug screening and gene expression data across hundreds of cancer cell lines. Previous studies have demonstrated that drug response in clinical samples can be predicted using data from in vitro cell line experiments [36]. To perform drug response prediction, we intended to test different machine learning methods, including support vector machine, random forest and multivariate linear regression, based on the actual drug sensitivity and molecular data. In this study, the ridge regression model that exhibited great and precise performance in the previous research [37] was utilized for transcriptome data-based drug response prediction using the R package pRRophetic. Through exploiting the expression and drug response data of solid CCLs from CCLE and GDSC projects (excluding hematopoietic and lymphoid tissue-derived CCLs), this predictive model was trained with a satis ed predictive accuracy evaluated by default 10-fold cross-validation and then applied to calculate different drug response across clinical samples. These compounds with positive response calculated by this model were matched to their DGs for subsequent construction of the DM-DG-drug network.

Enrichment analysis
We performed gene set variation analysis (GSVA) using the R package GSVA based on the hallmark de nitions (h.all.v7.4.symbols) extracted from the Molecular Signatures Database (https://www.gseamsigdb.org/gsea/msigdb/) [38] to explore the differential expression of certain pathway or signature between BAP1 and WT patients [39]. Notably, the resulting P value from the hypergeometric test was adjusted for multiple comparison testing and P adjust < 0.05 was considered signi cant.

Statistical analysis
All the statistical tests and graphical visualization were conducted utilizing R statistical software, version 4.0.5 (https://cran.r-project.org/). Student's t-test or Wilcoxon rank-sum test was applied for comparison of two groups with or without normally distributed variables, respectively. Similarly, correlation between two continuous variables was measured by either Pearson's r correlation (measure of linear relationship between two continuous variables) or Spearman's rank-order correlation (nonparametric measure of statistical dependence between two variables). Contingency table variables were analyzed by Fisher's exact tests. The Kaplan-Meier method was applied to perform survival analysis and the statistical signi cance of differences was determined using the log-rank (Mantel-Cox) test. The hazard ratios (HR) were calculated using the univariate Cox proportional hazards regression model located in R package survival. The Benjamini-Hochberg method was utilized to adjust P value of multiple testing in those analyses with more than 20 comparisons. P value < 0.05 was considered statistically signi cant for all computational analysis unless otherwise stated.

Result
Overview of the SL interaction analysis A total of 1,174 ccRCC transcriptome pro les together with clinical information were collected from numerous publicly available cohorts, including TCGA-KIRC, RECA-EU, CM-009, CM-010, CM-025, E-MTAB-1980, E-MTAB-3218, E-MTAB-3267 and GSE29609. Of these patients, 928 were from RNA sequencing data, which were used for further SL framework construction, and 246 were from microarray data, which were considered as external validation for evaluating the result of DM-DG-drug network. A Schematic diagram of the procedures of SL inference and the overall study design was presented in Fig. 1.
The SL interaction analysis is a computational pipeline for identifying candidate SL interactions drawing on the experiences from several previous researches, such as DAISY [9], MiSL [10], SELECT [40]. This analysis consists of four statistical inference procedures: 1. Differential gene expression: The procedure exploited gene expression and somatic alternations of the inputting tumor samples to discover potential SL gene pairs under the assumption that carcinoma cells may increase the expression of its SL partners as a compensatory mechanism when a driver gene loses its function due to the mutation. Differential expression analysis was conducted using Wilcoxon rank-sum test between samples with and without DMs and only target genes with higher expressions in mutated samples were saved as potential SL partners of corresponding DMs.
2. Pairwise gene co-expression: The procedure tended to select gene pairs which could have similar functions of cell metabolism and growth, and be likely to co-express in para-carcinoma normal tissues on the notion that there is often an intensive relationship between both genes of SL pair. Gene pairs presenting signi cant correlations (Spearman correlation coe cient > 0.1 and P adjust < 0.05) were considered as SL candidate pairs.
3. Functional similarity: The procedure aimed to lter out gene pairs having high semantic similarity, motivated by the assumption that SL partners tend to engage in closely related biological processes. And accordingly, their locations in Gene Ontology (GO) topological structure should be close. FSS which was de ned as the geometric mean of semantic similarities of MF and CC ranges from 0 to 1. And FSS ≤ 0.45 between gene pairs were considered to have no signi cant functional similarity and thus excluded from the candidate SL pairs.
4. Mutual exclusivity: The procedure selected those gene pairs which the incidence of simultaneous mutation was signi cantly lower than common gene pairs, based on the concept that simultaneously mutating two genes in an SL pair would affect the cellular process and cause tumor cell death. The gene pairs with P adjust < 0.15 were considered as potential SL pairs.
Those candidate pairs passing all the four procedures composed the nal output set of candidate SL pairs and were subsequently used for constructing the DM-DG-drug network.

Detection of driver genes in ccRCC
The current consensus viewpoint on tumorigenesis and tumor progression was that only a few mutational events affecting driver genes were determined to be the origin of malignancy, which confers selective growth advantage to the tumor cell. Compared with traditional chemicals, small molecule compounds targeting DM have the advantageous property of avoiding impairment of normal tissue, and thus screens on these DMs are more likely to identify clinically signi cant targets. In this study, to identify candidate drivers, the DriverNet algorithm was applied in the most comprehensive metadata set of ccRCC currently, which contained 610 patients from ve clinical cohorts with both expression and mutation data available ( Fig. 2A). A total of 36 candidate genes had been yielded with P adjust < 0.1 and mutation frequency beyond mean (Additional le 2: Table S1). Notably, due to the limitations of the in uence graph derived from the Reactome Functional interactions, SETD2 which has been con rmed as driver genes in previous studies [11,41,42] was added in our prediction model to achieve more reliable results. Of these genes, 25 genes (67.6%) which demonstrated the reliability of our prediction have been reported by at least one previous research and were then taken as robust drivers of ccRCC for subsequent analysis.
To explore the clinical implications of DMs in ccRCC, the NBS algorithm was applied to stratify patients into different subtypes utilizing their mutation pro les. According to the resulting cophenetic correlation coe cients, 610 patients were assigned into two groups (Additional le 1: Fig. S2A and S2B). The result indicated that each group had distinguishing mutation features (Fig. 2B). NBS2 contained a higher proportion of common DMs, including VHL, PBRM1 and SETD2, while NBS1 exhibited a high frequency of BAP1 and increased mutational burden (Fig. 2C and Additional le 1: S1C). In addition, we analyzed the relationship between the NBS classi cation and the previously reported RCC molecular subgroups, including Rini's (Low-High recurrence score group) [43], Brooks' (ccA-ccB group) [44] and Motzer's (Poor-Favorable risk group) [45]. NBS1 was found to be positively associated with Rini's High recurrence score group (P = 0.1713), Brooks' CCB group (P = 0.046) and Motzer 's Poor risk group (P = 0.0416), while NBS2 exhibited opposite patterns (Additional le 2: Table S2). Next, the correlation between NBS classi cation and clinical characteristics, containing the clinical stage, pathological stage and survival time, was investigated in the combined cohort. There was a signi cant difference in survival outcome between NBS groups, which NBS2 exhibited a better prognosis than NBS1(P = 0.0021) (Fig. 2D). However, other clinical characteristics were found to be weakly correlated with NBS classi cation (Additional le 2: Table S2).
Taking together, the NBS classi cation provided novel insight into the DM-based clinical subclass of ccRCC patients and enhanced our understanding of the crucial role which driver genes played in tumorigenesis and development.

Selection of druggable genes
The SL candidates of DMs can be achieved leveraging our computational pipeline, while there encounters another problem that not all identi ed partners of DMs could be targeted when performing genome-wide scanning for potential SL partners. Therefore, to infer statistically candidate SL partners which could be targeted with conventional chemical agents, a list of 4,465 DGs was compiled from current public pharmacological databases and considered as the input of SL analysis. Of these DGs, only 1,981 targets were used for constructing DM-DG network due to low expression of some DGs after removing batch effects.

Inference of driver mutation-druggable gene interactions
Based on the above mentioned 25 DMs and 1,981 DGs, SL interaction analysis was conducted to infer DM-DG pairs which meet the corresponding criteria. In total, 72 DM-DG pairs (containing 69 unique drug targets) passed all the screening procedures and thus composed SL candidates for ccRCC (Fig. 3A). Additionally, rank aggregation analysis was performed to integrate the results of each procedure in SL interaction in order to obtain a robust ranking of the 72 DM-DG pairs. Accordingly, the ranks of candidate pairs were ordered based on FS scores (functional similarity), fold change values (differential expression), correlation coe cients (pairwise co-expression), and P adjust (mutual exclusivity) respectively. Then, the Stuart method was applied to integrate all the rankings and calculated the RAS of each DM-DG pair (Additional le 2: Table S3).
To validate whether the DM-DG pair exhibits SL interaction, we performed univariate survival analysis between DG expression in patients with speci c DM and progression-free survival (PFS) using Cox proportional hazards regression models. Part signi cant DGs were found to be associated with shorter recurrence time (HR>1) among patients with relevant DMs (Additional le 2: Table S4). Additionally, Kaplan-Meier analysis was conducted to reveal the clinical relationship between PFS and the status of DG in patients with corresponding DM. Speci cally, we mainly de ned the functional status of one gene by dividing expression data into active (>median) and inactive groups (<median) for the lacking of aberration situation of DGs ( Fig. 3B and 3C). As depicted from the gure, BRD4 and TYK2 inactive groups had signi cant survival advantage in ccRCC patients with BAP1 and VHL mutations compared with active groups, respectively. These survival data-based analyses demonstrated the clinical phenotypes of these 72 DM-DG pairs could be well compatible with their roles as SL candidates.

Estimation of drug response in clinical samples
Three pharmacogenomic datasets described in the Materials and Methods section, containing drug sensitivity data and gene expression pro les of multiple CCLs, were utilized to construct the drug prediction model. Notably, chemical compounds with NAs in more than 20% of the samples and CCLs derived from hematopoietic and lymphoid tissue were excluded to achieve precise prediction result. After removing duplicated or invalid compounds, there were 1801 compounds in total. Of these, 669 CCLs with 402 compounds in CTRP dataset, 474 CCLs with 1,285 compounds in PRISM dataset and 786 CCLs with 320 compounds in GDSC dataset were used for subsequent drug prediction analysis (Fig. 3D). The ridge regression model located in the package pRRophetic was applied to perform the drug response prediction of clinical samples on the basis of their expression pro les, and the estimated AUC value of each compound among clinical samples was used as an evaluation indicator of drug sensitivity.
Before proceeding further, the results of drug response estimation have been validated computationally. Pazopanib, an oral small-molecule multi-kinase inhibitor for the treatment of advanced renal cell carcinoma, was used to evaluate whether the estimated drug sensitivity was consistent with its clinical e cacy. A retrospective cohort study found that the mutation status of BAP1 has independent prognostic value in advanced RCC patients treated with rst-line tyrosine kinase inhibitors [46]. Compared with WT patients, those patients harboring BAP1 mutation performed reduced clinical bene t from pazopanib treatment, and exhibited worse PFS and overall survival (OS). Therefore, patients from the combined RNA-seq cohort were divided into two groups according to their alteration statuses of BAP1 (altered versus unaltered: 84 versus 526). The Wilcoxon rank-sum test was applied to compare the estimated AUC values of pazopanib between two groups, and the result suggested that a signi cantly higher value of patients with mutant BAP1 than WT (P = 0.018) (Fig. 3E), consistent with what pazopanib behaved clinically.

Constructing prediction model of BAP1 mutation
On the basis of the combined RNA-seq cohort, the EN algorithm described in the Materials and Methods section was utilized to construct a robust model for predicting BAP1 mutation status. The differential genes between the BAP1 mutant and WT samples should be provided to this prediction model. Therefore, the limma package was applied to investigate the expression difference of these samples and differential genes were de ned as P adjust< 0.05 and absolute log2 fold change (FC) > 1.
Survival analysis on 1,207 patients with both available prognosis and mutation data was conducted to investigate whether the functional status of BAP1 was associated with the survival outcome of cancer patients. The result denoted that there was a signi cant prognostic difference between the two groups, with longer median survival time (MST) in WT patients (MST=6.16 years, 95% con dence interval [CI]: 5.31-7.95 years) than in BAP1 mutant patients (MST=2.46 years, 95%CI: 2.00-3.52 years), which was consistent with the results of the MSKCC and the TCGA-KIRC cohorts (Additional le 1: Fig. S3).
To discern the characterization of biological processes affected by BAP1 mutation, the enrichment analysis was performed using R package GSVA. The result showed that the up-regulated genes in BAP1 mutant group were enriched in multiple carcinogenesis associated pathways, such as E2F targets, MTORC1 signaling and DNA repair, while the up-regulated genes in WT group were enriched in metabolism-associated pathways, such as pancreas beta cells, bile acid metabolism (Fig. 4A and Additional le 2: Table S5).
Based on BAP1 mutation prediction model, the prediction accuracy achieved 93.1% in the training cohort (combined RNA-seq cohort) and 84.2% in the independent validation cohort (E-MTAB-1980) (Additional le 1: Fig. S4A and S4B). To evaluate the predictive ability of the prediction model, receiver operating characteristic (ROC) curve was applied using R package pROC, which a higher AUC indicates a preferable performance of the model. The AUC of this prediction model was 0.956 in training cohort and 0.895 in validation cohort (Additional le 1: Fig. S4C and S4D), suggesting that this model was e cient and robust enough for predicting BAP1 alteration in other transcriptomic cohorts. Therefore, this model was used to identify estimated BAP1 mutant samples from combined microarray cohort (E-MTAB-3267, E-MTAB-3218, E-MTAB-1980 and GSE29609).
Identi cation of potential therapeutic agents for BAP1 mutant ccRCC According to target annotation, 167 associated drugs were retained after mapping drugs to 69 unique targets in the DM-DG pairs. Differential drug response analyses between WT and mutant patients were conducted to further connect DMs with these DG-associated drugs. Compared with WT samples, only drugs with signi cantly lower estimated AUC values in mutated samples (logFC < 0 & P value<0.05) were considered as SL-associated drugs. There remain 149 DM-drug pairs and 49 DM-DG pairs meet the screening requirements, which were then visualized in a DM-DG-drug network ( Fig. 4B and Additional le 2: Table S6). Among the nal candidate SL pairs, the number of BAP1 mutant gene pairs was far more than other DM-DG pairs, which provided more potential therapeutic agents for this kind of patients. Since BAP1 mutated tumors were signi cantly associated with worse overall survival than tumors without mutated BAP1 [6], it is essential to investigate specialized therapeutic agents for BAP1 mutant ccRCC. Accordingly, the BAP1 mutation was selected for further investigations regarding its therapeutic potential in renal cancer.
In the DM-DG-drug network, these analyses yielded 26 compounds with potential therapeutic effects for treating BAP1-mutant ccRCC. We compared the dependency scores of speci c compound targets between BAP1 mutant and WT cells from RCC to validate the effect of these potential drugs (Fig. 4C-F).
Although there was no statistically difference in results, CCLs with BAP1 mutation still exhibited a trend toward lower dependency scores. Through integrating drug prediction results, survival and dependency analyses, it was found that BRD4 and PRKDC could be the optimal targets for treating ccRCC patients with BAP1 mutations (Fig. 5A). Nevertheless, above analyses alone cannot fully support the conclusion that the actual effect of compounds applied to tumors was consistent with the theoretical inference. Therefore, to explore the potential effect of these compounds in treating ccRCC, the multiple perspective approaches for drug prediction were adopted. First, the CMap analysis was utilized to nd medicines whose drug signatures, namely drug-induced pro les of expression changes, were opposite to the BAP1 mutant expression pattern. A total of three compounds, including ZSTK-474, BI-2536 and PI-103, had CMap scores less than -80, representing that these drugs might have therapeutic effects in patients with BAP1 mutations. Second, we calculated the expression differences of candidate DG between normal and tumor tissue, and compounds with higher fold change values were considered as greater potential agents for ccRCC treatment. Thirdly, through searching relevant literature about compounds in Pubmed (https://pubmed.ncbi.nlm.nih.gov/), we found out the experimental and clinical evidence of candidates in treating ccRCC. Lastly, the dependency analysis of DGs across kidney CCLs was also conducted, and lower CERES or DEMETER scores denote that relevant genes are more likely to be essential for CCLs survival. All results were presented in Fig. 5B and Additional le 2: Table S7. In general, BI-2536 and PI-103, which had robust abilities in vitro and in silico evidence, were considered as the best therapeutic compounds for BAP1 speci c ccRCC treatment.
In addition, the independent dataset, which comprised molecular pro les and mutation data of 246 ccRCC patients from the combined microarray cohort, was also used for further external validation. Through comparing the estimated AUC values of two speci c agents (BI-2536 and PI-103) between BAP1 mutant and WT groups, the result suggested that mutant group was indeed more sensitive to both BI-2536 and PI-103 than WT group, highly consistent with the results of our in silico prediction (Fig. 5C, 5D and Additional le 2: Table S7).

Discussion
A recently accepted concept of tumorigenesis and progression is that tumor cells are susceptible to mutation events, thus they depend on other genes to gain survival advantages. Considering a pivotal challenge to rescue the activity of driver targets, it is urgent to discover alternative approaches.
Fortunately, pharmaceutical agents based on SL strategy provide novel insight for precisely killing tumor cells with certain mutations. The PARP inhibitor Olaparib is the rst drug to be clinically used in treating breast cancer patients with BRCA1/2 mutation based on the SL interaction mechanism [47]. Although pan-cancer analysis has obtained considerable results [10,48], the practical application value in ccRCC patients may be limited due to their distinct metabolism process, proliferative characteristic and genetic feature.
The applications of RNA interference (RNAi) and clustered regularly interspaced short palindromic repeats (CRISPR) are preferable choices to identify SL pairs, but such methods are expensive and only suitable for screening partners of few fascinating driver genes [8,49]. Currently, given the easily accessible genomic data, using computational procedure is attractive to predict SL pairs. In the current study, we performed SL interaction analysis in the most comprehensive metadata set of ccRCC so far, which included 610 patients from ve clinical cohorts with available expression and mutation data, to predict the potential gene pairs. The rst predictive method, differential gene expression, assumes that most mutations of driver genes result in loss-of-function and hence allows the tumor cells to compensatively up-regulate the expression of the SL partners [9]. The second predictive method, pairwise gene co-expression, depends on the concept that SL pairs seem to exert related biological functions and co-express in WT tumor samples [9]. The third predictive method, functional similarity, indicates that gene pairs with SL interaction are likely to engage in similar biological process, thus their locations in GO topological network should be neighboring. The last one, mutual exclusivity, is based on the notion that inhibition of two genes with SL interplay can reduce tumor cells vitality and hence two genes of tumor samples express in a mutually exclusive manner [50].
Classifying the genomic characteristics provides a brilliant prospect for the occurrence, progression and precise treatment of RCC. That is, VHL mutation acted as an initiative event to induce tumor occurrence, while PBRM1, BAP1 and SETD2 cause DNA repair defect and cell overgrowth. Subsequently, the effective pathways, such as PI3K-mTOR activation, confer tumor cells the potential to evade death signals and metastasis [6]. In this study, chromatin remodeling gene BAP1 accounts for 59.7% of potential SL-based driver genes, followed by another frequent mutating gene PBRM1 (23.6%). It is revealed that BAP1 and PBRM1, residing closely on chromosome 3p, are frequently mutated (approximately 10% and 40%, respectively) in RCC patients [51][52][53]. Several studies have proved the crucial role of BAP1 and PBRM1 in tumor development. Brie y, BAP1 interacts with BRCA1/ BARD1 complex to regulate crucial biological processes, such as chromatin modi cation, DNA damage repair and cell cycle control [54,55]. Depletion of BAP1 was associated with aggressive histological grade [52], advanced tumor stage [56] and poor prognosis [54]. Additionally, BAP1 mutation was correlated with high genome instability index (GII) and low intratumoural heterogeneity (ITH), conferring the adaptive advantage and single lethal target to ccRCC clone [57]. In regards to PBRM1, its depletion promoted the upregulation of HIF-1α, STAT3 and the activation of mTOR signaling induced by VHL mutation [58]. Such phenomenon may explain that patients with BAP1 mutation experienced a worse outcome than patients with PBRM1 mutation after receiving rst-line VEGFR inhibitor everolimus and mTOR inhibitor sunitinib treatment [53]. This study still has several limitations. First, several studies employed pairwise survival analysis to SL identi cation [9,70], which was not included in our screening criteria, for the reason that the relatively low mutation frequency of crucial driver genes like BAP1 and some inaccessible survival data of cohorts would reduce the statistical power and thus ignore several important SL interactions. Second, despite robust evidence from pharmaceutical database, there is still a lack of experimental validation. Related experiments are needed in the future to support our conclusions. Third, BI-2536 is also considered as PLK1 inhibitor [71], so further exploration of the target of BI-2536 is essential to elucidate its anti-cancer mechanism in ccRCC.

Conclusion
In conclusion, capitalizing on extensive screening data combined with molecular and clinical data from multiple cohorts, this study developed a novel computational-based strategy to identify SL pairs for ccRCC patients harboring genetically mutation as well as some potential therapeutic agents for BAP1 mutated patients. The potential SL-associated partners for BAP1 and PBRM1, two frequent altered genes, have complemented the current VHL-predominant research and mapped a comprehensive landscape for SL interaction in ccRCC, which might help to deepen our understanding of ccRCC mutation patterns and provide an alternative strategy of personalized renal cancer treatment.  two subclasses were presented simultaneously. (C). Difference in mutation frequency of driver genes, molecular characteristics strati ed by Brooks and MSKCC between two subclasses. Fisher's exact tests were applied to compared the statistical differences. (D). Kaplan-Meier survival curve of two subclasses.
Statistical difference was calculated by log-rank test.  Determining sensitivities of identi ed drugs on renal cancer cell lines. (A). Gene set enrichment analysis between BAP1 mutated and wild-type groups.