Overview of the SL interaction analysis
A total of 1,174 ccRCC transcriptome profiles together with clinical information were collected from numerous publicly available cohorts, including TCGA-KIRC, RECA-EU, CM-009, CM-010, CM-025, E-MTAB-1980, E-MTAB-3218, E-MTAB-3267 and GSE29609. Of these patients, 928 were from RNA sequencing data, which were used for further SL framework construction, and 246 were from microarray data, which were considered as external validation for evaluating the result of DM-DG-drug network. A Schematic diagram of the procedures of SL inference and the overall study design was presented in Fig. 1.
The SL interaction analysis is a computational pipeline for identifying candidate SL interactions drawing on the experiences from several previous researches, such as DAISY , MiSL , SELECT . This analysis consists of four statistical inference procedures:
Differential gene expression: The procedure exploited gene expression and somatic alternations of the inputting tumor samples to discover potential SL gene pairs under the assumption that carcinoma cells may increase the expression of its SL partners as a compensatory mechanism when a driver gene loses its function due to the mutation. Differential expression analysis was conducted using Wilcoxon rank-sum test between samples with and without DMs and only target genes with higher expressions in mutated samples were saved as potential SL partners of corresponding DMs.
Pairwise gene co-expression: The procedure tended to select gene pairs which could have similar functions of cell metabolism and growth, and be likely to co-express in para-carcinoma normal tissues on the notion that there is often an intensive relationship between both genes of SL pair. Gene pairs presenting significant correlations (Spearman correlation coefficient > 0.1 and P adjust < 0.05) were considered as SL candidate pairs.
Functional similarity: The procedure aimed to filter out gene pairs having high semantic similarity, motivated by the assumption that SL partners tend to engage in closely related biological processes. And accordingly, their locations in Gene Ontology (GO) topological structure should be close. FSS which was defined as the geometric mean of semantic similarities of MF and CC ranges from 0 to 1. And FSS ≤ 0.45 between gene pairs were considered to have no significant functional similarity and thus excluded from the candidate SL pairs.
Mutual exclusivity: The procedure selected those gene pairs which the incidence of simultaneous mutation was significantly lower than common gene pairs, based on the concept that simultaneously mutating two genes in an SL pair would affect the cellular process and cause tumor cell death. The gene pairs with P adjust < 0.15 were considered as potential SL pairs.
Those candidate pairs passing all the four procedures composed the final output set of candidate SL pairs and were subsequently used for constructing the DM-DG-drug network.
Detection of driver genes in ccRCC
The current consensus viewpoint on tumorigenesis and tumor progression was that only a few mutational events affecting driver genes were determined to be the origin of malignancy, which confers selective growth advantage to the tumor cell. Compared with traditional chemicals, small molecule compounds targeting DM have the advantageous property of avoiding impairment of normal tissue, and thus screens on these DMs are more likely to identify clinically significant targets. In this study, to identify candidate drivers, the DriverNet algorithm was applied in the most comprehensive metadata set of ccRCC currently, which contained 610 patients from five clinical cohorts with both expression and mutation data available (Fig. 2A). A total of 36 candidate genes had been yielded with P adjust < 0.1 and mutation frequency beyond mean (Additional file 2: Table S1). Notably, due to the limitations of the influence graph derived from the Reactome Functional interactions, SETD2 which has been confirmed as driver genes in previous studies [11, 41, 42] was added in our prediction model to achieve more reliable results. Of these genes, 25 genes (67.6%) which demonstrated the reliability of our prediction have been reported by at least one previous research and were then taken as robust drivers of ccRCC for subsequent analysis.
To explore the clinical implications of DMs in ccRCC, the NBS algorithm was applied to stratify patients into different subtypes utilizing their mutation profiles. According to the resulting cophenetic correlation coefficients, 610 patients were assigned into two groups (Additional file 1: Fig. S2A and S2B). The result indicated that each group had distinguishing mutation features (Fig. 2B). NBS2 contained a higher proportion of common DMs, including VHL, PBRM1 and SETD2, while NBS1 exhibited a high frequency of BAP1 and increased mutational burden (Fig. 2C and Additional file 1: S1C). In addition, we analyzed the relationship between the NBS classification and the previously reported RCC molecular subgroups, including Rini’s (Low-High recurrence score group) , Brooks’ (ccA-ccB group)  and Motzer’s (Poor-Favorable risk group) . NBS1 was found to be positively associated with Rini’s High recurrence score group (P = 0.1713), Brooks’ CCB group (P = 0.046) and Motzer ‘s Poor risk group (P = 0.0416), while NBS2 exhibited opposite patterns (Additional file 2: Table S2). Next, the correlation between NBS classification and clinical characteristics, containing the clinical stage, pathological stage and survival time, was investigated in the combined cohort. There was a significant difference in survival outcome between NBS groups, which NBS2 exhibited a better prognosis than NBS1(P = 0.0021) (Fig. 2D). However, other clinical characteristics were found to be weakly correlated with NBS classification (Additional file 2: Table S2). Taking together, the NBS classification provided novel insight into the DM-based clinical subclass of ccRCC patients and enhanced our understanding of the crucial role which driver genes played in tumorigenesis and development.
Inference of driver mutation-druggable gene interactions
Based on the above mentioned 25 DMs and 1,981 DGs, SL interaction analysis was conducted to infer DM-DG pairs which meet the corresponding criteria. In total, 72 DM-DG pairs (containing 69 unique drug targets) passed all the screening procedures and thus composed SL candidates for ccRCC (Fig. 3A). Additionally, rank aggregation analysis was performed to integrate the results of each procedure in SL interaction in order to obtain a robust ranking of the 72 DM-DG pairs. Accordingly, the ranks of candidate pairs were ordered based on FS scores (functional similarity), fold change values (differential expression), correlation coefficients (pairwise co-expression), and P adjust (mutual exclusivity) respectively. Then, the Stuart method was applied to integrate all the rankings and calculated the RAS of each DM-DG pair (Additional file 2: Table S3).
To validate whether the DM-DG pair exhibits SL interaction, we performed univariate survival analysis between DG expression in patients with specific DM and progression-free survival (PFS) using Cox proportional hazards regression models. Part significant DGs were found to be associated with shorter recurrence time (HR>1) among patients with relevant DMs (Additional file 2: Table S4). Additionally, Kaplan-Meier analysis was conducted to reveal the clinical relationship between PFS and the status of DG in patients with corresponding DM. Specifically, we mainly defined the functional status of one gene by dividing expression data into active (>median) and inactive groups (<median) for the lacking of aberration situation of DGs (Fig. 3B and 3C). As depicted from the figure, BRD4 and TYK2 inactive groups had significant survival advantage in ccRCC patients with BAP1 and VHL mutations compared with active groups, respectively. These survival data-based analyses demonstrated the clinical phenotypes of these 72 DM-DG pairs could be well compatible with their roles as SL candidates.
Estimation of drug response in clinical samples
Three pharmacogenomic datasets described in the Materials and Methods section, containing drug sensitivity data and gene expression profiles of multiple CCLs, were utilized to construct the drug prediction model. Notably, chemical compounds with NAs in more than 20% of the samples and CCLs derived from hematopoietic and lymphoid tissue were excluded to achieve precise prediction result. After removing duplicated or invalid compounds, there were 1801 compounds in total. Of these, 669 CCLs with 402 compounds in CTRP dataset, 474 CCLs with 1,285 compounds in PRISM dataset and 786 CCLs with 320 compounds in GDSC dataset were used for subsequent drug prediction analysis (Fig. 3D). The ridge regression model located in the package pRRophetic was applied to perform the drug response prediction of clinical samples on the basis of their expression profiles, and the estimated AUC value of each compound among clinical samples was used as an evaluation indicator of drug sensitivity.
Before proceeding further, the results of drug response estimation have been validated computationally. Pazopanib, an oral small-molecule multi-kinase inhibitor for the treatment of advanced renal cell carcinoma, was used to evaluate whether the estimated drug sensitivity was consistent with its clinical efficacy. A retrospective cohort study found that the mutation status of BAP1 has independent prognostic value in advanced RCC patients treated with first-line tyrosine kinase inhibitors . Compared with WT patients, those patients harboring BAP1 mutation performed reduced clinical benefit from pazopanib treatment, and exhibited worse PFS and overall survival (OS). Therefore, patients from the combined RNA-seq cohort were divided into two groups according to their alteration statuses of BAP1 (altered versus unaltered: 84 versus 526). The Wilcoxon rank-sum test was applied to compare the estimated AUC values of pazopanib between two groups, and the result suggested that a significantly higher value of patients with mutant BAP1 than WT (P = 0.018) (Fig. 3E), consistent with what pazopanib behaved clinically.
Constructing prediction model of BAP1 mutation
On the basis of the combined RNA-seq cohort, the EN algorithm described in the Materials and Methods section was utilized to construct a robust model for predicting BAP1 mutation status. The differential genes between the BAP1 mutant and WT samples should be provided to this prediction model. Therefore, the limma package was applied to investigate the expression difference of these samples and differential genes were defined as P adjust< 0.05 and absolute log2 fold change (FC) > 1.
Survival analysis on 1,207 patients with both available prognosis and mutation data was conducted to investigate whether the functional status of BAP1 was associated with the survival outcome of cancer patients. The result denoted that there was a significant prognostic difference between the two groups, with longer median survival time (MST) in WT patients (MST=6.16 years, 95% confidence interval [CI]: 5.31-7.95 years) than in BAP1 mutant patients (MST=2.46 years, 95%CI: 2.00-3.52 years), which was consistent with the results of the MSKCC and the TCGA-KIRC cohorts (Additional file 1: Fig. S3).
To discern the characterization of biological processes affected by BAP1 mutation, the enrichment analysis was performed using R package GSVA. The result showed that the up-regulated genes in BAP1 mutant group were enriched in multiple carcinogenesis associated pathways, such as E2F targets, MTORC1 signaling and DNA repair, while the up-regulated genes in WT group were enriched in metabolism-associated pathways, such as pancreas beta cells, bile acid metabolism (Fig. 4A and Additional file 2: Table S5).
Based on BAP1 mutation prediction model, the prediction accuracy achieved 93.1% in the training cohort (combined RNA-seq cohort) and 84.2% in the independent validation cohort (E-MTAB-1980) (Additional file 1: Fig. S4A and S4B). To evaluate the predictive ability of the prediction model, receiver operating characteristic (ROC) curve was applied using R package pROC, which a higher AUC indicates a preferable performance of the model. The AUC of this prediction model was 0.956 in training cohort and 0.895 in validation cohort (Additional file 1: Fig. S4C and S4D), suggesting that this model was efficient and robust enough for predicting BAP1 alteration in other transcriptomic cohorts. Therefore, this model was used to identify estimated BAP1 mutant samples from combined microarray cohort (E-MTAB-3267, E-MTAB-3218, E-MTAB-1980 and GSE29609).
Identification of potential therapeutic agents for BAP1 mutant ccRCC
According to target annotation, 167 associated drugs were retained after mapping drugs to 69 unique targets in the DM-DG pairs. Differential drug response analyses between WT and mutant patients were conducted to further connect DMs with these DG-associated drugs. Compared with WT samples, only drugs with significantly lower estimated AUC values in mutated samples (logFC < 0 & P value<0.05) were considered as SL-associated drugs. There remain 149 DM-drug pairs and 49 DM-DG pairs meet the screening requirements, which were then visualized in a DM-DG-drug network (Fig. 4B and Additional file 2: Table S6). Among the final candidate SL pairs, the number of BAP1 mutant gene pairs was far more than other DM-DG pairs, which provided more potential therapeutic agents for this kind of patients. Since BAP1 mutated tumors were significantly associated with worse overall survival than tumors without mutated BAP1 , it is essential to investigate specialized therapeutic agents for BAP1 mutant ccRCC. Accordingly, the BAP1 mutation was selected for further investigations regarding its therapeutic potential in renal cancer.
In the DM-DG-drug network, these analyses yielded 26 compounds with potential therapeutic effects for treating BAP1-mutant ccRCC. We compared the dependency scores of specific compound targets between BAP1 mutant and WT cells from RCC to validate the effect of these potential drugs (Fig. 4C-F). Although there was no statistically difference in results, CCLs with BAP1 mutation still exhibited a trend toward lower dependency scores. Through integrating drug prediction results, survival and dependency analyses, it was found that BRD4 and PRKDC could be the optimal targets for treating ccRCC patients with BAP1 mutations (Fig. 5A). Nevertheless, above analyses alone cannot fully support the conclusion that the actual effect of compounds applied to tumors was consistent with the theoretical inference. Therefore, to explore the potential effect of these compounds in treating ccRCC, the multiple perspective approaches for drug prediction were adopted. First, the CMap analysis was utilized to find medicines whose drug signatures, namely drug-induced profiles of expression changes, were opposite to the BAP1 mutant expression pattern. A total of three compounds, including ZSTK-474, BI-2536 and PI-103, had CMap scores less than -80, representing that these drugs might have therapeutic effects in patients with BAP1 mutations. Second, we calculated the expression differences of candidate DG between normal and tumor tissue, and compounds with higher fold change values were considered as greater potential agents for ccRCC treatment. Thirdly, through searching relevant literature about compounds in Pubmed (https://pubmed.ncbi.nlm.nih.gov/), we found out the experimental and clinical evidence of candidates in treating ccRCC. Lastly, the dependency analysis of DGs across kidney CCLs was also conducted, and lower CERES or DEMETER scores denote that relevant genes are more likely to be essential for CCLs survival. All results were presented in Fig. 5B and Additional file 2: Table S7. In general, BI-2536 and PI-103, which had robust abilities in vitro and in silico evidence, were considered as the best therapeutic compounds for BAP1 specific ccRCC treatment.
In addition, the independent dataset, which comprised molecular profiles and mutation data of 246 ccRCC patients from the combined microarray cohort, was also used for further external validation. Through comparing the estimated AUC values of two specific agents (BI-2536 and PI-103) between BAP1 mutant and WT groups, the result suggested that mutant group was indeed more sensitive to both BI-2536 and PI-103 than WT group, highly consistent with the results of our in silico prediction (Fig. 5C, 5D and Additional file 2: Table S7).