Target and pathway recovery depends on network and algorithm
To understand which factors are most important when employing causal reasoning algorithms to elucidate compound mechanism of action, we performed full-factorial fixed effects ANOVA (see Materials and Methods for full details) modelling of each evaluation metric (illustrated in Figure 2) as a dependent variable and the parameters as regressors (network, algorithm, platform, gene set, and cell line). Table 3 displays the significant (p < 0.05) factors and interaction effects for each evaluation metric. For both evaluation metrics the Network and Algorithm factors as well as the Network:Algorithm interaction effect were statistically significant (Table 3, Supplementary Tables 2 & 3 for the full ANOVA results). Additionally, for the recovery of compound-associated pathways, the Algorithm:Gene set interaction effect was found to be statistically significant (p = 0.00129), however through examining the resulting interaction diagram we found that the effect was not practically relevant (only a significant but small performance increase using landmark genes vs. all genes with CARNIVAL, Supplementary Figure 2), hence we did not investigate this in more detail.
Table 3
Table of significant (p<0.05) factors and interaction effects for each evaluation metric, from ANOVA analysis
Metric | Factor(s) | p-value |
Direct target recovery | Network | 7.1 x 10−6 |
Algorithm | < 2 x 10−16 |
Network:Algorithm | < 2 x 10−16 |
Compound-associated pathway recovery | Network | < 2 x 10−16 |
Algorithm | < 2 x 10−16 |
Network:Algorithm | < 2 x 10−16 |
Algorithm:Gene set | 0.00129 |
To next understand exactly how Network:Algorithm interact, we employed post-hoc least square means tests – computing the mean metric scores for every combination of network and algorithm, and algorithm and gene set, adjusted for the means of all other factors in the models. In this way we were able to discover how, and to what extent, the networks affected the ability of each algorithm to recover direct targets or compound-associated pathways. The resulting interaction diagram for the target recovery metric can be found in Figure 3, and the full results table from the least square means analysis can be seen in Supplementary Table 4. From the interaction plot in Figure 3 it can be seen that the use of the smaller, less dense Omnipath network led to a higher mean target recovery for both the CARNIVAL (0.37 targets per compound, 6.14% significant) and SigNet (1.38 targets per compound, 23.36% significant) algorithms compared to using the larger MetaBase™ networks (for each of “High”, “Medium” and “All” confidence levels), in terms of both the overlap and the percentage of cases where this overlap was significant (p<=0.05, Fisher’s Exact Test, when considering the number of targets, size of network and number of nodes recovered). While SigNet retained the highest percentage of significant (>20% for all networks), the CARNIVAL-recovered direct targets were more statistically significant with Omnipath compared to the MetaBase™ networks (6.14% vs. 3.21 and 3.4%). Conversely, the two CausalR algorithms recovered more compound targets directly with the larger, denser MetaBase™ networks, achieving a mean target overlap of just 0.16 (CausalR Results Table) and 0.31 (CausalR Subnetwork) targets per compound with Omnipath, lower than the gene expression DEG-target overlap baseline (0.38), with the percentage significant cases also decreasing. We did not observe any significant difference in performance between the different confidence MetaBase™ networks. The improved performance of CARNIVAL with Omnipath compared to the larger MetaBase™ networks for recovering direct targets is likely related to CARNIVAL being optimised using the same Omnipath network used in this study[29]. The results also indicate that SigNet tends to highly rank true positives using the smaller Omnipath network, whereas CausalR works more effectively when there are more interactions to “reason” over.
To investigate this finding further, we also computed the Enrichment Factor of targets for each ranked list output (SigNet and CausalR results table) within the top 5%, and found that indeed, the SigNet scoring function ranks true positives more highly with Omnipath compared to MetaBase™ networks (Enrichment Factor of 6 vs. ~3), and vice versa for CausalR (Supplementary Figure 3). This is in contrast to previous literature which suggests that the potential negative effects of noise are outweighed by a comprehensive inclusion of interactions when using biological networks as prior knowledge for node prioritisation algorithms [22, 50]. In fact, we can conclude from our analysis that - when using causal reasoning algorithms to prioritise compound targets – this behaviour is wholly dependent on the algorithm being used; the CausalR algorithms do benefit from a large prior knowledge network, whereas SigNet and CARNIVAL recover compound targets more effectively with the smaller Omnipath network which contains less potential noise or false positive interactions.
Overall, we found that SigNet with Omnipath showed the best performance with regard to recovering compound MoA in terms of direct targets (Figure 3). This performance was far higher than the gene expression baseline, indicating that differentially expressed genes only infrequently correspond to modulated targets. This general finding is in agreement with previous studies; Jaegar et al found that 15% of the time, and Iskar et al 8% of the time, did drug target expression level change upon drug treatment[21],[42]. The best individual overlap of DEGs with targets was 1.1 targets per compound, only 9.3% of overlaps being statistically significant (p<= 0.05), with all genes measured in the PC3 cell line using CMap data (Supplementary Figure 5). Hence, causal reasoning approaches such as SigNet provide a way to better relate downstream transcriptional changes to upstream target engagement than solely looking at DEGs themselves.
When considering the recovery of compound-target associated pathways (Figure 4A, full least square means analysis results in Supplementary Table 5), pathways derived from causal nodes from SigNet with the Ominpath network had the most significant least square mean enrichment (-log10(p-value)) of compound-associated pathways, with an enrichment value of 16.8, with the worst overall performance seen for the CausalR results table (and enrichment values between 1.02 and 2.08). The CausalR network output achieved a better performance over the results table (enrichment values between 7.91 and 8.83), meaning that CausalR does not highly rank direct compound protein targets, but it is able to find intermediate connecting nodes which represent proteins participating in target-annotated signalling pathways, and hence are potentially related to compound mechanism of action. Additionally, the results do not indicate that any one network tested has a significantly higher performance for the CausalR subnetwork results, according to the pairwise mean comparisons in Figure 4A. Like SigNet, CARNIVAL showed a significantly higher performance (enrichment value of 6.48) with the Omnipath network compared to the MetaBase™ networks (enrichment values of 2.19, 2.85 and 3.07 for Medium, High, and All, respectively) Notably, all least square mean enrichment results improved on the average gene expression baseline (enrichment value of 0.66), meaning that the Causal Reasoning algorithms were better able to recover relevant signalling proteins compared to using differentially expressed genes alone as a proxy for modulated signalling mediators. The best individual baseline (DEG) performance using gene expression data was using the CMap data, all genes, in the PC3 cell line, with a mean enrichment value of 2.4 (Supplementary Figure 6) – in agreement with Liu et al’s findings that CARNIVAL outperformed DEG enrichment for recovering relevant pathways[29].
As well as considering the enrichment p-value as a performance measure, we next aimed to understand how informative, and hence practically useful for understanding compound mechanism, the recovered pathways were. To quantify this, we assumed that pathways which are lower down in the Reactome pathway hierarchy (e.g., the high-level Cell Cycle pathway vs. the lower-level Stabilization of p53 pathway), and that contain fewer genes/proteins, are more informative than higher-level, larger pathways. Figure 4B and 4C show the mean results for recovered pathway size (number of attributed proteins/genes), and hierarchy (number of superpathways), respectively. It was found that, despite the good performance in terms of statistical significance, SigNet with Omnipath pathways were both larger (300 genes on average) and higher up in the Reactome hierarchy (3.6 superpathways on average) compared to the MetaBase™ networks (gene set size below 160, number of superpathways greater than 4). CARNIVAL also recovered pathways that were larger (225 genes on average), but lower down in the Reactome hierarchy (4.9 superpathways on average), with Omnipath compared to the MetaBase™ networks (gene set size below 160, number of superpathways about 4.5 on average). Pathways recovered by the CausalR results table were the least informative, with the lowest number of superpathways on average, while the CausalR subnetwork recovered pathways had better results in comparison (smaller gene set sizes and greater number of superpathways), showing that the subnetwork methodology is superior to the results table when using CausalR to retrieve compound-associated pathways. Other than with respect to average pathway size for SigNet with Omnipath, all combinations of network and algorithm recovered more informative pathways compared to average baseline gene expression results (gene set size = 280, number of superpathways = 1.9), indicating that pathways recovered from DEGs generally capture higher-level processes compared to causally inferred proteins. This was also found in an application of causal reasoning to understand the mechanisms of a DGAT1 inhibitor, where the authors found that enrichment with gene expression data generally pointed to higher-level processes[18]. We believe that this is due to transcriptional changes capturing the effect of protein signalling, while causal nodes represent the signalling proteins themselves.
In general, the causal reasoning methodologies outperformed the DEG baseline results when deriving compound mechanism of action from gene expression data. We also found that, based on our experimental setup, the algorithms are generally robust with respect to the choice of transcriptomics platform, gene set and cell line – with these not having any significant effect on the performance metrics based on our full factorial fixed-effects models, except for the Algorithm: Gene set finding which only had a marginal influence with the CARNIVAL algorithm. Our study hence indicates that the LINCS L1000 data is therefore suitable for use with causal reasoning algorithms, at least based on the other factors used in this work, and that the 978 landmark genes are truly informative enough to gain insight into compound mechanism of action, as hypothesised in the original publication[9]. We do note however that the cell lines used in this study (PC3 and MCF7, chosen as they were also present in the original CMap), are transcriptionally similar on the baseline level (Supplementary Figure 7), and the use of other cell lines (e.g., non-cancer, or indeed in vivo models) is likely to alter performance.
Target recovery with Causal Reasoning is dependent on network bias and biological function
We next investigated performance of the algorithms as a function of the connectivity of a protein target in the network, due to the known connectivity bias present in biological networks[51], as well as its biological function.
For each combination of network and algorithm we calculated how many times each target was recovered. We then normalised this value to account for annotation prevalence by dividing it by the number of times the target was annotated in the compound set. Finally, we calculated the Spearman rank correlation of the normalised target recovery with their degree in the corresponding PPI network (Figure 5). The lowest correlation can be seen for the CausalR results table (mean of -0.04) which is likely due to the fact that each protein is given a significance value to explicitly correct for the known connectivity bias, and we take as output only the nodes with p <= 0.05. The correlation between target recovery and network connectivity was highest using the CausalR subnetworks (mean of 0.72) – an explanation for this is that they connect key drivers to input TFs through correctly explained interactions, and will therefore go through “hub” nodes more often. Despite this large difference in correlation, the two CausalR outputs performed roughly similarly in terms of direct target recovery (Figure 3), which indicates that the CausalR ranked table is better able to prioritise less-connected (and hence less-studied) targets compared to the subnetwork output. SigNet (mean of 0.21) and CARNIVAL (mean of 0.23) showed roughly similar correlation patterns, with the correlation between target recovery and network connectivity increasing (to 0.38 and 0.56, respectively) with use of the Omnipath network, corresponding to their increased performance with this network (Figure 3). Additionally, we found that targets recovered with the Omnipath network showed a higher correlation with network connectivity (mean of 0.42) compared to the MetaBase™ networks (means of 0.25, 0.24, 0.24), this is potentially due to the small size of the network making hub effects more prominent. Overall, target recovery performance with causal reasoning is generally associated with network connectivity – excluding the targets recovered with the CausalR results table. We further examined the degree distributions of targets vs non-targets on each network, finding that drug targets were more often found to have a higher connectivity on each network compared to non-targets (Supplementary Figure 8, and generally seen in previous studies[52]). Therefore, a high correlation between target recovery and network connectivity is not necessarily detrimental - however, this bias would affect the recovery of less-studied proteins which must be kept in mind depending on the disease area being studied.
We next analysed the best-performing algorithm (namely, SigNet with the Omnipath network) in more detail with respect to its ability to recover targets across different protein classes (annotations retrieved from ChEMBL), comparing the findings with SigNet with the full MetaBase™ network, the results of which are shown in Figure 6. We computed protein class recovery in the same way as target recovery, calculating how many times a target in protein class was recovered and normalising this value by the annotation prevalence of the protein class in the compound set, converting this value to an overall percentage. Protein classes which had a higher connectivity in the Omnipath network such as transcription factors and protein kinases were recovered more often with SigNet (37% and 23%, respectively), while those with lower connectivity such as hydrolases and other enzymes had a much lower recovery (1% and 0.7%, respectively). In the case of the MetaBase™ network, we found that nuclear receptors were recovered frequently (22%) despite their relatively low connectivity compared to protein kinases and other cytosolic proteins. This could be due to the fact that nuclear receptors are just upstream from transcription factors[53] hence such targets are recovered more easily from transcriptomics data. The findings were consistent with the observed correlations in Figure 5, in that the protein class recovery was more dictated by node connectivity with the Omnipath network compared to the MetaBase™ network.
We next sought to understand how the recovery of signalling pathways related to protein class and function. To this end, we plotted the distribution of compound target-associated pathway enrichment significance for compounds targeting proteins in different classes (Figure 7). We chose to focus on SigNet and CARNIVAL with Omnipath because both showed a high performance, SigNet in terms of statistical significance, and CARNIVAL in terms of recovering informative pathways. In general, we found the highest performance for compounds targeting protein kinases and ligand-gated ion channels, which are both key mediators of cellular signalling – protein kinases transmit cellular signals through phosphorylation, and ligand-gated ion channels function to receive and transmit signals. The worst performance was seen for compounds targeting nicotinic acetylcholine and monoamine receptors; these receptors modulate signalling in the CNS [54, 55], and were additionally not expressed in high levels in the breast-cancer and prostate-cancer cell lines used (Supplementary Figure 7), hence we propose that the biological context of the cell-lines used influenced the results. We hypothesise that gene expression data measured in biological models derived from the CNS would lead to a higher recovery of such signalling pathways. We note that this particular analysis was complicated by the fact that compounds can target proteins from multiple classes.
Overall, the results of this section (Figures 5-7) show that the performance of Causal Reasoning algorithms for recovering compound targets, and compound-associated pathways, is not equal across protein classes. The connectivity of targets on the prior knowledge network were shown to heavily impact their direct recovery by the algorithms, with algorithms which correct for uneven connectivity (CausalR) showing less of an association between target degree and its successful recovery. The biological role of the considered targets was also reflected in the results, with protein kinases recovered most successfully both in terms of direct target recovery, and the recovery of relevant pathways. A potential way to mitigate the connectivity bias is for random simulation studies be carried out to identify which network nodes may be recovered by chance, an approach which has been used previously[41]. We note that one argument against this is that well-connected nodes in networks are well-studied, and have found to be essential proteins with key roles in diseases[51], and correcting against them could lead to discarding potential true positives, so that the bias described in this section is at least to an extent also desired and useful for elucidating compound targets.
Case Study – Understanding the Mechanism of Action of Chloroquine
To demonstrate the utility of causal reasoning algorithms to recover compound mechanism of action, we now present a case study for the compound Chloroquine. We chose to use CARNIVAL with the Omnipath network for this case study due to the observed good performance for recovering informative compound target-associated pathways, and the fact that the algorithm is implemented in a free, open-source R package. Briefly, we took the output for Chloroquine (CQ) generated as described in Materials and Methods, retaining network nodes appearing in 6 or more networks across all 12 outputs (generated with the two cell lines, two platforms, and three input gene-sets used in this study), herein named consensus nodes. We then used the consensus nodes to performed pathway over-representation analysis with the ReactomePA package, using all Omnipath network nodes as the universe. The top 20 most significantly enriched (BH-adjusted p-value) pathways, with their relevance to the MoA of Chloroquine (if applicable) are outlined in Table 4. The full pathway enrichment results, as well as the node counts can be found in Additional File 2. The pathways observed in Table 4 encompass literature-confirmed specific mechanisms of Chloroquine for various indications, including inflammatory mediation (MAP kinase activity, AP-1 activation, interleukin signalling, MyD88 cascade), anti-cancer action (apoptosis, mitotic cell-cycle, PKC signalling), other processes such as NMDAR-LTD, RUNX2 expression, and cellular senescence. For some pathways, namely “PIP3 activates AKT signaling” and “MyD88:MAL(TIRAP) cascade initiated on plasma membrane”, there were no literature evidence, and there were also some very general pathways enriched (“Developmental Biology”, “Diseases of signal transduction”, “Intracellular signaling by second messengers” and “Disease”). Though TLR9 signalling, a known MoA of CQ, did not appear in the top 20 by statistical significance, the “Toll Like Receptor 9 (TLR9) Cascade” was enriched with an adjusted p-value of 2.48E-05 and an Odds Ratio of 9.47 (Additional File 2).
Table 4
Reactome pathway enrichment results for consensus nodes (present in >=6 outputs across all combinations of platform, gene set and cell line) derived from CARNIVAL analysis of Chloroquine-induced gene expression using the Omnipath network. The top 20 most significantly enriched pathways (BH-adjusted p-value) are shown here with their corresponding Odd’s Ratio (OR).
Pathway Name | p. Adjust | OR | MoA Relevance |
Developmental Biology | 1.09x10−06 | 4.70 | (General pathway) |
MAPK targets/ Nuclear events mediated by MAP kinases | 8.07x10−06 | 27.45 | CQ interferes with the activation of ERK-MAP kinase proteins to regulate TNF transcription for anti-inflammatory effects[56] |
Activation of the AP-1 family of transcription factors | 6.13x10−05 | 54.91 | CQ shows immunomodulatory effects in T-cells through activation of AP-1 [57] |
Intrinsic Pathway for Apoptosis | 6.32x10−05 | 17.16 | CQ was found to induce intrinsic apoptosis in cancer cells, both alone and through synergistic effects with other treatments[58–60] |
Signaling by Interleukins | 6.23x10−05 | 4.87 | CQ modulates interleukin release, promoting Th17 cell inflammation[61] |
MAP kinase activation | 1.64x10−04 | 13.73 | CQ interferes with the activation of ERK-MAP kinase proteins to regulate TNF transcription for anti-inflammatory effects[56] |
Diseases of signal transduction | 1.64x10−04 | 5.56 | (General pathway) |
Transcriptional regulation by RUNX2 | 1.64x10−04 | 13.50 | Inhibition of lysosome function by CQ in vascular smooth muscle cells significantly enhanced RUNX2 expression[62] |
Interleukin-17 signaling | 1.76x10−04 | 13.07 | CQ-treated Langerhans-like cells promoted IL-17 secretion by T cells[61] |
Intracellular signaling by second messengers | 1.96x10−04 | 6.21 | (General pathway) |
Gastrin-CREB signalling pathway via PKC and MAPK | 2.34x10−04 | 30.51 | CQ activates p38 MAPK and stimulates PKC translocation in glioma cells, and Gastrin-releasing peptide has been found to mediate the CQ itch response[63, 64] |
Cell Cycle, Mitotic | 2.50x10−04 | 5.07 | CQ treatment shows anti-cancer effects through G(2)/M (mitotic) phase arrest in a human breast cancer cell line, and potentiates the effect of anti-mitotic drugs in resistant cancer cells[65],[66] |
Senescence-Associated Secretory Phenotype (SASP) | 2.68x10−04 | 11.44 | CQ has been shown to significantly inhibit the widely-used biomarker of cellular senescence, (beta)-galactosidase, in endothelial cells[67] |
CREB1 phosphorylation through NMDA receptor-mediated activation of RAS signalling | 2.68x10−04 | 27.45 | A study of CQ-induced autophagy inhibition found an induction of NMDAR-LTD (NMDA receptor-dependent long-term depression) [68] |
Toll Like Receptor 10 (TLR10) Cascade | 2.68x10−04 | 10.84 | No literature evidence for TLR10 and TLR5, but CQ has been shown to inhibit TLR9 signalling[69] |
Toll Like Receptor 5 (TLR5) Cascade | 2.68x10−04 | 10.84 |
MyD88 cascade initiated on plasma membrane | 2.68x10−04 | 10.84 | CQ blocks MyD88 signaling by decrease in the levels of downstream signalling molecules[70] |
Disease | 2.68x10−04 | 3.79 | (General pathway) |
PIP3 activates AKT signaling | 2.77x10−04 | 6.50 | No literature evidence |
MyD88:MAL(TIRAP) cascade initiated on plasma membrane | 2.77x10−04 | 10.17 | No literature evidence for this specific pathway |
This case study has hence demonstrated the use of causal reasoning algorithms with gene expression data for informing of specific mechanistic pathways for drug action, including off-target interactions such as CQ-induced itch response. We note that none of the known targets of CQ appeared in the consensus node list (Additional File 2), but we argue that the knowledge of targets can be derived from the pathways, and complementary approaches such as structure-based target prediction can be carried out to hypothesise direct target engagement.
Limitations of this study
While we aimed for a comprehensive parameter exploration and benchmarking of causal reasoning algorithms in this work, it still has some limitations as well. Firstly, we were limited by the annotations available for the compounds used, and chemical space (and mode of action) coverage in this set in the first place. This has profound implications for our work (and indeed, in any work where a ‘ground truth’ must be set for the mode of action of compounds): Different areas of chemical and mode of action space behave differently, and assuming unavailable data as inactive punishes ‘false positives’, which may very well be novel true positives which are just not annotated as such. Furthermore, the causal reasoning algorithms additionally infer node directionality (i.e., whether the recovered signalling proteins activated or inhibited), which we did not consider when benchmarking the results as it was not possible to obtain consistent and complete functional pharmacology information about the compound-target interactions (i.e., are they activated or inhibited upon pharmacological modulation).
Furthermore, the cell lines used in the gene expression experiments considered (MCF7 and PC3) are quite similar in terms of baseline gene expression (Supplementary Figure 7), which is why we potentially did not see any significant difference in performance when using data derived from either cell line, but these are the only cell lines available in the original CMap. As LINCS provides data derived from 99 cell types[9], it would be interesting to investigate these other cell lines and relate these to the applicability domain of the methodologies – for example, non-cancer cell lines from a variety of tissues such as HA1E (normal kidney) or CNS cells such as NPC/NEU. It is also important to note that cell lines are in vitro models which cannot necessarily recapitulate in vivo processes, and the limitations of using cell lines in general has been extensively discussed[71, 72].