The cellular response to drug perturbation is limited: comparison of large-scale chemogenomic fitness signatures

doi:10.21203/rs.3.rs-781592/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 11 Mar, 2022

Read the published version in BMC Genomics →

Background

Chemogenomic profiling is a powerful approach towards understanding the genome-wide cellular response to small molecules. Developed in Saccharomyces cerevisiae, chemogenomic screens provide direct, unbiased identification of drug target candidates as well as genes required for drug resistance. While many laboratories have performed chemogenomic fitness assays, they have not been assessed for reproducibility and accuracy. Here we analyze the two largest independent yeast chemogenomic datasets comprising over 35 million gene-drug interactions and more than 6000 unique chemogenomic profiles; the first from our own academic laboratory and the second from the Novartis Institute of Biomedical Research (NIBR).

Results

Combining the datasets revealed robust genetic interaction response signatures that point to common mechanisms of action, despite the substantial differences in experimental and analytical pipelines. We previously reported that the cellular response to small molecules is limited and can be described by a network of 45 chemogenomic signatures. In the present study, we show that the majority of these signatures (66%) are also found in the companion dataset, providing further support for their biological relevance as systems-level, small molecule response systems.

Conclusions

Our results demonstrate the robustness of chemogenomic fitness profiling in yeast, while offering guidelines for performing other high-dimensional comparisons including parallel CRISPR screens in mammalian cells.

Epigenetics & Genomics

genomics

chemogenomics

fitness assay

reproducibility

Saccharomyces cerevisiae

drug target

mechanism of action

HIPHOP

A major, persistent challenge in drug discovery is validation of the molecular targets and the target pathways that can be modulated by bioactive small molecules in cellular assays. This is especially true for target-based approaches where drug candidates are selected based on high throughput biochemical screens, because their behavior when tested in cells can be unpredictable. Drugs that fail in the clinic often do so because of incomplete characterization of their effects in vivo. Perhaps, as a consequence, phenotypic, cell-based screens have seen renewed interest. Yet, despite advances in the complexity and sophistication of phenotypic screens, the unambiguous assessment of a drug’s primary, secondary and tertiary effects, in vivo, remains a significant challenge. Successful implementation of chemogenomic assays and a computational framework with which to analyze them would help bridge the gap between bioactive compound discovery and drug target validation.

Chemogenomics integrates drug discovery and target identification through the detection and analysis of chemical-genetic interactions. Despite the increase in such studies (including a growing number performed at single-cell resolution), most chemogenomic methods currently rely on correlation to infer drug-target interactions; i.e., few directly identify drug-target chemical-genetic interactions [1]. For example, genome-wide differential expression analysis (transcriptomics) is one strategy used to probe mechanism of action (MoA). In these studies, gene expression changes induced by chemical perturbation are typically compared to a compendium of profiles (derived from genetic perturbations and compounds of known mechanisms) to uncover “guilt-by-association”, e.g., https://lincsproject.org/LINCS/. In practice, the profile with the best “match” is then used to infer the drug target, on the assumption that the expression profile of a chemically induced knockdown of the drug target will mimic that of a genetic mutation of the drug target or of cells treated with a compound of the same MoA. Such approaches, although greatly expanded in the past decade, still depend on the composition and quality of their reference database and are therefore prone to systematic bias and lab-to-lab variation. Further complicating differential expression approaches is the fact that a genetic knockdown or knockout often lacks a discrete phenotype but nevertheless results in the differential expression of hundreds or thousands of transcripts. In contrast, drug perturbation of the proteins encoded by a locus (or loci) of interest is consequential. By way of example, nocodazole treatment will depolymerize microtubules composed of multiple tubulin isoforms and thereby result in a phenotype, whereas a genetic perturbation of a single isoform may have no effect.

Finally, despite the impressive scale and scope of expression consortia such as LINCS, such approaches are challenging to compare across experimental platforms or between laboratories.

Encouragingly, assays that directly identify drug-target interactions such as gene-editing assays (including pathway-wide and genome-wide loss-of-function CRISPR-Cas9 screens) are being industrialized. In these assays, robustness and quality control issues are being addressed directly [1]. However, with a few notable exceptions [2], standardized protocols are still lacking, and laboratory-to-laboratory reproducibility remains challenging [3, 4].

Chemogenomic profiling, first developed in yeast, identifies novel therapeutic targets in a cellular context and can therefore provide strong evidence for direct drug-target engagement [5, 6]. These functional genetic screens provide mechanistic insight because they report all chemical-genetic interactions that are required for drug resistance. In yeast, the HaploInsufficiency Profiling and HOmozygous Profiling (HIPHOP) platform [5, 6] employs the barcoded heterozygous and homozygous yeast knockout collections. HIP exploits drug-induced haploinsufficiency, a phenotype in which strain-specific sensitivity (decreased growth rate) is observed in a heterozygous strain deleted for one copy of an essential gene upon exposure to a drug targeting the product of that gene. In HIP, the 20 bp molecular identifiers unique to each strain allow the ~1100 essential heterozygous deletion strains to be grown competitively in a single pool and fitness to be quantified by barcode sequencing. The resulting fitness defect (FD) scores report the relative abundance, and therefore the drug sensitivity, of each strain. Those heterozygous strains (deleted for essential genes) with the greatest FD scores identify the most likely drug target candidates. Similarly, the complementary HOP assay interrogates ~4800 non-essential homozygous deletion strains and identifies genes involved in the drug target biological pathway and those required for drug resistance. The combined HIPHOP chemogenomic profile, reporting drug-target candidates in the HIP assay and genes required for resistance in the HOP assay, provides a comprehensive genome-wide view of the cellular response to a specific compound [5, 6].

Here we present a comparative analysis of the two largest independent yeast chemogenomic HIPHOP datasets published to date [6, 7]. Specifically, we compare a dataset generated in our lab (HIPLAB) [6] to a dataset generated by a group at the Novartis Institute for Biomedical Research (NIBR) [7]. The datasets are distinct; they were obtained from two independent platforms, using different experimental designs and distinct analytic pipelines (Table 1). The primary aims of this study were to 1) assess the data concordance at different levels of analysis, 2) assess the reproducibility of the datasets, and 3) analyze the NIBR and HIPLAB datasets in parallel so that both datasets might be more broadly used by the research community. A secondary aim was to identify any biological themes in the combined data that were not obvious from either of the individual datasets.

Our comparison shows excellent agreement between chemogenomic profiles for established compounds and correlations between entirely novel compounds. Our analysis revealed global properties common to both datasets, including specific drug targets, correlation between chemical profiles with similar mechanisms, and cofitness between genes with similar biological function. Unique features of each dataset were also uncovered. In our previous report, we identified 45 major cellular response signatures [6]. We also hypothesized that these 45 signatures were comprehensive because, in our simulations, we found that 80% of these clusters would have been identified after screening < 30% of the ~3200 compounds. In the new independent analysis presented here, we found that the majority of these signatures (66.7%) are also present in the NIBR dataset, an observation that supports the fundamental biological relevance of these 45 core drug responses. In addition, by combining the two datasets we were able to 1) identify robust chemogenomic responses, both common and research site-specific, the majority (81%) of which are enriched for GO biological processes and associated with gene signatures, 2) infer chemical diversity/structure, and 3) gauge screen-to-screen reproducibility within replicates and between compounds with similar MoA. We present the data on a website that provides a resource for the discovery of functional interactions between genes, compounds and biological processes (Comparative chemogenomics).

Overview of NIBR and HIPLAB screens

Because all our comparisons depend on the ability to compare both datasets, we describe each dataset in detail. The processing strategies for the raw data were fundamentally different between the two research sites (Table 1). In the HIPLAB dataset, the raw data were normalized separately for the strain-specific uptags and downtags, and independently for the heterozygous and homozygous strains, creating 4 sets of results: uptag/het, uptag/hom, downtag/het, downtag/hom. For each set, logged raw average intensities were normalized across all arrays using a variation of median polish that incorporates batch effect correction [6]. Because the performance of the two tags in each strain can vary significantly, a ‘best tag’ was identified for each strain, defined as the tag with the lowest robust coefficient of variation across all of the control microarrays. For each array, tags were removed if they did not pass the computed compound and control background thresholds, calculated from the median + 5 MADs of the raw signal from the unnormalized intensity values of the used (corresponding to strain tags) and unused (control) features on the array across all arrays. In contrast, in the NIBR dataset, arrays were normalized by “study id” (a set of ~40 compounds) but were not corrected for batch effects. Rather, tags that performed poorly, based on the correlation of uptags and downtags across different intensity ranges in the control arrays, were removed and the remaining tags were averaged to obtain strain intensity values.

In the HIPLAB dataset, relative strain abundance was quantified for each strain as the log₂ of the median signal in the control condition divided by the signal from the compound treatment. The final fitness defect (FD) score is expressed as a robust z-score: the median of the log₂ ratios for all strains in a given screen is subtracted from the log₂ ratio of a specific strain, and the result is divided by the MAD of the log₂ ratios for all strains in that screen. In the NIBR dataset, the log₂ of the inverse ratio (i.e., roughly the negative of log₂ratio_HIPLAB) was used, with three differences: 1) average intensities of controls were used (instead of median signals), 2) because NIBR used replicates for each compound, the average of the signals of the compound samples was used instead of a single value (Table 1), and 3) the final gene-wise z-score normalizes for the median and standard deviation of each strain across all experiments using quantile estimates (see Methods).

Both laboratories constructed pools of heterozygous and homozygous strains in a similar manner and collected samples robotically for both the HIP and HOP assays as previously described [8]. For NIBR experiments, samples were collected at fixed time points (which served as a proxy for the number of cell doublings), whereas in the HIPLAB experiments cells were collected based on actual doubling time. Notably, ~300 fewer homozygous deletion strains were detectable in the NIBR pools than in the HIPLAB pools. These strains correlate with strains known to be slow growers in the absence of drug [9], and their absence is likely due to the fact that the pool was allowed to grow overnight (~16 h) in the NIBR assays, during which slow-growing strains drop out before the start of the experiment.

Another difference between protocols was that NIBR screened all heterozygous strains, deleted for both essential and nonessential genes, while the HIPLAB screened only the essential heterozygotes. We decided against screening nonessential heterozygotes based on the following logic: because the concept of the HIP assay relies on a fitness defect resulting from gene dosage being decreased from two copies to one in a heterozygous diploid deletion strain, it follows that such fitness defects should not be observed if that gene is not required for growth, as is the case for nonessential genes [10]. Indeed, we find in practice that the HIP profiles of the nonessential heterozygotes do not correlate with HOP profiles for the same drug, nor are these nonessential heterozygote profiles biologically informative. This is illustrated by the profiles for DNA damaging agents. For example, in HOP screens of nonessential deletion strains, RAD genes have high FD scores in the presence of a DNA damaging agent (mechlorethamine), but none of these strains were sensitive as heterozygotes (Figure S1). The exception to this is the small number of nonessential heterozygous strains that exhibit severe fitness defects as homozygotes. As these strains exhibit ‘nearly essential’ phenotypes as homozygotes, they would be expected to exhibit drug-induced haploinsufficiency as heterozygotes and therefore should be included in the HIP assay.

We next compared the depth and breadth of each screening dataset. The NIBR screening library included 1641 proprietary compounds and 135 reference compounds with known mechanisms of action. In total there were 2956 HIP and 2923 HOP experiments, for 1776 discrete chemical structures.

However, because NIBR HIP screens included heterozygous strains deleted for both essential and nonessential genes (as mentioned above), when we combined the HIP and HOP NIBR datasets for shared compounds, 2725 full HIPHOP screens spanning 1771 distinct compounds remained. Moreover, ~56% of the NIBR screening library (representing 596 compounds) could practically be considered replicate screens because they exhibit correlations on par with true replicates, even though they were screened at different concentrations. For example, we observe such “practical replicates” when a particular compound is screened at a different concentration, yet the level of inhibition is comparable. Supporting this observation, the majority of such “replicates” clustered together (~65%; those with more than one replicate are included if at least one pair is clustered together). Given these experimental caveats, the informative datapoints were reduced from ~30 million to ~15 million due to the nonessential heterozygotes, and to ~9 million unique datapoints if the replicates were excluded.

The HIPLAB screening library comprised 3356 screens and 3250 unique compounds selected from a set of > 50,000 maximally diverse small molecules (~20 million data points) with unknown mechanisms and ~characterized drugs or chemical probes.

The structural diversity of the screening libraries reflects the scale of a large screening effort. While NIBR did not provide the compound structures of their libraries, they reported that 50% of the pairwise comparisons between compounds had Tanimoto coefficients less than 0.1 [7]. In comparison, the HIPLAB compounds were of lower diversity; ~43% of the pairwise comparisons had Tanimoto coefficients less than 0.1. However, because the NIBR structures were not provided (with the exception of 135 reference compounds and 15 novel inhibitors), the NIBR figure is not verifiable [7].
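The diversity statistic quoted above (the fraction of compound pairs with a Tanimoto coefficient below 0.1) can be sketched as follows. This is a minimal illustration, not the pipeline used by either group: the fingerprint type is not stated in the text, so fingerprints are represented here simply as sets of "on" bit indices, and the function names and toy compounds are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fraction_below(fingerprints: dict, cutoff: float = 0.1) -> float:
    """Fraction of all compound pairs whose Tanimoto coefficient is below cutoff."""
    pairs = list(combinations(fingerprints.values(), 2))
    return sum(tanimoto(a, b) < cutoff for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints (sets of bit indices), not real compounds.
toy_fingerprints = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},   # close analog of cpd_A
    "cpd_C": {10, 11, 12},
    "cpd_D": {20, 21},
}

print(tanimoto(toy_fingerprints["cpd_A"], toy_fingerprints["cpd_B"]))
print(fraction_below(toy_fingerprints))
```

A larger fraction of pairs below the cutoff indicates a more structurally diverse library under whichever fingerprint is chosen.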

Coinhibition between chemogenomic profiles

To compare the HIPLAB and NIBR screens, we first compared ~150 chemogenomic profiles representing ~50 reference compounds with known MoA that were screened by both NIBR and HIPLAB (Table 2). For many of these compounds, the drug target is well-established in yeast. Chemogenomic profiles were compared individually using ‘coinhibition’ values, where coinhibition is defined as the degree of similarity between two chemogenomic profiles, i.e., the FD scores across all genes in each screen, using Pearson correlation as a metric. The HOP profiles for mechlorethamine, a DNA damaging agent, identified a similar set of DNA repair genes including RAD1, RAD2, RAD4, RAD5, RAD10, RAD14, RAD18, REV7, REV3, SRS2 and PSO2 (Figure 1A). Likewise, a pairwise comparison of four nocodazole chemogenomic profiles showed correlations of 0.48 and greater across the entire set of deletion strains (Figure 1B). These correlation values increased when comparing only those individual genes exhibiting significant FD scores in the NIBR and HIPLAB HIP nocodazole profiles (Figure 1C). In this case, the HIP genes identified are enriched for genes required for tubulin folding (CCT genes). Finally, based on a correlation value of > 0.5 with the nocodazole profiles, we highlight the HIP profiles of two novel compounds, NIBR 2667 and HIPLAB 5790901, both identifying a nearly identical set of genes (Figure 1D). It should be noted that because the screens were performed at different concentrations, a linear correlation of one is not expected.
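As an illustration of the coinhibition metric defined above, the following sketch computes the Pearson correlation between two profiles over the strains they share. The dictionary layout, function names and FD values are assumptions made for the example; they are not taken from either dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coinhibition(profile_a: dict, profile_b: dict) -> float:
    """Correlate FD scores over the strains present in both profiles."""
    shared = sorted(profile_a.keys() & profile_b.keys())
    return pearson([profile_a[g] for g in shared],
                   [profile_b[g] for g in shared])


# Hypothetical FD scores for one compound screened at two sites.
hiplab_profile = {"RAD1": 8.0, "RAD5": 6.0, "TUB1": 0.0, "ERG11": 1.0, "YFG1": -1.0}
nibr_profile = {"RAD1": 5.0, "RAD5": 2.5, "TUB1": 0.2, "ERG11": 0.1, "YFG2": 2.0}

print(round(coinhibition(hiplab_profile, nibr_profile), 3))
```

Restricting the calculation to shared strains matters here because the two pools did not contain exactly the same set of detectable strains.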

In addition to measuring the correlation between chemical profiles, a valuable metric is the correlation of gene fitness scores across compounds and between datasets. For this comparison we employed ‘cofitness’: the degree of similarity between fitness profiles, in which the FD scores of two genes are compared across all compounds using Pearson correlation as a metric. Genes that exhibit a high degree of correlation, or cofitness, between fitness profiles across compounds are often functionally related. In this case, where we have two independent datasets, we expect the same gene to be cofit across the 50 compounds that were shared between the two datasets. Overall, we observed a correlation of ~0.15 between profiles of the same gene. Because most genes are not perturbed in any given experiment, we expect those with highly variable scores (and therefore more likely to be biologically informative) to exhibit greater correlation. This is indeed what we observe: when only significantly sensitive strains are considered (genes with standard deviations in the top 5%), the correlation between genes increases to ~0.5. As the correlation increases, the mechanistic similarity between drugs that significantly perturb a given deletion strain also increases. For example, while the IDP1 gene profiles exhibit a similar pattern of perturbation (R-value ~0.4) (Figure 2A), RAD5 and HMG1 exhibit higher correlations (R-value ~0.7 and ~0.9, respectively), and significant perturbations are seen in mechanistically related compounds such as 1) the DNA damaging agents hydroxyurea, mechlorethamine and methyl methanesulfonate (MMS) and 2) the sterol pathway inhibitors fluconazole and fluvastatin (Figure 2B, 2C). Similarly, in the case of TOR1 (R-value 0.85), outlier fitness deviations all arise from the same compound (rapamycin) (Figure 2D).

When we examined the genes exhibiting cofitness within the shared 50 compounds, we observed enrichment for pairs in both sets that reflects the mechanistic enrichment of the compounds as a whole. For example, 6 of the 50 compounds were DNA damaging agents, and as a result, several of the top cofit genes were pairs where both genes were involved in DNA damage.
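The same-gene cofitness comparison, including the restriction to the most variable genes, might look like the sketch below. It assumes each dataset is a mapping from gene to a list of FD scores in the same compound order, and it ranks variability by the standard deviation in the first dataset only; the text does not say which dataset was used for ranking, so that choice, like the toy values, is an assumption.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def same_gene_cofitness(ds_a: dict, ds_b: dict, top_fraction: float = 1.0) -> dict:
    """Correlate each gene's profile between datasets, keeping only the
    top_fraction of genes ranked by standard deviation in ds_a (assumption)."""
    genes = sorted(ds_a.keys() & ds_b.keys())
    ranked = sorted(genes, key=lambda g: statistics.pstdev(ds_a[g]), reverse=True)
    keep = ranked[:max(1, math.ceil(top_fraction * len(ranked)))]
    return {g: pearson(ds_a[g], ds_b[g]) for g in keep}


# Hypothetical FD scores across four shared compounds (same order in both).
hiplab_fd = {"TOR1": [0.1, 9.0, -0.2, 0.3], "IDP1": [0.2, -0.1, 0.1, -0.2]}
nibr_fd = {"TOR1": [0.0, 7.5, 0.2, -0.1], "IDP1": [-0.1, 0.2, 0.1, -0.2]}

print(same_gene_cofitness(hiplab_fd, nibr_fd))                    # all genes
print(same_gene_cofitness(hiplab_fd, nibr_fd, top_fraction=0.5))  # most variable half
```

In the toy data the strongly perturbed gene correlates well between sites while the unperturbed one does not, which mirrors the reasoning in the text for filtering on variability.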

To examine the agreement between the NIBR and HIPLAB datasets at a more comprehensive level, we combined and then hierarchically clustered the two HIPHOP datasets together for two subsets of mechanistically related compounds. The first subset comprised ~100 compounds representing 19 distinct mechanistic classes, including TOR signaling, microtubule poisons, FAS1 inhibitors, cell wall inhibitors, statins, ionophores, ion channel blockers, azoles and morpholine antifungals (Figure 3A). The second comprised ~40 DNA damaging agents representing eight mechanistic classes, including the drug and tool compounds doxorubicin, camptothecin, hydroxyurea, mechlorethamine and MMS (Figure 3B). In the resulting heatmaps, in both cases, the two identical dendrograms reveal that the screens cluster primarily by the mechanism of drug action and not by the research institute. Screens from NIBR and HIPLAB were interspersed, and all replicates and compounds with the same mechanism clustered together. In the DNA damaging clustergram, one notable exception was observed for two aclarubicin profiles (one from each research site) that did not cluster with the other anthracycline compounds, including doxorubicin, daunorubicin and epirubicin. These differences between specific anthracyclines likely reflect true mechanistic differences between these closely related compounds [11, 12]. For example, the individual aclarubicin HIPHOP profiles implicate RPO31 (encoding an RNA polymerase III subunit) as a potential target. In select cases, compounds with similar mechanisms (i.e., part of the same pathway) also clustered together, including the morpholine antifungals (e.g., fenpropimorph and amorolfine, both targeting ERG2), the azoles (e.g., fluconazole and clotrimazole, targeting ERG11) and the statins (e.g., atorvastatin and fluvastatin, targeting HMG1), with all three targets in the sterol biosynthesis pathway.

Other examples include clustering of ion channel blockers next to ionophores (amiodarone and nigericin, respectively) and clustering of rapamycin next to caffeine, known to target the TOR pathway in Saccharomyces cerevisiae (Figure 3A).
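To make the "cluster by mechanism, not by site" observation concrete, here is a small sketch of agglomerative clustering on a correlation-based distance (1 minus Pearson r). It is a naive average-linkage implementation cut at a fixed number of clusters; the linkage method and cut used for Figure 3 are not described in this section, so both are assumptions, as are the screen names and scores.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster(profiles: dict, n_clusters: int) -> list:
    """Naive average-linkage agglomerative clustering on 1 - Pearson distance."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical FD scores over five strains for four screens from two sites.
toy_screens = {
    "HIPLAB_rapamycin": [9.0, 0.0, 0.0, 1.0, 0.0],
    "NIBR_rapamycin": [7.0, 0.5, 0.0, 0.5, 0.0],
    "HIPLAB_MMS": [0.0, 8.0, 7.0, 0.0, 1.0],
    "NIBR_MMS": [0.5, 6.0, 8.0, 0.0, 0.0],
}

print(sorted(cluster(toy_screens, 2)))
```

With these toy profiles the two rapamycin screens group together and the two MMS screens group together, regardless of which site produced them.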

Common response signatures

Our previous global analysis of the HIPLAB dataset [6] revealed that, despite the complexities of pharmacological inhibition, the cellular response to small molecules is limited and can be described by a network of 45 major response signatures. These responses comprise chemogenomic profiles with 1) a characteristic gene signature, 2) distinct GO enrichments and 3) enriched chemical sub-structures. In that 2014 study we found, by subsampling, that the majority of these signatures (~80%) could be identified after screening less than 30% of the compounds, suggesting that the cellular response to small molecules is limited. To test whether these response signatures are also present in the NIBR dataset, we used the same methodology as in Lee et al. (2014) [6] to hierarchically cluster the NIBR screens using coinhibition as a distance metric and a dynamic branch cutting method [13] to generate discrete clusters. Initially, 96 robust clusters were identified, covering ~41% of the profiles. Compared to the 45 major responses in the HIPLAB dataset, which cover ~36% of the profiles, the number of NIBR response signatures was two-fold greater. However, many of these signatures were redundant with respect to their GO enrichments and associated gene signatures. Gene signatures were also longer, as would be expected when clusters are small and when compounds within a cluster are replicates. While it is not entirely clear, one explanation for this observation is that the NIBR screening library contains a large number of replicates (56% of all screens), which would produce partially redundant clusters. To identify discrete clusters with minimal redundancy, dynamic branch cutting parameters were modified to be less sensitive to smaller clusters (see Methods), which resulted in a final set of 42 robust NIBR clusters, comparable to the 45 HIPLAB response signatures.

The median number of genes in the response signatures was similar between the final NIBR signatures (7 genes) and HIPLAB signatures (8 genes). Using the overlap between gene signatures to measure similarity between response types, we found that ~66.7% of the 45 major HIPLAB response types were detectable in the NIBR clusters. These common signatures include iron & copper homeostasis, cell wall signaling, mitochondrial stress, and perturbation of the plasma membrane. More specific responses, often including drugs of known mechanism, included the following: unfolded protein, anthracycline transcription coupled DNA repair, azoles and statins, ERAD & cell cycle, heme biosynthesis & mitochondrial translocase, NEO1-PIK1, tubulin folding & SWR complex, superoxide and DNA damage.
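One way to score whether a reference signature is "detectable" in another set of clusters is by gene-set overlap, sketched below. The text does not specify the overlap statistic or cutoff, so the overlap coefficient and the 0.5 threshold used here are assumptions chosen purely for illustration, and the gene memberships of the toy signatures are hypothetical.

```python
def overlap_coefficient(sig_a: set, sig_b: set) -> float:
    """Shared genes divided by the size of the smaller signature."""
    smaller = min(len(sig_a), len(sig_b))
    return len(sig_a & sig_b) / smaller if smaller else 0.0


def detected_fraction(reference: dict, other: dict, threshold: float = 0.5) -> float:
    """Fraction of reference signatures with at least one match in `other`."""
    hits = sum(
        any(overlap_coefficient(genes, cand) >= threshold for cand in other.values())
        for genes in reference.values()
    )
    return hits / len(reference)


# Hypothetical signatures; gene memberships are illustrative only.
reference_signatures = {
    "DNA damage": {"RAD1", "RAD5", "RAD18", "PSO2"},
    "azoles & statins": {"ERG11", "HMG1", "ERG2"},
    "superoxide": {"SOD1", "CCS1"},
    "NEO1-PIK1": {"NEO1", "PIK1"},
}
other_clusters = {
    "cluster_1": {"RAD1", "RAD5", "REV3"},
    "cluster_2": {"ERG11", "HMG1"},
    "cluster_3": {"SOD1", "CCS1", "SOD2"},
    "cluster_4": {"TOR1", "TOR2"},
}

print(detected_fraction(reference_signatures, other_clusters))
```

A Jaccard index or a hypergeometric test would be equally reasonable substitutes; the choice changes how short signatures are treated.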

The majority of these conserved chemogenomic response signatures are enriched for biological processes, details of which can be visualized at the accompanying website (Comparative chemogenomics). Taken together, these results provide further support for the concept that the cellular response to small molecules is limited and that it can be defined by chemogenomic signatures.

Because the chemogenomic signature comparison may be impacted by biases in screening library composition, we asked which of the final 42 responses were unique to the NIBR dataset. Response signatures that were not detectable in the HIPLAB responses included those comprising the three TOR signaling clusters, the GPCR inhibitor response, and the eukaryotic translation initiation factor (eIF) complex inhibitor signature. Other responses were also gene/target-specific and included inhibitors of VRG4, encoding a Golgi GDP-mannose transporter; RPL15A & SPP41, encoding a ribosomal protein and a regulator of spliceosome components, respectively; and FAS1, encoding fatty acid synthase. The finding that these ‘missing’ responses likely reflect NIBR screening a small number of target-focused compound sets, combined with our initial finding of many small and highly redundant clusters, suggests that the NIBR libraries are enriched for sub-libraries of mechanistically and/or structurally related molecules.

We used the same approach to compare the signatures of the combined dataset to the HIPLAB responses. In this case, of the resulting 47 chemogenomic signatures, ~84% of the original 45 HIPLAB signatures were detected. Of these 38 overlapping signatures, common signatures included all the DNA damage responses, as well as the azole & statin, superoxide, tubulin folding & SWR complex, unfolded protein and mitochondrial-specific stress responses. Interestingly, by combining the two datasets, some of the NIBR signatures that had not previously matched a HIPLAB response were merged into one of the 38 overlapping responses. Only three signatures were composed solely of HIPLAB profiles: the NEO1, ubiquinone biosynthesis & proteasome, and RSC complex & mRNA processing signatures. Conversely, the signatures driven by NIBR profiles largely overlapped the target-specific responses unique to the NIBR dataset, including TIM54, RPL15A & SPP41, VRG4, eIF, and GPCR inhibitors, as well as the major TOR signaling response.

Target frequency comparison

Compared to the HIPLAB dataset, which focused on screening diverse compounds with unknown mechanisms, NIBR clusters were highly enriched for screens identifying genes as potential drug targets. The most frequently identified targets that dominated specific clusters in the NIBR dataset include 1) ERG11 (of the sterol biosynthesis pathway) and KOG1, AVO1, and TOR2, encoding subunits of the Target of Rapamycin (TOR1 and TOR2) complexes, 2) FAS1, encoding fatty acid synthase, and 3) the mitochondrial transport gene TIM54. The coherence of these signatures suggests that the contributing compounds represent structural analogs. In the azole & statin and ERG11-GCN responses, ERG11 is identified as the target in 31% and 67% of the screens, respectively. In the three rapamycin clusters, the TOR1 and TOR2 subunits are identified as targets in over half (26) of the 51 screens. In 2012, NIBR published a study of novel Erg11 inhibitors, suggesting these published inhibitors may be present in the NIBR screening library [14]. Similarly, the high frequency of targeting mTOR (mammalian target of rapamycin) complexes (as evidenced by the three responses associated with TOR signaling) suggests an enrichment of rapamycin analogs in the NIBR compound library. This is consistent with the fact that rapamycin and aging are active areas of inquiry at NIBR [15–17].

HIPHOP profiles presented in studies previously published by NIBR researchers allowed us, in select cases, to infer the structure of blinded screens. For example, a Nature Chemical Biology study published by the group demonstrated TIM23-dependent mitochondrial import as the target of the natural product stendomycin [18]. A HIPHOP profile in the NIBR dataset was nearly identical to the published version, particularly after accounting for differences in concentration (Figure S2). Similarly, the NIBR group also published a novel geranylgeranyltransferase inhibitor (uncovering sensitivities of strains encoding subunits of the CDC43/RAM2 heterodimer) that was highly correlated to the HIPHOP profile for NIBR compound 5692 in the NIBR dataset [19] (Figure S3).

Compounds and mechanism of action inferred by clustering with reference compounds

One of the NIBR clusters revealed the mechanism of NIBR compounds by virtue of its correlation to reference compounds. The HIPLAB amphotericin B HIP screen was highly correlated with the NIBR 4247 and 1020 HIP screens (> 0.7, p-value < 1e-16) (Figure 4A). In another example, the NIBR compounds 1208, 1209, 1210 and 1211 exhibited correlations of > 0.8 (p-value < 1e-16) with the hydroxyurea screens, a correlation value on par with that observed between replicates, suggesting these compounds are most likely structural analogs of hydroxyurea or closely related derivatives (Figure 4B).

Our global analysis of the HIPLAB and NIBR datasets provides a systems-level view of the cellular response to small molecules. Despite the enormous complexity of the cell, the ~35 million quantitative chemical-genetic measurements reported here can be described by ~45 chemogenomic signatures, defined by chemical structure- and biological process-based properties. These drug signatures provide a framework for understanding drug action and, importantly, the impact of genetics on the in vivo response to small molecule perturbation. Because we observed saturation of the 45 major signatures in our previous dataset and also detected 66.7% of these responses in the NIBR dataset, we expect that these signatures represent fundamental, systems-level small-molecule responses. 60% of these responses were also detectable in an earlier large-scale screening campaign [5], and 40% of the responses were conserved in all three datasets. We suggest that the proteins encoded by the genes that comprise these shared, conserved signatures represent potential starting points for therapeutic intervention. The power of functional genetic screens to uncover drug targets and target pathways, and to delineate the mechanism of action of therapeutics, has been demonstrated both in the yeast model system and, more recently, in meta-analyses of mammalian cell-based CRISPR screens [20]. As the complexity of these screens increases (e.g., in vivo assays, applying combined perturbations, etc.), the ability to perform integrated analyses will grow in importance. Based on our analysis of the two largest comprehensive gene-drug datasets collected to date, we show that, using standardized protocols and analytics, yeast-based screens can be performed at scale across laboratories and that the resulting data are robust.

Source of Datasets

The NIBR dataset was downloaded from Hoepfner et al., 2014 [7] through the Dryad digital repository at: http://doi.org/10.5061/dryad.v5m8v. Gene-wise z-score data for the essential genes present in the heterozygous dataset were selected and combined with the nonessential homozygous dataset for the 2725 screens present in both datasets. The HIPLAB data consist of 5905 strains and 3356 screens [6]. For clustering the combined datasets, the two matrices were merged into a final matrix of 5894 strains x 6081 screens. 309 strains in this dataset were absent from the NIBR dataset.

Identification of significant chemical-genetic interactions

FD scores were calculated for both datasets using slightly different techniques. Specifically, for each HIPLAB strain, log₂ ratios were calculated as follows:

(1) log₂ratio_HIPLAB = log₂[<median signal from control samples> / <signal from chemical sample>]

To facilitate comparisons between screens, log₂ ratios were standardized (separately for heterozygous and homozygous strains).

The FD score of strain i in screen j was computed as follows:

(2) FD_i,j _HIPLAB = (log₂ratio_i,j - <median of log₂ratios for screen j>) /<MAD of log₂ratios for screen j>

Because the FD scores follow a standard normal distribution, the probability that a given score is an outlier in this distribution was obtained using a one-tailed test. Scores with P < 0.001 were identified as significant chemical-genetic interactions. To identify outlier screens for a given deletion strain, FD scores were converted into gene-wise Z-scores and P-values [6].
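The FD calculation in equation (2) and the one-tailed significance call can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the `mad_scale` parameter and the example ratios are hypothetical, and the text does not say whether the MAD was rescaled (R's `mad()` applies a factor of 1.4826 by default), so the sketch leaves it unscaled unless told otherwise.

```python
import math
import statistics


def fd_scores(log2_ratios, mad_scale=1.0):
    """Robust z-score of each strain's log2 ratio within one screen."""
    med = statistics.median(log2_ratios)
    # Median absolute deviation (MAD) around the screen median.
    # mad_scale is an assumption; the source does not state a scaling constant.
    mad = statistics.median(abs(x - med) for x in log2_ratios) * mad_scale
    if mad == 0:
        raise ValueError("MAD is zero; FD scores are undefined for this screen")
    return [(x - med) / mad for x in log2_ratios]


def upper_tail_p(fd):
    """One-tailed P(Z >= fd) for a standard normal Z."""
    return 0.5 * math.erfc(fd / math.sqrt(2))


# Hypothetical log2(control / treatment) ratios for seven strains in one screen.
ratios = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 4.0]
scores = fd_scores(ratios)
# Indices of strains that pass the P < 0.001 cutoff used in the text.
significant = [i for i, s in enumerate(scores) if upper_tail_p(s) < 0.001]
```

Using the median and MAD rather than the mean and standard deviation keeps a handful of very sensitive strains from inflating the scale against which they are judged.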

As described [7], the NIBR dataset inverted the ratio inside the logarithm, so that its log₂ratio is roughly the negative of log₂ratio_HIPLAB:

(1) r_L = log₂ratio_NIBR = log₂[<average signal from chemical sample>/<average signal from control samples>]

The normalized MADL score (FD) of strain i in screen j was computed as:

(2) MADL_i,j = FD_i,j _NIBR = (log₂ratio_i,j - <median of log₂ratios for screen j>) /<MAD of log₂ratios for screen j>

The MADL or FD_NIBR is roughly equivalent to the negative value of the FD_HIPLAB.
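The sign relationship follows directly from the two definitions. The short derivation below uses c for the control signal and t for the signal from the chemical sample (symbols introduced here for illustration). The relation is only approximate because HIPLAB takes the median of the controls whereas NIBR averages controls and replicates.

```latex
\log_2\!\left(\frac{t}{c}\right) = -\log_2\!\left(\frac{c}{t}\right)
\;\Longrightarrow\;
\mathrm{log_2ratio}_{\mathrm{NIBR}} \approx -\,\mathrm{log_2ratio}_{\mathrm{HIPLAB}}

\operatorname{median}(-x) = -\operatorname{median}(x), \qquad
\operatorname{MAD}(-x) = \operatorname{MAD}(x)
\;\Longrightarrow\;
\mathrm{FD}_{\mathrm{NIBR}} \approx -\,\mathrm{FD}_{\mathrm{HIPLAB}}
```

A sensitive strain therefore has a large positive FD in HIPLAB and a large negative MADL in NIBR.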

Lastly, to adjust for highly variable strains, the MADL scores were scaled by a factor based on the t-test p-value between the replicates and the controls, giving the adjusted score a_MADL = min(0.05/p, 1) × MADL (Table 1). The gene-wise z-scores were then estimated using the a_MADL of strain i over n experiments and the standard deviation (σ) obtained from the middle 70% of the quantiles:

(3) z-score_i = a_MADL(i)/σ_i

A z-score cutoff of -5 was used to define significant chemical-genetic interactions [7].
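The NIBR adjustment and gene-wise scaling can be illustrated as follows. This is a sketch under stated assumptions: the a_MADL formula is the one given in Table 1, but reading "the middle 70% of the quantiles" as trimming 15% of the values from each tail before taking the standard deviation is an interpretation, and all function names and example values are hypothetical.

```python
import statistics


def adjusted_madl(madl, p_value):
    """Shrink a MADL score when the replicate-vs-control t-test is weak."""
    if p_value <= 0:
        return madl  # assumption: a zero p-value leaves the score unchanged
    # Factor is 1 for p <= 0.05 and decays as 0.05 / p for larger p.
    return min(0.05 / p_value, 1.0) * madl


def central_sd(values, keep=0.70):
    """Standard deviation of the central `keep` fraction of the sorted values."""
    ordered = sorted(values)
    trim = int(len(ordered) * (1.0 - keep) / 2.0)  # values dropped per tail
    core = ordered[trim:len(ordered) - trim]
    return statistics.stdev(core)


def gene_wise_z(a_madl_across_screens):
    """z-score of one strain in every screen, scaled by its own central spread."""
    sigma = central_sd(a_madl_across_screens)
    return [a / sigma for a in a_madl_across_screens]


# Hypothetical a_MADL values for one strain across 20 screens:
# mostly noise around zero plus one strongly sensitive screen.
values = [-1.0, 1.0] * 9 + [0.0, -12.0]
z = gene_wise_z(values)
hits = [i for i, v in enumerate(z) if v < -5]  # cutoff from the text
```

Estimating σ from the central part of the distribution keeps a strain's few genuine hits from widening its own noise estimate.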

Identification of HIP hits

‘HIP hits’ are defined as potential targets of profiled compounds with high specificity [6]. In the HIPLAB dataset, ‘clearance’ was defined as a measure of specificity that identifies significant hits in strains exhibiting FD scores greater than zero in a given HIP profile where:

Strains are ordered by FD score in descending order, where FD_(i) is the i^th greatest FD score in the profile and clearance’ is defined as the difference between consecutive FD scores:

clearance’ = FD_(i) – FD_(i+1)

clearance_max is the maximum clearance’ associated with the profile, and FD_max is the FD score of the strain with clearance’ = clearance_max.

For any strain with FD_(i) ≥ FD_max, clearance = clearance_max;

otherwise, clearance = clearance’.

Clearance thresholds were optimized using the gold-standard compounds with known targets in the dataset, resulting in a threshold of 5.75. Therefore, strain(s) with significant FD scores (P < 0.001) and clearance_max ≥ 5.75 are designated HIP hits [6]. We used this clearance scoring system to identify hits in both datasets.
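The clearance rule is easier to follow as code. The sketch below is one reading of the definition above, not the published implementation: function names, strain names and FD values are hypothetical, and comparing the lowest-ranked positive strain against zero is an assumption, since the text does not say how the final gap is handled.

```python
import math


def clearance_scores(fd_by_strain):
    """Return {strain: clearance} for strains with FD > 0 in one HIP profile."""
    # Rank strains with positive FD from the largest score down.
    ranked = sorted(
        ((s, fd) for s, fd in fd_by_strain.items() if fd > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    if not ranked:
        return {}
    fds = [fd for _, fd in ranked]
    # Gap to the next-ranked strain; the last strain is compared with 0 (assumption).
    gaps = [fds[i] - (fds[i + 1] if i + 1 < len(fds) else 0.0)
            for i in range(len(fds))]
    clearance_max = max(gaps)
    fd_max = fds[gaps.index(clearance_max)]
    # Every strain at or above the largest gap inherits clearance_max.
    return {
        strain: (clearance_max if fd >= fd_max else gap)
        for (strain, fd), gap in zip(ranked, gaps)
    }


def hip_hits(fd_by_strain, p_by_strain, threshold=5.75, alpha=0.001):
    """Strains that are both significant and clear of the rest of the profile."""
    clearance = clearance_scores(fd_by_strain)
    return sorted(s for s, c in clearance.items()
                  if c >= threshold and p_by_strain[s] < alpha)


# Hypothetical HIP profile: one strain stands far above the others.
profile = {"geneA": 25.0, "geneB": 4.0, "geneC": 3.5, "geneD": -1.0}
p_values = {s: 0.5 * math.erfc(fd / math.sqrt(2)) for s, fd in profile.items()}
hits = hip_hits(profile, p_values)
```

Assigning clearance_max to every strain above the largest gap lets a profile with two closely scoring targets report both of them, rather than penalizing the top strain for the small gap to its neighbor.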

Hierarchical clustering

Our chemogenomic dataset is in a matrix format in which each column is a screen and each row is a gene (corresponding to its homozygous or heterozygous deletion strain). To identify robust clusters in the NIBR dataset and to fairly compare the two datasets, we followed the same hierarchical clustering methodology used in Lee et al. 2014 [6]. We first replaced insignificant scores (standard normal P > 0.001) in the NIBR screening matrix with zero, to focus on the most significant cellular responses to chemical perturbation. We then computed coinhibition, the pairwise Pearson correlation between all screens, representing the similarity between the NIBR profiled compounds. Profiles were then hierarchically clustered using (1 – coinhibition) as the distance metric and the Ward agglomeration method. Discrete clusters were obtained using a dynamic branch cutting method [13]. For the full NIBR dataset, we used the parameters deepSplit = 4 and minClusterSize = 3, as was done in Lee et al. [6]. For the final version with 41 clusters, we used deepSplit = 2, cutHeight = 20 and minClusterSize = 3. For the combined HIPHOP dataset, we used minGap = 0.098, deepSplit = 2 and minClusterSize = 3.
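The first two steps, zeroing insignificant scores and converting coinhibition into a distance, can be sketched as below. This is an illustration only: it assumes scores are oriented so that positive values indicate a fitness defect and that the significance test is one-tailed, and the function names, compound names and values are hypothetical. The Ward linkage and dynamic branch cutting steps (the Dynamic Tree Cut package for R [13]) are not reproduced here.

```python
import math


def upper_tail_p(score):
    """One-tailed P(Z >= score) for a standard normal Z."""
    return 0.5 * math.erfc(score / math.sqrt(2))


def zero_insignificant(profile, alpha=0.001):
    """Keep only scores with P <= alpha; set the rest to zero."""
    return [s if upper_tail_p(s) <= alpha else 0.0 for s in profile]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coinhibition_distance(screens):
    """screens: {name: [score per strain]} -> {(a, b): 1 - coinhibition}."""
    cleaned = {k: zero_insignificant(v) for k, v in screens.items()}
    names = sorted(cleaned)
    return {
        (a, b): 1.0 - pearson(cleaned[a], cleaned[b])
        for i, a in enumerate(names) for b in names[i + 1:]
    }


# Hypothetical screens over the same five strains.
screens = {
    "cmpdA": [12.0, 0.5, 8.0, -0.3, 0.1],
    "cmpdB": [10.0, 1.0, 9.0, 0.2, -0.4],
    "cmpdC": [0.2, 7.5, -0.1, 0.4, 11.0],
}
dist = coinhibition_distance(screens)
```

In this toy example cmpdA and cmpdB share their sensitive strains and end up close together, while cmpdC affects different strains and sits more than one unit away from both.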

Chemogenomic response signatures

In our previous study, we classified HIPHOP cellular response types into chemogenomic signatures defined by characteristic genes and associated biological processes [6]. To determine whether these major response signatures exist in the NIBR dataset, we used the same analytic methods. Specifically, an FD matrix was constructed using all profiled compounds as columns and all deletion strains (genes) as rows. Similarity between the cellular responses to the profiled compounds was measured using the Pearson correlation between the matrix columns (coinhibition). The functional similarity between two genes was measured using the Pearson correlation between the matrix rows (cofitness). To identify robust clusters in the NIBR dataset, profiles were hierarchically clustered using (1 – coinhibition) as the distance metric and the Ward agglomeration method. For each cluster, we calculated the median FD score of each deletion strain across all profiles in that cluster to generate a median profile. Strains with significantly positive FD scores in the median profile (standard normal distribution P < 0.001) identify the characteristic gene signatures that are an important part of the cellular responses. A threshold of standard normal P < 0.001 was used for comparing HIPLAB and NIBR signatures. For the signatures in the combined dataset, we used a threshold of standard normal P < 0.05.
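The median-profile step can be sketched as follows. This is a simplified illustration with hypothetical function names, strain names and FD values; it assumes every profile in a cluster reports the same strains and applies the one-tailed standard normal threshold directly to the median FD.

```python
import math
import statistics


def median_profile(cluster_profiles):
    """cluster_profiles: list of {strain: FD}; returns {strain: median FD}."""
    # Restrict to strains present in every profile of the cluster (assumption).
    strains = set.intersection(*(set(p) for p in cluster_profiles))
    return {s: statistics.median(p[s] for p in cluster_profiles) for s in strains}


def signature_genes(cluster_profiles, alpha=0.001):
    """Strains whose median FD is significantly positive at level alpha."""
    med = median_profile(cluster_profiles)
    return sorted(
        s for s, fd in med.items()
        if 0.5 * math.erfc(fd / math.sqrt(2)) < alpha
    )


# Hypothetical cluster of three profiles over four strains.
cluster = [
    {"geneA": 9.0, "geneB": 0.2, "geneC": 3.5, "geneD": 2.0},
    {"geneA": 11.0, "geneB": 5.0, "geneC": 4.0, "geneD": 1.8},
    {"geneA": 10.0, "geneB": -0.1, "geneC": 3.2, "geneD": 2.5},
]
strict = signature_genes(cluster, alpha=0.001)   # within-dataset comparison
relaxed = signature_genes(cluster, alpha=0.05)   # combined-dataset threshold
```

Taking the median across profiles means a gene that responds strongly in only one compound of the cluster (geneB here) does not enter the signature, while the looser P < 0.05 threshold admits weaker but consistent responders such as geneD.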

In the NIBR dataset, 49 clusters were identified and 41 were associated with characteristic genes or gene signatures. Response signatures with fewer than two genes that were not enriched for biological processes were omitted. We performed GO enrichment analysis on each response signature.

Signatures and GO enrichments are available at the accompanying website: Comparative chemogenomics or http://matrika.pharmacy.ubc.ca:3838/ggiaever/Comparitive chemogenomics/

NIBR

Novartis Institute of Biomedical Research

MoA

Mechanism of Action

HIPHOP

HaploInsufficiency Profiling and HOmozygous Profiling

FD

Fitness Defect

MMS

methyl methanesulfonate

eIF

eukaryotic initiation factor

Jost M, Weissman JS. CRISPR Approaches to Small Molecule Target Identification. ACS Chem Biol. 2018;13:366–75.
Niepel M, Hafner M, Mills CE, Subramanian K, Williams EH, Chung M, et al. A Multi-center Study on the Reproducibility of Drug-Response Assays in Mammalian Cell Lines. Cell Syst. 2019;9. doi:10.1016/j.cels.2019.06.005.
Moffat JG, Vincent F, Lee JA, Eder J, Prunotto M. Opportunities and challenges in phenotypic drug discovery: an industry perspective. Nat Rev Drug Discov. 2017;16:531–43.
Wawer MJ, Li K, Gustafsdottir SM, Ljosa V, Bodycombe NE, Marton MA, et al. Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling. Proc Natl Acad Sci U S A. 2014;111:10911.
Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science. 2008;320:362–5.
Lee AY, St.Onge RP, Proctor MJ, Wallace IM, Nile AH, Spagnuolo PA, et al. Mapping the Cellular Response to Small Molecules Using Chemogenomic Fitness Signatures. Science. 2014;344:208–11.
Hoepfner D, Helliwell SB, Sadlish H, Schuierer S, Filipuzzi I, Brachat S, et al. High-resolution chemical dissection of a model eukaryote reveals targets, pathways and gene functions. Microbiol Res. 2014;169:107–20.
Pierce SE, Davis RW, Nislow C, Giaever G. Genome-wide analysis of barcoded Saccharomyces cerevisiae gene-deletion mutants in pooled cultures. Nat Protoc. 2007;2:2958–74.
Deutschbauer AM. Mechanisms of Haploinsufficiency Revealed by Genome-Wide Profiling in Yeast. Genetics. 2005;169:1915–25.
Hillenmeyer ME, Ericson E, Davis RW, Nislow C, Koller D, Giaever G. Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol. 2010;11:R30.
McGowan JV, Chung R, Maulik A, Piotrowska I, Walker JM, Yellon DM. Anthracycline Chemotherapy and Cardiotoxicity. Cardiovasc Drugs Ther. 2017;31:63–75.
Miles JS, Sojourner SJ, Jaafar L, Whitmore A, Darling-Reed S, Flores-Rozas H. The Role of Protein Chaperones in the Survival from Anthracycline-Induced Oxidative Stress in Saccharomyces cerevisiae. Int J Adv Res. 2018;6:144–52.
Langfelder P, Zhang B, Horvath S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics. 2008;24. doi:10.1093/bioinformatics/btm563.
Hoepfner D, Karkare S, Helliwell SB, Pfeifer M, Trunzer M, Bonnechose SD, et al. An Integrated Approach for Identification and Target Validation of Antifungal Compounds Active against Erg11p. Antimicrob Agents Chemother. 2012;56:4233–40.
Bourgoint C, Rispal D, Berti M, Filipuzzi I, Helliwell SB, Prouteau M, et al. Target of rapamycin complex 2-dependent phosphorylation of the coat protein Pan1 by Akl1 controls endocytosis dynamics in Saccharomyces cerevisiae. J Biol Chem. 2018;293:12043–53.
Mannick JB, Morris M, Hockey H-UP, Roma G, Beibel M, Kulmatycki K, et al. TORC1 inhibition enhances immune function and reduces infections in the elderly. Sci Transl Med. 2018;10. doi:10.1126/scitranslmed.aaq1564.
Shimada K, Filipuzzi I, Stahl M, Helliwell SB, Studer C, Hoepfner D, et al. TORC2 Signaling Pathway Guarantees Genome Stability in the Face of DNA Strand Breaks. Mol Cell. 2013;51:829–39.
Filipuzzi I, Steffen J, Germain M, Goepfert L, Conti MA, Potting C, et al. Stendomycin selectively inhibits TIM23-dependent mitochondrial protein import. Nat Chem Biol. 2017;13:1239–44.
Pries V, Cotesta S, Riedl R, Aust T, Schuierer S, Tao J, et al. Advantages and Challenges of Phenotypic Screens: The Identification of Two Novel Antifungal Geranylgeranyltransferase I Inhibitors. J Biomol Screen. 2016;21:306–15.
Dempster JM, Pacini C, Pantel S, Behan FM, Green T, Krill-Burger J, et al. Agreement between two large pan-cancer CRISPR-Cas9 gene dependency data sets. Nat Commun. 2019;10:5817.

Acknowledgments

We thank Nislow-Giaever lab members for their constructive feedback on the website material.

Funding

Funding was supplied by the Canada Research Chair Program and the Canadian Foundation for Innovation to CN.

Author information

Marjan Barazandeh and Divya Kriti have contributed equally to this work.

Affiliations

Department of Pharmaceutical Sciences, University of British Columbia

Contributions

CN and GG designed the analysis and wrote the text. DK and GG built the database. GG designed the website and performed the analysis. MB edited the website and text, tested the analysis, and made the figures with help from GG. All authors read and approved of the final version of the manuscript.

Corresponding author

Correspondence to: Guri Giaever

Ethics declarations

Ethics approvals and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

Authors have no competing interests to declare.

Table 1. Experimental and analytical pipelines of the HIPLAB and Novartis datasets

	HIPLAB	Novartis
Number of screens	3356	2725 (1776 unique)
Number of unique compounds	3250	1776
Number of HET strains	1095 (essential)	5796 (essential + nonessential)
Number of HOM strains	4810	4520
Bioassay	IC20	IC30
HIPHOP assay plates/media	48-well/700 µl YPD	24-well/1600 µl YPD
experiments per plate	42 drug-treated samples + 6 negative controls (1% DMSO)	10 drug-treated samples in duplicates + 2 negative controls (no drug) + 1 positive control (Benomyl) + 1 contamination control (no cells)
HIPHOP assay device	Tecan Genios spectrophotometer	Cytomat Robotic shaking incubator
starting number of cells	O.D.₆₀₀ of 0.02 (~400 and ~200 cells/strain for HIP and HOP, respectively)	100 µl and 110 µl of a 1.5 O.D.₆₀₀/ml culture (~600 and ~700 cells/strain for HIP and HOP, respectively)
Frequency of OD measurement	15 min	60 min
Collection time	log-phase cells; 20 and 5 generations for HIP and HOP, respectively	saturated cells; ~20 and ~5 generations for HIP and HOP, respectively
Final strain intensity value	‘best tag’: tag with the lowest robust coefficient of variation in the control arrays	average of uptag and downtag intensities
z-score calculation for strain_i in screen_j	log₂ ratio_ij = log₂(median signal from controls / signal from drug-treated sample)	log₂ ratio_ij = log₂(average signal from replicates of drug-treated samples / average signal from control samples)
	z scores = FD_ij = MADL = (log₂ ratio_ij - median of log₂ ratio_j) / MAD of log₂ ratio_j	Sensitivity scores = FD_ij = MADL = (log₂ ratio_ij - median of log₂ ratio_j) / MAD of log₂ ratio_j
		Adjusted MADL based on the variability between replicates: a_MADL = min(0.05/p,1)*MADL
		z scores= a_MADL/standard deviation of a_MADL values of strain i over n screens
Significant chemical-genetic interactions	standard normal distribution P ≤ 0.001	z-score < -5
Clustering method	Ward hierarchical clustering with dynamic branch cutting	Average-linkage two-way hierarchical clustering

Table 2. Reference drugs present in both datasets

Drug Class	Drug name
Anthracyclines:	aclarubicin; dactinomycin; doxorubicin
Antidepressants:	chlorpromazine; trifluoperazine
Antimetabolites:	5-fluorouracil; 5-fluorocytosine; methotrexate
Cell wall inhibitors:	caspofungin
Cytoplasmic translation inhibitors:	anisomycin; cycloheximide
DNA damaging agents:	camptothecin; hydroxyurea; mechlorethamine; methyl methanesulfonate
ERG11 and other sterol pathway inhibitors:	clotrimazole; fluconazole; voriconazole
FAS1 inhibitors:	cerulenin
HSP90 inhibitors:	geldanamycin
Ion channel blockers/ionophores:	amiodarone; nigericin
Iron chelators:	curcumin
Microtubule inhibitors:	nocodazole
Other sterol pathway inhibitors:	dyclonine; fenpropimorph; fluvastatin; terbinafine
TOR signaling:	caffeine; rapamycin
Transcription inhibitors:	6-azauridine; mycophenolic acid
Unfolded protein response pathway:	tunicamycin
Miscellaneous:	bleomycin; ebselen; gliotoxin; myriocin; sphingosine; TOFA


Figure S1. Nonessential heterozygous strains are not biologically informative. DNA damaging agents induced fitness defects in homozygous deletion strains compared to little or no fitness defects in the corresponding nonessential heterozygous deletion strains in response to the same compounds.
Figure S2. TIM23 is identified as the target of Stendomycin (aka 5692 in the NIBR dataset) in two different studies despite dosage differences [7, 18].
Figure S3. A geranylgeranyltransferase inhibitor (5692 in the NIBR dataset and compound 1 in Pries et al. (2016)) targets similar genes in two different studies [14, 19].


Editorial decision: Major revision
07 Dec, 2021
Reviews received at journal
05 Dec, 2021
Reviewers agreed at journal
12 Nov, 2021
Reviews received at journal
14 Oct, 2021
Reviewers agreed at journal
01 Oct, 2021
Reviewers agreed at journal
17 Sep, 2021
Reviewers invited by journal
16 Sep, 2021
Editor assigned by journal
16 Sep, 2021
Editor invited by journal
09 Sep, 2021
Submission checks completed at journal
09 Sep, 2021
First submitted to journal
04 Aug, 2021
