CRISPR and shRNA essentiality screen data
We obtained CRISPR-Cas9 essentiality screen (or dependency profile) data in 436 cell lines from Meyers et al.10 for 16,368 genes, whose expression, CNV and mutation data are available via CCLE portal33. We obtained the shRNA essentiality screen data in 501 cell-lines from DepMap portal34 for 16,165 genes, whose expression, CNV and mutation data is available publicly via CCLE portal33. The 248 cell-lines and 14,718 genes that appear in both datasets were used in this analysis (Table S1). For mutation data, only non-synonymous mutations were considered. Synonymous (silent) mutations were removed from the pre-processed MAF files downloaded from CCLE portal33.
Identifying CRISPR specific differentially essential genes of a potential CRISPR-selected cancer driver
For a given CCD (e.g. p53 or KRAS), we checked which gene’s essentiality (viability after knockout) is significantly associated with the mutational status of the CCD using a Wilcoxon rank sum test in the CRISPR and shRNA datasets, respectively (FDR<0.1). CRISPR-specific differentially essential positive (CDE+) genes are those whose CRISPR-KO is significantly more viable when the CCD is mutated while their shRNA silencing is not, whereas analogously CDE- genes are those whose CRISPR-KO is significantly more viable when the CCD is WT while their shRNA silencing is not. We filtered out any candidate CDE genes whose copy number was also significantly associated with the given mutation to control for potentially spurious associations coming from copy number (we removed genes showing significant association (FDR<0.1)) – the exact procedure used is described below in the section titled “Identifying potential CRISPR-selected cancer drivers”).
Identifying CDEs considering functional impact of mutations
Out of a total of 248 cell lines that we analyzed, 173 cell lines (69.7%) have p53non-synonymous mutations. In addition to identifying CDEs by considering all non-synonymous mutations, we additionally employed a more conservative approach where we aimed to consider only p53 loss-of-function (LOF) mutations in the CDE identification process. To this end, we considered a mutation to be LOF if it was classified as non-sense, indel, frameshift, or among the 4 most frequent non-functional hotspot mutations (R248Q, R273H, R248W and R175H within the DNA-binding domain, determined as pathogenic by COSMIC35). Using this definition we obtained new mutation profiles for p53 and identified CDE genes via the same method described in the section titled “Identifying CRISPR specific differentially essential genes of a potential CRISPR-selected cancer driver”. We repeated a similar process with the top three known gain-of-function hotspot mutation variants of KRAS.
Identifying potential CRISPR-selected cancer drivers of CRISPR-KO
To identify additional CCD genes like p53, we considered 121 cancer driver genes identified by Vogelstein et al.27, whose nonsynonymous mutation is observed in at least 10 cell lines (N=61). We determined whether each of these genes is a CCD as follows: for each of the 61 candidate genes, we tested the association between the essentiality of each of genes in the genome (reflected by post-KO cell viability) with the mutational status of the candidate CCD gene using a Wilcoxon rank sum test. We then counted the number of genes, whose essentiality is: (i) significantly positively associated with the candidate CCD mutational status (FDR-corrected p-value<0.1, median essentiality of WT>mutant of the cancer gene), (ii) significantly negatively associated with the candidate CCD mutational status (FDR-corrected p-value<0.1, median essentiality of WT<mutant of the cancer gene), and (iii) not associated (FDR-corrected p-value>0.1) with the candidate CCD mutation status; we performed this computation separately for the CRISPR and the shRNA screens, respectively. This computation results in a 3-by-2 contingency table for each candidate CCD gene. We then checked whether the distribution of the above three counts in the CRISPR dataset significantly deviates from that in the shRNA dataset via a Fisher’s exact test on the contingency table. If each of the values in the contingency table was greater than 30, we used the chi-squared approximation of the Fisher’s exact test. We further filtered out any candidate CDE genes whose copy number was also significantly associated with the given mutation to control for potentially spurious associations coming from copy number (we removed genes showing significant association (FDR<0.1)). We performed this procedure for all 61 candidate genes one by one and selected those with FDR corrected Fisher’s exact test <0.1. We further filtered out the candidate CCD whose mutation profile is correlated with p53 mutation profile via a pairwise Fisher test of independence (FDR<0.1). We finally report the CCD genes that have a substantial number of CDE+ genes (N>300).
Pathway enrichment analysis of CDE+/CDE- genes
We analyzed the CDE+/CDE- genes of each of the CCDs for their pathway enrichment with annotations from the Reactome database36 in two different ways. First, we tested for significant overlap between our CDE genes with each of the pathways with hypergeometric tests (FDR<0.1). Second, we ranked all the genes in the CRISPR-KO screen by the differences in their median post-KO cell viability values in mutant vs WT cells, and the standard GSEA method21 was employed to test whether the genes of each Reactome pathway have significantly higher or lower ranks vs the rest of the genes (FDR<0.1). We repeated the GSEA analysis with the genes ranked by differential post-KD cell viability in the shRNA screen, and only reported significant pathways specific to CRISPR but not shRNA screens. We confirmed that for p53, the GSEA method was able to recover the top significant pathways identified by the hypergeometric test (e.g. those in Figure 1d), although extra significant pathways were identified (Table S2). For p53 and KRAS CDE- genes respectively, the enriched pathways were clustered based on the Jaccard index and the number of overlapping genes with Enrichment Map37, and the largest clusters were visualized as network diagrams with Cytoscape38.
To study the potential enrichment of CDE genes in common fragile sites (CFSs), we obtained chromosomal band locations of CFS16, and defined the CFS gene set as the set of all genes located within these chromosomal bands (obtained from Biomart39). We tested for a significant overlap between our CDE genes and the CFS gene set with a hypergeometric test, and also confirmed the lack of significant overlap with the corresponding shRNA-DE genesets. Similarly, for the common highly accessible chromatin (HAC) regions, we obtained a list of these regions defined by a consensus of DNAsel and FAIRE across seven different cancer cell lines from a previous study40. Next, we identified sgRNAs which are expected to target such HAC regions (see Calculating off-target scores section) and ranked genes based on the number of targeting such sgRNAs. Taking the top genes equal to the number of p53 CDE+ genes, we computed the enrichment for p53 CDE+ genes via a hypergeometric test.
Testing the clinical relevance of copy number alterations of the p53 or KRAS CDE genes
We tested the hypothesis that copy number alterations in CDE+ genes (as a possible surrogate for the number of DSBs in these genes) can reduce the fitness of the CCD (p53 or KRAS) WT tumors with patient data. The cancer genome atlas (TCGA)32 data of somatic copy number alteration (SCNA) and patient survival of 7.547 samples in 26 tumor types were downloaded from the UCSC Xena browser (https://xenabrowser.net/). In these tumor types p53 is mutated in more than 5% of the samples. For each sample, the copy number alterations (genomic instability, GI) of a given set of genes, which quantifies the relative amplification or deletion of genes in a tumor based on SCNA was computed as follows41:
where si is the absolute log ratio of SCNA of gene i in a sample relative to normal control, and I() is the indicator function. Wilcoxon rank-sum test was then used to test whether the GI of CDE+ geneset is significantly lower than that of control non-CDE genes in CCD-WT but not in CCD-mutant tumors. Further, we tested if higher absolute levels of SCNA of the CDE+ genes are associated with increased rate of CCD (p53 or KRAS) mutation accumulation with cancer stage, as this would further testify that such amplification/deletion events in the CDE+ genes can drive the selection for CCD mutants. To this end, the following logistic regression model was used to identify the genes whose high absolute SCNA computed as above is associated with higher rate of CCD mutation accumulation with cancer stage, while controlling for cancer type and overall mutation load:
logit(P(CCD)) =β0 + ∑k βcaner_typek cancer_typek + βmutation_load mutation_load + βGI GIi + βstage stage + βinteract GIi * stage
where CCD denotes the binary CCD mutational status of the patient, logit(P(CCD)) is the logit function of the probability of the CCD being mutant; cancer_typek is the dummy variable for the category of the kth cancer type; GIi denotes the absolute value of SCNA levels of the given gene i as computed above; GIi * stage is the interaction term between the GI of gene i and cancer stage, that latter is made into a binary variable whose value is 0 for early stages (I and II) and 1 for late stages (III and IV). We tested the enrichment of CDE+ genes among the genes whose high absolute SCNA levels are significantly associated with higher rate of CCD mutation accumulation with cancer stage (i.e. genes with significantly positive βinteract coefficients in the above model) using a hypergeometric test.
Constructs and stable cell lines
MOLM13 cells were obtained from DSMZ (Cat. ACC-554) and maintained in RPMI-1640 medium (Life Technologies, Carlsbad, CA) supplemented with 10% v/v heat-inactivated fetal bovine serum (Sigma-Aldrich, Saint Louis, MI), 2 mM L-Glutamine (LifeTechnologies) and 100 U/mL penicillin/streptomycin (LifeTechnologies). p53 R248Q was PCR amplified from a bacterial expression plasmid (kind gift of Dr. Shannon Lauberth, UCSD) and KRASG12D the pBabe-KRASG12D plasmid (Addgene plasmid 58902, from Dr. Channing Der) using the Kappa Hi-fidelity DNA polymerase (Kappa Biosystems). These PCR amplicons were separately cloned into the MSCV-IRES-tdTomato (pMIT) vector (a kind gift from Dr. Hasan Jumaa, Ulm) using Gibson Assembly. We first generated high-efficiency Cas9-editing MOLM13 leukemia cells by transducing these cells with the pLenti-Cas9-blasticidin construct (Adggene plasmid 52962 - from Dr. Feng Zhang) and selecting stable clones using flow-sorting. Clones were then tested for editing efficiency by performing TIDE analysis42. These MOLM13-Cas9 cells were then transduced retrovirally with the pMIT-p53R248Q or pMIT-KRASG12D mutants and sorted for tdTomato using flow-cytometry (LSR Fortessa, BD Biosciences) to generate isogenic mutant MOLM13-Cas9 cell lines. Immortalized hTERT RPE1 cells were obtained from ATCC® (Cat. CRL-4000™) and maintained in DMEM-F12 medium (Life Technologies, Carlsbad, CA) supplemented with 10% v/v heat-inactivated fetal bovine serum (Sigma-Aldrich, Saint Louis, MI), 2 mM L-Glutamine (LifeTechnologies) and 100 U/mL penicillin/streptomycin (LifeTechnologies).
Generation of pooled sgRNA libraries
For pooled library cloning, 10 sgRNAs per gene were designed using the gene perturbation platform (https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design) Genetic Perturbation Platform. Guides targeting p53 CDE+ and CDE- genes were synthesized as pools using array-based synthesis and cloned in the Lentiguide puro vector (Addgene plasmid 52963 - kind gift from Dr. Feng Zhang) using Golden Gate Assembly. In each assay, we have used ~240 unique non-targeting sgRNAs and 49 not expressing non-essential genes. A similar approach was used for the KRAS CDE libraries.
Pooled sgRNA library screen
30 million MOLM13-Cas9 cells or their isogenic MOLM13-p53 or KRAS mutant counterparts were transduced with the pooled CDE library virus in RPMI medium supplemented with 10% fetal bovine serum, antibiotics and 8 μg/ml polybrene. The medium was changed 24 hours after transduction to remove the polybrene and cells were plated in fresh culture medium. 48 hours after transduction, puromycin was added at a concentration of 1 μg/ml to select for cells transduced with the sgRNA library. Puromycin was removed after 72 hours and then cells were cultured for up to 30 days. 7 days after transduction, approximately 4 million cells were collected, and genomic DNA was prepared for the time zero (T0) measurement and also from time 30 (T30). Genomic DNA from these cells was used for PCR amplification of sgRNAs and sequenced using a MiSeq system (Illumina). Fold depletion or enrichment of sgRNAs from the NGS data was calculated using PinAplPy software43.
CDE+/- genes identified in isogenic experiments
From the read counts per million for each sgRNA at Day 0 and Day 30 from the above pooled CRISPR screens across two replicates, we removed all the sgRNAs with read count < 20 at Day 0. We calculated an average fold change (FC) of reads from Day 0 to Day 30. For each sgRNA, we calculated this FC-rank difference in p53 WT vs mutant in both CRISPR-KO and CRISPRi screens. For consistent comparison with AVANA, we only considered sgRNAs used in both libraries. The top and bottom genes are differentially essential (DE) from each screen. Taking the top ranked genes based on the difference of this score in two screens, we identify the CDE+ and CDE- genes.
CRISPR Competition experiments
sgRNAs were cloned using standard cloning protocols and lentiviral supernatants were made from these sgRNAs in the 96-well arrayed format. 100,000 MOLM13 cells or tdTomato-positive isogenic mutants were plated in a 96 well pate and transduced with the sgRNA viral supernatants by spinfection with polybrene-supplemented medium. After selection of sgRNA transduced cells with puromycin for 48 hours, sgRNA transduced MOLM13 cells or mutants were mixed together in a ratio of 95:5 respectively, and the percentage of p53 WT or p53 mutant cells was monitored progressively up to 25 days using high-throughput flow-cytometry as described previously23.
Quality control of publicly mined genetic screens used in the study
We first obtained gold-standard essential and non-essential geneset from Hart et al.44 . To test the quality of each genetic screen we computed an area under the precision-recall curve (AUPRC) using the average logFC across replicates and cell lines. In this study, we only considered the genetic screens with an AUROC > 0.6 (random model AUPRC=0.5). We also employed this method to test the quality of our in-house generated genetic screens.
CRISPR-Cas9 Ribonucleoprotein transfection experiments
We generated sgRNAs by in vitro transcription using the HiScribe™ T7 Quick High Yield RNA Synthesis Kit (New England Biolabs, Beverley, MA) and performed the Ribonucleoprotein (RNP) complex formation using TrueCut Cas9 Protein v2 (ThermoFisher Scientific, Waltham, MA) according to published protocols45. MOLM13 cells without Cas9 and expressing pMIT-p53R248Q or pMIT-KRAS were generated as described in the Constructs and stable cell lines section. 1M cells were transfected with NFDUFB6 sgRNA or NTC sgRNA in triplicates with 1mg of Cas9 and 1mg of RNA in 10 ml of Buffer R using the Neon™ transfection system (ThermoFisher Scientific; 1500V, 20 ms, single pulse). Cells were maintained in culture for 48 hours before harvest for imaging, dye-dilution and editing estimation assays. For NDUFB6 and NTC editing estimation, we used the Synthego Performance Analysis ICE tool according to the instructions, using un-transfected parental MOLM13 samples as controls and samples from 48 hours post-transfection as the Day 0 initial timepoint and Day 10 as a final time point, in triplicates. For RPE1 experiments, mutant cells with p53R248Q or pMIT-KRAS similarly and transfections were performed using Lipofectamine™ CRISPRMAX™ Cas9 Transfection Reagent (ThermoFisher Scientific) according to the manufacturer’s instructions for 12-well plate format, in triplicates. Cell harvesting time-points were similar to those of MOLM-13.
We used the CellTrace™ Violet Cell Proliferation Kit (ThermoFisher Scientific) to stain MOLM13-WT and MOLM13-p53 mutant cells transfected with Cas9 RNP complexed with NTC or NDUFB6 RNA, according to the manufacturer’s instructions. Cells were maintained in culture in the dark and assayed by flow cytometry using the LSR Fortessa every 2 days for 14 days. FCS files were analyzed using FlowJo software.
Analysis of γ-H2AX foci in MOLM13-Cas9 and MOLM13-p53 mutant cells
MOLM13-WT and MOlM13-p53 mutant cells were left untreated or treated with 1μM doxorubicin for 2 h at 37°C 5% CO2, which served as negative and positive controls for DNA damage mediated γ-H2AX foci formation, respectively. MOLM13-WT and MOLM13-p53 mutant cells transfected with Cas9 RNP complexed with NTC or NDUFB6, and negative control cells were pelleted at 400g for 5 min at 4°C, washed two times in PBS and fixed in 4% paraformaldehyde in PBS for overnight at 4°C. The cells were washed two times in PBS and permeabilized in 0.25% triton X-100 in PBS for 5 min at room temperature. Following two washes with PBS, the cells were incubated in blocking buffer (3% BSA in PBS) for 30 min at room temperature and subsequently incubated with APC conjugated H2AX phospho (Serine 139) antibody (BioLegend; Cat # 613415) at an antibody dilution of 1:200 in blocking buffer for overnight at 4°C in dark. Cells were washed two times with PBS and resuspended in 150 μl PBS. Cell suspensions were spotted on poly-lysine coated glass slides using cytospin (Cytospin 4; Thermo Scientific) centrifugation at 800 rpm for 4 min. Coverslips were mounted onto the slides using ProLong Gold antifade reagent with DAPI (Invitrogen) and cured for overnight at room temperature in dark. Slides were imaged in Nikon A1R HD confocal microscope. Sequential z-sections were imaged using a 60x oil objective and maximum projection images were obtained using the Nikon NIS-Elements platform.
Cas9 activity in cancer cell lines with KRAS (or other cancer driver) WT vs mutant
We downloaded the exogenous Cas9 activity of 1601 cancer cell lines from DepMap portal and their KRAS mutation status considering only non-synonymous variants profiled using whole exome sequencing (1375 and 226 KRAS WT and mutant, respectively)15. We tested whether the Cas9 activity is higher in KRAS mutant vs KRAS WT cell lines using one-sided wilcoxon rank-sum test. We repeated this process for each cancer driver gene and used the FDR corrected significance to rank them in addition to the fold change of Cas9 expression.
Subclonal expansion of KRAS mutant in parent vs high Cas9-expressed cell lines
We downloaded the deep targeted sequencing of cancer driver genes performed on 42 parental and matched Cas9-expressed cancer cell lines from Enache et al.13. In this analysis, we discarded the cell lines with <20% Cas9 activity and thus low DNA damage. We asked whether mutant allele frequency of a cancer driver (e.g. KRAS) significantly increased in Cas9-expressed cell lines compared to matched parental cell lines using Wilcoxon signed-rank test. In this analysis, we have considered both intronic and exonic variants provided from sequencing.
Analysis of differentially expressed pathways in KRAS wildtype and mutant cells in response to Cas9 induction
Gene expression profiles of 163 pairs of parental (without Cas9) and the corresponding Cas9-expressed cell lines (138 KRAS WT, 25 KRAS mutant) were obtained from Enache et al.13. Differential expression analysis between the Cas9-expressed cells and the parental cells was performed for the KRAS WT and mutant cells separately, and GSEA analysis21 (genes ranked by logFC) was performed to identify the hallmark pathways from MSigDB46. We next identified pathways that are differentially regulated upon Cas9 expression between KRAS WT and mutant cells. These include the pathways that are up-regulated in the KRAS mutant cells but down-regulated or non-significantly altered in the KRAS WT cells, and vice versa. The pathways are ranked by the difference of normalized enrichment score in WT vs mutant cells. This analysis is performed using the fgsea R package21.