Target gene selection from transcriptomics data
The selection of target genes was based on RNA-Seq data sampled at 12 time points (0h, 3h, 6h, 9h, 12h, 18h, 24h, 36h, 48h, 72h, 120h, 168h) during transdifferentiation of human BLaER1 cells to macrophages [15]. The RNA-Seq data were quantified with GRAPE-nf (https://github.com/guigolab/grape-nf). Read mapping was performed with STAR [37] and gene expression quantification with RSEM [38] using the GENCODE annotation v22 [39]. Two biological replicates were analyzed separately.
The 19,814 pc-genes and 14,855 lncRNAs (union of the following biotypes: processed transcript, 3 prime overlapping ncRNA, sense intronic, antisense, macro lncRNA, lincRNA, non-coding and sense overlapping from GENCODE v22) were filtered for an average expression of at least 1 FPKM for pc-genes (0.1 FPKM for lncRNAs) and at least a 4-fold change for pc-genes (2-fold for lncRNAs) between the highest and lowest expression values along the temporal profile. In addition, lncRNAs were required to reach an expression of at least 1 FPKM in at least one time point and to not overlap other genes within a 5 kb window on the same strand and 50 bp on the opposite strand relative to their TSS. This resulted in 4,804 pc-genes remaining for replicate 1 and 4,552 for replicate 2, and 642 lncRNAs for replicate 1 and 536 for replicate 2. Those genes were clustered separately for each replicate into 36 expression profiles for pc-genes and 16 for lncRNAs with k-means clustering in R. We focused on two types of expression profiles: “peaking profile” (genes that increase their expression level at the beginning of the transdifferentiation process and later decrease) and “upregulated profile” (genes that are upregulated throughout the process). Pooling those profiles within each replicate and then intersecting between the replicates resulted in a final list of 939 protein-coding and 174 lncRNA candidate genes.
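The expression thresholds above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function name `passes_filter`, the biotype labels and the handling of profiles whose lowest value is zero are assumptions, and the TSS overlap filter for lncRNAs is not covered.

```python
from typing import Sequence

# Thresholds from the text: (min mean FPKM, min fold change, min peak FPKM).
THRESHOLDS = {"pc": (1.0, 4.0, 0.0), "lncRNA": (0.1, 2.0, 1.0)}

def passes_filter(fpkm: Sequence[float], biotype: str) -> bool:
    """Check one temporal expression profile against the expression thresholds."""
    min_mean, min_fold, min_peak = THRESHOLDS[biotype]
    mean = sum(fpkm) / len(fpkm)
    lowest, highest = min(fpkm), max(fpkm)
    # Assumption: a lowest value of 0 is treated as an infinite fold change.
    fold = float("inf") if lowest == 0 else highest / lowest
    return mean >= min_mean and fold >= min_fold and highest >= min_peak

# Toy profile over 12 time points (hypothetical values).
print(passes_filter([1, 2, 4, 8, 4, 2, 1, 1, 1, 1, 1, 1], "pc"))
```

In this sketch each replicate would be filtered independently, matching the separate analysis of the two replicates.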
Paired guide RNA library design
For lncRNAs, CRISPETa [16] was used to target genes’ TSS. For pc-genes, we developed a new version of CRISPETa to target ORFs (code available at https://github.com/Carlospq/CRISPETa_PC). In this case, we first obtained the principal isoform from the APPRIS database [40]. The exonic sequence of this isoform was extracted from the human genome sequence version hg19, using the GENCODE annotation v22, and searched for all possible protospacers (20mers followed by a PAM sequence of NGG). sgRNAs were scored using the RuleSet2 algorithm [41] and paired. Pairs were ranked according to: 1) location in the ORF sequence, 2) the pair score calculated as the sum of the two individual sgRNA scores, and 3) the deletion region of the pair (prioritizing those predicted to create an out-of-frame deletion). The first coding exon was preferentially targeted. If not all designs could be placed in the first coding exon, the window was extended to the second and third exons. For lncRNAs, the region targeted around the TSS was increased stepwise from 500 to 5,000 bp in consecutive runs of CRISPETa until the required number of pgRNAs was designed. Selected pgRNAs for lncRNAs were filtered so as to not overlap pc-genes. In all cases, sgRNAs were filtered to remove possible off-targets using CRISPETa’s pre-computed database with default value [-t 0,0,0,x,x] for the first run and relaxing this cutoff for consecutive runs, as described in [16]. CRISPETa output parameters were adjusted to provide the sequence of the 165 nt oligonucleotide (Insert-1) needed for library cloning using the DECKO method [7], which includes the targeting regions of the pgRNAs separated by a cloning site (Supplementary Table S2).
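The protospacer search described above (20mers followed by an NGG PAM) can be illustrated with a short Python sketch. It is not CRISPETa's implementation: the function names are hypothetical, both strands are scanned by assumption, and scoring with RuleSet2, pairing and off-target filtering are left out.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Lookahead so that overlapping candidates are all reported.
PROTOSPACER = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_protospacers(seq: str):
    """Return (start, strand, protospacer) for each 20mer followed by NGG.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    covered by the protospacer.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in PROTOSPACER.finditer(seq)]
    for m in PROTOSPACER.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - 20, "-", m.group(1)))
    return hits

# Toy exon sequence (hypothetical).
for hit in find_protospacers("A" * 20 + "TGG"):
    print(hit)
```

Candidates returned this way would then be scored and combined into pairs according to the ranking criteria listed in the text.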
Up to ten pgRNAs were designed per target gene with a minimum distance of 50 bp between any pair of gRNAs. In total, we designed pgRNAs for 166 lncRNAs and 874 pc-genes. In addition, we designed 50 pgRNAs for each of the positive controls ratCEBPa, SPI1 and ITGAM. For negative controls, we designed pgRNAs for 100 intergenic regions, 10 pgRNAs each. As a non-targeting negative control for library sorting assays we used a pgRNA against Firefly luciferase, called “pDECKO-non targeting”.
Library cloning
A ssDNA library of 12,000 oligos of 165 nt (insert-1) (Supplementary Table S2) was purchased from Twist Biosciences. The library was amplified to obtain dsDNA using emulsion PCR as described in [42], and cloned into pDECKO_mCherry vector ([16], Addgene 78534) following the 2 cloning steps described in [7]. ENDURA electrocompetent cells (Bionova Cientifica) were used to ensure high efficiency transformation and avoid recombination errors. Several transformations were performed in parallel. For the first cloning step (intermediate plasmid), approximately 500,000 bacterial colonies were collected and processed together in a single maxiprep. To eliminate the remaining empty plasmid, we took advantage of the fact that insert-1 (in the intermediate plasmid) contains unique restriction sites (EcoRI and BamHI), which are not present in the original backbone. Digesting the intermediate plasmid resulted in a linear product that could be distinguished from the circular empty backbone and purified in an agarose gel. For the 2nd step of cloning, 50 ng of BsmbI-digested intermediate plasmid was mixed with 1 μl annealed Insert-2 (gRNA1 constant region coupled to an H1 promoter, previously assembled from four oligonucleotides and diluted 1:20) and 1 μl of T4 DNA ligase (Thermo Scientific) and incubated for 4h at 22ºC (as described in [7]). Several transformations with ENDURA electrocompetent cells were done in parallel. For the 2nd cloning step (final plasmid) more than 100,000 bacterial colonies were collected and processed together in a maxiprep. A scheme of the final plasmid can be found in Supplementary Fig. S4A.
Cell culture and library infection
Human BLaER1 cells [14] were kindly provided by Thomas Graf (CRG, Barcelona) and grown in RPMI medium (Invitrogen) supplemented with 10% heat-inactivated foetal bovine serum (FBS), 2 mM L-glutamine, and 100 U/ml penicillin G sodium [14]. BLaER1 cells were first infected with a plasmid containing Cas9 fused to BFP ([16], Addgene 78545), selected for more than 5 days with blasticidin (15 µg/ml) and sorted using a BD FACS Aria instrument at the Flow Cytometry Unit of the Center for Genomic Regulation. These cells, stably expressing Cas9, were then infected with the pDECKO library. For lentivirus production, we performed 80 co-transfections of HEK293T virus packaging cells (at approximately 60-70% confluence on 10 cm dishes) with 3 μg of the pDECKO_mCherry plasmid library, 2.25 μg of the packaging plasmid pVsVg (Addgene 8484) and 750 ng of psPAX2 (Addgene 12260) using Lipofectamine 2000 (according to the manufacturer's protocol). Transfection media was changed on the following day to RPMI. In total, 400 ml of viral supernatant were collected 48h post transfection, filtered through a cellulose acetate filter, and used for overnight infection of 90x10E6 BLaER1-Cas9 cells at a density of 250,000 cells/ml in the presence of polybrene (10 μg/ml). The percentage of infection was computed as the number of mCherry positive cells relative to the total number of cells, measured with a Fortessa cell cytometer analyser. The infection rate ranged between 2 and 4%, ensuring a low multiplicity of infection (less than 1 viral integration per cell) [43]. After 48h of infection, the cells were double selected with blasticidin (20 μg/ml) and puromycin (2 μg/ml) for 18-19 days. 15 million of the BLaER1-Cas9 library infected cells were induced for transdifferentiation into macrophages by using 100 nM β-estradiol and 10 ng/ml of IL-3 and M-CSF, as described previously [44]. After incubation for 3 days (T3) or 6 days (T6) they were collected for FACS sorting.
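The relation between a low infection rate and single integrations can be illustrated with a short calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution, which is a common modelling assumption and is not stated in the text; the function name is hypothetical.

```python
import math

def multi_integration_fraction(infected_fraction: float) -> float:
    """Among infected cells, estimate the fraction with two or more integrations."""
    # Assumption: integrations per cell ~ Poisson(m), so P(infected) = 1 - exp(-m).
    m = -math.log(1.0 - infected_fraction)
    p_zero = math.exp(-m)
    p_one = m * p_zero
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)

for rate in (0.02, 0.04):
    share = multi_integration_fraction(rate)
    print(f"{rate:.0%} infected: {share:.1%} of infected cells with >1 integration")
```

Under this assumption, infection rates of 2% and 4% correspond to roughly 1% and 2% of infected cells carrying more than one integration.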
Individual target validation
For paired guide RNA pDECKO-mCherry plasmid cloning we used the method described in [16] (sgRNA sequences are listed in Supplementary Table S1 and the cloning oligos are detailed in Supplementary Table S5). For single guide RNA pDECKO-mCherry plasmid cloning we used the method described in [45] (see Supplementary Table S6 for details of the oligos used). Plasmids constructed for this study can be found in Supplementary Table S7 (plasmids available at Addgene.org are indicated).
For lentivirus production, we co-transfected HEK293T virus packaging cells with 3 μg of each pDECKO_mCherry plasmid and packaging plasmids as described previously. Viral supernatant was collected 48h post transfection and filtered through a cellulose acetate syringe filter. Polybrene (10 μg/ml) was added. We pelleted 5x10E5 BLaER1-Cas9 cells in two microcentrifuge tubes and resuspended each of them with 1 ml of viral supernatant. We performed spin-infection for 3h at 1,000 g. After infection, the viral supernatant was removed and infected cells were resuspended with RPMI media supplemented with 10% heat-inactivated foetal bovine serum (FBS), 2 mM L-glutamine, and 100 U/ml Penicillin Streptomycin. After 48h of infection, we performed double selection with blasticidin (20 μg/ml) and puromycin (2 μg/ml) antibiotics. The selection was maintained for a minimum of 2 weeks.
Flow cytometry
For cell sorting: 30x10E6 cells were counted and resuspended in 300 μl PBS + 3% FBS in the presence of FcR blocking reagent. Cells were incubated for 10 minutes and 15 μl of the human anti-CD19 antibody conjugated with BV510 (Becton Dickinson, 562947) and 15 μl of human anti-CD11b (Mac-1) antibody conjugated with PE-Cy7 (eBioscience, 25-0118-41) were added. Cells were incubated for 30 minutes in the dark, washed with PBS and resuspended in 2 ml of PBS + 3% FBS. Topro-3 was added as a viability marker. Cells were sorted in a BD FACS Aria instrument at the Flow Cytometry Unit of the Center for Genomic Regulation.
For flow cytometry analysis: 1x10E6 cells were counted and resuspended in 100 μl PBS + 3% FBS in the presence of FcR blocking reagent. Cells were incubated for 10 minutes and 5 μl of each of the corresponding antibodies were added. For the CD19 knockout experiment, we used the antibody anti-CD19 conjugated with APC-Cy7 (Becton Dickinson, 557791). Cells were incubated for 30 minutes in the dark, washed with PBS and resuspended in 500 μl of PBS + 3% FBS. Topro-3 was added as a viability marker. Cells were measured in a BD Fortessa analyser. For the Stain Index calculation we used the formula: (mean positive - mean background) / (2 * SD background), as previously described [46].
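The Stain Index formula above can be written as a small Python function. This is a sketch with hypothetical names and toy fluorescence values; the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev
from typing import Sequence

def stain_index(positive: Sequence[float], background: Sequence[float]) -> float:
    """(mean positive - mean background) / (2 * SD background)."""
    # Assumption: SD is the sample standard deviation of the background population.
    return (mean(positive) - mean(background)) / (2 * stdev(background))

# Toy fluorescence intensities (hypothetical).
print(stain_index([100, 110, 90], [10, 12, 8]))  # 22.5
```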
Flow cytometry data are available in the FlowRepository database (https://flowrepository.org) [47] under accession link for reviewers: https://flowrepository.org/id/RvFrJjJMbz8QIDOYBj7yF17rR2dtZTD0MUqauN7Sb4iVDXkxNRQlinhEnZeUPDEw.
Sample processing for deep sequencing
Genomic DNA was extracted from the FACS sorted cells with the GeneJET Genomic DNA purification kit (Thermo Scientific) and 2 PCR steps were performed (see Fig. 3C). A scheme of oligo binding sites is shown in Supplementary Fig. S4.
The first PCR step was performed with Phusion polymerase (Thermo Fisher) using 500 ng of genomic DNA and a staggered oligo mix (Supplementary Table S8) in the presence of 6% DMSO, with an annealing temperature of 60ºC and a total of 20 cycles of amplification. We used staggered oligos to avoid the same bases being read for the constant region during Illumina sequencing and to minimize technical issues during base calling. Up to 6 PCR reactions were combined, the amplicons were gel-purified, and 2 ng were used as a template for a second PCR.
The second PCR step was also performed with Phusion polymerase but without DMSO. We used Illumina barcoded oligos (Supplementary Table S9), an annealing temperature of 60ºC and a total of 8 cycles of amplification. Samples were purified with Agencourt Ampure beads (Beckman Coulter), quantified with a Qubit fluorometer (Thermo Scientific) and checked for quality in a Bioanalyzer (Agilent). We then pooled the libraries and sequenced them on the Illumina HiSeq 2500 at the Genomics Unit of the Center for Genomic Regulation (150 bp paired-end sequencing) to obtain about 20 million reads per sorted subfraction. Sequencing data are available in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress) [48] under accession number for reviewers: Reviewer_E-MTAB-10445, password: txxykkqo.
Mapping and quantification of sequencing reads
For read mapping, based on the initial pgRNA library with two guides per target (Supplementary Table S2), an artificial genome was generated by concatenating the 41 bp of the two pgRNAs (gRNA1 21 bp, gRNA2 20 bp) and converted into FASTA format. STAR mapper (version 2.4.2a) [37] was used to index the genome, adjusting the standard settings with the following parameter for small genomes:
--genomeSAindexNbases 6
After removing duplicated constructs, the resulting genome contains 11,550 chromosomes of 41 bp each, one per pgRNA pair.
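The construction of the artificial genome can be sketched as follows. This is a minimal Python illustration, not the study's code: the function name, the identifier scheme and the orientation in which gRNA2 is stored are assumptions, and the input would come from the library table (Supplementary Table S2).

```python
from typing import Iterable, Tuple

def build_reference(pairs: Iterable[Tuple[str, str, str]]) -> str:
    """Build FASTA text with one 41 bp 'chromosome' per unique pgRNA construct.

    Each input item is (pair_id, grna1, grna2), with a 21 bp gRNA1 and a
    20 bp gRNA2. Duplicated constructs are kept only once.
    """
    seen = set()
    records = []
    for pair_id, grna1, grna2 in pairs:
        if len(grna1) != 21 or len(grna2) != 20:
            raise ValueError(f"unexpected guide length for {pair_id}")
        construct = (grna1 + grna2).upper()
        if construct in seen:
            continue  # drop duplicated constructs
        seen.add(construct)
        records.append(f">{pair_id}\n{construct}\n")
    return "".join(records)

# Toy library with one duplicated construct (hypothetical sequences).
library = [
    ("pair_1", "G" * 21, "A" * 20),
    ("pair_2", "G" * 21, "A" * 20),
    ("pair_3", "C" * 21, "T" * 20),
]
print(build_reference(library), end="")
```

The resulting FASTA file would then be indexed with STAR using the small-genome parameter given above.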
Dynamic trimming of Illumina reads was done in Perl by pattern matching the insertion site of the pgRNAs in the plasmid sequence (“ACCG” for pgRNA1 in the window of 15-55 bp of read2, “AAAC” for pgRNA2 in the window of 100-150 bp of read1). The extracted 20 bp fastq sequences for the pgRNA2 were reverse complemented and concatenated to the 21 bp fastq sequences for the pgRNA1. Fusion reads shorter than 20 bp were filtered out.
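The trimming logic can be illustrated with the Python sketch below. It is not a translation of the original Perl script. Several details are assumptions made for illustration: the guide is taken to lie immediately downstream of the anchor, windows are treated as 0-based half-open intervals that must contain the anchor, and quality strings are ignored.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def extract_after_anchor(read: str, anchor: str,
                         window: Tuple[int, int], length: int) -> Optional[str]:
    """Return up to `length` bases following the first anchor found in the window."""
    pos = read.find(anchor, window[0], window[1])
    if pos == -1:
        return None
    begin = pos + len(anchor)
    return read[begin:begin + length]

def fuse_pair(read1: str, read2: str) -> Optional[str]:
    """Build the fused pgRNA1 + revcomp(pgRNA2) sequence, or None if unusable."""
    grna1 = extract_after_anchor(read2, "ACCG", (15, 55), 21)
    grna2 = extract_after_anchor(read1, "AAAC", (100, 150), 20)
    if grna1 is None or grna2 is None:
        return None
    fused = grna1 + reverse_complement(grna2)
    # Fusion reads shorter than 20 bp are discarded.
    return fused if len(fused) >= 20 else None

# Toy read pair (hypothetical sequences).
read2 = "T" * 20 + "ACCG" + "A" * 21
read1 = "T" * 110 + "AAAC" + "ACGT" * 5 + "T" * 10
print(fuse_pair(read1, read2))
```

A full implementation would also carry the FASTQ quality strings along, reversing the qualities of the pgRNA2 segment.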
Mapping was performed with STAR version 2.4.2a with the following parameters:
STAR --runMode alignReads --runThreadN 8 --genomeDir /users/resources/genome --readFilesCommand zcat --readFilesIn pgRNA1_pgRNA2.fastq.gz --alignIntronMax 1 --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --limitBAMsortRAM 3000000000 --outFilterMultimapNmax 1 --outFilterMismatchNmax 11 --outFilterMatchNmin 30 --outFilterMatchNminOverLread 0.1 --outFilterMismatchNoverLmax 0.9 --outFilterScoreMinOverLread 0.1
Given the distance between the sequencing primer and gRNA2, the pipeline was conceived to be adjustable to a variable number of mismatches. Running the pipeline without allowing for any mismatches, we could only make use of about 25 to 30% of the reads. Hence, we increased the number of allowed mismatches in progressive steps that resulted in a steep increase of mapped reads until a saturation point was reached between 10-15 mismatches, depending on the sample (Supplementary Fig. S6C). For further analysis, we allowed for a maximum of 13 mismatches to stay below 1% of multi-mapped reads for all samples of both replicates. Spearman correlation values of 0.95-1.00 between samples, mapped with zero mismatches compared with up to 13 mismatches, justified the usage of the quantification data with substantially more reads and therefore higher statistical power (Supplementary Fig. S6D). For quantification, the count for each guide pair within the mapped libraries was aggregated from the BAM files with SAMtools [49].
Due to the low memory footprint of the artificial genome, this quantification strategy can be applied even on laptops with moderate specifications (minimum requirements: single core CPU, 4GB RAM, 10GB disk space). The mapped reads were clustered to check for reproducibility between replicates (data not shown).
LNA GapmeRs assay
LNA antisense oligonucleotide GapmeRs (Exiqon) complementary to human lncRNA LINC02432 (ENSG00000248810.1) (GCATGAAAGAGTTGGT) and lncRNA MIR3945HG (ENSG00000251230.1) (CTGAGAGGTGGCAAGC) were designed. An LNA oligonucleotide containing a scrambled sequence (AACACGTCTATACGC) was used as a negative control. We seeded 40,000 BLaER1 cells in a 24-well plate and the cells were grown in 1 ml complete RPMI media containing LNA GapmeRs at a final concentration of 2 μM. After 3 days of incubation, we induced transdifferentiation as described previously [44]. Total RNA was isolated from cells after 3 days of induction.
RNA extraction, retro-transcription and quantitative PCR
RNA extractions from 1x10E6 cells were performed with the Quick RNA Miniprep Kit (Zymo Research). 140 ng-500 ng of RNA were retro-transcribed with RevertAid reverse transcriptase (Thermo Scientific). Quantitative PCR (qPCR) was performed with NZY Speedy qPCR Green Master mix (NZY tech) in a LightCycler 480 Real-Time PCR System (Roche). Primer sequences are detailed in Supplementary Table S10. Quantifications were normalized to an endogenous control (Glyceraldehyde 3-phosphate dehydrogenase, GAPDH). The relative quantification value for each target gene compared with the calibrator is expressed as 2^(Ct-Cc).
Western blot
1x10E6 cells were resuspended with 100 μL of Lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris pH 8, protease inhibitors). The cell lysate was sonicated in a Branson sonicator for 10 seconds (50% amplitude and power 7). The samples were run in a 10% SDS-PAGE gel and transferred to a nitrocellulose membrane. The membrane was blocked with blocking buffer (TBS, 0.1% Tween 20, 5% non fat milk) O/N at 4ºC, and incubated for 1h 30’ at room temperature with primary antibodies: anti-FURIN rabbit polyclonal antibody (Proteintech, 18413-1-AP) 1:1,000 in blocking buffer or anti-NFE2 rabbit polyclonal antibody (Proteintech, 11089-1-AP) 1:1,000 in blocking buffer. After 5 washes with TBS-0.1% Tween 20, the membranes were incubated for 1h with the secondary antibody goat anti-rabbit-HRP (Sigma, G9545) 1:10,000 in blocking buffer. After 5 washes with TBS-0.1% Tween 20, the membranes were incubated either with Amersham ECL western blotting detection reagent (GE Healthcare, RPN2209), or Super Signal West Femto Maximum Sensitivity Substrate (Thermo Fisher, 34096), and imaged in an Amersham Imager 600. As a protein loading control, the membranes were re-blotted with primary antibody rabbit anti-GAPDH-HRP polyclonal antibody (Proteintech, 10494-1-AP) 1:4,000 in blocking buffer, and incubated for 1h at room temperature. Washes and secondary antibody incubation were performed as previously described. The presence of two bands in NFE2 western blot likely corresponds to different post-translational modifications of NFE2 [17].
TA cloning
In order to sequence the edited region in BLaER1-Cas9 cells, we amplified the deletion junctions by PCR using oligos outside the cut region (Supplementary Table S11). The resulting PCR products were cloned using a TA cloning kit (Life Technologies), according to manufacturer’s instructions, and sequenced by Sanger sequencing.