Human embryonic kidney (HEK293T) cells, primary human skin-derived fibroblasts (Fib), U2OS, SKOV-3, and PC9 cells were cultured in DMEM containing 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin in a tissue culture incubator at 37 °C with 5% CO2. Cells were routinely tested for mycoplasma contamination with a PCR mycoplasma detection kit (cat no. PM008), and all cells used in this study tested negative. SpCas9-expressing HEK293T (HEK293T-SpCas9) cells were generated with a PiggyBac transposon system: HEK293T cells were transiently transfected with the pPB-TRE-spCas9-Hygromycin vector and the pCMV-hybase vector at a 9:1 ratio and then selected with 50 µg/ml hygromycin to ensure high Cas9 activity.
The LentiU6-LacZ-GFP-Puro (BB) vector was generated by our group previously (Addgene ID: 170459). This plasmid can also be acquired from the Luo lab (https://dream.au.dk/tools-and-resources).
SURRO-seq library design
Each SURRO-seq oligo consists of, in order: 4 bp of flanking nucleotides “acca”; a BsmBI recognition site “cgtctc”; the GGA cloning linker “aCACC”; one “g” for initiating transcription from the U6 promoter; the 20 bp gRNA sequence (together “gN20”); the 82 bp gRNA scaffold sequence; the 37 bp surrogate target sequence (10 bp barcode, 20 bp protospacer, 3 bp PAM, and 4 bp downstream sequence); the downstream linker “GTTTg”; and a second BsmBI recognition site followed by its downstream flanking sequence “acgg”.
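The oligo layout can be checked with a short sketch. The part lengths listed above add up to 4 + 6 + 5 + 1 + 20 + 82 + 37 + 5 + 6 + 4 = 170 nt. In the Python example below, the function names are hypothetical, the scaffold is a placeholder of 82 "N" (the real scaffold sequence is not given here), and the second BsmBI site is assumed to be the reverse complement "gagacg" of the first; the text does not state its orientation.

```python
# Sketch: assemble one SURRO-seq oligo from the parts listed in the text.
# Assumptions: placeholder scaffold, and the 3' BsmBI site written as the
# reverse complement of "cgtctc" (orientation is not stated in the source).

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
BSMBI_SITE = "cgtctc"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def build_surro_oligo(protospacer: str, scaffold: str, surrogate: str) -> str:
    """Concatenate the oligo parts in 5'->3' order and check part lengths."""
    if len(protospacer) != 20:
        raise ValueError("protospacer must be 20 nt")
    if len(scaffold) != 82:
        raise ValueError("scaffold must be 82 nt")
    if len(surrogate) != 37:
        raise ValueError("surrogate target must be 37 nt")
    parts = [
        "acca",                          # 4 bp upstream flank
        BSMBI_SITE,                      # 5' BsmBI recognition site
        "aCACC",                         # GGA cloning linker
        "g",                             # U6 transcription start
        protospacer,                     # N20 gRNA sequence
        scaffold,                        # 82 bp gRNA scaffold
        surrogate,                       # 37 bp surrogate target
        "GTTTg",                         # downstream linker
        reverse_complement(BSMBI_SITE),  # 3' BsmBI site (assumed orientation)
        "acgg",                          # 4 bp downstream flank
    ]
    return "".join(parts)


PLACEHOLDER_SCAFFOLD = "N" * 82          # not the real scaffold sequence
oligo = build_surro_oligo("ACGT" * 5, PLACEHOLDER_SCAFFOLD, "AC" + "N" * 35)
print(len(oligo))  # 170
```

The total of 170 nt matches the length of the synthesized oligonucleotide pool.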
The SURRO-seq pool was designed as follows: (1) LibA contains 11 on-target and the corresponding 170 off-target gRNAs from three published off-target detection methods (T7E1, GUIDE-seq, CIRCLE-seq); (2) LibB contains 110 gRNAs retrieved from published studies, which we expect to have sequence characteristics representative of gRNAs used in gene therapy (cancers, PD-1, DMD, β-hemoglobinopathies, SCD, CCR5, HTT, CEP290); (3) off-target sites of each gRNA were predicted with FlashFry (v 1.80), and potential off-target sites with up to 4 bp mismatches in the human genome (hg19) were retrieved; (4) for each surrogate site, a 10 bp barcode (a fixed “AC” as the first two nucleotides plus an 8 bp unique molecular identifier (UMI)) was added upstream of the target sequence, so that the surrogate target sequence comprises 10 bp barcode + 23 bp gRNA target (including the PAM) + 4 bp downstream sequence = 37 bp; (5) off-target sites containing a BsmBI recognition site were discarded because they are incompatible with GGA cloning; (6) LibC contains surrogate sites with all possible 1 bp mismatches for five RGNs. The oligo pools were synthesized by GenScript® (Nanjing, China), and all sgRNA sequences and their oligos are provided in Supplementary Data.
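Design steps (4) to (6) can be illustrated with a minimal Python sketch. All function names are hypothetical. Two points are assumptions rather than statements from the text: the BsmBI check scans both strands (“CGTCTC” and its reverse complement “GAGACG”), and the 1 bp mismatches are enumerated over the 20 nt protospacer only.

```python
# Sketch of surrogate-site construction, BsmBI filtering, and enumeration
# of single-mismatch variants. Names and strand handling are assumptions.

def build_surrogate(umi: str, target_with_pam: str, downstream: str) -> str:
    """10 bp barcode ("AC" + 8 bp UMI) + 23 bp target incl. PAM + 4 bp = 37 bp."""
    if len(umi) != 8 or len(target_with_pam) != 23 or len(downstream) != 4:
        raise ValueError("expected 8 bp UMI, 23 bp target, 4 bp downstream")
    return "AC" + umi + target_with_pam + downstream


def has_bsmbi_site(seq: str) -> bool:
    """True if a BsmBI recognition site occurs on either strand."""
    seq = seq.upper()
    return "CGTCTC" in seq or "GAGACG" in seq


def single_mismatch_variants(protospacer: str) -> list:
    """All sequences that differ from the protospacer at exactly one position."""
    protospacer = protospacer.upper()
    variants = []
    for i, ref in enumerate(protospacer):
        for alt in "ACGT":
            if alt != ref:
                variants.append(protospacer[:i] + alt + protospacer[i + 1:])
    return variants


surrogate = build_surrogate("GATTACAG", "ACGT" * 5 + "TGG", "CATG")
variants = single_mismatch_variants("ACGT" * 5)
print(len(surrogate), len(variants))  # 37 60
```

Under these assumptions, a 20 nt protospacer yields 20 × 3 = 60 single-mismatch variants per RGN.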
Construction of SURRO-seq plasmid library
The 170-nt oligonucleotide pool was amplified by PCR. The SURRO-seq oligos were first diluted to 1 ng/µl and then amplified with the primers SURRO (BsmBI GGA)-F and SURRO (BsmBI GGA)-R (Supplementary Data 6). PCR was carried out with PrimeSTAR HS DNA Polymerase (Takara, Japan) following the manufacturer’s instructions.
The PCR products of the SURRO-seq oligos were then used for Golden Gate Assembly (GGA) to generate the plasmid library. 36 parallel GGA reactions were performed, and the ligation products were pooled into one tube. Transformation was then carried out using chemically competent DH5α cells. For each reaction, 10 µl of GGA ligation product was transformed into 50 µl of competent cells, and all the transformed cells were plated on one LB plate (15 cm dish) with X-gal, IPTG and Amp selection. High ligation efficiency was indicated by the presence of very few blue colonies. To ensure sufficient coverage of each surrogate vector in the oligonucleotide library, 42 parallel transformations were performed for a library containing 12,000 synthetic oligos, and all the bacterial colonies were scraped off and pooled for plasmid midi-prep (PureLink™ HiPure Plasmid DNA Midiprep Kit). For smaller libraries, the number of transformations can be scaled down proportionally. For NGS-based quality control of library coverage, the midi-prep plasmids were used as DNA templates for PCR amplification, followed by gel purification and NGS.
SURRO-seq plasmid library lentivirus packaging
Supernatants containing lentiviral particles were produced by transient transfection of HEK293T cells using PEI 40000 (Polyethylenimine Linear, MW 40000). For transfection of one 10 cm dish, the DNA/PEI mixture contained 13 µg pLenti-TRAPseq vectors, 3 µg pRSV-REV, 3.75 µg pMD.2G, 13 µg pMDGP-Lg/p-RRE, and 100 µg PEI 40000 solution (1 µg/µl in sterilized ddH2O), and was supplemented with Opti-MEM without phenol red (Invitrogen) to a final volume of 1 mL. The transfection mixture was pipetted up and down gently several times and incubated at room temperature (RT) for 20 min. The transfection complex was added to 80%-confluent HEK293T cells in a 10 cm dish containing 10 mL of culture medium. After 48 h, the viral supernatant was harvested and filtered through a 0.45 µm filter. Polybrene solution (Sigma-Aldrich) was added to the crude virus solution to a final concentration of 8 µg/mL. The crude virus solution was aliquoted into 15 mL tubes (5 mL/tube) and stored at -80 °C until use.
Lentivirus titer quantification by flow cytometry (FCM)
The LentiU6-LacZ-GFP-Puro (BB) vector expresses an EGFP gene, so the functional titer of the lentivirus preparation was assayed by FCM. Briefly: 1) Day 1: HEK293T cells were seeded in a 24-well plate. Eighteen wells were used for titer determination; a gradient of crude lentivirus volumes was tested, each volume in two replicates. 2) Day 2: Cells were transduced at 60–80% confluence. Before transduction, the total cell number was determined from one well. The medium in the remaining wells was replaced with fresh culture medium containing 8 µg/mL polybrene, and the gradient of crude virus volumes was added to the wells and mixed gently. 3) Day 3: The medium was replaced with fresh medium without polybrene. 4) Day 4: All cells were harvested and washed twice with PBS. The cells were fixed in 4% formalin solution at RT for 20 min and pelleted at 2,000 rpm for 5 min. The cell pellet was washed twice with PBS and resuspended in PBS, followed by FCM analysis. FCM was performed on a BD LSRFortessa™ cell analyzer, with at least 30,000 events collected per sample, in duplicate. The FCM output data were analyzed with FlowJo vX.0.7. The percentage of GFP-positive cells was calculated as Y% = N_GFP-positive cells / N_total cells × 100%. For accurate titer determination, there should be a linear relationship between the GFP-positive percentage and the crude virus volume. The titer in transducing units (TU/mL) was calculated as TU/mL = (N_initial × Y% × 1000) / V, where N_initial is the cell number determined before transduction and V is the crude virus volume (µl) used for the transduction.
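The titer calculation can be written as a small worked example. The Python sketch below uses hypothetical function names and made-up input numbers chosen only to show the arithmetic; Y% is passed as a percentage and converted to a fraction inside the function, and the factor 1000 converts µl to mL.

```python
# Sketch of the functional titer calculation from FCM data.
# Input values below are illustrative, not measurements from this study.

def gfp_positive_percent(n_gfp_positive: int, n_total: int) -> float:
    """Y% = N_GFP-positive / N_total x 100."""
    return n_gfp_positive / n_total * 100


def functional_titer_tu_per_ml(n_initial: float, gfp_percent: float,
                               volume_ul: float) -> float:
    """TU/mL = N_initial x Y (as a fraction) x 1000 / V, with V in microlitres."""
    return n_initial * (gfp_percent / 100) * 1000 / volume_ul


y = gfp_positive_percent(n_gfp_positive=3_000, n_total=30_000)   # 10% GFP+
titer = functional_titer_tu_per_ml(n_initial=100_000, gfp_percent=y,
                                   volume_ul=5)
print(f"{y:.1f}% GFP-positive, titer = {titer:.2e} TU/mL")
```

With 100,000 cells at transduction, 10% GFP-positive cells, and 5 µl of crude virus, the formula gives 2 × 10^6 TU/mL. Only volumes that fall in the linear range of the dilution series should be used for this estimate.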
SURRO-seq library lentivirus transduction
HEK293T-SpCas9 cells were cultured in D10 medium with 50 µg/ml hygromycin throughout the whole experiment. SURRO-seq library transduction was performed as follows: 1) Day -1: 2.5 × 10^6 cells per 10 cm dish were seeded. For a 12K SURRO-seq library, transductions were performed in 10 replicates to reach 4000× coverage. For each group, one plate was used for cell number determination before transduction and another plate was used as a drug-resistance (puromycin) test control. The remaining 10 plates were used for the SURRO-seq lentivirus library transduction (transduction coverage per gRNA exceeds 4000× for a 12K library). 2) Day 0: We first determined the approximate cell number per dish. This was used to determine the volume of crude lentivirus needed for transduction at a multiplicity of infection (MOI) of 0.3. The low MOI (0.3) ensured that most infected cells received only one copy of the lentivirus construct. The volume was calculated as V = N × 0.3 / TU, where V is the volume of crude lentivirus used for infection (mL), N is the cell number in the dish before infection, and TU is the titer of the crude lentivirus (IFU/mL). The infected cells were cultured in a 37 °C incubator. 3) Day 1: 24 hours after transduction, the cells were passaged at a ratio of 1:3. 4) Day 2: The transduced cells were cultured in D10 medium containing 50 µg/ml hygromycin, 1 µg/mL puromycin, and 1 µg/mL doxycycline to induce Cas9 overexpression. 5) The transduced cells were split at a ratio of 1:3 every 2–3 days, when cell confluence reached up to 90%. Cells from day 10 were harvested for genomic DNA extraction. Parallel experiments were performed using wild-type HEK293T cells as MOCK controls.
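The volume calculation and the rationale for the low MOI can be sketched as follows. The function names and the example cell number and titer are hypothetical. The single-copy estimate assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI, which is a common modelling assumption and not something stated in the text.

```python
# Sketch: crude virus volume for a target MOI, and the expected fraction of
# infected cells carrying exactly one copy under a Poisson assumption.
import math


def virus_volume_ml(n_cells: float, titer_per_ml: float, moi: float = 0.3) -> float:
    """V = N x MOI / TU, with the titer given per mL."""
    return n_cells * moi / titer_per_ml


def single_copy_fraction(moi: float) -> float:
    """P(exactly 1 copy | at least 1 copy) for Poisson-distributed integrations."""
    p_one = moi * math.exp(-moi)
    p_at_least_one = 1 - math.exp(-moi)
    return p_one / p_at_least_one


volume = virus_volume_ml(n_cells=5e6, titer_per_ml=1e7)   # illustrative inputs
fraction = single_copy_fraction(0.3)
print(f"volume = {volume:.2f} mL, single-copy fraction = {fraction:.3f}")
```

Under the Poisson assumption, roughly 86% of infected cells carry a single copy at MOI 0.3, which is consistent with the statement that most infected cells receive only one construct.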
PCR amplicons of TRAPs from cells
Genomic DNA was extracted using the phenol-chloroform method. The purified genomic DNA was then subjected to SURRO-seq PCR with the primers SURRO-NGS-F and SURRO-NGS-R1 (Supplementary Data 6). In this study, 5 µg of genomic DNA was used as template in each PCR reaction, corresponding to approximately 7.6 × 10^5 copies of the surrogate construct, or about 63-fold coverage of a 12K SURRO-seq library. For a 12K library, 32 parallel PCR reactions were performed to achieve approximately 2,016-fold coverage of each construct. The PCR products were then purified on a 1.5% gel, pooled in equal amounts, and deep sequenced.
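The coverage figures follow from simple arithmetic, shown here as a Python sketch with a hypothetical function name. It assumes a library of exactly 12,000 constructs and rounds the per-reaction coverage down to a whole number, which reproduces the stated values.

```python
# Sketch: per-reaction and total PCR coverage of the library.
# Assumes 12,000 constructs; per-reaction coverage is rounded down.

def pcr_coverage(copies_per_reaction: int, library_size: int,
                 n_reactions: int) -> tuple:
    """Return (coverage per reaction, total coverage over all reactions)."""
    per_reaction = copies_per_reaction // library_size
    return per_reaction, per_reaction * n_reactions


per_reaction, total = pcr_coverage(copies_per_reaction=760_000,
                                   library_size=12_000,
                                   n_reactions=32)
print(per_reaction, total)  # 63 2016
```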
All synthetic gRNAs used for validation of OTs were chemically modified to increase stability in cells and synthesized by Synthego Co. (California).
The CRISPR RNP was delivered into cells by nucleofection. For one nucleofection, 6 µg SpCas9 protein (Cat# 1081059, IDT) and 3.2 µg synthetic gRNA (Synthego) were mixed in a PCR tube by pipetting and incubated at room temperature for at least 10 min and at most 1 h. Then 200,000 cells were gently resuspended in 20 µL nucleofection buffer (OptiMEM) by pipetting up and down. The cells and RNP complex were then transferred to a 4D-Nucleofector 16-well Nucleocuvette strip (Catalog #: AXP-1004, Lonza). The samples should cover the bottom of the wells, and air bubbles must be avoided. Nucleofection was performed with program CM-138. Immediately after electroporation, prewarmed culture medium was added to the cells (180 µL per well of the Nucleocuvette strip). The cells were then transferred into one well of a 12-well cell culture plate with prewarmed medium. Cells were harvested for amplicon PCR and deep sequencing 48 hours after nucleofection.
Deep amplicon sequencing
The on-target and off-target sites were amplified by PCR. All primers used for PCR are listed in Supplementary Data 6. All amplicons were subjected to paired-end 150 bp deep sequencing on the MGISEQ-2000 platform (MGI of BGI in China).
Raw data processing
FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and fastp (https://github.com/OpenGene/fastp) were used with default parameters for data quality control and filtering. The paired-end reads were merged using FLASH (http://www.cbcb.umd.edu/software/flash). BWA-MEM with default options was used to map the merged reads to the designed oligo sequences to preliminarily assign the reads to each TRAP site.
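The preprocessing steps can be laid out as a sequence of command lines. The Python sketch below only builds the commands and does not run them; the function name, file names, and output prefix are hypothetical, and the exact invocations used in this study are not given in the text. Only basic input and output options are shown, so every tool runs with its default parameters.

```python
# Sketch: command lines for QC, filtering, read merging, and mapping.
# File names are placeholders; commands are built but not executed here.

def build_preprocessing_commands(r1: str, r2: str, reference_fasta: str,
                                 prefix: str = "sample") -> list:
    clean_r1 = f"{prefix}.clean_R1.fq.gz"
    clean_r2 = f"{prefix}.clean_R2.fq.gz"
    merged = f"{prefix}.extendedFrags.fastq"   # FLASH names merged reads this way
    return [
        ["fastqc", r1, r2],                                    # quality report
        ["fastp", "-i", r1, "-I", r2, "-o", clean_r1, "-O", clean_r2],
        ["flash", clean_r1, clean_r2, "-o", prefix],           # merge read pairs
        ["bwa", "index", reference_fasta],                     # index oligo reference
        ["bwa", "mem", reference_fasta, merged],               # SAM goes to stdout
    ]


commands = build_preprocessing_commands("lib_R1.fq.gz", "lib_R2.fq.gz",
                                        "designed_oligos.fa")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run`, with the standard output of the `bwa mem` step redirected to a SAM file for the downstream per-site splitting.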
The pysam module of Python 3.8 was used to split the aligned data according to the site number on the chip, yielding the reads of each site. We then filtered the data of each site in three stringent steps. First, according to the structure of the chip, the constant parts of each read, g + gRNA (20 bp) + scaffold (82 bp) + barcode (10 bp) at the beginning and GTTT at the end, had to be unchanged. Second, to remove chip synthesis errors, pseudo-editing sequences found in the WT group were removed from the SpCas9 group. Finally, to remove the interference of sequencing errors, the extracted sequence of each site was re-aligned to the reference sequence, and 1 bp indels at positions N1-N14 and N22-N27 of the TRAP (27 bp) sequence were removed. These three filtering steps were implemented in Julia 1.5.3.
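The first two filtering steps can be sketched in a few lines. The study implemented them in Julia; the example below uses Python instead, with hypothetical function names, and it assumes that each read has already been trimmed to span exactly from the leading “g” to the trailing “GTTT”. Keeping the unedited reference sequence in the second step is an interpretation, since that sequence is expected in both groups.

```python
# Sketch of filter step 1 (constant structure) and step 2 (WT background).
# Python stand-in for the Julia implementation; names are hypothetical.

def passes_structure_filter(read: str, grna: str, scaffold: str,
                            barcode: str) -> bool:
    """Require g + gRNA + scaffold + barcode at the start and GTTT at the end."""
    read = read.upper()
    expected_start = ("G" + grna + scaffold + barcode).upper()
    return read.startswith(expected_start) and read.endswith("GTTT")


def drop_wt_background(cas9_counts: dict, wt_counts: dict,
                       reference: str) -> dict:
    """Remove non-reference sequences that also occur in the WT group."""
    return {seq: n for seq, n in cas9_counts.items()
            if seq == reference or seq not in wt_counts}


grna = "ACGT" * 5
scaffold = "T" * 82                      # placeholder, not the real scaffold
barcode = "ACGATTACAG"
trap = "ACGT" * 5 + "TGG" + "CATG"       # 27 bp TRAP region
read = "g" + grna + scaffold + barcode + trap + "GTTT"
print(passes_structure_filter(read, grna, scaffold, barcode))  # True
```

The third step, removal of 1 bp indels outside the expected cut-site window, requires re-alignment to the reference and is not shown here.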
Fisher's exact test and statistical analysis
To obtain stable and reliable off-target efficiencies, false positive results must be excluded. For each site, the numbers of reads with and without indels in the SpCas9 group and the WT group formed a 2 × 2 matrix, and Fisher's exact test was used to determine whether editing at the site was significant. To control the false discovery rate (FDR), all p-values were corrected by the Benjamini and Hochberg (BH) method. Next, we applied strict thresholds (total read number (SpCas9): 32; indel read number (SpCas9): 5; indel frequency (IF%) (WT): 25) to filter out biased off-target efficiencies. We then used the criteria fold change (FC) > 2 and BH-adjusted p-value < 0.05 to divide the off-target data set into two parts for downstream analysis. The calculation formula of indel efficiency is as follows:
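The statistical test and the multiple-testing correction can be illustrated with a self-contained Python sketch. The function names are hypothetical and the read counts are illustrative. The test is written as one-sided (more indel reads in the SpCas9 group than expected), which is an assumption; the text does not state whether a one-sided or two-sided test was used.

```python
# Sketch: one-sided Fisher's exact test on a 2 x 2 read-count table and
# Benjamini-Hochberg adjustment. The one-sided alternative is an assumption.
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the table [[a, b], [c, d]].

    Rows: SpCas9 group, WT group. Columns: indel reads, no-indel reads.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    numerator = sum(comb(col1, k) * comb(n - col1, row1 - k)
                    for k in range(a, k_max + 1))
    return numerator / comb(n, row1)


def benjamini_hochberg(pvalues: list) -> list:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_small = fisher_exact_greater(3, 1, 1, 3)          # equals 17/70
p_site = fisher_exact_greater(40, 960, 5, 995)      # illustrative read counts
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_small, p_site, adjusted)
```

Sites would then be retained for downstream analysis when the adjusted p-value and the fold change meet the stated criteria.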
All NGS data generated in this study have been deposited in the CNGB public data repository under the accession numbers CNP0001979 and CNP0002648. A complete list of the 704 NGS samples is provided in Supplementary Data 7.