To search for novel CRISPR-Cas13 systems in metagenomic data sets, we developed a fast computational pipeline (Extended Data Fig. 1a).
Using the CRISPR array as a search anchor, we first obtained all metagenomic assemblies from the JGI database8 and adapted existing algorithms for de novo CRISPR array detection9. This identified 340,425 putative CRISPR repeat arrays (Extended Data Fig. 1a). Up to 10 kilobases (kb) of genomic DNA sequence flanking each CRISPR array were extracted to identify predicted protein-coding genes in the immediate vicinity. To identify compact Cas13 effectors suitable for efficient in vivo delivery, we restricted the search to proteins of 400-900 amino acid residues encoded within 10 protein-coding genes of the repeat, yielding 250,901 candidate proteins; of these, 24,959 contained two predicted RxxxxH motifs of the HEPN ribonuclease domain, located separately at the N- and C-terminus of the protein (Extended Data Fig. 1a). Among these RxxxxH motif-containing proteins, 64 carried two motifs of the following three types: RNxxxH, RHxxxH and RQxxxH, which are also found in the majority of previously known Cas13 proteins (Extended Data Fig. 1b). Because all reported CRISPR-Cas13 systems use a single crRNA with a conserved stem-loop structure10, we further required this feature and identified 31 Cas13 candidates (Extended Data Fig. 1a). After excluding proteins with known functions in the NCBI non-redundant (NR) protein database, six candidate Cas13 proteins remained (Extended Data Fig. 1a). Aligning these six proteins back to the original pool of 24,959 proteins yielded one additional candidate carrying RNxxxH and RxxxxH motifs (Extended Data Fig. 1a). Based on sequence divergence, these seven candidates, all of exceptionally small size (775 to 803 amino acids), were classified into two novel Cas13 families (Fig. 1a, b). We named them the “type VI-E” and “type VI-F” families of Cas13, with two members (“Cas13e.1” and “Cas13e.2”) in VI-E and five members (“Cas13f.1” to “Cas13f.5”) in VI-F (Fig. 1a, b).
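The length and HEPN-motif filters described above can be sketched as a simple sequence scan. This is a minimal illustration, not the pipeline used here: the regular expression, the hypothetical `terminal_fraction` cutoff for what counts as "N-terminal" or "C-terminal", and the function names are all assumptions, and the crRNA stem-loop and NR-database filters are not modeled.

```python
import re

# RxxxxH: arginine, any four residues, histidine. The lookahead reports
# overlapping motifs at every start position.
HEPN_MOTIF = re.compile(r"(?=(R.{4}H))")
# Motif subtypes reported for most known Cas13 proteins.
PREFERRED_TYPES = ("RN", "RH", "RQ")


def hepn_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every RxxxxH match in a protein."""
    return [(m.start(), m.group(1)) for m in HEPN_MOTIF.finditer(protein)]


def passes_filters(protein: str, min_len: int = 400, max_len: int = 900,
                   terminal_fraction: float = 1 / 3,
                   strict: bool = False) -> bool:
    """Length window plus one HEPN motif near each terminus.

    terminal_fraction is a hypothetical cutoff (an assumption, not taken
    from the study); strict=True keeps only RNxxxH/RHxxxH/RQxxxH motifs.
    """
    n = len(protein)
    if not min_len <= n <= max_len:
        return False
    motifs = hepn_motifs(protein)
    if strict:
        motifs = [(pos, m) for pos, m in motifs if m[:2] in PREFERRED_TYPES]
    has_n = any(pos < n * terminal_fraction for pos, _ in motifs)
    has_c = any(pos >= n * (1 - terminal_fraction) for pos, _ in motifs)
    return has_n and has_c
```

For example, a made-up 500-residue sequence with RNAAAH near the N-terminus and RQAAAH near the C-terminus passes both the relaxed and the strict filter, whereas replacing the C-terminal motif with RAAAAH passes only the relaxed one.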
To identify highly active Cas13 orthologs, we next screened the RNA-targeting activity of these seven novel Cas13 proteins using a eukaryotic cell-based mCherry reporter system (Extended Data Fig. 2a). We synthesized a human codon-optimized version of each protein and generated mammalian expression plasmids encoding either the catalytically active protein or a catalytically inactive variant with mutated RxxxxH motifs11,12 (Extended Data Fig. 2b). Each protein was fused with nuclear localization signals (NLS) at both the N- and C-terminus. The VI-E and VI-F proteins were paired with two forms of guide RNA: a 30-nucleotide (nt) spacer flanked by two 36-nt direct repeat (DR) sequences, mimicking an unprocessed guide RNA (pre-crRNA), or a single 36-nt DR with a 30-nt spacer (crRNA), predicted to mimic the mature guide RNA (Fig. 1c, d). To determine the crRNA architecture, we first compared placing the DR at the 5’ or 3’ end of the crRNA in a reporter inhibition assay (Extended Data Fig. 2c). The crRNA carrying a 3’ DR, rather than a 5’ DR, substantially suppressed reporter expression (Extended Data Fig. 2d), indicating that the Cas13e.1 crRNA shares a 3’ DR architecture with that of the previously reported Cas13b13. We then assessed the ability of the VI-E and VI-F proteins to knock down mCherry reporter levels in cultured HEK293T cells. Two days after transfection with a plasmid expressing each VI-E or VI-F protein together with the corresponding target-specific crRNA, we observed a significant reduction in mCherry protein, with Cas13e.1 showing the highest knockdown efficiency (Fig. 1d, e). In contrast, transfection of each Cas13 with a non-targeting (NT) crRNA, or of the targeting crRNA with the catalytically inactive Cas13, had no significant effect on mCherry levels (Fig. 1d, e, Extended Data Fig. 2d), indicating that knockdown is crRNA- and HEPN-dependent.
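The two guide architectures compared above can be written out as sequence templates. The sketch below assumes that the spacer is the reverse complement of the targeted transcript window and uses a hypothetical `DR_PLACEHOLDER` of 36 `N`s, which is not the actual Cas13e.1 direct repeat; function names are illustrative only.

```python
# RNA complement; 'N' placeholder bases are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Hypothetical placeholder, NOT the real Cas13e.1 direct repeat sequence.
DR_PLACEHOLDER = "N" * 36


def reverse_complement(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def spacer_for_site(transcript: str, start: int, length: int = 30) -> str:
    """Spacer that base-pairs with transcript[start:start + length]."""
    site = transcript[start:start + length]
    if len(site) != length:
        raise ValueError("target window runs past the end of the transcript")
    return reverse_complement(site)


def mature_crrna(spacer: str, dr: str = DR_PLACEHOLDER) -> str:
    """Single-DR crRNA with the DR at the 3' end: 5'-spacer-DR-3'."""
    return spacer + dr


def pre_crrna(spacer: str, dr: str = DR_PLACEHOLDER) -> str:
    """Unprocessed mimic with two DRs: 5'-DR-spacer-DR-3'."""
    return dr + spacer + dr
```

With a 30-nt spacer, the single-DR crRNA is 66 nt and the dual-DR pre-crRNA mimic is 102 nt long.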
Both the single-DR crRNA and the dual-DR pre-crRNA mediated potent knockdown, and NLS fusion significantly improved the knockdown activity of Cas13e.1 (Fig. 1d, e). To determine the optimal spacer length for efficient Cas13e.1 targeting, we generated a series of spacers ranging from 20 to 50 nt in length (Extended Data Fig. 2e). Reporter inhibition activity dropped significantly for spacers shorter than 30 nt (Extended Data Fig. 2e), so crRNAs with 30-nt spacers were used in subsequent RNA interference experiments unless otherwise indicated.
We next compared the knockdown efficiency of Cas13e.1 and Cas13f.1 with that of the previously identified Cas13 proteins Cas13a14, Cas13b13 and Cas13d12 (Extended Data Fig. 1c). Across three crRNA target sites in mCherry, Cas13e.1, Cas13f.1 and RfxCas13d overall outperformed LwaCas13a and PspCas13b in HEK293T cells at 48 h after transfection (Fig. 2a, Extended Data Fig. 3a). For endogenous transcripts (3 genes, 3 crRNAs each), Cas13e.1 and RfxCas13d were similarly efficient (80.7 ± 2.1% vs. 78.8 ± 2.9%, mean ± s.e.m., p = 0.6; Fig. 2b, Extended Data Fig. 3b). To confirm that RNA interference by Cas13e.1 is broadly applicable, we selected a panel of 12 additional human genes with diverse roles in mammalian cells, using 3 crRNAs per gene. Cas13e.1 consistently showed high knockdown activity for every gene with each of the three crRNAs (Fig. 2c), indicating that Cas13e.1-mediated RNA interference is uniformly effective across targets. Because Cas13 family proteins can process their own CRISPR arrays12, we next exploited this property to deliver a pre-crRNA array for multiplexed targeting from a simple single-vector system (Fig. 2d). Transfection of Cas13e.1 together with an array encoding four crRNAs, each targeting one of three mRNAs (EZH2, HRAS and PPARG) or the nuclear-localized long non-coding RNA (lncRNA) MALAT1, achieved robust simultaneous knockdown of all four transcripts (Fig. 2d). Cas13e.1 also achieved ~90% knockdown of the endogenous mouse Pten transcript in cultured mouse N2a cells (Extended Data Fig. 4a). Transcriptome-wide RNA-seq analysis supported the specificity of Cas13e.1 targeting: more than half of the top-ranked genes with altered expression were related to Pten (Extended Data Fig. 4b, Extended Data Table 2).
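The single-vector multiplexing strategy relies on one transcript carrying several DR-spacer units. The sketch below lays out such an array and models processing in an idealized way, by splitting at each DR. It assumes the same hypothetical `DR_PLACEHOLDER` (not the real repeat) and placeholder spacers; real Cas13 processing cleaves within the repeat, so mature crRNAs would carry only part of the flanking DR.

```python
# Hypothetical stand-in, not the real Cas13e.1 direct repeat.
DR_PLACEHOLDER = "N" * 36


def build_array(spacers: list[str], dr: str = DR_PLACEHOLDER) -> str:
    """CRISPR array 5'-DR-S1-DR-S2-...-DR-Sn-DR-3' on a single transcript."""
    return dr + "".join(spacer + dr for spacer in spacers)


def idealized_processing(array: str, dr: str = DR_PLACEHOLDER) -> list[str]:
    """Split the array at each DR and return spacer + 3' DR units.

    Idealized assumption: cleavage leaves a full 3' DR on each spacer.
    """
    return [part + dr for part in array.split(dr) if part]
```

With four placeholder 30-nt spacers (standing in for guides against EZH2, HRAS, PPARG and MALAT1), the array is 300 nt long and yields four 66-nt crRNA units.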
When fused with variants of the ADAR2 deaminase domain (ADAR2dd), Cas13 can also be engineered for RNA base editing15,16, converting adenosine (A) to inosine (I) or cytidine (C) to uridine (U). We fused dCas13e.1 with a high-fidelity ADAR2dd (E488Q/T375G, hereafter ADAR2dd*) to generate an A-to-I RNA base editor (named “eABE”). To test the activity of eABE, we generated an RNA-editing reporter based on an mCherry carrying a nonsense mutation [W98X (UGG to UAG)]. Because inosine is read as guanosine during translation, A-to-I editing of the UAG stop codon restores the wild-type UGG codon, so mCherry fluorescence reports eABE activity (Extended Data Fig. 5a). eABE effectively restored mCherry fluorescence in cells transfected with the mutant mCherry reporter together with both eABE and a 50-nt crRNA, but not with either component alone (Fig. 3a). To reduce the size of dCas13e.1 for efficient in vivo delivery, we generated a series of base editors by fusing structure-guided truncations of dCas13e.1 with ADAR2dd* (Fig. 3b, Extended Data Fig. 5b). We systematically screened the editing activity of fused base editors carrying dCas13e.1 truncations at the N-terminus, the C-terminus or both (Fig. 3b) in search of miniature editors, and identified the smallest functional editor (“mini”), with 150-aa and 180-aa truncations at the C- and N-terminus, respectively (Fig. 3b), which is compact enough for packaging into commonly used adeno-associated virus (AAV) vectors. We next examined how the position of the mismatched base opposite the target adenosine within a 50-nt spacer affects A-to-I editing efficiency for both full-size and mini eABE (Extended Data Fig. 6a), and found that placing the mismatch at positions 15 to 25 of the crRNA yielded higher editing efficiency than other positions (Extended Data Fig. 6b). We further examined the RNA editing efficiency of the full-size and mini eABE systems on several endogenous transcripts in mammalian cells.
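Two pieces of the reporter logic above can be made concrete: the W98X stop codon UAG becomes UGG once its A is deaminated to I (decoded as G), and the editing guide places a mismatch opposite the target adenosine. The sketch below is illustrative only: it assumes the mismatch position is counted from the spacer's 5' end, uses C as a hypothetical default mismatch base, and includes only the two codons needed here rather than a full genetic code.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")
# Partial codon table: only the codons needed for the W98X reporter.
CODONS = {"UGG": "Trp", "UAG": "Stop"}


def reverse_complement(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def a_to_i(codon: str, pos: int) -> str:
    """A-to-I edit at codon[pos]; inosine is decoded as G during translation."""
    if codon[pos] != "A":
        raise ValueError("no adenosine at this codon position")
    return codon[:pos] + "G" + codon[pos + 1:]


def editing_spacer(transcript: str, edit_index: int, spacer_len: int = 50,
                   mismatch_pos: int = 20, mismatch_base: str = "C") -> str:
    """Spacer pairing with the transcript around edit_index, with a mismatch
    opposite the target A.

    Assumptions: mismatch_pos is 1-based from the spacer's 5' end, and
    mismatch_base='C' is a hypothetical default.
    """
    if transcript[edit_index] != "A":
        raise ValueError("edit_index must point at an adenosine")
    # Spacer position p pairs with window index spacer_len - p.
    start = edit_index - spacer_len + mismatch_pos
    if start < 0 or start + spacer_len > len(transcript):
        raise ValueError("spacer window runs off the transcript")
    spacer = list(reverse_complement(transcript[start:start + spacer_len]))
    spacer[mismatch_pos - 1] = mismatch_base  # replaces the U that pairs with A
    return "".join(spacer)
```

Here `CODONS[a_to_i("UAG", 1)]` gives `"Trp"`, and sweeping `mismatch_pos` over 1 to 50 generates the kind of position series examined above.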
Both editors achieved efficient A-to-I conversion (Fig. 3c, Extended Data Fig. 6c). To extend the base editing capability of dCas13e.1, we further generated a C-to-U base editor (“eCBE”) by fusing full-length or truncated dCas13e.1 with an RNA cytosine deaminase derived from evolved ADAR216, and found that both full-length and mini eCBE also achieved efficient C-to-U editing in HEK293T cells (Fig. 3d).
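Editing efficiency at a single site is commonly summarized from sequencing base counts. The sketch below assumes one convention (edited reads divided by reference plus edited reads, ignoring other bases); it is not necessarily how the efficiencies in Fig. 3 were computed, and the counts in the usage note are made up.

```python
def editing_efficiency(base_counts: dict[str, int], ref: str, alt: str) -> float:
    """Fraction of reads at one site carrying the edited base.

    Assumed convention: alt / (ref + alt); reads with other bases are ignored.
    """
    ref_n = base_counts.get(ref, 0)
    alt_n = base_counts.get(alt, 0)
    if ref_n + alt_n == 0:
        raise ValueError("no informative reads at this site")
    return alt_n / (ref_n + alt_n)


# Inosine is read as G after reverse transcription, so A-to-I appears as A>G;
# C-to-U appears as C>T in cDNA-derived reads.
A_TO_I = ("A", "G")
C_TO_U = ("C", "T")
```

For example, hypothetical counts of 600 A and 400 G reads at a target adenosine correspond to 40% A-to-I editing.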
To design effective and specific crRNAs that target and cleave SARS-CoV-2, we first performed a bioinformatic analysis by aligning published SARS-CoV-2 genomes17 and selected 30 crRNAs targeting sites coding for the RdRP (RNA-dependent RNA polymerase) and E (envelope) proteins (15 crRNAs each). Proof-of-concept experiments were performed on RdRP and E sequences that are conserved among SARS coronaviruses (Fig. 4a).
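Choosing crRNA sites from a genome alignment can be approximated by keeping only windows that are identical and gap-free in every aligned sequence. This is a simple stand-in for the conservation analysis, not the procedure used in the study; positions are alignment columns rather than genome coordinates, and spacers would be the reverse complements of the returned windows.

```python
def conserved_windows(aligned: list[str], width: int = 30) -> list[int]:
    """Alignment columns where a width-long window is identical and
    gap-free in every aligned genome (an assumed conservation criterion)."""
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("expected non-empty, equal-length aligned sequences")
    starts = []
    for start in range(len(aligned[0]) - width + 1):
        window = aligned[0][start:start + width]
        if "-" in window:
            continue  # skip windows spanning an alignment gap
        if all(seq[start:start + width] == window for seq in aligned[1:]):
            starts.append(start)
    return starts
```

A stricter variant could tolerate a few divergent sequences, or score windows by the fraction of genomes matched, instead of requiring perfect identity.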
The RdRP protein is the antiviral target of remdesivir18, and the E protein is critical for SARS-CoV pathogenesis19. To evaluate whether Cas13e.1 can degrade SARS-CoV-2 sequences, we created a reporter by fusing GFP with synthesized partial SARS-CoV-2 fragments of RdRP (genome coordinates 15,037-15,158 bp) and E (26,232-26,394 bp) (Fig. 4a). At 48 hours after co-transfection of HEK293T cells with the reporter and Cas13e.1/crRNAs, nearly all RdRP- and E-targeting crRNAs tested (27 of 30) suppressed GFP fluorescence by about 70% relative to control transfection with the non-targeting crRNA (Fig. 4a).
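Knockdown relative to the non-targeting control, as reported above, reduces to a simple ratio. The sketch below assumes a background-corrected reporter signal (for example, mean GFP intensity) for each crRNA; the 70% threshold mirrors the text, and the crRNA names in the usage note are hypothetical.

```python
def percent_knockdown(signal: float, nt_signal: float) -> float:
    """Reporter knockdown (%) relative to the non-targeting crRNA control."""
    if nt_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (1.0 - signal / nt_signal)


def efficient_crrnas(signals: dict[str, float], nt_signal: float,
                     threshold: float = 70.0) -> list[str]:
    """Names of crRNAs whose knockdown meets the threshold."""
    return [name for name, s in signals.items()
            if percent_knockdown(s, nt_signal) >= threshold]
```

For instance, a crRNA leaving 20% of the control signal corresponds to 80% knockdown and passes the threshold, while one leaving 50% does not.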
To examine whether Cas13e.1 can tolerate mismatches between the crRNA and the targeted viral RNA, we measured the knockdown activity of a representative crRNA (SARS-CoV-2 crRNA_1) carrying 1 or 2 mismatches (Fig. 4b). Cas13e.1 tolerated single-nt mismatches at different positions of this crRNA (Fig. 4b), whereas tandem double mismatches revealed a critical (seed) region spanning nt 16-30 of the crRNA that is required for efficient Cas13e.1-induced knockdown (Fig. 4b). We next estimated the minimal number of crRNAs required to target the majority of known coronaviruses found in humans and animals, using a strategy similar to one previously described20. From all 3,137 known coronavirus genomes, we identified approximately 7.1 million potential crRNA targets (Extended Data Fig. 7a). Allowing zero mismatches, we estimated that only five 22-nt or six 30-nt crRNAs could target over 90% of coronavirus genomes (Extended Data Fig. 7b, c). Allowing a single-nt mismatch, as supported by the tolerance results above, we estimated that 3, 10 and 17 crRNAs could target 95.3%, 99.1% and 100% of all coronaviruses, respectively (Fig. 4c, d). The ability to broadly target nearly all coronavirus strains with a small number of crRNAs distinguishes the Cas13e.1-based RNA interference approach from traditional vaccine or pharmaceutical strategies.
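The crRNA-minimization step can be framed as a set-cover problem: each candidate site covers the genomes that contain it within a mismatch budget, and a small set of sites is chosen to cover most genomes. The sketch below uses greedy set cover, a common heuristic, as an assumed stand-in for the previously described strategy20. It ignores the seed-region constraint, works on target-sense sequences (spacers would be their reverse complements), and uses brute-force matching that is only practical for toy inputs, not for millions of candidates.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def coverage_sets(sites: dict[str, str], genomes: dict[str, str],
                  max_mismatch: int = 1) -> dict[str, set[str]]:
    """For each candidate site, the genomes containing a match with at most
    max_mismatch substitutions (brute force; toy inputs only)."""
    cover = {}
    for name, site in sites.items():
        k = len(site)
        cover[name] = {
            gid for gid, seq in genomes.items()
            if any(hamming(site, seq[i:i + k]) <= max_mismatch
                   for i in range(len(seq) - k + 1))
        }
    return cover


def greedy_crrna_set(cover: dict[str, set[str]], n_genomes: int,
                     target_fraction: float = 0.9) -> tuple[list[str], set[str]]:
    """Greedy set cover: repeatedly add the site covering the most
    not-yet-covered genomes until target_fraction is reached."""
    chosen: list[str] = []
    covered: set[str] = set()
    while len(covered) < target_fraction * n_genomes:
        best = max(cover, key=lambda c: len(cover[c] - covered), default=None)
        if best is None or not cover[best] - covered:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        covered |= cover[best]
    return chosen, covered
```

Raising `max_mismatch` from 0 to 1 enlarges each site's coverage set, which is why mismatch tolerance shrinks the number of crRNAs needed for a given coverage.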
Next, we applied the CRISPR-Cas13e.1 strategy to inhibit influenza A virus H1N1, an RNA virus that, like SARS-CoV-2, has a tropism for respiratory tract epithelial cells. We designed four crRNAs targeting the nucleoprotein segment of the H1N1 genome, which is essential for viral replication and transcription19,21. To test the antiviral activity of Cas13e.1 in a virus infection setting, we used the influenza H1N1 strain “A/Puerto Rico/8/1934”22 in the MDCK (Madin-Darby canine kidney) cell line (Fig. 4e). Compared with the non-targeting crRNA, 3 of the 4 crRNAs efficiently knocked down the nucleoprotein transcript (Fig. 4f).
Consistent with this, target-specific crRNAs significantly reduced the amount of nucleoprotein-positive H1N1 virus in the supernatant of infected cultures, indicating effective inhibition of viral growth (Fig. 4g). Together, these results show that the Cas13e.1 system can confer antiviral activity on mammalian cells.
In summary, by mining metagenomic sequence data sets from natural uncultivated microbes, we identified two novel families of small CRISPR-Cas systems (type VI-E and VI-F), highlighting the diversity of natural microbial CRISPR systems. Furthermore, we demonstrated that Cas13e.1-based RNA targeting can effectively cleave SARS-CoV-2 RNA fragments and influenza A virus (IAV) RNA in cultured cell models. Notably, because Cas13e.1 tolerates single-nucleotide mismatches, a set of only 10 crRNAs is predicted to target over 99% of known coronaviruses. This tolerance may also help prevent viruses from escaping antiviral inhibition through mutation.
Compared with the previously reported Cas13a, Cas13b and Cas13d, Cas13e.1 combines a very compact size with robust RNA-guided ribonuclease activity, making it useful for in vivo RNA editing-based research and therapeutic applications23-26.