Since the early 1960s, scientists have worked extensively to devise a precise, safe, and time-efficient method of genome editing. From recombinant DNA technology as a backbone for gene therapy to zinc-finger nuclease (ZFN) technology and TALENs, each earlier method had significant shortcomings that made it unreliable as a genome editing tool [1]. This changed when CRISPR technology emerged as a genome editing tool in 2012. Initially discovered as part of the bacterial adaptive immune system, Clustered Regularly Interspaced Short Palindromic Repeats, or CRISPRs, are short repetitive nucleotide sequences found within the genomes of prokaryotes (bacteria and archaea). In 1987, Atsuo Nakata and colleagues first reported the presence of repetitive sequences separated by non-repetitive spacers (an arrangement later termed CRISPR) in the genome of E. coli [2]. CRISPRs protect bacteria against attack by bacteriophages and other foreign DNA such as plasmids. Foreign DNA sequences called spacers are nestled between the palindromic repeats of bacterial origin, and this arrangement gives the bacterial immune system a memory of past infections. The spacers originate from mobile genetic elements, such as transposons and bacteriophages, that infected the bacterium at some point [3].
During infection, bacteria acquire a small piece of the foreign viral DNA and integrate it into the CRISPR locus to generate CRISPR arrays [4]. Transcription and subsequent processing follow, giving rise to CRISPR RNA (crRNA). The CRISPR-associated nuclease protein (Cas9) then becomes involved, along with molecules of tracrRNA, which carry sections complementary to the palindromic repeats and can therefore anneal to them. Ribonucleases cleave the RNA strands within this assembly of RNA and protein molecules, generating individual effector complexes of three components: crRNA, tracrRNA, and Cas9.
When the effector complex encounters a section of viral DNA whose sequence is complementary to that of the crRNA, the nuclease binds to it. A short sequence in the viral genome called the protospacer adjacent motif (PAM) acts as a binding signal for the nuclease, and the two nuclease domains cleave the two strands of DNA just a few bases upstream of the PAM. The viral genome is thereby neutralized and infection is avoided.
This bacterial mechanism formed the basis for proposing CRISPR-Cas9 as a method of genome editing in modern applications. Dr. Jennifer Doudna and Dr. Emmanuelle Charpentier received the Nobel Prize in Chemistry (2020) for their work, which proposed that the bacterial CRISPR-Cas9 system could be used as a programmable toolkit for site-specific genome editing in humans and other animal species [5]. A breakthrough was achieved by joining crRNA and tracrRNA in vitro, generating a single guide RNA (sgRNA) [6]. The association of the Cas9 protein with the sgRNA forms a two-component functional effector complex that is as competent as the bacterial three-component system. According to Carroll [7], by generating and introducing an appropriate sgRNA with an accurately complementary sequence, together with Cas9 sourced from Streptococcus pyogenes, it is possible to target for editing any 20-base-pair sequence that lies next to a PAM sequence. The nuclease cuts both strands of the target DNA, and the cell then repairs the break by one of two routes: homology-directed repair (HDR) or non-homologous end joining (NHEJ). NHEJ, common in eukaryotes, does not require a homologous template DNA and is error-prone because it creates indels, that is, insertions or deletions of nucleotides [8]. In contrast, the more complex but accurate HDR pathway, common in bacteria and archaea, uses a DNA template with homology to the sequences flanking the cleavage site to incorporate new DNA fragments.
CRISPR-Cas systems are classified into two major classes comprising six types, which are further divided into subtypes [9]. Class 1 CRISPR systems possess multi-subunit effector complexes and include the DNA-targeting Type I (7 subtypes; carry Cas3 loci), the DNA- and RNA-targeting Type III (4 subtypes; carry Cas10 loci), and a putative Type IV. Class 2 systems possess single large effector proteins and include Type II, Type V, and Type VI, each with 3 subtypes, carrying loci for Cas9, Cas12, and Cas13 respectively.
The range of CRISPR applications has expanded steadily since the work of Doudna and Charpentier was published. CRISPR has become a vital tool in genetic screening to identify genes, for example in cancer immunotherapy [10], in the therapeutic management of AIDS [11], and in assays for SARS-CoV-2 [12]. CRISPR carries with it the promise of curing allergies [13] and preventing certain gene-linked diseases [14]. Gootenberg et al. [15] harnessed the knowledge of Cas13 to develop a CRISPR diagnostic kit, SHERLOCK, which successfully detected both Zika virus and certain strains of dengue virus. Major work has already been done on targeted epigenome modification by altering Cas9 [16]. This site-specific genome-modifying tool also finds several applications in agriculture, conferring disease resistance and improving phenotypes, quality, and crop yield.
According to Miano et al. [17], the traditional approach of generating targeted clones, incorporating them into animal blastocysts, and then breeding and validating the animals to produce knock-in or knockout models is a complicated procedure. With the rise of CRISPR technology, it is now possible to generate new mouse models with high specificity and efficiency in shorter time frames by disrupting the gene sequence [18].
Components of CRISPR Genome editing
In bacteria, CRISPR is made up of three essential elements: a trans-activating crRNA (tracrRNA), a CRISPR RNA (crRNA), and a CRISPR-associated endonuclease (Cas9). The tracrRNA binds to Cas9, and the crRNA attaches to the tracrRNA through straightforward base pairing. The complex then binds to DNA at the desired location, where Cas9 carries out its cleaving action [19].
In an engineered CRISPR system, the tracrRNA and crRNA are joined to create a single guide RNA (sgRNA). The CRISPR system is therefore made up of just two parts: a Cas9 protein and a single guide RNA. This simplification has made the CRISPR-Cas system a more flexible and practical tool for site-directed gene editing. In CRISPR experiments, the gRNA and Cas9 are joined to form a nucleoprotein complex [20]. To allow the endonuclease to cleave the target DNA, the Cas9-gRNA complex recognizes the protospacer adjacent motif (PAM) region and forms Watson-Crick base pairs with the 20-nucleotide target DNA [21].
Based on the type of protein that cleaves the target nucleic acid and the structure of the CRISPR-Cas locus, the CRISPR-Cas system is split into two classes and six types. Type I, III, and IV CRISPR-Cas systems make up Class 1. Type I systems form a complex known as Cascade (CRISPR-associated complex for antiviral defense), which consists of numerous Cas proteins and crRNA. Type I also includes Cas3, which contains helicase and DNase domains and degrades the target. Cas10 is a component of Type III systems, which use crRNA complementarity to cleave transcriptionally active RNA: Cas10 cleaves ssDNA, while Cas7 cleaves RNA. Type IV is present in plasmid-like regions and may be necessary for plasmid maintenance [22].
Types II, V, and VI of the CRISPR-Cas system are categorized as Class 2. Type II comprises the Cas9, Cas1, Cas2, and Cas4 proteins and is the type employed in gene therapy. Using its HNH and RuvC endonuclease domains, Cas9 introduces double-strand breaks in the target DNA. To cleave DNA, it needs tracrRNA, a non-coding RNA, in addition to crRNA. Cas1 and Cas2 are involved in adaptation. The Cas12 protein found in Type V uses the RuvC domain to cut DNA [22]. Type V also includes the Cpf1 endonuclease, which recognizes the PAM 5'-TTN that is widely found in the human genome [21]. The Cas13 protein of Type VI binds to crRNA and uses it to locate the complementary RNA target. Of these, the Cas9 from the Type II CRISPR system is the one most frequently used to facilitate genetic manipulation in organisms and various cell types, owing to its high efficiency and simplicity compared with other tools and its capacity to combine with multiple single-guide RNAs to achieve effective genome editing in cells [23, 24].
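The two-class, six-type classification described above can be summarized as a small lookup table. The following is a minimal sketch in Python; the names `CrisprType`, `CRISPR_TYPES`, `types_in_class`, and `type_for_protein` are hypothetical and introduced only for illustration, and the table records only the signature proteins and targets named in this review (Type IV is left unspecified because no signature protein is given for it here).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CrisprType:
    name: str
    crispr_class: int                  # 1 = multi-subunit effector, 2 = single effector protein
    signature_protein: Optional[str]   # None where the text names no signature protein
    target: Optional[str]              # nucleic acid targeted, as stated in the text


# Illustrative table built only from the classification given in the text.
CRISPR_TYPES = {
    "I": CrisprType("I", 1, "Cas3", "DNA"),
    "III": CrisprType("III", 1, "Cas10", "DNA and RNA"),
    "IV": CrisprType("IV", 1, None, None),
    "II": CrisprType("II", 2, "Cas9", "DNA"),
    "V": CrisprType("V", 2, "Cas12", "DNA"),
    "VI": CrisprType("VI", 2, "Cas13", "RNA"),
}


def types_in_class(crispr_class: int) -> List[str]:
    """Return the type names belonging to the given class (1 or 2)."""
    return [t.name for t in CRISPR_TYPES.values() if t.crispr_class == crispr_class]


def type_for_protein(protein: str) -> Optional[CrisprType]:
    """Return the CRISPR type whose signature protein matches, or None."""
    for t in CRISPR_TYPES.values():
        if t.signature_protein == protein:
            return t
    return None
```

For example, `types_in_class(2)` lists the single-effector types, and `type_for_protein("Cas13")` returns the RNA-targeting Type VI entry.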
The Cas9 protein, also known as a genetic scissor, is a multi-domain, multifunctional DNA endonuclease that cleaves the genome at specific locations to create a double-strand break [4]. Cas9 itself is a non-specific, RNA-guided DNA endonuclease: the specific DNA location where it breaks both strands is directed by the sgRNA, and Cas9 remains in an inactive state when no sgRNA is present. Several Cas nucleases have been identified in bacteria; of these, SpCas9 is the most widely used and was the first Cas9 nuclease to be utilized for genome editing [21].
The Cas9 protein is made up of two lobes: the recognition (REC) lobe and the nuclease (NUC) lobe. The REC lobe, which comprises the REC 1 and REC 2 domains, is responsible for binding the gRNA; REC 1, the largest domain, binds the sgRNA. The NUC lobe contains two endonuclease domains, RuvC and the HNH-like nuclease domain. The HNH domain cuts the strand complementary to the gRNA, while the RuvC domain cuts the other, non-complementary strand [4,19].
Other Cas9 variants have been created, including nickase Cas9 (nCas9) and dead Cas9 (dCas9). In nCas9, one of the nuclease domains (HNH or RuvC) is rendered inactive, whereas in dCas9 both domains are rendered inactive. When only one domain is inactive, a single strand of the DNA is cut; when both domains are inactive, the endonuclease can still bind DNA but is unable to cleave it. Dead Cas9 is therefore a DNA-binding protein that serves as a site-specific vehicle and can be applied in experimental research. A pair of nCas9 molecules, which produce paired nicks rather than a double-strand break, can lessen off-target cleavage [21].
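The relationship between the state of the two nuclease domains and the resulting activity can be written as a short decision function. This is a minimal illustrative sketch; the function name `cas9_activity` and the `VARIANTS` table are hypothetical and simply encode the rules stated in the text (HNH cuts the strand complementary to the gRNA, RuvC cuts the non-complementary strand).

```python
def cas9_activity(hnh_active: bool, ruvc_active: bool) -> str:
    """Describe what a Cas9 variant does to bound DNA, given which
    nuclease domains are catalytically active."""
    if hnh_active and ruvc_active:
        return "double-strand break"
    if hnh_active:
        # Only HNH cuts: nick on the strand complementary to the gRNA.
        return "nick on the complementary strand"
    if ruvc_active:
        # Only RuvC cuts: nick on the non-complementary strand.
        return "nick on the non-complementary strand"
    # dCas9: binds its target but cannot cut either strand.
    return "binds DNA without cutting"


# (hnh_active, ruvc_active) for each variant discussed in the text.
VARIANTS = {
    "Cas9": (True, True),
    "nCas9 (RuvC inactive)": (True, False),
    "nCas9 (HNH inactive)": (False, True),
    "dCas9": (False, False),
}

for name, (hnh, ruvc) in VARIANTS.items():
    print(f"{name}: {cas9_activity(hnh, ruvc)}")
```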
The guide RNA (gRNA) is the RNA that directs the CRISPR system to the precise editing site. Cas9 requires a short, pre-designed synthetic RNA in order to bind. The sgRNA is made up of a scaffold that binds the endonuclease and a 20-nucleotide spacer designed to target a particular genomic location [21]. The gRNA is a T-shaped RNA strand comprising one tetraloop and two or three stem-loops. It combines two RNAs that direct Cas9 to the intended site: 1) the CRISPR RNA (crRNA), which pairs with the target sequence to specify the target DNA, and 2) the trans-activating crRNA (tracrRNA), which acts as a scaffold for interaction with the Cas9 nuclease. A linker connects the crRNA and tracrRNA [4].
The crRNA has two main components: the spacer sequence that guides the complex to the target DNA, and a region that binds to the tracrRNA. These correspond to two domains. The first is at the 3' end and is joined to the 5' terminal region of the tracrRNA by Watson-Crick pairing. The second, situated at the 5' end, is target-specific and can be designed to base pair with the target DNA [20].
Each species' tracrRNA is distinct and attaches to the host-specific Cas9 as part of the host immune system. The tracrRNA connects crRNA to Cas9 and facilitates the maturation of crRNA from pre-crRNAs. A functional gRNA is created when tracrRNA base pairs with crRNA [24].
The PAM (protospacer adjacent motif) is another crucial element of the CRISPR system. It is a short sequence on the target DNA that lies immediately next to the sequence matched by the gRNA, and it is required for Cas9 to function as an endonuclease. Cas nucleases recognize the PAM and do not cleave in its absence. This requirement ensures that foreign viral DNA, not the bacterium's own CRISPR array, is cleaved. Each Cas9 enzyme recognizes its own set of PAM sequences, which differ depending on the bacterial species from which the Cas9 was derived. In the most widely used type II CRISPR system, derived from Streptococcus pyogenes, the PAM sequence is NGG and is found immediately 3' of the target sequence matched by the gRNA [19].
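To illustrate how the PAM requirement constrains target selection, the following minimal sketch scans a DNA sequence for candidate SpCas9 sites: a 20-nucleotide protospacer followed immediately on its 3' side by an NGG PAM, on either strand. The function and type names (`find_spcas9_sites`, `TargetSite`, `reverse_complement`) are hypothetical, and the sketch assumes an input containing only A, C, G, and T. It ignores off-target scoring and every other design criterion that real guide-design tools apply.

```python
import re
from typing import List, NamedTuple


class TargetSite(NamedTuple):
    strand: str       # "+" for the given strand, "-" for its reverse complement
    protospacer: str  # 20-nt DNA sequence, 5'->3', on the strand carrying the PAM
    pam: str          # 3-nt PAM immediately 3' of the protospacer
    spacer_rna: str   # guide spacer: protospacer sequence written as RNA (U for T)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> List[TargetSite]:
    """List every protospacer + NGG PAM occurrence on both strands."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            protospacer, pam = match.group(1), match.group(2)
            sites.append(
                TargetSite(strand, protospacer, pam, protospacer.replace("T", "U"))
            )
    return sites
```

Calling `find_spcas9_sites("ACGTACGTACGTACGTACGTTGG")` yields a single plus-strand site whose PAM is `TGG`; a sequence with no GG dinucleotide in a suitable position on either strand yields an empty list.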
The gRNA/Cas9 interaction localizes Cas9 to the target genomic sequence, where Cas9 cleaves both strands of DNA to create a double-strand break (DSB) [23]. Cas9 cuts the DNA 3–4 nucleotides upstream of the PAM region, and the DNA repair that follows uses one of the following mechanisms:
1. Non-homologous end joining (NHEJ): This pathway rejoins the broken ends and often introduces random base pair insertions or deletions at the cleaved site; it is the more common pathway in most cell types. As a result, it is prone to errors and frequently causes frameshift mutations that result in premature stop codons or non-functional peptides.
2. Homology-directed repair (HDR): An error-free mechanism in which a repair template carrying the correct sequence is used to rectify disease-causing mutations [4].
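The link between the cut position, NHEJ indels, and frameshifts can be made concrete with a short sketch. The helper names (`cut_index`, `simulate_deletion`, `causes_frameshift`) are hypothetical, the cut is assumed to fall exactly 3 nucleotides upstream of the PAM (the text gives 3–4), and the deletion is placed arbitrarily on the upstream side of the cut. Real NHEJ outcomes are variable and are not modelled here.

```python
def cut_index(pam_start: int, offset: int = 3) -> int:
    """Index of the first base downstream of the cut, assuming Cas9 cuts
    `offset` nucleotides upstream of the PAM (assumed 3 here)."""
    return pam_start - offset


def simulate_deletion(seq: str, pam_start: int, deletion_len: int) -> str:
    """Return the sequence after removing `deletion_len` bases immediately
    upstream of the cut site, as a toy model of an NHEJ deletion."""
    cut = cut_index(pam_start)
    start = max(0, cut - deletion_len)
    return seq[:start] + seq[cut:]


def causes_frameshift(indel_len: int) -> bool:
    """An indel inside a coding sequence shifts the reading frame unless
    its length is a multiple of three."""
    return indel_len % 3 != 0


# Toy example: 20-nt protospacer, TGG PAM starting at index 20, 3 trailing bases.
target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
edited = simulate_deletion(target, pam_start=20, deletion_len=2)
print(edited, causes_frameshift(2))
```

A 2-base deletion shifts the reading frame, whereas a 3-base deletion removes one codon but keeps the frame, which is why only some NHEJ events produce premature stop codons.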
The exploitation of CRISPR for Genomic Engineering
Oncogenesis involves multiple interacting factors, including gene interactions, systemic signals, and diverse cell types, all of which can be interrogated only at the organismal level [25]. Although genetically engineered mouse models (GEMMs) of human cancer have been effective in interrogating cancer biology, the protocol is laborious and lengthy, and the transgenic generation of GEMMs and the targeting of a gene are very expensive [26]. These limitations have opened up new avenues for gene editing in mammalian cells using the CRISPR/Cas9 adaptive immune system [27].