Background CRISPR/Cas9 system, as the third-generation genome editing technology, has been widely applied in target gene repair and gene expression regulation. Selection of appropriate sgRNA can improve the on-target knockout efficacy of CRISPR/Cas9 system with high sensitivity and specificity. However, when CRISPR/Cas9 system is operating, unexpected cleavage may occur at some sites, known as off-target. Presently, a number of prediction methods have been developed to predict the off-target propensity of sgRNA at specific DNA fragment. Most of them use artificial feature extraction operations and machine learning techniques to obtain off-target scores. With the rapid expansion of off-target data and the rapid development of deep learning theory, the existing prediction methods can no longer satisfy the prediction accuracy at the clinical level.
Results Here, we propose a prediction method named CnnCrispr to predict the off-target propensity of sgRNA at specific DNA fragments. CnnCrispr automatically trains the sequence features of sgRNA-DNA pairs with GloVe model, and embeds the trained word vector matrix into the deep learning model including biLSTM and CNN with five hidden layers. We conducted performance verification on the data set provided by DeepCrispr, and found that the auROC and auPRC in the "leave-one-sgRNA-out" cross validation could reach 0.957 and 0.429 respectively (the pearson value and spearman value could reach 0.495 and 0.151 respectively under the same settings).
Conclusion Our results show that CnnCrispr has better classification and regression performance than the existing states-of-art models. The code of CnnCrispr can be freely downloaded from https://github.com/LQYoLH/CnnCrispr.