Background: Single-cell RNA sequencing (scRNA-seq) technology has advanced rapidly over the past decade, enabling gene expression to be analyzed at the resolution of individual cells. This technology is of great significance for exploring dynamic developmental processes, studying gene regulatory mechanisms and discovering new cell types. However, scRNA-seq still has limitations: it detects only about 5-15% of the mRNA molecules in a cell. As a result, lowly expressed genes are difficult to detect, and many true expression values are recorded as zeros (dropouts).
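As a rough illustration of why a 5-15% capture rate hides lowly expressed genes, the sketch below assumes that each mRNA copy of a gene is captured independently with the same probability p. This simplified binomial model is our assumption, not part of SCC. Under it, a gene with n copies in a cell is detected with probability 1 - (1 - p)^n. The function name `detection_probability` is hypothetical.

```python
# Simplified capture model (assumption): each of a gene's mRNA copies in a cell
# is captured independently with the same probability `capture_rate`.


def detection_probability(copies: int, capture_rate: float) -> float:
    """Probability that at least one of `copies` molecules is captured."""
    if copies < 0 or not 0.0 <= capture_rate <= 1.0:
        raise ValueError("copies must be >= 0 and capture_rate in [0, 1]")
    # Complement of "every copy is missed".
    return 1.0 - (1.0 - capture_rate) ** copies


if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.15):
        for copies in (1, 5, 50):
            p = detection_probability(copies, rate)
            print(f"capture={rate:.2f} copies={copies:>2} P(detected)={p:.3f}")
```

With p = 0.10, a single-copy transcript is detected only about 10% of the time, whereas a transcript present in 50 copies is detected with probability above 99%, which is why dropouts concentrate in lowly expressed genes.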
Method: scRNA-seq data tend to follow a bimodal expression distribution, because a gene's measured expression in a given cell is usually either zero or high. In this paper, we propose a method, scRNA-seq complementation (SCC), to address dropouts in scRNA-seq data. First, we find the nearest-neighbor cells of each cell; we then use a mixture model to impute the dropouts. The model estimates the probability that a zero is a dropout and infers a reasonable expression value for it.
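The two steps described above can be sketched as follows. This is a hypothetical illustration, not the SCC implementation: the abstract does not specify the mixture components, so we assume a two-component Gaussian mixture per gene on log-scale expression, fit by EM, and treat a zero as a dropout when the mean of its neighbors' values is more likely under the high-expression component. All function names and the parameters `k`, `threshold` and `min_sd` are our own choices.

```python
import math
from statistics import mean

# Hypothetical sketch of kNN + mixture-model dropout imputation (not SCC's code).
LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def log_normal(x, mu, sd):
    """Log-density of N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - LOG_SQRT_2PI


def posterior_high(x, params):
    """Posterior probability that x comes from the high-expression component."""
    w_high, mu_low, sd_low, mu_high, sd_high = params
    log_low = math.log(1.0 - w_high) + log_normal(x, mu_low, sd_low)
    log_high = math.log(w_high) + log_normal(x, mu_high, sd_high)
    # Logistic form of the two-component posterior; clamp to avoid overflow.
    return 1.0 / (1.0 + math.exp(min(log_low - log_high, 700.0)))


def fit_two_gaussians(values, iters=100, min_sd=0.1):
    """Fit a two-component 1-D Gaussian mixture to one gene by EM.

    Returns (w_high, mu_low, sd_low, mu_high, sd_high).
    """
    n = len(values)
    lo, hi = min(values), max(values)
    sd0 = max((hi - lo) / 4, min_sd)
    params = (0.5, lo, sd0, hi, sd0)
    for _ in range(iters):
        # E-step: responsibility of the high component for each value.
        resp = [posterior_high(x, params) for x in values]
        # M-step: update weight, means and floored standard deviations.
        n_high = max(sum(resp), 1e-9)
        n_low = max(n - sum(resp), 1e-9)
        mu_high = sum(r * x for r, x in zip(resp, values)) / n_high
        mu_low = sum((1 - r) * x for r, x in zip(resp, values)) / n_low
        var_high = sum(r * (x - mu_high) ** 2 for r, x in zip(resp, values)) / n_high
        var_low = sum((1 - r) * (x - mu_low) ** 2 for r, x in zip(resp, values)) / n_low
        w_high = min(max(n_high / n, 1e-6), 1 - 1e-6)
        params = (w_high, mu_low, max(math.sqrt(var_low), min_sd),
                  mu_high, max(math.sqrt(var_high), min_sd))
    # Make sure the "high" component is the one with the larger mean.
    w_high, mu_low, sd_low, mu_high, sd_high = params
    if mu_high < mu_low:
        params = (1 - w_high, mu_high, sd_high, mu_low, sd_low)
    return params


def nearest_neighbors(cells, k):
    """Indices of the k nearest cells (Euclidean distance) for every cell."""
    result = []
    for i, a in enumerate(cells):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def impute(cells, k=3, threshold=0.5):
    """Replace zeros judged to be dropouts by the mean of expressing neighbors."""
    neighbors = nearest_neighbors(cells, k)
    n_genes = len(cells[0])
    params = [fit_two_gaussians([c[g] for c in cells]) for g in range(n_genes)]
    imputed = [row[:] for row in cells]
    for i, row in enumerate(cells):
        for g, value in enumerate(row):
            if value != 0:
                continue
            nbr_vals = [cells[j][g] for j in neighbors[i]]
            expressed = [v for v in nbr_vals if v > 0]
            # Dropout if the neighborhood looks like the high-expression mode.
            if expressed and posterior_high(mean(nbr_vals), params[g]) > threshold:
                imputed[i][g] = mean(expressed)
    return imputed
```

On a toy matrix with two cell types, where one cell of the first type has a zero for a gene that its neighbors express at about 5, `impute(cells, k=2)` fills that entry with the neighbors' mean while leaving untouched the zeros for genes the other cell type does not express. Pure Python keeps the sketch dependency-free; a practical version would vectorize the neighbor search and the EM fit.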
Results: We compared SCC with two existing algorithms on simulated data and three real scRNA-seq datasets (Kolod, Pollen and Usoskin). The results show that SCC outperforms the existing tools: it significantly reduces the intra-class distance of cells and improves the clustering of cell subpopulations, which is important for future research on gene expression.
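The abstract does not define the intra-class distance precisely. A minimal sketch, assuming it is the mean pairwise Euclidean distance between cells of the same known cell type, averaged over cell types, is shown below; the function name `mean_intra_class_distance` is hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def mean_intra_class_distance(cells, labels):
    """Average over classes of the mean pairwise Euclidean distance between
    cells that share a label (assumed definition). Single-cell classes are skipped."""
    by_label = {}
    for cell, label in zip(cells, labels):
        by_label.setdefault(label, []).append(cell)
    per_class = [
        mean(math.dist(a, b) for a, b in combinations(members, 2))
        for members in by_label.values()
        if len(members) > 1
    ]
    if not per_class:
        raise ValueError("need at least one class with two or more cells")
    return mean(per_class)
```

Computing this value on the expression matrix before and after imputation, with the same cell-type labels, gives a simple check of whether cells of the same type have moved closer together; a lower value means tighter classes.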
Conclusions: SCC is an effective tool for reducing dropout noise in scRNA-seq data. The code is freely available at https://github.com/nwpuzhengyan/SCC.