Efficient CTCF knockdown and single cell RNA-seq
We knocked down CTCF in EL4 cells by short hairpin RNA (shRNA). Western blotting showed a dramatic decrease of the CTCF protein level in shCTCF #1 and shCTCF #2 compared to shRNA luciferase controls (shLuc) (Fig. 1A). Quantitative polymerase chain reaction (qPCR) revealed the mRNA levels in shCTCF #1 and shCTCF #2 had been reduced to 38% and 40% of that in shLuc, respectively (Fig. 1B). These results confirmed the efficient knockdown of CTCF expression in shCTCF#1 and shCTCF#2, with a significant reduction consistently at both RNA and protein level.
In order to investigate the changing landscape of cell-to-cell variation after CTCF knockdown, we successfully conducted single cell RNA-seq for shLuc#1, shLuc #2, shCTCF #1 and shCTCF#2 using 4 integrated fluidics circuits (IFCs) (Fig. 1C). We noticed the gene expression levels of pooled shLuc single cells were highly correlated with that of bulk data from our previous study [18] (r2=0.86; Additional file 1: Figure S1A). The gene expression of pooled single cells from shLuc #1 was also highly correlated with that of shLuc #2 (r2=0.87; Additional file 1: Figure S1B). In addition, the gene expression of pooled single cells repeats and bulk cell repeats in CTCF-KD cells were highly correlated (Additional file 1: Figure S1C-S1D).
Systematic differences between WT cells and CTCF-KD cells
A total of 95 cells, including 24 cells from shLuc #1, 24 cells from shCTCF #1, 23 cells from shLuc #2 and 24 cells from shCTCF #2, were kept for further analyses after quality control. We conducted principal component analysis (PCA) on 11,361 genes shared by those 95 cells (Fig. 2A). The coordination of cells from experiment1 and experiment2 on PCA projection is not significantly different (P=0.8; student’s t-test), indicating no obvious batch effect between experiment1 and experiment2. Further analysis showed that WT cells and CTCF-KD cells were distinguishable on PCA projection and concentrated in two different clusters on PC1 (Fig. 2B, 2C), implying a systematic difference of gene expression profiles between CTCF-KD and WT cells. We also noticed a correlation between CTCF expression level and its coordination on the PCA projection (using the first 10 PCs), among which CTCF expression level and PC2 exhibited the highest correlation (Additional file 2: Figure S2A) (r2=0.18, P = 0.22´10-6).
We further calculated the differential gene expression between WT and CTCF-KD cells using edgeR [26]. We identified 195 up-regulated and 107 down-regulated genes in CTCF-KD cells compared to WT cells (Additional file 2: Figure S2B). Heatmap of the most differentially expressed genes between WT cells and CTCF-KD cells exhibited a cellular heterogeneity within the same cell population (Fig. 2D). The most enriched gene categories in down-regulated genes include glycolytic processing, Prolyl 4-hydroxylase a subunit, iron-dependent dioxygenase and carbon metabolism (Additional file 2: Figure S2D). Whereas the most enriched gene categories in up-regulated genes include RNA binding, ribosome biogenesis, WD40 repeat domain and RNA processing (Additional file 2: Figure S2C), consistent to our recent study based on bulk data in some way [18].
CTCF knockdown changed the landscape of cell-to-cell variation
In order to distinguish true signals of cellular variation from technical noise, we calculated the expression noise of each gene (s2/m2) [27, 28]. The expression noises exhibited two distinct scaling properties: negative association with expression at low expression levels and no association at high expression levels (log2TPM>1) (Fig. 3A). We filtered out low expressed genes (log2TPM≤1) to reduce the impact of technological noise, resulting in 7,843 genes for further analysis (Fig. 3A). Coefficient of variation (CV) was calculated to measure cell-to-cell variation of each gene across the cell populations. The distribution of alterations of cell-to-cell variation pre-and post-CTCF knockdown followed a normal distribution (Additional file 3: Figure S3). We identified 602 cellular variation increased genes and 890 cellular variation decreased genes after CTCF knockdown by mean±SD (Fig. 3B).
GO analyses showed that variation-increased genes were significantly enriched in GO terms such as regulation of transcription, DNA binding, zinc finger proteins, covalent chromatin modification and transcription factor binding (Fig. 3C). In fact, almost all genes involved in DNA binding, zinc finger proteins and transcription factor binding are transcription factors. The significant enrichment of transcription factors in CTCF-KD-specific highly variable genes potentially indicates a high sensitivity of transcription factors to CTCF level. Dysregulation of a lot transcription factor potentially explains why knockdown of CTCF leads to a systematic change of gene expression. In contrast, variation-decreased genes were significantly enriched in housekeeping genes related GO terms such as rRNA processing, DNA repair, tRNA processing and RNA modification (Fig. 3D). The enrichment of housekeeping genes in WT-specific highly variable genes potentially indicates a higher cellular variation of cell activity in WT cells compared to CTCF-KD cells.
CTCF Knockdown simultaneously altered expression levels and cellular variations of its regulated genes
We identified 302 expression-changed genes and 1,490 cellular variation-changed genes pre-and-post CTCF knockdown. Next, we were interested to examine whether those cellular variation-changed genes were enriched in expression-changed genes. Venn diagram showed that 47 genes out of total 107 down-regulated genes exhibited increased cellular variation (Fig. 4), which were significantly over-represented (P=0.29´10-23, c2 test). The results indicate CTCF knockdown simultaneously reduced the expression level and increased the gene expression noise. Among those genes with decreased expression and increased cellular variation, EGR1 and JUNB played an important role in maintaining the cell type-specific gene regulation. For instance, EGR1 belongs to the EGR family of C2H2-type zinc-finger proteins, and encodes a nuclear protein that participates in transcriptional regulation.
Meanwhile, there were 96 genes out of the total 195 up-regulated genes exhibiting decreased cellular variation (Fig. 4), which were also significantly over-represented (P=0.48´10-23, c2 test). The 96 genes with decreased cellular variation and increased expression level were significantly enriched in poly(A) RNA binding, rRNA processing, WD40, purine nucleobase biosynthetic processing, rRNA methylation and RNA methyltransferase activity. It is obvious that those enriched GO terms were associated with basic cellular functions belonging to housekeeping genes. Taken together, our results clearly indicate that distortion of CTCF expression could simultaneously change the gene expression level and cell-to-cell variation of its regulated genes.
Furthermore, we identified CTCF binding sites using CTCF ChIP-seq data in WT EL4 cells from our previous study [18]. We identified each gene-associated CTCF by counting the CTCF binding sites within 20Kb of the transcriptional start site (TSS) for each gene. The numbers of gene-associated CTCF of variation-increased genes are significantly higher than that of variation-decreased genes (P= 0.0033; Wilcoxon test), and are significantly higher than that of variation-unchanged genes (P= 0.16´10-6; Wilcoxon test). The numbers of gene-associated CTCF of variation-decreased genes do not show significant differences from that of variation-unchanged genes (P= 0.5; Wilcoxon test). These results suggest that genes regulated by multiple CTCF binding sites tend to possess a higher cellular variation after CTCF knockdown.