Over the last few decades, advanced techniques such as flow cytometry have been used to identify CNS myeloid cell subtypes, but accurate identification remains difficult because absolutely specific markers are lacking and marker expression is unstable under different pathophysiological conditions [16]. Although scRNA-Seq is a promising new technology for solving this problem (Cembrowski, 2019), the programming-language-based analysis packages for scRNA-Seq data are not easy for ordinary researchers to use, whereas bioinformatics experts do not necessarily know the specific markers needed to identify CNS myeloid cell subtypes. Therefore, bridging the knowledge gap between ordinary researchers and bioinformatics experts is key to solving this problem.
In this report, we designed a simple Excel template that includes a panel of gene markers for myeloid cells, lymphocytes, common CNS cells, and proliferative cells. Once users have obtained the gene expression data of their cell clusters, they can name the clusters directly with this template. We emphasize that the template is mainly suited to determining the major categories of myeloid cells; researchers who need to further distinguish subtypes of certain cells must add the corresponding gene markers. For this reason, the template is open, and researchers can modify it or add new genes as needed. In selecting gene markers, we considered not only their relative specificity but also the overlap and commonality of expression among different cells. Accordingly, in the template we defined a positive gene marker as “P”, a negative one as “N”, and a marker that could be either positive or negative as “P/N” (Fig. 1 and Table S1). For example, Ptprc (the gene encoding CD45) is a common marker of myeloid cells and lymphocytes [34–36], so we used it to distinguish these cells from CNS non-myeloid cells (such as astrocytes, oligodendrocytes, and neurons). In theory, CD45, the protein encoded by Ptprc, is expressed by many leukocytes; however, while collecting gene markers and building the template, we found that Ptprc was not detected in every cell cluster, so we defined it as P/N. There are many similar examples besides Ptprc; see Fig. 1 and Table S1 for details. Even when a cell type has some relatively specific gene markers, we do not rely on one or a few markers to identify it. Instead, we evaluate a panel of gene markers comprehensively and then define the cell type.
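The “P”, “N”, and “P/N” logic described above can be expressed as a small scoring routine. The sketch below is hypothetical: the template itself is applied manually in Excel, and the scoring rule used here (fraction of “P” markers detected, count of “N” markers detected, “P/N” markers ignored) is an assumption for illustration, not the authors' procedure. The MNC calls are taken from the text; treating the listed MG markers as “P” and non-myeloid CNS cells as Ptprc “N” are further assumptions.

```python
from __future__ import annotations

from enum import Enum


class Call(Enum):
    """Expected expression call for a marker in the template."""
    P = "P"      # expected to be expressed
    N = "N"      # expected to be absent
    PN = "P/N"   # may or may not be detected; ignored when scoring


# Illustrative excerpt only (hypothetical layout): cell type -> {gene: call}.
# MNC calls follow the text; the MG "P" calls and the Astrocyte Ptprc "N"
# call are assumptions made for this example.
TEMPLATE: dict[str, dict[str, Call]] = {
    "MNC": {"Ptprc": Call.PN, "Ccr2": Call.PN, "Cd68": Call.P,
            "Ly6c2": Call.P, "Plac8": Call.P, "Lyz1": Call.P, "F13a1": Call.P},
    "MG": {"Ptprc": Call.PN, "Hexb": Call.P, "P2ry12": Call.P, "Tmem119": Call.P},
    "Astrocyte": {"Ptprc": Call.N},
}


def evaluate(cluster_genes: set[str], panel: dict[str, Call]) -> tuple[float, int]:
    """Return (fraction of P markers detected, number of N markers detected)."""
    p_markers = [g for g, c in panel.items() if c is Call.P]
    p_hits = sum(g in cluster_genes for g in p_markers)
    n_hits = sum(g in cluster_genes for g, c in panel.items() if c is Call.N)
    p_frac = p_hits / len(p_markers) if p_markers else 0.0
    return p_frac, n_hits


if __name__ == "__main__":
    cluster = {"Ptprc", "Cd68", "Ly6c2", "Plac8", "Lyz1", "F13a1"}
    for cell_type, panel in TEMPLATE.items():
        print(cell_type, evaluate(cluster, panel))
```

Because “P/N” markers are skipped, Ptprc neither supports nor penalizes an MNC or MG call, mirroring its P/N status, while a detected “N” marker counts against a cell type.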
This approach effectively distinguishes cell types with similar or overlapping gene expression and helps ensure accurate cluster identification. Each panel in the template contains 73 gene markers (excluding those for non-myeloid CNS cells) that can be used to distinguish myeloid cell subtypes and lymphocytes (Fig. 1 and Table S1). For example, MNC may express Ptprc (P/N), Cd14 (P/N), Itgam (P/N), Itgax (P/N), Csf3r (P/N), Adgre1 (P/N), Ly6c1 (P/N), S100a4 (P/N), Cd68 (P), Ly86 (P/N), Ctsb (P/N), Ccr2 (P/N), Ly6c2 (P), Plac8 (P), Pf4 (P/N), Lyz1 (P), Hmox1 (P/N), F13a1 (P), Lyst (P/N), Prtn3 (P/N), Elane (P/N), and Pilra (P/N). Although several of these molecules (Cd68, Ly6c2, Plac8, and Lyz1) are positive (P) in MNC, they are also expressed in other cells, so the template contains no absolutely specific marker for MNC. Nevertheless, its cell type can still be determined by comparative analysis; typical examples are clusters C8 and C11 in Table S4. Cell types that have their own specific gene markers are easy to identify by comparative analysis. Typical examples are Ms4a7, Lyve1, Cbr2, Mrc1, and Cd163 for MAC; Hexb, Olfml3, Sparc, Tgfbr1, P2ry12, and Tmem119 for MG; and Ltf, Ly6g, Mmp8, Camp, Ngp, Fcnb, Cebpe, Retnlg, S100a8, S100a9, Lcn2, G0s2, and Wfdc21 for NEUT. Owing to the limits of current knowledge and research, the template still has some shortcomings. For example, DC should be positive for H2-Ab1, H2-Eb1, H2-Aa, Cd74, and Cd209a, but these markers are also expressed in MAC and B cells; because B cells are not myeloid cells, this overlap can easily cause misclassification. We therefore also added B cell markers to the template to help distinguish B cells from DC.
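The comparative analysis described above can be sketched as follows, assuming each cluster is summarized by the set of genes it significantly upregulates. The MAC, MG, NEUT, and MHC-II gene lists come from the text; the B-cell genes (Cd79a, Cd79b, Cd19, Ms4a1), the function names, and the `min_hits` threshold are illustrative assumptions, not the template's own contents.

```python
from __future__ import annotations

# Relatively specific markers listed in the text.
SPECIFIC: dict[str, set[str]] = {
    "MAC": {"Ms4a7", "Lyve1", "Cbr2", "Mrc1", "Cd163"},
    "MG": {"Hexb", "Olfml3", "Sparc", "Tgfbr1", "P2ry12", "Tmem119"},
    "NEUT": {"Ltf", "Ly6g", "Mmp8", "Camp", "Ngp", "Fcnb", "Cebpe",
             "Retnlg", "S100a8", "S100a9", "Lcn2", "G0s2", "Wfdc21"},
}
# Genes expected in DC that are also expressed by MAC and B cells.
MHC_II = {"H2-Ab1", "H2-Eb1", "H2-Aa", "Cd74", "Cd209a"}
# Assumption: a generic B-cell marker set, not the template's own list.
B_CELL = {"Cd79a", "Cd79b", "Cd19", "Ms4a1"}


def rank_cell_types(cluster_genes: set[str]) -> list[tuple[str, float]]:
    """Rank cell types by the fraction of their specific markers present."""
    scores = {t: len(m & cluster_genes) / len(m) for t, m in SPECIFIC.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def dc_or_b(cluster_genes: set[str], min_hits: int = 3) -> str | None:
    """Split an MHC-II-high cluster into DC vs B cell using B-cell markers.

    Returns None when fewer than `min_hits` MHC-II genes are present. MAC
    markers should be checked separately (e.g. with rank_cell_types),
    since MAC also express MHC-II genes.
    """
    if len(MHC_II & cluster_genes) < min_hits:
        return None
    return "B" if B_CELL & cluster_genes else "DC"
```

Scoring by fraction rather than raw count keeps cell types with long marker lists (such as NEUT) from dominating those with short lists (such as MAC).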
To verify the accuracy of this Excel template, we used 83 cell clusters from several recently reported single-cell datasets (Table 1). Compared with the published annotations, the overall consistency rate was 93.98%. Bowker’s test showed no statistically significant difference between the two sets of annotations (P > 0.05), and the Kappa coefficient (symmetric measures) was 0.642 (P < 0.01). These results indicate that our method is generally consistent with the published annotations. Next, we analyze the possible causes of the inconsistencies.
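The agreement statistics above can be computed from a list of paired labels (published vs. template) with a short script. This is a stdlib-only sketch with an assumed function name and input format: it returns the observed agreement rate, Cohen's kappa, and Bowker's symmetry statistic with its degrees of freedom (counted, by a common convention, over off-diagonal pairs with at least one observation). The full 83-cluster contingency table is not given here, so the reported Kappa cannot be reproduced from the text; as an arithmetic check, however, 93.98% corresponds to 78 of 83 concordant clusters, which matches the five inconsistent clusters discussed next.

```python
from __future__ import annotations


def agreement_stats(pairs: list[tuple[str, str]]) -> tuple[float, float, float, int]:
    """Return (agreement rate, Cohen's kappa, Bowker statistic, Bowker df).

    `pairs` holds one (published_label, template_label) tuple per cluster.
    """
    labels = sorted({label for pair in pairs for label in pair})
    idx = {label: i for i, label in enumerate(labels)}
    k, n = len(labels), len(pairs)

    # k x k contingency table: rows = published, columns = template.
    table = [[0] * k for _ in range(k)]
    for published, template in pairs:
        table[idx[published]][idx[template]] += 1

    # Observed agreement and chance agreement from the marginals.
    p_o = sum(table[i][i] for i in range(k)) / n
    rows = [sum(table[i]) for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    # Bowker's test of symmetry over off-diagonal cell pairs with data.
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total:
                stat += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return p_o, kappa, stat, df


if __name__ == "__main__":
    print(f"{78 / 83:.2%}")  # prints 93.98%
```

The P value for Bowker's statistic can then be taken from a chi-square distribution with `df` degrees of freedom, for example `scipy.stats.chi2.sf(stat, df)` if SciPy is available.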
Compared with the report of Ximerakis et al. [31], only one cluster was inconsistent (Table 3): our results showed that a few NEUT and DC were mixed into their MNC cluster. A possible reason is that they used Plac8 as a specific marker of MNC, whereas Plac8 is also expressed in NEUT and DC [10]. Compared with the cell-type identification in the adult brain by Han et al. [10], cluster 4 was inconsistent (Table 4). The reason may be that the reported cluster 4 contained a few MG, because the typical microglia markers (Hexb, Olfml3, Sparc, Tgfbr1, P2ry12, and Tmem119) can be found for this cluster in Table S3. Compared with the report of Sankowski et al. [32], clusters 6 and 9 were inconsistent (Table 5). Both clusters were identified as CAMs in that report; however, the typical MAC genes (Mrc1, Cd163, Lyve1, Pf4, Ms4a7, Stab1, and Cbr2) were not elevated in either cluster. In contrast, the MG-specific markers Hexb, Olfml3, and Sparc were significantly elevated in cluster 6, whereas the other genes elevated in cluster 9 were outside the scope of our evaluation. Compared with the cell-type identification in peripheral blood and bone marrow by Han et al. [10], the results were completely consistent, except that cluster 18 of peripheral blood contained a few NEUT. These results indicate that our Excel template is also effective for analyzing non-CNS myeloid cells.
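The reasoning used above to explain each inconsistency (checking whether a cluster's upregulated genes support its published label, and whether they also contain markers of another cell type) can be sketched as an audit function. The MAC and MG gene lists follow the text; the function name, the set-based input, and the `min_hits` threshold of 3 are hypothetical choices for illustration. For instance, a cluster labelled MAC whose upregulated genes include Hexb, Olfml3, and Sparc but no MAC genes, the pattern described for cluster 6, returns zero support and flags MG.

```python
from __future__ import annotations

# Typical MAC genes and MG-specific markers as listed in the text.
MARKERS: dict[str, set[str]] = {
    "MAC": {"Mrc1", "Cd163", "Lyve1", "Pf4", "Ms4a7", "Stab1", "Cbr2"},
    "MG": {"Hexb", "Olfml3", "Sparc", "Tgfbr1", "P2ry12", "Tmem119"},
}


def audit_cluster(assigned: str, cluster_genes: set[str],
                  markers: dict[str, set[str]] = MARKERS,
                  min_hits: int = 3) -> tuple[int, list[str]]:
    """Return (assigned-type markers found, other types with >= min_hits markers).

    Low support for the assigned type, or any competing type, suggests that
    the label is wrong or that the cluster is a mixture of cell types.
    """
    support = len(markers[assigned] & cluster_genes)
    competitors = sorted(
        t for t, m in markers.items()
        if t != assigned and len(m & cluster_genes) >= min_hits
    )
    return support, competitors
```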
From the above analysis, we infer that appropriate gene markers and good clustering of the scRNA-Seq data are key factors for accurate cell-type definition. The importance of cell clustering is illustrated by the following example. When we analyzed another dataset (Table S2 of Mimouna et al.) [33], neither the reported annotations nor our results were satisfactory. On closer analysis, we found that the clustering method used in that study differed from those of the other studies mentioned above: it used Louvain graph-based community clustering, which may explain why the clustering was not ideal. Although our Excel template could still be used to identify cell types from the authors’ data, the cell types in each of the nine clusters were mixed (Table 6). Therefore, data used with this template should be processed through the standard scRNA-Seq analysis workflow, including quality control, standardization, data correction, feature selection, and dimensionality reduction, after which the cells are divided into clusters according to the similarity of their gene expression.
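To make some of the preprocessing steps listed above concrete, the sketch below implements simplified versions of three of them in plain Python on a small cells × genes count matrix: quality control by a minimum number of detected genes, library-size normalization followed by log(1 + x), and feature selection by per-gene variance. It is an illustration only, with hypothetical function names, thresholds, and toy data; real analyses would use an established toolkit (for example Scanpy's `sc.pp.filter_cells`, `sc.pp.normalize_total`, `sc.pp.log1p`, and `sc.pp.highly_variable_genes`), followed by data correction, dimensionality reduction, and clustering, which are omitted here.

```python
from __future__ import annotations

from math import log1p
from statistics import pvariance


def qc_filter(matrix: list[list[float]], min_genes: int) -> list[list[float]]:
    """Keep cells (rows) in which at least `min_genes` genes are detected."""
    return [row for row in matrix if sum(1 for x in row if x > 0) >= min_genes]


def normalize_log1p(matrix: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Scale each cell to `target_sum` total counts, then apply log(1 + x).

    Assumes every cell has a non-zero total, which qc_filter with
    min_genes >= 1 guarantees.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([log1p(x * target_sum / total) for x in row])
    return normalized


def top_variable_genes(matrix: list[list[float]], genes: list[str], n_top: int) -> list[str]:
    """Return the `n_top` gene names with the highest variance across cells."""
    variances = [pvariance(column) for column in zip(*matrix)]
    order = sorted(range(len(genes)), key=lambda j: variances[j], reverse=True)
    return [genes[j] for j in order[:n_top]]


if __name__ == "__main__":
    genes = ["GeneA", "GeneB", "GeneC"]  # hypothetical toy data
    counts = [[0, 0, 5], [1, 2, 3], [4, 4, 0]]
    kept = qc_filter(counts, min_genes=2)
    print(top_variable_genes(normalize_log1p(kept), genes, n_top=2))
```

Ranking genes by raw variance after log normalization is a simplification; established toolkits typically use mean-adjusted dispersion so that highly expressed genes are not favored by default.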