To evaluate the performance of the proposed model four complementary analyses were carried out: 1) several unknown cell-type scenarios were simulated to check the ability of the model to properly cluster the new cell types, 2) the DNN were trained using a small database of single cells and annotate a bigger collection of unseen cells (the so-called retrieval analysis), 3) the unsupervised visualization capabilities of the representation learned by the model was demonstrated, and finally 4) the biological interpretability of the pathway-primed network was assessed. For the first and second steps, the validation schemes previously discussed in Materials and Methods were followed, which have allowed to compare this proposal in the same terms as in previous proposals [20], using the same data (a collection of mouse single cell datasets) and splits. Table 1 describes the architectures used here. See Supplementary Table 1, Additional File 1 for a complete list of the datasets, their public accession codes and how to combine them. However, before the performance of our model was benchmarked in unsupervised tasks, a more classical supervised performance experiment was carried out.
Table 1
Details of input and hidden layers, as well as parameters for each architecture used
Dataset | Architecture | Nodes Layer 1 | Nodes layer 2 | Effective parameters (million) |
| Dense | 100 | - | 0.95M |
| Dense + pathway | 100 + 92 | - | 0.95M |
| Dense + PPI | 100 + 348 | - | 0.96M |
Mouse | Dense + PPI and GRN | 100 + 696 | - | 1.01M |
| Dense + PPI and GRN | 100 + 696 | 100 | 1.08M |
| Pathway | 92 | - | 0.01M |
| Pathway | 92 | 100 | 0.02M |
| Dense | 100 | - | 1.80M |
Human | Pathway | 93 | - | 0.01M |
| Pathway | 93 | 100 | 0.02M |
Architecture, parameters and hyperparameter selection
Several activation functions (tanh, relu and sigmoid) and prepossessing steps (normalization, log-transform and [-1, 1]-scaling) for the DNN were tested. The normalization and tanh were the better options for the mouse experiment, which is congruent with previous findings [20]. Regarding the preprocessing steps, the combination of relu with logarithmic pre-processing worked better than other approaches in the human melanoma dataset. Detailed information about the number of parameters and the dimensions of each architecture is shown in Supplementary Table 2, Additional File 1 and Table 1, respectively.
Clustering and supervised performance
To test the performance of the learned representation of the proposed DNN, it was compared against similar dimensionality-reduction-based clustering approaches. The analysis consists of the following multi-step simulation: 1) combine a set of different mouse single cell experiments into a single dataset (the so-called learning set), 2) keep only those genes that appear in all the included experiments, 3) conduct a supervised analysis and 4) simulate the clustering of unknown cells.
Table 2 provides a comparison of the performance of different models for supervised tasks following a 100-times repeated stratified holdout cross-validation validation schema with a test size of 0.30. Under this scheme, the model learns how to predict the cell types (the output) from the cell’s gene expression profiles. The class distribution can be consulted in Supplementary Table 3, Additional File 1: a clearly unbalanced set. For each realization, the accuracy, the imbalanced-aware balanced accuracy, precision, recall and F1 scores were computed. Despite there were 16 classes to predict with several underrepresented cell-types, the results were very good (and equivalent) for all the models, with mean F1 scores above 0.8. The performance of the classification was similar the supervised performance reported [20]. Figure 1 shows the global metrics distribution for each design, whereas, Supplementary Fig. 1, Additional File 1 shows the per-class metric distributions, where it is clear that the lower sampled classes clearly cause a decline of the all-class scores.
Table 2
Average performance (F1, accuracy, precision, recall) of the different models in a supervised task scenario. Although our pathway-primed models are nearly ten times smaller (sparse), the performance is very close to the PPI-based NN. We report the mean for 100 iterations of train test splits.
| | | | F1 | PRECISION | RECALL |
Architecture | Number of nodes (2nd hidden layer) | Accuracy | Balanced accuracy | Macro | Micro | Weighted | Macro | Micro | Weighted | Macro | Micro | Weighted |
Dense | - | 0.825 | 0.788 | 0.748 | 0.825 | 0.802 | 0.769 | 0.825 | 0.844 | 0.788 | 0.825 | 0.825 |
Dense with pathways | - | 0.810 | 0.781 | 0.743 | 0.810 | 0.783 | 0.763 | 0.810 | 0.823 | 0.781 | 0.810 | 0.810 |
Dense with PPI | - | 0.802 | 0.770 | 0.730 | 0.802 | 0.774 | 0.753 | 0.802 | 0.817 | 0.770 | 0.802 | 0.802 |
Dense with PPI/GRN | - | 0.800 | 0.777 | 0.735 | 0.800 | 0.771 | 0.757 | 0.800 | 0.815 | 0.777 | 0.800 | 0.800 |
Signaling pathways | - | 0.813 | 0.781 | 0.743 | 0.813 | 0.790 | 0.764 | 0.813 | 0.834 | 0.781 | 0.813 | 0.813 |
100 | 0.766 | 0.724 | 0.673 | 0.766 | 0.728 | 0.690 | 0.766 | 0.762 | 0.724 | 0.766 | 0.766 |
However, the previous validation procedure does not properly address the key questions in scRNA-seq, such as how to identify unseen cell types or finding the most similar known cells to a new set of previously uncharacterized cells. To overcome the limited conclusions of a conventional supervised testing approach, the validation scheme described in Materials and Methods was followed, where each network is evaluated for randomly selected left out cells (LPGO) for P equal to 2, 4, 6, or 8 cell types of the 16 cell types (repeated 20 times for each P). Therefore, each selection of P leads to a clear division of train and test splits, where the knowledge-primed models can be safely trained under a supervised modality using the train split and then, an independent encoding can be computed (see Materials and Methods) for the test set. Thus, by definition, none of the P cell types of the test have been seen by the model before, which ensures a fair comparison between the models in terms of the clustering performance using the K-Means algorithm (see Materials and Methods). For each realization, the following scores (higher is better) were computed: homogeneity, completeness, V-measure, adjusted random index (ARI), adjusted mutual information (AMI) and Fowlkes-Mallows. In addition, each realization was summarized by means of the average of all the previous scores.
Table 3 provides a summary (the mean for each metric) of the results for the LPGO (P = 4) analysis. For a complete report (all P) refer to Supplementary Table 4, Additional file 1 and Fig. 2, which show the distribution of the different metrics across the different realizations. In all cases the pathway-based NN offers a comparable performance to the less interpretable PPI- and GRN-based models and the baseline dense NN .
Table 3
Unknown cell-type clustering performance of the different models analyzed for the LPGO experiments (P = 4). Although our pathway-based models are nearly ten times smaller (sparse), the performance is very close to the PPI-based NN. The mean of 20 splits was reported
Architecture | Number of nodes (2nd hidden layer) | Homogeneity | Completeness | Vmeasure | ARI | AMI | Fowlkes-Mallows | Average |
Dense | - | 0.801 | 0.799 | 0.798 | 0.725 | 0.786 | 0.814 | 0.787 |
Dense with pathways | - | 0.804 | 0.797 | 0.798 | 0.718 | 0.786 | 0.811 | 0.786 |
Dense with PPI | - | 0.811 | 0.804 | 0.805 | 0.728 | 0.794 | 0.817 | 0.793 |
Dense with PPI and GRN | - | 0.820 | 0.808 | 0.812 | 0.746 | 0.802 | 0.827 | 0.802 |
Signaling pathways | - | 0.797 | 0.788 | 0.790 | 0.716 | 0.778 | 0.809 | 0.780 |
Signaling pathways | 100 | 0.775 | 0.803 | 0.786 | 0.729 | 0.774 | 0.820 | 0.781 |
Retrieval analysis
Cell type assignment or annotation is one of the most important tasks in single cell expression analyses since each study usually profiles several different types of cells [26]. However, cell-type retrieval is also a major challenge due mainly to marker absence, overlap and noise-based problems when computing gene expression. Thus, the cells need to be clustered and compared with reference pre-annotated databases. Although groups of cells can be successfully grouped using unsupervised methods, it might not be easy to find patterns while clustering, leading to cell groups that are difficult to interpret from a biological point of view. In this work, an unsupervised biologically meaningful encoding was extracted of new data by leveraging the representation and comprehension capabilities of supervised neural networks with the interpretability of pathway-based gene clusters, which is in line with the recent literature where it is observed that supervised cell type assignment/annotation of newly-generated data using annotated labels has become more desirable than unsupervised approaches [22, 27].
In this work, two datasets that simulate the problems previously outlined (see Materials and Methods) were constructed: for the mouse dataset, the so-called learning set is used for training and optimizing the different learning methods (i.e. the pre-annotated database), whereas the retrieval set is comprised of new cells extracted from different experiments. For each cell in the latter set its encoded representation is computed using a model fitted with the former set and the most suitable cell type is retrieved by using the top k = 100 matches. Table 4 shows the performance in terms of the Mean Average Precision (MAP) for each cell type and method. In contrast with the previous experiments, the knowledge-based networks need the addition of dense nodes in order to achieve a performance comparable to PCA. However, the biological priors are still relevant since the average performance of knowledge-based networks adding dense nodes is higher than the baseline dense network. Note that the validation scheme provides a fair comparison between supervised and unsupervised models since the cells of the retrieval set are encoded using pre-fitted models that could never use the ground truth cell types. Also, Supplementary Fig. 2, Additional File 1 shows a visualization of the encoding representation of the retrieval set.
Table 4
Average retrieval performance across the different cell type
Architecture | Number of nodes (2nd layer) | HSC | 4cell | ICM | Spleen | 8cell | Neuron | Zygote | 2cell | ESC | Mean |
PCA 100 (with full gene space) | - | 0.181 | 0.669 | 0.026 | 0.975 | 0.176 | 0.627 | 0.462 | 0.675 | 0.106 | 0.433 |
PCA 100 (with signaling gene space) | - | 0.179 | 0.561 | 0.128 | 0.989 | 0.191 | 0.624 | 0.455 | 0.676 | 0.205 | 0.445 |
Dense | - | 0.243 | 0.643 | 0.000 | 0.734 | 0.147 | 0.404 | 0.569 | 0.514 | 0.148 | 0.378 |
Dense with signaling pathways | - | 0.259 | 0.648 | 0.000 | 0.849 | 0.236 | 0.486 | 0.509 | 0.656 | 0.130 | 0.419 |
Dense with PPI | - | 0.196 | 0.638 | 0.041 | 0.927 | 0.212 | 0.550 | 0.575 | 0.686 | 0.179 | 0.445 |
Dense with PPI/GRN | - | 0.194 | 0.645 | 0.007 | 0.930 | 0.294 | 0.542 | 0.600 | 0.711 | 0.190 | 0.457 |
Dense with PPI/GRN | 100 | 0.068 | 0.771 | 0.182 | 0.956 | 0.849 | 0.561 | 0.415 | 0.553 | 0.710 | 0.563 |
Signaling pathways (+) | - | 0.163 | 0.438 | 0.011 | 0.619 | 0.179 | 0.475 | 0.344 | 0.465 | 0.128 | 0.314 |
Signaling pathways (+) | 100 | 0.149 | 0.307 | 0.050 | 0.402 | 0.198 | 0.352 | 0.592 | 0.368 | 0.137 | 0.284 |
Signaling pathways (parameter tuning) (+) | - | 0.107 | 0.768 | 0.049 | 0.960 | 0.625 | 0.549 | 0.471 | 0.627 | 0.110 | 0.474 |
Signaling pathways (parameter tuning) (+) | Size* | 0.155 | 0.803 | 0.117 | 0.955 | 0.568 | 0.550 | 0.497 | 0.623 | 0.150 | 0.491 |
* The size of second layer (size) is defining after tuning in hyperparameter tuning networks |
Furthermore, the optimized versions of the pathway-based models that do not contain dense nodes (see Materials and Methods) are on par with PCA and DNN designs that use dense nodes. A remarkable achievement since the non-dense architectures are sparser (~ 50 times) than the networks that depend on the addition of dense nodes for the first hidden layer. In addition, there are slight increases in the average performance when using deeper optimized models: from 0.474 to 0.491 in the pathway-based network (see Table 4). It is important to note that best scoring networks use a biological layer free of dense nodes, which makes them easier to interpret. Note that this kind of network designs are marked with a “+” sign in the tables.
Encoding visualization and functional analysis in melanoma dataset
One of the key advantages of the method presented is the ability to use the model for clustering analysis while retaining a sense of the underlying biological meaning of the learned weights. The biological interpretation can be inferred from the pathway activation scores of the first hidden layer, whereas the clustering is performed by computing the activation values of the last hidden layer, which can be used for data visualization by coupling it with a 2D reduction method (t-SNE in this case).
To check the visualization and biological intelligibility capabilities of the proposed pathway-based DNN, a recent human melanoma dataset [28] comprising 33 human melanoma tumors from 31 patients, with more than 17.000 genes and 2.761 single cell expressions, in which 5 different cell types were profiled, was analyzed. To fairly evaluate the proposed model, it was tested using the data splits (labelled as training and testing) defined in the original publication [28]. The model rendered an excellent performance (see Table 5), with F1-scores above 0.9, comparable to previously reported metrics [20]. In addition, the clusters found by the model in the testing set can be visualized in Fig. 3 for either the one- or two-layer designs. In all cases, the training set was used for fitting any given model, whereas the testing set has been used for checking the performance (supervised, visualization and interpretability). The cell-type distribution of both splits can be consulted in Supplementary Table 5, Additional File 1.
Table 5
Proposed network performance with log normalization
| | | | F1 | PRECISION | RECALL |
Architecture | Number of nodes 2nd layer | Accuracy | Balanced accuracy | Macro | Micro | Weighted | Macro | Micro | Weighted | Macro | Micro | Weighted |
Dense | - | 0.938 | 0.839 | 0.859 | 0.938 | 0.934 | 0.923 | 0.938 | 0.938 | 0.839 | 0.938 | 0.938 |
Pathways | - | 0.936 | 0.844 | 0.861 | 0.936 | 0.933 | 0.922 | 0.936 | 0.938 | 0.844 | 0.936 | 0.936 |
Pathways | 100 | 0.930 | 0.834 | 0.847 | 0.930 | 0.926 | 0.901 | 0.930 | 0.932 | 0.834 | 0.930 | 0.930 |
A functional analysis using the 10 highest-weighting groups (hidden nodes) for each cell type was performed. Since the nodes represent pathways, the functions of the pathways represented by those nodes were assessed. Supplementary Table 6, Additional File 1 summarizes the top pathways related to each cell type. The learned NN associates the 5 cell types with some very relevant pathways for the specific functions that are performed by these cell types.
For example, B cells mediate the production of antigen-specific immunoglobulin (Ig) against pathogens. The most important pathways for B cell functioning would be those engaged in cell-to-cell communication, proliferation, protein expression, and secretion. The pathways that appear among the 10 most relevant only for B-cells are: Aldosterone-regulated sodium reabsorption (hsa04960), Pancreatic secretion (hsa04972), Complement and coagulation cascades (hsa04610) and Ovarian steroidogenesis (hsa04913). Complement and coagulation cascades (hsa04610) consist of a nonspecific defense mechanism against pathogens. Although the complement system works in the innate immunity, complement effectors are engaged with humoral immunity at multiple stages of B-cell differentiation and can influence B-cell biology on several levels through the complement receptors that they express [29]. The rest are pathways mainly involved in the secretion of substances and ion flow through the cellular membranes. It is expected in B cells that modules engaged in secretion processes are active, given that they expand their secretory organelles in their differentiation process [30].
Macrophages are innate immunity specialised cells that detect, phagocyte and destroy pathogens and debris. They can interact with the adaptive immune system by presenting antigens to T cells, and are secretory active, since they release cytokines and other substances that modulate the activities of other cell types. Among the most relevant pathways for macrophages alone, we can find PI3K-Akt signaling pathway (hsa04151) and Osteoclast differentiation (hsa04380). In the first one converge inflammatory and metabolic signals that are very relevant for the regulation of macrophage responses modulating their activation phenotype [31, 32]. The second one is relevant because both osteoclasts and macrophages derive from the monocyte-macrophage lineage, and monocytes use receptor/ligand systems that share signalling mechanisms with those used by immune cells [33].
Furthermore, the GO analysis of the active genes in the top 10 pathways in NK cell types reported the following enriched GO Terms related to immunity response: GO:0002228 (natural killer cell-mediated immunity), GO:0002443 (leukocyte-mediated immunity) and GO:0002449 (lymphocyte-mediated immunity).
The GO analysis of the active genes in the top 10 pathways in TCD8 and TCD4 cell types reported also enriched GO Terms related to T-cell proliferation and cytokine production: GO:0042098 (T cell proliferation), GO:0050852 T cell receptor signalling pathway) and GO:0042110 (T cell activation).
There are also several pathways that appear among the 10 highest-weighting nodes for 4/5 of the cell types. In particular, it is worth mentioning Serotonergic synapse (hsa04726). Although we are not working with nervous system cells, immunological and nervous systems share molecular and functional properties, such as cytokine networks and cell surface receptors [34, 35]. Synapses are interfaces between cells that transfer information from one cell to another, and in the immune system we can find somewhat similar processes: immunological synapses and kinapses [36].
Another pathway that is relevant throughout cell types is Taste transduction (hsa04742). The role of taste receptors in the immune response has been widely discussed and their expression has been reported in peripheral lymphocytes [37], NK cells [38], and macrophages [39].
The aim was to shed some light on how the gene expression of the genes contained in these pathways contribute to the algorithm being able to discern these cell types, but, overall, among the top relevant pathways, many are related to the molecular mechanisms expected most active in immune cells, or contain mechanisms common to the ones expected in their physiological functioning.