Differential Network Analysis Reveals Regulatory Patterns in Neural Stem Cell Fate Decision

Deciphering regulatory patterns of neural stem cell (NSC) differentiation with multiple stages is essential to understand NSC differentiation mechanisms. Recent single-cell transcriptome datasets became available at individual differentiation. However, a systematic and integrative analysis of multiple datasets at multiple temporal stages of NSC differentiation is lacking. In this study, we propose a new method integrating prior information to construct three gene regulatory networks at pair-wise stages of transcriptome and apply this method to investigate five NSC differentiation paths on four different single-cell transcriptome datasets. By constructing gene regulatory networks for each path, we delineate their regulatory patterns via differential topology and network diffusion analyses. We find 12 common differentially expressed genes among the five NSC differentiation paths, with one common regulatory pattern (Gsk3b_App_Cdk5) shared by all paths. The identified regulatory pattern, partly supported by previous experimental evidence, is essential to all differentiation paths, but it plays a different role in each path when regulating other genes. Together, our integrative analysis provides both common and specific regulatory mechanisms for each of the five NSC differentiation paths.


Introduction
Neural stem cell (NSC) differentiation is a complex biological process with many unresolved mysteries [1,2]. Identifying the key regulators on NSC fate decision have the potential of treating neurological diseases [3]. For the past ten years, researchers have developed multiple biological methods [4,5] and computational methods [6][7][8] to explore the key regulators in the complex differentiation systems. However, differentiation is thought to require one or more discrete transitions from one intermediate state to another, each of which is determined by a set of genes that interact in a complex network instead of a single perturbed gene [9]. Although these approaches have identified the key participants for lineage programming of each differentiation stage, the regulatory patterns (the set of genes that drive differentiation) are not yet fully understood and determined.
The bulk RNA-seq technology [10] was developed to describe the average level of gene expression in cell populations. Still, it is challenging to analyze heterogeneous systems such as brain development and cell differentiation [11]. Recently, the gradual rise of single-cell RNA-seq technology, which measures gene expression distribution at the single-cell level, allows for the study of new biological problems [12]. Based on single-cell RNA-seq profiles, researchers have explored the matters of NSC differentiation more comprehensively. For instance, multiple clusters of covarying genes enriched in differentiation were identified by pseudotime trajectory analysis based on unsupervised clustering [8]. Thirty-four key regulatory genes related to neural differentiation were predicted by a four-way stochastic gradient-boosting classification model [13]. New neuronal cell subtypes were identified based on hierarchical clustering and principal components analysis [14].
The researches mentioned above at a single-cell level have furthered our understanding of NSC differentiation. However, they focused on a single path or dataset of neural stem cell differentiation, and there is no regulatory pattern that drives multi-path differentiation has ever been found. Recently, it is surveyed that the same cell type family may share common regulatory programming driving differentiation, while the expression of some effector genes is lost or gained anew to affect cellular phenotypes [15]. Likewise, the state-of-the-art study of pan-cancer suggests that there may exist common and specific patterns in different cancers, which means that different cancers may be cured by the same pan-cancer genes [16]. However, there is still no clear understanding of such mechanisms in the NSC differentiation system, so that identifying such regulatory patterns will open up a new perspective of NSC differentiation.
The traditional biological analysis (e.g., differential expression analysis) is hard to capture regulatory patterns at a system level [17]. Unlike differential expression analysis, network analysis, especially differential network analysis, provides critical novel biological insights by identifying essential genes or modules implicated in complex life processes from both system biology and bioinformatics perspectives [18,19]. Many researchers have contributed a lot of new methods in the field of network construction. For example, BC3NET is an ensemble network construction method based on a Bayesian model with noninformative priors [20]; GRNBoost2 is an efficient algorithm for regulatory network inference using gradient boosting [21]; The nlnet algorithm combines local false discovery rate reasoning to recover hidden nonlinear modules in networks [22]. All the above methods try to construct biological networks by single omics data from different views. However, they ignore the uncertainty and bias that remains in single omics [23], causing the structure of constructed networks inconsistent from biological networks.
In this study, we propose a new regulatory network construction method based on multi-omics integration (RNCMI) at pair-wise cell stages. RNCMI serves the PPI network as a background network, adopting the edge deletion strategy to tailor cell-type-specific regulatory networks. We also apply RNCMI with other computational and biological methods into investigating five NSC differentiation paths on four single-cell RNA-seq datasets to explore regulatory patterns (RP) in neural stem cell fate decisions. First of all, differential expression analysis and biological function enrichment analysis were performed to identify the differentially expressed genes (DEGs). Second, cell-type-specific networks and differential networks integrated with single-cell transcriptomics and proteomics datasets were constructed based on the common identified DEGs of each cell differentiation path. And then, all differential networks were topologically overlapped to explore the regulatory pattern. Next, network analyses, including differential network analyses and network diffusion analyses, were performed on the constructed networks to assess the importance of the RP. Finally, literature and pathway enrichment analysis validated the RP.

Data Preparing and Preprocessing
NSC lineage differentiation tree contains various differentiation paths (as is shown in Fig. 1). In the study, five paths of neural stem cell differentiation were obtained, excluding the path from NPC to OPC. This is because two biologically comparable datasets are considered when they meet the requirement that the same experimental conditions produce them. However, the single-cell transcriptomics datasets of neural stem cells are still not abundant, and we have not found an available single-cell transcriptomics dataset of the path from NPC to OPC yet. For convenience, each path's name is given by its trajectory from one cell type to another, described as follows: NSC_NPC, NPC_ AST, NPC_NEU, OPC_OLI, OPC_MO. Then gene filtering was performed for each path, one gene was selected if it was expressed by five cells over ten counts in each path, and two cells over four counts in each type of cells, the parameters used here is similar to the previous study [13].

Differential Expression Analysis
For each path, we performed differential expression analysis based on statistical methods. The DEGs in each path were detected by the R package 'edgeR' [27], with p-value < 0.05 and | | log 2 fold change (FC) | | > 0.5.

Network Construction by Integrating Single-cell Transcriptome and Proteome
Network analysis requires a robust network skeleton. The choice of the network skeleton is important, which would allow the network-based approaches to achieve higher precision [28]. Therefore, network construction here integrated single-cell transcriptome and proteome, based on both the correlation of each pair of gene expression and confidence score in a background network. Figure 2 showed the workflow of network construction. Each path will produce three networks, containing two-state networks and one differential network. Firstly, for the path from one cell state to another, the identified DEGs are used to build a background network by STRING database (http://strin g-db. org). For example, suppose p DEGs are identified by differential expression analysis, then a background network with p nodes is described by an adjacency matrix A p×p , such that A i,j > 0, i, j = 1, … , p , if protein i and protein j are functionally associated in STRING database. Next, the absolute value of the spearman correlation coefficient (SCC) of expression is used to estimate the strength of connections between adjacent genes and edges A i,j for which SCC(i, j) < 0.1 are discarded (see Additional File 1 for more detail about the parameter selection). Following this procedure, we construct a pair of state networks based on the correlation of gene expression and the topological structure of the protein-protein interaction network, described as A 1 l×l , A 2 r×r , in which l, r ≤ p. Finally, the differential network was produced by the differential SCCs across two state networks; We removed edges with absolute differential SCCs lower than 0.1.

Comparison Strategy for Network Construction Methods
RNCMI is compared with three state-of-the-art methods (BC3NET [20], GRNBoost2 [21] and nlnet [22]) on three single-cell transcriptomics datasets (GSE65525 [29], GSE81682 [30] and PRJNA324289 [13]) by the assessment approach used in [31]. The assessment approach computes and ranks each node in the constructed network by degree centrality (local) and betweenness centrality (global), separately, and adopts the overlaps of top-k nodes of different methods with the golden gene list (GO0045595: regulation of cell differentiation in the Gene Ontology Resource [32]) as assessment criteria. In our experiments, k ranges from 1 to 200.

Network Diffusion Analysis for Each Differential Network
Differential topology analysis fails to capture some critical genes with subtle topological differences in networks, such as bridge genes. To investigate the direct and indirect influence of such genes on the network, we employ the MC method [33], one of the diffusion network analysis methods based on perturbation, which enables us to quantify the potency of label propagation(gene signaling). The MC method uses a network topology structure and a list of initial node scores as input. After a perturbation test, a list of node influence scores is calculated. In this study, the topology of the differential network and the average scores of differential topology analysis are used to perform diffusion network analysis. This method is implemented by the R package 'dif-fuStats' [34], with all default parameters.

Differential Topology Analysis between Two State Networks
Differential topology analysis focuses on the topological differences between two state networks. The differential network is an object created from the differential SCCs across a pair of state networks (two biologically comparable conditions), which needs to be further calculated to reflect on the importance of nodes by some scoring methods (e.g., the network diffusion method used in our study). The differential topology is a kind of approach to quantify the differences of topology on the same nodes between a pair of Fig. 2 The workflow of network construction. Two state networks are constructed by the correlation of gene expression and the topology of the protein-protein interaction network. A differential network is constructed by the differential correlation of the two state networks state networks, and it can be used to score the importance of nodes. The greater the differences between topologies, the more important the nodes. However, different differential topology analysis methods rely on different network topology structures, which may not comprehensively balance the importance of genes between different biological states. In this study, five differential topology analysis methods were employed for each path, including differential degree centrality (DDC) [35], differential eigenvector centrality (DEC) [19], differential PageRank centrality (DPC) [36], differential betweenness centrality (DBC) and DiffRank [37], among which there is one local measurement method, three global measurement methods, and one hybrid (local and global) measurement method. DDC takes the normalized differences between a node's degree in two networks. DEC/DPC/DBC takes the different global centrality measurement methods to find changes between two networks. DiffRank linearly combines both differential connectivity and DBC into a single score to order the importance of genes. All methods above are implemented by the R package 'igraph'. To prioritize genes at the same level, the result of each differential topology analysis is standardized, ranging from 0 to 1, where 1 characterizes the most significant difference and 0 the least significant difference. Then, the average score of the results of all standardized differential topology analysis is regarded as the overall important score of genes in each path.

Enrichment Analysis
Enrichment analysis is performed by the R package 'Clus-terProfiler' [38] to evaluate the functional relevance and biological processes of selected genes. It contains gene ontology (GO) enrichment analysis and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis. The significant enrichment terms were obtained by Benjamini-Hochberg with adjusted p-value (p.adjust) < 0.05.

Comparison of RNCMI and Other State-of-the-art Methods
We compared RNCMI with three state-of-the-art methods (BC3NET [20], GRNBoost2 [21] and nlnet [22]) on three single-cell transcriptomics datasets (GSE65525 [29], GSE81682 [30] and PRJNA324289 [13]) by the assessment approach used in [31]. It can be found from the results that RNCMI outperforms the three state-of-the-art methods in overlapping more known driver genes from both local (Fig. 3a) and global (Fig. 3b) measurements. This is because RNCMI serves the PPI network as a background network. On the one hand, the PPI network is characterized as small-world and scale-free, which is a typical kind of biological network that obeys the rules proposed by Barabasi [39], so that the networks constructed by background networks are consistent with biological networks. On the other hand, adopting the edge deletion strategy to tailor cell-typespecific regulatory networks, RNCMI integrates single-cell transcriptomics data and proteomics data. Therefore, such integrated networks built by RNCMI are able to decrease the uncertainty and bias of constructing networks and reflect more information on the similar structure of PPI network and the specificity of transcriptomics data.

Differential Expression Analysis Reveals the Common Differentially Expressed Genes among Five NSC Differentiation Paths
Differential expression analysis was performed on the five preprocessed NSC differentiation paths (NSC_NPC, NPC_ AST, NPC_NEU, OPC_OLI, OPC_MO). It yielded a set of DEGs with different numbers (ranging from 566 to 2582) for each path. Interestingly, 12 common DEGs co-occurred in the five NSC differentiation paths (Fig. 4a), among which some genes presented over-expressed (represented by the red bar) and down-expressed (represented by the blue bar) across different NSC differentiation paths, while some genes were consistently down-expressed across different NSC differentiation paths. Commonly, the alterations of gene expression reflect on the functional changes in biological processes. Therefore, as shown in Fig. 4a, it implies that the specific genes play different roles in different cell differentiation paths.
To evaluate the biological processes related to these common DEGs, we performed the GO enrichment analysis. The GO enrichment analysis (Fig. 4b) showed that these 12 common DEGs were significantly enriched in the differentiation and development of the neural system. For example, Cdk5, App, Gsk3b, Ednrb and Ptprz1 were enriched in negative regulation of neuron differentiation (adjusted p-value = 7e−5); Cdk5, Dclk1, Gsk3b, Ptprz1 were enriched in neuron migration (adjusted p-value = 1.64e−4); Cdk5, App, Dclk1, Gsk3b, Ptprz1 were enriched in axon development (adjusted p-value = 2.04e−4). These GO terms serve as further evidence to support that the identified common DEGs play important roles in NSC differentiation and development.
Although differential expression analysis helped us narrow the candidate gene sets, it failed to assess the importance of these candidate genes and the association among them. In contrast to conventional differential expression analysis, network analysis, especially differential network analysis, allows us to study these complex life processes quantitatively and qualitatively from bioinformatics and system biology perspectives [19]. We conducted a comprehensive analysis of the different networks in the following sections to explore which genes, pathways, or patterns determine the cellular fate of NSC differentiation.

Overlapping Differential Networks of Five Paths Reveals the Candidate Regulatory Pattern
To assess these common DEGs compared with all DEGs in a systematic perspective, three networks (two-state networks and one differential network) were separately built for each path, which integrated single-cell transcriptome and proteome data (see "Methods" for detail). A state network characterizes the association of each pair of genes in a specific cell type, while a differential network characterizes the changed association of each pair of genes from one cell type to another. After overlapping each path's differential networks, two topologically overlapped subnetworks were identified: subnetwork 1 (Slc1a3, Slc38a2) and subnetwork 2 (Gsk3b, App, Cdk5). After consulting the related study of each gene in the two subnetworks, we found that Slc1a3 [40] and Slc38a2 [41] may be a conserved regulatory pattern, while Gsk3b signaling has a critical role in the regulation of neurogenesis, neurodevelopment, and in neuroplasticity [42]. Therefore, the Gsk3b_App_Cdk5 maybe play as a candidate RP, and the associated genes of Gsk3b in differential networks may play a similar role. The weight of edges in a differential network measures each pair of genes' changed correlation and indicates the alterations of regulation between genes. It can be observed that the differences of edges correlation in the RP present an inconsistency in each path (Fig. 5). In path NSC_NPC, NPC_NEU, and OPC_OLI, the correlations among Cdk5, Gsk3b, and App were weakened, while in path NPC_AST and OPC_MO, the correlations among Cdk5, Gsk3b, and App were strengthened, which reflects the specificity of the RP.
According to the recently published literatures relating to the RP Gsk3b_App_Cdk5, two genes in the RP were found to be associated with NSC differentiation to some degree. Gsk3b inhibits the degradation of b-catenin and thereby potentiates its downstream signaling, significantly enhancing neuronal differentiation [43]. Fine-tuning of Synapsin III expression and phosphorylation by Cdk5 activation through Sema3A activity is essential for proper neuronal migration and orientation [44]. However, App has long been considered a key driver of neurodegenerative disease [45] and has never been implicated in neuronal differentiation. As such, it is not clear whether the RP is involved in the regulation of all differentiation stages of the NSC lineage.

Network Diffusion Analysis Reveals an Indirect Role of the Regulatory Pattern
To explore those essential genes that indirectly influence topological differences (gene signaling propagation), the MC method based on perturbation test was performed on the differential network of each path [33], which is capable of quantifying the indirect influence of perturbating one gene on network propagation and network stability. The median score of the network diffusion in each path is around 0.75 (the white dots of Fig. 6a), while most of the 12 common DEGs are greater than the median score (Fig. 6b). The result of network diffusion suggests that the scores of these three genes (Gsk3b, App, Cdk5) in the RP are consistently high. And the result of gene hierarchical clustering based on their scores on network diffusion (Fig. 6b) shows that these three genes play an interactive role in complex life processes.

Differential Network Analysis of Common DEGs shows Inconsistent Topology in Each Path
Five kinds of differential network measurement analysis methods (DDC, DEC, DPC, DBC, and DiffRank) were performed on each path to assess the differences of network topology comprehensively, and the average score of the five methods is regarded as the overall network topological differences (see "Methods" for details). Figure 7 showed the topological differences of 12 common DEGs in five paths. There were inconsistencies in the network topology of these 12 DEGs during different differentiation paths. Compared to the analysis of network diffusion, most of the 12 DEGs show no significant network topology differences (the score smaller than 0.2), which implies that they play indirect regulatory roles in the five differentiation paths. However, it should be noted that more than one of the top three genes with the most significant topology difference in each path were involved in the RP Gsk3b_App_Cdk5. Therefore, the inconsistent scores of topological differences and consistently high scores of network diffusion reveal that these three genes in the RP may alternate their direct or indirect regulatory roles in different differentiation paths to promote cell differentiation towards different trajectory development.

KEGG Enrichment Analysis Reveals the Common and Specific Pathways Implicated in the Regulatory Pattern
To explore the pathways of the RP Gsk3b_App_Cdk5 and their interactive genes, we performed the KEGG enrichment analysis on the pattern and their first-order neighbors in each differential network. The top ten enriched pathways in each path are shown in (Fig. 8a). Different paths are enriched in pathways related to distinct chemical signaling, suggesting the heterogeneity of NSC lineage differentiation. After intersecting all enriched pathways in each path, four common pathways are identified (Fig. 8b), including axon guidance, Alzheimer's disease, prostate cancer, and endocrine resistance. Notably, one of the common pathways, axon guidance, refers to the process by which growing neural axons follow specific, predictable paths to reach their target locations, which may play an important role in cell differentiation and development [46]. In this manner, specific regulators are able to establish celltype identity through shaping molecular programs for axon guidance [47], while aberrant axon guidance regularly results in most neurodegenerative diseases, such as Alzheimer's disease [48]. The above analysis confirms that the identified pattern plays an important role in NSC lineage differentiation and development.

Fig. 5
The network topology of the candidate RP among five NSC differentiation paths. The subnetworks are extracted by selecting the first-order neighbors of Gsk3b from the differential network in differ-ent paths. The color of the edge means the correlation difference; red represents correlation gain, while blue represents correlation loss

Discussion and Conclusion
Investigating regulatory patterns across multiple NSC differentiation paths is essential for understanding the mechanisms of the different cell-fate decisions in stem cell differentiation. In this work, we proposed the new regulatory network construction method based on multiomics integration (RNCMI) at pair-wise cell stages. In comparing RNCMI and other state-of-the-art methods, RNCMI shows its advantages in decreasing the uncertainty and bias of constructed networks. We also applied RNCMI with other computational and biological methods into investigating five NSC differentiation paths on four single-cell RNA-seq datasets to explore regulatory patterns in neural stem cell fate decisions. We found that the new RP Gsk3b_App_Cdk5, partly supported by previous experimental evidence, is essential to all of the five differentiation paths through our systematic analyses. We further quantitatively measured the new RP contributing to network differences in each path, including one network diffusion analysis and five differential topology analysis methods. In the diffusion analysis, most of the 12 common DEGs presented a consistently high diffusion score among five NSC differentiation, suggesting that they play an indirect role in NSC differentiation. For the five differential topology analyses, more than one gene in the RP shows the highest topological differences, suggesting that they play a direct role in NSC differentiation. All of the above analyses revealed that the identified RP simultaneously influence and contribute to NSC differentiation, while different differentiation paths present specific pattern, that is, how the same genes play different roles (indirect or direct) in different differentiation paths.
In summary, we explored the common and specific patterns through different neuronal cell differentiation. The RP Gsk3b_App_Cdk5 is considered responsible for the cell-fate decision pattern of neural stem cell lineage, while different regulatory rules (indirect or direct) would contribute to different cell differentiation trajectories. Although we have investigated the common and specific pattern across different cell differentiation from a system perspective, some problems still need to be addressed. The differential networks and state networks we built were undirected networks that could not support the directed association between each pair  The heatmap of the average score of 5 differential topology analyses reveals the direct role in the RP. The different columns describe different paths. The row represents the 12 common differentially expressed genes among five paths. The scores and colors of the heatmap represent the difference in network topology of a specific gene in a specific path of genes. Limited to the technology and cost, the causal relationship among genes is still unknown [49]. More perturbation researches are needed (e.g., gene knockdown) to explain which genes are upstream or downstream of the cascade. In the future, we will conduct further biological experiments on this pattern to provide more concrete evidence to validate our results.