Consensus clustering presents quantitative evidence for estimating the number of possible clusters and their membership in the datasets.
The GEO database has 2 LRRk2-related microarray transcriptome datasets and was conducted in three cell lines. With the threshold FDR ≤ 0.05 criterion, a total of 607 differentially expressed genes (DEGs) were confirmed. The differential expression profiling and distribution of these DEGs were shown (Figure 2A). Cluster analysis was performed using the ConsensusClusterPlus program, which includes agglomerative pam clustering with 1-Pearson correlation distances and resampling 80% of the samples for 10 times. The optimal cluster number was found using the cumulative distribution function (CDF). When k = 2, the cluster was discovered to be the most stable clustering based on the CDF Delta area curve (Figure 2B, C, and D). We obtained four Immune Subtype (IS) results using k = 2. (Fig. 2E and F).
WGCNA evaluated the genes that had varied expression patterns.
Different databases contribute to the overall number of genes differently, with varying degrees of database overlap (Figure 3A and B). The 607 DEGs were hierarchically classified into 18 categories using WGCNA analysis, respectively: brown2, firebrick4, darkgreen, coral2, greenyellow, lightsteelblue, darkolivegreen4, darkred, darkolivegreen, brown4, yellow4, darkseagreen4, plum2, lightyellow, lightcoral, darkorange, royalblue, darkgrey, and black (Figure 3C). The horizontal axis was the network's average connectivity (Figure 3D), and the vertical axis was the scale-free topology fitting index R^2 (values in the SFT.R.Sq column in the statistical data) (Figure 3E). As a consequence of respringing the cluster tree, we detected 18 modules in the heat map of association between the tree diagram of gene expression and module attributes (Figure 3F). The relationship of modules and a phenotype heat map was exhibited in the grouping of LRRK2 wild and mutation. The blue module showed the most robust inverse link with the LRRK2 mutation phenotype. (Figure 3G).
The most highly enriched pathways were found using GO and KEGG.
To learn more about the biological processes and pathways involved, the DAVID was utilized to conduct GO and KEGG analyses (Table 3). The 607 genes were primarily enriched in the biological progress (BP) category for system development, anatomical structure morphogenesis, regulation of the multicellular organismal process, nervous system development, cellular developmental process and cell differentiation; the cellular component (CC) category for the extracellular region, extracellular region part, vesicle, endomembrane system, extracellular space, plasma membrane part, extracellular vesicle, extracellular organelle and extracellular exosome (Figure 4A, B and C). Many transcription factors were enriched, which have been associated with the LRRK2 mutation (Figure 4D) (Table 4).The 607 genes were primarily enriched in focal adhesion, proteoglycans in cancer, ECM-receptor interaction, MAPK signalling pathway, PI3K-Akt signalling pathway, amoebiasis, axon guidance, vascular smooth muscle contraction and Ras signalling pathway according to KEGG pathway analysis (Figure 4E).
The PPI network reveals interacting proteins and hub genes
The frequent DEGs were entered using STRING 11.5, and the file created from the analysis was reintroduced into Cytoscape for further exploration, including hub gene discovery. The PPI network looked at gene-gene and protein-protein interactions, finding 600 links with a medium confidence interaction score (0.4) (Figure 5A). We chose 50 genes for the heat map that indicated apparent differences in PD compared to controls (Figure 5B). The PPI network was utilized with a medium confidence interaction score to investigate gene-gene/protein-protein interactions with inflammation-related genes (0.4). Then, hub genes and essential modules were detected based on the PPIs network in Figure 5A. We selected the top 10 top hub genes of CD44, CTGF, THBS1, VEGFA, SPP1, EGF, VCAM1, MMP3, CXCR4, LOX and analyzed their interactions (Figure 5C), and the relative expressions of the interacting proteins were shown in Figure 5D. Moreover, the gene expression of CD44, CTGF, THBS1, SPP1, EGF, and LOX was considerably higher in the LRRK2 Gly2019Ser mutant group than in the LRRK2 wild group (Figure 5E), and operating characteristic (ROC) analysis was used in Figure 5F. The expressions between CD44 and CTGF, CD44 and THBS1, CD44 and SSP1, had a significant positive correlation (Figure 5G).
The validation of hub genes in iPSC-induced DA cells.
We enrolled three wild healthy volunteers who had not been diagnosed with a central nervous system disease and three from PD patients who had the LRRK2 Gly2019Ser mutation verified. The WT fibroblast and PD patient fibroblasts with LRRK2- Gly2019Ser mutation were reprogrammed to iPSCs. Further, the iPSCs were differentiated into DA neurons. We validated the DA neurons differentiation with tyrosine hydroxylase (TH) staining (Figure 6A). RT-qPCR analyzed the peripheral blood samples of PD and healthy control to measure the gene expression. We measured the 10 hub genes expression. As a result, the levels of CD44, CTGF, THBS1, VEGFA and SPP1 were positively correlated to the mutation of LRRK2, displayed promising effectors for discriminating the pathogenesis of PD (Figure 6B, C, D, E and F).
The PPI network reveals interaction for the effctors responding to LRRK2 mutation.
A PPI network for the genes of CD44, CTGF, THBS1, VEGFA and SPP1 was built using the STRING online tool (Figure 7A, B, C, D, and E). The PPIs network contains a number of for CD44: number of nodes: 41, number of edges: 352, average node degree: 17.2; for CTGF: number of nodes: 41, number of edges: 327, average node degree: 16; for THBS1: number of nodes: 41, number of edges: 262, average node degree: 12.8; for VEGFA: number of nodes: 41, number of edges: 341, average node degree: 16.6, and for SPP1: number of nodes: 41; number of edges: 317; average node degree: 15.5.