Multimodal Analysis Reveals New Genetic Effectors in Parkinson'S Disease-Related Gene LRRK2 Gly2019Ser Mutation

Mutations in the LRRK2 gene, which encodes leucine-rich repeat kinase 2 (LRRK2), generate one of the most prevalent monogenic forms of Parkinson's disease (PD). Patients with autosomal dominant PD and apparent sporadic PD, who are clinically indistinguishable from those with idiopathic PD, are found to have LRRK2 mutations, particularly the most prevalent variant Gly2019Ser. Nonetheless, potential effectors of Gly2019Ser remain unknown.


Abstract Background
Mutations in the LRRK2 gene, which encodes leucine-rich repeat kinase 2 (LRRK2), generate one of the most prevalent monogenic forms of Parkinson's disease (PD). Patients with autosomal dominant PD and apparent sporadic PD, who are clinically indistinguishable from those with idiopathic PD, are found to have LRRK2 mutations, particularly the most prevalent variant Gly2019Ser. Nonetheless, potential effectors of Gly2019Ser remain unknown.

Methods
We used the GEO database to undertake and evaluate a multitiered bioinformatic investigation to look into the gene expression implicated in the development of Parkinson's disease. Individual differences in gene expression were then con rmed in whole blood samples collected in the clinic. These genetic factors were also subjected to an interaction analysis and prediction.

Results
In total, 607 genes in the LRRK2 Gly2019Ser mutation group expressed differently from those in the wild group. The following 10 top hub genes were discovered in protein-protein interaction (PPI) networks: CD44, CTGF, THBS1, VEGFA, SPP1, EGF, VCAM1, MMP3, CXCR4, and LOX. The gene expression of CD44, CTGF, THBS1, SPP1, EGF, and LOX was considerably higher in the LRRK2 Gly2019Ser mutant group than in the LRRK2 wild group. Meanwhile, CXCR4 gene expression in the LRRK2 Gly2019Ser mutant group was signi cantly lower than in the LRRK2 wild group. We then con rmed the expression of the hub genes in LRRK2 Gly2019Ser mutated iPSC-induced DA cells. As a result, the levels of CD44, CTGF, THBS1, VEGFA, SPP1 were positively correlated to the mutation of LRRK2, displayed promising effectors for discriminating the pathogenesis of PD.

Conclusions
We identi ed CD44, CTGF, THBS1, VEGFA, and SPP1 as the potential genetic effectors responding to the mutation of LRRK2. They could be a promising mechanism for discriminating the PD and potential factors contributing to the disease's development.

Background
The aberrant deposits of α-synuclein aggregates in the brain and the loss of dopaminergic neurons in the substantia nigra pars compacta (SNpc) are pathological hallmarks of Parkinson's disease (PD) [1,2]. The most common cause of autosomal-dominant hereditary PD is missense mutations in the LRRK2 gene, which encodes leucine-rich repeat kinase 2 (LRRK2) [3]. The gene has emerged as one of the most important to familial and sporadic PD [4]. Because the lifetime risk of LRRK2 mutations is predicted to be 22-32% in clinical populations, the penetrance of LRRK2 mutations in PD is incomplete, implying substantial modi ers of LRRK2 illness [4]. Patients with autosomal dominant PD and apparent sporadic PD, who are clinically indistinguishable from those with idiopathic PD, are found to have LRRK2 mutations, particularly the most prevalent variant Gly2019Ser [3]. The Gly2019Ser LRRK2 mutation is located in the majority of patients with LRRK2-PD, and it is linked to neuropathological changes that are similar to those seen in idiopathic PD [5,6]. These changes include typical α-synuclein aggregates in the form of Lewy bodies and neurites, as well as cell loss in vulnerable areas [7].
LRRK2 is a 286 kDa protein having repetitive motifs on the N-terminal half and kinase and GTPase domains surrounded by protein-protein interaction domains on the C-terminal half [8]. The majority of pathogenic mutations are found in the two catalytic domains, but they all cause the kinase to become hyperactive in cells [9]. Reduced LRRK2 activity or expression has been shown to be neuroprotective in preclinical research; small-molecule LRRK2 inhibitors and antisense oligonucleotides have been produced and are now regarded safe for clinical use [5]. However, it is yet unknown how LRRK2 kinase activity is regulated at the molecular and cellular levels and how aberrant LRRK2 kinase activity causes PD.
In this study, we used the datasets from Gene Expression Omnibus (GEO) and validated the top gene targets affected by the LKKR2 mutation. We utilized weighted gene co-expression network analysis (WGCNA), Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses (GOFEA, KEGGFEA), Gene Ontology (GO), and STRING protein-interacting network to examine the variable genes and con rmed the expression of hub interacting proteins/genes, indicating that the focal adhesion and mitogen-activated protein kinase (MAPK) signalling pathway were closely linked in pathogenic and pathogenesis.

Materials And Methods
Data acquisition and processing Datasets were employed from GENE EXPRESSION OMNIBUS (GEO) public database (https://www.ncbi.nlm.nih.gov/gds) for the keywords "LRRK2". The R software has been employed for background correction of sample expression matrix, data downgrade, normalization, log 2 transformation, and probe reannotation. RMA, devtools, AnnoProbe, limma, Biobase, affyPLM, and GEOquery were among the R packages used in this research. The gene's expression level was calculated by averaging the values of these probes if a single gene in the chip corresponded to many probes. Using Fisher's combined probability test, P values were pooled, then adjusted for multiple comparison correction using the false discovery rate (FDR). FDR ≤ 0.05 was chosen as the signi cance level (Figure 1).
WGCNAand hub genes selection WGCNA is a common data mining method for analyzing biological networks that employs pairwise correlations between variables. While it can be used to analyze various high-dimensional data sets, it is most commonly utilized in genomics. It can be used to construct modules (clusters), intramodular hubs, and network nodes based on module membership, as well as investigate co-expression module connections interactions and compare network topologies (differential network analysis). Correlation networks make it easier to uncover prospective biomarkers or treatment targets using network-based gene screening methodologies. The WGCNA, an R package, was used to perform the weighted correlation network analysis. Pearson's correlation matrices and the average linkage technique were employed rst for every pairwise Gene, and then a weighted adjacency matrix was constructed using a power function A_mn=|C_mn|^β (C_mn = Pearson's correlation between Gene_m and Gene_n; A_mn=adjacency between Gene m and Gene n). β was a soft-thresholding parameter that might amplify strong gene-gene interactions while penalizing weak ones. The adjacency was transformed into a topological overlap matrix (TOM), which could be used to quantify Gene's network connection, which was de ned as the sum of its adjacency with all other Genes for network generation, and the corresponding dissimilarity (1-TOM) was calculated after choosing the power of 7. Using the TOM-based dissimilarity metric and a minimum size (Gene group) for the Genes dendrogram, average linkage hierarchical clustering was utilized to classify Genes with similar expression pro les into Gene modules. To investigate the modules further, we calculated their dissimilarity, selected a cut line for the module dendrogram, and merged some modules. We additionally merged modules with a distance of less than 0.35, resulting in a total of 5 co-expression modules.

Cluster analysis
Cluster analysis was carried out using ConsensusClusterPlus [10] was used to perform cluster analysis, which included agglomerative pam clustering with 1-Pearson correlation distances and resampling 80 percent of the samples for 10 repetitions. The empirical cumulative distribution function plot was used to nd the appropriate number of clusters.

Functional enrichment analysis
To determine the most highly enriched pathways, we utilized DAVID (https://david.ncifcrf.gov/home.jsp) to run GO and KEGG functional enrichment investigations, indicating the module's potential biological importance. To obtain statistically signi cant gene sets with meaningful functional annotations and signalling pathways, the P values were determined using Fisher's exact probability technique. P < 0.05 was used as the signi cant level.

Protein-protein interaction network and hub genes selection in the Key Module
The STRING 11.5 database (https://string-db.org/cgi/input? sessionId=boTaerIW4Ew9&input_page_show_search=on) was used to map the protein-protein interaction (PPI) network and predict the interactions of all of the proteins discovered in the study. The PPI activity is a primary focus of cellular biology research and serves as a prerequisite for system biology [11]. Proteins interact with other proteins within the cell to carry out their functions, and the information supplied by a PPI network improves the perception of the protein's function [12]. We used the STRING database's data analysis mode and an integrated con dence value of 0.4 to generate a network map of PPI. The con dence score, which is a medium con dence score, was calculated using the STRING platform. For a better visual depiction of the network and to identify hub genes, the obtained PPI was assessed using the software Cytoscape (https://cytoscape.org/).

Reprogramming of human broblasts to induced pluripotent stem cells and their differentiation to dopaminergic neurons
Three samples were taken from three wild healthy volunteers (WT) and three from PD patients who had the LRRK2 Gly2019Ser S mutation veri ed. WT broblast and PD patient broblasts with LRRK2-Gly2019Ser mutation were reprogrammed to induced pluripotent stem cells (iPSCs) as described in the previous report [13].
Immuno uorescence staining and confocal laser scanning microscopy The protocols for double immuno uorescence staining were carried out as described previously in Kavanagh et al [15]. In brief, free-oating 30-µm sections of the midbrain from each group were incubated with rabbit-derive tyrosine hydroxylase antibody (1:1000; Novus Biologcials) at 4°C overnight followed by a goat anti-mouse IgG conjugated with Alexa Fluor 488 (GB25301, 1:400; Servicebio) for 1 h at room temperature (22 ± 2°C). Immunoreactivity uoresced green under an LSM 880 laser scanning confocal microscope (Carl Zeiss, Oberkochen, Germany). ZEN light software was used to capture and analyze confocal images (Carl Zeiss).

Reverse transcription-quantitative real-time polymerase chain reaction
The cells were collected, and total RNA was extracted according to the manufacturer's instructions using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). For the quanti cation of messenger RNA (mRNA) of protein-encoding genes, RNA was reverse transcribed to complementary DNA (cDNA) with a random primer (Sangon Biotech, Shanghai, China) using a Reverse Transcription Kit (RR047A; Takara, Dalian, China) and the mRNA levels were determined using reverse transcription-quantitative real-time polymerase chain reaction (RT-qPCR). A human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) primer pair served as an internal control. The Ct technique ∆∆C t method was used to calculate and normalize relative gene expressions. All the sequences of the primers used are listed in Table 1.

Statistical analysis
The results were provided as the mean SD of three separate studies. A two-tailed Student's t-test was used in the statistical analysis. A statistically signi cant difference was de ned as one with P < 0.05.

Results
Consensus clustering presents quantitative evidence for estimating the number of possible clusters and their membership in the datasets.
The GEO database has 2 LRRk2-related microarray transcriptome datasets and was conducted in three cell lines. With the threshold FDR ≤ 0.05 criterion, a total of 607 differentially expressed genes (DEGs) were con rmed. The differential expression pro ling and distribution of these DEGs were shown ( Figure  2A). Cluster analysis was performed using the ConsensusClusterPlus program, which includes agglomerative pam clustering with 1-Pearson correlation distances and resampling 80% of the samples for 10 times. The optimal cluster number was found using the cumulative distribution function (CDF). When k = 2, the cluster was discovered to be the most stable clustering based on the CDF Delta area curve ( Figure 2B, C, and D). We obtained four Immune Subtype (IS) results using k = 2. (Fig. 2E and F).
WGCNA evaluated the genes that had varied expression patterns.
Different databases contribute to the overall number of genes differently, with varying degrees of database overlap ( Figure 3A and B). The 607 DEGs were hierarchically classi ed into 18 categories using WGCNA analysis, respectively: brown2, rebrick4, darkgreen, coral2, greenyellow, lightsteelblue, darkolivegreen4, darkred, darkolivegreen, brown4, yellow4, darkseagreen4, plum2, lightyellow, lightcoral, darkorange, royalblue, darkgrey, and black ( Figure 3C). The horizontal axis was the network's average connectivity ( Figure 3D), and the vertical axis was the scale-free topology tting index R^2 (values in the SFT.R.Sq column in the statistical data) ( Figure 3E). As a consequence of respringing the cluster tree, we detected 18 modules in the heat map of association between the tree diagram of gene expression and module attributes ( Figure 3F). The relationship of modules and a phenotype heat map was exhibited in the grouping of LRRK2 wild and mutation. The blue module showed the most robust inverse link with the LRRK2 mutation phenotype. (Figure 3G).
The most highly enriched pathways were found using GO and KEGG.
To learn more about the biological processes and pathways involved, the DAVID was utilized to conduct GO and KEGG analyses ( Table 3). The 607 genes were primarily enriched in the biological progress (BP) category for system development, anatomical structure morphogenesis, regulation of the multicellular organismal process, nervous system development, cellular developmental process and cell differentiation; the cellular component (CC) category for the extracellular region, extracellular region part, vesicle, endomembrane system, extracellular space, plasma membrane part, extracellular vesicle, extracellular organelle and extracellular exosome ( Figure 4A, B and C). Many transcription factors were enriched, which have been associated with the LRRK2 mutation ( Figure 4D) ( Table 4).The 607 genes were primarily enriched in focal adhesion, proteoglycans in cancer, ECM-receptor interaction, MAPK signalling pathway, PI3K-Akt signalling pathway, amoebiasis, axon guidance, vascular smooth muscle contraction and Ras signalling pathway according to KEGG pathway analysis ( Figure 4E).

The PPI network reveals interacting proteins and hub genes
The frequent DEGs were entered using STRING 11.5, and the le created from the analysis was reintroduced into Cytoscape for further exploration, including hub gene discovery. The PPI network looked at gene-gene and protein-protein interactions, nding 600 links with a medium con dence interaction score (0.4) ( Figure 5A). We chose 50 genes for the heat map that indicated apparent differences in PD compared to controls ( Figure 5B). The PPI network was utilized with a medium con dence interaction score to investigate gene-gene/protein-protein interactions with in ammation-related genes (0.4). Then, hub genes and essential modules were detected based on the PPIs network in Figure 5A. We selected the top 10 top hub genes of CD44, CTGF, THBS1, VEGFA, SPP1, EGF, VCAM1, MMP3, CXCR4, LOX and analyzed their interactions ( Figure 5C), and the relative expressions of the interacting proteins were shown in Figure 5D. Moreover, the gene expression of CD44, CTGF, THBS1, SPP1, EGF, and LOX was considerably higher in the LRRK2 Gly2019Ser mutant group than in the LRRK2 wild group (Figure 5E), and operating characteristic (ROC) analysis was used in Figure 5F. The expressions between CD44 and CTGF, CD44 and THBS1, CD44 and SSP1, had a signi cant positive correlation ( Figure 5G).

The validation of hub genes in iPSC-induced DA cells.
We enrolled three wild healthy volunteers who had not been diagnosed with a central nervous system disease and three from PD patients who had the LRRK2 Gly2019Ser mutation veri ed. The WT broblast and PD patient broblasts with LRRK2-Gly2019Ser mutation were reprogrammed to iPSCs. Further, the iPSCs were differentiated into DA neurons. We validated the DA neurons differentiation with tyrosine hydroxylase (TH) staining ( Figure 6A). RT-qPCR analyzed the peripheral blood samples of PD and healthy control to measure the gene expression. We measured the 10 hub genes expression. As a result, the levels of CD44, CTGF, THBS1, VEGFA and SPP1 were positively correlated to the mutation of LRRK2, displayed promising effectors for discriminating the pathogenesis of PD ( Figure 6B, C, D, E and F).
The PPI network reveals interaction for the effctors responding to LRRK2 mutation.
A PPI network for the genes of CD44, CTGF, THBS1, VEGFA and SPP1 was built using the STRING online tool ( Figure 7A, B, C, D, and E)

Discussion
Increasing evidence demonstrated a role for the mutation of LRRK2 in the pathogenesis of PD. However, it is still unclear how LRRK2 kinase activity is regulated at the molecular and cellular levels, as well as how abnormal LRRK2 kinase activity leads to PD. Speci cally, recent technological advancements have made it possible to identify molecular changes in human diseases on a worldwide scale. However, these methods are constrained by the high complexity of data, which leads to many testing challenges, the often small number of samples examined, and the low technical and statistical reproducibility of the produced results [16]. In this study, we performed and tested a multitiered bioinformatic analysis using the GEO database to investigate global alterations that responded to the mutation of LRRK2. We introduced the recently developed WGCNA methodology, a commonly used data mining method based on pairwise correlations between variables and is especially useful for investigating biological networks [17]. Interaction analysis and prediction were also performed on these genetic variables.
A total of 607 DEGs were validated using the FDR ≤ 0.05 criterion. To learn more about the biological processes and pathways involved, the 607 genes were primarily enriched in the BP category for system development, anatomical structure morphogenesis, regulation of the multicellular organismal process, nervous system development, cellular developmental process and cell differentiation; the CC category for the extracellular region, extracellular region part, vesicle, endomembrane system, extracellular space, plasma membrane part, extracellular vesicle, extracellular organelle and extracellular exosome. Moreover, according to KEGG pathway analysis, the 607 genes mainly were abundant in focal adhesion, cancer proteoglycans, ECM-receptor interaction, MAPK signalling pathway, PI3K-Akt signalling pathway, amoebiasis, axon guidance, vascular smooth muscle contraction, and Ras signalling pathway. Then, top hub genes and essential modules were detected based on the PPIs network. Finally, the top 10 top hub genes were selected: CD44, CTGF, THBS1, VEGFA, SPP1, EGF, VCAM1, MMP3, CXCR4 and LOX. What's more, we further validated distinct differences in gene's alteration in iPSCs-induced DA cells from LRRK2 wild and mutated human samples. As a result, the levels of CD44, CTGF, THBS1, VEGFA and SPP1 were positively correlated to the mutation of LRRK2, displayed promising effectors for discriminating the pathogenesis of PD.
Meanwhile, we discovered that several transcription factors are involved in the pathogenesis of LRRK2 mutation. Among them, NF1 and HNF3 are in the highest priority position of the representative enriched terms. Nuclear factor I (NF1) has been reported to regulate neuronal apoptosis, in ammatory response, and oxidative stress in PD [18]. Therefore, we speculate that NF1 may be related to regulating downstream gene expression by increased kinase activity after LRRK2 mutation. To date, most preclinical research has used transgenic rodent models to assess the e cacy of LRRK2 inhibitors in various PD models [19]. As a result, investigating how LRRK2 in uences the process of NF1 regulating PD could be therapeutic. Meanwhile, hepatocyte nuclear factor 3 (HNF3) was the rst identi ed member of this extensive family of transcription factors, which binds to cis-regulatory elements in hundreds of genes encoding gluconeogenic and glycolytic enzymes, serum proteins and hormones [20]. To date, there is currently no information on how the LRRK2 mutation affects HNF3. However, we believe that this could lead to new study areas.
However, more long-term research is needed to see if gene expression levels can be used to track clinical development, including the transcription factors and the effectors responding to LRRK2 mutation that we validated in this study.

Conclusion
We performed multimodal approaches to identify 607 genes expressed differently in LRRK2-Gly2019Ser mutated iPSCs-induced DA cells than in the LRRK2 wild control group. They could be a promising mechanism for discriminating the PD and potential factors contributing to the disease's development.

Ethics approval
Ethical approval was obtained from the ethics committee of Zhujiang Hospital of Southern Medical University. To take part in this study, all participants signed a written informed consent form.

Competing interests
The authors declare that they have no competing interests.   Table 3. Top 20 clusters with their representative enriched terms (one per cluster). The number of genes in the user-provided lists that belong to the supplied ontology term is called "count." "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the P-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10.  Figure 1 Study ow diagram  Consensus k = 2 sample cluster heat map. Different samples are represented by the rows and columns of the matrix. In yellow to dark red, the values of the consensus matrix range from 0 (cannot be grouped together) to 1 (always clustered together).