Application of serum SNCA gene in the diagnosis of Parkinson’s disease

Background: SNCA plays an important role in the development of Parkinson's disease (PD). However, its mechanism in PD is still unclear. We integrated multi-omics data in a systematic analysis to explore the potential mechanisms of SNCA in PD and its value as a therapeutic target. Materials and methods: We downloaded gene expression, miRNA expression, and methylation data sets from GEO. After integration and standardization, we analyzed the diagnostic value and potential function of SNCA. Furthermore, based on SNCA expression, we used RNA-Seq, miRNA-Seq, and methylation data to analyze the genetic and epigenetic mechanisms associated with SNCA co-expression in PD. Results: Six expression profiling data sets showed that SNCA is expressed at low levels in PD and could be used as a potential PD marker. The genes co-expressed with SNCA differed significantly between serum and substantia nigra samples. The lncRNAs most correlated with SNCA expression are related to a variety of tumors and nerve-related diseases, while the miRNAs most correlated with SNCA expression are related to multiple neurodegenerative disease pathways. There are multiple CpG methylation sites in the SNCA promoter region, of which cg15133208 is significantly hypomethylated in PD. Conclusion: This study proposes low SNCA expression as a diagnostic biomarker for PD and explores the unique genomic and epigenetic landscape associated with SNCA expression. It provides ideas for a better understanding of the molecular basis of PD and more direct evidence for SNCA expression as a diagnostic biomarker or potential therapeutic target for PD patients.

serum biomarker [6]. Wang et al. used gene expression and methylation data to identify 53 genetic markers that can serve as PD serum markers [7], and George et al. used a protein interaction network to screen multiple PD-related drug targets [8]. These studies show that using omics data to study the mechanisms of complex diseases has become an effective approach.
The α-synuclein protein is encoded by the SNCA gene [9]. It is the main component of Lewy bodies, is related to the neuropathological features of PD [10], and plays an important role in neurodegenerative diseases. Overexpression of α-synuclein is closely related to the pathogenesis of PD [11]. The abnormal accumulation of α-synuclein and the formation of Lewy bodies may trigger an inflammatory response. It has recently been confirmed that SNCA overexpression is one of the causes of PD [12], and SNCA expression may also be subject to epigenetic regulation [13]. Therefore, it is necessary to analyze the pathogenesis of PD in relation to α-synuclein expression using multi-omics data.
In this study, we propose low SNCA expression as a serum diagnostic biomarker for PD. We also explore the unique genomic and epigenetic landscape associated with SNCA expression. Compared with current basic experimental studies of SNCA, our work uses SNCA expression data to provide more direct evidence for SNCA as a diagnostic biomarker or potential therapeutic target for PD patients.

Data collection and processing
We downloaded PD data sets from the Gene Expression Omnibus (GEO) database and screened three serum expression profile data sets (GSE6613 [14,15], GSE57475 [16], GSE99039 [17]), three substantia nigra expression profile data sets (GSE7621 [18], GSE20141 [19], GSE49036 [20]), one brain tissue data set with matched gene expression and miRNA expression (GSE110716 [21], GSE110717 [21]), and two serum methylation data sets (GSE111629 [22,23], GSE72774 [24,23]). The characteristics of each data set are shown in Table 1. These standardized data sets were downloaded on 2019-03-01. For the microarray expression profile data, probes were mapped to genes as follows: a probe corresponding to multiple genes was removed, and when multiple probes corresponded to one gene, their median value was taken. For the methylation data, the β value was used to quantify CpG methylation. Methylation sites with more than 30 missing (NA) samples were removed, as were cross-reactive CpG sites, according to the list of cross-reactive sites provided by Chen et al. [25].
Then, a weighted adjacency matrix was constructed using the power function $a_{mn} = |c_{mn}|^{\beta}$, where $c_{mn}$ is the Pearson correlation between gene $m$ and gene $n$ and $a_{mn}$ is the adjacency between gene $m$ and gene $n$. $\beta$ is a soft-thresholding parameter that emphasizes strong correlations between genes and penalizes weak ones. After choosing $\beta$, the adjacency matrix was transformed into a topological overlap matrix (TOM), which measures network interconnectedness (the connectivity of a gene being defined as the sum of its adjacencies with all other genes), and the corresponding dissimilarity (1 - TOM) was calculated. To group genes with similar expression profiles into modules, average linkage hierarchical clustering was performed on the TOM-based dissimilarity, with a minimum module size of 30 genes for the gene dendrogram.
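The probe-to-gene and CpG filtering rules described at the start of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and input layouts are hypothetical, missing β values are represented as `None`, and "NA samples greater than 30" is read as "more than 30 missing samples per site".

```python
from statistics import median


def collapse_probes(probe_values, probe_to_genes):
    """Collapse probe-level expression to gene-level expression.

    probe_values:   dict probe_id -> list of expression values (one per sample)
    probe_to_genes: dict probe_id -> set of gene symbols the probe maps to
    (both layouts are hypothetical, chosen for illustration)
    """
    by_gene = {}
    for probe, genes in probe_to_genes.items():
        # Drop probes that map to several genes (or to none / have no data).
        if len(genes) != 1 or probe not in probe_values:
            continue
        (gene,) = genes
        by_gene.setdefault(gene, []).append(probe_values[probe])
    # Several probes for one gene: take the per-sample median across probes.
    return {gene: [median(col) for col in zip(*rows)] for gene, rows in by_gene.items()}


def filter_cpg_sites(beta_values, cross_reactive, max_missing=30):
    """Keep CpG sites with at most max_missing NA (None) beta values that are
    not in the cross-reactive probe list."""
    return {
        site: betas
        for site, betas in beta_values.items()
        if sum(b is None for b in betas) <= max_missing and site not in cross_reactive
    }
```

Taking the median rather than the mean keeps a single outlying probe from dominating the gene-level value.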
To further analyze the modules, we calculated the dissimilarity of module eigengenes, chose a cut height for the module dendrogram, and merged similar modules. Finally, the module containing SNCA was selected as the SNCA co-expression module.
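The adjacency, topological overlap, and dissimilarity steps of the WGCNA procedure described above can be sketched in plain Python. This sketch assumes the standard unsigned definitions, $a_{mn} = |c_{mn}|^{\beta}$ and $\mathrm{TOM}_{mn} = (l_{mn} + a_{mn}) / (\min(k_m, k_n) + 1 - a_{mn})$ with $l_{mn} = \sum_u a_{mu} a_{un}$ and $k_m = \sum_u a_{mu}$; it is an illustration of the technique, not the WGCNA package's implementation, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors (no missing values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta):
    """a_mn = |c_mn|**beta for m != n. The diagonal is set to 0 so that each
    row sum equals the gene's connectivity k_m."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for m in range(g):
        for n in range(m + 1, g):
            a = abs(pearson(expr[m], expr[n])) ** beta
            adj[m][n] = adj[n][m] = a
    return adj


def topological_overlap(adj):
    """Unsigned TOM with l_mn = sum_u a_mu * a_un and TOM_mm = 1."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]
    for m in range(g):
        for n in range(g):
            if m == n:
                continue
            l_mn = sum(adj[m][u] * adj[u][n] for u in range(g))
            tom[m][n] = (l_mn + adj[m][n]) / (min(k[m], k[n]) + 1 - adj[m][n])
    return tom


def tom_dissimilarity(tom):
    """1 - TOM, the distance used for average-linkage clustering."""
    return [[1.0 - v for v in row] for row in tom]
```

Raising $|c_{mn}|$ to a power larger than 1 shrinks weak correlations much faster than strong ones, which is why $\beta$ acts as a soft threshold rather than a hard cut-off.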

Chip re-annotation
To obtain up-to-date lncRNA expression profiles, we re-annotated the lncRNA probes of the HG-U133 Plus 2.0 array (GPL570). First, we downloaded the long non-coding RNA transcript sequences of the GRCh38.p12 version from GENCODE.
We then used SeqMap [27] to align the probe sequences of each probe set to the lncRNA sequences, allowing 0 mismatches; a probe set was considered successfully re-annotated if at least 11 of its probes matched the same lncRNA. In total, 2448 Affymetrix probe sets were included in the subsequent analysis.
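The re-annotation rule above (0 mismatches, at least 11 probes of a probe set hitting the same lncRNA) can be sketched with exact substring matching. This is a simplified stand-in for SeqMap, not its algorithm: it assumes probe sequences are given in the same orientation as the transcript (a real pipeline would also have to handle strand), and it additionally assumes, as a hypothetical choice, that probe sets matching more than one lncRNA are discarded.

```python
def reannotate_probe_sets(probe_sets, lncrnas, min_probes=11):
    """Map probe sets to lncRNAs by exact (0-mismatch) probe matches.

    probe_sets: dict probe_set_id -> list of probe sequences
    lncrnas:    dict lncRNA_id -> transcript sequence
    Returns dict probe_set_id -> lncRNA_id.
    """
    result = {}
    for probe_set, probes in probe_sets.items():
        counts = {}
        for seq in probes:
            for lnc, transcript in lncrnas.items():
                if seq in transcript:  # exact substring match = 0 mismatches
                    counts[lnc] = counts.get(lnc, 0) + 1
        hits = [lnc for lnc, c in counts.items() if c >= min_probes]
        # Assumption: keep only probe sets that map unambiguously to one lncRNA.
        if len(hits) == 1:
            result[probe_set] = hits[0]
    return result
```

Usage with toy sequences and a lowered threshold: `reannotate_probe_sets({"ps1": ["ACG", "GTT"]}, {"L1": "AACGTTT"}, min_probes=2)` maps `ps1` to `L1`.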
miRNA target gene prediction
miRTarBase [28] is a curated database of miRNA-target interactions that provides the largest available collection of experimentally validated miRNA-target interactions, with a variety of novel and unique features. We used the miRTarBase online platform with its default thresholds to identify miRNA target genes.

Functional enrichment analyses
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed with the R package clusterProfiler [29] to identify over-represented KEGG pathways. An FDR < 0.05 was considered statistically significant.
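Over-representation analysis of this kind is usually a one-sided hypergeometric test per pathway followed by Benjamini-Hochberg FDR correction (BH is the default `pAdjustMethod` of clusterProfiler's `enrichKEGG`). The sketch below reimplements that idea from scratch under the assumption of a user-supplied gene universe and pathway dictionary; it is not clusterProfiler's code, and `enrich` and its arguments are hypothetical names.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(genes, pathways, universe, fdr_cutoff=0.05):
    """Return {pathway: FDR} for pathways over-represented in `genes`."""
    genes = set(genes) & universe
    N, n = len(universe), len(genes)
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & universe
        names.append(name)
        pvals.append(hypergeom_sf(len(genes & members), N, len(members), n))
    fdr = benjamini_hochberg(pvals)
    return {name: q for name, q in zip(names, fdr) if q < fdr_cutoff}
```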
ssGSEA [30] was performed with the GSVA package [31] using the MSigDB [32] C2 canonical pathways gene set collection, which contains 1320 gene sets. Gene sets with a p-value < 0.05 after 1000 permutations were considered significantly enriched.
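For readers unfamiliar with ssGSEA, the per-sample enrichment score can be sketched as a rank-weighted running sum: genes are ordered by expression within one sample, gene-set members contribute rank-weighted increments, non-members contribute uniform decrements, and the score is the sum of the differences over all positions. This is a simplified sketch of the published idea, assuming weight exponent `alpha = 0.25`; it omits the score normalization GSVA applies and the permutation test mentioned above, and the function name is hypothetical.

```python
def ssgsea_score(sample, gene_set, alpha=0.25):
    """Single-sample enrichment score.

    sample:   dict gene -> expression value in one sample
    gene_set: set of gene symbols
    """
    genes = sorted(sample, key=lambda g: sample[g], reverse=True)  # highest first
    n_genes = len(genes)
    # Rank weight: the most highly expressed gene gets rank N, the lowest gets 1.
    weights = {g: (n_genes - i) ** alpha for i, g in enumerate(genes)}
    in_set = [g in gene_set for g in genes]
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n_genes:
        raise ValueError("gene set must be a non-empty proper subset of the genes")
    hit_total = sum(weights[g] for g, hit in zip(genes, in_set) if hit)
    p_hit = p_miss = score = 0.0
    for g, hit in zip(genes, in_set):
        if hit:
            p_hit += weights[g] / hit_total
        else:
            p_miss += 1.0 / (n_genes - n_hit)
        score += p_hit - p_miss  # accumulate the running-sum difference
    return score
```

A gene set concentrated among the most highly expressed genes of a sample gets a positive score, and one concentrated at the bottom gets a negative score.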

Statistical analysis
All analyses without specified parameters used the software's default parameters. Significance was set at P < 0.05. All statistical analyses were performed in R 3.4.3. Correlations were calculated with the Pearson correlation coefficient, and networks were visualized with Cytoscape [33] (http://www.cytoscape.org/). ROC analysis was performed with the R package plotROC, and heat maps were drawn with the R package pheatmap.

SNCA can be used as a serum marker in PD.
The expression of SNCA is related to the progression of PD. To examine the relationship between SNCA transcript levels and PD, we extracted the SNCA expression level from each serum and substantia nigra data set. In serum samples, SNCA transcript levels were significantly lower in PD than in the healthy group (Fig. 1A: GSE6613, GSE57475, GSE99039). Similarly, in substantia nigra samples, SNCA levels were significantly lower in PD than in the healthy group (Fig. 1A: GSE7621, GSE20141, GSE49036). To assess the diagnostic performance of SNCA transcript levels for PD, we further performed ROC analysis: the average AUC was 0.727 in the substantia nigra samples and 0.642 in the whole blood samples (Fig. 1B). This suggests that low SNCA expression is related to the occurrence of PD and could be used as a diagnostic marker for PD.
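The AUC values above can be understood through the rank (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen PD sample receives a higher score than a randomly chosen control, counting ties as one half. Because SNCA is lower in PD, the sketch below scores each sample by negative SNCA expression and averages AUCs over data sets. It is an illustration of that formulation, not the plotROC computation used in the study; the function names and inputs are hypothetical.

```python
from statistics import mean


def roc_auc(scores_cases, scores_controls):
    """AUC = P(case score > control score) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_cases:
        for q in scores_controls:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


def snca_auc(expr_cases, expr_controls):
    """Lower SNCA expression indicates PD, so the score is -expression."""
    return roc_auc([-x for x in expr_cases], [-x for x in expr_controls])


def average_auc(datasets):
    """datasets: list of (expr_cases, expr_controls) pairs, one per GEO series."""
    return mean(snca_auc(cases, ctrls) for cases, ctrls in datasets)
```

An AUC of 0.5 means no discrimination, so the reported averages of 0.642 and 0.727 indicate modest to moderate separation.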

SNCA has different regulatory mechanisms in different types of samples.
To examine the relationship between SNCA expression and pathways, we used ssGSEA to calculate KEGG pathway enrichment scores in each data set and selected the 20 KEGG pathways most correlated with SNCA expression. In serum samples, SNCA was mainly associated with GLYCOSPHINGOLIPID BIOSYNTHESIS LACTO AND NEOLACTO SERIES, NITROGEN METABOLISM, PORPHYRIN AND CHLOROPHYLL METABOLISM, CITRATE CYCLE TCA CYCLE, AMINOACYL TRNA BIOSYNTHESIS, and other energy metabolism-related pathways (Fig. 2A). In substantia nigra samples, SNCA was mainly associated with PARKINSONS DISEASE, PATHWAYS IN CANCER, PROSTATE CANCER, LEUKEMIA, and other complex disease-related pathways (Fig. 2B). The pathways associated with SNCA expression differ markedly between the two sample types (Fig. 2C). Notably, the AMINOACYL TRNA BIOSYNTHESIS pathway, one of the most relevant pathways in both sample types, showed opposite results in the two sample types (Fig. 2D). These results suggest that SNCA may have different regulatory mechanisms in different sample types.

Relationship between genome-wide gene expression and SNCA expression
To further study the biological role of SNCA in PD, we obtained gene expression profiles closely related to SNCA based on genome-wide expression analysis. The R package WGCNA was used to identify SNCA co-expression modules in each data set. In GSE6613, a total of 19 modules were obtained, and the module related to SNCA expression was the brown module (Fig. 3A), with 485 co-expressed genes. In GSE99039, 11 modules were obtained, and the SNCA-related module was the red module (Fig. 3B), containing 184 genes. In GSE57475, 9 modules were obtained, and the SNCA-related module was the black module (Fig. 3C), containing 139 genes. In GSE49036, 17 modules were obtained, and the SNCA-related module was the turquoise module (Fig. 3D), containing 4,916 genes. In GSE20141, 22 modules were obtained, and the SNCA-related module was the turquoise module (Fig. 3E), containing 2,745 genes. In GSE7621, 28 modules were obtained, and the SNCA-related module was the turquoise module (Fig. 3F), containing 4,125 genes. We then analyzed the overlap of genes in the SNCA co-expression modules across data sets. As expected, there was little overlap among the serum data sets, more overlap among the substantia nigra data sets, and little overlap between the two sample types (Fig. 3G).
We then performed KEGG functional enrichment on the genes shared by the data sets within each sample type (FDR < 0.05). The serum intersection contained 61 genes and was enriched in one KEGG pathway, porphyrin and chlorophyll metabolism; the substantia nigra intersection contained 826 genes and was enriched in 22 pathways (WGCNA-Cross-gene). The most significant pathways were mainly PD, AD, and other neurodegenerative disease pathways (WGCNA-Tp-KEGG). Across all six data sets, six genes were co-expressed with SNCA: GLRX5, HAGH, MPP1, MKRN1, XK, and BABAM1. These genes may be potential regulators in the occurrence and development of PD.
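The overlap analysis described above (pairwise overlaps between SNCA co-expression modules, plus the genes shared by every data set) reduces to set operations. A minimal sketch, assuming each module is available as a set of gene symbols keyed by data set; the toy gene names in the usage note are hypothetical.

```python
from itertools import combinations


def module_overlaps(modules):
    """modules: dict dataset_id -> set of genes in its SNCA co-expression module.

    Returns (pairwise overlap sizes, genes present in every module).
    """
    pairwise = {
        (a, b): len(modules[a] & modules[b])
        for a, b in combinations(sorted(modules), 2)
    }
    common = set.intersection(*modules.values())  # genes shared by all data sets
    return pairwise, common
```

For example, with toy modules `{"D1": {"A", "B", "C"}, "D2": {"B", "C"}, "D3": {"C", "D"}}` the only gene shared by all three is `"C"`.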
Relationship between SNCA and lncRNA expression
lncRNAs play an important role in the occurrence and development of complex diseases. We re-annotated 2448 lncRNAs from the GPL570 chip (lnc_probe) and used the Pearson correlation coefficient to calculate the correlation between SNCA and each lncRNA (Fig. 4A). With FDR < 0.01 and |R| > 0.35, we obtained the 15 most correlated lncRNAs, all of which were positively correlated with SNCA expression (Fig. 4B). Among them, ST7OT2 is related to autism [34]; MEG3 is a tumor suppressor [35] involved in the invasion and metastasis of various tumors [36][37][38][39]; HILS1 is associated with the prognosis of gliomas [40]; and FLJ22536 is associated with the aggressiveness of neuroblastoma [41]. In short, these lncRNAs are involved in the development of many complex diseases.
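The selection step above (Pearson correlation with SNCA, FDR correction, then FDR and |R| thresholds) can be sketched as follows. The study used R; this stdlib Python sketch is only an illustration, and it assumes a Fisher z normal approximation for the correlation p-value, whereas R's `cor.test` uses the exact t distribution, so p-values will differ slightly. Function names and input layouts are hypothetical.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def fisher_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z transform)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite for |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh(pvals):
    """Benjamini-Hochberg adjusted p-values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    out, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        out[i] = running
    return out


def select_correlated(target, features, r_min=0.35, fdr_max=0.01):
    """Return {feature: r} for features with |r| > r_min and FDR < fdr_max."""
    names = list(features)
    rs = [pearson(target, features[f]) for f in names]
    qs = bh([fisher_p(r, len(target)) for r in rs])
    return {f: r for f, r, q in zip(names, rs, qs) if abs(r) > r_min and q < fdr_max}
```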

Relationship between SNCA and miRNA expression
To analyze the relationship between SNCA expression and miRNAs, we screened the GEO database for data sets in which both miRNA and SNCA expression were measured (GSE110716 and GSE110717) and extracted the miRNA and SNCA expression profiles, which covered a total of 2565 miRNAs (mir.exp). The Pearson correlation coefficient was used to calculate the correlation between SNCA and each miRNA (Fig. 5A). With p < 0.01 and |R| > 0.6, we obtained 40 miRNAs; their expression heat map together with SNCA is shown in Fig. 5B. Of these, 31 were negatively and 9 positively correlated with SNCA (mir.exp.cor), and the correlation heat map among these miRNAs is shown in Fig. 5C. Because miRNAs mainly repress gene expression post-transcriptionally by binding their target mRNAs, we further used the miRTarBase database to predict the target genes of the 31 negatively correlated miRNAs and selected the target genes shared by at least 70% of these miRNAs, 102 genes in total. These genes were mainly enriched in the calcium signaling pathway, dopaminergic synapse, and other nerve conduction-related pathways (Fig. 5D). The results indicate that these miRNAs may participate in the pathogenesis of neurodegenerative diseases by regulating SNCA expression.
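The 70% target-gene filter can be sketched as counting, for each candidate gene, the fraction of the selected miRNAs that target it. This sketch assumes the rule means "genes targeted by at least 70% of the negatively correlated miRNAs" and that the miRNA-to-target sets have already been exported from miRTarBase; both the interpretation and the function name are assumptions.

```python
def shared_targets(mirna_targets, min_fraction=0.7):
    """mirna_targets: dict miRNA -> set of target genes (e.g. from a miRTarBase export).

    Returns the genes targeted by at least min_fraction of the miRNAs.
    """
    n = len(mirna_targets)
    counts = {}
    for targets in mirna_targets.values():
        for gene in targets:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c / n >= min_fraction}
```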

Relationship between genome-wide methylation and SNCA expression
Epigenetic disorders are essential for the development of various neurodegenerative diseases such as PD [42]. To observe the SNCA-related epigenetic landscape in PD samples, we first analyzed all CpG sites in the SNCA promoter region. In both serum methylation data sets, the site cg15133208 was significantly hypomethylated in PD (Fig. 6A, B), which is consistent with previous reports. Because hypomethylation may affect SNCA expression and thus participate in the occurrence and progression of PD, we calculated the CpG sites significantly correlated with cg15133208 and selected sites with FDR < 0.01 and |R| > 0.6. Taking the intersection across the two methylation data sets yielded 735 CpG sites, all positively correlated with cg15133208 (methyl.cor). The heat maps of these most correlated methylation sites are shown in Fig. 6C, D; their hypomethylation is related to PD. These differentially methylated sites are mainly distributed in open-sea regions (Fig. 6E). Because gene expression is closely related to promoter methylation [43], we annotated the genes whose promoter regions contain these CpG sites, 670 genes in total (lst.cpgs.anno). These genes are enriched in the Rap1 signaling pathway, transcriptional misregulation in cancer, and other important biological pathways (Fig. 6F).
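The cross-data-set step above keeps only CpG sites that pass the FDR < 0.01 and |R| > 0.6 thresholds in both methylation data sets. A minimal sketch, assuming per-site correlation statistics against cg15133208 have already been computed in each data set; requiring the same correlation sign in both data sets is an added assumption, and the function name is hypothetical.

```python
def consistent_sites(stats_a, stats_b, r_min=0.6, fdr_max=0.01):
    """stats_a / stats_b: dict CpG id -> (r, fdr) for each site's correlation
    with the anchor site (here cg15133208) in one methylation data set."""

    def passing(stats):
        return {s for s, (r, q) in stats.items() if abs(r) > r_min and q < fdr_max}

    shared = passing(stats_a) & passing(stats_b)
    # Assumption: also require the correlation sign to agree between data sets.
    return {s for s in shared if stats_a[s][0] * stats_b[s][0] > 0}
```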

Discussion
The pathogenesis of PD is unclear, but previous studies have reported many abnormal molecular characteristics; for example, abnormal expression or altered epigenetic modification of PARK1-15, LRRK2, MAPT, and GBA is related to PD pathology [44,45]. These characteristics can be used as diagnostic factors and promote the understanding of PD pathogenesis. However, these molecules are only suitable for specific conditions and cannot meet the needs of clinical applications, so identifying robust molecular characteristics remains a challenge, and more research is required to clarify the mechanisms of PD occurrence and development. SNCA-encoded α-synuclein is the main component of Lewy bodies, and overexpressed α-synuclein is related to the pathogenesis of PD [11]. The abnormal accumulation of α-synuclein and the formation of Lewy bodies may trigger an inflammatory response, and SNCA may be regulated at multiple omics levels in PD. We systematically analyzed the diagnostic value of SNCA in PD: SNCA expression is low in PD in both serum and substantia nigra samples, which suggests that it can serve as an effective serum marker.
Genetics plays an important role in the etiology of many complex human diseases. Genes are the functional units of genetic material, and their expression affects the synthesis of downstream proteins, which are basic components of the human body. However, recent studies have shown that individual genes do not function alone; instead, genes interact and jointly affect human health. We therefore identified genes co-expressed with SNCA based on gene expression profiles. First, we used WGCNA to identify the SNCA co-expression module separately in each data set and then analyzed the intersection of genes in these modules across data sets for mutual verification. The co-expressed genes in the three substantia nigra data sets were highly reproducible, whereas those in the three serum data sets were poorly reproducible, which may be because blood samples are easily affected by the environment. As expected, the genes co-expressed in the substantia nigra samples are mainly enriched in neurodegenerative disease pathways such as Parkinson disease, Huntington disease, and Alzheimer disease. Finally, we identified six genes co-expressed with SNCA in all six data sets: GLRX5, HAGH, MPP1, MKRN1, XK, and BABAM1. Among them, GLRX5 has been associated with tumors [46], anemia [47], and schizophrenia [48]; HAGH with schizophrenia [49], autism [50], and other neurological diseases; MPP1 with tumor drug resistance [51,52]; and MKRN1 is involved in the development of gastric cancer [53]. These genes play important roles in the occurrence and development of a variety of complex diseases, and their co-expression with SNCA suggests that they may be new candidate markers for PD.
Recent studies have shown that genetic and epigenetic mechanisms play an important role in tumorigenesis. The former mainly concerns abnormal gene expression, while the latter mainly includes post-transcriptional regulation by microRNAs and DNA methylation. We therefore studied the possible mechanisms associated with SNCA expression at these three levels. First, using miRNA expression profiles, we identified the 40 miRNAs most correlated with SNCA expression, including 31 negatively correlated miRNAs. Functional enrichment showed that the target genes of these miRNAs were mainly enriched in the calcium signaling pathway, dopaminergic synapse, and other nerve conduction-related pathways, which suggests that abnormal expression of these miRNAs may play an important role in the occurrence of PD. Eryilmaz et al. found that the SNCA promoter region is hypomethylated in early-onset Parkinson's disease [54]. Based on DNA methylation profiles, we identified a significantly hypomethylated site in the SNCA promoter region in PD and further identified 735 CpG sites significantly correlated with this site. Cluster analysis revealed that the methylation of these sites was highly consistent and generally reduced in PD. These CpG sites are mainly distributed in open-sea regions. Functional annotation of these CpG sites identified 670 genes, which were enriched in many important biological pathways. This suggests that these abnormal methylation sites may play an important role in the epigenetic regulation of PD.
We have comprehensively analyzed the role of SNCA expression in the development of PD at multiple omics levels using bioinformatics approaches, and these results can help us better understand the occurrence and development of PD. However, our research also has some limitations. First, the sources of our sample cohorts are not uniform. Second, this work is a bioinformatics study: the accuracy and reliability of the results, including the diagnostic value of SNCA, were assessed through statistical significance and support from the scientific literature rather than through experimental validation. In future research, if conditions permit, we will add experimental designs and conduct more detailed mechanistic studies to further verify our conclusions.
In summary, using a large research cohort, this study provides direct evidence that low SNCA expression is related to the occurrence of PD, and the multi-omics data analysis clarifies potential mechanisms of SNCA in the occurrence and development of PD.

Conclusions
This study proposes low SNCA expression as a diagnostic biomarker for PD and explores the unique genomic and epigenetic landscape associated with SNCA expression. It provides ideas for a better understanding of the molecular basis of PD and more direct evidence for SNCA expression as a diagnostic biomarker or potential therapeutic target for PD patients.

Declarations
Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions
Yi Quan had full access to all of the study data and takes responsibility for the integrity and accuracy of the data analysis; Study concept and design: Yi Quan, Jia Wang; Critical revision of the manuscript for important intellectual content: All authors.

Figure 3. (A-F) Co-expression modules identified by WGCNA in the six data sets; (G) intersection of the genes in the SNCA co-expression modules across the six data sets.

Supplementary Files
This is a list of supplementary files associated with this preprint.