Data from the brain tissue (substantia nigra) of patients with PD were specifically selected for data mining. We obtained the PD gene expression data from the GEO database. Unlike the data in the TCGA database, the GEO data are scattered; therefore, we had to collect them manually. Via chip re-annotation, 1970 lncRNA probes were obtained to study the regulatory mechanism of the mRNA/lncRNA co-expression network in PD and the possible regulatory mechanisms of disease pathways. Such a large sample is rare in the study of PD and should help improve the reliability of the results.
The co-expression network analysis identified five WGCNA modules, among which the midnight-blue module was most significantly enriched in neurodegenerative diseases. Furthermore, enrichment analysis showed that the Parkinson’s disease (PD) pathway (hsa05012) was one of the representative pathways of this module. The remaining modules exhibited little overlap and were enriched in different pathways with different functions. For example, the turquoise module was enriched for the epithelial cell signaling in Helicobacter pylori infection pathway (hsa05120); previous studies have reported that Helicobacter pylori infection is associated with PD and that the ubiquitin-mediated proteolysis pathway (hsa04120) in astrocytic glutamine metabolism is associated with PD. Thus, we inferred that this module is also a pathogenic entity. Because its underlying mechanisms are complex, PD cannot be attributed to the dysfunction of a single pathway. Therefore, further data mining was performed on the disease-related modules, and 12 key lncRNAs were selected using the ROCR package in R. We then searched the PubMed literature to explore the relationship between these lncRNAs and PD. According to previous reports, AC093323.3 is differentially expressed in the midbrain of cocaine abusers, which suggests that these 12 lncRNAs can be used as potential diagnostic markers of PD. Subsequently, we used an SVM model to perform a disease prediction analysis with the 12 candidate lncRNAs. In 10-fold cross-validation on the PD dataset, the model built on the 12 lncRNAs achieved a sensitivity of 86.95% and a specificity of 76.92%, which further suggests that these 12 lncRNAs can be used as reliable biomarkers for PD diagnosis. Finally, the 12 selected lncRNAs were re-analyzed against the KEGG pathway database.
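To make the reported evaluation metrics concrete, the following minimal sketch shows how sensitivity and specificity can be computed from cross-validated predictions. It is an illustration under stated assumptions, not the code used in this study: the function names, the label coding (1 = PD, 0 = control), the toy folds, and the choice to pool predictions across folds before computing the metrics are all hypothetical, and the source does not state whether metrics were pooled or averaged per fold.

```python
def pool_folds(folds):
    """Concatenate (true_labels, predicted_labels) pairs from all CV folds."""
    y_true, y_pred = [], []
    for true_labels, predicted_labels in folds:
        y_true.extend(true_labels)
        y_pred.extend(predicted_labels)
    return y_true, y_pred


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of true cases that are detected.
    specificity = TN / (TN + FP): fraction of controls correctly rejected.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy folds (NOT data from this study): 1 = PD, 0 = control.
toy_folds = [
    ([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]),
    ([1, 1, 0, 0, 0], [1, 1, 0, 0, 0]),
]
y_true, y_pred = pool_folds(toy_folds)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Pooling the held-out predictions from every fold gives a single confusion matrix over all samples; averaging per-fold metrics is an equally common alternative and can give slightly different values on small datasets.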
We identified 10 positively correlated pathways for AC093323.3, including cancer-related pathways and PD, and 10 negatively correlated pathways, mainly related to the JAK/STAT signaling pathway; several negatively correlated pathways for AC120114.3, including those of Huntington’s disease, AD, and PD; and a negative correlation between LOC153684 and the AD pathway.
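The source does not state how the positive and negative correlations reported above were computed. One common approach, sketched here purely as an assumption, is to compute the Pearson correlation between an lncRNA's expression across samples and a per-sample activity score for each pathway, then list the top-ranked positive and negative pathways. The function names, the pathway labels, and the toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def rank_pathways(lncrna_expr, pathway_scores, top_k=10):
    """Split pathways into the top_k most positively and most negatively
    correlated with the lncRNA expression profile."""
    corr = {name: pearson(lncrna_expr, scores)
            for name, scores in pathway_scores.items()}
    ordered = sorted(corr.items(), key=lambda item: item[1], reverse=True)
    positive = [(name, r) for name, r in ordered if r > 0][:top_k]
    negative = [(name, r) for name, r in reversed(ordered) if r < 0][:top_k]
    return positive, negative


# Hypothetical toy values (NOT data from this study): one lncRNA, five samples.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = {
    "pathway_A": [2.0, 4.1, 5.9, 8.2, 10.0],  # rises with the lncRNA
    "pathway_B": [5.0, 4.2, 2.9, 2.1, 1.0],   # falls as the lncRNA rises
    "pathway_C": [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relationship
}
positive, negative = rank_pathways(lncrna, scores, top_k=2)
print("positive:", positive)
print("negative:", negative)
```

In practice, a significance threshold or multiple-testing correction would usually be applied before a correlation is reported; the sketch only ranks by sign and magnitude.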
We analyzed the lncRNA/mRNA network and related pathways in PD using bioinformatics techniques. These results can help elucidate the occurrence and development of PD. However, our study had some limitations. First, we used a probe re-annotation pipeline to identify functional lncRNAs related to PD; although this approach has been widely used in bioinformatics studies, we acknowledge that it filters out many lncRNAs that do not match any probe sequence. Second, in addition to gene expression, epigenetic- and protein-level information also plays a very important role in the drug-response mechanism; therefore, this information should be included in the prediction model. Third, in the field of bioinformatics, the validity of results is often assessed based on statistical significance and literature verification, and only these were used here to validate the accuracy and reliability of the lncRNA/mRNA network, the lncRNA-related functional modules, and the diagnostic potential of the lncRNA biomarkers; experimental validation is still needed.
In this study, we used the GEO database to systematically analyze potential lncRNA molecular markers in PD based on lncRNA re-annotation. We screened out 12 lncRNAs and verified them with an SVM model, with satisfactory results. We conclude that the expression of these 12 lncRNAs may be related to the occurrence and development of PD. This study provides new molecular entities for the diagnosis of PD, which may promote the early detection of this disease and the development of personalized therapies.