MicroRNA Candidate Biomarkers for Parkinson’s Disease and Idiopathic REM Sleep Behavior Disorder

Ligang Wu (lgwu@sibcb.ac.cn), Institute of Biochemistry and Cell Biology, State Key Laboratory of Molecular Biology, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, https://orcid.org/0000-0003-4010-9118
Jun Liu, Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital, https://orcid.org/0000-0001-8300-8646
Yuanyuan Li, Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital
Ying Cao, Chinese Academy of Sciences State Key Laboratory of Molecular Biology
Wei Liu, Chinese Academy of Sciences State Key Laboratory of Molecular Biology
Hongdao Zhang, Chinese Academy of Sciences State Key Laboratory of Molecular Biology
Aonan Zhao, Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital
Ningdi Luo, Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital

Conclusions: The current study provides a valuable and highly informative dataset of EV-associated sncRNAs from the plasma of iRBD and PD patients. We identified miRNA signature features that could serve as minimally invasive, blood-based surveillance biomarkers for distinguishing iRBD or PD from healthy individuals with high sensitivity, specificity, and accuracy.

Background
Parkinson's disease (PD) is the second most common neurodegenerative disease, affecting approximately 1% of people over the age of 60 globally, and imposes increasingly heavy social and economic burdens on aging societies 1,2. Idiopathic rapid eye movement (REM) sleep behavior disorder (iRBD) is a parasomnia characterized by the loss of normal atonia during the REM stage of sleep, which results in overt motor behaviors that frequently represent the enactment of dreams 3. In the past decade, iRBD has been established as one of the earliest and most specific prodromal signs of α-synucleinopathies, including PD, dementia with Lewy bodies, and multiple system atrophy 3. Thus, accessible and reliable biomarkers for early diagnosis of PD and iRBD are urgently needed to identify candidate therapeutic targets and to monitor disease progression during therapeutic interventions 4,5.
Currently, the diagnosis of PD mostly relies on clinical symptoms, which hampers the detection of the earliest phases of the disease, the time at which treatment may have the greatest therapeutic effect. A variety of biomarkers for diagnosing PD are under investigation, including factors based on pathological, imaging, biochemical, and genetic data 4. Biofluid biomarkers have expanded markedly over the past 5 years and include α-synuclein, lysosomal enzymes, and neurofilament light chain in cerebrospinal fluid (CSF) 6,7. Blood biomarkers such as α-synuclein are also under investigation; however, their measured levels are strongly influenced by red blood cell (RBC) contamination and haemolysis, which limits their utility for diagnostic purposes 8. In addition, although non-invasive neuroimaging techniques can provide in-depth information about brain structure and function, these methods require major investments in infrastructure, which limits their wide clinical deployment 9. Therefore, developing a minimally invasive test for early detection of iRBD and PD and for monitoring disease progression remains a major challenge.
Among potential biomarkers, microRNAs (miRNAs) are small non-coding RNAs (sncRNAs) that are generally expressed in all eukaryotic cells and perform critical regulatory functions at the posttranscriptional level. Cellular miRNAs released into body fluids can be readily detected, making them ideal biomarkers for the diagnosis of various diseases 10. Previous studies have reported that salivary miR-153 and miR-223 can be used as biomarkers for idiopathic PD 11, while serum miR-221 is a potential predictor of PD 12. In addition, circulating brain-enriched miRNAs have been used to distinguish idiopathic and genetic PD 13. However, all of these studies used reverse transcription followed by real-time quantitative PCR (RT-qPCR) to measure the relative expression of a few miRNAs in body fluids. Since this method detects only a limited number of known miRNAs and is also inefficient at distinguishing miRNAs with similar sequences 14, the most diagnostically informative miRNAs could be overlooked.
Extracellular vesicles (EVs) are nano-scale, membrane-enclosed particles released from (possibly) all eukaryotic cells to transport proteins, lipids, RNAs, and DNA fragments 15, a process shown to have important biological functions 16,17. EVs have been found in serum, plasma, urine, saliva, cerebrospinal fluid, and breast milk 15. Moreover, the EV membrane effectively prevents degradation of the enclosed miRNAs by ribonucleases that are abundant in biofluids 18,19. Together, these features make EV-associated miRNAs more reliable biomarkers than unprotected, cell-free miRNAs [20][21][22][23].
However, the full spectrum of EV-associated miRNAs present in the plasma of iRBD and PD patients remains unknown, and characterizing the miRNA population in PD or iRBD patients may unveil several effective diagnostic biomarkers. In this study, we used high throughput sequencing to profile the EV-associated sncRNA population in the plasma of iRBD and PD patients. Here, we report several such candidate miRNA biomarkers for the diagnosis of iRBD and PD.

Study population
A total of 169 participants were included in this study and divided into three groups: 60 healthy individuals, 56 patients with iRBD, and 53 patients with PD. The healthy individuals were community volunteers without neurodegenerative disorders. The diagnosis of iRBD was based on video-polysomnography evidence according to standard International Classification of Sleep Disorders-II criteria. All iRBD patients were examined by neurologists to exclude those with motor signs of parkinsonism or secondary causes. The diagnosis of PD was made according to the International Parkinson and Movement Disorder Society (MDS) diagnostic criteria by at least two neurologists skilled in movement disorders 24, and all of the patients were diagnosed with idiopathic PD.
This study was approved by the ethics committee of Ruijin Hospital affiliated with the Shanghai Jiao Tong University School of Medicine and was carried out at the Department of Neurology and Institute of Neurology of Ruijin Hospital. All participants or their guardians provided written informed consent.

Clinical assessment
For iRBD and PD patients, essential demographic and clinical information, including a study questionnaire for motor and nonmotor manifestations of their disease, was collected and documented.
The motor subscale of the Unified PD Rating Scale (UPDRS) was used to evaluate motor symptoms while patients were ON medication. iRBD symptoms and their severity were evaluated with the REM Sleep Behavior Disorder Screening Questionnaire (RBDSQ). Nonmotor symptoms and autonomic dysfunction were evaluated with the Non-Motor Symptom Questionnaire (NMSQ) and the Scale for Outcomes in PD-Autonomic (SCOPA-AUT), respectively. Depressive state was measured using the 17-item Hamilton Depression Rating Scale (HAMD), and the Sniffin' Sticks 16-item test (SS-16) was performed to assess olfactory function.

Plasma sample collection
Plasma samples (0.5-1 ml) were collected from subjects in the fasting state. For each blood collection, 2 ml of venous blood was collected into 10-ml BD K2EDTA Vacutainer tubes (BD; Cat# 367525) to prevent coagulation. The anticoagulant-treated blood samples were immediately inverted several times and transferred to 2-ml conical tubes (Eppendorf; Cat# 0030120094). To obtain plasma, blood samples were centrifuged at 1,300 × g for 10 min at room temperature (RT). The upper layer containing plasma was transferred to new tubes and centrifuged twice at 2,500 × g for 15 min at RT to obtain platelet-poor plasma (PPP). Each plasma supernatant was carefully transferred to fresh 1-ml tubes and then stored at -80 °C until further use.

EV isolation and RNA extraction
Frozen plasma samples were thawed in a thermostat water bath for 2 min at 37 °C and then centrifuged at 2,500 × g for 15 min at 4 °C to remove precipitated proteins and lipids. Approximately 0.5-1 ml of plasma was used as the starting material for EV isolation with an exoRNeasy Midi kit (Qiagen; Cat# 77144) according to the manufacturer's instructions. For EV-RNA extraction, 1 ml of TRIzol reagent (Invitrogen; Cat# 15596026) was added to the column, which was centrifuged at 3,000 × g for 1 min. Total EV-RNA was isolated with TRIzol reagent according to the manufacturer's recommendations. The RNA pellet was dissolved in 11 μl of deionized water treated with diethylpyrocarbonate (DEPC); 1 μl was used to measure the EV-RNA yield with a Quant-iT RiboGreen RNA Assay Kit (Thermo Fisher Scientific; Cat# R11490), and 2 μl was used to examine quality and size distribution on a 2200 Bioanalyzer (Agilent Technologies, CA, USA). The remaining 8 μl was used for sncRNA library construction and sequencing.
For EV characterization, 1 ml of XE buffer was added to the column, which was centrifuged at 3,000 × g for 1 min. Eluates were concentrated by ultrafiltration using Amicon Ultra-0.5 ml Centrifugal Filters with a molecular weight cutoff of 100 kDa (Millipore; Cat# UFC5003). EVs were resuspended in 40 μl of phosphate-buffered saline (PBS) and were then ready for downstream applications, including transmission electron microscopy (TEM), nanoparticle tracking analysis (NTA), and western blotting.

Transmission electron microscopy (TEM) analysis
Purified, concentrated EVs in PBS were mixed with an equal volume of 4% paraformaldehyde (PFA). Five microliters of resuspended pellets were added to a formvar-carbon-coated EM grid and absorbed for 20 min in a dry environment. The grids were washed once with PBS for 1 min, then once with 1% glutaraldehyde for 5 min, and then seven times with Milli-Q water for 2 min per wash. The grids were then negatively stained with 2.5% uranyl-oxalate solution for 10 min and air-dried for 5 min under incandescent light. Images were acquired on a Tecnai G2 Spirit TEM (FEI, The Netherlands) with a wide-angle AMT 2k CCD camera operating at 120 kV at the Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences.

Nanoparticle tracking analysis (NTA)
The particle size and concentration of purified EVs in PBS were analyzed using a ZetaView PMX 110 (Particle Metrix, Germany) equipped with a 405-nm laser and a high-sensitivity sCMOS camera. The ZetaView system was calibrated using 110 nm polystyrene particles. All measurements were performed at RT. NTA measurements were recorded and analyzed at 11 positions for 60 seconds at a frame rate of 30 frames/second. The result for each sample is presented as the average of 3 measurements. Particle movement was analyzed using NTA software (ZetaView 8.04.02 SP2).

Construction of the cDNA libraries from plasma EV-RNA
EV-RNA obtained from the plasma of healthy controls, iRBD patients, and PD patients was profiled by EVsmall-seq. For each sample, 0.5 ng of RNA was used as input material for generating sncRNA libraries. The ligation reaction for the 5' and 3' adapters was performed according to a previously described procedure with some modifications (manuscript in preparation). In brief, adapter-ligated RNA was mixed with ProtoScript II Reverse Transcriptase, RNase Inhibitor, and reaction buffer (NEB; Cat# M0358), and incubated for 1 hour at 44 °C. After PCR amplification of the RT products, sequence-specific sgRNAs were assembled with Cas9 to cleave self-ligated adapters and RNA contaminants in the libraries. The oligo sequences of the 3' adapter, biotin-RT primer, 5' adapter, adapter-sgRNA, ysRNA-sgRNA (Y RNA-derived small non-coding RNA sgRNA), P5 primer, and P7 primer used for sncRNA library construction are listed in Additional file 1. Size selection of PCR products was performed by high-resolution polyacrylamide gel electrophoresis on 6% gels. For miRNAs, the bands corresponding to 135-145 bp were purified for sequencing.

High throughput sequencing and data analyses of sncRNAs
All the cDNA libraries of sncRNAs were sequenced on a NovaSeq (Illumina). We used cutadapt to clip adapters and filter out low-quality reads 25. Reads that failed to match the adapter or were shorter than 17 nt were discarded. Redundant sequences were collapsed into useful reads for further analysis 26. We then aligned the reads to reference sequences with bowtie 27. Reads that matched the 5' start site of an annotated miRNA and whose 3' end differed from the annotation by at most 3 nt of deletion and/or 3 nt of additional sequence derived from the pre-miRNA were counted toward the abundance of that miRNA. The miRNA expression level was normalized by size factor 28. SncRNAs with more than one annotation were assigned in the following order of priority: miRNA, ysRNA, tsRNA (tRNA-derived small non-coding RNA), rsRNA (rRNA-derived small non-coding RNA), snRNA, snoRNA, lncRNA, and mRNA. Sequences that were not annotated with any of the RNA categories above were classified as others. The miRNA expression profiles in human tissues were downloaded from TissueAtlas 29. Gene ontology enrichment analysis of miRNA target genes was performed with miRPathDB 30. miRNA target sites in the corresponding genes were predicted with TargetScan 31.
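The read-counting rule above can be illustrated with a small sketch. This is not the authors' pipeline: the function name `counts_as_mirna`, its parameters, and the toy sequences are hypothetical, and the sketch assumes that "additional sequence derived from pre-miRNAs" means the read must follow the precursor sequence exactly beyond the annotated 3' end.

```python
def counts_as_mirna(read, mature, precursor, max_trim=3, max_extend=3):
    """Return True if `read` would be counted toward the annotated miRNA.

    Assumed rule: the read starts at the annotated 5' site of `mature`
    within `precursor`, and its 3' end is at most `max_trim` nt shorter or
    `max_extend` nt longer than the annotation, with any extension
    templated by the precursor.
    """
    start = precursor.find(mature)
    if start < 0:
        raise ValueError("mature sequence not found in precursor")
    # Length window permitted by the 3' end rule.
    if not (len(mature) - max_trim <= len(read) <= len(mature) + max_extend):
        return False
    # Same 5' start, and every base (including any 3' extension) must
    # agree with the precursor sequence.
    return precursor[start:start + len(read)] == read


# Toy sequences, invented for illustration only.
MATURE = "CCCGGGTTTACGTACGTACG"
PRECURSOR = "AAA" + MATURE + "TTTGGA"

print(counts_as_mirna(MATURE, MATURE, PRECURSOR))          # exact match
print(counts_as_mirna(MATURE[:-3], MATURE, PRECURSOR))     # 3 nt shorter
print(counts_as_mirna(MATURE + "TTT", MATURE, PRECURSOR))  # templated extension
print(counts_as_mirna(MATURE + "AAA", MATURE, PRECURSOR))  # non-templated extension
```

Anchoring on the 5' start site keeps reads with a shifted seed region out of the count, while the tolerance at the 3' end accommodates the common 3' isoforms of a miRNA.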

Statistical analysis
Statistical analysis was performed in R. Batch correction was conducted with limma 34. Differentially expressed miRNAs were identified with limma 34 (P value < 0.05, fold change > 1.4). We used the prcomp function for principal component analysis (PCA), which was visualized with ggplot2 35. Unsupervised hierarchical clustering analysis was conducted and visualized with pheatmap 36. Comparisons between two groups were performed with a Wilcoxon rank sum test, whereas comparisons among three or more groups were performed with a Kruskal-Wallis rank sum test. The statistical tests were performed and visualized with ggplot2 and ggpubr 37. miRNAs were considered to have increasing expression in the healthy-iRBD-PD hierarchy if their mean expression values satisfied healthy < iRBD < PD and the P value from the Kruskal-Wallis rank sum test was < 0.05; miRNAs were considered to have decreasing expression in the healthy-iRBD-PD hierarchy if their mean expression values satisfied healthy > iRBD > PD and the P value from the Kruskal-Wallis rank sum test was < 0.05.
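The trend criterion described above can be sketched as follows. This is an illustrative Python version, not the authors' R code; the function names are hypothetical. It uses the tie-corrected Kruskal-Wallis H statistic with the usual chi-square approximation, which for three groups (2 degrees of freedom) has the closed-form tail probability exp(-H/2).

```python
import math
from statistics import mean


def kruskal_wallis_3(groups):
    """Kruskal-Wallis rank sum test for exactly three groups; returns (H, p)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Chi-square survival function with 2 degrees of freedom.
    return h, math.exp(-h / 2)


def classify_trend(healthy, irbd, pd, alpha=0.05):
    """Return 'increasing', 'decreasing', or None for one miRNA."""
    _, p = kruskal_wallis_3([healthy, irbd, pd])
    if p >= alpha:
        return None
    m_h, m_i, m_p = mean(healthy), mean(irbd), mean(pd)
    if m_h < m_i < m_p:
        return "increasing"
    if m_h > m_i > m_p:
        return "decreasing"
    return None
```

Called once per miRNA with the normalized expression values of each group, `classify_trend` returns `None` both when the test is not significant and when the group means are not strictly ordered, so the two conditions are applied jointly as in the text.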
Features were selected with the R package Boruta to find all relevant variables for machine learning 38. Classification of different groups was carried out with the svm function in the R package e1071 39. The log2-transformed normalized read counts of every miRNA were defined as the input data. Models were trained on a training set (60% of the data) and evaluated on a validation set (the remaining 40% of the data), with an equal distribution of individuals in the disease groups (Supplementary Table S1). The diagnostic efficacy was evaluated by receiver operating characteristic (ROC) curve analysis for the training and validation cohorts 40. The areas under the curve (AUCs) of different classifiers were compared by the bootstrap method with 100 iterations.
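The evaluation scheme above (a class-balanced 60/40 split, AUC, and a bootstrap comparison of two classifiers) can be sketched in plain Python. This is a simplified illustration rather than the authors' R workflow: all function names are hypothetical, and the bootstrap shown here resamples individuals without stratification, which may differ from the procedure actually used.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, train_frac=0.6, seed=0):
    """Split sample indices so each class contributes ~train_frac to training."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)


def bootstrap_auc_difference(scores_a, scores_b, labels, n_boot=100, seed=0):
    """Bootstrap the AUC difference (classifier A minus classifier B)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that lost one class
            continue
        a = auc([scores_a[i] for i in idx], ys)
        b = auc([scores_b[i] for i in idx], ys)
        diffs.append(a - b)
    return diffs
```

The spread of the returned differences indicates whether one classifier's AUC is consistently higher than the other's on the same individuals; with 100 iterations, as in the text, the resulting estimate is coarse.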

Results
Plasma sample collection and EV-RNA isolation
A total of 169 participants were enrolled in this study and divided into three groups consisting of 56 iRBD patients, 53 PD patients, and 60 healthy individuals. The demographics and clinical characteristics of the enrolled participants are listed in Table 1. No significant differences were found in age distribution among the three groups, although the PD group contained a significantly higher proportion of female subjects. The average disease duration was 7 years for iRBD patients and 5 years for PD patients. In addition, the iRBD group exhibited a higher average RBDSQ score than the PD group (P < 0.001), while PD patients showed lower SS-16 (P = 0.03) and higher SCOPA-AUT (P = 0.001) scores, on average, than iRBD patients. The HAMD scores were higher in PD patients than in healthy controls (P < 0.001). No significant differences in NMSQ and HAMD scores were observed between the iRBD and PD groups.
To ensure that high-quality plasma EVs were obtained from patients, we established a procedure for plasma sample collection and protocols for EV isolation (Supplementary Fig. S1A; see also Materials and Methods). EVs were isolated from patient plasma using an exoRNeasy Midi kit (Qiagen). The morphology and size distribution of EVs were evaluated by transmission electron microscopy (TEM) and nanoparticle tracking analysis (NTA). The size of isolated EVs ranged from 30 to 500 nm with a median diameter of 105 nm (Supplementary Fig. S1B, C). Furthermore, enrichment for the EV markers Alix and CD63 was observed in the EVs isolated from the plasma samples (Supplementary Fig. S1D). In contrast, Calnexin and GRP94 were undetectable in isolated EVs, which indicated the absence of contamination by endoplasmic reticulum proteins (Supplementary Fig. S1D). Total EV RNA was then extracted using TRIzol reagent, which is the gold standard for RNA extraction and highly efficient for isolating small RNAs, in place of QIAzol. Aliquots of purified EV RNAs were further analyzed with a Quant-iT RiboGreen RNA Assay Kit and a 2200 Bioanalyzer. The EV RNA obtained from 0.5 ml of plasma ranged between 0.61 and 2.93 ng and showed a normal size distribution (Supplementary Fig. S1E), consistent with a previous study 20. Approximately 0.5 ng of EV-RNA that passed the quantity and quality control criteria was used for small-RNA library construction and sequencing analyses.

Optimization of the sncRNA sequencing library construction method for low RNA inputs
Profiling the relative abundance of specific sncRNAs by high throughput sequencing typically requires tens of nanograms of total RNA 41,42. Our initial attempts to construct a high-quality EV-associated sncRNA library using commercially available kits were unsuccessful, mainly because at low RNA inputs the 5' and 3' adapters generate the predominant ligation byproducts, which severely compromise subsequent PCR amplification and sequencing quality. We therefore optimized the conditions for 5' and 3' adapter ligation and introduced an extra 5' exonuclease treatment step that effectively reduced adapter dimer formation. Previous studies have shown that Y RNA-derived small non-coding RNAs (ysRNAs) can be the most abundant sncRNA type in plasma 43,44, and the prevalence of ysRNAs can significantly increase sequencing cost and interfere with the amplification of other, low-abundance sncRNAs during cDNA library construction. Since sgRNA-guided Cas9 nuclease is capable of cleaving double-stranded DNA bearing a protospacer adjacent motif (PAM) sequence both in vitro and in vivo, we designed sgRNAs and introduced a Cas9/sgRNA in vitro cleavage step to reduce both the ysRNAs (Supplementary Fig. S1F) and the adapter dimers (Additional file 1). The EV RNAs (0.5 to 2 ng) treated with Cas9/sgRNA showed comparable and clear miRNA bands, indicating that we were able to reduce the input of EV RNA to 1 ng or less (Fig. 1A). The Cas9/sgRNA in vitro cleavage step effectively reduced both ysRNAs and adapter dimer byproducts and enabled a straightforward PAGE-based method for detection and recovery of miRNA products (~140 bp) (Fig. 1B, C). Although the Cas9/sgRNA treatment could not remove ysRNAs completely, the cleavage efficiency could be further increased by raising the Cas9/sgRNA concentration in the reaction.
Comparison of the sequencing results of paired samples with or without Cas9/sgRNA treatment showed high correlations among miRNA profiles (Spearman correlation coefficient ≥ 0.99) (Fig. 1D, E), which indicated that Cas9/sgRNA treatment had no obvious effect on the abundance of individual miRNAs. Through these modifications, we successfully reduced the total required EV-RNA input to 0.5 ng. We named this optimized library construction method for sncRNA sequencing with low EV-RNA inputs EVsmall-seq.

Sequencing EV-associated sncRNAs in plasma samples
We then used EVsmall-seq for high throughput sequencing-based profiling of EV-associated sncRNA expression in 169 human plasma samples. Each library was sequenced to a mean depth of 19 million reads (median depth: 15 million) (Fig. 1F, Additional file 2), which is sufficient for the detection of low-abundance sncRNAs with 0.5 ng of input EV-RNA (Supplementary Fig. S2). After low-quality and short reads were discarded, the average proportion of high-quality reads in these samples was 82.3% (Fig. 1G), and 65.2% of these high-quality reads could be mapped to the reference sequences for sncRNAs (Fig. 1H), indicating the overall high quality of the sncRNA libraries. We found that miRNAs (17-24 nt), ysRNAs (25-33 nt), tRNA-derived small non-coding RNAs (tsRNAs, 30-33 nt), and rRNA-derived small non-coding RNAs (rsRNAs, 17-20 nt) were the most prevalent types of sncRNAs associated with EVs (Fig. 1I, J). On average, 390 different miRNA species were detected in each sample (average normalized read count > 1, expressed in > 25% of all samples) (Fig. 1K, Additional files 2-3).

Identification of iRBD-specific miRNA biomarkers
Given the high risk that iRBD patients will develop α-synucleinopathies such as PD, PD dementia, dementia with Lewy bodies, or multiple system atrophy, together with the high specificity of iRBD and the long interval between iRBD onset and clinical manifestation of α-synucleinopathies, the prodromal phase of this disorder represents a unique opportunity for potential disease interventions 45. To identify patients in this early stage of pathogenesis, we further explored whether EV miRNAs could distinguish iRBD patients from healthy individuals. A total of 75 upregulated and 46 downregulated miRNAs were identified among the plasma EV-associated sncRNAs of iRBD patients compared with healthy subjects (Fig. 3A, B, Additional file 6). Unsupervised hierarchical clustering and PCA based on the differentially expressed miRNAs separated iRBD patient samples from control samples with minor overlap (Fig. 3C).

Discussion
Accessible and reliable biomarkers for early diagnosis of PD and iRBD are urgently needed to identify candidate therapeutic targets and to monitor disease progression during therapeutic interventions 5. In this study, we investigated the expression of sncRNAs associated with plasma-derived EVs from 169 individuals using an improved library preparation protocol for high throughput sequencing. Our analysis revealed 16 EV-associated miRNA features that were diagnostically informative for PD, and three miRNA signature features that could serve as biomarkers for iRBD. These results demonstrate that high throughput sequencing-based detection and quantification of sncRNA expression can provide more informative profiles that include all types and relative abundances of sncRNAs and their sequence variants, thereby significantly improving the overall performance of machine learning diagnostic classifiers and ultimately the AUC values of miRNA biomarkers 11,13.
However, the current study also has several limitations. First, because this was a single-center study, it is uncertain whether these predictive markers are applicable to other populations with different environmental exposures or genetic backgrounds (e.g., ethnicity). Future work with larger cohorts from multiple centers is required to confirm these results, ideally through a prospective validation study. Second, only the miRNA signature was used in this study. A combination of miRNAs with v-PSG, neuroimaging, and neuropathological assessment may improve the sensitivity and specificity of diagnosis for prodromal PD (i.e., iRBD) and PD.

Conclusion
In conclusion, we established a cDNA library construction method, named EVsmall-seq, for high throughput sequencing of sncRNAs in plasma EVs using as little as 0.5 ng of total RNA input, and identified miRNA signature features that could serve as biomarkers for distinguishing iRBD and/or PD from healthy individuals with high sensitivity and accuracy. Moreover, our study provides a valuable resource of sncRNA profiles in plasma EVs from iRBD and PD patients and reveals an effective and non-invasive diagnostic strategy, with relevant biomarkers, for neurodegenerative diseases.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. The deep sequencing data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus under accession number GSE166070 (secure token evijmiektjqhngx).

Ethics approval and consent to participate
The study was reviewed and approved by all countries' respective Ethics committees and all participants signed an informed consent to take part in the research.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

[Figure caption fragment: miRNAs with consistent increases or decreases in expression from healthy to iRBD to PD samples]