Bilateral Globus Pallidus Internus Deep Brain Stimulation for Parkinson’s Disease: Therapeutic Effects and Motor Outcomes Prediction

Background: Deep brain stimulation (DBS) has emerged as a highly effective surgical treatment for advanced Parkinson’s disease (PD). A good response in the levodopa challenge test has been suggested as a criterion to identify optimal candidates for surgery. However, the response to levodopa and the response to DBS are not always congruent, and the predictive value of the levodopa test remains controversial. This study set out to identify predictors of response to DBS and to develop a novel prediction model for evaluating DBS candidacy. Methods: We retrospectively analyzed 62 consecutive PD patients who underwent bilateral globus pallidus internus (GPi) DBS from 2016 to 2019. Changes in UPDRS-III (Unified Parkinson’s Disease Rating Scale part III) total scores and subscores at one-year follow-up after surgery were evaluated, and potential predictor variables were collected. In a training cohort of 29 patients, we developed a novel machine learning method that applies 5-fold cross-validation to these variables to predict GPi DBS treatment outcomes in a multivariate linear analysis. The model was then externally validated in another cohort of 33 PD patients treated with GPi DBS. Results: GPi DBS significantly improved the postoperative motor function of PD patients. The overall UPDRS-III score improved by 30.4%, with the greatest improvement in tremor (75.0%), followed by limb bradykinesia (27.5%), rigidity (27.3%), and axial bradykinesia (22.4%). Most notably, improvement in tremor could be predicted with high accuracy using this prediction model (adjusted R² = 0.82 for absolute improvement and adjusted R² = 0.76 for relative improvement), in which the off-medication tremor subscore was identified as the most powerful preoperative predictor. In the external validation cohort, the machine learning method showed good predictive performance. Conclusions: We confirmed the effects of bilateral GPi DBS at one-year follow-up. The good performance of the present prediction model demonstrates the utility of machine-learning-based prediction of motor response after GPi DBS from preoperative clinical variables.

However, subtle differences between targets exist, and the choice of the single best surgical target for DBS remains controversial despite extensive research in the field (10).
Careful selection of suitable patients is the first step in optimizing the efficacy of DBS and avoiding an unsatisfactory outcome after surgery. Candidacy for DBS in PD is typically assessed by the preoperative motor response to levodopa using the levodopa challenge test (LCT), along with an interdisciplinary evaluation (9). According to the core assessment program for surgical interventional therapies in PD (CAPSIT-PD protocol), a levodopa-induced reduction of motor symptoms by > 30% on the UPDRS III (Unified Parkinson Disease Rating Scale III) has been suggested as a criterion to identify optimal candidates for surgery (11). However, the response to levodopa is not always congruent with the effect of DBS, and the predictive value of the levodopa test remains controversial (12,13). While preoperative levodopa responsiveness has proved to be predictive of DBS efficacy on motor function and activities of daily living (14)(15)(16), it failed to show predictive value for improvement of disease-specific quality of life in a recent study (17). Therefore, it is critical to develop novel prediction models that reliably predict postoperative motor response and evaluate DBS candidacy for individual patients. Such a prediction tool would help clinicians improve patient counselling, expectation management, and postoperative patient satisfaction. Notably, while previous prediction studies have reported inconclusive results on preoperative factors that may affect surgical outcomes of STN DBS, data on predictors of surgical outcomes of GPi DBS in PD patients are lacking so far.
Machine learning approaches based exclusively on routinely collected clinical data, without the need for any manual processing, are increasingly used in medical practice to predict clinical outcomes (18). In contrast to traditional statistics, predictive machine learning models generate outcome predictions for new, individual patients, instead of correlations between pre- and postoperative variables at the group level.
Beyond reproducing valid clinical decisions, machine learning in PD has been suggested to help make challenging clinical decisions (19). For example, one study used age and neuroimaging data to predict individual patient response to dopaminergic therapy for PD (20).
In the present retrospective study, we aimed to report the development and proof-of-concept of a machine learning prediction model that predicts motor outcomes one year after GPi DBS for individual PD patients.

Patients
We retrospectively studied 62 consecutive PD patients who underwent bilateral GPi-DBS at Ruijin Hospital (Shanghai, China) from November 2016 to July 2019. A total of 29 patients assessed by movement disorder specialists in the Department of Neurology were studied as the training group. Another 33 PD patients assessed by different movement disorder specialists in the Department of Neurosurgery were enrolled in the validation study. All patients enrolled in the study provided written informed consent, and the hospital ethics committee approved the study.
The inclusion criteria for surgery were as follows: (1) diagnosis of idiopathic PD based on the 2015 MDS-PD Criteria (21); (2) response to L-dopa following a preoperative L-dopa challenge test (see below); (3) disabling motor fluctuations or dyskinesia despite all drug strategies; (4) informed consent for the surgery; and (5) good general health and ability to attend regular postoperative programming and follow-ups. The exclusion criteria were as follows: (1) contraindication for neurosurgery or high-field magnetic resonance imaging (MRI); (2) severe dementia or neuropsychiatric disorders; and (3) organic cerebral abnormalities.

Surgical procedure and programming
All patients underwent 3.0 T MRI before surgery. Surgical procedures were performed under general anesthesia. We applied the Leksell stereotactic frame to the patient's head, followed by a head CT scan.
The specific target coordinates and trajectory were defined using the SurgiPlan system after coregistration of the MRI and CT images, targeting the posterior GPi. The implantable pulse generator (IPG) was placed subclavicularly and was connected to the electrodes via subcutaneous wires. A postoperative imaging scan was performed to confirm satisfactory placement of the DBS leads and the absence of complications.
The initial IPG programming was performed on the day after surgery. Parameters including voltage, pulse width, and frequency were optimized within the first 3 months after surgery and adjusted by two experienced movement disorder specialists, who followed the Chinese standardized protocol (22). The contacts were individually tested to inspect patients' motor response and assess side effects. Preferably, the initial parameters were set to monopolar mode, with a pulse width of ~90 µs and a frequency of ~135 Hz, and a stepwise increase in amplitude according to the patient's response.

Clinical Assessment
Patients were evaluated preoperatively and 12 months postoperatively. A detailed medical history was taken for each participant. This involved confirming the patient's age, gender, date of PD diagnosis, date of first intervention with antiparkinsonian medication, and current medications. The L-dopa equivalent dose (LED) of the preoperative medication regimen was calculated according to Tomlinson et al. (23). Motor function was evaluated preoperatively using the Movement Disorder Society Unified Parkinson Disease Rating Scale-Motor Part (MDS UPDRS-III) (24) and was scored in both off (MedOff) and on (MedOn) antiparkinsonian medication conditions. The motor examination (part III) was performed to provide a clinically defined "medication OFF" motor score (PreOFF). Patients were then evaluated in a "medication ON" state, when the best clinical response was obtained following a dose of levodopa (PreON).
Patients underwent the levodopa challenge test according to the CAPSIT-PD protocol. For this test, participants visited the research center after at least 12 h without intake of PD medication (the practically defined "medication OFF" state), allowing for an appropriate washout of levodopa. A single suprathreshold dose of L-dopa (1.5 times the usual effective morning dose) (25) was subsequently administered for the MedOn condition, which was scored when the patient and the investigator agreed that the best functional benefit had been achieved. The preoperative response to levodopa was calculated as the difference between the off- and on-medication UPDRS-III scores: Absolute improvement from levodopa = PreOFF − PreON.
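The levodopa response calculation can be illustrated with a short sketch. The absolute form follows the definition given above; the percentage form (absolute change divided by PreOFF) is the usual convention and is an assumption here, as are the function name and the example scores. The 30% cut-off is the CAPSIT-PD criterion cited in the introduction.

```python
def levodopa_response(pre_off, pre_on):
    """Absolute and percentage levodopa response from UPDRS-III scores.

    pre_off / pre_on are the preoperative MedOff and MedOn scores.
    The percentage form is an assumed (conventional) definition.
    """
    if pre_off <= 0:
        raise ValueError("PreOFF must be positive to compute a percentage")
    absolute = pre_off - pre_on
    relative_pct = 100.0 * absolute / pre_off
    return {
        "absolute": absolute,
        "relative_pct": relative_pct,
        # CAPSIT-PD suggests a reduction of more than 30% as a criterion.
        "meets_capsit_threshold": relative_pct > 30.0,
    }


# Hypothetical scores, for illustration only.
example = levodopa_response(pre_off=50, pre_on=30)
print(example)
```

Keeping both forms side by side makes it easy to pass either the absolute scores or the percentage change to a downstream model.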
Postoperatively, all patients were tested under two conditions: medication and stimulation both 'OFF' (MedOff/StimOff), and medication 'OFF' with stimulation 'ON' (MedOff/StimOn). The evaluation for the MedOff/StimOff condition was performed following overnight cessation of dopaminergic medication and after stimulation had been turned off for 1 h. For the MedOff/StimOn condition, evaluations were performed 1 h after restarting stimulation. The relative improvement of motor symptoms from GPi DBS was defined as the percentage reduction in score from the MedOff/StimOff condition to the MedOff/StimOn condition.
To compare improvement in specific symptoms, the UPDRS-III (motor) section was divided into 4 composite symptom scores as follows: 1) tremor (items 3.15 to 3.18); 2) rigidity (item 3.3); 3) axial bradykinesia (items 3.1 and 3.9 to 3.13); and 4) limb bradykinesia (items 3.2, 3.4 to 3.8, and 3.14).
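The grouping of UPDRS-III items into composite scores can be sketched as follows. The item-to-subscore mapping is taken from the text; the input format (one summed score per item, so that an item rated at several body sites, such as rigidity, is entered as a single total), the function names, and the percentage-reduction formula for relative improvement are assumptions made for illustration.

```python
SUBSCORE_ITEMS = {
    "tremor": ["3.15", "3.16", "3.17", "3.18"],
    "rigidity": ["3.3"],
    "axial_bradykinesia": ["3.1", "3.9", "3.10", "3.11", "3.12", "3.13"],
    "limb_bradykinesia": ["3.2", "3.4", "3.5", "3.6", "3.7", "3.8", "3.14"],
}


def composite_subscores(item_scores):
    """Sum per-item totals (keyed by item number) into the four composites."""
    return {
        name: sum(item_scores.get(item, 0) for item in items)
        for name, items in SUBSCORE_ITEMS.items()
    }


def relative_improvement(stim_off, stim_on):
    """Percentage reduction from MedOff/StimOff to MedOff/StimOn (assumed form)."""
    if stim_off == 0:
        return None  # undefined when the symptom is absent without stimulation
    return 100.0 * (stim_off - stim_on) / stim_off


# Hypothetical ratings: every item scores 2 off stimulation and 1 on stimulation.
all_items = [f"3.{i}" for i in range(1, 19)]
off = composite_subscores({item: 2 for item in all_items})
on = composite_subscores({item: 1 for item in all_items})
improvement = {name: relative_improvement(off[name], on[name]) for name in off}
print(improvement)
```

Returning `None` for a zero baseline avoids a division by zero for patients who have no tremor, for example, in the stimulation-off state.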

Statistical analysis
Statistical analysis was performed using SAS JMP 13 and Python 3.6. Continuous variables are presented as mean ± SD, and categorical variables as percentages. The Wilcoxon test for paired samples with Bonferroni correction was used to determine whether there was a significant improvement in motor symptoms.
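As a small illustration of the multiple-comparison step, the sketch below applies a Bonferroni correction to a set of raw p-values. The p-values are hypothetical and the choice of five comparisons (total score plus four subscores) is an assumption; in practice the raw values would come from a paired Wilcoxon signed-rank routine such as `scipy.stats.wilcoxon`.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical raw p-values, for illustration only (not study results).
raw = {"total": 0.0001, "tremor": 0.0002, "rigidity": 0.004, "axial": 0.02, "limb": 0.3}
adjusted = bonferroni(raw)
significant = [name for name, p in adjusted.items() if p < 0.05]
print(adjusted, significant)
```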
The prediction models were built on the preoperative predictors and trained in 29 patients. We used 5-fold cross-validation for stepwise predictor selection. Briefly, the original sample was randomly partitioned into 5 equal-sized subsamples. In turn, each subsample was retained as the validation data for testing the model, and the remaining 4 subsamples were used as training data. For feature selection, we took the maximum goodness of fit of the regression model as the criterion. Each time a new variable was entered, a new regression model with a new R-square was generated. After all variables had been traversed, we selected the model with the largest R-square as the final model. The candidate predictors tested in our study included: 1) the patient's age at the onset of PD; 2) disease duration; 3) absolute values of PreOFF and PreON UPDRS-III total scores and subscores; and 4) combinations of the above predictors, such as the product of predictor A and predictor B, entered as new predictors to improve the fit. An F test was performed for the total model by comparison with the null model. The coefficients of the predictors were tested with Student's t-test. The adjusted R-square was used to measure goodness of fit, and the root-mean-square error (RMSE) was used to measure error. The predictor subset yielding the maximum R-square was selected to build the prediction model. The models can be reconstructed by assembling the coefficients into linear equations. The models were validated in the additional 33 patients, and the results are described in detail by error percentiles.
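The selection procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: it uses forward selection, scores each candidate subset by the mean held-out R-square across 5 folds, and stops when the score no longer improves. All function names are hypothetical, and the data are synthetic.

```python
import random


def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for n samples and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rmse(y, y_hat):
    return (sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)) ** 0.5


def cv_r2(data, y, cols, k=5, seed=0):
    """Mean held-out R^2 over k folds for the predictor columns `cols`."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        coef = fit_ols([[data[i][c] for c in cols] for i in train],
                       [y[i] for i in train])
        y_hat = [predict(coef, [data[i][c] for c in cols]) for i in fold]
        scores.append(r_squared([y[i] for i in fold], y_hat))
    return sum(scores) / k


def forward_select(data, y, k=5):
    """Add one predictor at a time while the cross-validated R^2 keeps rising."""
    chosen, best = [], float("-inf")
    remaining = list(range(len(data[0])))
    while remaining:
        score, col = max((cv_r2(data, y, chosen + [c], k), c) for c in remaining)
        if score <= best:
            break
        chosen.append(col)
        remaining.remove(col)
        best = score
    return chosen, best


# Synthetic data only (not patient data): the target depends on columns 0 and 2.
rng = random.Random(1)
data = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(30)]
target = [2 + 3 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.5) for r in data]
chosen, best_r2 = forward_select(data, target)
print(chosen, round(best_r2, 3))
```

Interaction predictors of the "A multiplied by B" kind can be handled by appending product columns to `data` before selection. Scoring on held-out folds rather than on the training fit is a design choice of this sketch; with small cohorts it guards against selecting predictors that only fit noise.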

Demographics of recruited patients
A total of 62 PD patients with complete data were enrolled in the study. The main clinical and demographic characteristics of the patients at baseline are presented in Table 1. Forty patients (64.5%) were men and 22 (35.5%) were women. Across all patients, the mean age at disease onset was 51.5 ± 7.6 years. The mean age at the time of bilateral GPi stimulator implantation was 65.4 ± 7.8 years. The mean disease duration at the time of GPi DBS was 154.5 ± 60.0 months. Patients were followed up approximately 12 months after surgery.

GPi-DBS significantly improved the postoperative motor function of PD patients in the entire cohort
As illustrated in Fig. 1, GPi DBS significantly improved motor function at one-year follow-up. The overall UPDRS-III score improved by 30.4%, with the greatest improvement in tremor (75.0%), followed by limb bradykinesia (27.5%), rigidity (27.3%), and axial bradykinesia (22.4%).

Prediction of the effect of GPi-DBS operation on motor function
The candidate predictors tested in this study were: 1) the patient's age at the onset of PD; 2) disease duration; 3) absolute values of PreOFF and PreON UPDRS-III total scores and subscores; and 4) combined predictors derived from the above predictors.
The correlations between preoperative predictors and postoperative outcomes for the UPDRS-III total scores and subscores were measured by regression analysis (Table 2 and Fig. 2). After discarding factors with low predictive power, we chose the model with the highest R-square and obtained the following formulas.

Prediction of improvement in UPDRS-III total scores
Total or subtotal scores are calculated as the sum of the products of the corresponding entries of the term column and the coefficient column in Table 2. For example, the UPDRS-III total score is estimated as follows: UPDRS-III total = 11.14 − 0.22 …
Additionally, the relative improvement in UPDRS-III total score was significantly correlated with preoperative predictors (adjusted R² = 0.345, p = 0.0061). In particular, the PreOFF tremor subscore demonstrated the most significant predictive value (p = 0.0047). A combined predictor derived from age at disease onset and PreON axial bradykinesia was incorporated to fit the nonlinear relationship.
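The sum-of-products rule for turning a term column and a coefficient column into a prediction can be sketched as follows. The coefficients, term names, and patient values below are hypothetical placeholders chosen for easy arithmetic; they are not the entries of Table 2. Product terms such as "A*B" stand for the combined predictors described in this section.

```python
def predict_from_table(coefficients, values):
    """Sum coefficient x term value over all terms.

    The "Intercept" term has value 1; a term written "A*B" is the
    product of predictors A and B (a combined predictor).
    """
    total = 0.0
    for term, coef in coefficients.items():
        value = 1.0
        if term != "Intercept":
            for name in term.split("*"):
                value *= values[name]
        total += coef * value
    return total


# Hypothetical coefficients and patient values, for illustration only.
coefs = {
    "Intercept": 1.0,
    "PreOFF_tremor": 0.5,
    "PreOFF_rigidity*PreON_tremor": -0.25,
}
patient = {"PreOFF_tremor": 8.0, "PreOFF_rigidity": 6.0, "PreON_tremor": 2.0}
predicted = predict_from_table(coefs, patient)
print(predicted)
```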

Prediction of improvement in tremor subscores
The absolute improvement in UPDRS-III tremor score could be reliably predicted with preoperative predictors (adjusted R² = 0.82, p < 0.00001). In particular, the PreOFF tremor subscore demonstrated the most significant predictive value (p < 0.001). A combined predictor derived from PreOFF rigidity and PreON tremor (p = 0.0057) was incorporated to fit the nonlinear relationship.
Similarly, the relative improvement in UPDRS-III tremor score could be reliably predicted with preoperative predictors (adjusted R² = 0.76, p < 0.00001). In particular, the PreON limb bradykinesia and tremor subscores both demonstrated significant predictive value (p < 0.001). A combined predictor derived from PreOFF limb bradykinesia and PreON tremor (p = 0.005) was incorporated to fit the nonlinear relationship.

Prediction of improvement in rigidity, axial bradykinesia and limb bradykinesia subscores
In contrast to the UPDRS-III total scores and tremor subscores, improvement in the rigidity, axial bradykinesia, and limb bradykinesia subscores was only weakly or poorly associated with the potential predictors (adjusted R² < 0.5).

External validation
Given the relatively small size of the training cohort, we performed an external retrospective validation to confirm the robustness of the machine learning model. The validation analysis included another cohort of 33 PD patients who received GPi DBS.
The clinical characteristics of the training and validation groups were similar at baseline and postoperatively (Table 1). As shown in Table 3, prediction of the absolute improvement in tremor achieved high accuracy. The sample errors were distributed roughly symmetrically around 0: 50% of the prediction errors were between −1.4 and 1.2, and 80% were between −2.4 and 2.5.
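Error summaries of this kind can be computed from the validation predictions with the standard library. The sketch below is illustrative only: it assumes the error is defined as predicted minus observed improvement, uses the central 50% (25th to 75th percentile) and central 80% (10th to 90th percentile) ranges, and runs on made-up numbers rather than the validation cohort.

```python
from statistics import quantiles


def central_error_ranges(predicted, observed):
    """Central 50% and 80% ranges of the prediction errors.

    Error is taken as predicted minus observed (an assumed sign convention).
    """
    errors = [p - o for p, o in zip(predicted, observed)]
    # n=20 yields cut points at 5%, 10%, ..., 95%; index i is the (i+1)*5th percentile.
    cuts = quantiles(errors, n=20, method="inclusive")
    return {"50%": (cuts[4], cuts[14]), "80%": (cuts[1], cuts[17])}


# Made-up example: 21 patients whose errors run evenly from -10 to 10.
observed = [0] * 21
predicted = list(range(-10, 11))
ranges = central_error_ranges(predicted, observed)
print(ranges)
```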

Discussion
The beneficial effect of GPi deep brain stimulation on cardinal motor symptoms of PD
In this single-center, retrospective study, bilateral GPi DBS provided substantial improvement in the motor function of advanced PD patients at one year after surgery. We observed a 30.4% improvement in the off-medication, on-stimulation UPDRS III total score at one year after GPi implantation compared with the off-stimulation state. This finding is comparable with those of previous studies (7)(8)(9). The significant effect on tremor, rigidity, limb, and axial symptoms is also consistent with the literature reporting a beneficial effect of GPi DBS in controlling cardinal motor symptoms (26). In particular, the most dramatic benefit was observed in tremor symptoms.
In general, bilateral GPi and STN DBS may be equally effective for treating PD motor symptoms (7). At present, the subthalamic nucleus is used more commonly as the target, although evidence from available randomized trials has not confirmed an advantage or disadvantage for either target (10). Ideally, longer comparative follow-up is needed to clarify the long-term effect of GPi DBS and its potential benefits in nonmotor domains.

Machine learning method using absolute values as predictors
Preoperative levodopa responsiveness (LR) in the levodopa challenge test, a well-established predictor of motor improvement after STN DBS therapy (14), has been used to screen PD patients for DBS since its incorporation into the CAPSIT-PD protocol in 1999 (11). Despite the widespread use of the LR, little attention has been paid to how it should be calculated. Whether the absolute change (aLR) or the percentage change (%LR) in the UPDRS-III is the better measure remains unclear (27). The traditional approach to quantifying motor improvement due to DBS treatment relied simply on the percentage OFF-to-ON change in the UPDRS-III motor section score (relative levodopa responsiveness). We term this the "relative" approach, because it implies that only the relative magnitude of improvement is relevant, whereas the baseline score (PreOFF) is in itself irrelevant. For example, improvement from a baseline score of 100 down to 50 is treated as equivalent to improvement from 20 down to 10, as both represent an improvement of 50%. This strategy may miss preoperative information with powerful predictive value. The use of absolute values (both PreOFF and PreON values) as predictors was a novel aspect of the present study and increased prediction accuracy.
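The information lost by the relative approach can be made concrete with the two example patients from the paragraph above. In this sketch (function names are hypothetical), the relative approach maps both patients to the same single feature, whereas the absolute approach keeps PreOFF and PreON as separate predictors and so distinguishes them.

```python
def relative_feature(pre_off, pre_on):
    """Relative approach: a single percentage OFF-to-ON change."""
    return (100.0 * (pre_off - pre_on) / pre_off,)


def absolute_features(pre_off, pre_on):
    """Absolute approach: keep both raw scores as separate predictors."""
    return (float(pre_off), float(pre_on))


patient_a = (100, 50)  # the two example patients from the text
patient_b = (20, 10)

same_relative = relative_feature(*patient_a) == relative_feature(*patient_b)
same_absolute = absolute_features(*patient_a) == absolute_features(*patient_b)
print(same_relative, same_absolute)  # True False
```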
The prediction model performance and its relevant predictive variables
In our study, improvement in UPDRS-III total and tremor scores could be predicted with preoperative predictors, most reliably for tremor (adjusted R² = 0.82 for absolute improvement and adjusted R² = 0.76 for relative improvement). The PreOFF tremor subscore demonstrated the most significant predictive value. These results confirm the proof-of-concept of machine learning prediction of postoperative motor outcome based on preoperative clinical variables. To the best of our knowledge, our work is the first to systematically describe the predictive value of preoperative L-dopa responsiveness for GPi-DBS responsiveness.
We failed to establish reliable models to predict improvement in rigidity and bradykinesia from the potential predictors obtained in the levodopa challenge test. We speculate that the biochemical mechanisms of these motor symptoms are related not only to degeneration of the dopaminergic system but also to greater involvement of other neurotransmitter systems, such as loss of noradrenaline in the locus coeruleus, glutamatergic hyperactivity, and loss of cholinergic pedunculopontine nucleus neurons (28).
We stress that the preoperative clinical factors incorporated in our prediction model are limited. In the future, the accuracy of the prediction model could be improved if other potential variables, such as gender, disease subtype before DBS, and duration of motor fluctuations (29), are included.

Limitations
Several limitations should temper the strength of these results. First, the main limitation of our study and its interpretation is its retrospective design, although we systematically evaluated the clinical features of all study subjects pre- and postoperatively. Second, the predictive value of tremor may have been overestimated because of a selection bias toward DBS patients who have medication-responsive tremor. The current evidence remains insufficient to clarify the predictive value of medication-resistant tremor in PD patients who are considered candidates for GPi DBS. Third, the sample sizes of our training and validation cohorts are relatively small. Larger, prospective, multicenter studies may help confirm our findings.

Figure 1. Effects of GPi DBS on motor outcome. Patients were assessed at the 12-month follow-up. Means are plotted, with error bars representing the standard deviation. ****p < 0.0001 for a significant difference between conditions.