Predicting Parkinson’s Disease Medication Regimen Using Sensor Technology

Parkinson’s disease (PD) medication treatment planning is generally based on subjective data gathered through in-office, physician-patient interactions. The Personal KinetiGraph™ (PKG) has shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we use objective sensor data from the PKG and apply machine learning to subtype patients based on levodopa regimens and response. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes, clinically assessed by the PKG and Hoehn & Yahr (H&Y) staging. A random forest classification model is then used to predict patients’ cluster allocation based on their respective PKG data and demographic information. Clinically relevant clusters were developed based on longitudinal dopaminergic regimens (partitioned by levodopa dose, administration frequency, and total levodopa equivalent daily dose), with the PKG increasing cluster granularity compared to the H&Y staging. A random forest classifier was able to classify subjects of the two most demographically similar clusters with an accuracy of 87.9 ± 1.3%.


Introduction
Levodopa remains the gold standard therapy for treating the cardinal motor symptoms of Parkinson's disease (PD). As Parkinson's progresses, the duration of levodopa's dose efficacy shortens with the emergence of "wearing-off" episodes and dyskinesia. These complications result from a variety of factors, including disease progression and pulsatile stimulation of dopamine receptors due to lack of continuous levodopa administration 1 . Typically, a patient's medication regimen is optimized by fragmenting and increasing levodopa dosages while utilizing monoamine oxidase B (MAO-B) inhibitors, dopamine agonists, or catechol-O-methyl transferase (COMT) inhibitors as adjunctive therapies.
The most effective treatment strategy is determined from a clinician's overall impression of motor disability, assessed by instruments such as the Unified Parkinson's Disease Rating Scale (UPDRS) 2 and the Hauser paper-based diaries 3 . The lack of continuous motor assessment, recall bias, and limited integration of nonmotor symptomatology into the treatment paradigm present real-world limitations in managing such a heterogeneous condition. Sensor-based technology offers a real-time mechanism to objectively measure motor performance in Parkinson's disease 4,5 , moving beyond the "snapshot" in-office assessment of clinical impairment.
Motion sensors have demonstrated 70-90% accuracy in measuring fluctuations and dyskinesia [6][7][8][9] . One such inertial sensor is the Personal KinetiGraph™ (PKG) sensor (Global Kinetics Corporation (GKC), Melbourne, Australia). This wrist-worn logger uses an accelerometer to collect movement information every two minutes throughout the day, resulting in approximately 500 daily assessments, and reminds patients to register when they have taken their prescribed dopaminergic medication. The raw data are converted into bradykinesia and dyskinesia scores, as well as time series tracings, curated into a report 8 using validated algorithms 8, 10-12 . The report shows the continuous changes in bradykinesia and dyskinesia scores, as they relate to levodopa timing, as the median, 25th, and 75th percentiles, compared to a non-PD control group over six days. Additionally, these scores have been correlated with non-motor symptoms including gastrointestinal tract, sexual function and mood/cognition 13 .
By examining a spectrum of patients with clinical variability and gauging their responsivity to dopaminergic medication with sensor technology, inherent medication similarities may be found within specific clinical subtypes, offering an opportunity to cluster patients in a treatment-related manner. This approach could serve to predict optimal regimens, potentially reducing the lengthy process of optimizing medication for patients. Strategic treatment planning using PD patient subtyping has been shown to be effective 14,15 and stands to offer a data-driven approach to refining clinical management.

Study Cohort
Characteristics of the patient cohort and selection process are described in the study by Nahab et al. 16 , which explored the clinical utility of the PKG in routine care of Parkinson's patients. All patients were selected from the UCSD Movement Disorder Center from June 2016 to March 2017. Inclusion criteria were an age of 46-83 years, current levodopa treatment, and an H&Y stage of 1-3 16 . The participants underwent two clinical assessment visits. Before each visit, the patient wore the PKG sensor, which scores key symptoms every two minutes, for a six-day period. During each visit, the MDS-UPDRS motor subscales III and IV 2 and H&Y staging were assessed. After the first visit, a management plan was developed based on the PKG report 16 , and its impact was evaluated at the second visit using the same clinical metrics, including the PKG.

K-means Clustering
K-means clustering is an unsupervised ML algorithm that partitions patients into a predetermined number of clusters (k) without a hierarchical structure 17 . In this algorithm, cluster centroids are first initialized, and each patient is assigned to the nearest cluster (with respect to Euclidean distance to the cluster centroid). Each centroid is then recalculated as the mean of its assigned patients, which reduces the distance between patients and their assigned centroid. Patients are then reassigned to the nearest cluster. This process is repeated until no patients are reassigned in an update 17 .
Daily total levodopa equivalent dose, daily total carbidopa/levodopa IR dose, and administration frequency were used in k-means clustering to group patients into four clusters. The number of clusters was chosen using the Within Cluster Sum of Squares (WCSS), a measure of within-cluster variance that k-means seeks to minimize (see Fig. S1 in Supplemental information). This technique has been effectively used in healthcare applications for clustering data 18 . We then evaluated demographic information (patient's age at visit 1, age at diagnosis, number of years experiencing PD symptoms, and gender) for each cluster under two comparisons.
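The clustering procedure and the WCSS-based choice of k can be illustrated with a minimal sketch. This is not the study's implementation: the regimen values are invented for illustration, the function names (`kmeans`, `wcss_curve`) are hypothetical, and centroids are initialized at randomly chosen patients, which is one common choice among several.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means (Lloyd's algorithm). Returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start at k random patients
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no patient was reassigned: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def wcss_curve(points, k_values, n_restarts=10):
    """Lowest WCSS found over several random restarts, for each candidate k."""
    return {
        k: min(kmeans(points, k, seed=s)[2] for s in range(n_restarts))
        for k in k_values
    }


# Hypothetical regimens: (LEDD mg/day, carbidopa/levodopa IR mg/day, doses/day).
regimens = [
    (300, 300, 3), (350, 300, 3), (400, 400, 4),
    (600, 500, 4), (650, 600, 5), (700, 600, 5),
    (900, 800, 6), (950, 900, 6), (1000, 900, 7),
    (1400, 1200, 8), (1500, 1300, 8), (1600, 1400, 9),
]
curve = wcss_curve(regimens, range(1, 7))
for k, wcss in curve.items():
    print(k, round(wcss, 1))  # look for the "elbow" where the decrease levels off
```

The sketch clusters raw values, so the dose features (in mg) dominate the Euclidean distance over administration frequency. The text does not state whether features were rescaled before clustering, so any standardization step would be an additional assumption.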

Random Forest
A random forest classifier is a supervised ML algorithm that utilizes a large number of decision trees working as an ensemble 19 . We opted to use the random forest classifier in this study because it is generally robust to noisy or high-dimensional datasets and comparatively resistant to overfitting 20 .
Three random forest classifiers were trained to stratify subjects into their designated clusters, which were identified from the patients' best medication regimen according to PKG scores. The classifiers used demographic information alone, demographic information with the H&Y scores from visit 1, or demographic information with the PKG time series from visit 1. Features were extracted and engineered from the PKG time series via TSFresh 21 , which calculates various time series characteristics frequently used in classification tasks; demographic information was also included as features. Feature importance ranking was conducted using the Gini index 22 . The most important features were identified, prior to cross-validation, using preliminary experiments (presented in Table S1 in the Supplementary information) and included in the analysis.
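To make the Gini-based ranking concrete, the following sketch computes the Gini impurity of a set of cluster labels and the impurity decrease achieved by a single threshold split on one feature. It assumes the common convention in which a forest's feature importance aggregates such decreases over all splits that use the feature; the text does not detail the implementation, and the function names and toy values here are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Reduction in Gini impurity from splitting one feature at a threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_child_impurity = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_child_impurity


# Toy example: one hypothetical feature value per subject, with cluster labels.
feature = [0.2, 0.4, 0.9, 1.1]
clusters = ["A", "A", "B", "B"]
print(impurity_decrease(feature, clusters, threshold=0.5))
```

A split that separates the two clusters perfectly removes all impurity in the parent node, whereas an uninformative feature yields a decrease near zero; ranking features by their accumulated decrease is what allows a small subset to be retained.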
The random forest model's performance was evaluated using 5-fold stratified cross-validation such that in each fold, the ratio of cluster representation remained consistent with that of the original dataset. Because the clusters were unequal in size, downsampling was used in each fold so that the model learned from a balanced dataset and, during training, did not favor the more heavily represented cluster. Finally, the process was repeated 100 times, and the mean and confidence interval of the performance metrics were reported. Performance metrics included recall, precision, accuracy, F1 score, and the area under the receiver operating characteristic curve (AUC). Recall is the proportion of actual positives that are correctly identified. Precision is the proportion of predicted positives that are true positives. Accuracy is the proportion of correct predictions out of all predictions. The F1 score is the harmonic mean of precision and recall. The AUC summarizes the trade-off between the true positive rate and the false positive rate across classification thresholds and provides an overall performance metric for the model.
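The evaluation scheme can be sketched as follows, under stated assumptions: the helper names are hypothetical, the classifier itself is omitted, the cluster sizes (17 and 6) are taken from the Results, and treating cluster B as the "positive" class is an illustrative choice that the text does not specify.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, rng):
    """Split indices into folds that keep roughly the original class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)  # deal each class out round-robin
    return folds


def downsample(indices, labels, rng):
    """Randomly drop majority-class samples until all classes are equal in size."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    smallest = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, smallest))
    return balanced


def binary_metrics(y_true, y_pred, positive):
    """Recall, precision, accuracy, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}


rng = random.Random(0)
labels = ["A"] * 17 + ["B"] * 6  # cluster sizes reported for the best PKG score
for fold_id, test_idx in enumerate(stratified_folds(labels, 5, rng)):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    balanced_idx = downsample(train_idx, labels, rng)
    # A classifier would be fit on balanced_idx and scored on test_idx here.
    print(fold_id, len(balanced_idx), len(test_idx))
```

Only the training portion of each fold is downsampled, so the held-out subjects keep the original cluster ratio. Repeating the whole loop with different random seeds and averaging the metrics corresponds to the 100 repetitions described above.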

Cohort Characteristics
A total of 26 subjects' (17 male and 9 female) clinical evaluations and PKG readouts were included from the study by Nahab et al. 16 . The patient cohort utilized in this study is a subset of that presented in 16 , as two participants were excluded from the evaluation: one participant did not have corresponding PKG readouts; the other had dosage inconsistencies in their recorded medication regimen. During visit 2, the overall mean MDS-UPDRS-III was significantly reduced (visit 1: 28.9 ± 14.1, visit 2: 24.1 ± 13.5, p-value < 0.028 16 ). Demographic information and clinical characteristics of the participants are provided in Table 1. The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Tennessee (UTK-IRB-20-06007-XP).

Patient Clustering Using Medication Regimen
K-means clustering was used to allocate patients to one of four clusters based on their prescribed daily total levodopa equivalent dose, daily total carbidopa/levodopa IR dose, and administration frequency. In the first experiment, subjects were clustered by their visit 2 regimens, in which physicians had adjusted individualized regimens to optimize motor symptoms based on the PKG report. In the second experiment, subjects were clustered according to the regimen associated with the best motor function improvement, as defined by the H&Y or PKG scores, consistent with the comparisons of Nahab et al. 16 . Figure 1(a-b) presents the clusters based on visit 2's medication regimen. According to Fig. 1(a), when H&Y is used as the comparison metric, four subjects showed symptom improvement, six showed symptom worsening, and sixteen remained unchanged. As shown in Fig. 1(b), when the PKG score is used, seventeen subjects showed symptom improvement, eight worsened, and one remained unchanged. The demographic information associated with each cluster is provided in Table 2.
For each comparison, participants were clustered into four groups denoted as clusters A-D. Cluster D was statistically different from clusters A and B with respect to disease duration and age at diagnosis (p < 0.05); no other clusters were statistically different within parameters (p > 0.05). Figure 1(c-d) presents the medication regimen clusters associated with the most improved motor function between the two study visits. The centroid positions of the clusters differ because the medication regimens related to improved clinical function differ according to H&Y and PKG measures. The subjects' demographic information for each cluster is provided in Table 2, while the breakdown of PD medication and dosing is provided in Table 3. It should be noted that for H&Y and PKG scores that were unchanged between visit 1 and visit 2, the regimen associated with visit 2 was considered "best" and used in this clustering. No two clusters are statistically different (p > 0.05) in terms of patients' age, gender, or age at diagnosis. Cluster D was statistically different from clusters A and B with respect to disease duration (p < 0.05) for both the best H&Y and PKG scores.

Random Forest Classification Using PKG Readout
Three random forest classifiers were trained to examine the efficacy of using a combination of demographic information (patient's age at visit 1, age at diagnosis, number of years experiencing PD symptoms, and gender), H&Y scores from visit 1, and PKG time series sensor data from visit 1, to stratify the subjects in clusters A and B, as identified through k-means clustering using the best PKG score (see Fig. 1(d)). As shown in Table 2, "Best PKG Score," clusters A and B contain 17 and 6 participants, respectively. As noted in the Results Section ("Patient Clustering Using Medication Regimen"), these two clusters were the most statistically similar with respect to the demographic information and contained the majority of participants. The performance of each of the classifiers is presented in Table 4.
The random forest classifier using PKG time series sensor data to stratify the subjects in clusters A and B, identified through k-means clustering using the best PKG score, had superior performance to the alternative methods. To train this random forest classifier, over 1000 features were extracted from the PKG sensor's bradykinesia and dyskinesia time series for each patient, of which the top ten most important features were included in the analysis. These features are provided in Table S1 in Supplemental information. This random forest classifier achieved a recall of 81.0 ± 3.3%, a precision of 69.0 ± 3.5%, an accuracy of 87.9 ± 1.3%, an F1 score of 71.3 ± 3.3%, and an AUC of 0.94 ± 0.01.
Figure 1. Panels (a) and (b) show the clusters for the motor function changes from visit 1 to visit 2 based on the H&Y ranking and PKG report, respectively. Panels (c) and (d) highlight the subjects' best symptom control recorded using H&Y and PKG scores, respectively. The large shapes denote each cluster's centroid, and the exterior marker of each point corresponds to the cluster centroid shape. The capital letters (A-D) are used to refer to each cluster. Each point's interior marker in (a) and (b) represents the state of H&Y and PKG change for each patient, respectively. Note that due to the three-dimensional projection of the plot, the distance between points may appear skewed.
Table 2. Demographic and clinical information for clusters depicted in Fig. 1(a-d). † Pairwise p-value < 0.05 when compared with both Cluster A and B. Unless denoted by †, each pairwise cluster comparison yields no statistical difference (p-value > 0.05).

Discussion
Using a Parkinson's dataset of within-subject medication regimen titrations, clinically assessed by the H&Y and PKG scores, k-means clustering was used to group patients in terms of daily total levodopa equivalent dose, daily total carbidopa/levodopa IR dose, and administration frequency. We demonstrate that subjects can be meaningfully clustered based on longitudinal dopaminergic treatment regimens. Further, the sensor-based assessments of the PKG enhance the granularity of this clustering method compared with the H&Y staging. Figures 1(a) and 1(b) show the difference between H&Y and PKG scores when determining subject improvement. This difference between the two assessment instruments is more pronounced with respect to the regimens yielding the best motor function. This result supports the growing body of literature indicating that the H&Y score lacks the fidelity to capture more nuanced symptom improvement trends [23][24][25] . Specifically, at visit 2, the PKG scores indicate clinical improvement in seventeen patients (65%) compared to four patients (15%) based on the H&Y. This comparison shows that determining the optimization of a patient's medication regimen may be significantly more challenging using only H&Y. However, since the cohort was treated based on PKG changes, further conclusions regarding the robustness of the treatment approaches cannot be drawn. Even so, our results suggest that efficiently establishing a patient's best performing regimen would be improved by objective measurements.
Additionally, the cohort's demographics and clinical characteristics are generally statistically indistinguishable between the identified clusters based on both PKG and H&Y scores. Only subjects with the longest disease duration were grouped into a separate cluster. This group required greater dosages and more frequent administration of dopaminergics for symptom control-a treatment strategy aligned with current clinical practice. Furthermore, using demographic information along with time series sensor data from the PKG yielded a classification model that enabled the random forest classifier to predict the cluster allocation of patients with a high accuracy.
In the past, studies classifying Parkinson's clinical subtypes have predominantly relied on clinimetric scales. Early on, a tremor dominant phenotype was deemed more favorable than the postural instability and gait disorder (PIGD) phenotype 26 , with younger age at onset associated with slower disease progression compared to a more rapid progression in those who developed symptoms at an older age. However, the myriad of nonmotor symptoms associated with the disease, such as REM behavioral disorder (RBD), constipation, autonomic dysfunction, and neurobehavioral dysregulation (e.g., impulse control behaviors, executive decline), has expanded the phenotypic spectrum. In several studies, four overarching phenotypic clusters have emerged: 1) mildly affected across motor and nonmotor domains; 2) non-motor dominant; 3) motor dominant; 4) severely affected across all domains 17, 27, 28 .
Table 4. Random forest classifier performance identifying subjects in clusters A and B, as stratified through k-means clustering under the best PKG score. Each classifier was trained on one of three feature sets: demographics only, demographics and visit 1 H&Y score, or demographics and visit 1 PKG time series data.
Longitudinal clustering found that nonmotor symptoms (NMS), especially cognitive impairment, RBD, and orthostatic hypotension (OH), were significant determinants of three principal PD subgroups: motor/slow progression, intermediate, and diffuse/malignant 29 . Patients in the motor cluster were least afflicted with NMS and had less worsening than those in the diffuse/malignant cluster, who had RBD, OH, and mild cognitive impairment (MCI) at baseline and were associated with faster motor progression.
While the prognostication of disease progression is evident in clinical subtyping, the implications for treatment have yet to be established. Similar to the phenotypic variability of PD, the treatment approaches are also heterogeneous. Therefore, a continuous assessment of treatment response offers the possibility of more robust medication titrations, and the ability to cluster these sensor-based responses may help potentiate the impact of the emerging clinical phenotypes. This proof-of-concept study indicates that rich information can be extracted from data streams collected from sensors such as the PKG, which measures both motor function and medication responses, and incorporated into ML algorithms to build predictive models capable of expanding the clinical treatment platform.
The small patient sample potentially introduces bias, such as the underrepresentation of various PD subtypes. Additionally, the motor symptoms of several subjects within the cohort were never successfully controlled, as measured by the PKG's bradykinesia and dyskinesia scores 30 . If further optimized clinically, such subjects may be placed in a different cluster, altering the demographic and clinical information associated with that cluster. Future work examining a more representative patient cohort followed longitudinally stands to enhance the clustering and possibly reveal other inherent treatment subtypes.
Clustering patients in a treatment-related manner has the potential to determine inherent regimen similarities within clinical subtypes. The identification of such regimen similarities could guide strategic treatment planning for a data-driven approach to refine clinical management. This approach can provide clinicians with estimated regimens based on patients' demographics and limited sensor measurements, potentially expanding the current clinical approach for optimizing regimens.